Building discriminative features of scene recognition using multi-stages of inception-ResNet-v2
Abstract
Scene recognition is a challenging problem because of intra-class variation and inter-class similarity. Traditional methods and convolutional neural networks (CNNs) represent the global spatial structure, which suits general scene classification and object recognition, but they perform poorly on particular indoor or outdoor medium-scale scene datasets. In this manuscript, we study the local and global structures of a scene image and combine both types of information for indoor and outdoor scenes to improve scene recognition accuracy. Local region structure describes a sub-part of the scene, such as sky or ground, while global structure describes the layout of the whole scene, such as the sky-background-ground arrangement of an outdoor scene. To this end, multi-layer convolutional features of an inception- and residual-based architecture are taken from intermediate and higher layers to preserve both the local and global structures of the scene image. Each layer used for feature extraction is connected to global average pooling to obtain a discriminative representation of the scene. In this way, local structure is captured at the intermediate convolutional layers, and global spatial structure is obtained from the higher layers. The proposed method is evaluated on the challenging 8-scene, 15-scene, UMC-21, MIT67, and 12-scene datasets, achieving 98.51%, 96.49%, 99.05%, 80.31%, and 84.88%, respectively, significantly outperforming state-of-the-art approaches.
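The pooling-and-combining step described in the abstract can be illustrated with a minimal sketch. It assumes that feature maps from an intermediate stage and a higher stage are already available as channels x height x width arrays, and that the pooled vectors are combined by simple concatenation. The function names, the toy values, and the choice of concatenation are hypothetical illustrations and are not taken from the paper, and no actual Inception-ResNet-v2 network is run here.

```python
from typing import List

# A feature map is indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Reduce each channel to the mean of its spatial positions."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def multi_stage_descriptor(stage_maps: List[FeatureMap]) -> List[float]:
    """Pool every stage and concatenate the results (assumed fusion rule)."""
    descriptor: List[float] = []
    for fmap in stage_maps:
        descriptor.extend(global_average_pool(fmap))
    return descriptor


# Toy stand-ins for real network activations (hypothetical values).
# Intermediate stage: 2 channels on a 2x2 grid (finer, more local detail).
intermediate = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
# Higher stage: 3 channels on a 1x1 grid (coarser, more global detail).
higher = [[[4.0]], [[6.0]], [[8.0]]]

descriptor = multi_stage_descriptor([intermediate, higher])
print(descriptor)
```

Because global average pooling removes the spatial dimensions, stages with different resolutions each contribute one value per channel, so the combined descriptor has a fixed length equal to the total number of channels across the selected stages.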










