TY - JOUR
T1 - Attention-Based Deep Neural Network Combined Local and Global Features for Indoor Scene Recognition
AU - Chen, Luefeng
AU - Duan, Wenhao
AU - Li, Jiazhuo
AU - Wu, Min
AU - Pedrycz, Witold
AU - Hirota, Kaoru
N1 - Publisher Copyright: © 2005-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - An original attention-based indoor scene recognition model that combines local and global features is proposed. First, multi-strategy data augmentation using several different functions and intensities is applied to improve classification performance. Then, local features are extracted using a convolutional layer and a single self-attention module, addressing the problem of large intra-class variance. A multi-attention mechanism fuses the local feature information extracted from different foci to obtain a more complete global feature representation. The multi-head attention mechanism allows the network to extract features in parallel along different directions of attention, which helps the network better capture global information, improves its ability to understand and represent the input data, and addresses the problem of high inter-class similarity. Finally, the extracted features are fed into a classifier to classify the indoor scene images. Experiments conducted on four datasets (IndoorCVPR09, SUN397, 15-Scenes, and a self-built small-sample scientific indoor scene dataset) yield excellent results. The results show that the developed algorithm effectively addresses the two problems of high intra-class diversity and high inter-class similarity, and the model achieves competitive results. Preliminary application experiments conducted in our human-robot interaction (HRI) system indicate that the proposed indoor scene recognition model can be applied to complete environmental perception in HRI.
AB - An original attention-based indoor scene recognition model that combines local and global features is proposed. First, multi-strategy data augmentation using several different functions and intensities is applied to improve classification performance. Then, local features are extracted using a convolutional layer and a single self-attention module, addressing the problem of large intra-class variance. A multi-attention mechanism fuses the local feature information extracted from different foci to obtain a more complete global feature representation. The multi-head attention mechanism allows the network to extract features in parallel along different directions of attention, which helps the network better capture global information, improves its ability to understand and represent the input data, and addresses the problem of high inter-class similarity. Finally, the extracted features are fed into a classifier to classify the indoor scene images. Experiments conducted on four datasets (IndoorCVPR09, SUN397, 15-Scenes, and a self-built small-sample scientific indoor scene dataset) yield excellent results. The results show that the developed algorithm effectively addresses the two problems of high intra-class diversity and high inter-class similarity, and the model achieves competitive results. Preliminary application experiments conducted in our human-robot interaction (HRI) system indicate that the proposed indoor scene recognition model can be applied to complete environmental perception in HRI.
KW - Human-robot interaction
KW - indoor scene recognition
KW - local and global features
KW - multihead attention
UR - http://www.scopus.com/inward/record.url?scp=85208750821&partnerID=8YFLogxK
U2 - 10.1109/TII.2024.3424197
DO - 10.1109/TII.2024.3424197
M3 - Article
AN - SCOPUS:85208750821
SN - 1551-3203
VL - 20
SP - 12684
EP - 12693
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 11
ER -