TY - GEN
T1 - A Transformer-based Dual Position Attention Network for Recognizing Human-object Interaction
AU - Xing, Yi
AU - Dai, Yaping
AU - Hirota, Kaoru
AU - Jia, Zhiyang
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
N2 - In this paper, a transformer-based dual position attention network (TDPAN) is proposed for recognizing human-object interaction. The dual attention module, which embeds position information, is designed to scan the entire image space and adaptively aggregate crucial features. Moreover, the transformer architecture is adopted to effectively extract essential region features in a binary pairwise manner from the sequence image data. Compared with CNN-based methods, TDPAN feature aggregation not only requires no prior modification of the region of interest but also focuses more on the important context information in images. Experiments demonstrate that TDPAN outperforms previous methods on two datasets (HICO-DET and V-COCO). Specifically, recognition accuracy increases by 4.3% compared with prior convolutional neural network methods on the V-COCO dataset.
AB - In this paper, a transformer-based dual position attention network (TDPAN) is proposed for recognizing human-object interaction. The dual attention module, which embeds position information, is designed to scan the entire image space and adaptively aggregate crucial features. Moreover, the transformer architecture is adopted to effectively extract essential region features in a binary pairwise manner from the sequence image data. Compared with CNN-based methods, TDPAN feature aggregation not only requires no prior modification of the region of interest but also focuses more on the important context information in images. Experiments demonstrate that TDPAN outperforms previous methods on two datasets (HICO-DET and V-COCO). Specifically, recognition accuracy increases by 4.3% compared with prior convolutional neural network methods on the V-COCO dataset.
KW - Action recognition
KW - Dual position attention
KW - Human-object interaction
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85128105175&partnerID=8YFLogxK
U2 - 10.1109/CAC53003.2021.9727900
DO - 10.1109/CAC53003.2021.9727900
M3 - Conference contribution
AN - SCOPUS:85128105175
T3 - Proceeding - 2021 China Automation Congress, CAC 2021
SP - 4444
EP - 4449
BT - Proceeding - 2021 China Automation Congress, CAC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 China Automation Congress, CAC 2021
Y2 - 22 October 2021 through 24 October 2021
ER -