A Transformer-based Dual Position Attention Network for Recognizing Human-object Interaction

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

This paper proposes a transformer-based dual position attention network (TDPAN) for recognizing human-object interaction. A dual attention module with embedded position information is designed to scan the entire image space and adaptively aggregate crucial features. Moreover, the transformer architecture is adopted to effectively extract essential region features in a binary pairwise manner from the sequence image data. Compared with CNN-based methods, TDPAN feature aggregation not only requires no prior modification of the region of interest but also focuses more on the important context information in images. Experiments demonstrate that TDPAN outperforms previous methods on two datasets (HICO-DET and V-COCO). Specifically, recognition accuracy on the V-COCO dataset increases by 4.3% compared with prior convolutional neural network methods.

Original language: English
Title of host publication: Proceeding - 2021 China Automation Congress, CAC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4444-4449
Number of pages: 6
ISBN (Electronic): 9781665426473
DOI: https://doi.org/10.1109/CAC53003.2021.9727900
Publication status: Published - 2021
Event: 2021 China Automation Congress, CAC 2021 - Beijing, China
Duration: 22 Oct 2021 to 24 Oct 2021

Publication series

Name: Proceeding - 2021 China Automation Congress, CAC 2021

Conference

Conference: 2021 China Automation Congress, CAC 2021
Country/Territory: China
City: Beijing
Period: 22/10/21 to 24/10/21

Cite this

Xing, Y., Dai, Y., Hirota, K., & Jia, Z. (2021). A Transformer-based Dual Position Attention Network for Recognizing Human-object Interaction. In Proceeding - 2021 China Automation Congress, CAC 2021 (pp. 4444-4449). (Proceeding - 2021 China Automation Congress, CAC 2021). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CAC53003.2021.9727900