A Transformer-based Dual Position Attention Network for Recognizing Human-object Interaction

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, a transformer-based dual position attention network (TDPAN) is proposed for recognizing human-object interaction. The dual attention module, which embeds position information, is designed to scan the entire image space and adaptively aggregate crucial features. Moreover, the transformer architecture is adopted to effectively extract essential region features in a binary pairwise manner from the sequence image data. Compared with CNN-based methods, TDPAN feature aggregation not only requires no prior modification of the region of interest but also focuses more on the important context information in images. Experiments demonstrate that TDPAN outperforms previous methods on two datasets, HICO-DET and V-COCO. Specifically, recognition accuracy increases by 4.3% compared with prior convolutional neural network methods on the V-COCO dataset.
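
The abstract does not give the internals of TDPAN, so the following is only a generic illustration of the idea it names: attention over region features that carry position information. It is a minimal single-head self-attention sketch in plain Python with sinusoidal position encodings and no learned projections. The function names (`position_encoding`, `attend_with_positions`) and the toy `regions` values are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def position_encoding(index, dim):
    # Standard sinusoidal encoding: sine on even channels, cosine on odd ones.
    # Assumption: the paper's actual position embedding may differ.
    return [
        math.sin(index / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(index / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def attend_with_positions(features):
    # features: list of equally sized feature vectors, one per image region.
    dim = len(features[0])
    # Inject position information by adding an encoding to each region feature.
    tokens = [
        [f + p for f, p in zip(feat, position_encoding(i, dim))]
        for i, feat in enumerate(features)
    ]
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Scaled dot-product score of this region against every region.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        w = softmax(scores)
        weights.append(w)
        # Aggregate all regions, weighted by attention.
        outputs.append(
            [sum(wj * tokens[j][d] for j, wj in enumerate(w)) for d in range(dim)]
        )
    return outputs, weights

# Toy input: three hypothetical region features of dimension 4.
regions = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
aggregated, attention = attend_with_positions(regions)
```

Each row of `attention` is a probability distribution over all regions, so every region's output can draw on context from anywhere in the image rather than from a fixed region of interest.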

Original language: English
Title of host publication: Proceeding - 2021 China Automation Congress, CAC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4444-4449
Number of pages: 6
ISBN (Electronic): 9781665426473
Publication status: Published - 2021
Event: 2021 China Automation Congress, CAC 2021 - Beijing, China
Duration: 22 Oct 2021 to 24 Oct 2021

Publication series

Name: Proceeding - 2021 China Automation Congress, CAC 2021

Conference

Conference: 2021 China Automation Congress, CAC 2021
Country/Territory: China
City: Beijing
Period: 22/10/21 to 24/10/21

Keywords

  • Action recognition
  • Dual position attention
  • Human-object interaction
  • Transformer
