TY - GEN
T1 - Relational Context Learning for Human-Object Interaction Detection
AU - Dong, Dandan
AU - Jia, Zhiyang
AU - Chen, Hang
AU - Ji, Kang
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2025/2/20
Y1 - 2025/2/20
AB - Interaction action recognition has become a popular research direction in recent years. As a key technology for understanding human behavior, it plays a crucial role in many fields, such as intelligent surveillance and human-computer interaction. Current research typically adopts CNN-based or Transformer-based methods, but most of these methods suffer from insufficient context exchange. To address this problem, this paper designs and implements a relational-context-based interaction action recognition model built on the Transformer-based multivariate relational network. The main work is as follows. The paper obtains the source code of the multivariate relational network (MUREN), verifies that it runs, and explores suitable attention mechanisms for comparison with the original model. First, a channel attention mechanism is combined with the MUREN model, achieving a performance improvement of more than 3% on the V-COCO dataset; however, tests on the HICO-DET dataset showed that this channel attention modification was ineffective. The paper then adopts a global contextual attention mechanism, which matches the relational-context properties of MUREN. Because performance on V-COCO had already been improved with channel attention, this mechanism was tested only on HICO-DET, where the experimental results show an improvement of about 4% in overall accuracy and improvements of about 7% and 6% on rare samples and recall, respectively.
KW - channel attention mechanism
KW - global contextual attention mechanism
KW - interaction action recognition
KW - multivariate relational networks
UR - http://www.scopus.com/inward/record.url?scp=105001802641&partnerID=8YFLogxK
U2 - 10.1145/3711129.3711132
DO - 10.1145/3711129.3711132
M3 - Conference contribution
AN - SCOPUS:105001802641
T3 - Proceedings of 2024 8th International Conference on Electronic Information Technology and Computer Engineering, EITCE 2024
SP - 11
EP - 15
BT - Proceedings of 2024 8th International Conference on Electronic Information Technology and Computer Engineering, EITCE 2024
PB - Association for Computing Machinery, Inc
T2 - 8th International Conference on Electronic Information Technology and Computer Engineering, EITCE 2024
Y2 - 18 October 2024 through 20 October 2024
ER -