TY - GEN
T1 - Multi-level Proposal Relations Aggregation for Video Object Detection
AU - Yu, Chongkai
AU - Chen, Wenjie
AU - Wu, Bing
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Video information often deteriorates in certain frames, which is a great challenge for object detection. It is difficult to identify the object in such a frame using the information of that frame alone. Recently, plenty of studies have shown that aggregating context information through the self-attention mechanism can enhance the features in key frames. However, these methods exploit only some of the inter-video and intra-video global-local information, not all of it. Global semantic and local localization information in the same video can assist object classification and regression. The intra-proposal relation among different videos can provide important cues to distinguish confusing objects. All of this information can enhance the performance of video object detection. In this paper, we design a Multi-Level Proposal Relations Aggregation network to mine inter-video and intra-video global-local proposal relations. For intra-video, we effectively aggregate global and local information to augment the proposal features of key frames. For inter-video, we aggregate the inter-video key frame features to the target video under the constraint of relation regularization. We flexibly utilize the relation module to aggregate the proposals from different frames. Experiments on the ImageNet VID dataset demonstrate the effectiveness of our method.
AB - Video information often deteriorates in certain frames, which is a great challenge for object detection. It is difficult to identify the object in such a frame using the information of that frame alone. Recently, plenty of studies have shown that aggregating context information through the self-attention mechanism can enhance the features in key frames. However, these methods exploit only some of the inter-video and intra-video global-local information, not all of it. Global semantic and local localization information in the same video can assist object classification and regression. The intra-proposal relation among different videos can provide important cues to distinguish confusing objects. All of this information can enhance the performance of video object detection. In this paper, we design a Multi-Level Proposal Relations Aggregation network to mine inter-video and intra-video global-local proposal relations. For intra-video, we effectively aggregate global and local information to augment the proposal features of key frames. For inter-video, we aggregate the inter-video key frame features to the target video under the constraint of relation regularization. We flexibly utilize the relation module to aggregate the proposals from different frames. Experiments on the ImageNet VID dataset demonstrate the effectiveness of our method.
KW - Global-local information
KW - Relation aggregation
KW - Video object detection
UR - http://www.scopus.com/inward/record.url?scp=85138765222&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-15919-0_61
DO - 10.1007/978-3-031-15919-0_61
M3 - Conference contribution
AN - SCOPUS:85138765222
SN - 9783031159183
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 734
EP - 745
BT - Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings
A2 - Pimenidis, Elias
A2 - Aydin, Mehmet
A2 - Angelov, Plamen
A2 - Jayne, Chrisina
A2 - Papaleonidas, Antonios
PB - Springer Science and Business Media Deutschland GmbH
T2 - 31st International Conference on Artificial Neural Networks, ICANN 2022
Y2 - 6 September 2022 through 9 September 2022
ER -