Multi-level Proposal Relations Aggregation for Video Object Detection

Chongkai Yu, Wenjie Chen*, Bing Wu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Video information often deteriorates in certain frames, which poses a major challenge for object detection. It is difficult to identify an object in such a frame using the information from that frame alone. Recently, many studies have shown that aggregating context information through the self-attention mechanism can enhance the features of key frames. However, these methods exploit only part of the inter-video and intra-video global-local information. Global semantic information and local localization information in the same video can assist object classification and regression. The intra-proposal relation among different videos can provide important cues for distinguishing confusing objects. All of this information can improve the performance of video object detection. In this paper, we design a Multi-Level Proposal Relations Aggregation network to mine inter-video and intra-video global-local proposal relations. For intra-video, we effectively aggregate global and local information to augment the proposal features of key frames. For inter-video, we aggregate the inter-video key frame features to the target video under the constraint of relation regularization. We flexibly utilize the relation module to aggregate the proposals from different frames. Experiments on the ImageNet VID dataset demonstrate the effectiveness of our method.
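
As a rough illustration of the kind of attention-based proposal aggregation the abstract refers to, the sketch below enhances key-frame proposal features with a weighted sum of support proposal features, using plain scaled dot-product attention in NumPy. It is not the authors' network: the function names (`relation_weights`, `aggregate_proposals`), the feature dimensions, the residual connection, and the random stand-in features are all assumptions made for illustration, and the relation regularization mentioned in the abstract is not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def relation_weights(key_feats, support_feats):
    """Attention weights of each key-frame proposal over all support proposals.

    key_feats:     (num_key, dim) proposal features from the key frame
    support_feats: (num_support, dim) proposal features from other frames
    Returns an array of shape (num_key, num_support) whose rows sum to 1.
    """
    dim = key_feats.shape[-1]
    scores = key_feats @ support_feats.T / np.sqrt(dim)
    return softmax(scores, axis=-1)


def aggregate_proposals(key_feats, support_feats):
    """Add the attention-weighted support features to the key features.

    The residual form (key + aggregated) is an assumption of this sketch.
    """
    weights = relation_weights(key_feats, support_feats)
    return key_feats + weights @ support_feats


# Hypothetical stand-in features: 4 key-frame proposals, 6 proposals from
# nearby (local) frames and 10 from distant (global) frames, 8 dims each.
rng = np.random.default_rng(0)
key = rng.normal(size=(4, 8))
local_props = rng.normal(size=(6, 8))
global_props = rng.normal(size=(10, 8))

# Treat local and global proposals as one support set for the key frame.
support = np.concatenate([local_props, global_props], axis=0)
enhanced = aggregate_proposals(key, support)
print(enhanced.shape)  # (4, 8)
```

In this sketch the same function could be applied to proposals drawn from other videos simply by passing them as `support_feats`; how the paper actually combines the intra-video and inter-video levels is not specified in the abstract.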

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings
Editors: Elias Pimenidis, Mehmet Aydin, Plamen Angelov, Chrisina Jayne, Antonios Papaleonidas
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 734-745
Number of pages: 12
ISBN (Print): 9783031159183
DOIs
Publication status: Published - 2022
Event: 31st International Conference on Artificial Neural Networks, ICANN 2022 - Bristol, United Kingdom
Duration: 6 Sept 2022 to 9 Sept 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13529 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 31st International Conference on Artificial Neural Networks, ICANN 2022
Country/Territory: United Kingdom
City: Bristol
Period: 6/09/22 to 9/09/22

Keywords

  • Global-local information
  • Relation aggregation
  • Video object detection
