Multi-level Proposal Relations Aggregation for Video Object Detection

Chongkai Yu; Wenjie Chen; Bing Wu

doi:10.1007/978-3-031-15919-0_61

Chongkai Yu, Wenjie Chen^*, Bing Wu

^*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Video information often deteriorates in certain frames, which is a great challenge for object detection. It is difficult to identify the object in such a frame using the information of that frame alone. Recently, many studies have shown that aggregating context information through the self-attention mechanism can enhance the features in key frames. However, these methods exploit only some of the inter-video and intra-video global-local information, not all of it. Global semantic and local localization information in the same video can assist object classification and regression. The intra-proposal relation among different videos can provide important cues to distinguish confusing objects. All of this information can enhance the performance of video object detection. In this paper, we design a Multi-Level Proposal Relations Aggregation network to mine inter-video and intra-video global-local proposal relations. For intra-video, we effectively aggregate global and local information to augment the proposal features of key frames. For inter-video, we aggregate the inter-video key frame features to the target video under the constraint of relation regularization. We flexibly utilize the relation module to aggregate the proposals from different frames. Experiments on the ImageNet VID dataset demonstrate the effectiveness of our method.
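The attention-style aggregation of proposal features mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' network: it assumes plain scaled dot-product attention with no learned projections, and it leaves out the relation regularization and the global/local split. The names `aggregate_proposals`, `key_feats`, and `support_feats` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_proposals(key_feats, support_feats):
    """Add an attention-weighted sum of support features to each key feature.

    key_feats:     proposal features of the key frame (list of vectors)
    support_feats: proposal features from other frames (list of vectors)
    Assumption: similarity is a scaled dot product; a real relation module
    would normally use learned projections.
    """
    dim = len(key_feats[0])
    scale = math.sqrt(dim)
    enhanced = []
    for query in key_feats:
        # Relation weights between this key proposal and every support proposal.
        weights = softmax([dot(query, s) / scale for s in support_feats])
        # Weighted sum of support features, one value per feature dimension.
        context = [
            sum(w * s[d] for w, s in zip(weights, support_feats))
            for d in range(dim)
        ]
        # Residual connection: keep the original feature and add the context.
        enhanced.append([q + c for q, c in zip(query, context)])
    return enhanced
```

Calling `aggregate_proposals(key_feats, support_feats)` returns one enhanced vector per key-frame proposal, with the same dimensionality as the input features.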

Original language	English
Title of host publication	Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings
Editors	Elias Pimenidis, Mehmet Aydin, Plamen Angelov, Chrisina Jayne, Antonios Papaleonidas
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	734-745
Number of pages	12
ISBN (Print)	9783031159183
DOIs	https://doi.org/10.1007/978-3-031-15919-0_61
Publication status	Published - 2022
Event	31st International Conference on Artificial Neural Networks, ICANN 2022 - Bristol, United Kingdom. Duration: 6 Sept 2022 → 9 Sept 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13529 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	31st International Conference on Artificial Neural Networks, ICANN 2022
Country/Territory	United Kingdom
City	Bristol
Period	6/09/22 → 9/09/22

Keywords

Global-local information
Relation aggregation
Video object detection


Cite this

Yu, C., Chen, W., & Wu, B. (2022). Multi-level Proposal Relations Aggregation for Video Object Detection. In E. Pimenidis, M. Aydin, P. Angelov, C. Jayne, & A. Papaleonidas (Eds.), Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings (pp. 734-745). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13529 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-15919-0_61

Yu, Chongkai ; Chen, Wenjie ; Wu, Bing. / Multi-level Proposal Relations Aggregation for Video Object Detection. Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings. editor / Elias Pimenidis ; Mehmet Aydin ; Plamen Angelov ; Chrisina Jayne ; Antonios Papaleonidas. Springer Science and Business Media Deutschland GmbH, 2022. pp. 734-745 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{d7227ea90d184a8f8ca6f88a6d4120ee,

title = "Multi-level Proposal Relations Aggregation for Video Object Detection",

abstract = "Video information often deteriorates in certain frames, which is a great challenge for object detection. It is difficult to identify the object in such a frame using the information of that frame alone. Recently, many studies have shown that aggregating context information through the self-attention mechanism can enhance the features in key frames. However, these methods exploit only some of the inter-video and intra-video global-local information, not all of it. Global semantic and local localization information in the same video can assist object classification and regression. The intra-proposal relation among different videos can provide important cues to distinguish confusing objects. All of this information can enhance the performance of video object detection. In this paper, we design a Multi-Level Proposal Relations Aggregation network to mine inter-video and intra-video global-local proposal relations. For intra-video, we effectively aggregate global and local information to augment the proposal features of key frames. For inter-video, we aggregate the inter-video key frame features to the target video under the constraint of relation regularization. We flexibly utilize the relation module to aggregate the proposals from different frames. Experiments on the ImageNet VID dataset demonstrate the effectiveness of our method.",

keywords = "Global-local information, Relation aggregation, Video object detection",

author = "Chongkai Yu and Wenjie Chen and Bing Wu",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 31st International Conference on Artificial Neural Networks, ICANN 2022 ; Conference date: 06-09-2022 Through 09-09-2022",

year = "2022",

doi = "10.1007/978-3-031-15919-0_61",

language = "English",

isbn = "9783031159183",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "734--745",

editor = "Elias Pimenidis and Mehmet Aydin and Plamen Angelov and Chrisina Jayne and Antonios Papaleonidas",

booktitle = "Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings",

address = "Germany",

}

Yu, C, Chen, W & Wu, B 2022, Multi-level Proposal Relations Aggregation for Video Object Detection. in E Pimenidis, M Aydin, P Angelov, C Jayne & A Papaleonidas (eds), Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13529 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 734-745, 31st International Conference on Artificial Neural Networks, ICANN 2022, Bristol, United Kingdom, 6/09/22. https://doi.org/10.1007/978-3-031-15919-0_61

Multi-level Proposal Relations Aggregation for Video Object Detection. / Yu, Chongkai; Chen, Wenjie; Wu, Bing.
Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings. ed. / Elias Pimenidis; Mehmet Aydin; Plamen Angelov; Chrisina Jayne; Antonios Papaleonidas. Springer Science and Business Media Deutschland GmbH, 2022. p. 734-745 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13529 LNCS).

TY - GEN

T1 - Multi-level Proposal Relations Aggregation for Video Object Detection

AU - Yu, Chongkai

AU - Chen, Wenjie

AU - Wu, Bing

PY - 2022

Y1 - 2022

N2 - Video information often deteriorates in certain frames, which is a great challenge for object detection. It is difficult to identify the object in such a frame using the information of that frame alone. Recently, many studies have shown that aggregating context information through the self-attention mechanism can enhance the features in key frames. However, these methods exploit only some of the inter-video and intra-video global-local information, not all of it. Global semantic and local localization information in the same video can assist object classification and regression. The intra-proposal relation among different videos can provide important cues to distinguish confusing objects. All of this information can enhance the performance of video object detection. In this paper, we design a Multi-Level Proposal Relations Aggregation network to mine inter-video and intra-video global-local proposal relations. For intra-video, we effectively aggregate global and local information to augment the proposal features of key frames. For inter-video, we aggregate the inter-video key frame features to the target video under the constraint of relation regularization. We flexibly utilize the relation module to aggregate the proposals from different frames. Experiments on the ImageNet VID dataset demonstrate the effectiveness of our method.

AB - Video information often deteriorates in certain frames, which is a great challenge for object detection. It is difficult to identify the object in such a frame using the information of that frame alone. Recently, many studies have shown that aggregating context information through the self-attention mechanism can enhance the features in key frames. However, these methods exploit only some of the inter-video and intra-video global-local information, not all of it. Global semantic and local localization information in the same video can assist object classification and regression. The intra-proposal relation among different videos can provide important cues to distinguish confusing objects. All of this information can enhance the performance of video object detection. In this paper, we design a Multi-Level Proposal Relations Aggregation network to mine inter-video and intra-video global-local proposal relations. For intra-video, we effectively aggregate global and local information to augment the proposal features of key frames. For inter-video, we aggregate the inter-video key frame features to the target video under the constraint of relation regularization. We flexibly utilize the relation module to aggregate the proposals from different frames. Experiments on the ImageNet VID dataset demonstrate the effectiveness of our method.

KW - Global-local information

KW - Relation aggregation

KW - Video object detection

UR - http://www.scopus.com/inward/record.url?scp=85138765222&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-15919-0_61

DO - 10.1007/978-3-031-15919-0_61

M3 - Conference contribution

AN - SCOPUS:85138765222

SN - 9783031159183

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 734

EP - 745

BT - Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings

A2 - Pimenidis, Elias

A2 - Aydin, Mehmet

A2 - Angelov, Plamen

A2 - Jayne, Chrisina

A2 - Papaleonidas, Antonios

PB - Springer Science and Business Media Deutschland GmbH

T2 - 31st International Conference on Artificial Neural Networks, ICANN 2022

Y2 - 6 September 2022 through 9 September 2022

ER -

Yu C, Chen W, Wu B. Multi-level Proposal Relations Aggregation for Video Object Detection. In Pimenidis E, Aydin M, Angelov P, Jayne C, Papaleonidas A, editors, Artificial Neural Networks and Machine Learning - ICANN 2022 - 31st International Conference on Artificial Neural Networks, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 734-745. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-15919-0_61