Exploring Spatial-Temporal Instance Relationships in an Intermediate Domain for Image-to-Video Object Detection

Zihan Wen, Jin Chen, Xinxiao Wu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Image-to-video object detection leverages annotated images to help detect objects in unannotated videos, so as to break the heavy dependency on the expensive annotation of large-scale video frames. This task is extremely challenging due to the serious domain discrepancy between images and video frames caused by appearance variance and motion blur. Previous methods perform both image-level and instance-level alignments to reduce the domain discrepancy, but false instance alignments may limit their performance in real scenarios. We propose a novel spatial-temporal graph that models the contextual relationships between instances to alleviate these false alignments. Through message propagation over the graph, the visual information from spatially and temporally neighboring object proposals is adaptively aggregated to enhance the current instance representation. Moreover, to adapt the source-biased decision boundary to the target data, we generate an intermediate domain between images and frames. It is worth mentioning that our method can be easily applied as a plug-and-play component to other image-to-video object detection models based on instance alignment. Experiments on several datasets demonstrate the effectiveness of our method. Code will be available at: https://github.com/wenzihan/STMP.

Original language: English
Title of host publication: Computer Vision – ACCV 2022 Workshops - 16th Asian Conference on Computer Vision, Revised Selected Papers
Editors: Yinqiang Zheng, Hacer Yalim Keleş, Piotr Koniusz
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 360-375
Number of pages: 16
ISBN (Print): 9783031270659
DOIs
Publication status: Published - 2023
Event: 16th Asian Conference on Computer Vision, ACCV 2022 - Macao, China
Duration: 4 Dec 2022 to 8 Dec 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13848 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Asian Conference on Computer Vision, ACCV 2022
Country/Territory: China
City: Macao
Period: 4/12/22 to 8/12/22

Keywords

  • Deep learning
  • Domain adaptation
  • Object detection
