Meta-VOS: Learning to Adapt Online Target-Specific Segmentation

Chunyan Xu, Li Wei, Zhen Cui*, Tong Zhang, Jian Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Video object segmentation is a fundamental but challenging problem in computer vision. To deal with large variations in target objects and background clutter, we propose an online adaptive video object segmentation (VOS) framework, named Meta-VOS, that learns to adapt the target-specific segmentation. Meta-VOS builds an online adaptive learning process by exploiting cumulative expertise after searching for confidence patterns across different videos/frames, and then dynamically improves model learning from two aspects: a Meta-seg learner (i.e., module updating) and a Meta-seg criterion (i.e., rule of expertise). As our goal is to rapidly determine which patterns best represent the essential characteristics of specific targets in a video, the Meta-seg learner is introduced to adaptively learn to update the parameters and hyperparameters of the segmentation network in very few gradient descent steps. Furthermore, a Meta-seg criterion of learned expertise, which is constructed to evaluate the Meta-seg learner for the online adaptation of the segmentation network, can confidently update positive/negative patterns online under the guidance of motion cues, object appearances and learned knowledge. Comprehensive evaluations on several benchmark datasets demonstrate the superiority of the proposed Meta-VOS over other state-of-the-art methods applied to the VOS problem.
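The idea of adapting a segmentation model in very few gradient descent steps, with the update hyperparameters themselves treated as learnable, can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a linear per-pixel classifier in place of the segmentation network, and the names `adapt`, `step_sizes`, `feats` and `labels` are hypothetical. The per-parameter step sizes stand in for hyperparameters that a meta-learner would supply; here they are simply fixed by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(w, feats, labels):
    """Mean binary cross-entropy of a linear pixel classifier, with its gradient."""
    n = len(feats)
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in zip(feats, labels):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard the logarithms
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return loss / n, [g / n for g in grad]


def adapt(w, step_sizes, feats, labels, steps=3):
    """Run a few gradient steps, each weight using its own step size.

    Assumption: in a meta-learning setup the step sizes would be learned
    across videos; in this sketch they are passed in as fixed values.
    """
    w = list(w)
    for _ in range(steps):
        _, grad = loss_and_grad(w, feats, labels)
        w = [wi - a * gi for wi, a, gi in zip(w, step_sizes, grad)]
    return w


# Toy "first frame": each row is (bias, feature 1, feature 2) for one pixel,
# labelled 1 for target and 0 for background. All values are made up.
feats = [[1.0, 2.0, 1.0], [1.0, 1.5, 0.5], [1.0, -1.0, -2.0], [1.0, -2.0, -0.5]]
labels = [1, 1, 0, 0]

w0 = [0.0, 0.0, 0.0]      # generic initialisation before adaptation
alpha = [0.5, 1.0, 1.0]   # hypothetical per-parameter step sizes

loss_before, _ = loss_and_grad(w0, feats, labels)
w_adapted = adapt(w0, alpha, feats, labels, steps=3)
loss_after, _ = loss_and_grad(w_adapted, feats, labels)
```

Comparing `loss_before` with `loss_after` shows how much three adaptation steps improve the fit on the labelled pixels. In a full meta-learning setup, both the initialisation `w0` and the step sizes `alpha` would be trained so that this short inner loop works well across many videos.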

Original language: English
Article number: 9418517
Pages (from-to): 4760-4772
Number of pages: 13
Journal: IEEE Transactions on Image Processing
Volume: 30
Publication status: Published - 2021
Externally published: Yes

Keywords

  • Video object segmentation
  • adaptation
  • meta-learning
  • online learning
