Self-Teaching Video Object Segmentation

Chuanwei Zhou; Chunyan Xu; Zhen Cui; Tong Zhang; Jian Yang

doi:10.1109/TNNLS.2020.3043099


Chuanwei Zhou, Chunyan Xu*, Zhen Cui, Tong Zhang, Jian Yang*

* Corresponding author for this work

Nanjing University of Science and Technology

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Video object segmentation (VOS) is one of the most fundamental tasks for numerous downstream video applications. The crucial issue of online VOS is drift of the segmenter when it is incrementally updated on consecutive video frames under low-confidence supervision constraints. In this work, we propose a self-teaching VOS (ST-VOS) method that makes the segmenter learn online adaptation as confidently as possible. In the segmenter learning at each time slice, the segmentation hypothesis and the segmenter update are enclosed in a self-looping optimization circle so that each improves the other. To reduce error accumulation in the self-looping process, we introduce a metalearning strategy that learns how to perform this optimization within only a few iteration steps. To this end, the learning rates of the segmenter are adaptively derived through metaoptimization in the channel space of the convolutional kernels. Furthermore, to better launch the self-looping process, we calculate an initial mask map from part detectors and motion flow, which establishes a solid foundation for subsequent refinement and makes the segmenter update more robust. Extensive experiments demonstrate that this ST idea can boost the performance of baselines; meanwhile, our ST-VOS achieves encouraging performance on the DAVIS16, YouTube-Objects, DAVIS17, and SegTrackV2 data sets, in particular a J-mean accuracy of 75.7% on the multi-instance DAVIS17 data set.
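The self-looping idea in the abstract (alternately updating the segmenter against the current mask hypothesis, then re-deriving the hypothesis from the updated segmenter, with one learning rate per channel) can be illustrated with a toy sketch. It rests on stated assumptions and is not the authors' implementation: the "segmenter" is a single 1x1 kernel over C feature channels (per-pixel logistic regression), the per-channel learning rates `alpha` are assumed to be already meta-learned and are simply hand-picked constants here, and `self_teaching_update`, `predict`, and the toy features are hypothetical names and data. The meta-optimization that produces `alpha`, the part detectors, and the motion flow are not modeled; `init_mask` merely stands in for their output.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, feats: Sequence[Sequence[float]]) -> List[float]:
    """Foreground probability per pixel from a 1x1 'kernel' w over C channels."""
    return [sigmoid(sum(wc * xc for wc, xc in zip(w, x)) + b) for x in feats]


def self_teaching_update(
    w: Sequence[float],
    b: float,
    feats: Sequence[Sequence[float]],
    init_mask: Sequence[float],
    alpha: Sequence[float],
    alpha_b: float,
    steps: int = 3,
    thresh: float = 0.5,
) -> Tuple[List[float], float, List[float]]:
    """Hypothetical self-looping update for one frame (toy sketch).

    Alternates (1) a gradient step on binary cross-entropy against the current
    mask hypothesis, scaling each channel's gradient by its own rate alpha[c],
    and (2) re-deriving the hypothesis from the updated segmenter.
    alpha is assumed to be meta-learned offline; here it is just an input.
    """
    if len(alpha) != len(w):
        raise ValueError("need one learning rate per channel")
    w = list(w)
    hypothesis = [float(m) for m in init_mask]
    n = len(feats)
    for _ in range(steps):
        probs = predict(w, b, feats)
        residual = [p - y for p, y in zip(probs, hypothesis)]  # dBCE/dlogit
        grad_w = [sum(r * x[c] for r, x in zip(residual, feats)) / n for c in range(len(w))]
        grad_b = sum(residual) / n
        # Per-channel learning rates: each channel gets its own step size.
        w = [wc - a * g for wc, a, g in zip(w, alpha, grad_w)]
        b -= alpha_b * grad_b
        # Refresh the segmentation hypothesis from the updated segmenter.
        hypothesis = [1.0 if p >= thresh else 0.0 for p in predict(w, b, feats)]
    return w, b, hypothesis


if __name__ == "__main__":
    # Toy frame: 4 pixels, 2 feature channels; channel 0 separates fg from bg.
    feats = [[1.0, 0.3], [0.9, -0.2], [-1.0, 0.1], [-0.8, -0.4]]
    noisy_init = [1, 1, 0, 1]  # last pixel wrongly marked as foreground
    w, b, mask = self_teaching_update([0.0, 0.0], 0.0, feats, noisy_init,
                                      alpha=[1.0, 0.1], alpha_b=0.1)
    print(mask)  # [1.0, 1.0, 0.0, 0.0]
```

In this toy run, the initial mask mislabels the last pixel; after the first update the refreshed hypothesis drops it, because the gradient from the three correctly labeled pixels outweighs it. With a worse initial mask, the same loop can instead reinforce its own errors, which is the drift the abstract describes and the reason a good initial mask and a few well-tuned steps matter.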

Original language	English
Pages (from-to)	1623-1637
Number of pages	15
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	33
Issue number	4
DOIs	https://doi.org/10.1109/TNNLS.2020.3043099
Publication status	Published - 1 Apr 2022
Externally published	Yes

Keywords

Metaoptimization
Self-teaching (ST)
Video object segmentation (VOS)


Cite this

@article{24d2692ec9e242bcb09d9bd6993aaab6,
  title = "Self-Teaching Video Object Segmentation",
  abstract = "Video object segmentation (VOS) is one of the most fundamental tasks for numerous sequent video applications. The crucial issue of online VOS is the drifting of segmenter when incrementally updated on continuous video frames under unconfident supervision constraints. In this work, we propose a self-teaching VOS (ST-VOS) method to make segmenter to learn online adaptation confidently as much as possible. In the segmenter learning at each time slice, the segment hypothesis and segmenter update are enclosed into a self-looping optimization circle such that they can be mutually improved for each other. To reduce error accumulation of the self-looping process, we specifically introduce a metalearning strategy to learn how to do this optimization within only a few iteration steps. To this end, the learning rates of segmenter are adaptively derived through metaoptimization in the channel space of convolutional kernels. Furthermore, to better launch the self-looping process, we calculate an initial mask map through part detectors and motion flow to well-establish a foundation for subsequent refinement, which could result in the robustness of the segmenter update. Extensive experiments demonstrate that this ST idea can boost the performance of baselines, and in the meantime, our ST-VOS achieves encouraging performance on the DAVIS16, Youtube-objects, DAVIS17, and SegTrackV2 data sets, where, in particular, the accuracy of 75.7\% in J-mean metric is obtained on the multi-instance DAVIS17 data set.",
  keywords = "Metaoptimization, Self-teaching (ST), Video object segmentation (VOS)",
  author = "Chuanwei Zhou and Chunyan Xu and Zhen Cui and Tong Zhang and Jian Yang",
  note = "Publisher Copyright: {\textcopyright} 2012 IEEE.",
  year = "2022",
  month = apr,
  day = "1",
  doi = "10.1109/TNNLS.2020.3043099",
  language = "English",
  volume = "33",
  pages = "1623--1637",
  journal = "IEEE Transactions on Neural Networks and Learning Systems",
  issn = "2162-237X",
  publisher = "IEEE Computational Intelligence Society",
  number = "4",
}

TY  - JOUR
T1  - Self-Teaching Video Object Segmentation
AU  - Zhou, Chuanwei
AU  - Xu, Chunyan
AU  - Cui, Zhen
AU  - Zhang, Tong
AU  - Yang, Jian
PY  - 2022/4/1
Y1  - 2022/4/1
N2  - Video object segmentation (VOS) is one of the most fundamental tasks for numerous sequent video applications. The crucial issue of online VOS is the drifting of segmenter when incrementally updated on continuous video frames under unconfident supervision constraints. In this work, we propose a self-teaching VOS (ST-VOS) method to make segmenter to learn online adaptation confidently as much as possible. In the segmenter learning at each time slice, the segment hypothesis and segmenter update are enclosed into a self-looping optimization circle such that they can be mutually improved for each other. To reduce error accumulation of the self-looping process, we specifically introduce a metalearning strategy to learn how to do this optimization within only a few iteration steps. To this end, the learning rates of segmenter are adaptively derived through metaoptimization in the channel space of convolutional kernels. Furthermore, to better launch the self-looping process, we calculate an initial mask map through part detectors and motion flow to well-establish a foundation for subsequent refinement, which could result in the robustness of the segmenter update. Extensive experiments demonstrate that this ST idea can boost the performance of baselines, and in the meantime, our ST-VOS achieves encouraging performance on the DAVIS16, Youtube-objects, DAVIS17, and SegTrackV2 data sets, where, in particular, the accuracy of 75.7% in J-mean metric is obtained on the multi-instance DAVIS17 data set.
AB  - Video object segmentation (VOS) is one of the most fundamental tasks for numerous sequent video applications. The crucial issue of online VOS is the drifting of segmenter when incrementally updated on continuous video frames under unconfident supervision constraints. In this work, we propose a self-teaching VOS (ST-VOS) method to make segmenter to learn online adaptation confidently as much as possible. In the segmenter learning at each time slice, the segment hypothesis and segmenter update are enclosed into a self-looping optimization circle such that they can be mutually improved for each other. To reduce error accumulation of the self-looping process, we specifically introduce a metalearning strategy to learn how to do this optimization within only a few iteration steps. To this end, the learning rates of segmenter are adaptively derived through metaoptimization in the channel space of convolutional kernels. Furthermore, to better launch the self-looping process, we calculate an initial mask map through part detectors and motion flow to well-establish a foundation for subsequent refinement, which could result in the robustness of the segmenter update. Extensive experiments demonstrate that this ST idea can boost the performance of baselines, and in the meantime, our ST-VOS achieves encouraging performance on the DAVIS16, Youtube-objects, DAVIS17, and SegTrackV2 data sets, where, in particular, the accuracy of 75.7% in J-mean metric is obtained on the multi-instance DAVIS17 data set.
KW  - Metaoptimization
KW  - Self-teaching (ST)
KW  - Video object segmentation (VOS)
UR  - http://www.scopus.com/inward/record.url?scp=85102650073&partnerID=8YFLogxK
U2  - 10.1109/TNNLS.2020.3043099
DO  - 10.1109/TNNLS.2020.3043099
M3  - Article
C2  - 33690125
AN  - SCOPUS:85102650073
SN  - 2162-237X
VL  - 33
SP  - 1623
EP  - 1637
JO  - IEEE Transactions on Neural Networks and Learning Systems
JF  - IEEE Transactions on Neural Networks and Learning Systems
IS  - 4
ER  - 
