Real-time and light-weighted unsupervised video object segmentation network

Zongji Zhao, Sanyuan Zhao*, Jianbing Shen

*Corresponding author for this work

Research output: Contribution to journal (Article, peer-reviewed)

54 Citations (Scopus)

Abstract

Video object segmentation is one of the most practical computer vision tasks, especially in the unsupervised setting, where no manually labeled segmentation mask is provided at the beginning of a video sequence. In this paper, we propose a new real-time unsupervised video object segmentation network. Built on an encoder-decoder framework, it introduces a Dynamic ASPP module and an RNN-Conv module. The former adds a dynamic selection mechanism to the Atrous Spatial Pyramid Pooling (ASPP) structure: a channel attention mechanism lets the network adaptively select, according to object scale, among the features produced by the dilated convolutional kernels. Compared with directly concatenating the dilated convolutional features, dynamically selecting feature maps reduces the number of parameters and makes the module more efficient. The RNN-Conv module combines RNN units with external convolutional blocks, aggregating the temporal features of a video sequence with the spatial information extracted by the convolutional network. We stack this module to extract deeper spatiotemporal features than a traditional RNN network, and the design helps avoid vanishing and exploding gradients during training. We evaluate our network on popular video object segmentation datasets, and the experimental results demonstrate the effectiveness of our model.
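The dynamic selection step described above can be illustrated with a small NumPy sketch. It assumes a selective-kernel-style design (sum the dilated branches, global-average-pool, pass the pooled vector through a small bottleneck, then take a per-channel softmax across branches); the paper's exact Dynamic ASPP layout, dilation rates, and layer sizes may differ. All names here (`dynamic_aspp`, `w_reduce`, `w_select`, the rates 1, 2, 4) are hypothetical, and depthwise convolutions are used only to keep the example short.

```python
import numpy as np


def dilated_depthwise_conv3x3(x, kernels, rate):
    """Depthwise 3x3 convolution (cross-correlation) with dilation `rate`
    and zero 'same' padding. x: (C, H, W); kernels: (C, 3, 3)."""
    C, H, W = x.shape
    padded = np.pad(x, ((0, 0), (rate, rate), (rate, rate)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Window for kernel tap (i, j); taps are `rate` pixels apart.
            window = padded[:, i * rate:i * rate + H, j * rate:j * rate + W]
            out += kernels[:, i, j][:, None, None] * window
    return out


def softmax(a, axis):
    a = a - a.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_aspp(x, branch_kernels, rates, w_reduce, w_select):
    """Hypothetical dynamic selection over dilated branches (assumption:
    selective-kernel-style attention, not the authors' exact module).
    w_reduce: (d, C) bottleneck; w_select: (K, C, d) per-branch logits."""
    branches = np.stack([dilated_depthwise_conv3x3(x, k, r)
                         for k, r in zip(branch_kernels, rates)])  # (K, C, H, W)
    fused = branches.sum(axis=0)             # (C, H, W)
    s = fused.mean(axis=(1, 2))              # (C,) global average pooling
    z = np.maximum(w_reduce @ s, 0.0)        # (d,) ReLU bottleneck
    logits = w_select @ z                    # (K, C)
    attn = softmax(logits, axis=0)           # per-channel weights over branches
    out = (attn[:, :, None, None] * branches).sum(axis=0)  # (C, H, W), not (K*C, H, W)
    return out, attn


# Usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
C, H, W, d = 8, 16, 16, 4
rates = [1, 2, 4]
x = rng.standard_normal((C, H, W))
kernels = [rng.standard_normal((C, 3, 3)) for _ in rates]
w_reduce = rng.standard_normal((d, C))
w_select = rng.standard_normal((len(rates), C, d))
out, attn = dynamic_aspp(x, kernels, rates, w_reduce, w_select)
print(out.shape)  # (8, 16, 16)
```

Because the selected output keeps C channels instead of the K·C channels produced by concatenation, any following 1x1 projection shrinks accordingly. For example, with three 256-channel branches projected to 256 channels (biases ignored), concatenation needs 3·256·256 = 196,608 projection weights, while selection needs 256·256 = 65,536 plus the small attention weights (here d·C + K·C·d).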

Original language: English
Article number: 108120
Journal: Pattern Recognition
Volume: 120
Publication status: Published, Dec 2021

Keywords

  • Salient object detection
  • Unsupervised video object segmentation
