Real-time and light-weighted unsupervised video object segmentation network

Zongji Zhao, Sanyuan Zhao*, Jianbing Shen

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

54 citations (Scopus)

Abstract

Video object segmentation is one of the most practical computer vision tasks, especially in the unsupervised case, where no manually labeled segmentation mask is available at the beginning of a video sequence. In this paper, we propose a new real-time unsupervised video object segmentation network. Based on the encoder-decoder framework, we present a Dynamic ASPP module and an RNN-Conv module. The former adds a dynamic selection mechanism to the Atrous Spatial Pyramid Pooling (ASPP) structure, so that the dilated convolutional kernels adaptively select appropriate features according to scale through a channel attention mechanism. Compared with directly concatenating the dilated convolutional features, dynamically selecting feature maps reduces the number of parameters and makes the module more efficient. The RNN-Conv module combines RNN units with external convolutional blocks, aggregating the temporal features of a video sequence with the spatial information extracted by the convolutional network. We stack this module to extract deeper spatiotemporal features than a traditional RNN. This module also helps to avoid vanishing and exploding gradients during network training. We test our network on popular video object segmentation datasets. The experimental results demonstrate the effectiveness of our model.
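
The dynamic selection idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the 1-D signals, the function names (`dilated_conv1d`, `dynamic_select`), and the use of a plain softmax over globally pooled descriptors (with no learned attention layers) are all simplifying assumptions made here for illustration. It only shows how branches with different dilation rates can be fused by a per-channel weighted sum rather than by concatenation.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """1-D dilated convolution with zero padding (output length == input length)."""
    k = len(kernel)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_select(branches):
    """Fuse branch outputs by a per-channel softmax-weighted sum.

    branches: list of B branches; each branch is a list of C channels;
    each channel is a list of N values. Returns (fused, weights), where
    fused has C channels (not B * C, as concatenation would give).
    Assumption: attention scores are the raw global averages; a real
    module would pass them through learned layers first.
    """
    num_channels = len(branches[0])
    fused, weights = [], []
    for c in range(num_channels):
        # Global average pooling: one scalar descriptor per branch.
        descriptors = [sum(b[c]) / len(b[c]) for b in branches]
        w = softmax(descriptors)  # weights across branches sum to 1
        length = len(branches[0][c])
        fused.append([
            sum(wb * b[c][n] for wb, b in zip(w, branches))
            for n in range(length)
        ])
        weights.append(w)
    return fused, weights


# Hypothetical two-channel input and two dilation rates.
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
kernel = [1.0, 0.0, 1.0]
branches = [[dilated_conv1d(ch, kernel, d) for ch in features] for d in (1, 2)]
fused, weights = dynamic_select(branches)
```

With two branches and two input channels, `fused` keeps two channels, whereas concatenation would produce four; any layer that consumes the fused features therefore needs fewer input weights, which is the kind of saving the abstract refers to.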

Original language: English
Article number: 108120
Journal: Pattern Recognition
Volume: 120
DOI
Publication status: Published - Dec 2021
