Dual Attention Based Network with Hierarchical ConvLSTM for Video Object Segmentation

  • Zongji Zhao
  • Sanyuan Zhao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Semi-supervised video object segmentation, especially in the multi-object case, is one of the most basic tasks in computer vision. It aims to segment the masks of multiple foreground objects in a given video sequence, using the annotation mask of the first frame as prior knowledge. In this paper, we propose a novel multi-object video segmentation model. We use the U-Net architecture to obtain multi-scale spatial features. In the encoder, spatial attention and channel attention are applied simultaneously to enhance the spatial features. In the decoder, we use a recurrent ConvLSTM module to segment different object instances in one stage and to keep the segmented objects consistent over time. In addition, we train jointly with three loss functions to improve training. We test our network on the popular video object segmentation dataset DAVIS2017. The experimental results demonstrate that our model achieves state-of-the-art performance. Moreover, our model achieves faster inference than other methods.
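The abstract does not specify how the two attention branches are formulated. As a rough illustration only, the sketch below applies a squeeze-and-excitation style channel gate and a pooled spatial gate to the same feature map in parallel and sums the results. The function names, the weight shapes, the reduction ratio `r`, and the use of summation to fuse the branches are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map (SE-style gate; an assumption)."""
    squeezed = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # bottleneck with ReLU -> (C // r,)
    gate = sigmoid(w2 @ hidden)              # per-channel weights in (0, 1) -> (C,)
    return feat * gate[:, None, None]

def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Reweight spatial positions of a (C, H, W) feature map.

    The two scalar weights stand in for the small convolution that a real
    model would typically learn here (hypothetical simplification).
    """
    avg_map = feat.mean(axis=0)              # (H, W)
    max_map = feat.max(axis=0)               # (H, W)
    gate = sigmoid(w_avg * avg_map + w_max * max_map)
    return feat * gate[None, :, :]

def dual_attention(feat, w1, w2):
    """Apply both branches to the same input and sum them (assumed fusion)."""
    return channel_attention(feat, w1, w2) + spatial_attention(feat)

# Toy encoder feature map with random, untrained weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out = dual_attention(feat, w1, w2)
print(out.shape)
```

Because each gate lies strictly between 0 and 1, the fused output keeps the input's shape and scales every activation by a factor between 0 and 2. In a U-Net encoder such a block would typically be applied at each scale before the features are passed to the decoder.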

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 4th Chinese Conference, PRCV 2021, Proceedings
Editors: Huimin Ma, Liang Wang, Changshui Zhang, Fei Wu, Tieniu Tan, Yaonan Wang, Jianhuang Lai, Yao Zhao
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 323-335
Number of pages: 13
ISBN (Print): 9783030880125
Publication status: Published - 2021
Event: 4th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2021 - Beijing, China
Duration: 29 Oct 2021 to 1 Nov 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13022 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2021
Country/Territory: China
City: Beijing
Period: 29/10/21 to 1/11/21

Keywords

  • Attention
  • ConvLSTM
  • Video object segmentation
