Video semantic segmentation via feature propagation with holistic attention

Junrong Wu, Zongzheng Wen, Sanyuan Zhao*, Kele Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

26 Citations (Scopus)

Abstract

Since the frames of a video are inherently contiguous, information redundancy is ubiquitous. Unlike previous works that densely process each frame of a video, in this paper we present a novel method that focuses on efficient feature propagation across frames to tackle the challenging video semantic segmentation task. First, we propose a Light, Efficient and Real-time network (denoted as LERNet) as a strong backbone network for per-frame processing. We then mine rich features within a key frame and propagate the cross-frame consistency information by computing a temporal holistic attention between the key frame and the following non-key frame. Each element of the attention matrix represents the global correlation between pixels of a non-key frame and the previous key frame. Concretely, we propose a brand-new attention module to capture the spatial consistency of low-level features along the temporal dimension. We then employ the attention weights as a spatial transition guidance to directly generate high-level features of the current non-key frame from the weighted high-level features of the corresponding key frame. Finally, we efficiently fuse the hierarchical features of the non-key frame and obtain the final segmentation result. Extensive experiments on two popular datasets, i.e., CityScapes and CamVid, demonstrate that the proposed approach achieves a remarkable balance between inference speed and accuracy.
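The propagation step described in the abstract can be sketched in code. The snippet below is a minimal, illustrative PyTorch sketch of the temporal holistic attention idea (attention weights computed from low-level features of the key and non-key frames, then used to re-weight the key frame's high-level features); the module name TemporalHolisticAttention, the argument names, and design details such as channel sizes and 1x1 projections are our own assumptions for illustration, not the authors' implementation.

# Minimal sketch (assumption): temporal holistic attention for propagating
# high-level key-frame features to a non-key frame, as described in the abstract.
import torch
import torch.nn as nn

class TemporalHolisticAttention(nn.Module):
    """Hypothetical module: global correlation between non-key and key frame pixels."""
    def __init__(self, low_channels, embed_channels=64):
        super().__init__()
        # 1x1 projections of low-level features into a shared embedding space (assumed design)
        self.query_proj = nn.Conv2d(low_channels, embed_channels, kernel_size=1)
        self.key_proj = nn.Conv2d(low_channels, embed_channels, kernel_size=1)

    def forward(self, low_nonkey, low_key, high_key):
        # low_nonkey, low_key: (B, C_low, H, W) low-level features of the two frames
        # high_key:            (B, C_high, H, W) high-level features of the key frame
        b, _, h, w = low_nonkey.shape
        q = self.query_proj(low_nonkey).flatten(2).transpose(1, 2)  # (B, HW, E)
        k = self.key_proj(low_key).flatten(2)                       # (B, E, HW)
        attn = torch.softmax(q @ k, dim=-1)                         # (B, HW, HW) pixel-wise correlation
        v = high_key.flatten(2).transpose(1, 2)                     # (B, HW, C_high)
        propagated = attn @ v                                       # weighted key-frame features
        return propagated.transpose(1, 2).reshape(b, -1, h, w)      # high-level features for the non-key frame

# Usage sketch: for a non-key frame, only the light low-level branch is run and the
# high-level features are propagated, instead of running the full backbone:
# attn_module = TemporalHolisticAttention(low_channels=128)
# high_nonkey = attn_module(low_nonkey, low_key, high_key)

Note that a global HW x HW attention matrix is quadratic in the number of pixels, so in practice such a correlation would be computed on downsampled feature maps to stay within the real-time budget the paper targets.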

Original language: English
Article number: 107268
Journal: Pattern Recognition
Volume: 104
DOIs
Publication status: Published - Aug 2020

Keywords

  • Attention mechanism
  • Feature propagation
  • Real-time
  • Video semantic segmentation
