Abstract
To achieve accurate segmentation in unconstrained videos, we propose a novel segmentation framework based on a two-stream deep convolutional network. Our algorithm exploits robust pixel-level features of the object across all video frames and generates foreground likelihood maps with sufficient detail. First, a two-stream video segmentation network using multiple hierarchical features is designed to generate initial segmentation masks. Then, all initial segmentation masks and their corresponding original images are collected to learn an appearance model by the least-squares regression method; the model computes an appearance likelihood map for every image. Finally, pairs of initial segmentation masks and appearance likelihood maps are fused by the proposed fusion network to generate the final high-quality segmentation maps. Experiments on the challenging DAVIS dataset verify the effectiveness of our appearance regression and demonstrate that the proposed algorithm outperforms state-of-the-art algorithms.
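The appearance-regression step lends itself to a short sketch. Below is a minimal illustration, assuming the appearance model maps per-pixel RGB features (plus a bias term) to initial-mask values via closed-form regularized least squares and then evaluates that model on each frame; the feature choice, regularization, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fit_appearance_model(images, masks, lam=1e-3):
    """Fit a linear appearance model by regularized least squares.

    images: list of (H, W, 3) float arrays in [0, 1]
    masks:  list of (H, W) initial segmentation masks in [0, 1]
    Returns a weight vector w such that [rgb, 1] @ w approximates
    the foreground probability of a pixel.
    """
    X = np.concatenate([im.reshape(-1, 3) for im in images], axis=0)
    y = np.concatenate([m.reshape(-1) for m in masks], axis=0)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias feature
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def appearance_likelihood(image, w):
    """Compute a per-pixel appearance likelihood map for one frame."""
    h, wd, _ = image.shape
    X = np.hstack([image.reshape(-1, 3), np.ones((h * wd, 1))])
    return np.clip(X @ w, 0.0, 1.0).reshape(h, wd)
```

In this sketch, the resulting likelihood maps would be paired with the initial masks and passed to the fusion network; the fusion step itself is a learned component and is not reproduced here.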
| Original language | English |
|---|---|
| Pages (from-to) | 59-67 |
| Number of pages | 9 |
| Journal | Neurocomputing |
| Volume | 334 |
| DOIs | |
| Publication status | Published - 21 Mar 2019 |
| Externally published | Yes |
Keywords
- Appearance regression
- Fusion network
- Video segmentation