Abstract
This paper presents an attention-based video segmentation network and its global-information optimization training method. We propose an improved segmentation network and use it to compute initial segmentation masks. The initial masks are then treated as priors to finetune the network, and the network with the learned weights generates the final, refined masks. Our two-stream segmentation network comprises an appearance branch and a motion branch. Fed with the RGB image and the optical-flow image, respectively, the network extracts appearance features and motion features to generate the segmentation mask. An attention module is embedded in the network between adjacent high-level and low-level features, so that the high-level features localize the semantic region for the low-level features, speeding up network convergence and improving segmentation quality. We propose to optimize the initial masks in order to finetune the original appearance-network weights, enabling the network to recognize the object and improving its performance. Experiments on DAVIS show the effectiveness of the segmentation framework. Our method outperforms traditional two-stream segmentation algorithms and achieves results comparable with those on the dataset's leaderboard. A validation experiment shows that our attention module substantially improves network performance over the baseline.
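The attention module described above can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's implementation: it assumes the high-level features are collapsed into a single-channel attention map by a channel mean and a sigmoid, upsampled by nearest-neighbour repetition, and multiplied element-wise onto the low-level features. The function name `attention_gate` and all shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(high, low):
    """Gate low-level features with an attention map from the adjacent
    high-level features (illustrative sketch, not the paper's exact module).

    high: (C, Hh, Wh) high-level (semantic) feature map
    low:  (C, 2*Hh, 2*Wh) low-level (detail) feature map
    """
    # Collapse channels into a single-channel spatial attention map in (0, 1).
    attn = sigmoid(high.mean(axis=0, keepdims=True))    # (1, Hh, Wh)
    # Nearest-neighbour upsample to the low-level resolution.
    attn = attn.repeat(2, axis=1).repeat(2, axis=2)     # (1, 2*Hh, 2*Wh)
    # High-level semantics localize the object region for the low-level features.
    return low * attn

high = np.random.randn(8, 4, 4)
low = np.random.randn(8, 8, 8)
out = attention_gate(high, low)
print(out.shape)  # (8, 8, 8)
```

The gating suppresses low-level responses outside the semantic region indicated by the high-level features, which is the intuition behind embedding the attention module between adjacent feature levels.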
| Translated title of the contribution | An Improved Video Segmentation Network and Its Global Information Optimization Method |
| --- | --- |
| Original language | Chinese (Traditional) |
| Pages (from-to) | 787-796 |
| Number of pages | 10 |
| Journal | Zidonghua Xuebao/Acta Automatica Sinica |
| Volume | 48 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - Mar 2022 |