Abstract
Most existing deep learning-based motion segmentation methods treat motion segmentation as a binary segmentation problem, which is generally not the case in dynamic scenes. In addition, object and camera motion are often mixed, making motion segmentation difficult. This paper proposes a joint learning method that fuses semantic features and motion cues using CNNs with a deformable convolution module and a motion embedding module, to address the multi-object motion segmentation problem. The deformable convolution module serves to fuse color and motion information, while the motion embedding module learns to distinguish objects' motion status, drawing inspiration from geometric modeling methods. We perform extensive quantitative and qualitative experiments on benchmark datasets. In particular, we label over 9000 images of the KITTI visual odometry dataset to help train the deformable module. Our method achieves superior performance compared with the current state of the art in terms of both speed and accuracy.
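The core idea of fusing semantic (appearance) features with motion cues can be illustrated with a minimal sketch. The shapes, channel counts, and the 1x1 convolution below are illustrative assumptions only; the paper's actual architecture uses deformable convolutions and a learned motion embedding, neither of which is reproduced here.

```python
import numpy as np

# Hypothetical sketch of appearance/motion feature fusion.
# All shapes and the 1x1 convolution are assumptions for illustration,
# not the paper's deformable-convolution architecture.
rng = np.random.default_rng(0)

H, W = 8, 8
semantic = rng.standard_normal((16, H, W))  # appearance features, 16 channels
motion = rng.standard_normal((2, H, W))     # flow-like motion cues, 2 channels

# Concatenate along the channel axis, then mix channels with a 1x1 conv
# (expressed as an einsum over the channel dimension).
fused_in = np.concatenate([semantic, motion], axis=0)  # (18, H, W)
weights = rng.standard_normal((8, fused_in.shape[0]))  # 8 output channels
fused = np.einsum("oc,chw->ohw", weights, fused_in)    # (8, H, W)

print(fused.shape)
```

A real implementation would replace the 1x1 mixing step with deformable convolution layers so that the sampling locations adapt to object boundaries, but the channel-concatenation entry point is a common way to combine the two feature streams.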
Original language | English |
---|---|
Article number | 9380630 |
Pages (from-to) | 56812-56821 |
Number of pages | 10 |
Journal | IEEE Access |
Volume | 9 |
DOIs | |
Publication status | Published - 2021 |
Keywords
- Supervised learning
- motion segmentation
- video object segmentation