Abstract
In recent years, there has been considerable interest in incorporating semantics into simultaneous localization and mapping (SLAM) systems. This paper presents an approach to generating a large-scale outdoor 3D dense semantic map based on binocular stereo vision. The inputs to the system are stereo color images from a moving vehicle. First, the dense 3D space around the vehicle is reconstructed, and the motion of the camera is estimated by visual odometry. Meanwhile, semantic segmentation is performed online using deep learning, and the semantic labels are also used to verify feature matching in the visual odometry. These three processes compute the motion, depth, and semantic label of every pixel in the input views. Then, a voxel conditional random field (CRF) inference is introduced to fuse the semantic labels into voxels. After that, we present a method to remove moving objects by incorporating the semantic labels, which improves the motion segmentation accuracy. Finally, a dense 3D semantic map of an urban environment is generated from arbitrarily long image sequences. We evaluate our approach on the KITTI vision benchmark, and the results show that the proposed method is effective.
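The label-fusion step described in the abstract can be illustrated with a minimal sketch. This is not the paper's voxel CRF inference; it is a simplified stand-in (assumed names `fuse_labels`, `observations`) that only accumulates per-pixel class log-probabilities into each voxel and takes the per-voxel argmax, i.e., the unary term a CRF would start from, without the pairwise smoothing between neighboring voxels.

```python
import math
from collections import defaultdict

def fuse_labels(observations, num_labels):
    """Fuse per-pixel semantic label distributions into voxel labels.

    observations: iterable of (voxel_id, prob_vector) pairs, where
    prob_vector is a per-class probability distribution for one pixel
    whose back-projected 3D point falls in that voxel.

    Returns a dict mapping voxel_id -> fused label index.
    Simplified stand-in for CRF inference: independent per-voxel
    maximum-likelihood fusion, no spatial smoothing term.
    """
    # Accumulate log-probabilities so repeated observations multiply.
    scores = defaultdict(lambda: [0.0] * num_labels)
    for voxel, probs in observations:
        for k, p in enumerate(probs):
            scores[voxel][k] += math.log(max(p, 1e-9))  # guard against log(0)
    # Pick the class with the highest accumulated evidence per voxel.
    return {v: max(range(num_labels), key=lambda k: s[k])
            for v, s in scores.items()}

# Two noisy "road" observations outweigh one; a single confident
# "vehicle" observation labels the other voxel.
obs = [("v1", [0.7, 0.3]), ("v1", [0.6, 0.4]), ("v2", [0.2, 0.8])]
fused = fuse_labels(obs, num_labels=2)
```

A full CRF inference would add pairwise potentials encouraging neighboring voxels to share labels; this sketch shows only the evidence-accumulation part of the fusion.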
Original language | English |
---|---|
Pages (from-to) | 194-206 |
Number of pages | 13 |
Journal | International Journal of Automation and Computing |
Volume | 15 |
Issue number | 2 |
DOIs | |
Publication status | Published - 1 Apr 2018 |
Keywords
- Semantic map
- motion segmentation
- simultaneous localization and mapping (SLAM)
- stereo vision
- visual odometry