Abstract
When robots are deployed in complex scenarios, traditional geometric maps are insufficient because they lack semantic interaction with the environment. In this paper, a large-scale and accurate three-dimensional (3D) semantic map that integrates Lidar and camera information is presented for real-time road scenes. First, simultaneous localization and mapping (SLAM) with multi-sensor fusion of the Lidar and an inertial measurement unit (IMU) is performed to locate the robot, and a map of the surrounding scene is constructed while the robot moves. Then, convolutional neural network (CNN)-based semantic segmentation of images is employed to build the semantic map of the environment. After temporal and spatial synchronization, the fused Lidar and camera data are used to generate semantically labeled point-cloud frames, which are assembled into a semantic map according to the robot's pose. In addition, to improve classification accuracy, a higher-order 3D fully connected conditional random field (CRF) method is utilized to optimize the semantic map. Finally, extensive experiments on the KITTI dataset demonstrate the effectiveness of the proposed method.
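The Lidar-camera fusion step described above, transferring per-pixel CNN labels onto point clouds, can be sketched as follows. This is a minimal, hypothetical illustration assuming a KITTI-style pinhole model with intrinsic matrix `K` and a Lidar-to-camera extrinsic `T_cam_lidar`; the function and parameter names are illustrative and not taken from the paper:

```python
import numpy as np

def label_point_cloud(points, labels_img, T_cam_lidar, K):
    """Assign per-pixel semantic labels to Lidar points (illustrative sketch).

    points:      (N, 3) Lidar points in the Lidar frame
    labels_img:  (H, W) integer semantic label image from the CNN
    T_cam_lidar: (4, 4) extrinsic transform from Lidar to camera frame
    K:           (3, 3) camera intrinsic matrix
    Returns an (N,) label array; -1 marks points that fall outside the
    image or lie behind the camera.
    """
    h, w = labels_img.shape
    # Homogeneous coordinates, transformed into the camera frame
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    labels = np.full(points.shape[0], -1, dtype=int)
    in_front = pts_cam[:, 2] > 0  # keep only points in front of the camera

    # Perspective projection onto the image plane
    uv = (K @ pts_cam[in_front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)

    # Keep projections that land inside the image bounds
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = labels_img[uv[valid, 1], uv[valid, 0]]
    return labels
```

Each labeled frame produced this way would then be transformed by the SLAM pose into the global map, where the higher-order CRF refines the accumulated labels.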
Original language | English |
---|---|
Pages (from-to) | 394-407 |
Number of pages | 14 |
Journal | Neurocomputing |
Volume | 409 |
Publication status | Published - 7 Oct 2020 |
Keywords
- Higher-order CRFs
- Lidar SLAM
- Semantic map
- Semantic segmentation