VID-SLAM: Robust Pose Estimation with RGBD-Inertial Input for Indoor Robotic Localization

Dan Shan, Jinhe Su*, Xiaofeng Wang, Yujun Liu, Taojian Zhou, Zebiao Wu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

This study proposes a tightly coupled multi-sensor Simultaneous Localization and Mapping (SLAM) framework that fuses RGB-D and inertial measurements to achieve highly accurate six-degree-of-freedom (6-DoF) metric localization in a variety of environments. By jointly considering geometric consistency, inertial measurement unit (IMU) constraints, and visual re-projection errors, we present visual-inertial-depth odometry (called VIDO), an efficient state-estimation back-end that minimizes the cascading losses of all factors. Existing visual-inertial odometry systems rely on visual feature-based constraints to eliminate the translational displacement and angular drift produced by IMU noise. To mitigate these limitations, we introduce the iterative closest point (ICP) error between adjacent frames and update the state vectors of the observed frames by minimizing the estimation errors of all sensors. Moreover, a loop-closure module enables further optimization of the global pose graph to correct long-term drift. For the experiments, we collect an RGB-D-inertial dataset for a comprehensive evaluation of VID-SLAM; it contains RGB-D image pairs, IMU measurements, and two types of ground-truth data. The experimental results show that VID-SLAM achieves state-of-the-art positioning accuracy and outperforms mainstream vSLAM solutions, including ElasticFusion, ORB-SLAM2, and VINS-Mono.
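The joint optimization described above — minimizing visual re-projection, IMU, and ICP residuals over a shared pose state — can be illustrated with a toy sketch. This is not the paper's implementation: the 2D (SE(2)) pose, the weights, and the simulated measurements (`obs`, `imu_pred`, `src`/`tgt` point pairs) are all assumptions chosen to show how the three residual types stack into one least-squares problem.

```python
import numpy as np
from scipy.optimize import least_squares

def se2(pose):
    """Unpack a 2D pose [x, y, theta] into a rotation matrix and translation."""
    x, y, th = pose
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    return R, np.array([x, y])

# --- simulated world and measurements (all hypothetical, for illustration) ---
landmarks = np.array([[2.0, 1.0], [1.0, 3.0], [4.0, 2.0]])   # known map points
true_pose = np.array([0.5, -0.3, 0.1])                        # ground-truth pose
R_t, t_t = se2(true_pose)
obs = (landmarks - t_t) @ R_t            # landmarks seen in the camera frame
imu_pred = np.array([0.45, -0.25, 0.12]) # noisy pose predicted by IMU preintegration
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])          # depth points, current frame
tgt = src @ R_t.T + t_t                  # matched points in the reference frame (ICP pairs)

def residuals(pose, w_vis=1.0, w_imu=0.5, w_icp=1.0):
    """Stack weighted visual, inertial, and ICP residuals into one vector."""
    R, t = se2(pose)
    r_vis = ((landmarks - t) @ R - obs).ravel()   # visual re-projection error
    r_imu = pose - imu_pred                       # IMU (preintegration) constraint
    r_icp = (src @ R.T + t - tgt).ravel()         # point-to-point ICP error
    return np.concatenate([w_vis * r_vis, w_imu * r_imu, w_icp * r_icp])

# Minimize the combined cost, seeded from the IMU prediction.
sol = least_squares(residuals, imu_pred)
```

Because the visual and depth residuals are consistent with the true pose while the IMU term only acts as a weak prior, the estimate `sol.x` lands close to `true_pose`; in the full system, the same structure is solved over many frames with proper covariances instead of scalar weights.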

Original language: English
Article number: 318
Journal: Electronics (Switzerland)
Volume: 13
Issue: 2
Publication status: Published - Jan 2024

Keywords

  • Simultaneous Localization and Mapping
  • motion and tracking
  • vision for robotics and autonomous vehicles
