VINS-FEN: Monocular Visual-Inertial SLAM Based on Feature Extraction Network

Ke Wang; Cheng Zhang; Di Su; Kai Sun; Tian Zhan

doi:10.1109/CMVIT57620.2023.00025

Ke Wang, Cheng Zhang*, Di Su, Kai Sun, Tian Zhan

*Corresponding author for this work

School of Aerospace Engineering

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Monocular visual-inertial simultaneous localization and mapping (SLAM) can be widely used to provide pose estimates for unmanned aerial vehicles. It usually relies on handcrafted feature points and descriptors as the basis for image matching. However, under uneven illumination and in weakly textured environments, feature extraction becomes difficult and feature matching errors are common. To address these problems, this paper uses a deep convolutional neural network (CNN) in place of traditional handcrafted features in the front end of a visual-inertial system (VINS). The main contributions are the design of a deep convolutional neural network, the Feature Extraction Network (FEN), for feature extraction; a two-stage matching strategy; and the integration of these improvements into the VINS front end to form a complete system. The approach is verified on the HPatches and EuRoC datasets. The experimental results show that FEN scores 3% to 23% higher than the traditional method in the repeatability and accuracy of extracted feature points. VINS with FEN as its front end is more robust and improves localization accuracy by 17.3% under uneven illumination and weak texture conditions.

Original language	English
Title of host publication	Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	86-91
Number of pages	6
ISBN (Electronic)	9781665464857
DOIs	https://doi.org/10.1109/CMVIT57620.2023.00025
Publication status	Published - 2023
Event	7th International Conference on Machine Vision and Information Technology, CMVIT 2023 - Virtual, Online, China Duration: 25 Mar 2023 → …

Publication series

Name	Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023

Conference

Conference	7th International Conference on Machine Vision and Information Technology, CMVIT 2023
Country/Territory	China
City	Virtual, Online
Period	25/03/23 → …

Keywords

deep convolutional neural network
feature extraction
feature matching
simultaneous localization and mapping
visual-inertial system

Cite this

Wang, K., Zhang, C., Su, D., Sun, K., & Zhan, T. (2023). VINS-FEN: Monocular Visual-Inertial SLAM Based on Feature Extraction Network. In Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023 (pp. 86-91). (Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CMVIT57620.2023.00025

Wang, Ke ; Zhang, Cheng ; Su, Di et al. / VINS-FEN : Monocular Visual-Inertial SLAM Based on Feature Extraction Network. Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 86-91 (Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023).

@inproceedings{7a9818438eed4704b6f3866dbf818228,

title = "VINS-FEN: Monocular Visual-Inertial SLAM Based on Feature Extraction Network",

abstract = "Monocular visual-inertial simultaneous localization and mapping (SLAM) can be widely used to provide pose estimates for unmanned aerial vehicles. It usually relies on handcrafted feature points and descriptors as the basis for image matching. However, under uneven illumination and in weakly textured environments, feature extraction becomes difficult and feature matching errors are common. To address these problems, this paper uses a deep convolutional neural network (CNN) in place of traditional handcrafted features in the front end of a visual-inertial system (VINS). The main contributions are the design of a deep convolutional neural network, the Feature Extraction Network (FEN), for feature extraction; a two-stage matching strategy; and the integration of these improvements into the VINS front end to form a complete system. The approach is verified on the HPatches and EuRoC datasets. The experimental results show that FEN scores 3% to 23% higher than the traditional method in the repeatability and accuracy of extracted feature points. VINS with FEN as its front end is more robust and improves localization accuracy by 17.3% under uneven illumination and weak texture conditions.",

keywords = "deep convolutional neural network, feature extraction, feature matching, simultaneous localization and mapping, visual-inertial system",

author = "Ke Wang and Cheng Zhang and Di Su and Kai Sun and Tian Zhan",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 7th International Conference on Machine Vision and Information Technology, CMVIT 2023 ; Conference date: 25-03-2023",

year = "2023",

doi = "10.1109/CMVIT57620.2023.00025",

language = "English",

series = "Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "86--91",

booktitle = "Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023",

address = "United States",

}

Wang, K, Zhang, C, Su, D, Sun, K & Zhan, T 2023, VINS-FEN: Monocular Visual-Inertial SLAM Based on Feature Extraction Network. in Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023. Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023, Institute of Electrical and Electronics Engineers Inc., pp. 86-91, 7th International Conference on Machine Vision and Information Technology, CMVIT 2023, Virtual, Online, China, 25/03/23. https://doi.org/10.1109/CMVIT57620.2023.00025

VINS-FEN: Monocular Visual-Inertial SLAM Based on Feature Extraction Network. / Wang, Ke; Zhang, Cheng; Su, Di et al.
Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 86-91 (Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023).

TY - GEN

T1 - VINS-FEN: Monocular Visual-Inertial SLAM Based on Feature Extraction Network

T2 - 7th International Conference on Machine Vision and Information Technology, CMVIT 2023

AU - Wang, Ke

AU - Zhang, Cheng

AU - Su, Di

AU - Sun, Kai

AU - Zhan, Tian

PY - 2023

Y1 - 2023

AB - Monocular visual-inertial simultaneous localization and mapping (SLAM) can be widely used to provide pose estimates for unmanned aerial vehicles. It usually relies on handcrafted feature points and descriptors as the basis for image matching. However, under uneven illumination and in weakly textured environments, feature extraction becomes difficult and feature matching errors are common. To address these problems, this paper uses a deep convolutional neural network (CNN) in place of traditional handcrafted features in the front end of a visual-inertial system (VINS). The main contributions are the design of a deep convolutional neural network, the Feature Extraction Network (FEN), for feature extraction; a two-stage matching strategy; and the integration of these improvements into the VINS front end to form a complete system. The approach is verified on the HPatches and EuRoC datasets. The experimental results show that FEN scores 3% to 23% higher than the traditional method in the repeatability and accuracy of extracted feature points. VINS with FEN as its front end is more robust and improves localization accuracy by 17.3% under uneven illumination and weak texture conditions.

KW - deep convolutional neural network

KW - feature extraction

KW - feature matching

KW - simultaneous localization and mapping

KW - visual-inertial system

UR - http://www.scopus.com/inward/record.url?scp=85164836080&partnerID=8YFLogxK

U2 - 10.1109/CMVIT57620.2023.00025

DO - 10.1109/CMVIT57620.2023.00025

M3 - Conference contribution

AN - SCOPUS:85164836080

T3 - Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023

SP - 86

EP - 91

BT - Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 25 March 2023

ER -

Wang K, Zhang C, Su D, Sun K, Zhan T. VINS-FEN: Monocular Visual-Inertial SLAM Based on Feature Extraction Network. In Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 86-91. (Proceedings - 2023 7th International Conference on Machine Vision and Information Technology, CMVIT 2023). doi: 10.1109/CMVIT57620.2023.00025
