CVTNet: A Cross-View Transformer Network for LiDAR-Based Place Recognition in Autonomous Driving Environments

Junyi Ma; Guangming Xiong; Jingyi Xu; Xieyuanli Chen

doi:10.1109/TII.2023.3313635

Junyi Ma, Guangming Xiong, Jingyi Xu, Xieyuanli Chen^*

^*Corresponding author for this work

School of Mechanical and Vehicle Engineering

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

LiDAR-based place recognition (LPR) is one of the most crucial components of autonomous vehicles to identify previously visited places in GPS-denied environments. Most existing LPR methods use mundane representations of the input point cloud without considering different views, which may not fully exploit the information from LiDAR sensors. In this article, we propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views and bird's eye views generated from the LiDAR data. It extracts correlations within the views using intratransformers and between the two different views using intertransformers. Based on that, our proposed CVTNet generates a yaw-angle-invariant global descriptor for each laser scan end-to-end online and retrieves previously seen places by descriptor matching between the current query scan and the prebuilt database. We evaluate our approach on three datasets collected with different sensor setups and environmental conditions. The experimental results show that our method outperforms the state-of-the-art LPR methods with strong robustness to viewpoint changes and long-time spans. Furthermore, our approach has better real-time performance that can run faster than the typical LiDAR frame rate does.

Original language	English
Pages (from-to)	4039-4048
Number of pages	10
Journal	IEEE Transactions on Industrial Informatics
Volume	20
Issue number	3
DOI	https://doi.org/10.1109/TII.2023.3313635
Publication status	Published - 1 Mar 2024

Cite this

@article{90c7e782d9c24687b25b4f7520b5242a,

title = "CVTNet: A Cross-View Transformer Network for LiDAR-Based Place Recognition in Autonomous Driving Environments",

abstract = "LiDAR-based place recognition (LPR) is one of the most crucial components of autonomous vehicles to identify previously visited places in GPS-denied environments. Most existing LPR methods use mundane representations of the input point cloud without considering different views, which may not fully exploit the information from LiDAR sensors. In this article, we propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views and bird's eye views generated from the LiDAR data. It extracts correlations within the views using intratransformers and between the two different views using intertransformers. Based on that, our proposed CVTNet generates a yaw-angle-invariant global descriptor for each laser scan end-to-end online and retrieves previously seen places by descriptor matching between the current query scan and the prebuilt database. We evaluate our approach on three datasets collected with different sensor setups and environmental conditions. The experimental results show that our method outperforms the state-of-the-art LPR methods with strong robustness to viewpoint changes and long-time spans. Furthermore, our approach has better real-time performance that can run faster than the typical LiDAR frame rate does.",

keywords = "Autonomous driving, LiDAR place recognition (LPR), multiview fusion, transformer network",

author = "Junyi Ma and Guangming Xiong and Jingyi Xu and Xieyuanli Chen",

note = "Publisher Copyright: {\textcopyright} 2005-2012 IEEE.",

year = "2024",

month = mar,

day = "1",

doi = "10.1109/TII.2023.3313635",

language = "English",

volume = "20",

pages = "4039--4048",

journal = "IEEE Transactions on Industrial Informatics",

issn = "1551-3203",

publisher = "IEEE Computer Society",

number = "3",

}

TY - JOUR

T1 - CVTNet

T2 - A Cross-View Transformer Network for LiDAR-Based Place Recognition in Autonomous Driving Environments

AU - Ma, Junyi

AU - Xiong, Guangming

AU - Xu, Jingyi

AU - Chen, Xieyuanli

PY - 2024/3/1

Y1 - 2024/3/1

N2 - LiDAR-based place recognition (LPR) is one of the most crucial components of autonomous vehicles to identify previously visited places in GPS-denied environments. Most existing LPR methods use mundane representations of the input point cloud without considering different views, which may not fully exploit the information from LiDAR sensors. In this article, we propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views and bird's eye views generated from the LiDAR data. It extracts correlations within the views using intratransformers and between the two different views using intertransformers. Based on that, our proposed CVTNet generates a yaw-angle-invariant global descriptor for each laser scan end-to-end online and retrieves previously seen places by descriptor matching between the current query scan and the prebuilt database. We evaluate our approach on three datasets collected with different sensor setups and environmental conditions. The experimental results show that our method outperforms the state-of-the-art LPR methods with strong robustness to viewpoint changes and long-time spans. Furthermore, our approach has better real-time performance that can run faster than the typical LiDAR frame rate does.

AB - LiDAR-based place recognition (LPR) is one of the most crucial components of autonomous vehicles to identify previously visited places in GPS-denied environments. Most existing LPR methods use mundane representations of the input point cloud without considering different views, which may not fully exploit the information from LiDAR sensors. In this article, we propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views and bird's eye views generated from the LiDAR data. It extracts correlations within the views using intratransformers and between the two different views using intertransformers. Based on that, our proposed CVTNet generates a yaw-angle-invariant global descriptor for each laser scan end-to-end online and retrieves previously seen places by descriptor matching between the current query scan and the prebuilt database. We evaluate our approach on three datasets collected with different sensor setups and environmental conditions. The experimental results show that our method outperforms the state-of-the-art LPR methods with strong robustness to viewpoint changes and long-time spans. Furthermore, our approach has better real-time performance that can run faster than the typical LiDAR frame rate does.

KW - Autonomous driving

KW - LiDAR place recognition (LPR)

KW - multiview fusion

KW - transformer network

UR - http://www.scopus.com/inward/record.url?scp=85172439470&partnerID=8YFLogxK

U2 - 10.1109/TII.2023.3313635

DO - 10.1109/TII.2023.3313635

M3 - Article

AN - SCOPUS:85172439470

SN - 1551-3203

VL - 20

SP - 4039

EP - 4048

JO - IEEE Transactions on Industrial Informatics

JF - IEEE Transactions on Industrial Informatics

IS - 3

ER -
