TY - JOUR
T1 - OpenGraph
T2 - Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments
AU - Deng, Yinan
AU - Wang, Jiahui
AU - Zhao, Jingyu
AU - Tian, Xinyu
AU - Chen, Guangyan
AU - Yang, Yi
AU - Yue, Yufeng
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2024
Y1 - 2024
N2 - Environment representations endowed with sophisticated semantics are pivotal for facilitating seamless interaction between robots and humans, enabling them to effectively carry out various tasks. Open-vocabulary representation, powered by Visual-Language models (VLMs), possesses inherent advantages, including zero-shot learning and open-set cognition. However, existing open-vocabulary maps are primarily designed for small-scale environments, such as desktops or rooms, and are typically geared towards limited-area tasks involving robotic indoor navigation or in-place manipulation. They face challenges in direct generalization to outdoor environments characterized by numerous objects and complex tasks, owing to limitations in both understanding level and map structure. In this work, we propose OpenGraph, a novel open-vocabulary hierarchical graph representation designed for large-scale outdoor environments. OpenGraph initially extracts instances and their captions from visual images, enhancing textual reasoning by encoding captions. Subsequently, it achieves 3D incremental object-centric mapping with feature embedding by projecting images onto LiDAR point clouds. Finally, the environment is segmented based on lane graph connectivity to construct a hierarchical representation. Validation results from SemanticKITTI and real-world scenes demonstrate that OpenGraph achieves high segmentation and query accuracy.
AB - Environment representations endowed with sophisticated semantics are pivotal for facilitating seamless interaction between robots and humans, enabling them to effectively carry out various tasks. Open-vocabulary representation, powered by Visual-Language models (VLMs), possesses inherent advantages, including zero-shot learning and open-set cognition. However, existing open-vocabulary maps are primarily designed for small-scale environments, such as desktops or rooms, and are typically geared towards limited-area tasks involving robotic indoor navigation or in-place manipulation. They face challenges in direct generalization to outdoor environments characterized by numerous objects and complex tasks, owing to limitations in both understanding level and map structure. In this work, we propose OpenGraph, a novel open-vocabulary hierarchical graph representation designed for large-scale outdoor environments. OpenGraph initially extracts instances and their captions from visual images, enhancing textual reasoning by encoding captions. Subsequently, it achieves 3D incremental object-centric mapping with feature embedding by projecting images onto LiDAR point clouds. Finally, the environment is segmented based on lane graph connectivity to construct a hierarchical representation. Validation results from SemanticKITTI and real-world scenes demonstrate that OpenGraph achieves high segmentation and query accuracy.
KW - Mapping
KW - Open-vocabulary
KW - Scene graph
UR - http://www.scopus.com/inward/record.url?scp=85201780154&partnerID=8YFLogxK
U2 - 10.1109/LRA.2024.3445607
DO - 10.1109/LRA.2024.3445607
M3 - Article
AN - SCOPUS:85201780154
SN - 2377-3766
VL - 9
SP - 8402
EP - 8409
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 10
ER -