TY - JOUR
T1 - OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics
T2 - IEEE Transactions on Robotics
AU - Deng, Yinan
AU - Yue, Yufeng
AU - Dou, Jianyu
AU - Zhao, Jingyu
AU - Wang, Jiahui
AU - Tang, Yujie
AU - Yang, Yi
AU - Fu, Mengyin
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Robotic systems demand accurate and comprehensive 3-D environment perception, requiring simultaneous capture of photorealistic appearance (optical), precise layout shape (geometric), and open-vocabulary scene understanding (semantic). Existing methods typically achieve only partial fulfillment of these requirements while exhibiting optical blurring, geometric irregularities, and semantic ambiguities. To address these challenges, we propose OmniMap. Overall, OmniMap represents the first online mapping framework that simultaneously captures optical, geometric, and semantic scene attributes while maintaining real-time performance and model compactness. At the architectural level, OmniMap employs a tightly coupled 3DGS–Voxel hybrid representation that combines fine-grained modeling with structural stability. At the implementation level, OmniMap identifies key challenges across different modalities and introduces several innovations: adaptive camera modeling for motion blur and exposure compensation, hybrid incremental representation with normal constraints, and probabilistic fusion for robust instance-level understanding. Extensive experiments show OmniMap’s superior performance in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation compared to state-of-the-art methods across diverse scenes. The framework’s versatility is further evidenced through a variety of downstream applications, including multidomain scene Q&A, interactive editing, perception-guided manipulation, and map-assisted navigation.
KW - Gaussian splatting
KW - RGB-D perception
KW - mapping
KW - open-vocabulary
UR - https://www.scopus.com/pages/publications/105019579548
U2 - 10.1109/TRO.2025.3621333
DO - 10.1109/TRO.2025.3621333
M3 - Article
AN - SCOPUS:105019579548
SN - 1552-3098
VL - 41
SP - 6549
EP - 6569
JO - IEEE Transactions on Robotics
JF - IEEE Transactions on Robotics
ER -