TY - GEN
T1 - OpenVox
T2 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025
AU - Deng, Yinan
AU - Yao, Bicheng
AU - Tang, Yihang
AU - Zhou, Tianxing
AU - Yang, Yi
AU - Yue, Yufeng
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - In recent years, vision-language models (VLMs) have advanced open-vocabulary mapping, enabling mobile robots to simultaneously achieve environmental reconstruction and high-level semantic understanding. While integrated object cognition helps mitigate semantic ambiguity in point-wise feature maps, efficiently obtaining rich semantic understanding and robust incremental reconstruction at the instance-level remains challenging. To address these challenges, we introduce OpenVox, a real-time incremental open-vocabulary probabilistic instance voxel representation. In the front-end, we design an efficient instance segmentation and comprehension pipeline that enhances language reasoning through encoding captions. In the back-end, we implement probabilistic instance voxels and formulate the cross-frame incremental fusion process into two subtasks: instance association and live map evolution, ensuring robustness to sensor and segmentation noise. Extensive evaluations across multiple datasets demonstrate that OpenVox achieves state-of-the-art performance in zero-shot instance segmentation, semantic segmentation, and open-vocabulary retrieval. The project page of OpenVox is available at https://open-vox.github.io/.
AB - In recent years, vision-language models (VLMs) have advanced open-vocabulary mapping, enabling mobile robots to simultaneously achieve environmental reconstruction and high-level semantic understanding. While integrated object cognition helps mitigate semantic ambiguity in point-wise feature maps, efficiently obtaining rich semantic understanding and robust incremental reconstruction at the instance-level remains challenging. To address these challenges, we introduce OpenVox, a real-time incremental open-vocabulary probabilistic instance voxel representation. In the front-end, we design an efficient instance segmentation and comprehension pipeline that enhances language reasoning through encoding captions. In the back-end, we implement probabilistic instance voxels and formulate the cross-frame incremental fusion process into two subtasks: instance association and live map evolution, ensuring robustness to sensor and segmentation noise. Extensive evaluations across multiple datasets demonstrate that OpenVox achieves state-of-the-art performance in zero-shot instance segmentation, semantic segmentation, and open-vocabulary retrieval. The project page of OpenVox is available at https://open-vox.github.io/.
UR - https://www.scopus.com/pages/publications/105029968248
U2 - 10.1109/IROS60139.2025.11246455
DO - 10.1109/IROS60139.2025.11246455
M3 - Conference contribution
AN - SCOPUS:105029968248
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 1305
EP - 1311
BT - IROS 2025 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, Conference Proceedings
A2 - Laugier, Christian
A2 - Renzaglia, Alessandro
A2 - Atanasov, Nikolay
A2 - Birchfield, Stan
A2 - Cielniak, Grzegorz
A2 - De Mattos, Leonardo
A2 - Fiorini, Laura
A2 - Giguere, Philippe
A2 - Hashimoto, Kenji
A2 - Ibanez-Guzman, Javier
A2 - Kamegawa, Tetsushi
A2 - Lee, Jinoh
A2 - Loianno, Giuseppe
A2 - Luck, Kevin
A2 - Maruyama, Hisataka
A2 - Martinet, Philippe
A2 - Moradi, Hadi
A2 - Nunes, Urbano
A2 - Pettre, Julien
A2 - Pretto, Alberto
A2 - Ranzani, Tommaso
A2 - Ronnau, Arne
A2 - Rossi, Silvia
A2 - Rouse, Elliott
A2 - Ruggiero, Fabio
A2 - Simonin, Olivier
A2 - Wang, Danwei
A2 - Yang, Ming
A2 - Yoshida, Eiichi
A2 - Zhao, Huijing
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 October 2025 through 25 October 2025
ER -