OpenVox: Real-time Instance-level Open-vocabulary Probabilistic Voxel Representation

  • Yinan Deng
  • , Bicheng Yao
  • , Yihang Tang
  • , Tianxing Zhou
  • , Yi Yang
  • , Yufeng Yue*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

In recent years, vision-language models (VLMs) have advanced open-vocabulary mapping, enabling mobile robots to simultaneously achieve environmental reconstruction and high-level semantic understanding. While integrated object cognition helps mitigate semantic ambiguity in point-wise feature maps, efficiently obtaining rich semantic understanding and robust incremental reconstruction at the instance-level remains challenging. To address these challenges, we introduce OpenVox, a real-time incremental open-vocabulary probabilistic instance voxel representation. In the front-end, we design an efficient instance segmentation and comprehension pipeline that enhances language reasoning through encoding captions. In the back-end, we implement probabilistic instance voxels and formulate the cross-frame incremental fusion process into two subtasks: instance association and live map evolution, ensuring robustness to sensor and segmentation noise. Extensive evaluations across multiple datasets demonstrate that OpenVox achieves state-of-the-art performance in zero-shot instance segmentation, semantic segmentation, and open-vocabulary retrieval. The project page of OpenVox is available at https://open-vox.github.io/.

Original languageEnglish
Title of host publicationIROS 2025 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, Conference Proceedings
EditorsChristian Laugier, Alessandro Renzaglia, Nikolay Atanasov, Stan Birchfield, Grzegorz Cielniak, Leonardo De Mattos, Laura Fiorini, Philippe Giguere, Kenji Hashimoto, Javier Ibanez-Guzman, Tetsushi Kamegawa, Jinoh Lee, Giuseppe Loianno, Kevin Luck, Hisataka Maruyama, Philippe Martinet, Hadi Moradi, Urbano Nunes, Julien Pettre, Alberto Pretto, Tommaso Ranzani, Arne Ronnau, Silvia Rossi, Elliott Rouse, Fabio Ruggiero, Olivier Simonin, Danwei Wang, Ming Yang, Eiichi Yoshida, Huijing Zhao
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1305-1311
Number of pages7
ISBN (Electronic)9798331543938
DOIs
Publication statusPublished - 2025
Externally publishedYes
Event2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025 - Hangzhou, China
Duration: 19 Oct 202525 Oct 2025

Publication series

NameIEEE International Conference on Intelligent Robots and Systems
ISSN (Print)2153-0858
ISSN (Electronic)2153-0866

Conference

Conference2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025
Country/TerritoryChina
CityHangzhou
Period19/10/2525/10/25

Fingerprint

Dive into the research topics of 'OpenVox: Real-time Instance-level Open-vocabulary Probabilistic Voxel Representation'. Together they form a unique fingerprint.

Cite this