Abstract
In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval tasks. Although open-vocabulary mask segmenters alleviate the semantic ambiguity of existing point-wise feature maps and enable object-level understanding, retaining fine-grained features within objects at the same time remains challenging. To address this challenge, we introduce OpenObj, an approach for building open-vocabulary object-level Neural Radiance Fields (NeRF) with fine-grained understanding. In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object level. Specifically, our two-stage mask clustering module produces cross-frame consistent instance-level masks for supervision. Moreover, by incorporating part-level features into the object NeRF models, OpenObj not only captures object-level instances but also preserves an understanding of their internal granularity. Results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot segmentation and retrieval tasks. Additionally, OpenObj supports real-world robotics tasks at several levels, including global movement and local manipulation.
| Original language | English |
|---|---|
| Pages (from-to) | 652-659 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 10 |
| Issue number | 1 |
| Publication status | Published - 2025 |
Keywords
- Implicit mapping
- object-level NeRF
- open-vocabulary
- representation
Title
OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields With Fine-Grained Understanding