Abstract
In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval tasks. Although open-vocabulary mask segmenters alleviate the semantic ambiguity of existing point-wise feature maps and enable object-level understanding, retaining fine-grained features within objects at the same time remains challenging. To address this challenge, we introduce OpenObj, an approach for building open-vocabulary object-level Neural Radiance Fields (NeRF) with fine-grained understanding. In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object level. Specifically, we obtain cross-frame consistent instance-level masks for supervision through our two-stage mask clustering module. Moreover, by incorporating part-level features into the object NeRF models, OpenObj not only captures object-level instances but also preserves an understanding of their internal granularity. Results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot segmentation and retrieval tasks. Additionally, OpenObj supports real-world robotics tasks at several levels, including global movement and local manipulation.
| Original language | English |
|---|---|
| Pages (from-to) | 652-659 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 10 |
| Issue | 1 |
| DOI | |
| Publication status | Published - 2025 |
Fingerprint
Dive into the research topics of 'OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields With Fine-Grained Understanding'. Together they form a unique fingerprint.