TY - JOUR
T1 - OpenObj
T2 - Open-Vocabulary Object-Level Neural Radiance Fields With Fine-Grained Understanding
AU - Deng, Yinan
AU - Wang, Jiahui
AU - Zhao, Jingyu
AU - Dou, Jianyu
AU - Yang, Yi
AU - Yue, Yufeng
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2025
Y1 - 2025
AB - In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval tasks. Although the semantic ambiguity of existing point-wise feature maps is alleviated by open-vocabulary mask segmenters for object-level understanding, effectively retaining fine-grained features within objects simultaneously remains challenging. To address these challenges, we introduce OpenObj, an innovative approach to build open-vocabulary object-level Neural Radiance Fields (NeRF) with fine-grained understanding. In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object level. Specifically, we obtain cross-frame consistent instance-level masks for supervision through our two-stage mask clustering module. Moreover, by incorporating part-level features into the object NeRF models, OpenObj not only captures object-level instances but also preserves an understanding of their internal granularity. The results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot segmentation and retrieval tasks. Additionally, OpenObj supports real-world robotics tasks at several levels, including global movement and local manipulation.
KW - Implicit mapping
KW - object-level NeRF
KW - open-vocabulary
KW - representation
UR - http://www.scopus.com/inward/record.url?scp=86000430006&partnerID=8YFLogxK
U2 - 10.1109/LRA.2024.3511401
DO - 10.1109/LRA.2024.3511401
M3 - Article
AN - SCOPUS:86000430006
SN - 2377-3766
VL - 10
SP - 652
EP - 659
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 1
ER -