R2G: Reasoning to ground in 3D scenes

Yixuan Li, Zan Wang, Wei Liang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

We propose Reasoning to Ground (R2G), a neural-symbolic model that grounds target objects in 3D scenes in a reasoning manner. Unlike previous works that rely on end-to-end models for grounding, which often function as black boxes, our approach seeks to provide a more interpretable and reliable solution. R2G explicitly models the 3D scene using a semantic concept-based scene graph, recurrently simulates attention transferring across object entities, and interpretably grounds the target objects with the highest attention score. Specifically, we embed multiple object properties within the graph nodes and spatial relations among entities within the edges through a predefined semantic vocabulary. To guide attention transfer, we employ learning- or prompting-based approaches to interpret the referential utterance into reasoning instructions within the same semantic space. In each reasoning round, we either (1) merge the current attention distribution with the similarity between instructions and embedded entity properties, or (2) shift the attention across the scene graph based on the similarity between instructions and embedded spatial relations. Experiments on the Sr3D/Nr3D benchmarks show that R2G achieves results comparable to prior works while offering improved interpretability, breaking a new path for 3D grounding. The code and dataset for this work are available at: https://sites.google.com/view/reasoning-to-ground.
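
To make the recurrent procedure concrete, below is a minimal NumPy sketch of the two per-round operations described in the abstract: merging attention with instruction-to-property similarity, and shifting attention along edges whose spatial relations match the instruction. The function name `reason_to_ground`, the `(kind, vec)` instruction encoding, and the softmax/normalization choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def _softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def _normalize(p, eps=1e-9):
    return p / (p.sum() + eps)

def reason_to_ground(node_feats, rel_feats, instructions):
    """Recurrent attention transfer over a semantic scene graph (sketch).

    node_feats:   (N, D) embeddings of object properties (graph nodes)
    rel_feats:    (N, N, D) embeddings of pairwise spatial relations (edges)
    instructions: sequence of (kind, vec), kind in {"filter", "relate"},
                  vec a D-dim instruction embedding in the same semantic space
    Returns the index of the entity with the highest final attention.
    """
    n = node_feats.shape[0]
    attention = np.full(n, 1.0 / n)  # start from uniform attention over entities

    for kind, vec in instructions:
        if kind == "filter":
            # (1) merge current attention with instruction/property similarity
            sim = _softmax(node_feats @ vec)
            attention = _normalize(attention * sim)
        else:
            # (2) shift attention across the graph via relation/instruction similarity
            transfer = _softmax(rel_feats @ vec, axis=1)  # (N, N) row-stochastic
            attention = _normalize(transfer.T @ attention)

    # ground the entity with the highest attention score
    return int(np.argmax(attention))
```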

Original language: English
Article number: 111728
Journal: Pattern Recognition
Volume: 168
Publication status: Published - Dec 2025

Keywords

  • 3D grounding
  • Neural-symbolic
  • Reasoning
