Parallel Vertex Diffusion for Unified Visual Grounding

Zesen Cheng, Kehan Li, Peng Jin, Siheng Li, Xiangyang Ji, Li Yuan, Chang Liu*, Jie Chen*

*Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

12 Citations (Scopus)

Abstract

Unified visual grounding (UVG) capitalizes on a wealth of task-related knowledge across various grounding tasks via one-shot training, which curtails retraining costs and task-specific architecture design efforts. Vertex generation-based UVG methods achieve this versatility by unified modeling object box and contour prediction and provide a text-powered interface to vast related multi-modal tasks, e.g., visual question answering, and captioning. However, these methods typically generate vertexes sequentially through autoregression, which is prone to be trapped in error accumulation and heavy computation, especially for high-dimension sequence generation in complex scenarios. In this paper, we develop Parallel Vertex Diffusion (PVD) based on the parallelizability of diffusion models to accurately and efficiently generate vertexes in a parallel and scalable manner. Since the coordinates fluctuate greatly, it typically encounters slow convergence when training diffusion models without geometry constraints. Therefore, we consummate our PVD by two critical components, i.e., center anchor mechanism and angle summation loss, which serve to normalize coordinates and adopt a differentiable geometry descriptor from the point-in-polygon problem of computational geometry to constrain the overall difference of prediction and label vertexes. These innovative designs empower our PVD to demonstrate its superiority with state-of-the-art performance across various grounding tasks.

Original languageEnglish
Pages (from-to)1326-1334
Number of pages9
JournalProceedings of the AAAI Conference on Artificial Intelligence
Volume38
Issue number2
DOIs
Publication statusPublished - 25 Mar 2024
Externally publishedYes
Event38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 202427 Feb 2024

Fingerprint

Dive into the research topics of 'Parallel Vertex Diffusion for Unified Visual Grounding'. Together they form a unique fingerprint.

Cite this