AutoGraph: Enabling Visual Context via Graph Alignment in Open Domain Multi-Modal Dialogue Generation

Deji Zhao, Donghong Han*, Ye Yuan, Bo Ning, Mengxiang Li, Zhongjiang He, Shuangyong Song

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Open-domain multi-modal dialogue systems rely heavily on visual information to generate contextually relevant responses. Existing open-domain multi-modal dialogue generation methods ignore the complementary relationship between modalities and are difficult to integrate with large language models (LLMs). To tackle these challenges, we introduce AutoGraph, a method for automatically constructing visual context graphs. We aim to structure complex information and integrate it with LLMs, aligning information from multiple modalities at both the semantic and structural levels. Specifically, we fully connect the text graphs and scene graphs, and then use LLMs to trim unnecessary edges, automatically constructing a visual context graph. Next, we design, for the first time, several graph sampling grammars that convert graph structures into sequences suitable for LLMs. Finally, we propose a two-stage fine-tuning strategy that allows LLMs to understand the graph sampling grammars and generate responses. We validate the proposed method on both text-based and vision-based LLMs. Experimental results show that it achieves state-of-the-art performance on multiple public datasets.
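
The construction steps named in the abstract (fully connect, trim, convert to a sequence) can be illustrated with a minimal sketch. This is not the authors' implementation: the triple format, the `aligned_with` relation label, the `keep_edge` callback (a stand-in for the LLM-based trimming), and the flat triple serialization are all assumptions made for illustration, since this record does not specify the actual graph sampling grammars.

```python
from itertools import product
from typing import Callable, List, Tuple

Edge = Tuple[str, str, str]  # (head, relation, tail); an assumed format


def build_context_graph(
    text_edges: List[Edge],
    scene_edges: List[Edge],
    keep_edge: Callable[[str, str], bool],
) -> List[Edge]:
    """Fully connect text and scene nodes, then trim cross-modal edges."""
    text_nodes = sorted({n for h, _, t in text_edges for n in (h, t)})
    scene_nodes = sorted({n for h, _, t in scene_edges for n in (h, t)})
    # Step 1: link every text node to every scene node.
    # "aligned_with" is a hypothetical relation label.
    candidates = [(t, "aligned_with", s) for t, s in product(text_nodes, scene_nodes)]
    # Step 2: trim. keep_edge stands in for the LLM judgement.
    cross = [e for e in candidates if keep_edge(e[0], e[2])]
    return text_edges + scene_edges + cross


def linearize(edges: List[Edge]) -> str:
    """Serialize edges as a flat triple sequence (a hypothetical grammar)."""
    return " ".join(f"<{h}, {r}, {t}>" for h, r, t in edges)


def same_word(text_node: str, scene_node: str) -> bool:
    """Toy trimming rule: keep an edge only when the two labels match."""
    return text_node.lower() == scene_node.lower()


text_edges = [("I", "walk", "dog")]
scene_edges = [("dog", "on", "grass")]
graph = build_context_graph(text_edges, scene_edges, same_word)
print(linearize(graph))
```

In practice the toy `same_word` rule would be replaced by a call that asks an LLM whether a candidate edge is relevant, and the resulting sequence would be placed in the model's prompt.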

Original language: English
Title of host publication: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 2079-2088
Number of pages: 10
ISBN (Electronic): 9798400706868
Publication status: Published - 28 Oct 2024
Event: 32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia
Duration: 28 Oct 2024 → 1 Nov 2024

Publication series

Name: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference: 32nd ACM International Conference on Multimedia, MM 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 → 1/11/24

Keywords

  • dialogue generation
  • dialogue graph
  • multi-modal alignment
