Multimodal Entity Linking With Dynamic Modality Selection and Interactive Prompt Learning

  • Yingyao Ma
  • Yifan Xue
  • Jiasong Wu*
  • Lotfi Senhadji
  • Huazhong Shu
  • Jian Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advances in Multimodal Entity Linking leverage multimodal information to link target mentions to corresponding entities. However, existing methods uniformly adopt a “one-size-fits-all” approach, which overlooks the unique requirements of individual samples and fails to adequately balance modality-assisted disambiguation and modality-induced noise. In addition, the separate large-scale pre-trained visual and text models commonly used for feature extraction do not address inter-modal heterogeneity, and fine-tuning them is computationally expensive. To resolve these two issues, we introduce a novel approach named Multimodal Entity Linking with Dynamic Modality Selection and Interactive Prompt Learning (DSMIP). First, we design three expert networks, each using a different subset of modalities tailored to the task, and train them individually. Specifically, for the multimodal expert network, we enhance entity and mention feature extraction by updating multimodal prompts and by defining a coupling function that lets prompts interact across modalities. Subsequently, to select the best-suited expert network for each sample, we devise a Modality Selection Gating Network that obtains the optimal one-hot selection vector through a specialized reparameterization technique and a two-stage training process. Experimental results on three public benchmark datasets demonstrate that the proposed DSMIP outperforms all state-of-the-art baselines.
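
The per-sample expert selection described in the abstract can be illustrated with a small sketch. The abstract does not name the reparameterization technique, so the Gumbel-softmax relaxation used here is an assumption, as are the expert names, the function names, and the temperature parameter `tau`. The sketch shows only the forward pass (sampling a relaxed vector and discretizing it to one-hot); gradient handling and the two-stage training process are not reproduced.

```python
import math
import random

# Hypothetical labels for the three expert networks; the abstract only says
# that each expert uses a different subset of modalities.
EXPERTS = ("text", "image", "multimodal")


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed (soft) selection vector from gating logits.

    Assumption: Gumbel-softmax stands in for the unnamed
    "specialized reparameterization technique".
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_select(soft):
    """Discretize a soft selection vector into a one-hot vector."""
    winner = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == winner else 0.0 for i in range(len(soft))]


def route(logits, expert_scores, tau=1.0, rng=random):
    """Pick one expert for a sample and return (name, its score, one-hot)."""
    one_hot = hard_select(gumbel_softmax(logits, tau, rng))
    chosen = one_hot.index(1.0)
    return EXPERTS[chosen], expert_scores[chosen], one_hot


if __name__ == "__main__":
    # Illustrative gating logits and per-expert linking scores for one sample.
    print(route([2.0, 0.5, -1.0], [0.71, 0.40, 0.66], tau=0.5))
```

In a trainable version, the one-hot vector would typically be used in the forward pass while gradients flow through the soft vector; that step requires an autodiff framework and is left out of this sketch.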

Original language: English
Pages (from-to): 5467-5480
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 37
Issue number: 9
DOIs
Publication status: Published - 2025
Externally published: Yes

Keywords

  • Multimodal entity linking
  • knowledge graph
  • large pre-trained model
