Prompt-based and weak-modality enhanced multimodal recommendation

Xue Dong, Xuemeng Song*, Minghui Tian, Linmei Hu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Beyond conventional recommender systems that rely solely on user-item interaction data, multimodal recommender systems additionally exploit items' multimodal data to boost recommendation performance. In this research line, late-fusion-based approaches, which first predict user ratings for each item modality independently and then merge these predictions into a final rating, have made significant advances. Nevertheless, these methods still suffer from two issues: (1) they use a separate user embedding to model user interest in each modality, overlooking the underlying relationships among modalities and significantly increasing memory costs; and (2) they overlook the unreliable interest learned from certain modalities, which hinders accurate final rating learning. To address these issues, we propose a prompt-based and weak-modality enhanced multimodal recommendation framework. It consists of two key components: (1) multimodal prompted user interest learning, which adopts a single user embedding combined with different modality prompts to model modality-specific user interests, and (2) weak-modality enhanced training, which strengthens user interest learning in modalities whose predictions are less reliable, ensuring well-balanced learning across all modalities. Extensive experiments on Amazon datasets demonstrate the effectiveness of the proposed framework. Deploying the two components onto existing methods makes them more effective and efficient.
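To make the prompted-interest idea concrete, the following is a minimal sketch of how a single shared user embedding could be shifted by per-modality prompt vectors and fed into late fusion. All names, the dot-product scorer, and the uniform averaging are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                # embedding dimension (illustrative)
modalities = ["visual", "textual"]

# One shared user embedding instead of a separate embedding per modality.
user_emb = rng.normal(size=d)

# Learnable modality prompts (random placeholders here) shift the shared
# embedding into each modality-specific interest space.
prompts = {m: rng.normal(size=d) for m in modalities}

# Item features per modality (e.g., outputs of image / text encoders).
item_feat = {m: rng.normal(size=d) for m in modalities}

def modality_rating(m: str) -> float:
    """Score the item in modality m using the prompted user interest."""
    interest = user_emb + prompts[m]      # prompted modality-specific interest
    return float(interest @ item_feat[m])

# Late fusion: merge per-modality predictions into one final rating
# (a plain average here; the actual fusion rule may differ).
per_modality = {m: modality_rating(m) for m in modalities}
final_rating = sum(per_modality.values()) / len(per_modality)
```

Compared with keeping a full user embedding per modality, this stores one embedding plus a handful of prompt vectors shared across all users, which is where the memory saving described in the abstract would come from.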

Original language: English
Article number: 101989
Journal: Information Fusion
Volume: 101
Publication status: Published - Jan 2024

Keywords

  • Multimodal interest learning
  • Multimodal recommendation
  • Prompt learning
