Prompt-based and weak-modality enhanced multimodal recommendation

Xue Dong, Xuemeng Song*, Minghui Tian, Linmei Hu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Beyond conventional recommender systems that rely solely on user-item interaction data, multimodal recommender systems additionally exploit items' multimodal data to boost recommendation performance. In this research line, late-fusion-based approaches, which first predict user ratings for each item modality independently and then merge these predictions into a final rating, have made significant advances. Nevertheless, these methods still suffer from two issues: (1) they use a separate user embedding to model user interest in each modality, overlooking the underlying relationships among modalities and significantly increasing memory costs; and (2) they overlook the unreliable interest learned from certain modalities, which hinders accurate final rating learning. To address these issues, we propose a prompt-based and weak-modality enhanced multimodal recommendation framework. It consists of two key components: (1) multimodal prompted user interest learning, which adopts a single user embedding combined with different modality prompts to model modality-specific user interests, and (2) weak-modality enhanced training, which strengthens user interest learning in modalities whose predictions are less reliable, ensuring well-balanced learning across all modalities. Extensive experiments on Amazon datasets demonstrate the effectiveness of the proposed framework. Deploying the two components onto existing methods makes them more effective and efficient.
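To make the prompted-interest idea concrete, the following is a minimal sketch of how a single shared user embedding could be shifted by per-modality prompt vectors and fed into late fusion. All names, the dot-product scorer, and the uniform averaging are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                # embedding dimension (illustrative)
modalities = ["visual", "textual"]

# One shared user embedding instead of a separate embedding per modality.
user_emb = rng.normal(size=d)

# Learnable modality prompts (random placeholders here) shift the shared
# embedding into each modality-specific interest space.
prompts = {m: rng.normal(size=d) for m in modalities}

# Item features per modality (e.g., outputs of image / text encoders).
item_feat = {m: rng.normal(size=d) for m in modalities}

def modality_rating(m: str) -> float:
    """Score the item in modality m using the prompted user interest."""
    interest = user_emb + prompts[m]      # prompted modality-specific interest
    return float(interest @ item_feat[m])

# Late fusion: merge per-modality predictions into one final rating
# (a plain average here; the actual fusion rule may differ).
per_modality = {m: modality_rating(m) for m in modalities}
final_rating = sum(per_modality.values()) / len(per_modality)
```

Compared with keeping a full user embedding per modality, this stores one embedding plus a handful of prompt vectors shared across all users, which is where the memory saving described in the abstract would come from.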

Original language: English
Article number: 101989
Journal: Information Fusion
Volume: 101
Publication status: Published - Jan 2024

Keywords

  • Multimodal interest learning
  • Multimodal recommendation
  • Prompt learning
