Abstract
Beyond conventional recommendation systems that rely solely on user-item interaction data, multimodal recommendation systems additionally exploit item multimodal data to boost recommendation performance. In this line of research, late-fusion approaches, which first predict user ratings for each item modality independently and then merge these predictions into a final rating, have made significant advances. Nevertheless, these methods still suffer from two issues: (1) they maintain a separate user embedding per modality to model modality-specific user interest, which overlooks the underlying relationships among modalities and significantly increases memory costs; and (2) they ignore the unreliable interest learned from certain modalities, which hinders accurate final rating prediction. To address these issues, we propose a prompt-based and weak-modality-enhanced multimodal recommendation framework. It consists of two key components: (1) multimodal prompted user interest learning, which adopts a single user embedding combined with per-modality prompts to model modality-specific user interests, and (2) weak-modality enhanced training, which strengthens user interest learning in modalities whose predictions are less reliable, ensuring well-balanced learning across all modalities. Extensive experiments on Amazon datasets demonstrate the effectiveness of the proposed framework, and deploying the two components onto existing methods makes them both more effective and more efficient.
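The abstract gives no implementation details, so the following is only a minimal, hypothetical PyTorch sketch of the two components as we read them; the names (`PromptedInterestModel`, `weak_modality_loss`, `num_modalities`), the additive prompt composition, the inner-product scorer, and the loss-based reweighting are all our assumptions, not the paper's method. It shows a single shared user embedding table specialized by one learnable prompt vector per modality, and a training loss that upweights modalities whose current predictions are less reliable (higher loss).

```python
# Hypothetical sketch of the two abstract components; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptedInterestModel(nn.Module):
    def __init__(self, num_users, num_items, num_modalities, dim):
        super().__init__()
        # One shared user embedding table instead of one table per modality.
        self.user_emb = nn.Embedding(num_users, dim)
        # Simplification: a single item table; in practice item-side inputs
        # would likely come from pretrained per-modality encoders.
        self.item_emb = nn.Embedding(num_items, dim)
        # One learnable prompt vector per modality specializes the user embedding.
        self.modality_prompts = nn.Parameter(torch.randn(num_modalities, dim) * 0.01)

    def forward(self, users, items):
        u = self.user_emb(users)  # (B, dim)
        i = self.item_emb(items)  # (B, dim)
        # Modality-specific user interest = shared embedding + modality prompt.
        u_m = u.unsqueeze(1) + self.modality_prompts.unsqueeze(0)  # (B, M, dim)
        # Per-modality rating predictions, to be merged by late fusion.
        return (u_m * i.unsqueeze(1)).sum(-1)  # (B, M)

def weak_modality_loss(scores, labels):
    """Upweight modalities whose predictions are less reliable (higher loss),
    so learning stays balanced across all modalities."""
    per_mod = F.binary_cross_entropy_with_logits(
        scores, labels.unsqueeze(1).expand_as(scores), reduction="none"
    ).mean(0)                                     # (M,) mean loss per modality
    weights = (per_mod / per_mod.sum()).detach()  # weaker modality -> larger weight
    return (weights * per_mod).sum()
```

Under this reading, a fused final rating could be as simple as `scores.mean(-1)`, and the memory argument is visible in the parameter counts: one table plus M prompt vectors costs dim × (num_users + M) parameters, versus dim × num_users × M for per-modality user tables.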
| Original language | English |
|---|---|
| Article number | 101989 |
| Journal | Information Fusion |
| Volume | 101 |
| DOIs | |
| Publication status | Published - Jan 2024 |
Keywords
- Multimodal interest learning
- Multimodal recommendation
- Prompt learning