M3Rec: Selective State Space Models with Mixture-of-Modality Experts for Multi-Modal Sequential Recommendation

Xu Guo, Tong Zhang*, Yufei Xue, Chenxu Wang, Fuyun Wang, Zhen Cui

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid growth of multimedia-sharing platforms drives the development of recommender systems. While traditional ID-based methods for mining user behavior signals are well-studied, research into multimodal sequential recommendation remains nascent. Current approaches face three critical challenges: (1) inadequate modeling of user preferences across diverse modalities, (2) ineffective capture of user action sequence dependencies, which hinders representation learning of preferences, and (3) inefficiency in Transformer-based models due to the quadratic complexity of attention mechanisms. To address these issues, we propose M3Rec, a Mamba-based selective state space model incorporating Mixture-of-Modality experts for Multimodal sequential recommendation. M3Rec strengthens the modeling of user action sequence dependencies through shared Mamba blocks across modalities and employs modality experts to extract modality-specific user preferences. The shared Mamba blocks efficiently model long-term user preferences with fast inference and linear scalability through hardware-aware parallelism, enhancing ID-based sequence signals and filtering out redundant information that does not depend on user actions. This enables more accurate modeling of user preferences across heterogeneous data. Extensive experiments on three public datasets validate the model's effectiveness. The implementation is released at https://github.com/Xu107/M3Rec-main.

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
DOIs
Publication status: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 to 11 Apr 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 to 11/04/25

Keywords

  • Mamba
  • Multimedia
  • Sequential Recommendation
