TY - JOUR
T1 - A Survey on Multimodal Recommender Systems
T2 - Recent Advances and Future Directions
AU - Xu, Jinfeng
AU - Chen, Zheyu
AU - Yang, Shuo
AU - Li, Jinze
AU - Wang, Wei
AU - Hu, Xiping
AU - Hoi, Steven
AU - Ngai, Edith
N1 - Publisher Copyright: © 1999-2012 IEEE.
PY - 2026
Y1 - 2026
AB - The exponential growth of online information has made it increasingly difficult for users to identify valuable and relevant content. Recommender systems have emerged as a critical solution to this challenge by tailoring content to individual preferences. With the proliferation of diverse multimedia services, human interaction with the digital world has become inherently multimodal. Consequently, recommender systems capable of comprehending and interpreting multimodal information can more effectively align with individual preferences. Despite a recent surge in research attention, the field of multimodal recommender systems (MRS) still lacks a comprehensive technical survey. Existing surveys suffer from two critical limitations: 1) Insufficient technical depth: Prior works predominantly focus on categorizing and discussing general structure, neglecting rigorous technical analysis of methodologies and architectures. 2) Absence of cutting-edge works: Due to the rapid evolution of AI technologies, current surveys fail to discuss the latest works that adopt the most advanced techniques. To bridge these gaps, this survey conducts a systematic and technical review of advanced MRS works from MRS's inception to the present. We organize existing works into coherent taxonomies based on their structure and provide in-depth analyses of methodological innovations at each component, including Feature Extraction, Encoder, Multimodal Fusion, and Loss Function. We also discuss potential future directions for developing and enhancing MRS. This survey serves as technical guidance for researchers and practitioners, offering insights into the developments, techniques, and future directions of MRS.
KW - Data mining
KW - Information systems
KW - Multimedia information systems
KW - Multimodal recommender systems
UR - https://www.scopus.com/pages/publications/105031732972
U2 - 10.1109/TMM.2026.3668620
DO - 10.1109/TMM.2026.3668620
M3 - Review article
AN - SCOPUS:105031732972
SN - 1520-9210
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -