
LOBSTER: Bilateral global semantic enhancement for multimedia recommendation

  • Jinfeng Xu
  • Zheyu Chen
  • Wei Wang
  • Xiping Hu
  • Jiyi Liu
  • Edith C.H. Ngai*
  • *Corresponding author for this work
  • The University of Hong Kong
  • Hong Kong Polytechnic University
  • Shenzhen MSU-BIT University
  • Beijing Institute of Technology
  • Sun Yat-Sen University

Research output: Contribution to journal › Article › peer-review

Abstract

Multimedia information floods the Internet and subtly influences human society. Incorporating multimedia information is a popular way to alleviate the data sparsity problem in recommender systems. However, many studies show that multimodal information can introduce cross-modality noise in some cases. A feasible way to alleviate this noise is to enhance the information that is common across modalities. Recent work does so between users (via user-user graphs) or between items (via item-item graphs) by adding extra homogeneous graphs, but these additional graph structures inevitably incur substantial computational cost. To better extract common information among modalities while reducing computational cost, we propose biLateral glOBal SemanTic Enhancement for multimedia Recommendation (LOBSTER). Specifically, LOBSTER constructs two global semantic spaces for user and item representations and enhances global (common) semantic features on both the user and item sides through additional learnable representations shared across multiple modalities. LOBSTER further incorporates a layer-refined Graph Convolutional Network (GCN) and dynamic optimization to alleviate the over-smoothing problem and to adjust the attention paid to different modalities. Extensive experiments on three real-world datasets demonstrate that LOBSTER achieves competitive or superior performance compared to models incorporating homogeneous graphs, while providing an average 2.45× speedup and a 60.26% reduction in memory usage. Our code is available at https://github.com/Jinfeng-Xu/LOBSTER.
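
As a rough illustration of the idea of learnable representations shared across modalities, the sketch below enhances each modality's item feature with an attention-weighted mix of one shared set of global vectors. This is not the authors' implementation: the dot-product attention, the residual addition, the helper name `enhance`, and all values and dimensions are assumptions made for illustration. The actual method is in the linked repository.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enhance(feature, shared_globals):
    """Add an attention-weighted mix of shared global vectors to one feature.

    Hypothetical helper: the attention form is an assumption, not the paper's.
    """
    dim = len(feature)
    scale = math.sqrt(dim)
    # Scaled dot-product score between the feature and each global vector.
    scores = [sum(f * g for f, g in zip(feature, vec)) / scale
              for vec in shared_globals]
    weights = softmax(scores)
    # Weighted combination of the global vectors, one value per dimension.
    mix = [sum(w * vec[d] for w, vec in zip(weights, shared_globals))
           for d in range(dim)]
    return [f + x for f, x in zip(feature, mix)], weights


random.seed(0)
DIM, NUM_GLOBALS = 4, 3
# One set of global vectors for the item side; in training these would be
# learnable parameters (here they are random placeholders).
item_globals = [[random.gauss(0, 1) for _ in range(DIM)]
                for _ in range(NUM_GLOBALS)]

# Toy per-modality features for a single item (values are made up).
item_features = {
    "visual": [0.2, -0.1, 0.5, 0.3],
    "textual": [0.1, 0.4, -0.2, 0.0],
}

# The same global vectors are reused for every modality, so no item-item
# graph has to be built or stored.
enhanced = {name: enhance(vec, item_globals)
            for name, vec in item_features.items()}
```

A second, separate set of global vectors would play the same role on the user side. In this sketch the cost per feature grows with the number of global vectors rather than with the number of users or items, which is one plausible reading of why avoiding homogeneous graphs can save time and memory.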

Original language: English
Article number: 103778
Journal: Information Fusion
Volume: 127
Publication status: Published - Mar 2026
Externally published: Yes

Keywords

  • Graph neural network
  • Multimodal fusion
  • Multimodal recommendation
  • Semantic enhancement
