E2E Learning Massive MIMO for Multimodal Semantic Non-Orthogonal Transmission and Fusion

  • Minghui Wu
  • Zhen Gao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper investigates multimodal semantic non-orthogonal transmission and fusion in hybrid analog-digital massive multiple-input multiple-output (MIMO) systems. A Transformer-based cross-modal source-channel semantic-aware network (CSC-SA-Net) framework is conceived, in which the channel state information (CSI) reference signal (RS), feedback, analog beamforming/combining, and baseband semantic processing are optimized end-to-end (E2E) in a data-driven manner at the base station (BS) and user equipments (UEs). CSC-SA-Net comprises five sub-networks: a BS-side CSI-RS network (BS-CSIRS-Net), a UE-side channel semantic-aware network (UE-CSANet), BS-CSANet, a UE-side multimodal semantic fusion network (UE-MSFNet), and BS-MSFNet. Specifically, we first train BS-CSIRS-Net, UE-CSANet, and BS-CSANet E2E to jointly design the CSI-RS, feedback, and analog beamforming/combining so as to maximize physical-layer spectral efficiency. Meanwhile, we train UE-MSFNet and BS-MSFNet E2E to optimize the application-layer source semantic downstream tasks. Starting from these pre-trained models, we then integrate application-layer semantic processing with the physical-layer tasks and train all five sub-networks E2E. Extensive simulations show that the proposed CSC-SA-Net outperforms traditional separated designs, revealing the advantage of cross-modal channel-source semantic fusion.
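
The staged training described in the abstract can be summarized as a small schedule. The Python sketch below is illustrative only: the stage labels, the `Stage` class, and the helper names are hypothetical and do not come from the paper. It records only which sub-networks the abstract says are trained at each step, not the actual losses, data, or architectures.

```python
from dataclasses import dataclass

# Sub-network names as listed in the abstract.
PHY_NETS = ("BS-CSIRS-Net", "UE-CSANet", "BS-CSANet")
SEMANTIC_NETS = ("UE-MSFNet", "BS-MSFNet")


@dataclass(frozen=True)
class Stage:
    name: str         # hypothetical label, not taken from the paper
    trainable: tuple  # sub-networks updated in this stage
    objective: str    # paraphrase of what the abstract says is optimized


def build_schedule():
    """Return the three training stages described in the abstract."""
    return [
        # Pre-training of the physical-layer sub-networks.
        Stage("pretrain_phy", PHY_NETS,
              "maximize physical-layer spectral efficiency"),
        # Pre-training of the semantic fusion sub-networks ("meanwhile").
        Stage("pretrain_semantic", SEMANTIC_NETS,
              "application-layer source semantic downstream tasks"),
        # Joint E2E training of all five sub-networks from the pre-trained models.
        Stage("joint_e2e", PHY_NETS + SEMANTIC_NETS,
              "semantic processing integrated with physical-layer tasks"),
    ]


def not_trained_in(stage):
    """List the sub-networks that a given stage does not update."""
    all_nets = PHY_NETS + SEMANTIC_NETS
    return tuple(n for n in all_nets if n not in stage.trainable)


if __name__ == "__main__":
    for stage in build_schedule():
        print(stage.name, "trains", stage.trainable,
              "| not trained:", not_trained_in(stage))
```

In a real training script each `trainable` tuple would select which model parameters an optimizer updates. How the two pre-training stages are scheduled relative to each other is not specified in the abstract beyond "meanwhile".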

Original language: English
Journal: IEEE Journal on Selected Areas in Communications
DOIs
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Massive MIMO
  • deep learning
  • multimodal fusion
  • non-orthogonal transmission
  • semantic communication
