TY - JOUR
T1 - E2E Learning Massive MIMO for Multimodal Semantic Non-Orthogonal Transmission and Fusion
AU - Wu, Minghui
AU - Gao, Zhen
N1 - Publisher Copyright: © 1983-2012 IEEE.
PY - 2025
Y1 - 2025
AB - This paper investigates multimodal semantic non-orthogonal transmission and fusion in hybrid analog-digital massive multiple-input multiple-output (MIMO) systems. A Transformer-based cross-modal source-channel semantic-aware network (CSC-SA-Net) framework is proposed, in which the channel state information (CSI) reference signal (RS), feedback, analog beamforming/combining, and baseband semantic processing are optimized end-to-end (E2E) in a data-driven manner at the base station (BS) and user equipments (UEs). CSC-SA-Net comprises five sub-networks: a BS-side CSI-RS network (BS-CSIRS-Net), a UE-side channel semantic-aware network (UE-CSANet), BS-CSANet, a UE-side multimodal semantic fusion network (UE-MSFNet), and BS-MSFNet. Specifically, we first train BS-CSIRS-Net, UE-CSANet, and BS-CSANet E2E to jointly design the CSI-RS, feedback, and analog beamforming/combining so as to maximize physical-layer spectral efficiency. Meanwhile, we train UE-MSFNet and BS-MSFNet E2E to optimize the application-layer source-semantic downstream tasks. Starting from these pre-trained models, we then integrate application-layer semantic processing with the physical-layer tasks and train all five sub-networks E2E. Extensive simulations show that the proposed CSC-SA-Net outperforms traditional separated designs, demonstrating the advantage of cross-modal channel-source semantic fusion.
KW - Massive MIMO
KW - deep learning
KW - multimodal fusion
KW - non-orthogonal transmission
KW - semantic communication
UR - https://www.scopus.com/pages/publications/105024851548
U2 - 10.1109/JSAC.2025.3643817
DO - 10.1109/JSAC.2025.3643817
M3 - Article
AN - SCOPUS:105024851548
SN - 0733-8716
JO - IEEE Journal on Selected Areas in Communications
JF - IEEE Journal on Selected Areas in Communications
ER -