TY - JOUR
T1 - Generative Semantic Communication Via Textual Prompts
T2 - Latency Performance Tradeoffs
AU - Ren, Mengmeng
AU - Qiao, Li
AU - Yang, Long
AU - Gao, Zhen
AU - Chen, Jian
AU - Mashhadi, Mahdi Boloursaz
AU - Xiao, Pei
AU - Tafazolli, Rahim
AU - Bennis, Mehdi
N1 - Publisher Copyright: © 1967-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - This paper develops an edge-device collaborative Generative Semantic Communications (Gen SemCom) framework leveraging pre-trained Multi-modal/Vision Language Models (M/VLMs) for ultra-low-rate semantic communication via textual prompts. The proposed framework optimizes the use of M/VLMs on the wireless edge/device to generate high-fidelity textual prompts through visual captioning/question answering, which are then transmitted over a wireless channel for SemCom. Specifically, we develop a multi-user Gen SemCom framework using pre-trained M/VLMs, and formulate a joint optimization problem of prompt generation offloading and communication and computation resource allocation to minimize the latency and maximize the resulting semantic quality. Due to the non-convex nature of the problem with highly coupled discrete and continuous variables, we decompose it into a two-level problem and propose a low-complexity swap/leaving/joining (SLJ)-based matching algorithm. Simulation results demonstrate significant performance improvements over the conventional semantic-unaware/non-collaborative generation offloading benchmarks.
KW - collaborative edge-device generative AI
KW - pre-trained multi-modal/vision language models (M/VLMs)
KW - semantic communication
KW - zero/few-shot captioning
UR - http://www.scopus.com/inward/record.url?scp=105004052952&partnerID=8YFLogxK
U2 - 10.1109/TVT.2025.3566488
DO - 10.1109/TVT.2025.3566488
M3 - Article
AN - SCOPUS:105004052952
SN - 0018-9545
JO - IEEE Transactions on Vehicular Technology
JF - IEEE Transactions on Vehicular Technology
ER -