Generative Semantic Communication via Textual Prompts: Latency-Performance Tradeoffs

Mengmeng Ren, Li Qiao, Long Yang*, Zhen Gao, Jian Chen, Mahdi Boloursaz Mashhadi, Pei Xiao, Rahim Tafazolli, Mehdi Bennis

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper develops an edge-device collaborative Generative Semantic Communications (Gen SemCom) framework that leverages pre-trained Multi-modal/Vision Language Models (M/VLMs) for ultra-low-rate semantic communication via textual prompts. The proposed framework optimizes the use of M/VLMs on the wireless edge/device to generate high-fidelity textual prompts through visual captioning/question answering, which are then transmitted over a wireless channel for SemCom. Specifically, we develop a multi-user Gen SemCom framework using pre-trained M/VLMs, and formulate a joint optimization problem over prompt generation offloading and communication and computation resource allocation that minimizes latency while maximizing the resulting semantic quality. Due to the non-convex nature of the problem with highly coupled discrete and continuous variables, we decompose it into a two-level problem and propose a low-complexity swap/leaving/joining (SLJ)-based matching algorithm. Simulation results demonstrate significant performance improvements over conventional semantic-unaware and non-collaborative generation offloading benchmarks.
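The abstract describes an SLJ-based matching algorithm for the offloading decision, i.e., local-search moves in which users join an edge server, leave it for local generation, or swap their choices. The paper's exact formulation is not reproduced on this page; the sketch below is a minimal illustration of such a swap/leave/join heuristic under assumed data: the users, latency values, edge capacities, and cost function are hypothetical placeholders, not the paper's model.

```python
import itertools

# Illustrative data (assumed): per-user latency of local prompt generation vs.
# offloading to each edge server, plus a per-edge serving capacity.
LATENCY = {
    "u1": {"local": 9.0, "e1": 3.0, "e2": 4.0},
    "u2": {"local": 7.0, "e1": 2.5, "e2": 5.0},
    "u3": {"local": 6.0, "e1": 4.0, "e2": 2.0},
}
CAPACITY = {"e1": 1, "e2": 2}


def cost(assign):
    """Total latency of an assignment; over-capacity assignments cost +inf."""
    load = {e: 0 for e in CAPACITY}
    total = 0.0
    for user, choice in assign.items():
        total += LATENCY[user][choice]
        if choice != "local":
            load[choice] += 1
    return float("inf") if any(load[e] > CAPACITY[e] for e in CAPACITY) else total


def slj_matching(assign):
    """Apply swap / leaving / joining moves until none of them lowers the cost."""
    improved = True
    while improved:
        improved = False
        # Leaving / joining: one user switches edge server or falls back to local.
        for user in assign:
            for choice in LATENCY[user]:
                trial = dict(assign, **{user: choice})
                if cost(trial) < cost(assign):
                    assign, improved = trial, True
        # Swapping: two users exchange their offloading choices.
        for u, v in itertools.combinations(list(assign), 2):
            trial = dict(assign)
            trial[u], trial[v] = assign[v], assign[u]
            if cost(trial) < cost(assign):
                assign, improved = trial, True
    return assign


if __name__ == "__main__":
    start = {u: "local" for u in LATENCY}   # everyone begins with local generation
    final = slj_matching(start)
    print(final, cost(final))
```

Each accepted move strictly reduces a finite cost over a finite assignment space, so the loop terminates at a two-sided exchange-stable matching; this mirrors the low-complexity intent of SLJ-style matching, though the paper's actual objective couples offloading with communication and computation resource allocation.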

Original language: English
Journal: IEEE Transactions on Vehicular Technology
Publication status: Accepted/In press - 2025

Keywords

  • collaborative edge-device generative AI
  • pre-trained multi-modal/vision language models (M/VLMs)
  • semantic communication
  • zero/few-shot captioning
