TY - JOUR
T1 - Cloud-Edge-End Collaborative Inference in Mobile Networks
T2 - Challenges and Solutions
AU - Zheng, Xixi
AU - Zhang, Weiting
AU - Hu, Chenfei
AU - Zhu, Liehuang
AU - Zhang, Chuan
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Applying artificial intelligence (AI) to mobile networks can significantly enhance inference services, driving the rapid evolution towards smarter, more efficient networks. As next-generation mobile networks are deployed, they are expected to provide higher data rates and lower latency, which presents challenges for traditional large-model inference. These challenges necessitate that inference architectures and models quickly adapt to meet growing demands and handle increasingly complex tasks. To tackle these issues, the cloud-edge-end collaborative inference framework has emerged as a highly effective solution. This framework facilitates intelligent collaboration across the cloud, edge, and end devices, optimizing computational resources at all levels. Building on this approach, this article proposes a collaborative inference system specifically designed for mobile networks, addressing critical issues like computational bottlenecks and inference latency in traditional model deployments. The system integrates an innovative task offloading strategy and advanced model acceleration techniques, resulting in reduced inference latency. Experimental results show that the system not only significantly lowers latency but also maintains high-quality output. Finally, the article discusses future development trends in cloud-edge-end collaborative inference systems.
AB - Applying artificial intelligence (AI) to mobile networks can significantly enhance inference services, driving the rapid evolution towards smarter, more efficient networks. As next-generation mobile networks are deployed, they are expected to provide higher data rates and lower latency, which presents challenges for traditional large-model inference. These challenges necessitate that inference architectures and models quickly adapt to meet growing demands and handle increasingly complex tasks. To tackle these issues, the cloud-edge-end collaborative inference framework has emerged as a highly effective solution. This framework facilitates intelligent collaboration across the cloud, edge, and end devices, optimizing computational resources at all levels. Building on this approach, this article proposes a collaborative inference system specifically designed for mobile networks, addressing critical issues like computational bottlenecks and inference latency in traditional model deployments. The system integrates an innovative task offloading strategy and advanced model acceleration techniques, resulting in reduced inference latency. Experimental results show that the system not only significantly lowers latency but also maintains high-quality output. Finally, the article discusses future development trends in cloud-edge-end collaborative inference systems.
KW - artificial intelligence
KW - cloud-edge-end collaboration
KW - collaborative inference
KW - large models
KW - mobile network
UR - http://www.scopus.com/inward/record.url?scp=85216689417&partnerID=8YFLogxK
U2 - 10.1109/MNET.2025.3533581
DO - 10.1109/MNET.2025.3533581
M3 - Article
AN - SCOPUS:85216689417
SN - 0890-8044
JO - IEEE Network
JF - IEEE Network
ER -