EEformer: Early Exiting for Transformer with Global-Local Exits and Progressive Fine-Tuning

  • Guanyu Xu
  • Jiawei Hao
  • Yong Luo
  • Li Shen
  • Han Hu
  • Dan Zeng*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, the efficient deployment and acceleration of transformer-based pre-trained models (TPMs) on resource-constrained edge devices for multimedia services have gained significant interest. Although early exiting is a feasible solution, it may incur extra computational cost and substantial performance degradation compared to the original models. To tackle these issues, we propose a framework termed EEformer, which incorporates global-local heads (GLHs) into intermediate layers to construct an early exiting dynamic neural network (EDNN). The GLH efficiently extracts both global and local information from the hidden states produced by the backbone layer, thereby achieving a better performance-efficiency trade-off for the EDNN. Moreover, we propose a novel progressive fine-tuning strategy that steadily improves the efficiency of the EDNN over three fine-tuning stages while keeping its performance comparable to that of the original model. We conduct extensive experiments on image classification and natural language processing tasks, demonstrating the superiority of the proposed framework. In particular, it achieves a 1.87× speed-up while maintaining 99.0% of the original performance on the CIFAR-100 dataset, and a 3.05× speed-up while maintaining 98.5% on the SST-2 dataset.
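The abstract does not specify the exit criterion or the internals of the GLH, so the sketch below is illustrative only: a generic early-exiting transformer in PyTorch that attaches a classifier head to every backbone block and stops at the first head whose softmax confidence clears a threshold. All names (`EarlyExitTransformer`, `threshold`, the mean-pool head standing in for the paper's global-local head) are hypothetical.

```python
import torch
import torch.nn as nn

class EarlyExitTransformer(nn.Module):
    """Early-exiting sketch: an intermediate classifier ("exit head") after
    every backbone block; inference stops at the first confident head.
    Hypothetical design, not the paper's GLH architecture."""

    def __init__(self, layers, exit_heads, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(layers)          # backbone blocks
        self.exit_heads = nn.ModuleList(exit_heads)  # one classifier per block
        self.threshold = threshold                   # exit confidence threshold

    @torch.no_grad()
    def forward(self, x):                  # x: (1, seq_len, dim), batch of one
        for layer, head in zip(self.layers, self.exit_heads):
            x = layer(x)
            logits = head(x.mean(dim=1))   # mean-pool hidden states, classify
            if logits.softmax(dim=-1).max() >= self.threshold:
                return logits              # confident enough: exit early
        return logits                      # otherwise use the deepest head


# Usage: a ViT-sized toy model on one 197-token image sequence.
dim, num_classes, depth = 256, 100, 12
blocks = [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
          for _ in range(depth)]
heads = [nn.Linear(dim, num_classes) for _ in range(depth)]
model = EarlyExitTransformer(blocks, heads, threshold=0.9).eval()
prediction = model(torch.randn(1, 197, dim)).argmax(dim=-1)
```

In this kind of scheme, easy inputs exit at shallow layers and hard inputs fall through to deeper ones, and the threshold tunes the speed-up versus accuracy balance; that is the performance-efficiency trade-off the abstract quantifies (e.g., 1.87× speed-up at 99.0% performance on CIFAR-100).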

Original language: English
Journal: IEEE Transactions on Multimedia
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • dynamic inference
  • early exiting
  • edge intelligence
  • transformer
