Abstract
Recently, the efficient deployment and acceleration of transformer-based pre-trained models (TPMs) on resource-constrained edge devices for multimedia services have gained significant interest. Although early exiting is a feasible solution, it may incur extra computational cost and substantial performance degradation compared to the original models. To tackle these issues, we propose a framework termed EEformer, which incorporates global-local heads (GLHs) into intermediate layers to construct an early exiting dynamic neural network (EDNN). The GLH efficiently extracts global and local information from the hidden states produced by the backbone layer, thereby achieving a better performance-efficiency trade-off for the EDNN. Moreover, we propose a novel progressive fine-tuning strategy that steadily improves the efficiency of the EDNN over three fine-tuning stages while keeping its performance comparable to the original model. We conduct extensive experiments on image classification and natural language processing tasks, demonstrating the superiority of the proposed framework. In particular, the proposed framework achieves 1.87× speed-up while maintaining 99.0% performance on the CIFAR-100 dataset, and 3.05× speed-up while maintaining 98.5% performance on the SST-2 dataset.
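To make the early-exiting idea concrete, the following is a minimal, generic sketch of confidence-based early exiting: intermediate exit heads attached after each backbone layer produce class probabilities, and inference stops once the prediction is confident enough. The threshold rule, the `layers`/`heads` structure, and all names here are illustrative assumptions; they do not reproduce the paper's GLH architecture or its progressive fine-tuning strategy.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(layers, heads, x, threshold=0.9):
    """Run backbone layers sequentially; after each layer, an exit
    head (hypothetical stand-in for a GLH) produces class logits.
    Return as soon as the top probability exceeds `threshold`,
    skipping the remaining layers; otherwise fall through to the
    final head. Returns (probabilities, index of the exit layer).
    """
    hidden = x
    probs = None
    for i, (layer, head) in enumerate(zip(layers, heads)):
        hidden = layer(hidden)          # one backbone layer
        probs = softmax(head(hidden))   # intermediate prediction
        if max(probs) >= threshold:
            return probs, i             # confident: exit early
    return probs, len(layers) - 1       # not confident: use last exit
```

Deeper layers are only executed for inputs whose intermediate predictions remain uncertain, which is the source of the speed-up reported in the abstract; the quality of the exit heads determines how much accuracy is retained.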
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Multimedia |
| DOIs | |
| Publication status | Accepted/In press - 2025 |
| Externally published | Yes |
Keywords
- dynamic inference
- early exiting
- edge intelligence
- transformer