Minimal Distillation Schedule for Extreme Language Model Compression

Chen Zhang, Yang Yang, Qifan Wang, Jiahao Liu, Jingang Wang, Wei Wu, Dawei Song*

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

1 citation (Scopus)

Abstract

Recent studies have revealed that language model distillation becomes less effective when there is a significant capacity gap between the teacher and the student model. To bridge this gap, teacher assistant-based distillation has been introduced, in which the selection of the teacher assistant plays a crucial role in transferring knowledge from the teacher to the student. However, existing approaches for teacher assistant-based distillation require numerous trials to find the optimal teacher assistant. In this paper, we propose a novel approach called Minimal Distillation Schedule (MINIDISC), which schedules an optimal teacher assistant in just one trial for extreme model compression (e.g., to 5% scale). In particular, we empirically show that the performance of the student is positively correlated with the scale-performance tradeoff of the teacher assistant. We then introduce a new λ-tradeoff metric that quantifies the optimality of the teacher assistant without the need for trial distillation to the student. By employing a sandwich framework, MINIDISC can select the optimal teacher assistant with the best λ-tradeoff. We extensively evaluate MINIDISC through a series of experiments on the GLUE benchmark. The results demonstrate that our approach achieves improved efficiency compared to various state-of-the-art baselines. Furthermore, we showcase the scalability of MINIDISC by applying it to a language model with billions of parameters.

Original language: English
Host publication: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024
Editors: Yvette Graham, Matthew Purver
Publisher: Association for Computational Linguistics (ACL)
Pages: 1378-1394
Number of pages: 17
ISBN (electronic): 9798891760936
Publication status: Published - 2024
Event: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024 - St. Julian's, Malta
Duration: 17 Mar 2024 – 22 Mar 2024

