Sparse Teachers Can Be Dense with Knowledge

Yi Yang; Chen Zhang; Dawei Song

Yi Yang, Chen Zhang, Dawei Song*

*Corresponding author for this work

Beijing Institute of Technology

Research output: Contribution to conference › Paper › Peer-reviewed

4 citations (Scopus)

Abstract

Recent advances in distilling pretrained language models have discovered that, besides the expressiveness of knowledge, the student-friendliness should be taken into consideration to realize a truly knowledgeable teacher. Based on a pilot study, we find that over-parameterized teachers can produce expressive yet student-unfriendly knowledge and are thus limited in overall knowledgeableness. To remove the parameters that result in student-unfriendliness, we propose a sparse teacher trick under the guidance of an overall knowledgeable score for each teacher parameter. The knowledgeable score is essentially an interpolation of the expressiveness and student-friendliness scores. The aim is to ensure that the expressive parameters are retained while the student-unfriendly ones are removed. Extensive experiments on the GLUE benchmark show that the proposed sparse teachers can be dense with knowledge and lead to students with compelling performance in comparison with a series of competitive baselines.
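The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that the knowledgeable score interpolates the expressiveness and student-friendliness scores, so the linear form, the weight `alpha`, the `sparsity` ratio, and both function names are assumptions made for illustration, and the per-parameter scores are made-up numbers.

```python
from typing import List


def knowledgeable_scores(expressiveness: List[float],
                         friendliness: List[float],
                         alpha: float) -> List[float]:
    """Linear interpolation of two per-parameter scores.

    Hypothetical form: `alpha` is an assumed weight in [0, 1];
    the record does not give the exact formula.
    """
    if len(expressiveness) != len(friendliness):
        raise ValueError("score lists must have the same length")
    return [(1 - alpha) * e + alpha * f
            for e, f in zip(expressiveness, friendliness)]


def sparsify_mask(scores: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that drops the lowest-scoring fraction of parameters.

    `sparsity` is an assumed hyperparameter: the fraction of parameters to remove.
    """
    n_prune = int(len(scores) * sparsity)
    # Indices sorted from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(scores))]


# Made-up scores for four teacher parameters (illustrative only).
expressiveness = [0.9, 0.8, 0.2, 0.7]
friendliness = [0.1, 0.9, 0.3, 0.8]

scores = knowledgeable_scores(expressiveness, friendliness, alpha=0.5)
mask = sparsify_mask(scores, sparsity=0.5)
print(mask)
```

In this toy example the first parameter is expressive but scores low on student-friendliness, so its interpolated score falls into the pruned half, which mirrors the abstract's aim of removing student-unfriendly parameters while retaining expressive ones.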

Original language	English
Pages	3904-3915
Number of pages	12
Publication status	Published - 2022
Event	2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates. Duration: 7 Dec 2022 → 11 Dec 2022

Conference

Conference	2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory	United Arab Emirates
City	Abu Dhabi
Period	7/12/22 → 11/12/22

Other files and links

Link to publication in Scopus

Cite this

Yang, Y., Zhang, C., & Song, D. (2022). Sparse Teachers Can Be Dense with Knowledge. 3904-3915. Paper presented at 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates.

@conference{ef829b610a93470f959ef82b62face85,

title = "Sparse Teachers Can Be Dense with Knowledge",

abstract = "Recent advances in distilling pretrained language models have discovered that, besides the expressiveness of knowledge, the student-friendliness should be taken into consideration to realize a truly knowledgeable teacher. Based on a pilot study, we find that over-parameterized teachers can produce expressive yet student-unfriendly knowledge and are thus limited in overall knowledgeableness. To remove the parameters that result in student-unfriendliness, we propose a sparse teacher trick under the guidance of an overall knowledgeable score for each teacher parameter. The knowledgeable score is essentially an interpolation of the expressiveness and student-friendliness scores. The aim is to ensure that the expressive parameters are retained while the student-unfriendly ones are removed. Extensive experiments on the GLUE benchmark show that the proposed sparse teachers can be dense with knowledge and lead to students with compelling performance in comparison with a series of competitive baselines.",

author = "Yi Yang and Chen Zhang and Dawei Song",

note = "Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 ; Conference date: 07-12-2022 Through 11-12-2022",

year = "2022",

language = "English",

pages = "3904--3915",

}

TY - CONF

T1 - Sparse Teachers Can Be Dense with Knowledge

AU - Yang, Yi

AU - Zhang, Chen

AU - Song, Dawei

PY - 2022

Y1 - 2022

N2 - Recent advances in distilling pretrained language models have discovered that, besides the expressiveness of knowledge, the student-friendliness should be taken into consideration to realize a truly knowledgeable teacher. Based on a pilot study, we find that over-parameterized teachers can produce expressive yet student-unfriendly knowledge and are thus limited in overall knowledgeableness. To remove the parameters that result in student-unfriendliness, we propose a sparse teacher trick under the guidance of an overall knowledgeable score for each teacher parameter. The knowledgeable score is essentially an interpolation of the expressiveness and student-friendliness scores. The aim is to ensure that the expressive parameters are retained while the student-unfriendly ones are removed. Extensive experiments on the GLUE benchmark show that the proposed sparse teachers can be dense with knowledge and lead to students with compelling performance in comparison with a series of competitive baselines.

AB - Recent advances in distilling pretrained language models have discovered that, besides the expressiveness of knowledge, the student-friendliness should be taken into consideration to realize a truly knowledgeable teacher. Based on a pilot study, we find that over-parameterized teachers can produce expressive yet student-unfriendly knowledge and are thus limited in overall knowledgeableness. To remove the parameters that result in student-unfriendliness, we propose a sparse teacher trick under the guidance of an overall knowledgeable score for each teacher parameter. The knowledgeable score is essentially an interpolation of the expressiveness and student-friendliness scores. The aim is to ensure that the expressive parameters are retained while the student-unfriendly ones are removed. Extensive experiments on the GLUE benchmark show that the proposed sparse teachers can be dense with knowledge and lead to students with compelling performance in comparison with a series of competitive baselines.

UR - http://www.scopus.com/inward/record.url?scp=85149439930&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85149439930

SP - 3904

EP - 3915

T2 - 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

Y2 - 7 December 2022 through 11 December 2022

ER -
