TY - GEN
T1 - Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets
T2 - 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022
AU - Yang, Yi
AU - Zhang, Chen
AU - Wang, Benyou
AU - Song, Dawei
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Over-parameterized pre-trained language models (LMs) have shown an appealing expressive power due to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner while others behave in a domain-general one. Motivated by this phenomenon, we posit for the first time that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). To intervene in the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with the variance. Comprehensive experiments are conducted on the Amazon, MNLI, and OntoNotes datasets. The results show that doge tickets obtain improved out-of-domain generalization compared with a range of competitive baselines. Analysis results further hint at the existence of domain-general parameters and the performance consistency of doge tickets.
AB - Over-parameterized pre-trained language models (LMs) have shown an appealing expressive power due to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner while others behave in a domain-general one. Motivated by this phenomenon, we posit for the first time that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). To intervene in the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with the variance. Comprehensive experiments are conducted on the Amazon, MNLI, and OntoNotes datasets. The results show that doge tickets obtain improved out-of-domain generalization compared with a range of competitive baselines. Analysis results further hint at the existence of domain-general parameters and the performance consistency of doge tickets.
KW - Domain generalization
KW - Lottery tickets hypothesis
KW - Pre-trained language model
UR - http://www.scopus.com/inward/record.url?scp=85140484409&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-17120-8_12
DO - 10.1007/978-3-031-17120-8_12
M3 - Conference contribution
AN - SCOPUS:85140484409
SN - 9783031171192
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 144
EP - 156
BT - Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings
A2 - Lu, Wei
A2 - Huang, Shujian
A2 - Hong, Yu
A2 - Zhou, Xiabing
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 24 September 2022 through 25 September 2022
ER -