Doge Tickets: Uncovering Domain-General Language Models by Playing Lottery Tickets

Yi Yang; Chen Zhang; Benyou Wang; Dawei Song

doi:10.1007/978-3-031-17120-8_12

Corresponding author: Dawei Song

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Over-parameterized pre-trained language models (LMs) have shown appealing expressive power due to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner while others behave in a domain-general one. Motivated by this phenomenon, we posit for the first time that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). To intervene in the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with the variance. Comprehensive experiments are conducted on the Amazon, MNLI, and OntoNotes datasets. The results show that doge tickets obtain improved out-of-domain generalization in comparison with a range of competitive baselines. Analysis results further hint at the existence of domain-general parameters and the performance consistency of doge tickets.

Original language	English
Title of host publication	Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings
Editors	Wei Lu, Shujian Huang, Yu Hong, Xiabing Zhou
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	144-156
Number of pages	13
ISBN (Print)	9783031171192
DOIs	https://doi.org/10.1007/978-3-031-17120-8_12
Publication status	Published - 2022
Event	11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022 - Guilin, China; Duration: 24 Sept 2022 → 25 Sept 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13551 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022
Country/Territory	China
City	Guilin
Period	24/09/22 → 25/09/22

Keywords

Domain generalization
Lottery tickets hypothesis
Pre-trained language model

Cite this

Yang, Y., Zhang, C., Wang, B., & Song, D. (2022). Doge Tickets: Uncovering Domain-General Language Models by Playing Lottery Tickets. In W. Lu, S. Huang, Y. Hong, & X. Zhou (Eds.), Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings (pp. 144-156). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13551 LNAI). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-17120-8_12

Yang, Yi ; Zhang, Chen ; Wang, Benyou et al. / Doge Tickets : Uncovering Domain-General Language Models by Playing Lottery Tickets. Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings. editor / Wei Lu ; Shujian Huang ; Yu Hong ; Xiabing Zhou. Springer Science and Business Media Deutschland GmbH, 2022. pp. 144-156 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{738ceef76f254dc7a34b6afd30ed2dd8,

title = "Doge Tickets: Uncovering Domain-General Language Models by Playing Lottery Tickets",

abstract = "Over-parameterized pre-trained language models (LMs) have shown appealing expressive power due to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner while others behave in a domain-general one. Motivated by this phenomenon, we posit for the first time that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). To intervene in the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with the variance. Comprehensive experiments are conducted on the Amazon, MNLI, and OntoNotes datasets. The results show that doge tickets obtain improved out-of-domain generalization in comparison with a range of competitive baselines. Analysis results further hint at the existence of domain-general parameters and the performance consistency of doge tickets.",

keywords = "Domain generalization, Lottery tickets hypothesis, Pre-trained language model",

author = "Yi Yang and Chen Zhang and Benyou Wang and Dawei Song",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022 ; Conference date: 24-09-2022 Through 25-09-2022",

year = "2022",

doi = "10.1007/978-3-031-17120-8_12",

language = "English",

isbn = "9783031171192",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "144--156",

editor = "Wei Lu and Shujian Huang and Yu Hong and Xiabing Zhou",

booktitle = "Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings",

address = "Germany",

}

Yang, Y, Zhang, C, Wang, B & Song, D 2022, Doge Tickets: Uncovering Domain-General Language Models by Playing Lottery Tickets. in W Lu, S Huang, Y Hong & X Zhou (eds), Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13551 LNAI, Springer Science and Business Media Deutschland GmbH, pp. 144-156, 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022, Guilin, China, 24/09/22. https://doi.org/10.1007/978-3-031-17120-8_12

Doge Tickets: Uncovering Domain-General Language Models by Playing Lottery Tickets. / Yang, Yi; Zhang, Chen; Wang, Benyou et al.
Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings. ed. / Wei Lu; Shujian Huang; Yu Hong; Xiabing Zhou. Springer Science and Business Media Deutschland GmbH, 2022. p. 144-156 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13551 LNAI).

TY - GEN

T1 - Doge Tickets

T2 - 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022

AU - Yang, Yi

AU - Zhang, Chen

AU - Wang, Benyou

AU - Song, Dawei

PY - 2022

Y1 - 2022

AB - Over-parameterized pre-trained language models (LMs) have shown appealing expressive power due to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner while others behave in a domain-general one. Motivated by this phenomenon, we posit for the first time that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). To intervene in the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with the variance. Comprehensive experiments are conducted on the Amazon, MNLI, and OntoNotes datasets. The results show that doge tickets obtain improved out-of-domain generalization in comparison with a range of competitive baselines. Analysis results further hint at the existence of domain-general parameters and the performance consistency of doge tickets.

KW - Domain generalization

KW - Lottery tickets hypothesis

KW - Pre-trained language model

UR - http://www.scopus.com/inward/record.url?scp=85140484409&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-17120-8_12

DO - 10.1007/978-3-031-17120-8_12

M3 - Conference contribution

AN - SCOPUS:85140484409

SN - 9783031171192

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 144

EP - 156

BT - Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings

A2 - Lu, Wei

A2 - Huang, Shujian

A2 - Hong, Yu

A2 - Zhou, Xiabing

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 24 September 2022 through 25 September 2022

ER -

Yang Y, Zhang C, Wang B, Song D. Doge Tickets: Uncovering Domain-General Language Models by Playing Lottery Tickets. In Lu W, Huang S, Hong Y, Zhou X, editors, Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 144-156. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-17120-8_12