TY - GEN
T1 - Improving Dialogue Summarization with Mixup Label Smoothing
AU - Cheng, Saihua
AU - Song, Dandan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2023.
PY - 2023
Y1 - 2023
N2 - Abstractive dialogue summarization models trained with Maximum Likelihood Estimation suffer from overconfidence because the training objective encourages the model to assign all probability mass to the hard target. Although Label Smoothing is widely adopted to prevent models from becoming overconfident, it assumes a pre-defined uniform distribution that is neither adaptive nor an ideal soft target. Therefore, we propose a Mixup Label Smoothing method in this paper, which exploits the general knowledge encoded in a pretrained language model to construct a flexible soft target that presents diverse candidates. We conceptualize the hypothesis distribution obtained from a pretrained language model as a context-smoothing target, which encodes rich knowledge from the massive pretraining corpus and implies more possible candidate summaries. Extensive experiments on three popular dialogue summarization datasets demonstrate that our method outperforms various strong baselines, including in low-resource settings.
AB - Abstractive dialogue summarization models trained with Maximum Likelihood Estimation suffer from overconfidence because the training objective encourages the model to assign all probability mass to the hard target. Although Label Smoothing is widely adopted to prevent models from becoming overconfident, it assumes a pre-defined uniform distribution that is neither adaptive nor an ideal soft target. Therefore, we propose a Mixup Label Smoothing method in this paper, which exploits the general knowledge encoded in a pretrained language model to construct a flexible soft target that presents diverse candidates. We conceptualize the hypothesis distribution obtained from a pretrained language model as a context-smoothing target, which encodes rich knowledge from the massive pretraining corpus and implies more possible candidate summaries. Extensive experiments on three popular dialogue summarization datasets demonstrate that our method outperforms various strong baselines, including in low-resource settings.
KW - Dialogue summarization
KW - Label smoothing
KW - Pretrained language model
UR - http://www.scopus.com/inward/record.url?scp=85174707861&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-6187-0_46
DO - 10.1007/978-981-99-6187-0_46
M3 - Conference contribution
AN - SCOPUS:85174707861
SN - 9789819961863
T3 - Lecture Notes in Electrical Engineering
SP - 460
EP - 475
BT - Proceedings of 2023 Chinese Intelligent Automation Conference
A2 - Deng, Zhidong
PB - Springer Science and Business Media Deutschland GmbH
T2 - Chinese Intelligent Automation Conference, CIAC 2023
Y2 - 2 October 2023 through 5 October 2023
ER -