TY - JOUR
T1 - Domain Adaptation and Summary Distillation for Unsupervised Query Focused Summarization
AU - Du, Jiancheng
AU - Gao, Yang
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2024/3/1
Y1 - 2024/3/1
N2 - Text summarization is the task of reducing a document's length while preserving its essential information. In the age of information explosion, obtaining the content users need from a large volume of information becomes particularly important. Under such circumstances, query-focused abstractive summarization (QFS) stands out because it addresses user needs while delivering fluent, concise, paraphrased summaries. However, unlike generic summarization, which has achieved remarkable progress driven by abundant parallel data, QFS struggles with a shortage of parallel corpora. In this paper, we therefore leverage a large generic summarization dataset to meet the pressing demand for unsupervised QFS. The large-scale query-free benchmark is automatically transformed into a query-focused dataset (Query-CNNDM) while preserving its informative summaries. We propose a simple yet effective unsupervised method, the Domain Adaptation and Summary Distillation (DASD) method. To achieve domain adaptation for unsupervised QFS, we design a query-aware gap sentence generation (q-GSG) strategy that equips the model with the ability to learn target textual knowledge and obtain a good initialization in the target domain. As instance-specific regularization, we train a teacher model on Query-CNNDM to generate pseudo-labels for summary distillation. Experimental results show that our DASD model achieves state-of-the-art performance on two benchmark datasets, Debatepedia and Wikiref, in a zero-shot setting and generalizes well to abstractive few-shot QFS.
AB - Text summarization is the task of reducing a document's length while preserving its essential information. In the age of information explosion, obtaining the content users need from a large volume of information becomes particularly important. Under such circumstances, query-focused abstractive summarization (QFS) stands out because it addresses user needs while delivering fluent, concise, paraphrased summaries. However, unlike generic summarization, which has achieved remarkable progress driven by abundant parallel data, QFS struggles with a shortage of parallel corpora. In this paper, we therefore leverage a large generic summarization dataset to meet the pressing demand for unsupervised QFS. The large-scale query-free benchmark is automatically transformed into a query-focused dataset (Query-CNNDM) while preserving its informative summaries. We propose a simple yet effective unsupervised method, the Domain Adaptation and Summary Distillation (DASD) method. To achieve domain adaptation for unsupervised QFS, we design a query-aware gap sentence generation (q-GSG) strategy that equips the model with the ability to learn target textual knowledge and obtain a good initialization in the target domain. As instance-specific regularization, we train a teacher model on Query-CNNDM to generate pseudo-labels for summary distillation. Experimental results show that our DASD model achieves state-of-the-art performance on two benchmark datasets, Debatepedia and Wikiref, in a zero-shot setting and generalizes well to abstractive few-shot QFS.
KW - Abstractive summarization
KW - domain adaptation
KW - query-focused summarization
KW - summary distillation
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85165297921&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2023.3296441
DO - 10.1109/TKDE.2023.3296441
M3 - Article
AN - SCOPUS:85165297921
SN - 1041-4347
VL - 36
SP - 1044
EP - 1055
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 3
ER -