Extract Then Adjust: A Two-Stage Approach for Automatic Term Extraction

Jiangyu Wang; Chong Feng; Fang Liu; Xinyan Li; Xiaomei Wang

doi:10.1007/978-3-031-44696-2_19

Jiangyu Wang, Chong Feng^*, Fang Liu, Xinyan Li, Xiaomei Wang

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

1 Citation (Scopus)

Abstract

Automatic Term Extraction (ATE) is a fundamental natural language processing task that extracts relevant terms from domain-specific texts. Existing transformer-based approaches have achieved impressive improvements. However, we observe that even state-of-the-art (SOTA) extractors suffer from boundary errors, which are characterized by incorrect start or end positions of a candidate term. These minor differences between candidate terms and the ground truth lead to a noticeable performance decline. To alleviate boundary errors, we propose a two-stage extraction approach. First, we design a span-based extractor to provide high-quality candidate terms. Subsequently, we adjust the boundaries of these candidate terms to enhance performance. Experimental results show that our approach effectively identifies and corrects boundary errors in candidate terms, thereby exceeding the performance of previous state-of-the-art models.
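To make the notion of a boundary error concrete, the following is a minimal sketch and not the authors' implementation. It assumes terms are represented as token-index spans `(start, end)` with both ends inclusive; the names `overlaps` and `classify_candidate` and the three labels are hypothetical, introduced here only for illustration. A candidate is treated as a boundary error when it overlaps a ground-truth term but its start or end position differs.

```python
from typing import List, Tuple

# (start, end) token indices, both inclusive (an assumed convention)
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two inclusive spans share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_candidate(candidate: Span, gold_spans: List[Span]) -> str:
    """Label a candidate term as 'exact', 'boundary_error', or 'spurious'."""
    if candidate in gold_spans:
        return "exact"
    # Overlaps a gold term, but the start or end position is wrong.
    if any(overlaps(candidate, gold) for gold in gold_spans):
        return "boundary_error"
    return "spurious"


# Hypothetical sentence: ["the", "automatic", "term", "extraction", "task"]
gold = [(1, 3)]  # "automatic term extraction"
print(classify_candidate((1, 3), gold))  # exact
print(classify_candidate((2, 3), gold))  # boundary_error (wrong start)
print(classify_candidate((1, 4), gold))  # boundary_error (wrong end)
print(classify_candidate((0, 0), gold))  # spurious
```

Under exact-match evaluation, the two `boundary_error` candidates above would be scored as incorrect even though they differ from the ground truth by a single token, which is the kind of case a boundary-adjustment stage is meant to address.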

Original language	English
Title of host publication	Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings
Editors	Fei Liu, Nan Duan, Qingting Xu, Yu Hong
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	236-247
Number of pages	12
ISBN (Print)	9783031446955
DOI	https://doi.org/10.1007/978-3-031-44696-2_19
Publication status	Published - 2023
Event	12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023 - Foshan, China. Duration: 12 Oct 2023 → 15 Oct 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14303 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023
Country/Territory	China
City	Foshan
Period	12/10/23 → 15/10/23

Cite this

Wang, J., Feng, C., Liu, F., Li, X., & Wang, X. (2023). Extract Then Adjust: A Two-Stage Approach for Automatic Term Extraction. In F. Liu, N. Duan, Q. Xu, & Y. Hong (Eds.), Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings (pp. 236-247). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14303 LNAI). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-44696-2_19

Wang, Jiangyu ; Feng, Chong ; Liu, Fang et al. / Extract Then Adjust : A Two-Stage Approach for Automatic Term Extraction. Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings. editor / Fei Liu ; Nan Duan ; Qingting Xu ; Yu Hong. Springer Science and Business Media Deutschland GmbH, 2023. pp. 236-247 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{c8b15d69ef1c467a8eb68f1033718ea0,

title = "Extract Then Adjust: A Two-Stage Approach for Automatic Term Extraction",

abstract = "Automatic Term Extraction (ATE) is a fundamental natural language processing task that extracts relevant terms from domain-specific texts. Existing transformer-based approaches have achieved impressive improvements. However, we observe that even state-of-the-art (SOTA) extractors suffer from boundary errors, which are characterized by incorrect start or end positions of a candidate term. These minor differences between candidate terms and the ground truth lead to a noticeable performance decline. To alleviate boundary errors, we propose a two-stage extraction approach. First, we design a span-based extractor to provide high-quality candidate terms. Subsequently, we adjust the boundaries of these candidate terms to enhance performance. Experimental results show that our approach effectively identifies and corrects boundary errors in candidate terms, thereby exceeding the performance of previous state-of-the-art models.",

keywords = "automatic term extraction, boundary adjust, span extraction",

author = "Jiangyu Wang and Chong Feng and Fang Liu and Xinyan Li and Xiaomei Wang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.; 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023 ; Conference date: 12-10-2023 Through 15-10-2023",

year = "2023",

doi = "10.1007/978-3-031-44696-2_19",

language = "English",

isbn = "9783031446955",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "236--247",

editor = "Fei Liu and Nan Duan and Qingting Xu and Yu Hong",

booktitle = "Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings",

address = "Germany",

}

Wang, J, Feng, C, Liu, F, Li, X & Wang, X 2023, Extract Then Adjust: A Two-Stage Approach for Automatic Term Extraction. in F Liu, N Duan, Q Xu & Y Hong (eds), Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14303 LNAI, Springer Science and Business Media Deutschland GmbH, pp. 236-247, 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023, Foshan, China, 12/10/23. https://doi.org/10.1007/978-3-031-44696-2_19

Extract Then Adjust: A Two-Stage Approach for Automatic Term Extraction. / Wang, Jiangyu; Feng, Chong; Liu, Fang et al.
Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings. ed. / Fei Liu; Nan Duan; Qingting Xu; Yu Hong. Springer Science and Business Media Deutschland GmbH, 2023. p. 236-247 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14303 LNAI).

TY - GEN

T1 - Extract Then Adjust

T2 - 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023

AU - Wang, Jiangyu

AU - Feng, Chong

AU - Liu, Fang

AU - Li, Xinyan

AU - Wang, Xiaomei

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

PY - 2023

Y1 - 2023

N2 - Automatic Term Extraction (ATE) is a fundamental natural language processing task that extracts relevant terms from domain-specific texts. Existing transformer-based approaches have achieved impressive improvements. However, we observe that even state-of-the-art (SOTA) extractors suffer from boundary errors, which are characterized by incorrect start or end positions of a candidate term. These minor differences between candidate terms and the ground truth lead to a noticeable performance decline. To alleviate boundary errors, we propose a two-stage extraction approach. First, we design a span-based extractor to provide high-quality candidate terms. Subsequently, we adjust the boundaries of these candidate terms to enhance performance. Experimental results show that our approach effectively identifies and corrects boundary errors in candidate terms, thereby exceeding the performance of previous state-of-the-art models.

KW - automatic term extraction

KW - boundary adjust

KW - span extraction

UR - http://www.scopus.com/inward/record.url?scp=85174679768&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-44696-2_19

DO - 10.1007/978-3-031-44696-2_19

M3 - Conference contribution

AN - SCOPUS:85174679768

SN - 9783031446955

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 236

EP - 247

BT - Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings

A2 - Liu, Fei

A2 - Duan, Nan

A2 - Xu, Qingting

A2 - Hong, Yu

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 12 October 2023 through 15 October 2023

ER -

Wang J, Feng C, Liu F, Li X, Wang X. Extract Then Adjust: A Two-Stage Approach for Automatic Term Extraction. In Liu F, Duan N, Xu Q, Hong Y, editors, Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings. Springer Science and Business Media Deutschland GmbH. 2023. p. 236-247. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-44696-2_19
