SynBERT: Chinese Synonym Discovery on Privacy-Constrain Medical Terms with Pre-trained BERT

Lingze Zeng, Chang Yao*, Meihui Zhang, Zhongle Xie

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Discovering medical synonym sets (i.e., sets of terms referring to a similar medical concept) is an important real-world task that can benefit many downstream applications, such as medical information retrieval systems and clinical decision support systems. Recent synonym discovery methods take words as the input unit and leverage raw text as contextual information. However, they are ill-suited to Chinese word segmentation, because taking words as the input unit leads to serious Out-of-Vocabulary (OOV) problems. Additionally, large-scale raw clinical text is hard to obtain in the medical domain because of privacy and security concerns. Therefore, we define a new task, discovering Chinese synonyms from Privacy-Constrain terms (i.e., only terms, without a raw corpus), and propose a framework, SynBERT, to solve it. SynBERT consists of a binary classifier, which infers whether two term sets can form a synonym set, and a two-phase clustering algorithm, which applies the classifier to cluster the given terms into different synonym sets. In particular, SynBERT composes each term's embedding from its characters' embeddings to address the OOV problem. SynBERT uses a BERT model pre-trained on a large public corpus to eliminate the need for raw contextual information. According to our experiments, SynBERT outperforms baseline methods such as K-means, L2C, and SynSetMine.
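The two ideas in the abstract (character-composed term embeddings and a set-pair classifier that drives clustering) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: `char_vector` is a deterministic stand-in for a pre-trained character embedding, mean pooling is one possible way to compose a term vector, `same_set` is a hypothetical cosine-threshold scorer in place of the trained binary classifier, and `cluster` is a single greedy pass rather than the two-phase algorithm, whose details the abstract does not give.

```python
import math
import random

DIM = 8  # toy embedding width (assumption); a real BERT model is far wider


def char_vector(ch):
    """Deterministic stand-in for a pre-trained character embedding."""
    rng = random.Random(ord(ch))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def term_vector(term):
    """Compose a term embedding by averaging its character embeddings.

    Working at character level means an unseen term never falls
    out of vocabulary as long as its characters are known.
    """
    vecs = [char_vector(ch) for ch in term]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(terms):
    vecs = [term_vector(t) for t in terms]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def same_set(set_a, set_b, threshold=0.8):
    """Hypothetical set-pair classifier: do two term sets belong together?"""
    return cosine(centroid(set_a), centroid(set_b)) >= threshold


def cluster(terms, threshold=0.8):
    """Greedy pass: put each term in the first cluster the classifier accepts."""
    clusters = []
    for term in terms:
        for members in clusters:
            if same_set(members, [term], threshold):
                members.append(term)
                break
        else:
            clusters.append([term])  # no cluster accepted the term
    return clusters
```

Calling `cluster(["高血压", "血压高", "糖尿病"])` returns a list of term lists. Because mean pooling ignores character order, the first two terms receive near-identical vectors in this toy setup; a contextual encoder would not have that property.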

Original language: English
Title of host publication: Web and Big Data - 6th International Joint Conference, APWeb-WAIM 2022, Proceedings
Editors: Bohan Li, Chuanqi Tao, Lin Yue, Xuming Han, Diego Calvanese, Toshiyuki Amagasa
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 331-344
Number of pages: 14
ISBN (Print): 9783031251573
DOI
Publication status: Published - 2023
Event: 6th International Joint Conference on Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM), APWeb-WAIM 2022 - Nanjing, China
Duration: 25 Nov 2022 to 27 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13421 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Joint Conference on Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM), APWeb-WAIM 2022
Country/Territory: China
City: Nanjing
Period: 25/11/22 to 27/11/22
