TY - GEN
T1 - Nested Named Entity Recognition in Chinese Electronic Medical Records
AU - Yang, Maolin
AU - Lu, Zeran
AU - Lin, Yucong
AU - Song, Hong
AU - Yang, Jian
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - Nested named entity recognition (NER) is crucial for processing Chinese electronic medical records (EMRs). Recently, a BERT-based model that combines a CNN with a multi-head Biaffine decoder has shown promising results for nested NER on news datasets. However, this model struggles with the complex and unevenly distributed entities in Chinese EMRs, which leads to prediction errors. This paper proposes MC-BERT-CGC, a model that builds on MC-BERT semantic features and combines Context-Gated Convolution with a multi-head Biaffine decoder. The model first incorporates Chinese medical language knowledge by using MC-BERT to represent medical descriptions as sentence vectors. It then applies Context-Gated Convolution to delimit the boundaries of nested entities more accurately by learning the overlapping relationships between entities. Finally, it uses Focal Loss to classify difficult-to-distinguish entities. Experiments on our Chinese EMR dataset and the CMeEE-V2 dataset show that the model outperforms existing baseline models on Chinese medical NER tasks. The study could have a significant impact on patients' lives: extracting more accurate and detailed medical information from EMRs can potentially lead to improved diagnoses, personalized treatment recommendations, and proactive identification of health risks. Our code is available at https://github.com/ymlmorning/MC-BERT-CGC.
KW - Chinese electronic medical records
KW - Context-Gated convolution
KW - Focal Loss
KW - MC-BERT
KW - Nested named entity recognition
UR - http://www.scopus.com/inward/record.url?scp=105005932156&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-90714-2_5
DO - 10.1007/978-3-031-90714-2_5
M3 - Conference contribution
AN - SCOPUS:105005932156
SN - 9783031907135
T3 - Lecture Notes in Computer Science
SP - 58
EP - 69
BT - Computational Intelligence Methods for Bioinformatics and Biostatistics - 18th International Meeting, CIBB 2023, Revised Selected Papers
A2 - Vettoretti, Martina
A2 - Tavazzi, Erica
A2 - Longato, Enrico
A2 - Baruzzo, Giacomo
A2 - Bellato, Massimo
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2023
Y2 - 6 September 2023 through 8 September 2023
ER -