BDC: Using BERT and Deep Clustering to Improve Chinese Proper Noun Recognition

Yuanchi Ma, Hui He, Zhendong Niu

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

Proper noun recognition is a sub-task of named entity recognition, yet few methods have been developed specifically for Chinese. One reason is that most existing deep clustering methods rely on manually labeled training sets, which makes the learning process slow. Moreover, the breadth and scale of specialized domains, together with the lack of explicit word boundaries in Chinese, make recognizing specialized terms in unstructured text challenging. In this paper, we design an unsupervised method to improve Chinese proper noun recognition. The method first performs Chinese word segmentation, then applies an improved BERT-based word representation method to obtain word vectors, and finally uses an autoencoder-based deep clustering method to extract proper nouns from books. Comparison experiments on a public dataset and on our selected professional book data show that our method improves both accuracy and F1 score.
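
The final clustering stage of such a pipeline might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the abstract does not specify the model details, so a diagonal-covariance Gaussian mixture fitted with EM is assumed here (GMM appears among the keywords), and the 2-D vectors are made-up stand-ins for BERT-derived word vectors after autoencoder compression. The function name `fit_diag_gmm` and the farthest-first initialization are likewise hypothetical choices.

```python
import math

def fit_diag_gmm(points, k, iters=50):
    """Fit a k-component diagonal-covariance GMM with EM; return hard labels."""
    dim = len(points[0])
    # Farthest-first initialization (an assumption, chosen for determinism).
    means = [list(points[0])]
    while len(means) < k:
        def min_sq_dist(p):
            return min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in means)
        means.append(list(max(points, key=min_sq_dist)))
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point,
        # computed in the log domain for numerical stability.
        resp = []
        for p in points:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for d in range(dim):
                    v = variances[j][d]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (p[d] - means[j][d]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(points)
            for d in range(dim):
                mu = sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                var = sum(r[j] * (p[d] - mu) ** 2
                          for r, p in zip(resp, points)) / nj
                means[j][d] = mu
                variances[j][d] = max(var, 1e-6)  # floor avoids collapse
    return [max(range(k), key=lambda j: r[j]) for r in resp]

# Made-up toy vectors (NOT real BERT outputs): three proper-noun-like words
# followed by three function words.
words = ["北京", "上海", "清华大学", "的", "是", "和"]
vectors = [(0.9, 1.1), (1.0, 0.9), (1.1, 1.0),
           (-1.0, -0.9), (-0.9, -1.1), (-1.1, -1.0)]
labels = fit_diag_gmm(vectors, k=2)
for word, label in zip(words, labels):
    print(word, label)
```

In a real setting the input vectors would come from the segmentation and BERT stages, and deciding which cluster corresponds to proper nouns would need an additional step that the abstract does not describe.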

Original language: English
Pages (from-to): 57-62
Number of pages: 6
Journal: Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE
Volume: 2023-July
Publication status: Published - 2023
Event: 35th International Conference on Software Engineering and Knowledge Engineering, SEKE 2023 - Hybrid, San Francisco, United States
Duration: 1 Jul 2023 to 10 Jul 2023

Keywords

  • BERT
  • Deep clustering
  • GMM
  • Proper noun recognition
