TY - JOUR
T1 - Chinese Named Entity Recognition with Character-Level BLSTM and Soft Attention Model
AU - Yin, Jize
AU - Luo, Senlin
AU - Wu, Zhouting
AU - Pan, Limin
N1 - Publisher Copyright:
© 2020 Editorial Department of Journal of Beijing Institute of Technology.
PY - 2020/3/1
Y1 - 2020/3/1
N2 - Unlike named entity recognition (NER) for English, the absence of explicit word boundaries reduces the accuracy of Chinese NER. To avoid the accumulated errors introduced by word segmentation, this paper proposes a new Chinese NER method built on a carefully designed deep model that extracts character-level features. The method converts raw text into a sequence of character vectors, extracts global text features with a bidirectional long short-term memory (BLSTM) network, and extracts local text features with a soft attention model. A linear-chain conditional random field then labels each character using both the global and local text features. Experiments designed and implemented on the Microsoft Research Asia (MSRA) dataset show that the proposed method performs well compared with other methods, indicating that the extracted global and local text features have a positive influence on Chinese NER. To broaden the test domains, a resume dataset from Sina Finance is also used to confirm the effectiveness of the proposed method.
AB - Unlike named entity recognition (NER) for English, the absence of explicit word boundaries reduces the accuracy of Chinese NER. To avoid the accumulated errors introduced by word segmentation, this paper proposes a new Chinese NER method built on a carefully designed deep model that extracts character-level features. The method converts raw text into a sequence of character vectors, extracts global text features with a bidirectional long short-term memory (BLSTM) network, and extracts local text features with a soft attention model. A linear-chain conditional random field then labels each character using both the global and local text features. Experiments designed and implemented on the Microsoft Research Asia (MSRA) dataset show that the proposed method performs well compared with other methods, indicating that the extracted global and local text features have a positive influence on Chinese NER. To broaden the test domains, a resume dataset from Sina Finance is also used to confirm the effectiveness of the proposed method.
KW - Bidirectional long short-term memory
KW - Character-level
KW - Chinese
KW - Named entity recognition (NER)
KW - Soft attention model
UR - http://www.scopus.com/inward/record.url?scp=85086824658&partnerID=8YFLogxK
U2 - 10.15918/j.jbit1004-0579.18161
DO - 10.15918/j.jbit1004-0579.18161
M3 - Article
AN - SCOPUS:85086824658
SN - 1004-0579
VL - 29
SP - 60
EP - 71
JO - Journal of Beijing Institute of Technology (English Edition)
JF - Journal of Beijing Institute of Technology (English Edition)
IS - 1
ER -