Chinese Named Entity Recognition with Character-Level BLSTM and Soft Attention Model

Jize Yin, Senlin Luo, Zhouting Wu, Limin Pan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Unlike English named entity recognition (NER), Chinese NER must cope with the absence of explicit word boundaries, which lowers final accuracy. To avoid the accumulated error introduced by word segmentation, this paper proposes a new Chinese NER method built on a carefully designed deep model that extracts character-level features. The method converts raw text into a sequence of character vectors, extracts global text features with a bidirectional long short-term memory (BLSTM) network, and extracts local text features with a soft attention model. A linear-chain conditional random field then labels every character using both the global and local text features. Experiments are designed and implemented on the Microsoft Research Asia (MSRA) dataset. The results show that the proposed method performs well compared with other methods, indicating that the extracted global and local text features benefit Chinese NER. To broaden the test domains, a resume dataset from Sina Finance is also used to confirm the effectiveness of the proposed method.
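The soft-attention step described above can be sketched with numpy. This is a minimal illustration, not the paper's implementation: the dimensions, the random stand-in for BLSTM outputs, and the single bilinear weight matrix `W` are all hypothetical, chosen only to show how per-character attention weights turn global hidden states into local context features.

```python
import numpy as np

np.random.seed(0)

T, d = 6, 8                 # hypothetical: 6 characters, hidden size 8
H = np.random.randn(T, d)   # stand-in for BLSTM outputs (global features)

# Soft attention: each character attends over all characters in the sentence.
W = np.random.randn(d, d)            # hypothetical trainable weight
scores = H @ W @ H.T                 # (T, T) alignment scores
scores -= scores.max(axis=1, keepdims=True)  # subtract row max for stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax

C = weights @ H  # (T, d) local context feature for each character

# Each character's attention distribution sums to 1.
assert np.allclose(weights.sum(axis=1), 1.0)
```

In the paper's pipeline, the global features `H` and local features `C` would then be fed jointly to the linear-chain CRF layer, which scores whole label sequences rather than labeling each character independently.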

Original language: English
Pages (from-to): 60-71
Number of pages: 12
Journal: Journal of Beijing Institute of Technology (English Edition)
Volume: 29
Issue number: 1
DOIs
Publication status: Published - 1 Mar 2020

Keywords

  • Bidirectional long short-term memory
  • Character-level
  • Chinese
  • Named entity recognition (NER)
  • Soft attention model
