VisPhone: Chinese named entity recognition model enhanced by visual and phonetic features

Baohua Zhang; Jiahao Cai; Huaping Zhang; Jianyun Shang

doi:10.1016/j.ipm.2023.103314

Baohua Zhang, Jiahao Cai, Huaping Zhang^*, Jianyun Shang

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Many Chinese NER models focus only on lexical and radical information, ignoring the fact that the pronunciation of Chinese entities also follows certain rules. In this paper, we propose VisPhone, which incorporates the phonetic features of Chinese characters into a Transformer encoder along with lattice and visual features. We present the common rules for the pronunciation of Chinese entities and explore the most appropriate method to encode them. VisPhone uses two identical cross transformer encoders to fuse the visual and phonetic features of the input characters with the text embedding. A selective fusion module then produces the final features. We conducted experiments on four well-known Chinese NER benchmark datasets, OntoNotes4.0, MSRA, Resume, and Weibo, obtaining F1 scores of 82.63%, 96.07%, 96.26%, and 70.79%, respectively, improvements of 0.79%, 0.32%, 0.39%, and 3.47%. Our ablation experiments also demonstrate the effectiveness of VisPhone.

Original language	English
Article number	103314
Journal	Information Processing and Management
Volume	60
Issue number	3
DOIs	https://doi.org/10.1016/j.ipm.2023.103314
Publication status	Published - May 2023

Keywords

Chinese NER
Cross transformer
Phonetic information
Selective fusion
Visual information

Cite this

@article{1a21c200ab2e4f88b5a1b02f34a2be5a,
  title = "VisPhone: Chinese named entity recognition model enhanced by visual and phonetic features",
  abstract = "Many Chinese NER models only focus on lexical and radical information, ignoring the fact that there are also certain rules for the pronunciation of Chinese entities. In this paper, we propose VisPhone, which incorporates Chinese characters{\textquoteright} Phonetic features into Transformer Encoder along with the Lattice and Visual features. We present the common rules for the pronunciation of Chinese entities and explore the most appropriate method to encode it. VisPhone uses two identical cross transformer encoders to fuse the visual and phonetic features of the input characters with the text embedding. A selective fusion module is used to get the final features. We conducted experiments on four well-known Chinese NER benchmark datasets: OntoNotes4.0, MSRA, Resume, and Weibo, with F1 scores of 82.63%, 96.07%, 96.26%, 70.79% respectively, improving the performance by 0.79%, 0.32%, 0.39%, and 3.47%. Our ablation experiments have also demonstrated the effectiveness of VisPhone.",
  keywords = "Chinese NER, Cross transformer, Phonetic information, Selective fusion, Visual information",
  author = "Baohua Zhang and Jiahao Cai and Huaping Zhang and Jianyun Shang",
  note = "Publisher Copyright: {\textcopyright} 2023 The Authors",
  year = "2023",
  month = may,
  doi = "10.1016/j.ipm.2023.103314",
  language = "English",
  volume = "60",
  journal = "Information Processing and Management",
  issn = "0306-4573",
  publisher = "Elsevier Ltd.",
  number = "3",
}

TY  - JOUR
T1  - VisPhone
T2  - Chinese named entity recognition model enhanced by visual and phonetic features
AU  - Zhang, Baohua
AU  - Cai, Jiahao
AU  - Zhang, Huaping
AU  - Shang, Jianyun
PY  - 2023/5
Y1  - 2023/5
N2  - Many Chinese NER models only focus on lexical and radical information, ignoring the fact that there are also certain rules for the pronunciation of Chinese entities. In this paper, we propose VisPhone, which incorporates Chinese characters’ Phonetic features into Transformer Encoder along with the Lattice and Visual features. We present the common rules for the pronunciation of Chinese entities and explore the most appropriate method to encode it. VisPhone uses two identical cross transformer encoders to fuse the visual and phonetic features of the input characters with the text embedding. A selective fusion module is used to get the final features. We conducted experiments on four well-known Chinese NER benchmark datasets: OntoNotes4.0, MSRA, Resume, and Weibo, with F1 scores of 82.63%, 96.07%, 96.26%, 70.79% respectively, improving the performance by 0.79%, 0.32%, 0.39%, and 3.47%. Our ablation experiments have also demonstrated the effectiveness of VisPhone.
AB  - Many Chinese NER models only focus on lexical and radical information, ignoring the fact that there are also certain rules for the pronunciation of Chinese entities. In this paper, we propose VisPhone, which incorporates Chinese characters’ Phonetic features into Transformer Encoder along with the Lattice and Visual features. We present the common rules for the pronunciation of Chinese entities and explore the most appropriate method to encode it. VisPhone uses two identical cross transformer encoders to fuse the visual and phonetic features of the input characters with the text embedding. A selective fusion module is used to get the final features. We conducted experiments on four well-known Chinese NER benchmark datasets: OntoNotes4.0, MSRA, Resume, and Weibo, with F1 scores of 82.63%, 96.07%, 96.26%, 70.79% respectively, improving the performance by 0.79%, 0.32%, 0.39%, and 3.47%. Our ablation experiments have also demonstrated the effectiveness of VisPhone.
KW  - Chinese NER
KW  - Cross transformer
KW  - Phonetic information
KW  - Selective fusion
KW  - Visual information
UR  - http://www.scopus.com/inward/record.url?scp=85148077878&partnerID=8YFLogxK
U2  - 10.1016/j.ipm.2023.103314
DO  - 10.1016/j.ipm.2023.103314
M3  - Article
AN  - SCOPUS:85148077878
SN  - 0306-4573
VL  - 60
JO  - Information Processing and Management
JF  - Information Processing and Management
IS  - 3
M1  - 103314
ER  -
