Abstract
Automatic recognition of Chinese personal names is both a focus and a difficulty in unknown word recognition. Previous solutions are unsatisfactory because of their inherent deficiencies. This paper presents an approach to Chinese personal name recognition based on role tagging. After segmentation, tokens are tagged using the Viterbi algorithm with different roles according to their functions in the formation of Chinese personal names. Candidate names are then recognized by maximum pattern matching on the role sequence. The recognition process requires only the probabilities of tokens taking specific roles and the transition probabilities between roles. Significantly, this lexical knowledge can be extracted from a corpus entirely automatically. In both closed and open tests on a 16-Mbyte realistic corpus, the recall rate is nearly 98%. After the personal name recognition algorithm is integrated, the authors' Chinese lexical analysis system ICTCLAS improves by 1.41% in performance, while the F-1 value, the combined evaluation measure, for person name recognition reaches 95.40%. Various experiments show that the role-based algorithm proposed in this paper is effective for Chinese personal name recognition.
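The two stages described in the abstract, Viterbi role tagging followed by maximum pattern matching on the role sequence, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the four-role set (`A` other, `B` surname, `C` first given-name character, `D` last given-name character), the patterns `BCD` and `BD`, the smoothing floor, and all probability values, which in the described approach would be estimated from a corpus rather than written by hand.

```python
import math

# Hypothetical, simplified role set (not the paper's actual inventory):
# A = other token, B = surname, C = first given-name char, D = last given-name char
ROLES = ["A", "B", "C", "D"]
FLOOR = 1e-6  # assumed smoothing floor for unseen events

# Toy probabilities, invented for illustration only.
START = {"A": 0.7, "B": 0.3}
TRANS = {
    "A": {"A": 0.8, "B": 0.2},
    "B": {"C": 0.6, "D": 0.4},
    "C": {"D": 1.0},
    "D": {"A": 0.9, "B": 0.1},
}
EMIT = {
    "A": {"记者": 0.3, "说": 0.3, "平": 0.01, "华": 0.01},
    "B": {"张": 0.5, "华": 0.05},
    "C": {"华": 0.4, "平": 0.1},
    "D": {"平": 0.4, "华": 0.2},
}

# Name patterns over roles, longest first so matching is "maximum".
PATTERNS = ["BCD", "BD"]


def logp(p):
    """Log-probability with a floor so unseen events stay finite."""
    return math.log(max(p, FLOOR))


def viterbi(tokens):
    """Return the most probable role sequence for the segmented tokens."""
    # Each column maps role -> (best log-score ending in that role, previous role).
    best = [{
        r: (logp(START.get(r, 0)) + logp(EMIT[r].get(tokens[0], 0)), None)
        for r in ROLES
    }]
    for tok in tokens[1:]:
        prev = best[-1]
        col = {}
        for r in ROLES:
            score, back = max(
                (prev[p][0] + logp(TRANS[p].get(r, 0)), p) for p in ROLES
            )
            col[r] = (score + logp(EMIT[r].get(tok, 0)), back)
        best.append(col)
    # Backtrack from the best final role.
    role = max(ROLES, key=lambda r: best[-1][r][0])
    path = [role]
    for col in reversed(best[1:]):
        role = col[role][1]
        path.append(role)
    return path[::-1]


def extract_names(tokens, roles):
    """Scan the role string left to right, taking the longest pattern at each position."""
    seq = "".join(roles)
    names, i = [], 0
    while i < len(seq):
        for pat in PATTERNS:
            if seq.startswith(pat, i):
                names.append("".join(tokens[i:i + len(pat)]))
                i += len(pat)
                break
        else:
            i += 1
    return names


tokens = ["记者", "张", "华", "平", "说"]
roles = viterbi(tokens)
print(roles, extract_names(tokens, roles))
```

The sketch uses only the two kinds of parameters the abstract names: the probability of a token taking a role (`EMIT`) and the transition probability between roles (`TRANS`). The start distribution and the floor value are additional simplifications of this example.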
| Original language | English |
|---|---|
| Pages (from-to) | 85-91 |
| Number of pages | 7 |
| Journal | Jisuanji Xuebao/Chinese Journal of Computers |
| Volume | 27 |
| Issue number | 1 |
| Publication status | Published - Jan 2004 |
| Externally published | Yes |
Keywords
- Chinese personal name recognition
- Role tagging
- Unknown words recognition
- Viterbi algorithm