Abstract
A multi-stage disambiguation algorithm based on the construction of a text feature space was proposed. Because query terms often occur as common words rather than as personal names, a heuristic rule was applied after document pre-processing to determine whether the query term is a personal name. Named entities and occupations were then extracted according to feature templates. The sentential semantic model was used to perform sentential semantic analysis and to extract sentential semantic features, and word frequencies were counted according to the bag-of-words model. From these, a three-layer feature space was constructed. Rule-based classification and a two-stage hierarchical clustering algorithm were used to perform name disambiguation, with the overlap coefficient introduced to compute the similarity of sentential semantic features. Experiments on the datasets built for the CLP2012 Chinese Personal Name Disambiguation task showed that the F-measure reached 88.79%, indicating that the proposed approach can improve the performance of cross-document personal name disambiguation.
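The abstract names the overlap coefficient for comparing sentential semantic features and a two-stage hierarchical clustering step, but does not give the full procedure. The Python sketch below is only an illustration under stated assumptions: each document is represented by a set of extracted semantic features (hypothetical key `sem`) and a bag-of-words `Counter` (hypothetical key `bow`); stage one merges clusters by average-linkage overlap coefficient on `sem`, and stage two merges the resulting clusters by average-linkage cosine similarity on `bow`. The thresholds `t1`/`t2`, average linkage, and cosine similarity in stage two are assumptions, not the authors' settings.

```python
from collections import Counter
import math


def overlap_coefficient(a: set, b: set) -> float:
    """Overlap (Szymkiewicz-Simpson) coefficient: |A & B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cosine(c1: Counter, c2: Counter) -> float:
    """Cosine similarity between two bag-of-words frequency vectors."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def merge_clusters(clusters, sim, threshold):
    """Agglomerative merging (average linkage): repeatedly merge the most
    similar pair of clusters until no pair reaches the threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(sim(x, y) for x in clusters[i] for y in clusters[j])
                s = total / (len(clusters[i]) * len(clusters[j]))
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair  # j > i, so popping j leaves index i valid
        clusters[i].extend(clusters.pop(j))
    return clusters


def two_stage_cluster(docs, t1=0.5, t2=0.3):
    """Hypothetical two-stage scheme: semantic features first, then bag-of-words."""
    singletons = [[i] for i in range(len(docs))]
    stage1 = merge_clusters(
        singletons,
        lambda x, y: overlap_coefficient(docs[x]["sem"], docs[y]["sem"]), t1)
    stage2 = merge_clusters(
        stage1,
        lambda x, y: cosine(docs[x]["bow"], docs[y]["bow"]), t2)
    return sorted(sorted(c) for c in stage2)


# Toy documents mentioning the same ambiguous name (illustrative data only).
docs = [
    {"sem": {"tennis player", "Wuhan", "Australian Open"},
     "bow": Counter("tennis open final champion".split())},
    {"sem": {"tennis player", "Australian Open"},
     "bow": Counter("tennis grand slam champion".split())},
    {"sem": {"singer", "CCTV"}, "bow": Counter("album concert song".split())},
    {"sem": {"singer"}, "bow": Counter("concert song tour".split())},
    {"sem": set(), "bow": Counter("song album concert".split())},
]

print(two_stage_cluster(docs))  # [[0, 1], [2, 3, 4]]
```

In this toy run, document 4 has no extracted semantic features, so stage one leaves it alone; stage two then attaches it to the "singer" cluster through shared vocabulary. This illustrates why a coarser second stage can recover documents that the feature-based first stage misses.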
Original language | English |
---|---|
Pages (from-to) | 717-723 and 775 |
Journal | Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science) |
Volume | 49 |
Issue number | 4 |
Publication status | Published - 1 Apr 2015 |
Keywords
- Natural language processing
- Personal name disambiguation
- Sentential semantic analysis
- Sentential semantic model