Cross-document personal name disambiguation merging sentential semantic analysis

Han Zhang, Sen Lin Luo*, Li Li Zou, Xiu Min Shi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

A multi-stage disambiguation algorithm based on the construction of a text feature space is proposed. Because query terms often also occur as common words, a heuristic rule is applied after document pre-processing to determine whether the query term is in fact a personal name. Named entities and occupations are then extracted using feature templates. A sentential semantic model is used to perform sentential semantic analysis and to extract sentential semantic features, and word frequencies are counted according to the bag-of-words model. From these, a three-layer feature space is constructed. Name disambiguation is carried out by rule-based classification followed by a two-stage hierarchical clustering algorithm, with the overlap coefficient used to compute the similarity of sentential semantic features. Experiments on datasets built from the CLP2012 Chinese Personal Name Disambiguation task showed an F-measure of 88.79%, indicating that the proposed approach can improve the performance of cross-document personal name disambiguation.
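As an illustration of the similarity measure named in the abstract, the following is a minimal sketch of the standard overlap coefficient, |A ∩ B| / min(|A|, |B|), applied to two sets of features. The function name, the example feature sets, and the treatment of empty sets are assumptions made for illustration; the abstract does not specify how the sentential semantic features are represented or how the paper handles edge cases.

```python
def overlap_coefficient(features_a, features_b):
    """Standard overlap coefficient: |A ∩ B| / min(|A|, |B|).

    Inputs are any iterables of hashable features; duplicates are ignored.
    """
    set_a, set_b = set(features_a), set(features_b)
    if not set_a or not set_b:
        # Assumption: an empty feature set is treated as sharing nothing.
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


# Hypothetical feature sets for two documents mentioning the same name.
doc_1 = {"professor", "university", "publish"}
doc_2 = {"professor", "university", "lecture", "award"}
print(overlap_coefficient(doc_1, doc_2))
```

Because the denominator is the size of the smaller set, a short document whose features are all contained in a longer one scores 1.0, which can make the measure more forgiving than Jaccard similarity when documents differ greatly in length.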

Original language: English
Pages (from-to): 717-723 and 775
Journal: Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science)
Volume: 49
Issue number: 4
Publication status: Published - 1 Apr 2015

Keywords

  • Natural language processing
  • Personal name disambiguation
  • Sentential semantic analysis
  • Sentential semantic model
