A novel unsupervised method for new word extraction

Lili Mei, Heyan Huang*, Xiaochi Wei, Xianling Mao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

New words can benefit many NLP tasks, such as sentence chunking and sentiment analysis. However, automatic new word extraction is challenging because new words usually follow no fixed language pattern and may even be existing words used with new meanings. To tackle these problems, this paper proposes a novel method for extracting new words. The method considers domain specificity and combines it with multiple kinds of statistical language knowledge. First, we apply a filtering algorithm to obtain a candidate list of new words. Then, we use the statistical language knowledge to extract the top-ranked new words. Experimental results show that the proposed method extracts a large number of new words from both Chinese and English corpora, and notably outperforms state-of-the-art methods. Moreover, we demonstrate that our method increases the accuracy of Chinese word segmentation by 10% on a corpus containing new words.
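The two-stage shape described in the abstract (filter candidates, then rank them statistically) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not name the filtering rule or the statistical measures, so the sketch assumes a simple frequency-and-dictionary filter and uses pointwise mutual information (PMI) as a stand-in ranking score. Domain specificity is not modeled. All function and parameter names (`bigram_candidates`, `rank_new_words`, `min_count`, `sep`) are hypothetical.

```python
import math
from collections import Counter


def bigram_candidates(tokens, known_words, min_count=2, sep=""):
    """Stage 1 (assumed filter): keep adjacent-token pairs that occur at
    least min_count times and whose joined form is not a known word."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return {
        pair: count
        for pair, count in pairs.items()
        if count >= min_count and sep.join(pair) not in known_words
    }


def pmi(pair, pair_count, unigrams, total):
    """Pointwise mutual information of an adjacent pair (stand-in score)."""
    a, b = pair
    p_ab = pair_count / (total - 1)  # total - 1 adjacent pairs in the corpus
    p_a = unigrams[a] / total
    p_b = unigrams[b] / total
    return math.log(p_ab / (p_a * p_b))


def rank_new_words(tokens, known_words, top_k=5, min_count=2, sep=""):
    """Stage 2 (assumed ranking): score each candidate by PMI and return
    the top_k as (candidate, score), highest score first."""
    unigrams = Counter(tokens)
    total = len(tokens)
    candidates = bigram_candidates(tokens, known_words, min_count, sep)
    scored = [
        (sep.join(pair), pmi(pair, count, unigrams, total))
        for pair, count in candidates.items()
    ]
    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    corpus = "deep learning is fun deep learning is hard the cat is here".split()
    for word, score in rank_new_words(corpus, known_words=set(), sep=" "):
        print(f"{word}\t{score:.3f}")
```

The default `sep=""` suits character-level input such as unsegmented Chinese text, while `sep=" "` suits whitespace-tokenized English. A fuller treatment would extend candidates beyond bigrams and add a domain-specificity term, for example by comparing frequencies against a background corpus.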

Original language: English
Article number: 92102
Journal: Science China Information Sciences
Volume: 59
Issue number: 9
Publication status: Published - 1 Sept 2016

Keywords

  • domain specificity
  • domain word extraction
  • new word extraction
  • statistical language knowledge
  • word segmentation
