Abstract
Identifying new words can benefit many NLP tasks such as sentence chunking and sentiment analysis. However, automatic new word extraction is challenging because new words usually follow no fixed language pattern, and may even be existing words used with new meanings. To tackle these problems, this paper proposes a novel method for extracting new words. It not only considers domain specificity, but also combines multiple kinds of statistical language knowledge. First, we apply a filtering algorithm to obtain a candidate list of new words. Then, we use the statistical language knowledge to rank the candidates and extract the top-ranked new words. Experimental results show that the proposed method extracts a large number of new words from both Chinese and English corpora, and notably outperforms state-of-the-art methods. Moreover, we demonstrate that our method increases the accuracy of Chinese word segmentation by 10% on corpora containing new words.
| Original language | English |
|---|---|
| Article number | 92102 |
| Journal | Science China Information Sciences |
| Volume | 59 |
| Issue number | 9 |
| DOIs | |
| Publication status | Published - 1 Sept 2016 |
Keywords
- domain specificity
- domain word extraction
- new word extraction
- statistical language knowledge
- word segmentation
Fingerprint
Dive into the research topics of 'A novel unsupervised method for new word extraction'. Together they form a unique fingerprint.