FCL: A new network words extraction approach based on statistical language knowledge

Lili Mei, Heyan Huang, Xiaochi Wei, Peng Yuan, Xian Ling Mao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

1 citation (Scopus)

Abstract

New network words could benefit many NLP tasks such as Chinese word segmentation and sentiment analysis. However, automatic extraction of new network words is a challenging task because new network words usually have no fixed language pattern and may even appear as new meanings of existing words. To tackle these problems, this paper proposes a novel approach, FCL, to extract new network words. It not only considers domain specificity but also combines multiple kinds of statistical language knowledge. First, we perform a filtering algorithm to obtain a list of candidate new words. Then, we employ the statistical language knowledge to extract the top-ranked new network words. Experimental results show that our proposed approach is able to extract a large number of new network words and notably outperforms the state-of-the-art methods. Moreover, we also demonstrate that our approach increases the accuracy of word segmentation by 10% on a corpus containing new words.

Original language: English
Title of host publication: Social Media Processing - 4th National Conference, SMP 2015, Proceedings
Editors: Maosong Sun, Xichun Zhang, Zhenyu Wang, Xuanjing Huang
Publisher: Springer Verlag
Pages: 119-130
Number of pages: 12
ISBN (Print): 9789811000799
DOI
Publication status: Published - 2015
Event: 4th National Conference on Social Media Processing, SMP 2015 - Guangzhou, China
Duration: 16 Nov 2015 to 17 Nov 2015

Publication series

Name: Communications in Computer and Information Science
Volume: 568
ISSN (Print): 1865-0929

Conference

Conference: 4th National Conference on Social Media Processing, SMP 2015
Country/Territory: China
City: Guangzhou
Period: 16/11/15 to 17/11/15
