FCL: A new network words extraction approach based on statistical language knowledge

Lili Mei, Heyan Huang, Xiaochi Wei, Peng Yuan, Xian Ling Mao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

New network words can benefit many NLP tasks, such as Chinese word segmentation and sentiment analysis. However, extracting them automatically is challenging because new network words usually follow no fixed language pattern, and some are existing words used with new meanings. To tackle these problems, this paper proposes FCL, a novel approach to extracting new network words. It considers domain specificity and combines it with multiple kinds of statistical language knowledge. First, we apply a filtering algorithm to obtain a list of candidate new words. Then, we use the statistical language knowledge to extract the top-ranked new network words. Experimental results show that the proposed approach extracts a large number of new network words and notably outperforms state-of-the-art methods. Moreover, we demonstrate that the approach increases word segmentation accuracy by 10% on a corpus containing new words.
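The two-stage idea in the abstract (filter candidates, then rank them with statistical evidence) can be illustrated with a minimal sketch. This is not the FCL algorithm itself, whose details the abstract does not give. It assumes two statistics commonly used for new-word detection, pointwise mutual information (internal cohesion) and left/right branching entropy (boundary freedom), and it stands in for domain specificity with a simple `known_words` exclusion set. All function names, parameters, and thresholds below are hypothetical.

```python
import math
from collections import Counter


def candidates(texts, max_len=4, min_freq=2):
    """Stage 1 (assumed filter): keep character n-grams seen at least min_freq times."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    kept = {g for g, c in counts.items() if len(g) >= 2 and c >= min_freq}
    return counts, kept


def cohesion(gram, counts, total):
    """Minimum PMI over every binary split of the n-gram (rough estimate)."""
    p = counts[gram] / total
    return min(
        math.log(p / ((counts[gram[:k]] / total) * (counts[gram[k:]] / total)))
        for k in range(1, len(gram))
    )


def entropy(counter):
    """Shannon entropy (natural log) of a neighbour-character distribution."""
    n = sum(counter.values())
    if n == 0:
        return 0.0
    return -sum(v / n * math.log(v / n) for v in counter.values())


def branching_entropy(gram, texts):
    """Smaller of the left and right neighbour entropies of the n-gram."""
    left, right = Counter(), Counter()
    for text in texts:
        start = text.find(gram)
        while start != -1:
            if start > 0:
                left[text[start - 1]] += 1
            end = start + len(gram)
            if end < len(text):
                right[text[end]] += 1
            start = text.find(gram, start + 1)
    return min(entropy(left), entropy(right))


def rank_new_words(texts, known_words=frozenset(), top_k=10):
    """Stage 2 (assumed scoring): rank unknown candidates by cohesion + freedom."""
    counts, kept = candidates(texts)
    # Total character count, used as a shared normaliser for all n-gram sizes.
    total = sum(c for g, c in counts.items() if len(g) == 1)
    scored = [
        (cohesion(g, counts, total) + branching_entropy(g, texts), g)
        for g in kept - set(known_words)
    ]
    scored.sort(reverse=True)
    return [g for _, g in scored[:top_k]]


if __name__ == "__main__":
    toy_corpus = ["xabqy", "zabqw", "mabqn", "pabqr"]
    print(rank_new_words(toy_corpus, top_k=3))
```

On the toy corpus, the string `abq` always appears as a unit but with varying neighbours on both sides, so it scores higher than its fragments `ab` and `bq`, which each have a fixed neighbour on one side. A real system would need a proper lexicon, a domain-contrast corpus, and tuned thresholds.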

Original language: English
Title of host publication: Social Media Processing - 4th National Conference, SMP 2015, Proceedings
Editors: Maosong Sun, Xichun Zhang, Zhenyu Wang, Xuanjing Huang
Publisher: Springer Verlag
Pages: 119-130
Number of pages: 12
ISBN (Print): 9789811000799
DOIs: https://doi.org/10.1007/978-981-10-0080-5_11
Publication status: Published - 2015
Event: 4th National Conference on Social Media Processing, SMP 2015 - Guangzhou, China
Duration: 16 Nov 2015 to 17 Nov 2015

Publication series

Name: Communications in Computer and Information Science
Volume: 568
ISSN (Print): 1865-0929

Conference

Conference: 4th National Conference on Social Media Processing, SMP 2015
Country/Territory: China
City: Guangzhou
Period: 16/11/15 to 17/11/15

Keywords

  • Domain specificity
  • New network words extraction
  • Statistical language knowledge
  • Word segmentation

Cite this

Mei, L., Huang, H., Wei, X., Yuan, P., & Mao, X. L. (2015). FCL: A new network words extraction approach based on statistical language knowledge. In M. Sun, X. Zhang, Z. Wang, & X. Huang (Eds.), Social Media Processing - 4th National Conference, SMP 2015, Proceedings (pp. 119-130). (Communications in Computer and Information Science; Vol. 568). Springer Verlag. https://doi.org/10.1007/978-981-10-0080-5_11