FCL: A new network words extraction approach based on statistical language knowledge

Lili Mei; Heyan Huang; Xiaochi Wei; Peng Yuan; Xian Ling Mao

doi:10.1007/978-981-10-0080-5_11

Corresponding author: Xian Ling Mao

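School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract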

New network words could benefit many NLP tasks such as Chinese word segmentation and sentiment analysis. However, automatic new network words extraction is a challenging task because new network words usually have no fixed language pattern, and even appear with the new meanings of existing words. To tackle these problems, this paper proposes a novel approach of FCL to extract new network words. It not only considers domain specificity, but also combines with multiple statistical language knowledge. First, we perform a filtering algorithm to obtain a list of candidate new words. Then, we employ the statistical language knowledge to extract the top ranked new network words. Experimental results show that our proposed approach is able to extract a large number of new network words and notably outperforms the state-of-the-art methods. Moreover, we also demonstrate our approach increases the accuracy of word segmentation by 10% on a corpus containing new words.
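The two stages named in the abstract (filter candidates, then rank them with statistical evidence) can be illustrated with a minimal sketch. The abstract does not say which statistics FCL uses, so this sketch substitutes two measures commonly used for new-word detection, pointwise mutual information and right-neighbor entropy, purely as assumptions. The function names, thresholds, and the unweighted sum of scores are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def candidate_ngrams(corpus, max_len=3, min_count=2):
    """Stage 1 (filtering): count character n-grams and keep frequent ones.

    Counts cover lengths 1..max_len so that every split of a candidate
    can be looked up later; candidates are n-grams of length >= 2.
    """
    counts = Counter()
    for sentence in corpus:
        for n in range(1, max_len + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    candidates = {g for g, c in counts.items() if len(g) >= 2 and c >= min_count}
    return counts, candidates


def pmi(gram, counts, total_chars):
    """Assumed cohesion score: minimum PMI over all binary splits of gram."""
    p_gram = counts[gram] / total_chars
    best = float("inf")
    for k in range(1, len(gram)):
        p_left = counts[gram[:k]] / total_chars
        p_right = counts[gram[k:]] / total_chars
        best = min(best, math.log(p_gram / (p_left * p_right)))
    return best


def right_entropy(gram, corpus):
    """Assumed boundary score: entropy of the character following gram."""
    followers = Counter()
    for sentence in corpus:
        start = sentence.find(gram)
        while start != -1:
            end = start + len(gram)
            followers[sentence[end] if end < len(sentence) else "</s>"] += 1
            start = sentence.find(gram, start + 1)
    total = sum(followers.values())
    return -sum(c / total * math.log(c / total) for c in followers.values())


def rank_new_words(corpus, known_words, top_k=5):
    """Stage 2 (ranking): drop known words, score the rest, return the top k."""
    counts, candidates = candidate_ngrams(corpus)
    total_chars = sum(len(s) for s in corpus)
    scored = []
    for gram in candidates - set(known_words):
        score = pmi(gram, counts, total_chars) + right_entropy(gram, corpus)
        scored.append((score, gram))
    # Sort by descending score, breaking ties alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [gram for _, gram in scored[:top_k]]
```

Calling `rank_new_words(corpus, known_words)` with a list of sentences and an existing lexicon returns the highest-scoring strings that are not already in the lexicon. The domain-specificity component mentioned in the abstract is not modeled here.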

Original language	English
Title of host publication	Social Media Processing - 4th National Conference, SMP 2015, Proceedings
Editors	Maosong Sun, Xichun Zhang, Zhenyu Wang, Xuanjing Huang
Publisher	Springer Verlag
Pages	119-130
Number of pages	12
ISBN (Print)	9789811000799
DOI	https://doi.org/10.1007/978-981-10-0080-5_11
Publication status	Published - 2015
Event	4th National Conference on Social Media Processing, SMP 2015 - Guangzhou, China. Duration: 16 Nov 2015 → 17 Nov 2015

Publication series

Name	Communications in Computer and Information Science
Volume	568
ISSN (Print)	1865-0929

Conference

Conference	4th National Conference on Social Media Processing, SMP 2015
Country/Territory	China
City	Guangzhou
Period	16/11/15 → 17/11/15

Cite this

Mei, L., Huang, H., Wei, X., Yuan, P., & Mao, X. L. (2015). FCL: A new network words extraction approach based on statistical language knowledge. In M. Sun, X. Zhang, Z. Wang, & X. Huang (Eds.), Social Media Processing - 4th National Conference, SMP 2015, Proceedings (pp. 119-130). (Communications in Computer and Information Science; Vol. 568). Springer Verlag. https://doi.org/10.1007/978-981-10-0080-5_11

@inproceedings{534bd6aef9b54d9b8e3330ea864532d0,

title = "FCL: A new network words extraction approach based on statistical language knowledge",

abstract = "New network words could benefit many NLP tasks such as Chinese word segmentation and sentiment analysis. However, automatic new network words extraction is a challenging task because new network words usually have no fixed language pattern, and even appear with the new meanings of existing words. To tackle these problems, this paper proposes a novel approach of FCL to extract new network words. It not only considers domain specificity, but also combines with multiple statistical language knowledge. First, we perform a filtering algorithm to obtain a list of candidate new words. Then, we employ the statistical language knowledge to extract the top ranked new network words. Experimental results show that our proposed approach is able to extract a large number of new network words and notably outperforms the state-of-the-art methods. Moreover, we also demonstrate our approach increases the accuracy of word segmentation by 10\% on a corpus containing new words.",

keywords = "Domain specificity, New network words extraction, Statistical language knowledge, Word segmentation",

author = "Lili Mei and Heyan Huang and Xiaochi Wei and Peng Yuan and Mao, {Xian Ling}",

note = "Publisher Copyright: {\textcopyright} Springer Science+Business Media Singapore 2015.; 4th National Conference on Social Media Processing, SMP 2015 ; Conference date: 16-11-2015 Through 17-11-2015",

year = "2015",

doi = "10.1007/978-981-10-0080-5_11",

language = "English",

isbn = "9789811000799",

series = "Communications in Computer and Information Science",

volume = "568",

publisher = "Springer Verlag",

pages = "119--130",

editor = "Maosong Sun and Xichun Zhang and Zhenyu Wang and Xuanjing Huang",

booktitle = "Social Media Processing - 4th National Conference, SMP 2015, Proceedings",

address = "Germany",

}

Mei, L, Huang, H, Wei, X, Yuan, P & Mao, XL 2015, FCL: A new network words extraction approach based on statistical language knowledge. in M Sun, X Zhang, Z Wang & X Huang (eds), Social Media Processing - 4th National Conference, SMP 2015, Proceedings. Communications in Computer and Information Science, vol. 568, Springer Verlag, pp. 119-130, 4th National Conference on Social Media Processing, SMP 2015, Guangzhou, China, 16/11/15. https://doi.org/10.1007/978-981-10-0080-5_11

FCL: A new network words extraction approach based on statistical language knowledge. / Mei, Lili; Huang, Heyan; Wei, Xiaochi et al.
Social Media Processing - 4th National Conference, SMP 2015, Proceedings. ed. / Maosong Sun; Xichun Zhang; Zhenyu Wang; Xuanjing Huang. Springer Verlag, 2015. p. 119-130 (Communications in Computer and Information Science; Vol. 568).

TY - GEN

T1 - FCL: A new network words extraction approach based on statistical language knowledge

T2 - 4th National Conference on Social Media Processing, SMP 2015

AU - Mei, Lili

AU - Huang, Heyan

AU - Wei, Xiaochi

AU - Yuan, Peng

AU - Mao, Xian Ling

N1 - Publisher Copyright: © Springer Science+Business Media Singapore 2015.

PY - 2015

Y1 - 2015

N2 - New network words could benefit many NLP tasks such as Chinese word segmentation and sentiment analysis. However, automatic new network words extraction is a challenging task because new network words usually have no fixed language pattern, and even appear with the new meanings of existing words. To tackle these problems, this paper proposes a novel approach of FCL to extract new network words. It not only considers domain specificity, but also combines with multiple statistical language knowledge. First, we perform a filtering algorithm to obtain a list of candidate new words. Then, we employ the statistical language knowledge to extract the top ranked new network words. Experimental results show that our proposed approach is able to extract a large number of new network words and notably outperforms the state-of-the-art methods. Moreover, we also demonstrate our approach increases the accuracy of word segmentation by 10% on a corpus containing new words.

AB - New network words could benefit many NLP tasks such as Chinese word segmentation and sentiment analysis. However, automatic new network words extraction is a challenging task because new network words usually have no fixed language pattern, and even appear with the new meanings of existing words. To tackle these problems, this paper proposes a novel approach of FCL to extract new network words. It not only considers domain specificity, but also combines with multiple statistical language knowledge. First, we perform a filtering algorithm to obtain a list of candidate new words. Then, we employ the statistical language knowledge to extract the top ranked new network words. Experimental results show that our proposed approach is able to extract a large number of new network words and notably outperforms the state-of-the-art methods. Moreover, we also demonstrate our approach increases the accuracy of word segmentation by 10% on a corpus containing new words.

KW - Domain specificity

KW - New network words extraction

KW - Statistical language knowledge

KW - Word segmentation

UR - http://www.scopus.com/inward/record.url?scp=84959317101&partnerID=8YFLogxK

U2 - 10.1007/978-981-10-0080-5_11

DO - 10.1007/978-981-10-0080-5_11

M3 - Conference contribution

AN - SCOPUS:84959317101

SN - 9789811000799

T3 - Communications in Computer and Information Science

SP - 119

EP - 130

BT - Social Media Processing - 4th National Conference, SMP 2015, Proceedings

A2 - Sun, Maosong

A2 - Zhang, Xichun

A2 - Wang, Zhenyu

A2 - Huang, Xuanjing

PB - Springer Verlag

Y2 - 16 November 2015 through 17 November 2015

ER -

Mei L, Huang H, Wei X, Yuan P, Mao XL. FCL: A new network words extraction approach based on statistical language knowledge. In Sun M, Zhang X, Wang Z, Huang X, editors, Social Media Processing - 4th National Conference, SMP 2015, Proceedings. Springer Verlag. 2015. p. 119-130. (Communications in Computer and Information Science). doi: 10.1007/978-981-10-0080-5_11
