A sentence vector based over-sampling method for imbalanced emotion classification

Tao Chen, Ruifeng Xu*, Qin Lu, Bin Liu, Jun Xu, Lin Yao, Zhenyu He

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

10 Citations (Scopus)

Abstract

Imbalanced training data poses a serious problem for supervised text classification. The problem is more severe in emotion classification with multiple emotion categories, where the training data can be highly skewed. This paper presents a novel over-sampling method that forms additional sum sentence vectors for minority classes in order to improve emotion classification on imbalanced data. First, a large corpus is used to train a continuous skip-gram model, with the word/POS pair as the unit of each word vector. The sentence vectors of the training data are then constructed as the sum of their word/POS vectors. New minority class training samples are generated by randomly adding two sentence vectors from the corresponding class until every class has the same number of training samples, so that the classifiers can be trained on a fully balanced training dataset. Evaluation on the NLP&CC2013 Chinese microblog emotion classification dataset shows that the resulting classifier achieves 48.4% average precision, an improvement of 11.9 percentage points over the state-of-the-art performance on this dataset (36.5%). This result shows that the proposed over-sampling method can effectively address the problem of data imbalance and thus achieve much improved performance for emotion classification.
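
The over-sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sentence_vector` and `oversample_by_vector_sum`, the toy vectors, the choice to skip unknown word/POS units, and the choice to draw each pair without replacement from the original (not synthetic) samples of a class are all assumptions made for illustration. The skip-gram training itself is not shown; `word_vectors` stands in for its output.

```python
import random
from collections import defaultdict

import numpy as np


def sentence_vector(tokens, word_vectors, dim):
    """Sum the vectors of a sentence's word/POS units.

    Units missing from `word_vectors` are skipped (an assumption;
    the abstract does not say how unknown units are handled).
    """
    vec = np.zeros(dim)
    for tok in tokens:
        if tok in word_vectors:
            vec = vec + word_vectors[tok]
    return vec


def oversample_by_vector_sum(vectors, labels, seed=0):
    """Balance classes by adding random pairs of same-class sentence vectors.

    Each synthetic sample is the sum of two distinct original vectors
    from the same class (hypothetical sampling policy).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for v, y in zip(vectors, labels):
        by_class[y].append(v)

    # Every class is grown to the size of the largest class.
    target = max(len(vs) for vs in by_class.values())
    out_vectors, out_labels = list(vectors), list(labels)

    for y, originals in by_class.items():
        if len(originals) < 2:
            continue  # a pair cannot be formed; this class stays as it is
        for _ in range(target - len(originals)):
            a, b = rng.sample(originals, 2)
            out_vectors.append(a + b)
            out_labels.append(y)

    return np.array(out_vectors), out_labels


# Toy usage: 5 "like" sentences and 2 "fear" sentences in a 2-d space.
X = [np.array([1.0, 0.0])] * 5 + [np.array([0.0, 1.0]), np.array([2.0, 3.0])]
y = ["like"] * 5 + ["fear"] * 2
X_balanced, y_balanced = oversample_by_vector_sum(X, y)
```

After the call, both classes hold five samples, and any classifier that accepts dense feature vectors can be trained on `X_balanced` and `y_balanced`.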

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 15th International Conference, CICLing 2014, Proceedings
Publisher: Springer Verlag
Pages: 62-72
Number of pages: 11
Edition: PART 2
ISBN (Print): 9783642549021
DOI
Publication status: Published - 2014
Externally published: Yes
Event: 15th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2014 - Kathmandu, Nepal
Duration: 6 Apr 2014 to 12 Apr 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 8404 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2014
Country/Territory: Nepal
City: Kathmandu
Period: 6/04/14 to 12/04/14
