TY - GEN
T1 - A lexicon-based multi-class semantic orientation analysis for microblogs
AU - Li, Yuqing
AU - Li, Xin
AU - Li, Fan
AU - Zhang, Xiaofeng
PY - 2014
Y1 - 2014
N2 - In the literature, most existing work on semantic orientation analysis focuses on distinguishing two polarities (positive and negative). In this paper, we propose a lexicon-based multi-class semantic orientation analysis for microblogs. To better capture social attention on public events, we introduce Concern into the conventional psychological classes of sentiment and build a sentiment lexicon with five categories (Concern, Joy, Blue, Anger, Fear). The seed words of the lexicon are extracted from HowNet, NTUSD, and catchwords of Sina Weibo posts. The semantic similarity in HowNet is used to detect additional sentiment words to enrich the lexicon. Accordingly, each Weibo post is represented as a multi-dimensional numerical vector in feature space. We then adopt the Semi-Supervised Gaussian Mixture Model (Semi-GMM) and an adaptive K-nearest neighbour (KNN) classifier with symmetric Kullback-Leibler divergence (KL-divergence) as the similarity measure to classify the posts. We compare our proposed methods with several competitive baselines, e.g., majority vote, KNN using cosine similarity, and SVM. The experimental evaluation shows that our proposed methods outperform the other approaches by a large margin in terms of accuracy and F1 score.
AB - In the literature, most existing work on semantic orientation analysis focuses on distinguishing two polarities (positive and negative). In this paper, we propose a lexicon-based multi-class semantic orientation analysis for microblogs. To better capture social attention on public events, we introduce Concern into the conventional psychological classes of sentiment and build a sentiment lexicon with five categories (Concern, Joy, Blue, Anger, Fear). The seed words of the lexicon are extracted from HowNet, NTUSD, and catchwords of Sina Weibo posts. The semantic similarity in HowNet is used to detect additional sentiment words to enrich the lexicon. Accordingly, each Weibo post is represented as a multi-dimensional numerical vector in feature space. We then adopt the Semi-Supervised Gaussian Mixture Model (Semi-GMM) and an adaptive K-nearest neighbour (KNN) classifier with symmetric Kullback-Leibler divergence (KL-divergence) as the similarity measure to classify the posts. We compare our proposed methods with several competitive baselines, e.g., majority vote, KNN using cosine similarity, and SVM. The experimental evaluation shows that our proposed methods outperform the other approaches by a large margin in terms of accuracy and F1 score.
KW - Kullback-Leibler divergence
KW - Semantic Orientation Analysis
KW - Semi-supervised Gaussian mixture model (Semi-GMM)
UR - http://www.scopus.com/inward/record.url?scp=84958545721&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-11116-2_8
DO - 10.1007/978-3-319-11116-2_8
M3 - Conference contribution
AN - SCOPUS:84958545721
SN - 9783319111155
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 81
EP - 92
BT - Web Technologies and Applications - 16th Asia-Pacific Web Conference, APWeb 2014, Proceedings
PB - Springer Verlag
T2 - 16th Asia-Pacific Web Conference on Web Technologies and Applications, APWeb 2014
Y2 - 5 September 2014 through 7 September 2014
ER -