An improved naive Bayesian classification algorithm for sentiment classification of microblogs

Zhi Qiang Li, De Quan Yang, Yuan Tan, Yuan Ping Zou

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

For attribute-weighted naive Bayesian classification algorithms, the choice of weights directly affects the classification results. With this in mind, the drawbacks of TFIDF feature selection in sentiment classification of microblogs are analyzed, and an improved algorithm named TF-D(t)-CHI is proposed, which uses statistical calculation to obtain the degree of correlation between feature words and classes. It represents the distribution of feature items in the classes by means of variance, which addresses the problem that short texts contain few feature words while high-frequency feature words receive excessively high weights. Experimental results indicate that naive Bayesian classification using TF-D(t)-CHI for feature selection and weight calculation achieves better results in sentiment classification of microblogs.
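The abstract names the ingredients of the weight (a term frequency, a variance term D(t), and a chi-square correlation) but does not give the formula. The following is only a minimal sketch of how such a weight might be assembled, not the authors' method. The chi-square statistic is the standard 2x2 term/class form. Treating D(t) as the variance of per-class document rates, multiplying the three factors, and the helper names `chi_square`, `class_variance` and `term_weight` are all assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/class table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def class_variance(rates):
    """Population variance of a term's per-class document rates."""
    mean = sum(rates) / len(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)


def term_weight(term, cls, docs):
    """Hypothetical TF * D(t) * CHI weight; docs is a list of (tokens, label)."""
    labels = sorted({label for _, label in docs})
    a = sum(1 for toks, lab in docs if lab == cls and term in toks)
    b = sum(1 for toks, lab in docs if lab != cls and term in toks)
    c = sum(1 for toks, lab in docs if lab == cls and term not in toks)
    d = sum(1 for toks, lab in docs if lab != cls and term not in toks)

    # TF (assumed form): relative frequency of the term among the class's tokens.
    cls_tokens = [t for toks, lab in docs if lab == cls for t in toks]
    tf = cls_tokens.count(term) / len(cls_tokens) if cls_tokens else 0.0

    # D(t) (assumed form): variance of the share of documents containing
    # the term, computed once per class.
    rates = []
    for lab in labels:
        in_class = [toks for toks, doc_lab in docs if doc_lab == lab]
        rates.append(sum(term in toks for toks in in_class) / len(in_class))

    return tf * class_variance(rates) * chi_square(a, b, c, d)


docs = [
    (["good", "the"], "pos"),
    (["good", "great"], "pos"),
    (["bad", "the"], "neg"),
    (["bad", "awful"], "neg"),
]
weight_good = term_weight("good", "pos", docs)
weight_the = term_weight("the", "pos", docs)
```

Under these assumptions, a word spread evenly over both classes ("the") gets zero weight however often it occurs, while a class-specific word ("good") keeps a positive weight. This is the kind of behaviour the abstract attributes to the variance and correlation terms.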

Original language: English
Title of host publication: Vehicle, Mechatronics and Information Technologies II
Publisher: Trans Tech Publications
Pages: 3614-3620
Number of pages: 7
ISBN (Print): 9783038350606
Publication status: Published - 2014
Event: International Conference on Vehicle and Mechanical Engineering and Information Technology, VMEIT 2014 - Beijing, China
Duration: 19 Feb 2014 to 20 Feb 2014

Publication series

Name: Applied Mechanics and Materials
Volume: 543-547
ISSN (Print): 1660-9336
ISSN (Electronic): 1662-7482

Conference

Conference: International Conference on Vehicle and Mechanical Engineering and Information Technology, VMEIT 2014
Country/Territory: China
City: Beijing
Period: 19/02/14 to 20/02/14

Keywords

  • Feature selection
  • Microblog sentiment classification
  • Naive Bayesian classification
  • TFIDF
