Microblog Sentiment Classification Using Parallel SVM in Apache Spark

Bo Yan, Zijiang Yang, Yitian Ren, Xing Tan, Eric Liu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

13 Citations (Scopus)

Abstract

In the information age, sentiment classification of internet topics is of great significance. This paper proposes a microblog sentiment classification approach based on a parallel support vector machine (SVM). The proposed method incorporates microblog-specific features into preprocessing to make the data suitable for sentiment classification. After preprocessing, a parallel SVM running on Apache Spark performs the classification. SVM is one of the most popular algorithms in text classification and is well suited to small-scale and nonlinear problems. However, SVM takes a very long time when dealing with big data. We apply Spark to parallelize SVM with a Radial Basis Function (RBF) kernel. Apache Spark delivers outstanding machine learning performance compared to Hadoop. The experiments show that Spark significantly increases the execution speed of SVM. At the same time, classification accuracy is improved by the information gain (IG) approach used in preprocessing and by kernel function parameter selection.
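The abstract names two building blocks without giving their formulas: information gain (IG) for feature selection and the RBF kernel. The sketch below illustrates both using their standard textbook definitions in plain Python. It is not the authors' implementation and does not show the Spark parallelization; the function names, the toy documents, and the `gamma` values are assumptions made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """IG of a term: label entropy minus the entropy left after
    splitting the documents on presence/absence of the term."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    remainder = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - remainder


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical toy corpus: each document is a set of tokens,
# labels are 1 (positive) or 0 (negative).
docs = [{"good", "movie"}, {"bad", "movie"}, {"good", "day"}, {"bad", "day"}]
labels = [1, 0, 1, 0]

ig_good = information_gain(docs, labels, "good")    # term separates the classes
ig_movie = information_gain(docs, labels, "movie")  # term is uninformative
```

In this toy corpus, "good" perfectly separates the two classes while "movie" carries no label information, so an IG-based filter would keep the former and drop the latter. In the RBF kernel, `gamma` is the kind of kernel function parameter whose selection the abstract credits with part of the accuracy gain.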

Original language: English
Title of host publication: Proceedings - 2017 IEEE 6th International Congress on Big Data, BigData Congress 2017
Editors: George Karypis, Jia Zhang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 282-288
Number of pages: 7
ISBN (Electronic): 9781538619964
DOIs: https://doi.org/10.1109/BigDataCongress.2017.43
Publication status: Published - 7 Sept 2017
Event: 6th IEEE International Congress on Big Data, BigData Congress 2017 - Honolulu, United States
Duration: 25 Jun 2017 to 30 Jun 2017

Publication series

Name: Proceedings - 2017 IEEE 6th International Congress on Big Data, BigData Congress 2017

Conference

Conference: 6th IEEE International Congress on Big Data, BigData Congress 2017
Country/Territory: United States
City: Honolulu
Period: 25/06/17 to 30/06/17

Keywords

  • Radial Basis Function
  • Spark
  • big data
  • sentiment classification
  • support vector machine

Cite this

Yan, B., Yang, Z., Ren, Y., Tan, X., & Liu, E. (2017). Microblog Sentiment Classification Using Parallel SVM in Apache Spark. In G. Karypis & J. Zhang (Eds.), Proceedings - 2017 IEEE 6th International Congress on Big Data, BigData Congress 2017 (pp. 282-288). Article 8029336 (Proceedings - 2017 IEEE 6th International Congress on Big Data, BigData Congress 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigDataCongress.2017.43