Abstract
Customer reviews and comments on web pages are an important source of information in daily life. For example, we prefer to choose a hotel with positive comments from previous customers. Because the huge volume of such information exhibits the characteristics of big data, assimilating these customer-contributed opinions is a heavy burden. To overcome this problem, we study an efficient opinion summarization approach for a massive set of user reviews and comments associated with an online resource, summarizing the opinions into two categories: positive and negative. In this paper, we propose a framework that includes: (1) overcoming the big data problem of online comments using the efficient online-LDA approach; (2) selecting meaningful topics from the imbalanced data; (3) summarizing the opinion of comments with high precision and recall. This framework differs from much of the previous work in that the topics are pre-defined and selected for better opinion summarization. To evaluate the proposed framework, we perform experiments on a dataset of hotel reviews, chosen for the variety of topics it contains. The results show that our framework achieves a significant performance improvement on opinion summarization.
| Original language | English |
|---|---|
| Pages (from-to) | 414-427 |
| Number of pages | 14 |
| Journal | International Journal of Computers, Communications and Control |
| Volume | 11 |
| Issue number | 3 |
| Publication status | Published - 2016 |
| Externally published | Yes |
Keywords
- Big data
- Imbalanced data
- Latent Dirichlet allocation (LDA)
- Online-LDA
- Opinion summarization