Text clustering on short message by using deep semantic representation

Songze Wu*, Huaping Zhang, Chengcheng Xu, Tao Guo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Text clustering is a major challenge in text mining, and traditional algorithms perform poorly on short texts. Short messages on social media are a more flexible form of data: they carry not only textual content but also comments, timestamps, and regional information. We propose an algorithm that extracts a semantic, multidimensional feature representation from such texts. In particular, because comments are semantically related to the short message they respond to, we can use them as supervisory information to train the text representation, which turns the task into a semi-supervised problem. We use a convolutional-pooling structure to map the text into a semantic representation. We then extend this representation with time- and region-related features, yielding a more flexible and stronger representation for short messages. On labelled data, our approach shows clear advantages over traditional feature representation methods and outperforms other clustering methods based on deep neural network representations.
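The abstract does not give the network details, so the following is only a minimal sketch of the general idea: a convolution over sliding word windows, max pooling over positions, then concatenation with time and region features. Everything named here is an assumption made for illustration: the hash-seeded `embed_tokens` stand-in for learned word embeddings, the random untrained `filters`, the sine/cosine hour encoding, and the one-hot region encoding. The comment-based supervised training described in the abstract is not modelled.

```python
import math
import random


def embed_tokens(tokens, dim):
    """Hypothetical stand-in for learned embeddings: one fixed pseudo-random vector per token."""
    vectors = []
    for tok in tokens:
        rng = random.Random(tok)  # string seed gives the same vector for the same token
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def conv_max_pool(vectors, filters, dim, window):
    """Apply each filter to every window of `window` word vectors, then max-pool over positions."""
    # Pad very short messages with zero vectors so at least one window exists.
    padded = vectors + [[0.0] * dim] * max(0, window - len(vectors))
    pooled = [float("-inf")] * len(filters)
    for start in range(len(padded) - window + 1):
        # Flatten the window into one vector of length window * dim.
        patch = [x for vec in padded[start:start + window] for x in vec]
        for k, weights in enumerate(filters):
            activation = math.tanh(sum(a * b for a, b in zip(patch, weights)))
            pooled[k] = max(pooled[k], activation)
    return pooled


def message_representation(tokens, hour, region_id, n_regions, filters, dim, window):
    """Concatenate the pooled text features with assumed time and region encodings."""
    text_part = conv_max_pool(embed_tokens(tokens, dim), filters, dim, window)
    angle = 2.0 * math.pi * hour / 24.0
    time_part = [math.sin(angle), math.cos(angle)]  # cyclic encoding of hour of day
    region_part = [1.0 if i == region_id else 0.0 for i in range(n_regions)]
    return text_part + time_part + region_part


DIM, WINDOW, N_FILTERS, N_REGIONS = 8, 3, 4, 5
_rng = random.Random(42)
# Untrained random filters; in a real system these would be learned.
filters = [[_rng.gauss(0.0, 0.5) for _ in range(WINDOW * DIM)] for _ in range(N_FILTERS)]

rep = message_representation(
    "heavy rain in the city tonight".split(),
    hour=21, region_id=2, n_regions=N_REGIONS,
    filters=filters, dim=DIM, window=WINDOW,
)
print(len(rep))  # 4 text features + 2 time features + 5 region features = 11
```

Vectors built this way could be passed to any standard clustering routine; which clustering algorithm the authors use is not stated in the abstract.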

Original language: English
Title of host publication: Advances in Computer Communication and Computational Sciences - Proceedings of IC4S 2017
Editors: Sanjiv K. Bhatia, Shailesh Tiwari, Krishn K. Mishra, Munesh C. Trivedi
Publisher: Springer Verlag
Pages: 133-145
Number of pages: 13
ISBN (Print): 9789811303432
Publication status: Published - 2019
Event: 2nd International Conference on Computer, Communication and Computational Sciences, IC4S 2017 - Kathu, Thailand
Duration: 11 Oct 2017 to 12 Oct 2017

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 760
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365

Conference

Conference: 2nd International Conference on Computer, Communication and Computational Sciences, IC4S 2017
Country/Territory: Thailand
City: Kathu
Period: 11/10/17 to 12/10/17

Keywords

  • Deep semantic representation
  • Multidimensional feature
  • Semi-supervised
  • Short message
  • Text clustering
