A probabilistic model based on uncertainty for data clustering

  • Yaxin Yu*
  • Xinhua Zhu
  • Miao Li
  • Guoren Wang
  • Dan Luo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Data of all kinds have recently grown explosively. To manage them, the dataspace has become a universal platform that holds unstructured, semi-structured, and structured data. However, clustering the data in a dataspace efficiently and accurately, so that users can manage and explore them, remains a difficult problem. Previous work does not sufficiently consider the uncertain relationship between terms and topics. Among the many techniques for handling this problem, probability theory provides an effective way to deal with the uncertainty of clustering. We therefore propose a novel probability model based on topic terms, the Probabilistic Term Similarity Model (PTSM), to tackle the uncertainty between terms and topics. The model considers not only terms from the various data but also the structure information of semi-structured and structured data. Each term is assigned a probability indicating how relevant it is to the topic. From these per-term probabilities, a probabilistic matrix is built for clustering the various data. Extensive experimental results show that the clustering method based on this probabilistic model performs well and outperforms several classical algorithms.
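The abstract does not give PTSM's formulas, so the following is only a minimal sketch of the general idea it describes (per-term topic probabilities feeding a probabilistic matrix that drives clustering), not the authors' method. The probability table, the topic names, the averaging rule, and the helper names `probability_matrix` and `cluster_by_topic` are all assumptions made for illustration.

```python
# Hypothetical term-to-topic relevance probabilities (invented for illustration;
# the paper's actual estimation procedure is not described in the abstract).
TERM_TOPIC_PROB = {
    "index":  {"databases": 0.90, "networks": 0.10},
    "query":  {"databases": 0.80, "networks": 0.20},
    "router": {"databases": 0.10, "networks": 0.90},
    "packet": {"databases": 0.05, "networks": 0.95},
}
TOPICS = ["databases", "networks"]


def probability_matrix(docs, term_topic_prob, topics):
    """Build a documents-by-topics matrix.

    Each entry is the mean relevance probability of the document's known
    terms for that topic (an assumed aggregation rule). Documents with no
    known terms get a row of zeros.
    """
    matrix = []
    for terms in docs:
        known = [t for t in terms if t in term_topic_prob]
        row = []
        for topic in topics:
            if known:
                total = sum(term_topic_prob[t][topic] for t in known)
                row.append(total / len(known))
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def cluster_by_topic(matrix, topics):
    """Assign each document to its highest-scoring topic.

    Ties (including all-zero rows) fall back to the first topic in `topics`.
    """
    labels = []
    for row in matrix:
        best = max(range(len(topics)), key=row.__getitem__)
        labels.append(topics[best])
    return labels


# Each document is represented as a list of its terms.
docs = [
    ["index", "query"],
    ["router", "packet"],
    ["query", "router", "packet"],
]
matrix = probability_matrix(docs, TERM_TOPIC_PROB, TOPICS)
labels = cluster_by_topic(matrix, TOPICS)
```

Here `labels` groups the first document under the hypothetical "databases" topic and the other two under "networks". A fuller treatment would also need to estimate the probabilities from data and to incorporate the structure information the abstract mentions, both of which this sketch leaves out.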

Original language: English
Title of host publication: Agents and Data Mining Interaction - 8th International Workshop, ADMI 2012, Revised Selected Papers
Pages: 126-138
Number of pages: 13
Publication status: Published - 2013
Externally published: Yes
Event: 8th International Workshop on Agents and Data Mining Interaction, ADMI 2012 - Valencia, Spain
Duration: 4 Jun 2012 to 5 Jun 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7607 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Workshop on Agents and Data Mining Interaction, ADMI 2012
Country/Territory: Spain
City: Valencia
Period: 4/06/12 to 5/06/12

Keywords

  • data clustering
  • dataspace
  • probability
  • topic
  • uncertainty
