Elastic algorithms for guaranteeing quality monotonicity in big data mining

Rui Han, Lei Nie, Moustafa M. Ghanem, Yike Guo*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

When mining large data volumes in big data applications, users are typically willing to use algorithms that produce acceptable approximate results while satisfying given resource and time constraints. Two key challenges arise when designing such algorithms. The first is reasoning about the tradeoffs between the quality of the data mining output (e.g., prediction accuracy for classification tasks) and the available resource and time budgets. The second is organizing the computation of the algorithm so that it is guaranteed to produce better-quality results as more budget is used. Little work has addressed these two challenges together in a generic way. In this paper, we propose a novel framework for developing elastic big data mining algorithms. An information-theoretic approach based on Shannon's entropy is introduced to reason about how result quality is affected by the allocated budget. This approach is then used to guide the development of algorithms that adapt to the available time budget while guaranteeing better-quality results as more budget is used. We demonstrate the application of the framework by developing two example algorithms: elastic k-Nearest Neighbour (kNN) classification and elastic collaborative filtering (CF) recommendation. The core of both elastic algorithms is to run a naïve kNN classification or CF algorithm over R-tree data structures that successively approximate the entire dataset. Experimental evaluation was performed on real datasets using prediction accuracy as the quality metric. The results show that, in practice, the elastic mining algorithms do produce results with a consistent increase in observable quality, i.e., prediction accuracy.
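
To make the entropy-based reasoning in the abstract more concrete, the following is a minimal illustrative sketch, not the formulation used in the paper. It assumes that a coarser approximation of a dataset can be modelled as a partition of class labels into groups, and it uses the standard fact that refining a partition never increases the size-weighted entropy of the labels within groups. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(groups):
    """Size-weighted average of the label entropy within each group."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * shannon_entropy(g) for g in groups)


# Hypothetical toy labels grouped at three levels of granularity.
# Each level refines the previous one (every group is split in two),
# standing in for a successively finer approximation of the data.
coarse = [["a", "a", "b", "b", "a", "b", "b", "b"]]
medium = [["a", "a", "b", "b"], ["a", "b", "b", "b"]]
fine = [["a", "a"], ["b", "b"], ["a", "b"], ["b", "b"]]

# Remaining label uncertainty at each level, coarsest first.
levels = [conditional_entropy(g) for g in (coarse, medium, fine)]
print(levels)
```

In this toy setting the remaining uncertainty about the labels does not grow as the grouping becomes finer, which is the kind of monotone behaviour that a budget-driven refinement scheme would want to preserve.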

Original language: English
Title of host publication: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
Publisher: IEEE Computer Society
Pages: 45-50
Number of pages: 6
ISBN (Print): 9781479912926
Publication status: Published - 2013
Externally published: Yes
Event: 2013 IEEE International Conference on Big Data, Big Data 2013 - Santa Clara, CA, United States
Duration: 6 Oct 2013 to 9 Oct 2013

Publication series

Name: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013

Conference

Conference: 2013 IEEE International Conference on Big Data, Big Data 2013
Country/Territory: United States
City: Santa Clara, CA
Period: 6/10/13 to 9/10/13

Keywords

  • R-tree
  • elastic data mining algorithms
  • entropy
  • quality monotonicity
