Elastic algorithms for guaranteeing quality monotonicity in big data mining

Rui Han, Lei Nie, Moustafa M. Ghanem, Yike Guo*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

When mining large data volumes in big data applications, users are typically willing to use algorithms that produce acceptable approximate results satisfying the given resource and time constraints. Two key challenges arise when designing such algorithms. The first relates to reasoning about tradeoffs between the quality of data mining output (e.g., prediction accuracy for classification tasks) and the available resource and time budgets. The second is organizing the computation of the algorithm to guarantee that result quality improves as more budget is used. Little work has addressed these two challenges together in a generic way. In this paper, we propose a novel framework for developing elastic big data mining algorithms. Based on Shannon's entropy, an information-theoretic approach is introduced to reason about how result quality is affected by the allocated budget. This is then used to guide the development of algorithms that adapt to the available time budget while guaranteeing better-quality results as more budget is used. We demonstrate the application of the framework by developing elastic k-Nearest Neighbour (kNN) classification and collaborative filtering (CF) recommendation algorithms as two examples. The core of both elastic algorithms is to use a naïve kNN classification or CF algorithm over R-tree data structures that successively approximate the entire datasets. Experimental evaluation was performed on real datasets using prediction accuracy as the quality metric. The results show that the elastic mining algorithms do, in practice, produce results with a consistent increase in observable quality, i.e., prediction accuracy.
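As a rough illustration of the idea described in the abstract, the following Python sketch runs a naïve kNN classifier over successively finer summaries of a small 2-D dataset and computes Shannon's entropy of a label set. It is not the authors' implementation and makes several assumptions: a uniform grid stands in for one level of an R-tree, the budget is counted in refinement levels rather than time, and all function names (`shannon_entropy`, `summarize`, `knn_predict`, `elastic_knn`) are hypothetical. Unlike the framework in the paper, this sketch makes no guarantee that quality increases monotonically with budget.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def summarize(points, labels, cell):
    """Group 2-D points into square grid cells of side `cell` and return one
    (centroid, majority_label) pair per non-empty cell.
    Assumption: the grid is a simple stand-in for one R-tree level."""
    groups = {}
    for (x, y), lab in zip(points, labels):
        key = (math.floor(x / cell), math.floor(y / cell))
        groups.setdefault(key, []).append(((x, y), lab))
    summary = []
    for members in groups.values():
        cx = sum(p[0] for p, _ in members) / len(members)
        cy = sum(p[1] for p, _ in members) / len(members)
        majority = Counter(lab for _, lab in members).most_common(1)[0][0]
        summary.append(((cx, cy), majority))
    return summary


def knn_predict(reps, query, k):
    """Naive kNN: majority label among the k representatives nearest to query."""
    nearest = sorted(reps, key=lambda r: math.dist(r[0], query))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def elastic_knn(points, labels, query, k, cell_sizes, budget):
    """Classify `query` on successively finer summaries.
    `cell_sizes` lists grid cell sizes from coarse to fine; `budget` is the
    number of levels that may be processed (an assumption, not a time limit).
    Returns one prediction per processed level; the last is the final answer."""
    predictions = []
    for cell in cell_sizes[:budget]:
        reps = summarize(points, labels, cell)
        predictions.append(knn_predict(reps, query, min(k, len(reps))))
    return predictions


points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.8, 0.8)]
labels = ["a", "a", "b", "b"]
# A larger budget lets the classifier use a finer approximation of the data.
coarse_only = elastic_knn(points, labels, (0.85, 0.85), 1, [2.0, 0.5], budget=1)
refined = elastic_knn(points, labels, (0.85, 0.85), 1, [2.0, 0.5], budget=2)
```

In this toy run the coarse level collapses all points into a single representative, so the refined level is the first one that can separate the two classes around the query point.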

Original language: English
Title of host publication: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
Publisher: IEEE Computer Society
Pages: 45-50
Number of pages: 6
ISBN (Print): 9781479912926
DOI
Publication status: Published - 2013
Externally published
Event: 2013 IEEE International Conference on Big Data, Big Data 2013 - Santa Clara, CA, United States
Duration: 6 Oct 2013 → 9 Oct 2013

Publication series

Name: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013

Conference

Conference: 2013 IEEE International Conference on Big Data, Big Data 2013
Country/Territory: United States
Santa Clara, CA
Period: 6/10/13 → 9/10/13
