AccurateML: Information-aggregation-based approximate processing for fast and accurate machine learning on MapReduce

Rui Han, Fan Zhang, Zhentao Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-reviewed

4 citations (Scopus)

Abstract

The growing demand for processing massive datasets has driven a strong trend toward running machine learning applications on MapReduce. When processing large input data, it is often more valuable to produce fast, sufficiently accurate approximate results than slow exact ones. Existing techniques produce approximate results by processing only part of the input data. They therefore incur large accuracy losses when job execution times are short, because all of the skipped input data can potentially contribute to result accuracy. We address this limitation with AccurateML, which aggregates the information of the input data in each map task into small aggregated data points. These aggregated points let every map task produce initial outputs quickly, saving computation time, and shrink the outputs, reducing communication time. Our approach further identifies the parts of the input data most related to result accuracy and uses these parts first to refine the produced outputs, minimizing accuracy losses. We evaluated AccurateML using real machine learning applications and datasets. The results show that (i) it reduces execution times by a factor of 30 with small accuracy losses compared to exact results, and (ii) for the same execution time, it reduces accuracy losses by a factor of 2.71 compared to existing approximate processing techniques.
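The following minimal Python sketch illustrates the two ideas in the abstract for a single k-means map task. It is not AccurateML's actual implementation. The sketch makes several assumptions of its own:
- Grid cells are used to form the aggregated points.
- A group's "spread" (the largest distance from a member to the group mean) serves as the score for "most related to result accuracy".
- The names aggregate, map_task, cell_size and refine_budget are hypothetical.

```python
import math
from collections import defaultdict


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def aggregate(points, cell_size):
    """Hypothetical aggregation: group points by grid cell and summarize each group.

    Returns a list of (mean, count, spread, members) tuples, where spread is the
    largest distance from any member to the group mean.
    """
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)
    groups = []
    for members in cells.values():
        n = len(members)
        mean = tuple(sum(coord) / n for coord in zip(*members))
        spread = max(math.dist(m, mean) for m in members)
        groups.append((mean, n, spread, members))
    return groups


def map_task(points, centroids, cell_size, refine_budget):
    """One k-means map step: per-centroid partial sums and counts.

    The refine_budget groups with the largest spread are processed point by
    point; every other group contributes only its aggregated point, weighted
    by its size.
    """
    groups = aggregate(points, cell_size)
    # Assumed accuracy heuristic: widely spread groups are the most likely to
    # straddle a cluster boundary, so they are refined first.
    groups.sort(key=lambda g: g[2], reverse=True)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)

    def add(p, weight):
        j = nearest(p, centroids)
        for d in range(dim):
            sums[j][d] += weight * p[d]
        counts[j] += weight

    for rank, (mean, n, _spread, members) in enumerate(groups):
        if rank < refine_budget:
            for p in members:
                add(p, 1)  # refined: use the raw points
        else:
            add(mean, n)  # approximate: one weighted aggregated point
    return sums, counts


if __name__ == "__main__":
    pts = [(0.1, 0.2), (0.3, 0.1), (5.0, 5.1), (5.2, 4.9), (2.0, 2.0), (2.95, 2.9)]
    cents = [(0.0, 0.0), (5.0, 5.0)]
    print(map_task(pts, cents, 1.0, 0)[1])  # aggregated only: [4, 2]
    print(map_task(pts, cents, 1.0, 1)[1])  # widest group refined: [3, 3]
```

With refine_budget=0, the map task emits outputs computed only from aggregated points, so it touches fewer items and emits the same small per-centroid summary. Raising the budget replaces aggregated points with raw points, starting with the highest-spread groups. Once every group is refined, the counts match the exact map-side result, and the sums match up to floating-point rounding. In the example, the one group that straddles both clusters is refined first, which already fixes the counts.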

Original language: English
Host publication title: INFOCOM 2017 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9781509053360
DOI
Publication status: Published - 2 Oct 2017
Externally published: Yes
Event: 2017 IEEE Conference on Computer Communications, INFOCOM 2017 - Atlanta, United States
Duration: 1 May 2017 to 4 May 2017

Publication series

Name: Proceedings - IEEE INFOCOM
ISSN (print): 0743-166X

Conference

Conference: 2017 IEEE Conference on Computer Communications, INFOCOM 2017
Country/Territory: United States
City: Atlanta
Period: 1/05/17 to 4/05/17
