
Cost-Effective Big Data Mining in the Cloud: A Case Study with K-means

  • Qiang He
  • Xiaodong Zhu
  • Dongwei Li
  • Shuliang Wang
  • Jun Shen
  • Yun Yang
  • Swinburne University of Technology
  • University of Shanghai for Science and Technology
  • Beijing Institute of Technology
  • University of Wollongong
  • Anhui University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Mining big data often requires tremendous computational resources, which has become a major obstacle to the broad application of big data analytics. Cloud computing allows data scientists to access computational resources on demand for building their big data analytics solutions in the cloud. However, the monetary cost of mining big data in the cloud can still be unexpectedly high. For example, running 100 m4-xlarge Amazon EC2 instances for a month costs approximately $17,495.00. It is therefore critical to analyze the cost effectiveness of big data mining in the cloud, i.e., how to achieve a sufficiently satisfactory result at the lowest possible computation cost. In certain big data mining scenarios, 100% accuracy is unnecessary. Instead, it is often preferable to achieve sufficient accuracy, e.g., 99%, at a much lower cost, e.g., 10% of the cost of achieving 100% accuracy. In this paper, we explore and demonstrate the cost effectiveness of big data mining with a case study using the well-known k-means algorithm. In the case study, we find that achieving 99% accuracy requires only 0.32%-46.17% of the computation cost of achieving 100% accuracy. This finding lays the cornerstone for cost-effective big data mining in a variety of domains.
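The accuracy-versus-cost trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' experimental setup: the abstract does not say how accuracy or computation cost is measured, so the sketch assumes that accuracy is the fraction of points whose cluster assignment already matches the fully converged run, and that cost is the fraction of Lloyd iterations spent. The function names (`make_blobs`, `kmeans_history`, `accuracy_vs_cost`, `cheapest_cost_for`) and the synthetic data are hypothetical.

```python
import random


def make_blobs(n_per_cluster=200, seed=1):
    """Synthetic 2-D data: three Gaussian clusters (illustrative only)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 6.0)]
    return [
        (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in centers
        for _ in range(n_per_cluster)
    ]


def kmeans_history(points, k, seed=0, max_iter=100):
    """Run Lloyd's k-means and return the assignments after each iteration."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k random data points
    history = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        history.append(assign)
        if len(history) > 1 and assign == history[-2]:
            break  # assignments unchanged, so the run has converged
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return history


def accuracy_vs_cost(history):
    """Pair each iteration's cost fraction with its agreement with the final result.

    Assumption: "accuracy" means agreement with the converged assignment,
    and "cost" means the share of iterations used so far.
    """
    final = history[-1]
    total = len(history)
    return [
        ((t + 1) / total, sum(a == b for a, b in zip(assign, final)) / len(final))
        for t, assign in enumerate(history)
    ]


def cheapest_cost_for(curve, target):
    """Smallest cost fraction whose accuracy reaches the target."""
    return next(cost for cost, acc in curve if acc >= target)


if __name__ == "__main__":
    curve = accuracy_vs_cost(kmeans_history(make_blobs(), k=3))
    for cost, acc in curve:
        print(f"cost {cost:6.1%}  accuracy {acc:6.1%}")
    print("cost fraction for 99% accuracy:", cheapest_cost_for(curve, 0.99))
```

On well-separated data such as this, most points tend to settle into their final cluster early and the later iterations move only a few boundary points. That is the intuition behind stopping before full convergence. The actual savings depend on the dataset and the initialisation.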

Original language: English
Title of host publication: Proceedings - 2017 IEEE 10th International Conference on Cloud Computing, CLOUD 2017
Editors: Geoffrey C. Fox
Publisher: IEEE Computer Society
Pages: 74-81
Number of pages: 8
ISBN (Electronic): 9781538619933
DOIs
Publication status: Published - 8 Sept 2017
Event: 10th IEEE International Conference on Cloud Computing, CLOUD 2017 - Honolulu, United States
Duration: 25 Jun 2017 to 30 Jun 2017

Publication series

Name: IEEE International Conference on Cloud Computing, CLOUD
Volume: 2017-June
ISSN (Print): 2159-6182
ISSN (Electronic): 2159-6190

Conference

Conference: 10th IEEE International Conference on Cloud Computing, CLOUD 2017
Country/Territory: United States
City: Honolulu
Period: 25/06/17 to 30/06/17

Keywords

  • Big Data
  • Cloud Computing
  • Cost-Effective
  • Data Mining
  • K-Means
