TY - GEN
T1 - Cost-Effective Big Data Mining in the Cloud
T2 - 10th IEEE International Conference on Cloud Computing, CLOUD 2017
AU - He, Qiang
AU - Zhu, Xiaodong
AU - Li, Dongwei
AU - Wang, Shuliang
AU - Shen, Jun
AU - Yang, Yun
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/9/8
Y1 - 2017/9/8
N2 - Mining big data often requires tremendous computational resources. This has become a major obstacle to broad applications of big data analytics. Cloud computing allows data scientists to access computational resources on demand for building their big data analytics solutions in the cloud. However, the monetary cost of mining big data in the cloud can still be unexpectedly high. For example, running 100 m4.xlarge Amazon EC2 instances for a month costs approximately $17,495.00. It is therefore critical to analyze the cost effectiveness of big data mining in the cloud, i.e., how to achieve a sufficiently satisfactory result at the lowest possible computation cost. In certain big data mining scenarios, 100% accuracy is unnecessary. Instead, it is often preferable to achieve a sufficient accuracy, e.g., 99%, at a much lower cost, e.g., 10% of the cost of achieving 100% accuracy. In this paper, we explore and demonstrate the cost effectiveness of big data mining with a case study using the well-known k-means algorithm. In the case study, we find that achieving 99% accuracy requires only 0.32%-46.17% of the computation cost of achieving 100% accuracy. This finding lays the cornerstone for cost-effective big data mining in a variety of domains.
KW - Big Data
KW - Cloud Computing
KW - Cost-Effective
KW - Data Mining
KW - K-Means
UR - https://www.scopus.com/pages/publications/85032196042
U2 - 10.1109/CLOUD.2017.124
DO - 10.1109/CLOUD.2017.124
M3 - Conference contribution
AN - SCOPUS:85032196042
T3 - IEEE International Conference on Cloud Computing, CLOUD
SP - 74
EP - 81
BT - Proceedings - 2017 IEEE 10th International Conference on Cloud Computing, CLOUD 2017
A2 - Fox, Geoffrey C.
PB - IEEE Computer Society
Y2 - 25 June 2017 through 30 June 2017
ER -