TY - GEN
T1 - Enabling Differentially Private in Big Data Machine Learning
AU - Li, Dong
AU - Zuo, Xiaojiang
AU - Han, Rui
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - Using machine learning to explore the potential value of Big Data brings us into a smarter world, but mining data through data-sharing patterns also threatens the privacy of personal data. Differential privacy is a prevalent mechanism for protecting personal data privacy because of its strict and provable privacy definition. Although several results have been achieved by combining differential privacy with traditional machine learning algorithms in a stand-alone mode, little has been said about the distributed environment. To fill this gap, this paper proposes a method to embed the differential privacy mechanism into a distributed platform and implements DPLloyd, GUPT k-means, and GUPT logistic regression on Spark. The evaluation demonstrates that the approach barely interferes with the effectiveness of the distributed machine learning algorithms while achieving the goal of differential privacy.
AB - Using machine learning to explore the potential value of Big Data brings us into a smarter world, but mining data through data-sharing patterns also threatens the privacy of personal data. Differential privacy is a prevalent mechanism for protecting personal data privacy because of its strict and provable privacy definition. Although several results have been achieved by combining differential privacy with traditional machine learning algorithms in a stand-alone mode, little has been said about the distributed environment. To fill this gap, this paper proposes a method to embed the differential privacy mechanism into a distributed platform and implements DPLloyd, GUPT k-means, and GUPT logistic regression on Spark. The evaluation demonstrates that the approach barely interferes with the effectiveness of the distributed machine learning algorithms while achieving the goal of differential privacy.
KW - Spark MLlib
KW - big data
KW - differential privacy
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85091906134&partnerID=8YFLogxK
U2 - 10.1109/ICSIDP47821.2019.9173114
DO - 10.1109/ICSIDP47821.2019.9173114
M3 - Conference contribution
AN - SCOPUS:85091906134
T3 - ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019
BT - ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019
Y2 - 11 December 2019 through 13 December 2019
ER -