Enabling Differentially Private in Big Data Machine Learning

Dong Li, Xiaojiang Zuo, Rui Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Using machine learning to explore the potential value of big data is bringing us into a smarter world, but mining data through data-sharing patterns also threatens the privacy of personal data. Differential privacy is a prevalent mechanism for protecting personal data because of its strict and provable privacy definition. Although several results have been achieved by combining differential privacy with traditional machine learning algorithms in stand-alone mode, little has been said about distributed environments. To fill this gap, this paper proposes a method for embedding the differential privacy mechanism into a distributed platform and implements DPLloyd, GUPT k-means, and GUPT logistic regression on Spark. The evaluation demonstrates that the approach barely interferes with the effectiveness of the distributed machine learning algorithms while achieving the goal of differential privacy.
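The abstract names DPLloyd without describing it. As a rough illustration only, and not the authors' Spark implementation, the sketch below shows one Lloyd (k-means) update in which Laplace noise is added to each cluster's point count and coordinate sums before the centroid is recomputed. The function names, the assumption that every coordinate lies in [-r, r], the sensitivity value d * r + 1, and the per-step budget `epsilon_step` are all assumptions of this sketch; it uses plain Python rather than Spark to stay self-contained.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def noisy_lloyd_step(points, centroids, epsilon_step, r, rng):
    """One Lloyd update with Laplace noise on per-cluster counts and sums.

    Assumption of this sketch: every coordinate lies in [-r, r]. Adding or
    removing one point then changes one count by 1 and one sum vector by at
    most d * r in L1 norm, so the noise scale is (d * r + 1) / epsilon_step.
    """
    d = len(centroids[0])
    scale = (d * r + 1) / epsilon_step

    # Exact (non-private) per-cluster counts and coordinate sums.
    counts = [0.0] * len(centroids)
    sums = [[0.0] * d for _ in centroids]
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]

    new_centroids = []
    for j in range(len(centroids)):
        noisy_count = counts[j] + laplace_noise(scale, rng)
        if noisy_count < 1.0:
            # Too little (noisy) mass to divide by: keep the previous centroid.
            new_centroids.append(list(centroids[j]))
            continue
        noisy_mean = [
            (sums[j][i] + laplace_noise(scale, rng)) / noisy_count
            for i in range(d)
        ]
        # Clamping to the data domain is post-processing of the noisy output.
        new_centroids.append([max(-r, min(r, x)) for x in noisy_mean])
    return new_centroids
```

Running several such steps would split the total privacy budget across iterations, for example by passing `epsilon / T` as `epsilon_step` for `T` iterations. In a distributed setting the per-cluster counts and sums would be aggregated across partitions before the noise is added, but that part is left out here.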

Original language: English
Title of host publication: ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728123455
DOIs
Publication status: Published - Dec 2019
Event: 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019 - Chongqing, China
Duration: 11 Dec 2019 → 13 Dec 2019

Publication series

Name: ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019

Conference

Conference: 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019
Country/Territory: China
City: Chongqing
Period: 11/12/19 → 13/12/19

Keywords

  • Spark MLlib
  • big data
  • differential privacy
  • machine learning
