TY - JOUR
T1 - Federated Data Cleaning
T2 - Collaborative and Privacy-Preserving Data Cleaning for Edge Intelligence
AU - Ma, Lichuan
AU - Pei, Qingqi
AU - Zhou, Lu
AU - Zhu, Haojin
AU - Wang, Licheng
AU - Ji, Yusheng
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2021/4/15
Y1 - 2021/4/15
N2 - As an important driving factor of emerging Internet-of-Things (IoT) applications, machine learning algorithms are currently facing the challenge of how to 'clean' data noise, that is introduced during the training process (e.g., asynchronous execution and lossy data compression and quantization). In an attempt to guarantee data quality, various data cleaning approaches have been proposed to filter out abnormal data entries based on the global data distribution. However, most existing data cleaning approaches are based on a centralized paradigm and thus cannot be applied to future edge-based IoT applications, where each edge node (EN) has only a limited view of the global data distribution. Moreover, the increasing demand for privacy preservation largely prevents ENs from combining their data for centralized cleaning. In this study, we propose a federated data cleaning protocol, coined as FedClean, for edge intelligence (EI) scenarios that is designed to achieve data cleaning without compromising data privacy. More specifically, different ENs first generate Boolean shares of their data and distribute them to two noncolluding servers. These two servers then run the FedClean protocol to privately and efficiently compute the attribute value frequency (AVF) scores of the collected data entries, which are then sorted in ascending order via a bitonic sorting network without revealing their values. As a result, data entries with lower AVF scores are considered as abnormal and filtered out. The security, efficiency, and effectiveness of the proposed approach are then demonstrated via concrete security analysis and comprehensive experiments.
AB - As an important driving factor of emerging Internet-of-Things (IoT) applications, machine learning algorithms are currently facing the challenge of how to 'clean' data noise, that is introduced during the training process (e.g., asynchronous execution and lossy data compression and quantization). In an attempt to guarantee data quality, various data cleaning approaches have been proposed to filter out abnormal data entries based on the global data distribution. However, most existing data cleaning approaches are based on a centralized paradigm and thus cannot be applied to future edge-based IoT applications, where each edge node (EN) has only a limited view of the global data distribution. Moreover, the increasing demand for privacy preservation largely prevents ENs from combining their data for centralized cleaning. In this study, we propose a federated data cleaning protocol, coined as FedClean, for edge intelligence (EI) scenarios that is designed to achieve data cleaning without compromising data privacy. More specifically, different ENs first generate Boolean shares of their data and distribute them to two noncolluding servers. These two servers then run the FedClean protocol to privately and efficiently compute the attribute value frequency (AVF) scores of the collected data entries, which are then sorted in ascending order via a bitonic sorting network without revealing their values. As a result, data entries with lower AVF scores are considered as abnormal and filtered out. The security, efficiency, and effectiveness of the proposed approach are then demonstrated via concrete security analysis and comprehensive experiments.
KW - Data cleaning
KW - edge intelligence (EI)
KW - privacy preserving
UR - https://www.scopus.com/pages/publications/85100552503
U2 - 10.1109/JIOT.2020.3027980
DO - 10.1109/JIOT.2020.3027980
M3 - Article
AN - SCOPUS:85100552503
SN - 2327-4662
VL - 8
SP - 6757
EP - 6770
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 8
M1 - 9210000
ER -