TY - GEN
T1 - Privacy-preserving outlier detection with high efficiency over distributed datasets
AU - Lu, Guanghong
AU - Duan, Chunhui
AU - Zhou, Guohao
AU - Ding, Xuan
AU - Liu, Yunhao
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/5/10
Y1 - 2021/5/10
N2 - The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, and health diagnosis. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve this goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it to work in distributed environments. With a series of carefully designed algorithms, the participating parties collaborate to build an ensemble of isolation trees efficiently without disclosing sensitive information about their data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve an AUC comparable to that of the existing iForest on average and maintain linear time complexity without privacy violation.
AB - The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, and health diagnosis. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve this goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it to work in distributed environments. With a series of carefully designed algorithms, the participating parties collaborate to build an ensemble of isolation trees efficiently without disclosing sensitive information about their data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve an AUC comparable to that of the existing iForest on average and maintain linear time complexity without privacy violation.
KW - Distributed data
KW - Outlier detection
KW - PIF
KW - Privacy-preserving
UR - http://www.scopus.com/inward/record.url?scp=85111919183&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM42981.2021.9488710
DO - 10.1109/INFOCOM42981.2021.9488710
M3 - Conference contribution
AN - SCOPUS:85111919183
T3 - Proceedings - IEEE INFOCOM
BT - INFOCOM 2021 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 40th IEEE Conference on Computer Communications, INFOCOM 2021
Y2 - 10 May 2021 through 13 May 2021
ER -