Privacy-preserving outlier detection with high efficiency over distributed datasets

Guanghong Lu, Chunhui Duan*, Guohao Zhou, Xuan Ding, Yunhao Liu

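*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

4 Citations (Scopus)

Abstract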

The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, health diagnosis, etc. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve the goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it in distributed environments. With a series of carefully-designed algorithms, each participating party collaborates to build an ensemble of isolation trees efficiently without disclosing sensitive information of data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve comparable AUC to existing iForest on average and maintains a linear time complexity without privacy violation.

Original language: English
Title of host publication: INFOCOM 2021 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9780738112817
DOI
Publication status: Published - 10 May 2021
Externally published
Event: 40th IEEE Conference on Computer Communications, INFOCOM 2021 - Vancouver, Canada
Duration: 10 May 2021 to 13 May 2021

Publication series

Name: Proceedings - IEEE INFOCOM
2021-May
ISSN (Print): 0743-166X

Conference

Conference: 40th IEEE Conference on Computer Communications, INFOCOM 2021
Country/Territory: Canada
Vancouver
Period: 10/05/21 to 13/05/21
