Privacy-preserving outlier detection with high efficiency over distributed datasets

Guanghong Lu; Chunhui Duan; Guohao Zhou; Xuan Ding; Yunhao Liu

doi:10.1109/INFOCOM42981.2021.9488710

Guanghong Lu, Chunhui Duan*, Guohao Zhou, Xuan Ding, Yunhao Liu

*Corresponding author for this work

Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, health diagnosis, etc. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve the goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it in distributed environments. With a series of carefully-designed algorithms, each participating party collaborates to build an ensemble of isolation trees efficiently without disclosing sensitive information of data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve comparable AUC to existing iForest on average and maintains a linear time complexity without privacy violation.
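For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal single-machine sketch of the standard iForest idea: points that random axis-aligned splits isolate quickly receive a high anomaly score. It is not the PIF protocol; the abstract does not describe PIF's distributed, privacy-preserving steps, so none are reproduced here. All function names, parameters, and defaults are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # used to approximate harmonic numbers


def average_path_length(n):
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # common convention for the two-point case
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        # Simplification: stop if the chosen dimension cannot be split.
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for unsplit leaf size."""
    if "size" in tree:
        return depth + average_path_length(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build an ensemble of isolation trees; data needs at least 2 points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [
        build_tree(rng.sample(data, psi), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, psi


def anomaly_score(forest, x):
    """Score in (0, 1]; values closer to 1 indicate likelier outliers."""
    trees, psi = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(psi))
```

A typical use is to call `build_forest` on a list of equal-length numeric tuples and then rank points by `anomaly_score`. In a distributed setting such as the one the abstract targets, the split selection and partitioning steps are where parties would have to cooperate without revealing raw values; this sketch makes no attempt at that.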

Original language	English
Title of host publication	INFOCOM 2021 - IEEE Conference on Computer Communications
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9780738112817
DOI	https://doi.org/10.1109/INFOCOM42981.2021.9488710
Publication status	Published - 10 May 2021
Externally published	Yes
Event	40th IEEE Conference on Computer Communications, INFOCOM 2021 - Vancouver, Canada; Duration: 10 May 2021 → 13 May 2021

Publication series

Name	Proceedings - IEEE INFOCOM
Volume	2021-May
ISSN (Print)	0743-166X

Conference

Conference	40th IEEE Conference on Computer Communications, INFOCOM 2021
Country/Territory	Canada
City	Vancouver
Period	10/05/21 → 13/05/21

Cite this

@inproceedings{8e5fd0eb1f424885979df4e6cec038bb,

title = "Privacy-preserving outlier detection with high efficiency over distributed datasets",

abstract = "The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, health diagnosis, etc. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve the goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it in distributed environments. With a series of carefully-designed algorithms, each participating party collaborates to build an ensemble of isolation trees efficiently without disclosing sensitive information of data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve comparable AUC to existing iForest on average and maintains a linear time complexity without privacy violation.",

keywords = "Distributed data, Outlier detection, PIF, Privacy-preserving",

author = "Guanghong Lu and Chunhui Duan and Guohao Zhou and Xuan Ding and Yunhao Liu",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.; 40th IEEE Conference on Computer Communications, INFOCOM 2021 ; Conference date: 10-05-2021 Through 13-05-2021",

year = "2021",

month = may,

day = "10",

doi = "10.1109/INFOCOM42981.2021.9488710",

language = "English",

series = "Proceedings - IEEE INFOCOM",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "INFOCOM 2021 - IEEE Conference on Computer Communications",

address = "United States",

}

Lu, G, Duan, C, Zhou, G, Ding, X & Liu, Y 2021, Privacy-preserving outlier detection with high efficiency over distributed datasets. in INFOCOM 2021 - IEEE Conference on Computer Communications., 9488710, Proceedings - IEEE INFOCOM, vol. 2021-May, Institute of Electrical and Electronics Engineers Inc., 40th IEEE Conference on Computer Communications, INFOCOM 2021, Vancouver, Canada, 10/05/21. https://doi.org/10.1109/INFOCOM42981.2021.9488710

Privacy-preserving outlier detection with high efficiency over distributed datasets. / Lu, Guanghong; Duan, Chunhui; Zhou, Guohao et al.
INFOCOM 2021 - IEEE Conference on Computer Communications. Institute of Electrical and Electronics Engineers Inc., 2021. 9488710 (Proceedings - IEEE INFOCOM; Vol. 2021-May).

TY - GEN

T1 - Privacy-preserving outlier detection with high efficiency over distributed datasets

AU - Lu, Guanghong

AU - Duan, Chunhui

AU - Zhou, Guohao

AU - Ding, Xuan

AU - Liu, Yunhao

PY - 2021/5/10

Y1 - 2021/5/10

N2 - The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, health diagnosis, etc. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve the goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it in distributed environments. With a series of carefully-designed algorithms, each participating party collaborates to build an ensemble of isolation trees efficiently without disclosing sensitive information of data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve comparable AUC to existing iForest on average and maintains a linear time complexity without privacy violation.

AB - The ability to detect outliers is crucial in data mining, with widespread usage in many fields, including fraud detection, malicious behavior monitoring, health diagnosis, etc. With the tremendous volume of data becoming more distributed than ever, global outlier detection for a group of distributed datasets is particularly desirable. In this work, we propose PIF (Privacy-preserving Isolation Forest), which can detect outliers for multiple distributed data providers with high efficiency and accuracy while giving certain security guarantees. To achieve the goal, PIF makes an innovative improvement to the traditional iForest algorithm, enabling it in distributed environments. With a series of carefully-designed algorithms, each participating party collaborates to build an ensemble of isolation trees efficiently without disclosing sensitive information of data. Besides, to deal with complicated real-world scenarios where different kinds of partitioned data are involved, we propose a comprehensive schema that can work for both horizontally and vertically partitioned data models. We have implemented our method and evaluated it with extensive experiments. It is demonstrated that PIF can achieve comparable AUC to existing iForest on average and maintains a linear time complexity without privacy violation.

KW - Distributed data

KW - Outlier detection

KW - PIF

KW - Privacy-preserving

UR - http://www.scopus.com/inward/record.url?scp=85111919183&partnerID=8YFLogxK

U2 - 10.1109/INFOCOM42981.2021.9488710

DO - 10.1109/INFOCOM42981.2021.9488710

M3 - Conference contribution

AN - SCOPUS:85111919183

T3 - Proceedings - IEEE INFOCOM

BT - INFOCOM 2021 - IEEE Conference on Computer Communications

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 40th IEEE Conference on Computer Communications, INFOCOM 2021

Y2 - 10 May 2021 through 13 May 2021

ER -
