Searching Density-Increasing Path to Local Density Peaks for Unsupervised Anomaly Detection

Jiachen Zhao, Fang Deng*, Jiaqi Zhu, Jie Chen

doi:10.1109/TBDATA.2023.3265509

*Corresponding author for this work

School of Automation

Research output: Contribution to journal › Article › Peer-review

2 Citations (Scopus)

Abstract

Unsupervised anomaly detection (AD) is a challenging problem in the data mining community. Clustering-based AD methods aim to group normal data points into clusters and then regard a point belonging to none of the clusters as an anomaly. However, they may suffer from the problems of unknown cluster numbers and arbitrary cluster shapes. This paper presents a novel clustering-based AD method named Density-increasing Path (DIP) to tackle these challenges. DIP searches a path for each data point. The path starts at the data point itself, passes through several points with monotonically increasing densities, and ends at a density peak. Further, DIP defines the climbing difficulty of each path by combining the distance and density increment of each step along the path, which can be regarded as the anomaly score of the path starting point. DIP can adaptively decide the number of peaks to address the challenge of unknown cluster numbers. Since DIP requires the path to pass several points rather than directly reaching the peak, it handles arbitrary cluster shapes. We also propose the ensemble DIP to improve prediction accuracy. The experimental results on four synthetic datasets and eleven real-world benchmarks demonstrate that DIP outperforms existing methods.
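The abstract describes the scoring procedure only in words. The Python sketch below illustrates the general idea under assumptions that do not come from the paper: density is taken as the inverse of the mean distance to the k nearest neighbours, each step moves to the nearest of those neighbours with strictly higher density, a point with no such neighbour counts as a local density peak, and the cost of a step is its length multiplied by the log-ratio of the two densities. The names `knn`, `dip_scores` and `ensemble_dip_scores`, the parameter `k`, and the averaging ensemble over several `k` values are all hypothetical; the paper's actual definitions of density, path search, climbing difficulty and ensembling may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def knn(points: List[Point], k: int) -> List[List[Tuple[float, int]]]:
    """Brute-force k nearest neighbours of each point, sorted by distance."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def dip_scores(points: List[Point], k: int = 3) -> List[float]:
    """Hypothetical DIP-style anomaly scores (higher = more anomalous)."""
    neigh = knn(points, k)
    # Assumption: density = 1 / mean distance to the k nearest neighbours.
    density = [1.0 / (sum(d for d, _ in nb) / k + 1e-12) for nb in neigh]

    # Assumption: step to the nearest kNN neighbour with strictly higher density.
    # Points with no such neighbour are local density peaks (path end points),
    # so the number of peaks is decided by the data rather than fixed in advance.
    nxt: List[Optional[int]] = [None] * len(points)
    for i, nb in enumerate(neigh):
        for _, j in nb:
            if density[j] > density[i]:
                nxt[i] = j
                break

    # Climbing difficulty: sum of step costs along the path to the peak.
    # Assumption: step cost = step length * log(density ratio), which is > 0.
    # Visiting points in decreasing density order guarantees that the score of
    # the next point on the path is already final when it is needed.
    score = [0.0] * len(points)
    for i in sorted(range(len(points)), key=lambda idx: -density[idx]):
        j = nxt[i]
        if j is not None:
            step = math.dist(points[i], points[j]) * math.log(density[j] / density[i])
            score[i] = step + score[j]
    return score


def ensemble_dip_scores(points: List[Point], ks: Sequence[int] = (2, 3)) -> List[float]:
    """Hypothetical ensemble: average of max-normalised scores over several k."""
    total = [0.0] * len(points)
    for k in ks:
        s = dip_scores(points, k)
        top = max(s) or 1.0  # avoid dividing by zero if every score is 0
        total = [t + v / top for t, v in zip(total, s)]
    return [t / len(ks) for t in total]


if __name__ == "__main__":
    # Two small square clusters plus one isolated point at index 10.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (5, 20)]
    s = dip_scores(pts, k=3)
    print(max(range(len(pts)), key=s.__getitem__))  # 10, the isolated point
```

Processing points from the densest to the sparsest turns the path search into a single pass with memoisation, so no path has to be walked more than once; because every step strictly increases density, no path can cycle.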

Original language	English
Pages (from-to)	1198-1209
Number of pages	12
Journal	IEEE Transactions on Big Data
Volume	9
Issue number	4
DOI	https://doi.org/10.1109/TBDATA.2023.3265509
Publication status	Published - 1 Aug 2023


Cite this

@article{74cd1f95b74f4112ab297fa56028de52,
  title = "Searching Density-Increasing Path to Local Density Peaks for Unsupervised Anomaly Detection",
  abstract = "Unsupervised anomaly detection (AD) is a challenging problem in the data mining community. Clustering-based AD methods aim to group normal data points into clusters and then regard a point belonging to none of the clusters as an anomaly. However, they may suffer from the problems of unknown cluster numbers and arbitrary cluster shapes. This paper presents a novel clustering-based AD method named Density-increasing Path (DIP) to tackle these challenges. DIP searches a path for each data point. The path starts at the data point itself, passes through several points with monotonically increasing densities, and ends at a density peak. Further, DIP defines the climbing difficulty of each path by combining the distance and density increment of each step along the path, which can be regarded as the anomaly score of the path starting point. DIP can adaptively decide the number of peaks to address the challenge of unknown cluster numbers. Since DIP requires the path to pass several points rather than directly reaching the peak, it handles arbitrary cluster shapes. We also propose the ensemble DIP to improve prediction accuracy. The experimental results on four synthetic datasets and eleven real-world benchmarks demonstrate that DIP outperforms existing methods.",
  keywords = "Anomaly detection, clustering analysis, imbalance learning, outlier detection",
  author = "Jiachen Zhao and Fang Deng and Jiaqi Zhu and Jie Chen",
  note = "Publisher Copyright: {\textcopyright} 2023 IEEE.",
  year = "2023",
  month = aug,
  day = "1",
  doi = "10.1109/TBDATA.2023.3265509",
  language = "English",
  volume = "9",
  pages = "1198--1209",
  journal = "IEEE Transactions on Big Data",
  issn = "2332-7790",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "4",
}

TY  - JOUR
T1  - Searching Density-Increasing Path to Local Density Peaks for Unsupervised Anomaly Detection
AU  - Zhao, Jiachen
AU  - Deng, Fang
AU  - Zhu, Jiaqi
AU  - Chen, Jie
PY  - 2023/8/1
Y1  - 2023/8/1
N2  - Unsupervised anomaly detection (AD) is a challenging problem in the data mining community. Clustering-based AD methods aim to group normal data points into clusters and then regard a point belonging to none of the clusters as an anomaly. However, they may suffer from the problems of unknown cluster numbers and arbitrary cluster shapes. This paper presents a novel clustering-based AD method named Density-increasing Path (DIP) to tackle these challenges. DIP searches a path for each data point. The path starts at the data point itself, passes through several points with monotonically increasing densities, and ends at a density peak. Further, DIP defines the climbing difficulty of each path by combining the distance and density increment of each step along the path, which can be regarded as the anomaly score of the path starting point. DIP can adaptively decide the number of peaks to address the challenge of unknown cluster numbers. Since DIP requires the path to pass several points rather than directly reaching the peak, it handles arbitrary cluster shapes. We also propose the ensemble DIP to improve prediction accuracy. The experimental results on four synthetic datasets and eleven real-world benchmarks demonstrate that DIP outperforms existing methods.
AB  - Unsupervised anomaly detection (AD) is a challenging problem in the data mining community. Clustering-based AD methods aim to group normal data points into clusters and then regard a point belonging to none of the clusters as an anomaly. However, they may suffer from the problems of unknown cluster numbers and arbitrary cluster shapes. This paper presents a novel clustering-based AD method named Density-increasing Path (DIP) to tackle these challenges. DIP searches a path for each data point. The path starts at the data point itself, passes through several points with monotonically increasing densities, and ends at a density peak. Further, DIP defines the climbing difficulty of each path by combining the distance and density increment of each step along the path, which can be regarded as the anomaly score of the path starting point. DIP can adaptively decide the number of peaks to address the challenge of unknown cluster numbers. Since DIP requires the path to pass several points rather than directly reaching the peak, it handles arbitrary cluster shapes. We also propose the ensemble DIP to improve prediction accuracy. The experimental results on four synthetic datasets and eleven real-world benchmarks demonstrate that DIP outperforms existing methods.
KW  - Anomaly detection
KW  - clustering analysis
KW  - imbalance learning
KW  - outlier detection
UR  - http://www.scopus.com/inward/record.url?scp=85153515337&partnerID=8YFLogxK
U2  - 10.1109/TBDATA.2023.3265509
DO  - 10.1109/TBDATA.2023.3265509
M3  - Article
AN  - SCOPUS:85153515337
SN  - 2332-7790
VL  - 9
SP  - 1198
EP  - 1209
JO  - IEEE Transactions on Big Data
JF  - IEEE Transactions on Big Data
IS  - 4
ER  - 
