TY - JOUR
T1 - An efficient algorithm for distributed density-based outlier detection on big data
AU - Bai, Mei
AU - Wang, Xite
AU - Xin, Junchang
AU - Wang, Guoren
N1 - Publisher Copyright: © 2015 Elsevier B.V.
PY - 2016/3/12
Y1 - 2016/3/12
AB - Outlier detection is a popular topic in data management and multimedia analysis, with applications such as noisy image detection, credit card fraud detection, and network intrusion detection. The density-based outlier is an important definition of an outlier: it computes a Local Outlier Factor (LOF) for each tuple in a data set to represent the degree to which that tuple is an outlier. It offers several significant advantages over other existing definitions. This paper focuses on the problem of distributed density-based outlier detection for large-scale data. First, we propose a Grid-Based Partition algorithm (GBP) as a data preparation method. GBP first splits the data set into several grids and then allocates these grids to the datanodes in a distributed environment. Second, we propose a Distributed LOF Computing method (DLC) for detecting density-based outliers in parallel, which requires only a small amount of network communication. Finally, the efficiency and effectiveness of the proposed approaches are verified through a series of simulation experiments.
KW - Density-based outlier
KW - Distributed algorithm
KW - Local outlier factor
UR - http://www.scopus.com/inward/record.url?scp=84959036264&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2015.05.135
DO - 10.1016/j.neucom.2015.05.135
M3 - Article
AN - SCOPUS:84959036264
SN - 0925-2312
VL - 181
SP - 19
EP - 28
JO - Neurocomputing
JF - Neurocomputing
ER -