An efficient algorithm for distributed density-based outlier detection on big data

Mei Bai*, Xite Wang, Junchang Xin, Guoren Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

84 Citations (Scopus)

Abstract

Outlier detection is a popular problem in data management and multimedia analysis, with applications such as detection of noisy images, credit card fraud detection, and network intrusion detection. The density-based outlier is an important definition of an outlier: it computes a Local Outlier Factor (LOF) for each tuple in a data set, representing the degree to which that tuple is an outlier. This definition shows several significant advantages over other existing definitions. This paper focuses on the problem of distributed density-based outlier detection for large-scale data. First, we propose a Grid-Based Partition algorithm (GBP) as a data preparation method. GBP first splits the data set into several grids and then allocates these grids to the datanodes in a distributed environment. Second, we propose a Distributed LOF Computing method (DLC) that detects density-based outliers in parallel and needs only a small amount of network communication. Finally, the efficiency and effectiveness of the proposed approaches are verified through a series of simulation experiments.
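For readers unfamiliar with the two building blocks named in the abstract, the sketch below computes the standard single-machine LOF (k-distance, reachability distance, local reachability density, then the LOF ratio) by brute force, and shows one possible way to split data into grid cells and allocate them to nodes. It is not the authors' GBP or DLC: the function names `lof` and `grid_partition`, the fixed `cell_width`, and the greedy "largest cell to least-loaded node" allocation are all hypothetical choices made for illustration, since the abstract does not describe how grids are sized or allocated.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _neighborhoods(points: Sequence[Point], k: int):
    """Brute-force O(n^2) k-distance and k-distance neighborhood of every point.

    Points tied with the k-th nearest neighbor are included in the
    neighborhood, following the standard LOF definition.
    """
    k_dist: List[float] = []
    neigh: List[List[int]] = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        kd = dists[k - 1][0]
        k_dist.append(kd)
        neigh.append([j for d, j in dists if d <= kd])
    return k_dist, neigh


def lof(points: Sequence[Point], k: int) -> List[float]:
    """LOF score of every point; assumes len(points) > k and no duplicate points."""
    k_dist, neigh = _neighborhoods(points, k)
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i, nbrs in enumerate(neigh):
        reach = [max(k_dist[o], math.dist(points[i], points[o])) for o in nbrs]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [sum(lrd[o] for o in nbrs) / (len(nbrs) * lrd[i])
            for i, nbrs in enumerate(neigh)]


def grid_partition(points: Sequence[Point], cell_width: float, num_nodes: int):
    """HYPOTHETICAL grid split + allocation (not the paper's GBP).

    Buckets points into axis-aligned cells of side `cell_width`, then assigns
    cells to nodes, largest cell first, each to the currently least-loaded node.
    """
    cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / cell_width) for c in p)].append(idx)
    load = [0] * num_nodes
    assignment: Dict[Tuple[int, ...], int] = {}
    for cell, members in sorted(cells.items(), key=lambda kv: -len(kv[1])):
        node = min(range(num_nodes), key=load.__getitem__)
        assignment[cell] = node
        load[node] += len(members)
    return assignment, load
```

For example, `lof([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)], 2)` gives scores close to 1 for the five clustered points and a much larger score for `(10, 10)`. Note that a point near a cell boundary may have nearest neighbors in an adjacent cell, so a cell cannot in general be scored in complete isolation; the abstract does not say how DLC handles this beyond keeping network communication small.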

Original language: English
Pages (from-to): 19-28
Number of pages: 10
Journal: Neurocomputing
Volume: 181
Publication status: Published - 12 Mar 2016
Externally published: Yes

Keywords

  • Density-based outlier
  • Distributed algorithm
  • Local outlier factor

