
An efficient algorithm for distributed density-based outlier detection on big data

  • Mei Bai*
  • Xite Wang
  • Junchang Xin
  • Guoren Wang
  • *Corresponding author for this work
  • Northeastern University China

Research output: Contribution to journal › Article › peer-review

Abstract

Outlier detection is a popular topic in data management and multimedia analysis, with applications such as noisy image detection, credit card fraud detection, and network intrusion detection. The density-based outlier is an important outlier definition: a Local Outlier Factor (LOF) is computed for each tuple in a data set to represent the degree to which that tuple is an outlier. It offers several significant advantages compared with other existing definitions. This paper focuses on the problem of distributed density-based outlier detection for large-scale data. First, we propose a Grid-Based Partition algorithm (GBP) as a data preparation method. GBP first splits the data set into several grids and then allocates these grids to the datanodes in a distributed environment. Second, we propose a Distributed LOF Computing method (DLC) for detecting density-based outliers in parallel, which requires only a small amount of network communication. Finally, the efficiency and effectiveness of the proposed approaches are verified through a series of simulation experiments.
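To make the two ideas in the abstract concrete, the following is a minimal single-machine sketch in Python. It is not the paper's GBP or DLC: the abstract does not give their details, so `grid_partition` (with its round-robin allocation of cells to nodes) and `lof_scores` (a brute-force LOF following the standard definition of k-distance, reachability distance, and local reachability density) are hypothetical helpers written for illustration only.

```python
import math
from collections import defaultdict


def grid_partition(points, cell_size, num_nodes):
    """Bucket points into axis-aligned grid cells, then assign cells to nodes.

    Round-robin assignment is an assumption for illustration; the abstract
    does not describe how GBP actually allocates grids to datanodes.
    """
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(idx)
    nodes = defaultdict(list)
    for i, key in enumerate(sorted(cells)):
        nodes[i % num_nodes].extend(cells[key])
    return dict(cells), dict(nodes)


def lof_scores(points, k):
    """Brute-force LOF for every point (O(n^2) distances, single machine)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        # The k-distance neighborhood keeps every point tied at the k-distance.
        neighbors.append([j for j in order if dist[i][j] <= kd])
        k_dist.append(kd)

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neighbors[i])
        lrd.append(len(neighbors[i]) / reach if reach > 0 else math.inf)

    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            # Neighborhood consists only of duplicates; treat as an inlier.
            scores.append(1.0)
            continue
        ratios = [lrd[j] / lrd[i] for j in neighbors[i]]
        scores.append(sum(ratios) / len(ratios))
    return scores


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
cells, nodes = grid_partition(points, cell_size=5.0, num_nodes=2)
scores = lof_scores(points, k=2)
```

On this toy input the four corner points of the unit square each score 1.0, and the isolated point `(10, 10)` receives by far the largest LOF. A distributed method additionally has to handle points whose k nearest neighbors lie in a grid held by another node, which this sketch does not attempt.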

Original language: English
Pages (from-to): 19-28
Number of pages: 10
Journal: Neurocomputing
Volume: 181
DOI
Publication status: Published - 12 Mar 2016
Externally published: Yes
