Outlier detection over sliding windows for probabilistic data streams

Bin Wang*, Xiao Chun Yang, Guo Ren Wang, Ge Yu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Outlier detection is a useful technique in many applications where data are generally uncertain and can be described using probabilities. Although it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over sliding windows. We then show that the problem of detecting an outlier over a set of possible-world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e, d)|) to O(|k·R(e, d)|), where R(e, d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to filter non-outliers effectively and efficiently within a single window and to detect outliers among the most recent m elements incrementally. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
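The cost reduction described in the abstract can be illustrated with a minimal sketch. It is not the paper's DPA: it assumes that each neighbor in R(e, d) exists independently with a known probability, and that the quantity of interest is the probability that fewer than k neighbors exist across all possible worlds. The function names and the input format are hypothetical. The dynamic program runs in O(k·|R(e, d)|) time, while the brute-force version enumerates all 2^|R(e, d)| possible worlds.

```python
from itertools import product


def prob_fewer_than_k(neighbor_probs, k):
    """Probability that fewer than k neighbors exist (assumes independence, k >= 1)."""
    # dp[j] = probability that exactly j of the neighbors processed so far exist,
    # tracked only for j < k; mass for counts >= k is dropped.
    dp = [1.0] + [0.0] * (k - 1)
    for p in neighbor_probs:
        # Go downward so dp[j - 1] still holds the value from the previous step.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return sum(dp)


def prob_fewer_than_k_bruteforce(neighbor_probs, k):
    """Same probability, computed by enumerating every possible world."""
    total = 0.0
    for world in product([0, 1], repeat=len(neighbor_probs)):
        if sum(world) < k:
            weight = 1.0
            for present, p in zip(world, neighbor_probs):
                weight *= p if present else 1.0 - p
            total += weight
    return total
```

Under these assumptions, an element e could be flagged as an outlier when `prob_fewer_than_k(probs, k)` reaches a user-chosen threshold; that threshold rule is an illustrative assumption, not a definition taken from the paper.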

Original language: English
Pages (from-to): 389-400
Number of pages: 12
Journal: Journal of Computer Science and Technology
Volume: 25
Issue number: 3
Publication status: Published - May 2010
Externally published: Yes

Keywords

  • Outlier detection
  • Probabilistic data stream
  • Sliding window
  • Uncertain data
