Outlier detection over sliding windows for probabilistic data streams

Bin Wang; Xiao Chun Yang; Guo Ren Wang; Ge Yu

doi:10.1007/s11390-010-9332-2

Bin Wang*, Xiao Chun Yang, Guo Ren Wang, Ge Yu

*Corresponding author for this work

Northeastern University China

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

Outlier detection is a useful technique in many applications where data is generally uncertain and can be described using probabilities. Although it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over sliding windows. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e, d)|) to O(|k·R(e, d)|), where R(e, d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window and to incrementally detect outliers among the m most recent elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.
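The abstract does not spell out the DPA itself, so the following is only a minimal sketch of the general idea of replacing possible-world enumeration with dynamic programming. It assumes (these are assumptions, not details taken from the paper) that each element in the d-neighborhood of e exists independently with a known probability, and that e counts as an outlier in a possible world when fewer than k of its neighbors exist. The function names `prob_fewer_than_k` and `prob_fewer_than_k_bruteforce` are hypothetical. The dynamic program needs on the order of k times the neighborhood size in steps, whereas the brute-force version visits every one of the 2^n possible worlds.

```python
from itertools import product


def prob_fewer_than_k(probs, k):
    """Probability that fewer than k of the neighbors exist (DP sketch).

    Assumption: probs[i] is the independent existence probability of the
    i-th neighbor; this model is illustrative, not taken from the paper.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # dp[j] = probability that exactly j of the neighbors seen so far exist.
    # Only counts 0..k-1 are tracked; mass that reaches k or more is dropped.
    dp = [1.0] + [0.0] * (k - 1)
    for p in probs:
        # Update from high j to low j so dp[j - 1] is still the old value.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return sum(dp)


def prob_fewer_than_k_bruteforce(probs, k):
    """Same quantity by enumerating all 2^n possible worlds (for checking)."""
    total = 0.0
    for world in product([0, 1], repeat=len(probs)):
        if sum(world) < k:
            weight = 1.0
            for present, p in zip(world, probs):
                weight *= p if present else 1.0 - p
            total += weight
    return total


if __name__ == "__main__":
    neighbor_probs = [0.9, 0.8, 0.3]  # hypothetical existence probabilities
    print(prob_fewer_than_k(neighbor_probs, 2))
    print(prob_fewer_than_k_bruteforce(neighbor_probs, 2))
```

Under these assumptions the two functions agree, and comparing the result with a user-chosen probability threshold would give one possible outlier test for e.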

Original language	English
Pages (from-to)	389-400
Number of pages	12
Journal	Journal of Computer Science and Technology
Volume	25
Issue number	3
DOIs	https://doi.org/10.1007/s11390-010-9332-2
Publication status	Published - May 2010
Externally published	Yes

Keywords

Outlier detection
Probabilistic data stream
Sliding window
Uncertain data

Cite this

Wang, B., Yang, X. C., Wang, G. R., & Yu, G. (2010). Outlier detection over sliding windows for probabilistic data streams. Journal of Computer Science and Technology, 25(3), 389-400. https://doi.org/10.1007/s11390-010-9332-2

@article{a7d9ea45d1104904bbb9f6eb8000c902,

title = "Outlier detection over sliding windows for probabilistic data streams",

abstract = "Outlier detection is a useful technique in many applications where data is generally uncertain and can be described using probabilities. Although it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over sliding windows. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e, d)|) to O(|k·R(e, d)|), where R(e, d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window and to incrementally detect outliers among the m most recent elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.",

keywords = "Outlier detection, Probabilistic data stream, Sliding window, Uncertain data",

author = "Wang, Bin and Yang, {Xiao Chun} and Wang, {Guo Ren} and Yu, Ge",

year = "2010",

month = may,

doi = "10.1007/s11390-010-9332-2",

language = "English",

volume = "25",

pages = "389--400",

journal = "Journal of Computer Science and Technology",

issn = "1000-9000",

publisher = "Springer New York",

number = "3",

}

TY - JOUR

T1 - Outlier detection over sliding windows for probabilistic data streams

AU - Wang, Bin

AU - Yang, Xiao Chun

AU - Wang, Guo Ren

AU - Yu, Ge

PY - 2010/5

Y1 - 2010/5

N2 - Outlier detection is a useful technique in many applications where data is generally uncertain and can be described using probabilities. Although it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over sliding windows. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e, d)|) to O(|k·R(e, d)|), where R(e, d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window and to incrementally detect outliers among the m most recent elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.

AB - Outlier detection is a useful technique in many applications where data is generally uncertain and can be described using probabilities. Although it has been studied intensively for deterministic data, outlier detection is still novel in the emerging field of uncertain data. In this paper, we study the semantics of outlier detection on probabilistic data streams and present a new definition of distance-based outliers over sliding windows. We then show that the problem of detecting an outlier over a set of possible world instances is equivalent to the problem of finding the k-th element in its neighborhood. Based on this observation, a dynamic programming algorithm (DPA) is proposed to reduce the detection cost from O(2^|R(e, d)|) to O(|k·R(e, d)|), where R(e, d) is the d-neighborhood of e. Furthermore, we propose a pruning-based approach (PBA) to effectively and efficiently filter non-outliers on a single window and to incrementally detect outliers among the m most recent elements. Finally, detailed analysis and thorough experimental results demonstrate the efficiency and scalability of our approach.

KW - Outlier detection

KW - Probabilistic data stream

KW - Sliding window

KW - Uncertain data

UR - http://www.scopus.com/inward/record.url?scp=77955874019&partnerID=8YFLogxK

U2 - 10.1007/s11390-010-9332-2

DO - 10.1007/s11390-010-9332-2

M3 - Article

AN - SCOPUS:77955874019

SN - 1000-9000

VL - 25

SP - 389

EP - 400

JO - Journal of Computer Science and Technology

JF - Journal of Computer Science and Technology

IS - 3

ER -
