Outlier Summarization via Human Interpretable Rules

Yuhao Deng, Yu Wang, Lei Cao, Lianpeng Qiao*, Yuping Wang, Jingzhe Xu, Yizhou Yan, Samuel Madden

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering that outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise and human-understandable rules to summarize and explain outlier detection results at a finer granularity. These rules consider both attributes and their associated values. STAIR employs an interpretation-aware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, making them more amenable for humans to understand and evaluate.
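
As a rough illustration of what "rules over attributes and associated values" could look like, the sketch below represents each rule as a conjunction of per-attribute value ranges and measures the complexity of a rule set as its total number of conditions. This is not the STAIR algorithm or its objective; the `Rule` class, the `classify` and `total_length` helpers, and the `amount` and `hour` attributes are hypothetical names chosen for this example. It shows two rules that share the same attributes but differ in values, which an attribute-only explanation could not tell apart.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

INF = float("inf")


@dataclass
class Rule:
    """Hypothetical rule: a conjunction of half-open ranges, one per attribute."""
    conditions: Dict[str, Tuple[float, float]]  # attribute -> [low, high)
    label: str

    def matches(self, row: Dict[str, float]) -> bool:
        # A row matches only if every attribute falls inside its range.
        return all(lo <= row[attr] < hi for attr, (lo, hi) in self.conditions.items())

    @property
    def length(self) -> int:
        # Number of attribute conditions; used here as a simple complexity proxy.
        return len(self.conditions)


def classify(rules: List[Rule], row: Dict[str, float], default: str = "inlier") -> str:
    """Return the label of the first matching rule, or the default label."""
    for rule in rules:
        if rule.matches(row):
            return rule.label
    return default


def total_length(rules: List[Rule]) -> int:
    """Total number of conditions across the rule set (assumed complexity measure)."""
    return sum(rule.length for rule in rules)


# Two rules over the same attributes (amount, hour) but with different values,
# standing in for two different causes within one subspace.
rules = [
    Rule({"amount": (10_000.0, INF), "hour": (0.0, 6.0)}, "outlier"),
    Rule({"amount": (0.0, 1.0), "hour": (22.0, 24.0)}, "outlier"),
]

large_night = {"amount": 12_000.0, "hour": 3.0}
tiny_late = {"amount": 0.5, "hour": 23.0}
ordinary = {"amount": 50.0, "hour": 14.0}

results = [classify(rules, r) for r in (large_night, tiny_late, ordinary)]
```

A summary with fewer and shorter rules scores lower on `total_length`, which is the kind of quantity an interpretability-oriented objective would try to keep small while still covering the detected outliers.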

Original language: English
Pages (from-to): 1591-1604
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 17
Issue number: 7
Publication status: Published - 2024
Event: 50th International Conference on Very Large Data Bases, VLDB 2024 - Guangzhou, China
Duration: 24 Aug 2024 to 29 Aug 2024
