TY - JOUR
T1 - Outlier Summarization via Human Interpretable Rules
AU - Deng, Yuhao
AU - Wang, Yu
AU - Cao, Lei
AU - Qiao, Lianpeng
AU - Wang, Yuping
AU - Xu, Jingzhe
AU - Yan, Yizhou
AU - Madden, Samuel
N1 - Publisher Copyright:
© 2024, VLDB Endowment. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise and human-understandable rules to summarize and explain outlier detection results with finer granularity. These rules consider both attributes and associated values. STAIR employs an interpretationaware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, thus more amenable for humans to understand and evaluate.
AB - Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise and human-understandable rules to summarize and explain outlier detection results with finer granularity. These rules consider both attributes and associated values. STAIR employs an interpretationaware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, thus more amenable for humans to understand and evaluate.
UR - http://www.scopus.com/inward/record.url?scp=85195699690&partnerID=8YFLogxK
U2 - 10.14778/3654621.3654627
DO - 10.14778/3654621.3654627
M3 - Conference article
AN - SCOPUS:85195699690
SN - 2150-8097
VL - 17
SP - 1591
EP - 1604
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 7
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
Y2 - 24 August 2024 through 29 August 2024
ER -