Outlier Summarization via Human Interpretable Rules

Yuhao Deng; Yu Wang; Lei Cao; Lianpeng Qiao; Yuping Wang; Jingzhe Xu; Yizhou Yan; Samuel Madden

doi:10.14778/3654621.3654627

Corresponding author: Lianpeng Qiao

School of Computer Science

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering that outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise and human-understandable rules to summarize and explain outlier detection results with finer granularity. These rules consider both attributes and associated values. STAIR employs an interpretation-aware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, thus making the results more amenable for humans to understand and evaluate.

Original language	English
Pages (from-to)	1591-1604
Number of pages	14
Journal	Proceedings of the VLDB Endowment
Volume	17
Issue number	7
DOI	https://doi.org/10.14778/3654621.3654627
Publication status	Published - 2024
Event	50th International Conference on Very Large Data Bases, VLDB 2024 - Guangzhou, China; Duration: 24 Aug 2024 → 29 Aug 2024


Cite this

@article{fcbd9db6aa824834bbb2ea0d05802adc,

title = "Outlier Summarization via Human Interpretable Rules",

abstract = "Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering that outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise and human-understandable rules to summarize and explain outlier detection results with finer granularity. These rules consider both attributes and associated values. STAIR employs an interpretation-aware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, thus making the results more amenable for humans to understand and evaluate.",

author = "Yuhao Deng and Yu Wang and Lei Cao and Lianpeng Qiao and Yuping Wang and Jingzhe Xu and Yizhou Yan and Samuel Madden",

note = "Publisher Copyright: {\textcopyright} 2024, VLDB Endowment. All rights reserved.; 50th International Conference on Very Large Data Bases, VLDB 2024 ; Conference date: 24-08-2024 Through 29-08-2024",

year = "2024",

doi = "10.14778/3654621.3654627",

language = "English",

volume = "17",

pages = "1591--1604",

journal = "Proceedings of the VLDB Endowment",

issn = "2150-8097",

publisher = "Very Large Data Base Endowment Inc.",

number = "7",

}

TY - JOUR

T1 - Outlier Summarization via Human Interpretable Rules

AU - Deng, Yuhao

AU - Wang, Yu

AU - Cao, Lei

AU - Qiao, Lianpeng

AU - Wang, Yuping

AU - Xu, Jingzhe

AU - Yan, Yizhou

AU - Madden, Samuel

PY - 2024

Y1 - 2024

N2 - Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering that outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise and human-understandable rules to summarize and explain outlier detection results with finer granularity. These rules consider both attributes and associated values. STAIR employs an interpretation-aware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, thus making the results more amenable for humans to understand and evaluate.

AB - Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering that outliers in the same subspace may have different causes. To fill this gap, we propose STAIR, which learns concise and human-understandable rules to summarize and explain outlier detection results with finer granularity. These rules consider both attributes and associated values. STAIR employs an interpretation-aware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, thus making the results more amenable for humans to understand and evaluate.

UR - http://www.scopus.com/inward/record.url?scp=85195699690&partnerID=8YFLogxK

U2 - 10.14778/3654621.3654627

DO - 10.14778/3654621.3654627

M3 - Conference article

AN - SCOPUS:85195699690

SN - 2150-8097

VL - 17

SP - 1591

EP - 1604

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

IS - 7

T2 - 50th International Conference on Very Large Data Bases, VLDB 2024

Y2 - 24 August 2024 through 29 August 2024

ER -
