Coresets for fast causal discovery with the additive noise model

Boxiang Zhao; Shuliang Wang; Lianhua Chi; Hanning Yuan; Ye Yuan; Qi Li; Jing Geng; Shao Liang Zhang

doi:10.1016/j.patcog.2023.110149

Boxiang Zhao, Shuliang Wang^*, Lianhua Chi, Hanning Yuan, Ye Yuan, Qi Li, Jing Geng, Shao Liang Zhang

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Causal discovery aims to reveal the true causal relationships behind data. Discovering these relationships from observed data is particularly challenging, especially for large-scale datasets. The functional causal model is an effective method for causal discovery, but its time efficiency cannot be guaranteed, and how to apply it efficiently to massive data remains an open problem. In this paper, we propose a coreset construction for the additive noise model to accelerate causal discovery. Exploiting the asymmetry of causality, samples are assigned different weights to construct the coreset. With the constructed coreset, we propose a Fast causal discovery algorithm based on the Additive Noise Model (FANM), which improves the time efficiency of the functional causal model while preserving the quality of the causal discovery results. Experiments on synthetic and real-world data show that the proposed algorithm is much more time-efficient than methods based on the functional causal model, and that the runtime of FANM remains consistent as sample size increases while its accuracy matches or exceeds that of the original nonlinear additive noise model.
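For readers unfamiliar with the additive noise model, the following is a minimal sketch of the generic bivariate direction test that the abstract builds on: regress the assumed effect on the assumed cause, then check whether the residuals still depend on the cause, and prefer the direction with the weaker dependence. It is not the FANM algorithm. The weighted coreset construction is not described in this record, so a uniform random subsample stands in for it here as an explicit assumption. The polynomial regressor, the HSIC dependence score, and all function names are likewise illustrative choices, not taken from the paper.

```python
import numpy as np


def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels; larger means more dependent."""
    n = len(a)

    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        bandwidth = np.median(d2[d2 > 0])  # median heuristic for the kernel width
        return np.exp(-d2 / bandwidth)

    centering = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ centering @ gram(b) @ centering) / n**2


def residual_dependence(cause, effect, degree=3):
    """Fit effect = f(cause) + noise with a polynomial (an illustrative choice),
    then score how strongly the residuals still depend on the assumed cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def anm_direction(x, y, degree=3):
    """Prefer the direction whose residuals look more independent of the input."""
    forward = residual_dependence(x, y, degree)
    backward = residual_dependence(y, x, degree)
    return "x->y" if forward < backward else "y->x"


def subsampled_anm_direction(x, y, m=300, seed=0):
    """Run the test on m uniformly drawn samples.

    ASSUMPTION: uniform sampling is only a placeholder for the paper's
    weighted coreset, which this record does not specify.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=min(m, len(x)), replace=False)
    return anm_direction(x[idx], y[idx])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)
    y = x**3 + x + rng.normal(size=5000)  # synthetic pair generated with x as the cause
    print(subsampled_anm_direction(x, y))
```

Because the kernel matrices grow quadratically with the number of samples, the cost of the test is governed by m rather than by the full dataset size, which is the general motivation for working on a small representative subset.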

Original language	English
Article number	110149
Journal	Pattern Recognition
Volume	148
DOIs	https://doi.org/10.1016/j.patcog.2023.110149
Publication status	Published - Apr 2024

Keywords

Additive noise model
Big data
Causal discovery
Coresets
Functional causal model

Access to Document

10.1016/j.patcog.2023.110149

Cite this

@article{916f8072fbd444d59094f8a03438dfe2,

title = "Coresets for fast causal discovery with the additive noise model",

abstract = "Causal discovery reveals the true causal relationships behind data and discovering causal relationships from observed data is a particularly challenging problem, especially in large-scale datasets. The functional causal model is an effective method for causal discovery, but its time efficiency cannot be guaranteed. How to efficiently apply it to massive data still needs to be solved. In this paper, we propose a coreset construction for the additive noise model to accelerate causal discovery. According to the asymmetry characteristic of causality, samples were assigned different weights to construct the coreset. With the constructed coreset, we propose a Fast causal discovery algorithm based on the Additive Noise Model (FANM) to improve the time efficiency of the functional causal model while ensuring the result performance of causal discovery. Experiments on synthetic data and real-world data show that our proposed algorithm is much more time-efficient than the methods based on the functional causal model, and the runtime of FANM remains consistent as sample size increases while maintaining or exceeding the accuracy of the original nonlinear additive noise model.",

keywords = "Additive noise model, Big data, Causal discovery, Coresets, Functional causal model",

author = "Boxiang Zhao and Shuliang Wang and Lianhua Chi and Hanning Yuan and Ye Yuan and Qi Li and Jing Geng and Zhang, {Shao Liang}",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier Ltd",

year = "2024",

month = apr,

doi = "10.1016/j.patcog.2023.110149",

language = "English",

volume = "148",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Coresets for fast causal discovery with the additive noise model

AU - Zhao, Boxiang

AU - Wang, Shuliang

AU - Chi, Lianhua

AU - Yuan, Hanning

AU - Yuan, Ye

AU - Li, Qi

AU - Geng, Jing

AU - Zhang, Shao Liang

PY - 2024/4

Y1 - 2024/4

N2 - Causal discovery reveals the true causal relationships behind data and discovering causal relationships from observed data is a particularly challenging problem, especially in large-scale datasets. The functional causal model is an effective method for causal discovery, but its time efficiency cannot be guaranteed. How to efficiently apply it to massive data still needs to be solved. In this paper, we propose a coreset construction for the additive noise model to accelerate causal discovery. According to the asymmetry characteristic of causality, samples were assigned different weights to construct the coreset. With the constructed coreset, we propose a Fast causal discovery algorithm based on the Additive Noise Model (FANM) to improve the time efficiency of the functional causal model while ensuring the result performance of causal discovery. Experiments on synthetic data and real-world data show that our proposed algorithm is much more time-efficient than the methods based on the functional causal model, and the runtime of FANM remains consistent as sample size increases while maintaining or exceeding the accuracy of the original nonlinear additive noise model.

KW - Additive noise model

KW - Big data

KW - Causal discovery

KW - Coresets

KW - Functional causal model

UR - http://www.scopus.com/inward/record.url?scp=85178165104&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2023.110149

DO - 10.1016/j.patcog.2023.110149

M3 - Article

AN - SCOPUS:85178165104

SN - 0031-3203

VL - 148

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 110149

ER -
