TY - JOUR
T1 - Bayesian Maximal Information Coefficient (BMIC) to reason novel trends in large datasets
AU - Shuliang, Wang
AU - Surapunt, Tisinee
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022/7
Y1 - 2022/7
N2 - The Bayesian network (BN) is a probabilistic inference model that describes explicit cause-and-effect relationships, and it can be used to examine the complex system of rice prices under data uncertainty. However, discovering the optimal structure among a super-exponential number of graphs in the search space is an NP-hard problem. In this paper, the Bayesian Maximal Information Coefficient (BMIC) is proposed to uncover causal correlations in a large data set from a random system by integrating a probabilistic graphical model (PGM) and the maximal information coefficient (MIC) with Bayesian linear regression (BLR). First, MIC captures the strong dependence between predictor variables and a target variable, which reduces the number of variables for BN structure learning in the PGM. Second, BLR assigns edge orientations in the graph based on a posterior probability distribution. This matches the BN's need for a conditional probability distribution for each node given its parents, obtained via Bayes’ theorem. Third, the Bayesian information criterion (BIC) is used as an indicator of how well a model explains its data, as a check on correctness. The BIC scores show that the proposed BMIC obtains the highest score compared with the two traditional learning algorithms. Finally, the proposed BMIC is applied to a large data set on Thai rice prices to discover causal correlations by identifying the causal changes in the paddy price of Jasmine rice. The experimental results show that the proposed BMIC returns directional relationships with clues to identify the cause(s) and effect(s) of paddy price with a better heuristic search.
KW - Bayesian Maximal Information Coefficient (BMIC)
KW - Bayesian linear regression
KW - Causal correlations
KW - Maximal information coefficient
KW - Temporal sequential correlation
KW - Thai rice price
UR - http://www.scopus.com/inward/record.url?scp=85122663090&partnerID=8YFLogxK
U2 - 10.1007/s10489-021-03090-y
DO - 10.1007/s10489-021-03090-y
M3 - Article
AN - SCOPUS:85122663090
SN - 0924-669X
VL - 52
SP - 10202
EP - 10219
JO - Applied Intelligence
JF - Applied Intelligence
IS - 9
ER -