TY - JOUR
T1 - Research on the multi-source causal feature selection method based on multiple causal relevance
AU - Qiu, Ping
AU - Niu, Zhendong
AU - Zhang, Chunxia
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/4/8
Y1 - 2023/4/8
AB - Multi-source causal feature selection captures the causal relevance of features to the class attribute across different datasets and is very important for improving the stability and reliability of prediction models. Multi-source Causal Feature Selection (MCFS) is the most advanced method that can select features on multiple datasets simultaneously. However, it considers only the causal relevance between a single feature and the class attribute, ignoring the causal relevance among multiple features. In addition, MCFS uses an exhaustive search to obtain the optimal causal feature set on multiple datasets, which is time-consuming. To address these two problems, we first propose Multiple Causal Relevance, which can remove the redundant information hidden in pairwise causal relevance. Second, we analyze the Markov blanket of the class attribute across multiple sources and prove upper and lower bounds on the optimal causal feature set, which narrow the search range of features and improve the efficiency of the algorithm. Finally, we propose a multi-source causal Feature Selection method based on Multiple Causal Relevance (MCRFS). Extensive experiments on synthetic datasets and on binary and multiclass real-world datasets, comparing MCRFS against two feature selection methods, show that MCRFS achieves better accuracy and efficiency than both comparison methods with SVM and KNN classifiers.
KW - Classification
KW - Invariant sets
KW - Markov Blanket
KW - Multi-source causal feature selection
KW - Multiple causal relevance
KW - Recommendation systems
UR - http://www.scopus.com/inward/record.url?scp=85150415396&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2023.110334
DO - 10.1016/j.knosys.2023.110334
M3 - Article
AN - SCOPUS:85150415396
SN - 0950-7051
VL - 265
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 110334
ER -