TY - JOUR
T1 - A fusion of centrality and correlation for feature selection
AU - Qiu, Ping
AU - Zhang, Chunxia
AU - Gao, Dongping
AU - Niu, Zhendong
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2024/5/1
Y1 - 2024/5/1
N2 - The rapid development of computer and database technologies has led to the rapid growth of large-scale datasets. This raises an important issue for data mining applications, the curse of dimensionality, in which the number of features far exceeds the number of patterns. Feature selection is one approach to dimensionality reduction that can increase the accuracy of these applications and reduce their computational complexity. This paper proposes a novel feature selection method to reduce dimensionality and computational complexity in high-dimensional data processing. First, a probabilistic strategy metric based on centrality and the Fisher score is proposed to measure the influence of features. Second, a new discriminant function is proposed to decide whether a feature should be selected; it automatically calculates weight parameters that balance the relevance of a feature to the class labels against the redundancy of the selected feature subset. Finally, a new method, named MTC_FS, is proposed by combining the maximum information coefficient (MIC), total information (TI) and a centrality technique. Experimental results show that the average accuracy of MTC_FS improves by 1.8% over the best baseline, and that its overall performance is superior to all baselines on 12 public datasets. MTC_FS also runs faster than all baselines on every dataset. In addition, MTC_FS is the most stable method on the NBayes classifier.
AB - The rapid development of computer and database technologies has led to the rapid growth of large-scale datasets. This raises an important issue for data mining applications, the curse of dimensionality, in which the number of features far exceeds the number of patterns. Feature selection is one approach to dimensionality reduction that can increase the accuracy of these applications and reduce their computational complexity. This paper proposes a novel feature selection method to reduce dimensionality and computational complexity in high-dimensional data processing. First, a probabilistic strategy metric based on centrality and the Fisher score is proposed to measure the influence of features. Second, a new discriminant function is proposed to decide whether a feature should be selected; it automatically calculates weight parameters that balance the relevance of a feature to the class labels against the redundancy of the selected feature subset. Finally, a new method, named MTC_FS, is proposed by combining the maximum information coefficient (MIC), total information (TI) and a centrality technique. Experimental results show that the average accuracy of MTC_FS improves by 1.8% over the best baseline, and that its overall performance is superior to all baselines on 12 public datasets. MTC_FS also runs faster than all baselines on every dataset. In addition, MTC_FS is the most stable method on the NBayes classifier.
KW - Centrality
KW - E-learning
KW - Feature selection
KW - High-dimensional data
KW - Multivariable correlation
UR - http://www.scopus.com/inward/record.url?scp=85177736306&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.122548
DO - 10.1016/j.eswa.2023.122548
M3 - Article
AN - SCOPUS:85177736306
SN - 0957-4174
VL - 241
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 122548
ER -
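
For readers who want a concrete starting point, the minimal sketch below illustrates the kind of relevance-redundancy feature selection the abstract describes (a Fisher-score ranking plus a greedy selection loop). It is not the authors' MTC_FS algorithm: the maximum information coefficient, total information, and centrality components are replaced here, purely for illustration, by ordinary mutual information (scikit-learn's mutual_info_classif) and absolute Pearson correlation, and the function names are hypothetical.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def fisher_score(X, y):
    # Classical Fisher score per feature: between-class scatter of the
    # feature means divided by the pooled within-class variance.
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - overall_mean) ** 2
        within += Xc.shape[0] * Xc.var(axis=0)
    return between / (within + 1e-12)

def greedy_relevance_redundancy_select(X, y, k):
    # Greedily pick k features, trading class relevance against redundancy
    # with the already-selected subset (an mRMR-style criterion, used here
    # as a stand-in for the MIC/TI-based discriminant function).
    relevance = mutual_info_classif(X, y)      # stand-in for MIC relevance
    redundancy = np.abs(np.corrcoef(X.T))      # stand-in for pairwise redundancy
    selected = [int(np.argmax(relevance))]
    remaining = [j for j in range(X.shape[1]) if j != selected[0]]
    while len(selected) < k and remaining:
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Example usage on synthetic data:
# X = np.random.rand(100, 50); y = np.random.randint(0, 2, 100)
# print(fisher_score(X, y).argsort()[::-1][:5])            # top-5 by Fisher score
# print(greedy_relevance_redundancy_select(X, y, 5))       # greedy subset of size 5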