TCIC_FS: Total correlation information coefficient-based feature selection method for high-dimensional data

Ping Qiu, Zhendong Niu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

High-dimensional data remain a challenge for classification. Feature selection acts as a filter that removes irrelevant or redundant features, and it has made considerable progress. The problem is still open, however, because current methods consider only the correlation between two variables and leave the correlation among multiple variables largely unaddressed, even though multivariate interactions can carry joint information that cannot be obtained pairwise. Furthermore, many feature selection methods rely on hyperparameters, whose settings require prior knowledge and lack interpretability. To address these problems, this paper proposes the total correlation information coefficient-based feature selection (TCIC_FS) method, which selects the optimal feature subset without setting hyperparameters and fully considers the correlations among multiple variables. First, based on a Gaussian copula, the total correlation information coefficient (TCIC) is proposed to evaluate the correlations among multiple variables. Compared with existing multivariate correlation measures, TCIC can measure a wider range of multivariate correlations, including linear, nonlinear, functional, and nonfunctional correlations. Second, a novel evaluation mechanism based on TCIC is proposed to measure the relevance between features and classes and the redundancy between a single feature and the already selected feature subset. Finally, the TCIC_FS method is constructed from the TCIC and the evaluation mechanism. Compared with the baseline methods, TCIC_FS has the lowest time complexity and yields the smallest optimal feature subset in a single selection. TCIC_FS is therefore better suited to processing high-dimensional data.
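To make the Gaussian-copula idea in the abstract more concrete, here is a minimal sketch of one common way to estimate total correlation under a Gaussian copula. It is not the paper's TCIC, whose exact definition and normalization are not given in this record. The sketch assumes the standard identity that a Gaussian copula with correlation matrix R has total correlation -1/2 ln det R, and it estimates R from rank-based normal scores. All function names are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def normal_scores(column):
    """Replace each value by the standard-normal quantile of its rank.

    Ties are broken by position; this is an illustrative simplification.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    inv_cdf = NormalDist().inv_cdf
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = inv_cdf(rank / (n + 1))
    return scores


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("correlation matrix is singular")
                lower[i][j] = math.sqrt(s)
                total += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = s / lower[j][j]
    return total


def gaussian_copula_total_correlation(columns):
    """Estimate total correlation (in nats) as -1/2 * ln det R."""
    scores = [normal_scores(col) for col in columns]
    return -0.5 * log_det(correlation_matrix(scores))


# Synthetic illustration: y depends nonlinearly on x, z is independent noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [math.exp(v) + 0.05 * rng.gauss(0.0, 1.0) for v in x]
z = [rng.gauss(0.0, 1.0) for _ in range(500)]

tc_dependent = gaussian_copula_total_correlation([x, y])
tc_independent = gaussian_copula_total_correlation([x, z])
print(tc_dependent, tc_independent)
```

Because the estimate uses ranks only, it does not change under monotone transformations of individual variables, and it accepts any number of columns at once rather than only pairs. How the paper turns such a quantity into a bounded coefficient, and how it scores relevance and redundancy, is left to the full article.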

Original language: English
Article number: 107418
Journal: Knowledge-Based Systems
Volume: 231
Publication status: Published - 14 Nov 2021

Keywords

  • Evaluation mechanism
  • Feature selection
  • Gaussian copula
  • High dimensional data
  • Multivariate correlation
  • Recommendation system
