TY - JOUR
T1 - Dual-phase feature selection using adaptive neighborhood rough sets and hybrid sine-cosine optimization for classification
AU - Zheng, Chengfeng
AU - Kasihmuddin, Mohd Shareduwan Mohd
AU - Yan, Zhizhong
AU - Mansor, Mohd Asyraf
AU - Gao, Yuan
AU - Chen, Ju
N1 - Publisher Copyright: © 2025
PY - 2025/11/23
Y1 - 2025/11/23
N2 - High-dimensional, multi-class, and imbalanced datasets present significant challenges in classification tasks across various industries, including healthcare, finance, and image processing. Existing feature selection methods, particularly those based on neighborhood rough sets, often struggle with handling both feature redundancy and noisy samples, making it difficult to capture the complex distribution of features and samples across different classes. To address this, we propose a dual-phase feature selection method that performs joint optimization in both horizontal (feature-level) and vertical (sample-level) dimensions. In the first phase, adaptive neighborhood rough set theory is used for horizontal feature selection. By adjusting the neighborhood radius (δ) and inclusion degree (λ) through cross-validation, the method selects relevant feature subsets tailored to the granularity of each dataset, thereby improving generalization. In the second phase, a hybrid sine cosine algorithm is employed for vertical processing to optimize sample selection. This algorithm iteratively removes noisy or misleading samples based on fitness evaluation, enhancing the model's robustness. Furthermore, the framework integrates an enhanced fuzzy k-nearest neighbor classifier that leverages feature subset weights for each class to better address class imbalance during classification. Extensive experiments on 21 public datasets, using three types of classifiers, show that the proposed method outperforms seven benchmark feature selection algorithms in terms of classification accuracy, weighted precision, weighted recall, and weighted F1-score. Statistical tests, including the Wilcoxon signed-rank test, confirm significant improvements. This dual-phase horizontal and vertical optimization approach offers a robust and effective solution for real-world classification tasks involving complex data distributions.
KW - Adaptive rough set feature selection
KW - Artificial intelligence applications
KW - Fuzzy k-nearest neighbor
KW - High-dimensional data
KW - Multi-class classification
KW - Sine cosine algorithm
UR - https://www.scopus.com/pages/publications/105012830926
U2 - 10.1016/j.engappai.2025.111899
DO - 10.1016/j.engappai.2025.111899
M3 - Article
AN - SCOPUS:105012830926
SN - 0952-1976
VL - 160
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 111899
ER -