TY - JOUR
T1 - Improving ELM-based microarray data classification by diversified sequence features selection
AU - Zhao, Yuhai
AU - Wang, Guoren
AU - Yin, Ying
AU - Li, Yuan
AU - Wang, Zhanghui
N1 - Publisher Copyright:
© 2014, Springer-Verlag London.
PY - 2016/1/1
Y1 - 2016/1/1
N2 - In this paper, we focus on the problem of extreme learning machine (ELM)-based microarray data classification. Different from the traditional classification problem, the goal in this case is not just to predict the class labels for the unseen samples, but to make clear what lead to the results, i.e., the genes involving with a specific disease. This is especially significant for biologists, since they need to decipher the causes of disease. As a black-box method, ELM could not measure up to the task by itself. In this work, we propose a diversified sequence feature selection-based framework to address the problem. In this framework, (1) a sequence model, EWave, is introduced to ensure the structural ordering information among genes exploitable; (2) a concept of irreducible sequence is proposed, where the genes work as an orderly whole to keep high confidence with a specific class and any reduction in the genes decreases the confidence much. An efficient sequence mining algorithm together with some effective pruning rules is developed to mine such sequences; and (3) we study how to extract a set of diversified sequence features as the representative of all mined results. The problem is proved to be NP-hard. A greedy algorithm is presented to approximate the optimal solution. Experimental results show that the proposed approach significantly improves the efficiency and the effectiveness of ELM w.r.t some widely used feature selection techniques.
AB - In this paper, we focus on the problem of extreme learning machine (ELM)-based microarray data classification. Different from the traditional classification problem, the goal in this case is not just to predict the class labels for the unseen samples, but to make clear what lead to the results, i.e., the genes involving with a specific disease. This is especially significant for biologists, since they need to decipher the causes of disease. As a black-box method, ELM could not measure up to the task by itself. In this work, we propose a diversified sequence feature selection-based framework to address the problem. In this framework, (1) a sequence model, EWave, is introduced to ensure the structural ordering information among genes exploitable; (2) a concept of irreducible sequence is proposed, where the genes work as an orderly whole to keep high confidence with a specific class and any reduction in the genes decreases the confidence much. An efficient sequence mining algorithm together with some effective pruning rules is developed to mine such sequences; and (3) we study how to extract a set of diversified sequence features as the representative of all mined results. The problem is proved to be NP-hard. A greedy algorithm is presented to approximate the optimal solution. Experimental results show that the proposed approach significantly improves the efficiency and the effectiveness of ELM w.r.t some widely used feature selection techniques.
KW - Classification
KW - Extreme learning machine
KW - Sequence mining
UR - https://www.scopus.com/pages/publications/84953345896
U2 - 10.1007/s00521-014-1571-7
DO - 10.1007/s00521-014-1571-7
M3 - Article
AN - SCOPUS:84953345896
SN - 0941-0643
VL - 27
SP - 155
EP - 166
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 1
ER -