TY - JOUR
T1 - An ensemble learning framework for potential miRNA-disease association prediction with positive-unlabeled data
AU - Wu, Yao
AU - Zhu, Donghua
AU - Wang, Xuefeng
AU - Zhang, Shuo
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2021/12
Y1 - 2021/12
AB - To explore the pathogenic mechanisms of microRNAs (miRNAs) in diverse diseases, many researchers have concentrated on discovering potential miRNA-disease associations using machine learning methods. However, the prediction accuracy of supervised machine learning methods is limited by the lack of experimentally validated unassociated miRNA-disease pairs. Without these negative samples, training a highly accurate model is much more difficult. Unlike traditional miRNA-disease prediction models, which use randomly selected unknown samples as negative training samples, we propose an ensemble learning framework to solve this positive-unlabeled (PU) learning problem. The framework consists of two steps: a novel semi-supervised K-means method (SS-Kmeans) that extracts reliable negative samples from unknown miRNA-disease pairs, and a subagging method that generates diverse training sets to make full use of those reliable negative samples for ensemble learning. Combined with an effective random vector functional link (RVFL) network as the prediction model, the proposed framework showed superior prediction accuracy compared with other popular approaches. A case study on lung and gastric neoplasms further confirms the framework's efficacy in identifying miRNA-disease associations.
KW - Ensemble learning
KW - miRNA-disease association
KW - Random vector functional link (RVFL)
KW - Semi-supervised Kmeans (SS-Kmeans)
KW - Subagging
UR - http://www.scopus.com/inward/record.url?scp=85114782506&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2021.107566
DO - 10.1016/j.compbiolchem.2021.107566
M3 - Article
C2 - 34534906
AN - SCOPUS:85114782506
SN - 1476-9271
VL - 95
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
M1 - 107566
ER -