An ensemble learning framework for potential miRNA-disease association prediction with positive-unlabeled data

Yao Wu, Donghua Zhu, Xuefeng Wang*, Shuo Zhang

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    8 Citations (Scopus)

    Abstract

    To explore the pathogenic mechanisms of microRNA (miRNA) in diverse diseases, many researchers have concentrated on discovering potential associations between miRNAs and diseases using machine learning methods. However, the prediction accuracy of supervised machine learning methods is limited by the lack of experimentally validated uncorrelated miRNA-disease pairs. Without these negative samples, training a highly accurate model is much more difficult. Unlike traditional miRNA-disease prediction models, which use randomly selected unknown samples as negative training samples, we propose an ensemble learning framework to solve this positive-unlabeled (PU) learning problem. The framework incorporates two steps: a novel semi-supervised Kmeans (SS-Kmeans) to extract reliable negative samples from unknown miRNA-disease pairs, and a subagging method to generate diverse training sample sets that make full use of those reliable negative samples for ensemble learning. Combined with an effective random vector functional link (RVFL) network as the prediction model, the proposed framework showed superior prediction accuracy compared with other popular approaches. A case study on lung and gastric neoplasms further confirms the framework's efficacy at identifying miRNA-disease associations.
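
The two-step pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how SS-Kmeans works, so a simple two-cluster k-means seeded with the known positives stands in for it, and the rule of pairing all positives with an equally sized subsample of negatives is an assumption. The function and class names (`seeded_kmeans_negatives`, `RVFL`, `subagging_fit`, `ensemble_score`) and the synthetic feature vectors are hypothetical; the RVFL here is the generic form (fixed random hidden layer, direct input-to-output links, output weights fitted by ridge regression).

```python
import numpy as np

def seeded_kmeans_negatives(X_pos, X_unl, n_iter=20):
    """Stand-in for SS-Kmeans (assumption): two-cluster k-means in which the
    labelled positives are pinned to the positive cluster. Returns indices of
    unlabeled samples that fall in the other cluster (reliable negatives)."""
    c_pos = X_pos.mean(axis=0)
    # Seed the negative centroid with the unlabeled point farthest from the positives.
    c_neg = X_unl[np.argmax(np.linalg.norm(X_unl - c_pos, axis=1))]
    for _ in range(n_iter):
        d_pos = np.linalg.norm(X_unl - c_pos, axis=1)
        d_neg = np.linalg.norm(X_unl - c_neg, axis=1)
        in_neg = d_neg < d_pos
        if not in_neg.any():
            break
        c_pos = np.vstack([X_pos, X_unl[~in_neg]]).mean(axis=0)
        c_neg = X_unl[in_neg].mean(axis=0)
    return np.flatnonzero(in_neg)

class RVFL:
    """Generic random vector functional link network for regression on 0/1 labels."""
    def __init__(self, n_hidden=20, ridge=1e-2, rng=None):
        self.n_hidden, self.ridge = n_hidden, ridge
        self.rng = rng if rng is not None else np.random.default_rng()

    def _features(self, X):
        H = np.tanh(X @ self.W + self.b)                 # fixed random hidden layer
        return np.hstack([X, H, np.ones((len(X), 1))])   # direct links + hidden + bias

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        D = self._features(X)
        # Only the output weights are learned, in closed form (ridge regression).
        A = D.T @ D + self.ridge * np.eye(D.shape[1])
        self.beta = np.linalg.solve(A, D.T @ y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta

def subagging_fit(X_pos, X_neg, n_models=10, seed=0):
    """Subagging (subsample aggregating): each base model sees all positives plus
    a subsample of reliable negatives drawn without replacement (assumed sizing)."""
    rng = np.random.default_rng(seed)
    k = min(len(X_pos), len(X_neg))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X_neg), size=k, replace=False)
        X = np.vstack([X_pos, X_neg[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(k)])
        models.append(RVFL(rng=rng).fit(X, y))
    return models

def ensemble_score(models, X):
    """Average the base-model outputs; higher means more likely associated."""
    return np.mean([model.predict(X) for model in models], axis=0)

# Synthetic stand-in for miRNA-disease pair features (not real data).
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, size=(40, 5))                 # known associations
X_unl = np.vstack([rng.normal(loc=2.0, size=(20, 5)),     # hidden positives
                   rng.normal(loc=-2.0, size=(200, 5))])  # true negatives
neg_idx = seeded_kmeans_negatives(X_pos, X_unl)
models = subagging_fit(X_pos, X_unl[neg_idx])
scores = ensemble_score(models, X_unl)
```

Ranking the unlabeled pairs by `scores` gives candidate associations. Drawing the negative subsamples without replacement keeps each training set balanced while letting different base models see different reliable negatives, which is the diversity the ensemble relies on.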

    Original language: English
    Article number: 107566
    Journal: Computational Biology and Chemistry
    Volume: 95
    Publication status: Published - Dec 2021

    Keywords

    • Ensemble learning
    • MiRNA-disease association
    • Random vector functional link (RVFL)
    • Semi-supervised Kmeans (SS-Kmeans)
    • Subagging
