Abstract
The identification of microRNA precursors (pre-miRNAs) helps in understanding the regulatory roles of miRNAs in biological processes. The performance of computational predictors depends on their training sets, in which the negative sets play an important role. We therefore investigated the influence of benchmark datasets on the predictive performance of computational predictors for miRNA identification, and found that the negative samples have a significant impact on the predictive results of various methods. We constructed a new benchmark dataset that combines negative samples with different data distributions. Trained on this high-quality benchmark dataset, a new computational predictor called iMiRNA-SSF was proposed, which employs various features extracted from RNA sequences. Experimental results showed that iMiRNA-SSF outperforms three state-of-the-art computational methods. For practical applications, a web server for iMiRNA-SSF has been established at http://bioinformatics.hitsz.edu.cn/iMiRNA-SSF/.
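The abstract does not enumerate the sequence features iMiRNA-SSF uses. As a hedged illustration only, one common family of sequence-derived features for pre-miRNA classification is normalized k-mer composition. The sketch below computes it; the function names `kmer_frequencies` and `feature_vector` and the choice of k values are hypothetical and are not taken from the iMiRNA-SSF feature set.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 2) -> dict:
    """Normalized k-mer frequencies over the RNA alphabet ACGU.

    Illustrative only: a generic sequence feature, not the documented
    iMiRNA-SSF feature set.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of length k; skip windows containing non-ACGU symbols.
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:
            counts[sub] += 1
            total += 1
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: c / total for km, c in counts.items()}


def feature_vector(seq: str, ks=(1, 2, 3)) -> list:
    """Concatenate k-mer frequency vectors for each k (hypothetical helper)."""
    vec = []
    for k in ks:
        freqs = kmer_frequencies(seq, k)
        vec.extend(freqs[km] for km in sorted(freqs))
    return vec
```

With the default k values of 1, 2 and 3, each sequence maps to a fixed-length vector of 4 + 16 + 64 = 84 values, which is the kind of input a standard classifier can be trained on regardless of sequence length.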
Original language | English |
---|---|
Article number | 19062 |
Journal | Scientific Reports |
Volume | 6 |
Publication status | Published - 12 Jan 2016 |
Externally published | Yes |