TY - JOUR
T1 - MiRNA-dis
T2 - MicroRNA precursor identification based on distance structure status pairs
AU - Liu, Bin
AU - Fang, Longyun
AU - Chen, Junjie
AU - Liu, Fule
AU - Wang, Xiaolong
N1 - Publisher Copyright: © The Royal Society of Chemistry 2015.
PY - 2015/4/1
Y1 - 2015/4/1
N2 - MicroRNA precursor identification is an important task in bioinformatics. The Support Vector Machine (SVM) is one of the most effective machine learning methods used in this field. The performance of SVM-based methods depends on the vector representations of RNAs. However, the discriminative power of existing feature vectors is limited, and many methods lack an interpretable model for analyzing characteristic sequence features. Prior studies have demonstrated that sequence or structure order effects are relevant for discrimination, but little work has explored how to use this kind of information for human pre-microRNA identification. In this study, to incorporate structure-order information into the prediction, a method called "miRNA-dis" was proposed, in which the feature vector is constructed from the occurrence frequencies of "distance structure status pairs", or "distance-pairs" for short. Rigorous cross-validation on a much larger and more stringent newly constructed benchmark dataset showed that miRNA-dis outperformed some state-of-the-art predictors in this area. Remarkably, miRNA-dis trained with human data can correctly predict 87.02% of the 4022 pre-miRNAs from 11 different species spanning animals, plants and viruses. miRNA-dis would be a useful high-throughput tool for large-scale analysis of microRNA precursors. In addition, the learnt model can be easily analyzed in terms of discriminative features, and some interesting patterns were discovered that could reflect the characteristics of microRNAs. A user-friendly web server for miRNA-dis was constructed and is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/miRNA-dis/.
UR - http://www.scopus.com/inward/record.url?scp=84924975850&partnerID=8YFLogxK
U2 - 10.1039/c5mb00050e
DO - 10.1039/c5mb00050e
M3 - Article
C2 - 25715848
AN - SCOPUS:84924975850
SN - 1742-206X
VL - 11
SP - 1194
EP - 1204
JO - Molecular BioSystems
JF - Molecular BioSystems
IS - 4
ER -