TY - JOUR
T1 - iDRBP-EL
T2 - Identifying DNA- and RNA- Binding Proteins Based on Hierarchical Ensemble Learning
AU - Wang, Ning
AU - Zhang, Jun
AU - Liu, Bin
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2023/1/1
Y1 - 2023/1/1
AB - Identification of DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) from primary sequences is essential for further exploring protein-nucleic acid interactions. Previous studies have shown that machine-learning-based methods can efficiently identify DBPs or RBPs. However, the information used by these methods is relatively homogeneous, and most of them can predict only DBPs or only RBPs. In this study, we proposed a computational predictor, iDRBP-EL, to identify DNA- and RNA-binding proteins, and introduced hierarchical ensemble learning to integrate information at three levels: features, machine learning algorithms, and data. These three levels of information are combined into a single multi-label model. The ablation experiment showed that fusing these different kinds of information improves prediction performance and overcomes the cross-prediction problem. Experimental results on the independent datasets showed that iDRBP-EL outperformed all the other competing methods. Moreover, we established a user-friendly webserver, iDRBP-EL (http://bliulab.net/iDRBP-EL), which can predict both DBPs and RBPs based only on protein sequences.
KW - DNA- and RNA- binding protein prediction
KW - hierarchical ensemble learning
KW - multi-label learning
KW - stacking technology
UR - http://www.scopus.com/inward/record.url?scp=85122104762&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2021.3136905
DO - 10.1109/TCBB.2021.3136905
M3 - Article
C2 - 34932484
AN - SCOPUS:85122104762
SN - 1545-5963
VL - 20
SP - 432
EP - 441
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 1
ER -