TY - JOUR
T1 - iDRBP-ECHF
T2 - Identifying DNA- and RNA-binding proteins based on extensible cubic hybrid framework
AU - Feng, Jiawei
AU - Wang, Ning
AU - Zhang, Jun
AU - Liu, Bin
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/10
Y1 - 2022/10
AB - Proteins interact with nucleic acids to regulate the life activities of organisms, so accurately and efficiently identifying nucleic acid-binding proteins (NABPs) is important. Several sequence-based computational methods have been proposed to identify DNA- and RNA-binding proteins. However, the benchmark datasets used by these methods ignore the real-world proportion of NABPs, and some integration methods combine only traditional machine learning algorithms, which limits prediction performance. In this study, we propose a sequence-based method called iDRBP-ECHF to predict DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs). We constructed a benchmark dataset that reflects the real-world proportion of positive and negative samples, and used down-sampling to generate three relatively balanced datasets for training iDRBP-ECHF. In addition, we incorporated deep learning algorithms into the framework to obtain a more compact high-level feature representation of the input data. Results on two independent datasets show that iDRBP-ECHF achieves state-of-the-art performance and outperforms the other existing sequence-based DBP and RBP prediction methods. A web server for iDRBP-ECHF is available at http://bliulab.net/iDRBP-ECHF.
KW - DNA- and RNA-binding proteins identification
KW - Extensible cubic hybrid framework
KW - Machine learning
KW - Multi-label learning
UR - http://www.scopus.com/inward/record.url?scp=85136578841&partnerID=8YFLogxK
U2 - 10.1016/j.compbiomed.2022.105940
DO - 10.1016/j.compbiomed.2022.105940
M3 - Article
C2 - 36044786
AN - SCOPUS:85136578841
SN - 0010-4825
VL - 149
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 105940
ER -