TY - JOUR
T1 - iDRBP_MMC: Identifying DNA-Binding Proteins and RNA-Binding Proteins Based on Multi-Label Learning Model and Motif-Based Convolutional Neural Network
AU - Zhang, Jun
AU - Chen, Qingcai
AU - Liu, Bin
N1 - Publisher Copyright: © 2020 Elsevier Ltd
PY - 2020/11/6
Y1 - 2020/11/6
AB - DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) play crucial roles in gene expression. Identifying them accurately is of great significance, so accurate computational predictors are needed. In previous studies, DBP recognition and RBP recognition were treated as two separate tasks. Because DBPs and RBPs are highly similar in function and structure, DBP predictors tend to predict RBPs as DBPs, while RBP predictors tend to predict DBPs as RBPs, leading to a high cross-prediction rate and low prediction precision. Here we introduced a multi-label learning model based on a motif-based convolutional neural network and proposed a sequence-based computational method, iDRBP_MMC, that addresses the cross-prediction problem and thereby improves the prediction of DBPs and RBPs. Results on four test datasets showed that it outperformed other state-of-the-art DBP predictors and RBP predictors. When applied to the tomato genome, iDRBP_MMC demonstrated its capacity for large-scale data analysis. Moreover, iDRBP_MMC can identify proteins that bind both DNA and RNA, which is beyond the scope of existing DBP or RBP predictors. The iDRBP_MMC web server is freely available at http://bliulab.net/iDRBP_MMC.
KW - cross-prediction problem
KW - motif-based convolutional neural network
KW - multi-label learning
KW - nucleic acid binding protein prediction
UR - http://www.scopus.com/inward/record.url?scp=85091888267&partnerID=8YFLogxK
U2 - 10.1016/j.jmb.2020.09.008
DO - 10.1016/j.jmb.2020.09.008
M3 - Article
C2 - 32920048
AN - SCOPUS:85091888267
SN - 0022-2836
VL - 432
SP - 5860
EP - 5875
JO - Journal of Molecular Biology
JF - Journal of Molecular Biology
IS - 22
ER -