DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation

Bin Liu; Shanyi Wang; Xiaolong Wang

doi:10.1038/srep15479

Bin Liu*, Shanyi Wang, Xiaolong Wang

*Corresponding author for this work

Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

123 Citations (Scopus)

Abstract

DNA-binding proteins play an important role in most cellular processes. Therefore, it is necessary to develop an efficient predictor that identifies DNA-binding proteins based only on their sequence information. The bottleneck in constructing a useful predictor is finding suitable features that capture the characteristics of DNA-binding proteins. We applied pseudo amino acid composition (PseAAC) to DNA-binding protein identification and further improved PseAAC by incorporating evolutionary information through a profile-based protein representation. Finally, by combining these features with Support Vector Machines (SVMs), we proposed a predictor called iDNAPro-PseAAC. Experimental results on an updated benchmark dataset showed that iDNAPro-PseAAC outperformed some state-of-the-art approaches and achieved stable performance on an independent dataset. Using an ensemble learning approach to incorporate more negative samples (non-DNA-binding proteins) in the training process further improved the performance of iDNAPro-PseAAC. The web server of iDNAPro-PseAAC is available at http://bioinformatics.hitsz.edu.cn/iDNAPro-PseAAC/.
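As a rough illustration of the feature extraction step described above, the following Python sketch computes Chou's type-1 PseAAC (20 amino acid composition terms plus λ sequence-order correlation factors weighted by w) on a sequence derived from a frequency profile. Several parts are assumptions made for illustration, not details taken from the paper: the Kyte-Doolittle hydropathy scale as the only physicochemical property, the defaults `lam=3` and `w=0.05`, and the hypothetical helper `profile_to_sequence`, which approximates a profile-based representation by keeping the most frequent residue in each profile column. The SVM classifier and the ensemble over negative samples are not shown.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale. ASSUMPTION: used here as the single
# physicochemical property; the published method may use a different set.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD_HYDROPATHY)  # fixed order for the 20 composition features


def standardize(scale):
    """Shift/scale a property to zero mean and unit population std over the 20 residues."""
    values = list(scale.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (v - mean) / std for aa, v in scale.items()}


def profile_to_sequence(profile):
    """HYPOTHETICAL helper: keep the most frequent residue in each profile column.

    `profile` is a list of dicts mapping amino acid -> frequency, one per position.
    """
    return "".join(max(column, key=column.get) for column in profile)


def pseaac(sequence, lam=3, w=0.05, scale=KD_HYDROPATHY):
    """Type-1 pseudo amino acid composition using a single property.

    Returns 20 + lam non-negative features that sum to 1.
    """
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    if len(sequence) <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(scale)
    counts = Counter(sequence)
    freqs = [counts[aa] / len(sequence) for aa in AMINO_ACIDS]
    # Sequence-order correlation factor theta_j for each tier j = 1..lam:
    # mean squared property difference between residues j positions apart.
    thetas = []
    for j in range(1, lam + 1):
        pairs = len(sequence) - j
        total = sum((h[sequence[i + j]] - h[sequence[i]]) ** 2 for i in range(pairs))
        thetas.append(total / pairs)
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


if __name__ == "__main__":
    toy_profile = [
        {"M": 0.7, "L": 0.3},
        {"K": 0.4, "R": 0.6},
        {"A": 0.5, "G": 0.2, "S": 0.3},
        {"D": 0.9, "E": 0.1},
        {"K": 0.8, "R": 0.2},
        {"V": 0.3, "I": 0.7},
    ]
    seq = profile_to_sequence(toy_profile)
    print(seq)  # -> MRADKI
    features = pseaac(seq, lam=3)
    print(len(features))  # -> 23
    print(round(sum(features), 6))  # -> 1.0
```

Because every component is divided by the same denominator, the 20 + λ features sum to 1; a larger w gives the sequence-order terms more weight relative to plain composition. In a full predictor, vectors like these would then be passed to a classifier such as an SVM.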

Original language	English
Article number	15479
Journal	Scientific Reports
Volume	5
DOIs	https://doi.org/10.1038/srep15479
Publication status	Published - 20 Oct 2015
Externally published	Yes

Cite this

Liu, B., Wang, S., & Wang, X. (2015). DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation. Scientific Reports, 5, Article 15479. https://doi.org/10.1038/srep15479

@article{7e280b759da24854ab8b39eee106b298,
  title = "DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation",
  abstract = "DNA-binding proteins play an important role in most cellular processes. Therefore, it is necessary to develop an efficient predictor that identifies DNA-binding proteins based only on their sequence information. The bottleneck in constructing a useful predictor is finding suitable features that capture the characteristics of DNA-binding proteins. We applied pseudo amino acid composition (PseAAC) to DNA-binding protein identification and further improved PseAAC by incorporating evolutionary information through a profile-based protein representation. Finally, by combining these features with Support Vector Machines (SVMs), we proposed a predictor called iDNAPro-PseAAC. Experimental results on an updated benchmark dataset showed that iDNAPro-PseAAC outperformed some state-of-the-art approaches and achieved stable performance on an independent dataset. Using an ensemble learning approach to incorporate more negative samples (non-DNA-binding proteins) in the training process further improved the performance of iDNAPro-PseAAC. The web server of iDNAPro-PseAAC is available at http://bioinformatics.hitsz.edu.cn/iDNAPro-PseAAC/.",
  author = "Bin Liu and Shanyi Wang and Xiaolong Wang",
  year = "2015",
  month = oct,
  day = "20",
  doi = "10.1038/srep15479",
  language = "English",
  volume = "5",
  journal = "Scientific Reports",
  issn = "2045-2322",
  publisher = "Nature Publishing Group",
}

TY  - JOUR
T1  - DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation
AU  - Liu, Bin
AU  - Wang, Shanyi
AU  - Wang, Xiaolong
PY  - 2015/10/20
Y1  - 2015/10/20
AB  - DNA-binding proteins play an important role in most cellular processes. Therefore, it is necessary to develop an efficient predictor that identifies DNA-binding proteins based only on their sequence information. The bottleneck in constructing a useful predictor is finding suitable features that capture the characteristics of DNA-binding proteins. We applied pseudo amino acid composition (PseAAC) to DNA-binding protein identification and further improved PseAAC by incorporating evolutionary information through a profile-based protein representation. Finally, by combining these features with Support Vector Machines (SVMs), we proposed a predictor called iDNAPro-PseAAC. Experimental results on an updated benchmark dataset showed that iDNAPro-PseAAC outperformed some state-of-the-art approaches and achieved stable performance on an independent dataset. Using an ensemble learning approach to incorporate more negative samples (non-DNA-binding proteins) in the training process further improved the performance of iDNAPro-PseAAC. The web server of iDNAPro-PseAAC is available at http://bioinformatics.hitsz.edu.cn/iDNAPro-PseAAC/.
UR  - http://www.scopus.com/inward/record.url?scp=84944886528&partnerID=8YFLogxK
U2  - 10.1038/srep15479
DO  - 10.1038/srep15479
M3  - Article
C2  - 26482832
AN  - SCOPUS:84944886528
SN  - 2045-2322
VL  - 5
JO  - Scientific Reports
JF  - Scientific Reports
M1  - 15479
ER  - 
