Identification of DNA-Binding Proteins by Combining Auto-Cross Covariance Transformation and Ensemble Learning

Bin Liu*, Shanyi Wang, Qiwen Dong, Shumin Li, Xuan Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

78 Citations (Scopus)

Abstract

DNA-binding proteins play a pivotal role in various intracellular and extracellular activities, ranging from DNA replication to the control of gene expression. With the rapid development of next-generation sequencing techniques, the number of known protein sequences is increasing at an unprecedented rate. It is therefore necessary to develop computational methods that identify DNA-binding proteins based only on protein sequence information. In this study, a novel method called iDNA-KACC is presented, which combines the support vector machine (SVM) with the auto-cross covariance transformation. Protein sequences are first converted into a profile-based protein representation and then transformed into fixed-length vectors by the auto-cross covariance transformation with Kmer composition. This scheme effectively captures the sequence-order effect. The resulting vectors are fed into the SVM to discriminate DNA-binding proteins from non-DNA-binding ones. In a rigorous jackknife test, iDNA-KACC achieves an overall accuracy of 75.16% and a Matthews correlation coefficient (MCC) of 0.5. Its performance is further improved by an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset show that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA-binding protein identification.
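The fixed-length encoding step can be illustrated with a minimal sketch of a commonly used auto-cross covariance (ACC) formulation. It assumes the profile is an L × D numeric matrix, for example a position-specific scoring matrix with D = 20 columns. For each lag lg it computes (1/(L - lg)) Σ_j (S[j][i1] - mean_i1)(S[j + lg][i2] - mean_i2). Pairs with i1 = i2 give auto-covariance terms, and pairs with i1 ≠ i2 give cross-covariance terms. The function names `auto_cross_covariance` and `matthews_cc` and the parameter `max_lag` are hypothetical and chosen for illustration. The sketch does not reproduce how iDNA-KACC builds its profile-based representation or combines ACC with Kmer composition. The standard MCC formula is included because the abstract reports that metric.

```python
import math
from typing import List, Sequence

# Hypothetical input type: L rows (residue positions) x D columns (profile values).
Profile = Sequence[Sequence[float]]


def _column_means(profile: Profile) -> List[float]:
    """Mean of each profile column over all residue positions."""
    length = len(profile)
    dims = len(profile[0])
    return [sum(row[d] for row in profile) / length for d in range(dims)]


def auto_cross_covariance(profile: Profile, max_lag: int) -> List[float]:
    """Map an L x D profile to a vector of length D * D * max_lag.

    Assumed formulation (a sketch, not the paper's exact code): for every
    lag in 1..max_lag and every ordered column pair (i1, i2), average the
    product of centred values at positions j and j + lag. The output length
    does not depend on L, so proteins of different lengths become
    comparable fixed-length feature vectors.
    """
    length = len(profile)
    if length <= max_lag:
        raise ValueError("profile must have more rows than max_lag")
    dims = len(profile[0])
    means = _column_means(profile)
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        n_pairs = length - lag
        for i1 in range(dims):
            for i2 in range(dims):
                total = 0.0
                for j in range(n_pairs):
                    total += (profile[j][i1] - means[i1]) * (
                        profile[j + lag][i2] - means[i2]
                    )
                # i1 == i2: auto-covariance term; i1 != i2: cross-covariance term
                features.append(total / n_pairs)
    return features


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    print(auto_cross_covariance(toy, max_lag=1))
    print(matthews_cc(tp=10, tn=10, fp=0, fn=0))
```

In a full pipeline, vectors like these would be passed to an SVM classifier. Because every protein maps to a vector of length D × D × `max_lag`, a standard fixed-dimension kernel method can be applied regardless of sequence length.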

Original language: English
Article number: 7457231
Pages (from-to): 328-334
Number of pages: 7
Journal: IEEE Transactions on Nanobioscience
Volume: 15
Issue number: 4
Publication status: Published - Jun 2016
Externally published: Yes

Keywords

  • Auto-cross covariance transformation
  • DNA-binding protein
  • ensemble learning
  • support vector machine
