iEsGene-ZCPseKNC: Identify Essential Genes Based on Z Curve Pseudo k-Tuple Nucleotide Composition

Jiahai Chen, Yongmin Liu, Qing Liao*, Bin Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

As an important technique in synthetic biology, computational identification of essential genes facilitates the development of related fields such as genome analysis and drug design. The identification of prokaryotic essential genes has been studied extensively, with most work focusing on bacteria. Archaea, an important prokaryotic domain, show high variance in genome size. However, no predictor has been available for essential genes in archaea. In this paper, we developed iEsGene-ZCPseKNC, the first computational predictor of essential genes in archaea. To capture the sequence patterns of essential genes, we proposed a new feature, Z curve pseudo k-tuple nucleotide composition (ZCPseKNC), which combines the advantages of the Z curve and pseudo k-tuple nucleotide composition (PseKNC). To overcome the problems caused by the imbalanced training set, the SMOTE algorithm was employed, which further improved the predictive performance of iEsGene-ZCPseKNC. Evaluated by a rigorous jackknife test on a benchmark dataset, iEsGene-ZCPseKNC outperformed the predictors based on the Z curve and on PseKNC, indicating that it is useful for identifying essential genes in archaea and would be a powerful tool for genome analysis. A user-friendly web server for the iEsGene-ZCPseKNC predictor is available at http://bliulab.net/iEsGene-ZCPseKNC/.
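
The abstract names two ingredients, the Z curve and k-tuple nucleotide composition, but does not state how ZCPseKNC combines them. As a rough illustration of those two ingredients only, the sketch below computes the standard cumulative Z curve coordinates and plain k-tuple frequencies for a DNA sequence. The function names `z_curve` and `kmer_composition` are hypothetical, skipping ambiguous bases is an assumption of this sketch, and none of this should be read as the authors' feature definition.

```python
from collections import Counter
from itertools import product


def z_curve(seq):
    """Return cumulative Z curve coordinates (x, y, z) for each prefix of seq."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            continue  # assumption: ambiguous symbols such as N are skipped
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        x = (a + g) - (c + t)  # purine vs. pyrimidine
        y = (a + c) - (g + t)  # amino vs. keto
        z = (a + t) - (c + g)  # weak vs. strong hydrogen bonding
        points.append((x, y, z))
    return points


def kmer_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-tuples over A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    alphabet_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only k-tuples made of unambiguous bases contribute to the total.
    total = sum(counts[kmer] for kmer in alphabet_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0)
            for kmer in alphabet_kmers}


if __name__ == "__main__":
    print(z_curve("ACGT"))
    print(kmer_composition("ACGT", k=2))
```

A gene could then be represented by concatenating summary values from both functions into one vector. How the published predictor actually builds and weights its feature vector is not specified in this abstract.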

Original language: English
Article number: 8894693
Pages (from-to): 165241-165247
Number of pages: 7
Journal: IEEE Access
Volume: 7
Publication status: Published - 2019

Keywords

  • Essential gene prediction
  • SMOTE
  • ZCPseKNC
  • Support vector machine
