TY - JOUR
T1 - iEsGene-ZCPseKNC
T2 - Identify Essential Genes Based on Z Curve Pseudo k-Tuple Nucleotide Composition
AU - Chen, Jiahai
AU - Liu, Yongmin
AU - Liao, Qing
AU - Liu, Bin
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2019
Y1 - 2019
N2 - As an important technique for synthetic biology, computational identification of essential genes will facilitate the development of related fields such as genome analysis and drug design. The identification of prokaryotic essential genes has been extensively studied, with most work focusing on essential genes in bacteria. Archaea, an important domain of prokaryotes, exhibit high variance in genome size. However, no predictor is available for predicting essential genes in archaea. In this paper, we developed the first computational predictor for essential genes in archaea, called iEsGene-ZCPseKNC. To capture the sequence patterns of essential genes, a new feature called Z curve pseudo k-tuple nucleotide composition (ZCPseKNC) was proposed, which incorporates the advantages of both the Z curve and pseudo k-tuple nucleotide composition (PseKNC). To overcome the problems caused by the imbalanced training set, the SMOTE algorithm was employed to further improve the predictive performance of iEsGene-ZCPseKNC. Evaluated by the rigorous jackknife test on a benchmark dataset, the iEsGene-ZCPseKNC predictor outperformed predictors based on Z curve and PseKNC, indicating that iEsGene-ZCPseKNC is useful for identifying essential genes in archaea and would be a powerful tool for genome analysis. A user-friendly web server for the iEsGene-ZCPseKNC predictor was established and can be accessed at http://bliulab.net/iEsGene-ZCPseKNC/.
AB - As an important technique for synthetic biology, computational identification of essential genes will facilitate the development of related fields such as genome analysis and drug design. The identification of prokaryotic essential genes has been extensively studied, with most work focusing on essential genes in bacteria. Archaea, an important domain of prokaryotes, exhibit high variance in genome size. However, no predictor is available for predicting essential genes in archaea. In this paper, we developed the first computational predictor for essential genes in archaea, called iEsGene-ZCPseKNC. To capture the sequence patterns of essential genes, a new feature called Z curve pseudo k-tuple nucleotide composition (ZCPseKNC) was proposed, which incorporates the advantages of both the Z curve and pseudo k-tuple nucleotide composition (PseKNC). To overcome the problems caused by the imbalanced training set, the SMOTE algorithm was employed to further improve the predictive performance of iEsGene-ZCPseKNC. Evaluated by the rigorous jackknife test on a benchmark dataset, the iEsGene-ZCPseKNC predictor outperformed predictors based on Z curve and PseKNC, indicating that iEsGene-ZCPseKNC is useful for identifying essential genes in archaea and would be a powerful tool for genome analysis. A user-friendly web server for the iEsGene-ZCPseKNC predictor was established and can be accessed at http://bliulab.net/iEsGene-ZCPseKNC/.
KW - Essential gene prediction
KW - SMOTE
KW - ZCPseKNC
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85077614624&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2952237
DO - 10.1109/ACCESS.2019.2952237
M3 - Article
AN - SCOPUS:85077614624
SN - 2169-3536
VL - 7
SP - 165241
EP - 165247
JO - IEEE Access
JF - IEEE Access
M1 - 8894693
ER -