TY - JOUR
T1 - iRO-PsekGCC
T2 - Identify DNA Replication Origins Based on Pseudo k-Tuple GC Composition
AU - Liu, Bin
AU - Chen, Shengyu
AU - Yan, Ke
AU - Weng, Fan
N1 - Publisher Copyright: © 2019 Liu, Chen, Yan and Weng.
PY - 2019/9/18
Y1 - 2019/9/18
N2 - Summary: Identifying replication origins plays a key role in understanding the mechanism of DNA replication, and the task is of great significance in DNA sequence analysis. Because of its importance, several computational approaches have been introduced. Among these predictors, iRO-3wPseKNC is the first discriminative method able to correctly identify entire replication origins. To further improve its predictive performance, we proposed the Pseudo k-tuple GC Composition (PsekGCC) approach, which captures the “GC asymmetry bias” of yeast species by considering both the GC skew and the sequence-order effects of k-tuple GC Composition (k-GCC). Based on PsekGCC, we proposed a new predictor called iRO-PsekGCC to identify DNA replication origins. Rigorous jackknife tests on benchmark datasets for two yeast species (Saccharomyces cerevisiae, Pichia pastoris) indicated that iRO-PsekGCC outperformed iRO-3wPseKNC. We anticipate that iRO-PsekGCC will be a useful tool for DNA replication origin identification. Availability and implementation: A web server for the iRO-PsekGCC predictor has been established and can be accessed at http://bliulab.net/iRO-PsekGCC/.
KW - DNA sequence analysis
KW - pseudo k-tuple GC composition
KW - random forest
KW - replication origin identification
KW - web-server
UR - http://www.scopus.com/inward/record.url?scp=85072898044&partnerID=8YFLogxK
U2 - 10.3389/fgene.2019.00842
DO - 10.3389/fgene.2019.00842
M3 - Article
AN - SCOPUS:85072898044
SN - 1664-8021
VL - 10
JO - Frontiers in Genetics
JF - Frontiers in Genetics
M1 - 842
ER -