TY - JOUR
T1 - sgRNA-2wPSM
T2 - Identify sgRNAs on-target activity by combining two-window-based position specific mismatch and synthetic minority oversampling technique
AU - Zhang, Lichao
AU - Bai, Tao
AU - Wu, Hao
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/3
Y1 - 2023/3
AB - Motivation: sgRNA on-target activity prediction is a critical step in the CRISPR-Cas9 system. Because of its importance for RNA function research and genome-editing applications, several computational methods have been introduced, treating it either as a binary classification task or as a regression task. Among these methods, sgRNA-PSM is a state-of-the-art method. In this work, we improved this method by proposing a new feature extraction method called two-window-based PSM, which divides each DNA sequence into two non-overlapping segments so that different patterns can be extracted from each segment. The two-window-based PSM features were fed into Support Vector Machines (SVMs), yielding a new method called sgRNA-2wPSM. Furthermore, a new oversampling method based on the SVM-SMOTE algorithm, called SCORE-SVM-SMOTE, was proposed to address the imbalanced training set problem. Results on the benchmark datasets indicated that sgRNA-2wPSM outperforms the other methods compared.
KW - SCORE-SVM-SMOTE
KW - Support vector machine
KW - Two-window-based PSM
KW - sgRNAs on-target activity
UR - https://www.scopus.com/pages/publications/85149060127
U2 - 10.1016/j.compbiomed.2022.106489
DO - 10.1016/j.compbiomed.2022.106489
M3 - Article
C2 - 36841059
AN - SCOPUS:85149060127
SN - 0010-4825
VL - 155
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 106489
ER -