Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores

Ke Yan, Jie Wen, Jin Xing Liu, Yong Xu*, Bin Liu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Protein fold recognition is one of the most important steps in protein structure prediction; it aims to classify proteins into known protein folds. There are two main computational approaches: template-based methods, which rely on alignment scores between query-template protein pairs, and machine learning methods, which rely on a feature representation and a classifier. Each approach has its own advantages and disadvantages. Can the two be combined to build more accurate predictors for protein fold recognition? In this study, we make an initial attempt and propose two novel algorithms: TSVM-fold and ESVM-fold. TSVM-fold is based on Support Vector Machines (SVMs) and uses a set of pairwise sequence similarity scores generated by three complementary template-based methods: HHblits, SPARKS-X, and DeepFR. These scores measure the global relationships between query sequences and templates. The resulting comprehensive sequence features are fed into the SVMs for prediction. TSVM-fold is then further combined with the HHblits algorithm to improve its generalization ability; the combined method is called ESVM-fold. Experimental results on two rigorous benchmark datasets (the LE and YK datasets) show that the proposed methods outperform some state-of-the-art methods, indicating that TSVM-fold and ESVM-fold are efficient predictors for protein fold recognition.
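
As a rough illustration of the feature construction the abstract describes, the sketch below concatenates per-template similarity scores from the three named methods into one vector per query. It is a minimal sketch under stated assumptions, not the authors' implementation: the nested-dictionary score layout, the function name `build_feature_vector`, the `missing` placeholder value, and the toy numbers are all hypothetical. In a full pipeline, a vector like this would be the input to an SVM classifier.

```python
from typing import Dict, List

# Hypothetical layout: method -> query id -> template id -> similarity score.
Scores = Dict[str, Dict[str, Dict[str, float]]]

# The three template-based methods named in the abstract.
METHODS = ["HHblits", "SPARKS-X", "DeepFR"]


def build_feature_vector(scores: Scores, query: str, templates: List[str],
                         missing: float = 0.0) -> List[float]:
    """Concatenate one score per (method, template) pair into a flat vector."""
    vector: List[float] = []
    for method in METHODS:
        per_template = scores.get(method, {}).get(query, {})
        for template in templates:
            # Assumption: a template with no reported score gets a placeholder.
            vector.append(per_template.get(template, missing))
    return vector


# Toy scores, invented purely for illustration.
toy_scores: Scores = {
    "HHblits": {"q1": {"t1": 0.9, "t2": 0.1}},
    "SPARKS-X": {"q1": {"t1": 0.7}},
    "DeepFR": {"q1": {"t1": 0.8, "t2": 0.3}},
}

features = build_feature_vector(toy_scores, "q1", ["t1", "t2"])
print(features)
```

With two templates and three methods, the vector has six entries, ordered method by method. How the real system scales the scores or handles missing values is not stated in the abstract.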

Original language: English
Pages (from-to): 2008-2016
Number of pages: 9
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 18
Issue number: 5
Publication status: Published - 2021

Keywords

  • Pairwise sequence similarity scores
  • Protein fold recognition
  • SVMs
  • Template-based method
