SOFM-top: Protein remote homology detection and fold recognition based on sequence-order frequency matrix

Junjie Chen, Mingyue Guo, Xiaolong Wang, Bin Liu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Protein remote homology detection and fold recognition are critical for studying protein structure and function. Currently, profile-based methods show state-of-the-art performance in this field; they rely on widely used sequence profiles such as the Position-Specific Frequency Matrix (PSFM) and the Position-Specific Scoring Matrix (PSSM). However, these approaches ignore sequence-order effects along the protein sequence. In this study, we propose a novel profile, the Sequence-Order Frequency Matrix (SOFM), which incorporates sequence-order information and extracts evolutionary information from a Multiple Sequence Alignment (MSA). Statistical tests and experimental results demonstrate its effectiveness. Combined with the previously proposed Top-n-grams approach, SOFM was then applied to remote homology detection and fold recognition, yielding a computational predictor called SOFM-Top. Evaluated on four benchmark datasets, SOFM-Top outperformed other state-of-the-art methods in this field, indicating that SOFM-Top is a more useful tool for these tasks and that SOFM is a richer representation than PSFM and PSSM. Because profiles are widely used to construct computational predictors in studies of protein structure and function, SOFM has many potential applications.
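The abstract does not spell out how SOFM itself is computed, so the sketch below only illustrates the two ingredients it builds on: a PSFM-style profile, taken here as the column-wise amino acid frequencies of an MSA, and Top-n-gram extraction, assumed here to mean the n most frequent amino acids at each profile position. This is a minimal sketch under those assumptions, not the authors' implementation; the function names (`psfm`, `top_n_grams`, `top_n_gram_counts`) are hypothetical, and real profile pipelines (e.g. PSI-BLAST) also apply sequence weighting and pseudocounts, which are omitted here.

```python
from collections import Counter

# The 20 standard amino acids; gaps ('-') and other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def psfm(msa: list[str]) -> list[dict[str, float]]:
    """PSFM-style profile: per-column amino acid frequencies of an MSA.

    Assumption: unweighted counts and no pseudocounts (unlike PSI-BLAST).
    """
    if not msa:
        raise ValueError("MSA must contain at least one sequence")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values())
        # An all-gap column yields an all-zero frequency vector.
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def top_n_grams(profile: list[dict[str, float]], n: int = 1) -> list[str]:
    """Hypothetical Top-n-gram extraction: the n most frequent amino acids
    at each position, most frequent first (ties broken alphabetically)."""
    grams = []
    for column in profile:
        ranked = sorted(AMINO_ACIDS, key=lambda aa: (-column[aa], aa))
        grams.append("".join(ranked[:n]))
    return grams


def top_n_gram_counts(profile: list[dict[str, float]], n: int = 1) -> Counter:
    """Occurrence count of each Top-n-gram, usable as a sparse feature vector."""
    return Counter(top_n_grams(profile, n))
```

For example, `top_n_grams(psfm(["ACDA", "ACEA", "GCDA"]), 2)` returns `['AG', 'CA', 'DE', 'AC']`. Ties (here, amino acids with zero frequency) are broken alphabetically, which is another assumption of this sketch rather than something stated in the source.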

Original language: English
Title of host publication: Intelligent Computing Theories and Application - 13th International Conference, ICIC 2017, Proceedings
Editors: De-Shuang Huang, Kang-Hyun Jo, Juan Carlos Figueroa-Garcia
Publisher: Springer Verlag
Pages: 469-480
Number of pages: 12
ISBN (Print): 9783319633114
Publication status: Published - 2017
Externally published: Yes
Event: 13th International Conference on Intelligent Computing, ICIC 2017 - Liverpool, United Kingdom
Duration: 7 Aug 2017 to 10 Aug 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10362 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Conference on Intelligent Computing, ICIC 2017
Country/Territory: United Kingdom
City: Liverpool
Period: 7/08/17 to 10/08/17

Keywords

  • Profile representation
  • Protein fold recognition
  • Protein remote homology detection
  • Sequence Order Frequency Matrix
  • Top-n-grams
