Abstract
Protein remote homology detection and fold recognition are two critical tasks in the study of protein structure and function. Profile-based methods currently achieve state-of-the-art performance on both tasks. However, widely used sequence profiles, such as the position-specific frequency matrix (PSFM) and the position-specific scoring matrix (PSSM), ignore sequence-order effects along the protein sequence. In this study, we propose a novel profile, the sequence-order frequency matrix (SOFM), which extracts the sequence-order information of neighboring residues from a multiple sequence alignment (MSA). SOFMs are combined with two profile feature extraction approaches, top-n-grams and the Smith-Waterman algorithm, and applied to protein remote homology detection and fold recognition, yielding two predictors called SOFM-Top and SOFM-SW. Experimental results show that SOFM has higher information content than other profiles, and that both predictors outperform other state-of-the-art methods. It is anticipated that SOFM will become a very useful profile in the study of protein structure and function.
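To illustrate the limitation the abstract attributes to PSFM, the following is a minimal sketch in Python. It computes a standard per-column PSFM from a toy alignment and, for contrast, a simple table of adjacent-residue pair frequencies. The pair table is a hypothetical illustration of "sequence-order information of neighboring residues" and is not the SOFM definition from the paper, which the abstract does not specify. The function names and the toy alignments are assumptions made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def psfm(msa):
    """Per-column residue frequencies; gaps and non-standard symbols are skipped."""
    profile = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def adjacent_pair_frequencies(msa):
    """Frequencies of residue pairs at columns (i, i+1), counted row by row.

    Hypothetical illustration only; NOT the paper's SOFM definition.
    """
    tables = []
    for col in range(len(msa[0]) - 1):
        counts = Counter(
            seq[col] + seq[col + 1]
            for seq in msa
            if seq[col] in AMINO_ACIDS and seq[col + 1] in AMINO_ACIDS
        )
        total = sum(counts.values())
        tables.append({pair: n / total for pair, n in counts.items()})
    return tables


# Two toy alignments with identical columns but different residue pairings.
toy_msa = ["AC", "AC", "GT", "GT"]
shuffled_msa = ["AT", "AT", "GC", "GC"]

same_psfm = psfm(toy_msa) == psfm(shuffled_msa)
same_pairs = adjacent_pair_frequencies(toy_msa) == adjacent_pair_frequencies(shuffled_msa)
print(same_psfm, same_pairs)
```

Both toy alignments have the same residues in each column, so their PSFMs are identical, while the pair tables differ because the residues co-occur differently within rows. This is the kind of neighbor information that a column-independent profile discards.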
Original language | English |
---|---|
Article number | 8078207 |
Pages (from-to) | 292-300 |
Number of pages | 9 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 16 |
Issue number | 1 |
Publication status | Published - 1 Jan 2019 |
Externally published | Yes |
Keywords
- Protein remote homology detection
- Smith-Waterman local alignment algorithm
- protein fold recognition
- sequence-order frequency matrix
- top-n-gram