TY - GEN
T1 - SOFM-Top
T2 - 13th International Conference on Intelligent Computing, ICIC 2017
AU - Chen, Junjie
AU - Guo, Mingyue
AU - Wang, Xiaolong
AU - Liu, Bin
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Protein remote homology detection and fold recognition are critical for studying protein structure and function. Currently, profile-based methods achieve state-of-the-art performance in this field; they rely on widely used sequence profiles such as the Position-Specific Frequency Matrix (PSFM) and the Position-Specific Scoring Matrix (PSSM). However, these approaches ignore sequence-order effects along the protein sequence. In this study, we propose a novel profile, called the Sequence-Order Frequency Matrix (SOFM), which incorporates sequence-order information and extracts evolutionary information from a Multiple Sequence Alignment (MSA). Statistical tests and experimental results demonstrate its effectiveness. Combined with the previously proposed Top-n-grams approach, SOFM was then applied to remote homology detection and fold recognition, yielding a computational predictor called SOFM-Top. Evaluated on four benchmark datasets, SOFM-Top outperformed other state-of-the-art methods in this field, indicating that it is a useful tool and that SOFM is a richer representation than PSFM and PSSM. SOFM has many potential applications, since profiles are widely used to construct computational predictors in studies of protein structure and function.
KW - Profile representation
KW - Protein fold recognition
KW - Protein remote homology detection
KW - Sequence-Order Frequency Matrix
KW - Top-n-grams
UR - http://www.scopus.com/inward/record.url?scp=85027699776&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-63312-1_41
DO - 10.1007/978-3-319-63312-1_41
M3 - Conference contribution
AN - SCOPUS:85027699776
SN - 9783319633114
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 469
EP - 480
BT - Intelligent Computing Theories and Application - 13th International Conference, ICIC 2017, Proceedings
A2 - Huang, De-Shuang
A2 - Jo, Kang-Hyun
A2 - Figueroa-Garcia, Juan Carlos
PB - Springer Verlag
Y2 - 7 August 2017 through 10 August 2017
ER -