TY - GEN
T1 - Protein Remote Homology Detection Based on Profiles
AU - Liao, Qing
AU - Guo, Mingyue
AU - Liu, Bin
N1 - Publisher Copyright: © 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
AB - As one of the most important tasks in protein sequence analysis, protein remote homology detection has been extensively studied for decades. Currently, profile-based methods show state-of-the-art performance. The Position-Specific Frequency Matrix (PSFM) is a widely used profile because it contains evolutionary information, which is critical for protein sequence analysis. However, the profiles contain noise introduced by amino acids with low frequencies, which are unlikely to occur at the corresponding sequence positions during the evolutionary process. In this study, we propose a method that removes this noise from the PSFM by removing the amino acids with low frequencies, thereby generating a profile called the Top Frequency Profile (TFP). Auto-cross covariance (ACC) transformation is then performed on the profiles to convert them into fixed-length feature vectors, and the predictor is constructed by combining these vectors with Support Vector Machines (SVMs). Evaluated on a benchmark dataset, the proposed method outperforms other state-of-the-art predictors for protein remote homology detection, indicating that it is a useful tool for protein sequence analysis. Because profiles generated from multiple sequence alignments are important for protein structure and function prediction, the TFP will have many potential applications.
KW - Protein remote homology detection
KW - Top Frequency Profile (TFP)
UR - http://www.scopus.com/inward/record.url?scp=85065846117&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-17938-0_24
DO - 10.1007/978-3-030-17938-0_24
M3 - Conference contribution
AN - SCOPUS:85065846117
SN - 9783030179373
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 261
EP - 268
BT - Bioinformatics and Biomedical Engineering - 7th International Work-Conference, IWBBIO 2019, Proceedings
A2 - Rojas, Fernando
A2 - Ortuño, Francisco
A2 - Valenzuela, Olga
A2 - Rojas, Ignacio
PB - Springer Verlag
T2 - 7th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2019
Y2 - 8 May 2019 through 10 May 2019
ER -