TY - JOUR
T1 - Protein remote homology detection by combining Chou's pseudo amino acid composition and profile-based protein representation
AU - Liu, Bin
AU - Wang, Xiaolong
AU - Zou, Quan
AU - Dong, Qiwen
AU - Chen, Qingcai
PY - 2013/10
Y1 - 2013/10
N2 - Protein remote homology detection is a key problem in bioinformatics. Currently, discriminative methods such as the Support Vector Machine (SVM) achieve the best performance. The most efficient approach to improving the performance of SVM-based methods is to find a general protein representation method that converts proteins of different lengths into fixed-length vectors and captures the different properties of the proteins for discrimination. The bottleneck in designing such a representation method is that native proteins have different lengths. Motivated by the success of the pseudo amino acid composition (PseAAC) proposed by Chou, we applied this approach to protein remote homology detection. Some new indices derived from the amino acid index (AAIndex) database are incorporated into the PseAAC to improve the generalization ability of this method. Finally, the performance is further improved by combining the modified PseAAC with a profile-based protein representation containing the evolutionary information extracted from the frequency profiles. Our experiments on a well-known benchmark show that this method achieves performance superior or comparable to that of current state-of-the-art methods.
AB - Protein remote homology detection is a key problem in bioinformatics. Currently, discriminative methods such as the Support Vector Machine (SVM) achieve the best performance. The most efficient approach to improving the performance of SVM-based methods is to find a general protein representation method that converts proteins of different lengths into fixed-length vectors and captures the different properties of the proteins for discrimination. The bottleneck in designing such a representation method is that native proteins have different lengths. Motivated by the success of the pseudo amino acid composition (PseAAC) proposed by Chou, we applied this approach to protein remote homology detection. Some new indices derived from the amino acid index (AAIndex) database are incorporated into the PseAAC to improve the generalization ability of this method. Finally, the performance is further improved by combining the modified PseAAC with a profile-based protein representation containing the evolutionary information extracted from the frequency profiles. Our experiments on a well-known benchmark show that this method achieves performance superior or comparable to that of current state-of-the-art methods.
KW - Frequency profile
KW - Protein remote homology
KW - Pseudo amino acid composition
KW - Support Vector Machine
UR - http://www.scopus.com/inward/record.url?scp=84892972298&partnerID=8YFLogxK
U2 - 10.1002/minf.201300084
DO - 10.1002/minf.201300084
M3 - Article
AN - SCOPUS:84892972298
SN - 1868-1743
VL - 32
SP - 775
EP - 782
JO - Molecular Informatics
JF - Molecular Informatics
IS - 9-10
ER -