Using distances between Top-n-gram and residue pairs for protein remote homology detection

Bin Liu, Jinghao Xu, Quan Zou, Ruifeng Xu*, Xiaolong Wang, Qingcai Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

88 Citations (Scopus)

Abstract

Background: Protein remote homology detection is one of the central problems in bioinformatics and is important for both basic research and practical applications. Currently, discriminative methods based on Support Vector Machines (SVMs) achieve state-of-the-art performance. Exploring feature vectors that incorporate the position information of amino acids or other protein building blocks is a key step toward improving the performance of SVM-based methods.

Results: Two new methods for protein remote homology detection are proposed, called SVM-DR and SVM-DT. SVM-DR is a sequence-based method in which the feature vector representation of a protein is based on the distances between residue pairs. SVM-DT is a profile-based method that considers the distances between Top-n-gram pairs. A Top-n-gram can be viewed as a profile-based building block of proteins and is calculated from the frequency profiles. Both methods are position-dependent approaches that incorporate the sequence-order information of protein sequences. Various experiments were conducted on a benchmark dataset containing 54 families and 23 superfamilies. The experimental results show that both methods are very promising, with a clear performance improvement over position-independent methods. Furthermore, the proposed methods can provide useful insights for studying the features of protein families.

Conclusion: The better performance of the proposed methods demonstrates that position-dependent approaches are effective for protein remote homology detection. Another advantage of the methods is their explicit feature space representation, which can be used to analyze the characteristic features of protein families. The source code of SVM-DT and SVM-DR is available at http://bioinformatics.hitsz.edu.cn/DistanceSVM/index.jsp
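The abstract describes both feature representations only at a high level. The Python sketch below illustrates one plausible reading under our own assumptions, not the authors' exact definitions: features are raw counts of ordered pairs at each exact distance d = 1..max_dist, a profile is an L x 20 frequency matrix whose columns follow the order ACDEFGHIKLMNPQRSTVWY, and a Top-n-gram is the n most frequent amino acids at a position, listed from most to least frequent. The function names and the max_dist parameter are hypothetical; the published SVM-DR and SVM-DT code may define distances, normalization, and vocabularies differently.

```python
from itertools import permutations

import numpy as np

# Assumed alphabet order; profile columns must follow the same order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def distance_residue_features(seq: str, max_dist: int = 3) -> np.ndarray:
    """Hypothetical SVM-DR-style sketch: count ordered residue pairs (a, b)
    in which b occurs exactly d positions after a, for d = 1..max_dist."""
    n_aa = len(AMINO_ACIDS)
    counts = np.zeros((max_dist, n_aa, n_aa))
    for d in range(1, max_dist + 1):
        for i in range(len(seq) - d):
            a, b = seq[i], seq[i + d]
            if a in AA_INDEX and b in AA_INDEX:  # skip non-standard residues such as X
                counts[d - 1, AA_INDEX[a], AA_INDEX[b]] += 1
    return counts.ravel()  # length max_dist * 20 * 20


def top_n_grams(profile: np.ndarray, n: int = 1) -> list:
    """For each row of an L x 20 frequency profile, return the indices of the
    n most frequent amino acids, most frequent first (ties: lower index first)."""
    order = np.argsort(-profile, axis=1, kind="stable")[:, :n]
    return [tuple(int(k) for k in row) for row in order]


def distance_top_n_gram_features(profile: np.ndarray, n: int = 1,
                                 max_dist: int = 3) -> np.ndarray:
    """Hypothetical SVM-DT-style sketch: same pair counting as above, but the
    building blocks are Top-n-grams derived from the profile, not residues."""
    # Every ordered n-tuple of distinct amino acids gets one vocabulary slot.
    vocab = {g: k for k, g in enumerate(permutations(range(len(AMINO_ACIDS)), n))}
    grams = [vocab[g] for g in top_n_grams(profile, n)]
    size = len(vocab)  # 20 for n = 1, 380 for n = 2
    counts = np.zeros((max_dist, size, size))
    for d in range(1, max_dist + 1):
        for i in range(len(grams) - d):
            counts[d - 1, grams[i], grams[i + d]] += 1
    return counts.ravel()


# Usage on a toy sequence and a one-hot "profile" built from it.
seq = "ACDAC"
profile = np.zeros((len(seq), len(AMINO_ACIDS)))
for pos, aa in enumerate(seq):
    profile[pos, AA_INDEX[aa]] = 1.0

v_dr = distance_residue_features(seq, max_dist=2)
v_dt = distance_top_n_gram_features(profile, n=1, max_dist=2)
print(v_dr.shape, v_dr.sum())  # (800,) 7.0
```

With n = 1 and a one-hot profile, each Top-1-gram is simply the observed residue, so the two vectors coincide. With a real frequency profile they diverge, because the Top-n-grams then reflect the most frequent amino acids at each position rather than the single residue in the query sequence. Either vector could then be passed to a standard SVM implementation.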

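Original language: English
Article number: S3
Journal: BMC Bioinformatics
Volume: 15
DOI: https://doi.org/10.1186/1471-2105-15-S2-S3
Publication status: Published - 2014
Externally published: Yes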


Cite this

Liu, B., Xu, J., Zou, Q., Xu, R., Wang, X., & Chen, Q. (2014). Using distances between Top-n-gram and residue pairs for protein remote homology detection. BMC Bioinformatics, 15, Article S3. https://doi.org/10.1186/1471-2105-15-S2-S3