Protein fold recognition based on multi-view modeling

Ke Yan; Xiaozhao Fang; Yong Xu; Bin Liu

doi:10.1093/bioinformatics/btz040

Protein fold recognition based on multi-view modeling

Ke Yan, Xiaozhao Fang, Yong Xu^*, Bin Liu

^*此作品的通讯作者

计算机学院

科研成果: 期刊稿件 › 文章 › 同行评审

78 引用（Scopus）

摘要

Motivation: Protein fold recognition has attracted increasing attention because it is critical for studies of the 3D structures of proteins and drug design. Researchers have been extensively studying this important task, and several features with high discriminative power have been proposed. However, the development of methods that efficiently combine these features to improve the predictive performance remains a challenging problem. Results: In this study, we proposed two algorithms: MV-fold and MT-fold. MV-fold is a new computational predictor based on the multi-view learning model for fold recognition. Different features of proteins were treated as different views of proteins, including the evolutionary information, secondary structure information and physicochemical properties. These different views constituted the latent space. The ϵ-dragging technique was employed to enlarge the margins between different protein folds, improving the predictive performance of MV-fold. Then, MV-fold was combined with two template-based methods: HHblits and HMMER. The ensemble method is called MT-fold incorporating the advantages of both discriminative methods and template-based methods. Experimental results on five widely used benchmark datasets (DD, RDD, EDD, TG and LE) showed that the proposed methods outperformed some state-of-the-art methods in this field, indicating that MV-fold and MT-fold are useful computational tools for protein fold recognition and protein homology detection and would be efficient tools for protein sequence analysis. Finally, we constructed an update and rigorous benchmark dataset based on SCOPe (version 2.07) to fairly evaluate the performance of the proposed method, and our method achieved stable performance on this new dataset. This new benchmark dataset will become a widely used benchmark dataset to fairly evaluate the performance of different methods for fold recognition. Supplementary information: Supplementary data are available at Bioinformatics online.

源语言	英语
页（从-至）	2982-2990
页数	9
期刊	Bioinformatics
卷	35
期	17
DOI	https://doi.org/10.1093/bioinformatics/btz040
出版状态	已出版 - 1 9月 2019

访问文件

10.1093/bioinformatics/btz040

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{7b599210dbbd400199a5d86aae29a2a6,

title = "Protein fold recognition based on multi-view modeling",

abstract = "Motivation: Protein fold recognition has attracted increasing attention because it is critical for studies of the 3D structures of proteins and drug design. Researchers have been extensively studying this important task, and several features with high discriminative power have been proposed. However, the development of methods that efficiently combine these features to improve the predictive performance remains a challenging problem. Results: In this study, we proposed two algorithms: MV-fold and MT-fold. MV-fold is a new computational predictor based on the multi-view learning model for fold recognition. Different features of proteins were treated as different views of proteins, including the evolutionary information, secondary structure information and physicochemical properties. These different views constituted the latent space. The ϵ-dragging technique was employed to enlarge the margins between different protein folds, improving the predictive performance of MV-fold. Then, MV-fold was combined with two template-based methods: HHblits and HMMER. The ensemble method is called MT-fold incorporating the advantages of both discriminative methods and template-based methods. Experimental results on five widely used benchmark datasets (DD, RDD, EDD, TG and LE) showed that the proposed methods outperformed some state-of-the-art methods in this field, indicating that MV-fold and MT-fold are useful computational tools for protein fold recognition and protein homology detection and would be efficient tools for protein sequence analysis. Finally, we constructed an update and rigorous benchmark dataset based on SCOPe (version 2.07) to fairly evaluate the performance of the proposed method, and our method achieved stable performance on this new dataset. This new benchmark dataset will become a widely used benchmark dataset to fairly evaluate the performance of different methods for fold recognition. Supplementary information: Supplementary data are available at Bioinformatics online.",

author = "Ke Yan and Xiaozhao Fang and Yong Xu and Bin Liu",

note = "Publisher Copyright: {\textcopyright} 2019 The Author(s).",

year = "2019",

month = sep,

day = "1",

doi = "10.1093/bioinformatics/btz040",

language = "English",

volume = "35",

pages = "2982--2990",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "17",

}

TY - JOUR

T1 - Protein fold recognition based on multi-view modeling

AU - Yan, Ke

AU - Fang, Xiaozhao

AU - Xu, Yong

AU - Liu, Bin

PY - 2019/9/1

Y1 - 2019/9/1

N2 - Motivation: Protein fold recognition has attracted increasing attention because it is critical for studies of the 3D structures of proteins and drug design. Researchers have been extensively studying this important task, and several features with high discriminative power have been proposed. However, the development of methods that efficiently combine these features to improve the predictive performance remains a challenging problem. Results: In this study, we proposed two algorithms: MV-fold and MT-fold. MV-fold is a new computational predictor based on the multi-view learning model for fold recognition. Different features of proteins were treated as different views of proteins, including the evolutionary information, secondary structure information and physicochemical properties. These different views constituted the latent space. The ϵ-dragging technique was employed to enlarge the margins between different protein folds, improving the predictive performance of MV-fold. Then, MV-fold was combined with two template-based methods: HHblits and HMMER. The ensemble method is called MT-fold incorporating the advantages of both discriminative methods and template-based methods. Experimental results on five widely used benchmark datasets (DD, RDD, EDD, TG and LE) showed that the proposed methods outperformed some state-of-the-art methods in this field, indicating that MV-fold and MT-fold are useful computational tools for protein fold recognition and protein homology detection and would be efficient tools for protein sequence analysis. Finally, we constructed an update and rigorous benchmark dataset based on SCOPe (version 2.07) to fairly evaluate the performance of the proposed method, and our method achieved stable performance on this new dataset. This new benchmark dataset will become a widely used benchmark dataset to fairly evaluate the performance of different methods for fold recognition. Supplementary information: Supplementary data are available at Bioinformatics online.

AB - Motivation: Protein fold recognition has attracted increasing attention because it is critical for studies of the 3D structures of proteins and drug design. Researchers have been extensively studying this important task, and several features with high discriminative power have been proposed. However, the development of methods that efficiently combine these features to improve the predictive performance remains a challenging problem. Results: In this study, we proposed two algorithms: MV-fold and MT-fold. MV-fold is a new computational predictor based on the multi-view learning model for fold recognition. Different features of proteins were treated as different views of proteins, including the evolutionary information, secondary structure information and physicochemical properties. These different views constituted the latent space. The ϵ-dragging technique was employed to enlarge the margins between different protein folds, improving the predictive performance of MV-fold. Then, MV-fold was combined with two template-based methods: HHblits and HMMER. The ensemble method is called MT-fold incorporating the advantages of both discriminative methods and template-based methods. Experimental results on five widely used benchmark datasets (DD, RDD, EDD, TG and LE) showed that the proposed methods outperformed some state-of-the-art methods in this field, indicating that MV-fold and MT-fold are useful computational tools for protein fold recognition and protein homology detection and would be efficient tools for protein sequence analysis. Finally, we constructed an update and rigorous benchmark dataset based on SCOPe (version 2.07) to fairly evaluate the performance of the proposed method, and our method achieved stable performance on this new dataset. This new benchmark dataset will become a widely used benchmark dataset to fairly evaluate the performance of different methods for fold recognition. Supplementary information: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=85068778013&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btz040

DO - 10.1093/bioinformatics/btz040

M3 - Article

C2 - 30668845

AN - SCOPUS:85068778013

SN - 1367-4803

VL - 35

SP - 2982

EP - 2990

JO - Bioinformatics

JF - Bioinformatics

IS - 17

ER -

Protein fold recognition based on multi-view modeling

摘要

访问文件

其它文件与链接

指纹

引用此