Exploratory predicting protein folding model with random forest and hybrid features

Xuewei Zhao, Quan Zou, Bin Liu, Xiangrong Liu^*

^*Corresponding author for this work

doi:10.2174/157016461104150121115154

Research output: Contribution to journal › Article › Peer-review

58 Citations (Scopus)

Abstract

Recent developments in bioinformatics have highlighted the importance of protein structure prediction, for which information about structural classes forms the foundation and plays an important role in predicting protein folds and tertiary structure. Most previous research has focused on only four protein classes in the Structure Classification of Proteins (SCOP) database. In this paper, we focused mainly on finding the best-performing prediction method using SCOP-extended (SCOPe, Release 2.03; previously known as version 1.75C in SCOP), which contains seven major protein classes: all-α proteins, all-β proteins, α/β proteins, α+β proteins, multi-domain proteins, membrane and cell surface proteins and peptides, and small proteins. The framework we developed consists of two stages: in the first stage, we used a hybrid frequency method to extract features from a SCOPe dataset, and in the second stage, we determined an effective value for a parameter of the Random Forest classifier (the number of trees). Our computational results on the SCOPe dataset demonstrate the efficiency and effectiveness of our model, which achieved a prediction accuracy of 88%, much higher than the accuracies reported in previous studies. These encouraging results may be helpful for future research on protein structure and protein fold prediction. Our code is available at http://datamining.xmu.edu.cn/~zhaoxuewei/PSP.
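The abstract does not spell out what the "hybrid frequency method" computes or how the number of trees was chosen. A minimal sketch follows, assuming that "hybrid" means concatenating amino-acid (1-gram) and dipeptide (2-gram) frequencies, as the "n-gram feature" keyword suggests, and that the tree count is picked by scoring a list of candidate values. The names `ngram_frequencies`, `hybrid_features` and `select_n_trees` are hypothetical; this is not the authors' released code.

```python
from itertools import product
from typing import Callable, Iterable, List, Optional

# The 20 standard amino acids; non-standard letters (X, B, Z, U, ...) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_frequencies(seq: str, n: int) -> List[float]:
    """Relative frequency of every n-gram over the 20-letter alphabet.

    Returns a vector of length 20**n in lexicographic n-gram order.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    index = {gram: i for i, gram in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = seq.upper()
    total = 0
    for start in range(len(seq) - n + 1):
        i = index.get(seq[start:start + n])
        if i is not None:  # skip windows that contain non-standard residues
            counts[i] += 1
            total += 1
    if total == 0:
        return [0.0] * len(vocab)
    return [c / total for c in counts]


def hybrid_features(seq: str, orders: Iterable[int] = (1, 2)) -> List[float]:
    """Concatenate n-gram frequency vectors (default: 20 + 400 = 420 values)."""
    features: List[float] = []
    for n in orders:
        features.extend(ngram_frequencies(seq, n))
    return features


def select_n_trees(candidates: Iterable[int],
                   score_fn: Callable[[int], float]) -> int:
    """Return the tree count with the highest score; ties go to fewer trees."""
    best_k: Optional[int] = None
    best_score = float("-inf")
    for k in sorted(candidates):
        score = score_fn(k)
        if score > best_score:  # strict '>' keeps the smaller k on ties
            best_k, best_score = k, score
    if best_k is None:
        raise ValueError("no candidate tree counts given")
    return best_k
```

To connect this to a real classifier, `score_fn` could, for example, return `cross_val_score(RandomForestClassifier(n_estimators=k, random_state=0), X, y, cv=5).mean()` from scikit-learn, where `X` holds the `hybrid_features` vectors and `y` the SCOPe class labels. Breaking ties toward fewer trees is a design choice of this sketch: it keeps training and prediction cheaper when extra trees bring no measurable gain.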

Original language	English
Pages (from-to)	289-299
Number of pages	11
Journal	Current Proteomics
Volume	11
Issue number	4
DOI	https://doi.org/10.2174/157016461104150121115154
Publication status	Published - 1 Dec 2014
Externally published	Yes


Cite this

Zhao, X., Zou, Q., Liu, B., & Liu, X. (2014). Exploratory predicting protein folding model with random forest and hybrid features. Current Proteomics, 11(4), 289-299. https://doi.org/10.2174/157016461104150121115154

@article{d2e366d122df4034a7b108ac3e4604f7,
  title = "Exploratory predicting protein folding model with random forest and hybrid features",
  abstract = "Recent developments in bioinformatics have highlighted the importance of protein structure prediction, for which information about structural classes forms the foundation and plays an important role in predicting protein folds and tertiary structure. Most previous research has focused on only four protein classes in the Structure Classification of Proteins (SCOP) database. In this paper, we focused mainly on finding the best-performing prediction method using SCOP-extended (SCOPe, Release 2.03; previously known as version 1.75C in SCOP), which contains seven major protein classes: all-$\alpha$ proteins, all-$\beta$ proteins, $\alpha$/$\beta$ proteins, $\alpha$+$\beta$ proteins, multi-domain proteins, membrane and cell surface proteins and peptides, and small proteins. The framework we developed consists of two stages: in the first stage, we used a hybrid frequency method to extract features from a SCOPe dataset, and in the second stage, we determined an effective value for a parameter of the Random Forest classifier (the number of trees). Our computational results on the SCOPe dataset demonstrate the efficiency and effectiveness of our model, which achieved a prediction accuracy of 88\%, much higher than the accuracies reported in previous studies. These encouraging results may be helpful for future research on protein structure and protein fold prediction. Our code is available at http://datamining.xmu.edu.cn/{\textasciitilde}zhaoxuewei/PSP.",
  keywords = "Classifier, Protein class, Protein structure prediction, Random forests, SCOP dataset, n-gram feature",
  author = "Xuewei Zhao and Quan Zou and Bin Liu and Xiangrong Liu",
  note = "Publisher Copyright: {\textcopyright} 2014 Bentham Science Publishers.",
  year = "2014",
  month = dec,
  day = "1",
  doi = "10.2174/157016461104150121115154",
  language = "English",
  volume = "11",
  pages = "289--299",
  journal = "Current Proteomics",
  issn = "1570-1646",
  publisher = "Bentham Science Publishers",
  number = "4",
}

TY  - JOUR
T1  - Exploratory predicting protein folding model with random forest and hybrid features
AU  - Zhao, Xuewei
AU  - Zou, Quan
AU  - Liu, Bin
AU  - Liu, Xiangrong
PY  - 2014/12/1
Y1  - 2014/12/1
N2  - Recent developments in bioinformatics have highlighted the importance of protein structure prediction, for which information about structural classes forms the foundation and plays an important role in predicting protein folds and tertiary structure. Most previous research has focused on only four protein classes in the Structure Classification of Proteins (SCOP) database. In this paper, we focused mainly on finding the best-performing prediction method using SCOP-extended (SCOPe, Release 2.03; previously known as version 1.75C in SCOP), which contains seven major protein classes: all-α proteins, all-β proteins, α/β proteins, α+β proteins, multi-domain proteins, membrane and cell surface proteins and peptides, and small proteins. The framework we developed consists of two stages: in the first stage, we used a hybrid frequency method to extract features from a SCOPe dataset, and in the second stage, we determined an effective value for a parameter of the Random Forest classifier (the number of trees). Our computational results on the SCOPe dataset demonstrate the efficiency and effectiveness of our model, which achieved a prediction accuracy of 88%, much higher than the accuracies reported in previous studies. These encouraging results may be helpful for future research on protein structure and protein fold prediction. Our code is available at http://datamining.xmu.edu.cn/~zhaoxuewei/PSP.
AB  - Recent developments in bioinformatics have highlighted the importance of protein structure prediction, for which information about structural classes forms the foundation and plays an important role in predicting protein folds and tertiary structure. Most previous research has focused on only four protein classes in the Structure Classification of Proteins (SCOP) database. In this paper, we focused mainly on finding the best-performing prediction method using SCOP-extended (SCOPe, Release 2.03; previously known as version 1.75C in SCOP), which contains seven major protein classes: all-α proteins, all-β proteins, α/β proteins, α+β proteins, multi-domain proteins, membrane and cell surface proteins and peptides, and small proteins. The framework we developed consists of two stages: in the first stage, we used a hybrid frequency method to extract features from a SCOPe dataset, and in the second stage, we determined an effective value for a parameter of the Random Forest classifier (the number of trees). Our computational results on the SCOPe dataset demonstrate the efficiency and effectiveness of our model, which achieved a prediction accuracy of 88%, much higher than the accuracies reported in previous studies. These encouraging results may be helpful for future research on protein structure and protein fold prediction. Our code is available at http://datamining.xmu.edu.cn/~zhaoxuewei/PSP.
KW  - Classifier
KW  - Protein class
KW  - Protein structure prediction
KW  - Random forests
KW  - SCOP dataset
KW  - n-gram feature
UR  - http://www.scopus.com/inward/record.url?scp=84922326160&partnerID=8YFLogxK
U2  - 10.2174/157016461104150121115154
DO  - 10.2174/157016461104150121115154
M3  - Article
AN  - SCOPUS:84922326160
SN  - 1570-1646
VL  - 11
SP  - 289
EP  - 299
JO  - Current Proteomics
JF  - Current Proteomics
IS  - 4
ER  - 
