TY - JOUR
T1 - Clustering and percolation in protein loop structures
AU - Peng, Xubiao
AU - He, Jianfeng
AU - Niemi, Antti J.
N1 - Publisher Copyright: © 2015 Peng et al.
PY - 2015/10/29
Y1 - 2015/10/29
N2 - Background: High-precision protein loop modelling remains a challenge in both template-based and template-independent approaches to protein structure prediction. Method: We introduce the concepts of protein loop clustering and percolation to develop a quantitative approach for systematically classifying the modular building blocks of loops in crystallographic folded proteins. These fragments are all different parameterisations of a unique kink solution to a generalised discrete nonlinear Schrödinger (DNLS) equation. Accordingly, the fragments are also local energy minima of the ensuing energy function. Results: We show how the loop fragments cover practically all ultrahigh-resolution crystallographic protein structures in the Protein Data Bank (PDB) with a 0.2 Ångström root-mean-square (RMS) precision. We find that no more than 12 different loop fragments are needed to describe around 38% of ultrahigh-resolution loops in the PDB. However, there is also a large number of loop fragments that are either unique or very rare, and examples of unique fragments are found even in the structure of a myoglobin. Conclusions: Protein loops are built in a modular fashion. The loops are composed of fragments that can be modelled by the kink of the DNLS equation. The majority of loop fragments are common and shared by many proteins. These common fragments are probably important for supporting the overall protein conformation. However, there are also several fragments that are either unique to a given protein or very rare. Such fragments are probably related to the function of the protein. Furthermore, we have found that the amino acid sequence does not determine the structure in a unique fashion: there are many examples of loop fragments with an identical amino acid sequence but a very different structure.
KW - Cα trace problem
KW - Loop modeling
KW - Protein backbone
UR - http://www.scopus.com/inward/record.url?scp=84945973830&partnerID=8YFLogxK
U2 - 10.1186/s12900-015-0049-x
DO - 10.1186/s12900-015-0049-x
M3 - Article
C2 - 26510704
AN - SCOPUS:84945973830
SN - 1472-6807
VL - 15
JO - BMC Structural Biology
JF - BMC Structural Biology
IS - 1
M1 - 22
ER -