SUMMARY: Minimization of entanglement strongly influences why a protein molecule attains its specific shape, an issue relevant to diseases caused by protein misfolding.
Consequently, understanding how proteins fold is clearly a medically relevant issue. However, this isn't a simple problem.
Protein folding: A vexing question.
There are 20 standard amino acids (21 counting selenocysteine), the basic building blocks of proteins. Each amino acid has its own structure and properties, and an educated guess (although not necessarily a correct one) would be that each has a distinct impact on protein folding.
Proteins are typically composed of many amino acids, roughly 200-300 for a typical protein. Given the tremendous range of possible amino acid sequences, one can easily envision a huge number of possible folding arrangements.
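To get a feel for the scale, the number of possible sequences can be worked out directly. The sketch below assumes an alphabet of 20 standard amino acids and a chain of 250 units (the midpoint of the range above). The function name `count_sequences` is made up for illustration.

```python
def count_sequences(length, alphabet_size=20):
    """Number of distinct chains of `length` units, each drawn from
    `alphabet_size` amino acids (20 is an assumed alphabet size)."""
    return alphabet_size ** length


n = count_sequences(250)
print(f"A 250-unit chain has a {len(str(n))}-digit number of possible sequences.")
```

This counts sequences rather than folds, but it shows how quickly the possibilities grow: the result has more than 300 digits.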
This suggests that predicting how a protein will fold, based on its amino acid sequence, is a particularly daunting task. The task becomes harder still in light of an apparent paradox.
Experiments have shown that a protein's function depends on its amino acid sequence, yet two proteins with different amino acid sequences may nevertheless fold in the same manner. It would seem that a given fold is not unique to a single amino acid sequence.
Is there in fact a rather limited number of shapes that are possible for a protein to attain? Are there many more protein folding arrangements that are possible, yet not attained in nature?
Antonio Trovato (Institute for the Physics of Matter, Italy), Fabio Pietrucci (Swiss Federal Institute of Technology), and coworkers have made a substantial contribution to the question of protein folding, in the form of a computational study of simulated proteins. They have found that proteins in nature assume only a small number of the possible folding arrangements which are theoretically attainable, based on the principle of avoiding entanglement.
The simulated proteins are realistic.
The scientists' simulations were of proteins possessing 60 amino acid (valine) units, and they were typically compared to real proteins of similar length. Real proteins are generally longer than 60 units, and possess many different amino acids, but the scientists' simulations have the advantage of being computationally tractable.
Within 50 microseconds of simulation, roughly 30,000 folded structures were observed that possessed secondary structure (local motifs such as alpha helices and beta sheets) and a small radius of gyration (a measure of protein size; specifically, the root-mean-square distance of the constituent atoms from the center of the protein). The simulated proteins typically underwent local reorganization of their shape, but occasionally dramatic reorganizations were observed.
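The radius of gyration is commonly defined as the root-mean-square distance of the atoms from their common center. A minimal sketch follows, assuming every atom has equal weight and that coordinates are supplied as (x, y, z) tuples. The function name and the sample coordinates are illustrative and are not taken from the study.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid.

    Assumes equal weights for all points; a mass-weighted version
    would use the center of mass instead of the plain centroid.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((p[i] - center[i]) ** 2 for i in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


# Two hypothetical conformations of the same four-point chain.
compact = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
print(radius_of_gyration(compact), radius_of_gyration(extended))
```

The compact arrangement gives the smaller value, which is why a small radius of gyration serves as a simple indicator of a folded, globular shape.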
The shapes they observed are reasonable for real proteins. The backbone dihedral angles fell within feasible regions of the Ramachandran plot, bond lengths and angles matched up well with known structures (the "G-factors" measured via PROCHECK software were greater than -1), hydrogen bond energy was reasonable (also via PROCHECK), and fragments of the simulated proteins also matched properties of known proteins.
The simulated proteins were not identical to real proteins. However, this was expected, because a real protein in nature does not possess 60 repeating units of the same amino acid.
Simulations in nature, and nature in simulations.
As mentioned, the scientists found roughly 30,000 unique shapes in their simulations. How many of these shapes appear in nature?
They compared the topology of the proteins in their simulations with that of known proteins via the TM-align method, which scores structural similarity on a scale from 0 to 1. A score greater than 0.45 is considered a good match.
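For readers curious about what the score measures, the TM-score reported by TM-align is commonly defined as an average, over the residues of the reference protein, of 1 / (1 + (d/d0)^2), where d is the distance between a pair of aligned residues and d0 is a length-dependent scale. The sketch below only evaluates that formula for distances that are already known. TM-align itself also searches for the best alignment and superposition, which is not shown here. The function names are illustrative, and the 0.45 cutoff is the one quoted above.

```python
def tm_d0(target_length):
    """Length-dependent distance scale (angstroms) in the TM-score formula."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8 if target_length > 15 else 0.0
    if d0 <= 0:
        # Very short chains need special handling that this sketch omits.
        raise ValueError("this sketch assumes a chain long enough for d0 > 0")
    return d0


def tm_score(distances, target_length):
    """TM-score for a fixed alignment.

    `distances` holds the distance (angstroms) between each aligned
    residue pair; unaligned residues simply contribute nothing.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def is_good_match(score, threshold=0.45):
    """Apply the cutoff quoted in the article."""
    return score > threshold


# Hypothetical example: 60-residue reference, 50 residues aligned at 2 angstroms.
score = tm_score([2.0] * 50, 60)
print(round(score, 3), is_good_match(score))
```

Because d0 grows with chain length, the score is designed to be comparable between proteins of different sizes, which is what makes a single threshold such as 0.45 usable.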
Using this benchmark, eighty-six percent of known folds in proteins of length between 55 and 65 amino acids are found within the scientists' simulations (of proteins possessing 60 valine units). The only reported bias towards a particular folding arrangement in the simulations was towards parallel beta sheets.
However, even though proteins in nature largely have analogs in the simulations, the reverse relationship does not hold. The simulated proteins exhibit far more folding arrangements than are observed in nature, where there is a bias towards more compact structures.
Why might this be the case? It's reasonable to speculate that compact, unentangled structures are both faster to reach and energetically more favorable, and that a major reason why different amino acids are present in proteins at all (viewed solely from a folding standpoint) is to lower the free energy of one compact folding arrangement relative to the others.
Remember that the simulations were of proteins composed of a chain of 60 valine units, and that these simulated proteins nevertheless exhibit a far larger range of folds than is observed for real proteins of comparable length. This further suggests that protein folding is based more on symmetry and geometry than on amino acid composition.
Implications.
Protein folding may in fact be much simpler than previously envisioned. Geometric properties appear to matter more than amino acid composition in determining which folds are possible, although amino acid composition remains important in certain cases.
These findings do not mean that the amino acid composition of a protein is unimportant. Even changing the identity of one amino acid can lead to a serious disease, a rare example being fatal familial insomnia (wherein asparagine takes the place of one of the aspartic acid units).
Resolving the protein folding question won't automatically lead to a cure for diseases caused by protein misfolding. However, it should set scientists on the right track towards one.
Insight into protein folding will also help scientists design new bio-inspired constructs, of unique function and properties, that are not found in nature. The aims of such work are to "improve" upon nature, and to use biological matter to fabricate devices that work under harsh (or unique) conditions.
NOTE: The scientists' research was funded by the University of Padova and Programmi di Ricerca Scientifica di Rilevante Interesse Nazionale.
Cossio, P., Trovato, A., Pietrucci, F., Seno, F., Maritan, A., & Laio, A. (2010). Exploring the Universe of Protein Structures beyond the Protein Data Bank. PLoS Computational Biology, 6(11). DOI: 10.1371/journal.pcbi.1000957