Clues left by evolution help solve protein puzzles

Traces of ancient evolutionary changes in proteins help reveal their 3-D structure today

By Michael McCarthy  |  HSNewsBeat  |  Updated 2:45 PM, 01.25.2017


A cartoon illustrates how patterns of co-evolution in the linear sequence of amino acids can be used to predict protein structure. On the left is an alignment of linear sequences of the same protein from many different organisms. Notice that whenever there is a red amino acid on the left (grey box), a complementary green amino acid always appears on the right (and vice versa). These two positions likely form a physical interaction, as shown in the two structures on the right. Courtesy of Sergey Ovchinnikov

Traces of protein changes that have occurred over millions of years of evolution have allowed scientists to quickly decipher the 3-D shape of hundreds of proteins.  Previously, these structures had remained a long-standing puzzle.

How these structures were solved is reported by University of Washington Institute for Protein Design researchers in their paper in the Jan. 19 issue of the journal Science. 

Sergey Ovchinnikov, a UW graduate student in Molecular & Cellular Biology, and his colleagues in the laboratory of David Baker, UW professor of biochemistry and director of the UW Institute for Protein Design, authored the paper. They describe how they worked out the structures of more than 600 protein families by analyzing differences in the amino acid sequences of similar proteins from different species.

 “This approach provides a way to develop representative models for major protein families rapidly and at a fraction of the usual cost,” said Baker, who led the research project.

Proteins are made from chains of amino acids strung together like necklace beads. Once synthesized, these chains spontaneously fold into a compact shape. This process is driven by interactions between the atoms of the different amino acids and their environment. In some cases these forces pull the atoms together, and in others they push them apart. Ultimately, these interactions drive the protein to fold into a shape that balances out these forces. Eventually the protein comes to rest at the lowest possible energy state.

In theory, at least, it should be possible to predict the final shape of a protein from its amino acid sequence alone. A computer model could, in principle, take all these interactions into account and pick the conformation with the lowest energy state. The problem is that there are many possible interactions between the atoms of the protein and the environment. The multitude of interactions defies calculation even with the most powerful computers.

 “The computational ‘space’ is just too big,” says Baker.

Over the past twenty years, the Baker lab has been using a computational approach that incorporates what is already known about the structures certain amino acid sequences form. This information narrows down the possible conformations a protein might assume to reach its lowest-energy state. The program, called Rosetta, has proven to be remarkably effective in predicting protein structures.

The new paper shows how it is possible to further narrow the possible shapes a protein might take.  This is done by comparing proteins that perform similar functions in different species, but whose sequences have changed through the course of evolution.

“The analogy I use is: if you want to find the lowest point on earth, this is like having a tool that narrows its location to somewhere in the Middle East, which will make it a lot easier for you to find the Dead Sea,” Baker said.

The technique works because when proteins reach their final folded shape, contacts are formed between pairs of amino acids that are brought close together. These contacts can be important in the protein’s stability and function. If one amino acid in a contact pair changes, the other member of the pair may have to change as well to maintain the contact and preserve the protein’s shape. This phenomenon makes it possible to compare changes in the amino acid sequences of similar proteins from different species to detect patterns that suggest such contact pairing. If a change in an amino acid at one location is almost always associated with a change in an amino acid at another location, it’s likely that the two are members of a contact pair.
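The idea of spotting positions that change together can be illustrated with a small sketch. The Python below is not the researchers' method and is not part of Rosetta; it scores every pair of columns in a made-up toy alignment using mutual information, a simple textbook measure of how strongly two columns vary together. The alignment, the function names and the choice of mutual information are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy alignment (invented for illustration): each string stands
# for the same protein from a different species, one letter per amino acid.
ALIGNMENT = [
    "AKLDE",
    "AKLDE",
    "ARLEE",
    "ARLEE",
    "AKLDE",
    "ARLEE",
]


def mutual_information(alignment, i, j):
    """Return the mutual information (in bits) between columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # Compare how often the pair occurs with how often it would occur
        # if the two columns varied independently of each other.
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


def rank_column_pairs(alignment):
    """Score every pair of columns and sort from most to least co-varying."""
    length = len(alignment[0])
    scores = [
        ((i, j), mutual_information(alignment, i, j))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (i, j), score in rank_column_pairs(ALIGNMENT)[:3]:
        print(f"columns {i} and {j}: {score:.2f} bits")
```

In this toy alignment, the second and fourth positions always change together (K pairs with D, R pairs with E), so that pair of columns receives the highest score, while positions that never change score zero. Real alignments are far larger and noisier, which is one reason having sequences from many species matters.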

What made the UW researchers’ new work possible were developments in an entirely different research area, called metagenomics. In that field, researchers sequence the entire genetic content of an ecological community, such as a sample of sea water that might harbor thousands of microbial species. With this approach, the UW team could compare sequences in proteins from hundreds of different species, thereby making it much more likely that any sequence changes detected did, in fact, involve contact points.

By integrating this approach into Rosetta, the researchers showed it was possible to create structure prediction models for 614 protein families whose structures had, until now, been undecipherable.

“Many of the remaining unsolved structures are proteins from eukaryotes, which are higher life forms such as plants and animals. They are not as abundant in metagenomic samples, which tend to be largely from bacterial species,” said Baker. “We’re now reaching out to scientists everywhere who are sequencing eukaryotic species to build a database to solve these eukaryotic protein structures.”

Support for this research came in part from the U.S. Department of Energy Joint Genome Institute [Contract No. DE-AC02-05CH1123] and from the participants who donated their computer time in the Rosetta@home and Charity Engine programs.
