Using the OSG to study protein evolution at the University of Pennsylvania

Postdocs can help open new horizons. Just ask Joshua Plotkin, professor of biology, mathematics, and computer and information science at the University of Pennsylvania. After Oana Carja completed her PhD in biology at Stanford University (specializing in evolutionary dynamics), she came to Penn to work with Dr. Plotkin. His research group uses math and computation to study questions in evolutionary biology and ecology, but Carja added something special: experience with the Open Science Grid (OSG).



Image courtesy Joshua Plotkin

“Thanks to the OSG, we have started new projects that we wouldn’t have otherwise been able to do,” says Plotkin. “It has not only allowed us to solve current problems, but we can work on new problems. Now, many in our group are using OSG extensively. It has even affected the direction of some of our research.”

Plotkin and his colleagues are trying to understand how evolution works at the molecular scale, in DNA, where mutations actually arise. Although they work at the molecular level, the questions they are trying to answer concern the big picture: How did evolution produce complicated organisms, or even complex parts like vertebrate eyes? How did complex things arise from seemingly simple things like bacteria?

Image courtesy Joshua Plotkin. A computationally predicted structure of an influenza virus protein. Plotkin’s research group uses the Open Science Grid to study questions in evolutionary biology and ecology. 

“These broad questions are hard to answer,” says Plotkin. “So, we work on more accessible questions like what determines how quickly a genome acquires and fixes mutations, or how large a role chance plays in evolution. Darwin thought about natural selection, but we realize now that randomness plays a role. So we try to study the importance of random, chance events.

“For example, when a big beneficial change comes around, is its effect dependent on what came before or would it have arisen anyway? Organisms must be robust against mutations, but sometimes an organism finds itself in an environment where it needs to use those mutations to adapt. So, we try to understand how organisms can use mutations when they are needed, but avoid their deleterious effects when they’re not needed.”

Plotkin principally uses mathematical models to describe this evolutionary process, data analysis to look at the actual genetic sequences as they evolve, and computation to bridge the mathematical models and the data. Much of the computation involves simulating complex processes such as protein folding.

“We do a combination of biology and mathematics,” says Plotkin. “On the one hand, we need to know some biology—how organisms work, how proteins work and fold—but to describe an evolutionary process, one needs to be conversant in mathematics as well, because we are writing down mathematical descriptions of the processes.”

Plotkin came from a mathematics background and then became interested in biology. Some in his lab come from physics or other disciplines. They seek to apply their skills to biology and ask questions that make sense to biologists.

“Evolutionary theory is intrinsically interdisciplinary,” he adds. “That makes it challenging. On one hand, we can do simple math models on paper, but then we also have complicated genetic sequences. We use the OSG to run computer simulations for complex processes. By simulating evolution in populations, we can study hypothetical situations that you can’t study in the wet lab or field.”

Simulations give the researchers the flexibility to explore questions that can’t be answered by inspecting real-life data sets alone, and they can capture more complexity than tractable mathematical models.

These simulations are computationally expensive because the researchers need to simulate a whole range of mutations to determine how a population evolves. They also run a large number of simulations simultaneously to see how the outcomes vary.

“If we need 4,000 CPUs to simulate a single protein evolving or to see what would happen if it had evolved differently, OSG can do that,” says Plotkin. “For a while, we were purchasing time on computer clusters, but clusters are set up for a different kind of scientific problem. We don’t need lots of data analysis. The OSG has transformed our ability to simulate complex problems.”

Plotkin stresses that not enough scientists are aware of the OSG. “Those who use numerical computation requiring many CPUs or those doing stochastic (randomly determined) simulations, anything done in large batches of simulations all at once—a good example is protein folding—would benefit greatly from the OSG.”

Balamurugan (Bala) Desinghu at the University of Chicago lent a hand in getting them set up. “Bala called me literally five minutes after I signed up for OSG,” says Plotkin. “He was incredibly helpful, completely the opposite from my previous experience trying to find computing resources. When some of our jobs involving long computations were taking as long as a week, Bala noticed and gave us some recommendations for more efficient code.”

“Evolution is happening all around us and affects our lives,” says Plotkin. “Modern evolutionary theory can describe what is happening right in front of us, including things like influenza viruses and cancer, and it directly affects how we treat the flu and cancer. Thanks to the OSG, we can vastly accelerate our research in evolutionary theory.”

Plotkin and colleagues Premal Shah and David M. McCandlish recently published a paper, “Contingency and entrenchment in protein evolution under purifying selection,” in which they acknowledged the OSG for supporting their computational models.

– Greg Moore