Using the OSG to simulate DNA-protein interaction



Gordon Freeman recently defended his doctoral dissertation at the University of Wisconsin–Madison (UWM). Freeman worked in UWM’s de Pablo Research Group, which studies the thermodynamic properties of materials such as DNA and proteins at the nanoscale. He found the Open Science Grid to be indispensable to his research.

The focus of Freeman’s work is how DNA interacts with entities such as proteins within a cell. “The interactions are not well understood,” Freeman said, which makes them very difficult to replicate. These interactions take place everywhere in the human body and are important for regulatory elements (called the epigenome) and how they are expressed. In particular, Freeman studies diseases where the behavior of DNA in nucleosomes has been implicated. “Understanding how proteins bind to the DNA will greatly further our understanding of how the human body works.”

The group’s tools are all computational. Researchers first draw on the literature to ensure their models are grounded in reality. The difficulty is figuring out which details are important and which are not, so they have developed several generations of DNA models. Overly simplistic models can be problematic when trying to study complex biological systems, but more detail is computationally expensive. Ultimately, they try to find the tradeoff between essential details and what is computationally feasible.

DNA is packaged into the nucleus of a cell with the help of a protein complex made up of histone proteins. Shown here is the building block of that structure, called the nucleosome. Using molecular simulation and the OSG, Freeman and his coworkers have found that the physical attributes of DNA dictate the binding of this protein complex. The basic principles that govern this interaction also play an important role in many other DNA-protein binding events and have important implications in better understanding the role of DNA biophysics in disease as well as in engineering new strategies to control gene expression.

Freeman is interested in the actual biophysics that affect recognition between proteins called histones and DNA. Histones are the proteins that package DNA within the nucleus of a cell. Why will histones bind to DNA in one place but not another? UWM research suggests that the physical attributes of the DNA itself are what drive the interaction between histones and DNA. The group tries to tease out the key biophysical properties that help proteins recognize very specific instances of DNA so as to bind in the right place.

What makes all this very challenging is that the physical shape of the DNA molecule is critically important for determining which regions of DNA bind most readily to the histones. Therefore, the researchers have a pressing need for computing models that capture all this detail – and that, in turn, means they require significant computing power.

OSG computing power helps them address those challenges. Freeman and his colleagues can run in excess of 3,000 simulations simultaneously, something they cannot do in their lab. The OSG gives them the resources to run very long simulations that they can break into trivially parallel simulations that don’t need to communicate with each other. As a result, their productivity has skyrocketed.

The lab uses HTCondor to run its simulations. Researchers add a flag to a submit file telling HTCondor to run the jobs wherever it can, and the data comes back when the jobs finish, which can take up to a few days. “It has been easy to learn and minimal work,” says Freeman. “It’s as if everything is running locally at UWM.” Reassembling data on the back end is easy, he notes: the software schedules the jobs, finds the open nodes, sends the executable, and then sends back the data that has been generated. The group then does local post-processing. Adding the OSG doesn’t change anything on UWM’s end. It adds the CPU power without adding complexity.

One of the many benefits Freeman sees in the OSG is that it is opportunistic: the group only uses computers that are not otherwise in use at the moment. When computers attached to the OSG are idle, other OSG users can use them. “This is efficient and makes the most of valuable resources,” notes Freeman. “Funding agencies should be pleased to know that their resources are being used to the max. The way the OSG works makes that possible.”
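For readers unfamiliar with HTCondor, the submit-file workflow described above can be sketched roughly as follows. This is an illustration only, not the group's actual configuration: the function name `build_submit_description`, the script name `run_simulation.sh`, and the output file names are hypothetical. The site-specific flag that lets jobs run beyond the local pool is left as a comment, because its name depends on how the local HTCondor installation is set up.

```python
def build_submit_description(executable: str, n_jobs: int) -> str:
    """Return the text of an HTCondor submit description that queues
    n_jobs independent runs of the same executable."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        f"executable = {executable}",
        # $(Process) expands to 0, 1, ..., n_jobs - 1, so every job
        # receives its own index and never talks to the other jobs.
        "arguments = $(Process)",
        "output = run_$(Process).out",
        "error = run_$(Process).err",
        "log = simulations.log",
        # Ship the executable out and bring results back when a job exits.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "# A site-specific opt-in flag for running off campus would go here.",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical script name; 3,000 matches the job count quoted in the article.
    print(build_submit_description("run_simulation.sh", 3000))
```

The resulting text would be saved to a file and handed to `condor_submit`. Each job writes to its own output file, so the results can be collected and post-processed locally once they return.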

Freeman recommends that other researchers consider whether they have a problem that would benefit from computational power. “Anyone who processes a lot of data—even social sciences—will benefit from the OSG,” he says. “If you can split data into independent jobs, you have seemingly unlimited resources. If they need to change their workflow to take advantage of the OSG, they should. It’s too big of a resource to ignore. Six to seven months of work now becomes one or two months or maybe even only weeks, merely by having more computing nodes available. Three thousand is better than 500. Six times more machines means six times faster.”
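The arithmetic behind Freeman's estimate can be checked with a short sketch. It assumes the jobs are fully independent and that all machines are available for the whole run, which is an idealization: opportunistic resources come and go, so real turnaround will usually be somewhat longer. The function name `ideal_wall_time` is illustrative.

```python
def ideal_wall_time(months_on_baseline: float,
                    baseline_nodes: int,
                    available_nodes: int) -> float:
    """Estimate wall-clock time for independent jobs when the node count
    changes, assuming perfect scaling (no queue waits, no communication)."""
    if baseline_nodes < 1 or available_nodes < 1:
        raise ValueError("node counts must be at least 1")
    return months_on_baseline * baseline_nodes / available_nodes


if __name__ == "__main__":
    # Figures from the quote: 500 machines versus 3,000 machines.
    speedup = 3000 / 500
    print(f"ideal speedup: {speedup:.0f}x")
    for months in (6.0, 7.0):
        estimate = ideal_wall_time(months, 500, 3000)
        print(f"{months:.0f} months on 500 nodes -> about {estimate:.2f} months on 3000")
```

Under these assumptions, six to seven months of computing on 500 machines shrinks to a little over one month on 3,000, which is consistent with the "one or two months" Freeman describes.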

That his family could watch his dissertation defense over streaming media illustrated to him the usefulness of visualization: “What we are discovering about DNA is easy to grasp when you can see it,” says Freeman. “We can now fully illustrate, through computation, how DNA interacts with other entities such as proteins within a cell. We can show the public what our research looks like.” Freeman pointed out that what the group discovers through computation (in silico) is always confirmed in the real world (in vivo). Still, he says computation should matter to the general public because it enables researchers to study a wide range of interactions between key biological molecules inexpensively, and it speeds up drug discovery. Thus, funding science is extremely valuable to everyone.

Freeman concluded, “Our work doesn’t just describe DNA-protein interaction; we can develop models and make predictions and do everything much cheaper than experimental approaches. Computational power gives scientists powerful tools. If I were to describe our work, it would be like this: If you understand DNA, you can design molecules or objects that complement DNA, and engineer therapeutic agents that bind the same way. Oncogenes are an example. An oncogene is a gene that has the potential to cause cancer. By regulating these genes, we might be able to develop therapeutic treatments for cancer. This is possible through computational power and resources like the Open Science Grid.”

~Greg Moore and Sarah Engel