Courtney Hall is a Ph.D. candidate studying quantitative methods in the Department of Educational Psychology at the University of Wisconsin-Madison (UW-Madison). She is working with Dr. Peter Steiner and Dr. Jee-Seon Kim on a project exploring propensity score matching methods with multilevel—or “nested”—observational data for use in the social sciences. With a background in educational policy, she is now applying the power of the Open Science Grid (OSG) to this multilevel data to help policy makers.
Courtney Hall. Photo Credit: Jennifer Seelig
Quantitative methods address social phenomena with statistical, mathematical, or numerical analysis, often supported by computation. Propensity score matching, the technique at the center of Hall's project, helps estimate the effect of a policy or treatment by accounting for the variables that predict whether a unit receives the treatment.
“Society needs better schools, better health care,” said Hall. “Education policy in the last 10 years or so has become evidence based and data driven. Policy makers in education or health care need statistics to back up policy. The OSG is helping to solve big data problems, which will help policy makers. Because much of the data we use is nested, we are creating new methodologies that work better with that kind of data. Traditional methods are not as effective.”
Nested data often occur in the fields of health, education, and psychology. For example, students are nested within schools, schools within districts, and districts within states. Nested data need a special kind of analysis because units in the same cluster tend to resemble one another more than they resemble units in other clusters. “If you are looking at more than one classroom,” adds Hall, “some students are getting one teacher and some another teacher, so students in one classroom might be fundamentally different from students in another classroom.” She is trying to identify causal treatment effects, to be able to say that “a treatment causes a particular effect.” The goal is to determine whether her analysis methods can and should be adopted by other researchers and policy makers.
For her work, Hall creates realistic target populations. Taking 1,000 samples from each population, she estimates the average treatment effect under different conditions. Large sample sizes, multilevel modeling, and time-consuming matching methods can radically increase compute time, yet all three are necessary to draw sound conclusions. So, Hall got in touch with Lauren Michael at the UW-Madison Center for High Throughput Computing (CHTC) to learn about running large, complex simulations.
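The simulation design described above can be illustrated with a small, self-contained sketch: build a nested "target population" (students within schools), draw many samples from it, and average the estimated treatment effect across samples. All names, parameter values, and the simple naive estimator here are illustrative assumptions, not Hall's actual setup.

```python
import math
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.5  # the known effect built into the simulated population

def make_population(n_schools=200, students_per_school=50):
    """Generate a nested population: students within schools."""
    population = []
    for _ in range(n_schools):
        school_effect = random.gauss(0, 0.5)      # school-level random intercept
        for _ in range(students_per_school):
            x = random.gauss(0, 1)                # student-level covariate
            # Treatment probability depends on the covariate and the school
            # effect, so treatment assignment is confounded.
            p_treat = 1 / (1 + math.exp(-(x + school_effect)))
            treated = random.random() < p_treat
            y = x + school_effect + TRUE_EFFECT * treated + random.gauss(0, 1)
            population.append((treated, y))
    return population

def naive_estimate(sample):
    """Raw difference in mean outcomes; ignores the confounding."""
    treated = [y for t, y in sample if t]
    control = [y for t, y in sample if not t]
    return statistics.mean(treated) - statistics.mean(control)

population = make_population()
estimates = [naive_estimate(random.sample(population, 500))
             for _ in range(1_000)]               # 1,000 samples per population

# The average across samples overstates TRUE_EFFECT, because treatment
# assignment is correlated with the covariate and the school effect.
print(round(statistics.mean(estimates), 2))
```

Each of the 1,000 sample analyses is independent of the others, which is exactly why this kind of study maps so naturally onto high-throughput computing: every sample can run as its own job.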
“At the CHTC, all new users meet with a Research Computing Facilitator, who introduces users to our various computing capabilities,” said Michael. “Through our system, users can submit jobs to our CHTC HTCondor pool, but can also elect to have their jobs ‘flocked’ to other HTCondor pools on campus (the “UW Grid”) or additionally sent to the OSG by adding simple submit file lines. CHTC users can get the equivalent of one year of computing (~10,000 compute hours) in a real day when submitting to the CHTC pool, but can obtain up to 10 years of computing in a single day when submitting to all three CHTC-accessible distributed high-throughput computing (DHTC) resources (CHTC, UW Grid, and OSG).”
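The "simple submit file lines" Michael mentions might look roughly like the following HTCondor submit file. This is a hedged sketch: the executable name, arguments, and resource requests are placeholders, and the `+WantFlocking`/`+WantGlidein` attributes reflect CHTC's documented conventions for opting into the UW Grid and OSG, which may differ in current deployments.

```
# Hypothetical HTCondor submit file (all file names are placeholders)
universe       = vanilla
executable     = run_simulation.sh
arguments      = $(Process)
log            = sim_$(Cluster).log
output         = sim_$(Process).out
error          = sim_$(Process).err
request_cpus   = 1
request_memory = 2GB
request_disk   = 2GB

# The "simple submit file lines": opt in to flocking to other UW pools
# and to glidein jobs on the OSG (CHTC-style attribute names assumed).
+WantFlocking  = true
+WantGlidein   = true

queue 1000
```

The `queue 1000` statement submits 1,000 independent jobs, one per sample, which HTCondor then distributes across whichever of the three resource tiers (CHTC, UW Grid, OSG) has capacity.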
When she started the project two years ago on a personal computer, Hall could work with only a couple of target populations: multilevel data are complex, and each analysis took weeks to run. With so few options, even slight adjustments were painfully slow, and progress dragged.
“Outsourcing to the OSG and the other resources at UW-Madison is very helpful,” notes Hall. “If I were to create the data and run the simulations on a personal computer, it could take three or four months. On the OSG and the CHTC-accessible grid, several thousand CPUs run overnight. This has not only sped up what we are doing, but has expanded what we are able to do. We are able to look at more scenarios. We have added a lot of depth to our research. We can create more populations.”
Her current simulations draw samples from target populations and then estimate a variety of propensity scores and treatment effects in a range of different ways. She is investigating how the propensity score model must be specified to yield an unbiased estimate of the treatment effect. Now she can get results in a couple of days, run several hundred more analyses, and make adjustments much more quickly.
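One analysis step of the kind described above can be sketched as follows: estimate each unit's propensity score with a small logistic regression, match each treated unit to the control unit with the nearest estimated score, and compare the matched estimate with the naive difference in means. Everything here, from the data-generating model to the one-covariate score model, is an illustrative assumption rather than Hall's actual code.

```python
import bisect
import math
import random

random.seed(7)

TRUE_EFFECT = 0.5

def simulate(n=2000):
    """One sample: a confounding covariate x drives treatment and outcome."""
    data = []
    for _ in range(n):
        x = random.gauss(0, 1)
        treated = random.random() < 1 / (1 + math.exp(-x))
        y = x + TRUE_EFFECT * treated + random.gauss(0, 1)
        data.append((x, treated, y))
    return data

def fit_logistic(data, steps=400, lr=1.0):
    """Tiny gradient-descent logistic regression for P(treated | x)."""
    b0 = b1 = 0.0
    n = len(data)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t, _ in data:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += (p - t) / n
            g1 += (p - t) * x / n
        b0 -= lr * g0
        b1 -= lr * g1
    return lambda x: 1 / (1 + math.exp(-(b0 + b1 * x)))

data = simulate()
score = fit_logistic(data)
treated = [(score(x), y) for x, t, y in data if t]
controls = sorted((score(x), y) for x, t, y in data if not t)

def nearest_control(s):
    """Control unit whose estimated propensity score is closest to s."""
    i = bisect.bisect_left(controls, (s,))
    return min(controls[max(0, i - 1):i + 1], key=lambda c: abs(c[0] - s))

naive = (sum(y for _, y in treated) / len(treated)
         - sum(y for _, y in controls) / len(controls))
matched = sum(y - nearest_control(s)[1] for s, y in treated) / len(treated)

# Matching on the estimated propensity score removes most of the
# confounding bias, pulling the estimate toward TRUE_EFFECT.
print(round(naive, 2), round(matched, 2))
```

With multilevel data, the open question Hall studies is how the score model itself should be specified, for instance whether and how it should account for school-level clustering, which this single-level sketch deliberately omits.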
“For users like Courtney,” added Michael, “we provide additional scripts that make jobs portable to all of the available computers. These CHTC tools allow users to more easily stage files for many HTCondor jobs. In the last year, Courtney has obtained more than 300,000 compute hours through our DHTC resources, including more than 10,000 hours on the OSG.”
Once colleagues started hearing about her work, they became very interested. She is now working with five or six faculty and graduate students in her department to help them get started. “Lauren is very helpful,” says Hall, “but it’s good to have a peer in your own department. I can help translate between the social science world and the computer science world. Working with Lauren is great. It can be intimidating to start out, but now I’m much more comfortable thanks to her help.”
~ Greg Moore and Sarah Engel