OSG Newsletter – April 2013

New OSG Software Team Members

Please welcome two new, full-time members to the OSG Software team, located at the University of Wisconsin-Madison:

Carl

Carl Edquist is a Wisconsin native who graduated from UW–Madison with degrees in math and computer science. Previously, he worked in network and database administration, network security, and programming (all things Linux). He especially enjoys the spring and fall in lovely Madison, hanging out with friends, and watching the sunsets over Lake Mendota.

Brian

Brian Lin grew up on Long Island (near Brookhaven National Laboratory) and graduated from McGill University (Montreal) with a degree in Physics and Atmospheric Sciences. He has lived in Madison for the past two and a half years and worked at Weather Central, supporting end users and testing software. He enjoys cooking, brewing beer, and tinkering with his Emacs configuration.

~Tim Cartwright

OSG All Hands 2013

The Open Science Grid held its annual All Hands Meeting in March on the campus of Indiana University–Purdue University Indianapolis, focusing on the computational needs of research and academic communities at every scale. Craig Stewart, executive director of the Indiana University Pervasive Technology Institute, welcomed attendees and recognized the OSG for its achievements: “The OSG plays a critical role in supporting the high-energy physics community, and yet the diversity of scientific disciplines supported by the OSG is at an all-time high.”

Read more details about the OSG All Hands Meeting.

~Rob Quick, Open Science Grid Operations Area Coordinator

Following the long tail of economics: Amit Gandhi uses the OSG to outdo parallel computing

Amit Gandhi is an assistant professor of economics at the University of Wisconsin-Madison. His specialty is industrial organization, and his research involves using data from industry to estimate the parameters in economic models. Once he knows those parameters—in other words, once he recovers the economic model from the data—he can use the model for policy analysis (answering policy-relevant questions such as what the likely effect of regulation will be).

Among other things, Gandhi is studying the long-tail phenomenon: the full range of product variety in a given market, consistent with the choices consumers actually face. His goal is to produce economic models of demand and supply that capture the full richness of product variety in the marketplace. Traditional economic analysis has focused on competition among the top handful of brands or products in an industry, but in reality industries are characterized by a much larger mass of product variety than standard models have allowed, and this variety is important for understanding competition and economic forces.


The concept of “the long tail” is credited to Chris Anderson, author of “The Long Tail: Why the Future of Business is Selling Less of More.” According to Anderson, products in low demand or those that have a low sales volume can collectively make up a market share that rivals or exceeds the relatively few current bestsellers.

Gandhi’s recent research has generalized traditional models to take into account the entire pattern of product variety in an industry (i.e., both the head and the tail of demand), but one key cost of estimating these models against data is that the estimation is computationally intensive. The model makes a prediction, the data seem to say something, and the econometrics underlying Gandhi’s methods tries to reconcile the two and minimize the discrepancies. Optimization is especially challenging because everything becomes increasingly non-linear as the models become more complicated. Gandhi maintains that the best way to solve non-linear problems like these is an exhaustive grid search, because it is the only way to know for sure that you have found the “best” answer; researchers stopped doing grid searches, however, once their models had too many points to search.


For the models Gandhi has recently designed to study massive product variety, the econometric procedure requires “testing” every candidate parameter value to see whether it is an accepted member of a confidence set (a region of parameter values consistent with the data), and hence the estimation problem takes the form of an exhaustive grid search.
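
This pattern is a natural fit for high throughput computing because each candidate point can be tested on its own. Here is a minimal sketch in Python, with an invented linear model, test statistic, and critical value standing in for Gandhi’s actual econometric procedure:

    import itertools
    import numpy as np

    CRITICAL_VALUE = 3.84  # chi-squared(1) critical value at the 95% level

    def test_statistic(theta, data):
        # Toy moment-based statistic: how badly does the candidate
        # parameter vector theta fit the observed data?
        predicted = theta[0] + theta[1] * data[:, 0]
        residuals = data[:, 1] - predicted
        return len(residuals) * residuals.mean() ** 2 / residuals.var()

    def in_confidence_set(theta, data):
        # The unit of work: one candidate point, one accept/reject decision.
        return test_statistic(theta, data) <= CRITICAL_VALUE

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.normal(size=500)
        y = 1.0 + 2.0 * x + rng.normal(size=500)  # synthetic "data"
        data = np.column_stack([x, y])
        grid = list(itertools.product(np.linspace(0.5, 1.5, 41),
                                      np.linspace(1.5, 2.5, 41)))
        accepted = [t for t in grid if in_confidence_set(t, data)]
        print(f"{len(accepted)} of {len(grid)} candidate points accepted")

Because no test depends on any other, the grid can be carved up across as many machines as are available.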

Fortunately for Gandhi, UW-Madison’s Center for High Throughput Computing (CHTC) offers advanced computing resources for researchers at the university. The Open Science Grid is one of the key tools the CHTC uses to help with complex problems. In particular, Gandhi works with the CHTC to use the OSG for structural estimations of his economic models.

By searching the grid with high throughput computing on the OSG, Gandhi has found that he can break a problem into many independent jobs; communication between the jobs only has to happen once all of them have finished. The ability to call upon thousands of nodes has revolutionized his research, as computation is no longer limited by parallel computing (which relies on the Message Passing Interface). This approach gives him a more immediate understanding of the likely parameter values, and greatly facilitates the econometric estimation of problems that would otherwise not be feasible.
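
A rough sketch of that fan-out/fan-in structure (the file names, chunk count, and accept test below are illustrative stand-ins, not Gandhi’s actual setup): each job tests its own slice of the grid and writes out its accepted points, and the only coordination is a merge step after every job has finished.

    import json

    def accepts(theta):
        # Stand-in for the expensive per-point test sketched above.
        return abs(theta[0] - 1.0) < 0.2 and abs(theta[1] - 2.0) < 0.2

    def run_job(job_id, points):
        # Body of one high-throughput job: no communication with other jobs.
        accepted = [p for p in points if accepts(p)]
        with open(f"accepted_{job_id}.json", "w") as f:
            json.dump(accepted, f)

    def merge(n_jobs):
        # Fan-in step, run exactly once, after all jobs are done.
        result = []
        for job_id in range(n_jobs):
            with open(f"accepted_{job_id}.json") as f:
                result.extend(json.load(f))
        return result

    if __name__ == "__main__":
        grid = [(a / 10, b / 10) for a in range(0, 21) for b in range(10, 31)]
        n_jobs = 8
        chunks = [grid[i::n_jobs] for i in range(n_jobs)]  # fan out
        for job_id, points in enumerate(chunks):  # each chunk is one node's work
            run_job(job_id, points)
        print(f"{len(merge(n_jobs))} points accepted into the confidence set")

On the OSG, the local loop would be replaced by submitting one job per chunk, with the merge run after the last job completes.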

Gandhi believes an approach combining CHTC resources and the OSG is superior to a departmental cluster approach. “Those clusters are expensive and soon become antiquated and hard to maintain,” he notes. “The CHTC/OSG approach is flexible—they can adjust.” The CHTC has also helped Gandhi with his submit scripts, so he can concentrate on his work. “For me,” he says, “CHTC is critical. I don’t have time to think about the computing end of it.”

Gandhi thinks economists could benefit greatly from learning about the advantages of grid computing over parallel computing. He also observes that many disciplines share fundamental problems that are similar in structure. Ultimately, he would like to see disciplines learn from each other’s computational solutions. This is starting to happen through shared approaches to resources such as the OSG, but researchers are only now beginning to realize how broad the implications could be.

~ Greg Moore and Sarah Engel

Lark brings distributed high throughput computing to the network
Distributed High Throughput Computing has a long history of finding resources for user jobs. This involves a delicate matchmaking process: a job describes the resources it needs (number of cores, megabytes of RAM, gigabytes of disk), and systems such as HTCondor attempt to find a matching worker node.

On highly distributed platforms such as the OSG, we’ve found that networking resources are relevant to the matchmaking process: Does the job need an incoming network connection? Does it require access to a special network? How much bandwidth is necessary? How much bandwidth is available? On a university cluster, the networking between the scheduler and worker nodes may be relatively homogeneous – but, on the OSG, the bandwidth between a scheduler and worker node may differ by an order of magnitude.
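
A toy matchmaker makes the idea concrete (the attribute names here are invented for the example; HTCondor’s actual ClassAd matchmaking language is far more expressive):

    # A machine matches a job only if it satisfies every requirement,
    # including the network-related one.
    jobs = [
        {"cpus": 1, "memory_mb": 2048, "min_bandwidth_mbps": 1000},  # data-intensive
        {"cpus": 4, "memory_mb": 8192, "min_bandwidth_mbps": 10},    # compute-heavy
    ]
    machines = [
        {"name": "campus-node", "cpus": 8, "memory_mb": 16384, "bandwidth_mbps": 10000},
        {"name": "remote-node", "cpus": 8, "memory_mb": 16384, "bandwidth_mbps": 100},
    ]

    def matches(job, machine):
        return (machine["cpus"] >= job["cpus"]
                and machine["memory_mb"] >= job["memory_mb"]
                and machine["bandwidth_mbps"] >= job["min_bandwidth_mbps"])

    for i, job in enumerate(jobs):
        print(f"job {i} matches:", [m["name"] for m in machines if matches(job, m)])

Here the data-intensive job matches only the well-connected campus node, while the compute-heavy job can run anywhere.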

The NSF-funded Lark project (award #1245864) aims to study the matchmaking language, policy, and technical mechanisms needed to make HTCondor aware of the network layer. We hope to enable HTCondor to 1) reactively make scheduling decisions based on perfSONAR network monitoring and 2) proactively reconfigure each batch slot’s network based on the job description.

To manage the batch slot’s network resource, we use a Linux feature called “network namespaces” to provide a per-batch-slot network device. This isolates each batch slot from the host network and other jobs, allowing us to provide job-specific configurations. We can further bridge the job’s device onto the external network and give it an externally routable address (similar to how bridge networking works with the KVM hypervisor). If a job is addressable, a sufficiently intelligent network can treat it separately from other jobs. For example, some jobs will get access to a private network, while others stay on the public network.
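
A rough sketch of the mechanism, driving standard iproute2 commands from Python (the namespace name, interface names, and address are illustrative, this must run as root, and Lark’s actual implementation involves considerably more policy and cleanup):

    import subprocess

    def run(cmd):
        # Echo and execute one iproute2/brctl command.
        print("+", cmd)
        subprocess.run(cmd.split(), check=True)

    def setup_slot_network(slot, addr):
        ns = f"slot{slot}"
        host_if, slot_if = f"veth-{ns}", f"eth0-{ns}"
        run(f"ip netns add {ns}")                      # isolated network namespace
        run(f"ip link add {host_if} type veth peer name {slot_if}")  # virtual cable
        run(f"ip link set {slot_if} netns {ns}")       # move one end inside
        run(f"ip netns exec {ns} ip addr add {addr} dev {slot_if}")
        run(f"ip netns exec {ns} ip link set {slot_if} up")
        run(f"ip link set {host_if} up")
        # To give the slot an externally routable address, attach the host
        # end to a bridge, e.g.: brctl addif br0 veth-slot1

    if __name__ == "__main__":
        setup_slot_network(1, "192.0.2.10/24")

A job launched inside the namespace (e.g., via ip netns exec slot1) then sees only its own network device and address.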

To learn how we create per-job network devices and hook them into the network, see the Lark posts on our blog.

For more OSG Technology Area updates, with insights into life in Distributed High Throughput Computing, follow our blog.

~Brian Bockelman, OSG developer, University of Nebraska–Lincoln

(Edited by Sarah Engel and Greg Moore)

Campus Grid Identity Integration with U-BOLT
Campus grid identity integration addresses a variety of problems in bringing humans to the computational resources that help them perform research, but it poses its own challenges to operators of such decentralized systems. Some of these challenges can be addressed by using conventional tools in unconventional ways, while others require new thinking and tooling. In this installment of the OSG Campus Infrastructures Series, a February 2013 webcast for the OSG Campus Infrastructures Community, we discuss some solutions in use at the University of Chicago Computing Cooperative (UC3), a campus grid service, and explore the obstacles that institutions encounter and how they might be overcome.

About the Presenter: David Champion is a systems architect in the Computation Institute and the Enrico Fermi Institute at the University of Chicago, working with UC3 and with the US ATLAS Midwest Tier 2 Center. Over the past 20 years he has roved between central and departmental IT, working in systems engineering, identity management, and software development. He hopes to apply this experience to building identity management solutions for distributed computation providers.

~David Champion


Next Step in OSG Certificate Service (PKI) Transition

As of March 25, 2013, the DOE Grids Certificate Authority (CA) has stopped issuing certificates. The DOE Grids CA now redirects visitors to the OSG PKI, which is issuing host and user certificates for all OSG users. Existing DOE Grids certificates will continue to function until they expire.

This is a significant step in the transition of certificate services from the Energy Sciences Network (ESnet) to OSG. There has been excellent collaboration between the projects along the way, and we are prepared for this new phase.

~ Von Welch

From OSG Communications
You might enjoy these recent talks on different aspects of the program of work and the visions of OSG and its collaborators:

From the Executive Director at the ISGC conference, “Open Science Grid and EScience in the US”
From the Technical Director at the DOE ASCR PI meeting, “dV/dt – Accelerating the rate of progress towards extreme scale collaborative science”
If you are interested in running new “standard applications,” read about running Quantum ESPRESSO using OSG software

~ Ruth Pordes, OSG Communications
