The Operations team provides a bi-weekly report to the Council to keep them up to date and to bring issues to the table for discussion. Here are a few items from recent postings.
As CMS and ATLAS increase the number of jobs they submit to each site, several scaling problems have arisen. These issues are being addressed through extensions to Condor and Condor-G, and to the Globus software that runs the job manager on the head node.
More than 45% of sites have upgraded to OSG 0.4.0, and 26% are reporting accounting data to MonALISA (ML). The daily usage reports are based on ML, so although it remains an optional component, your site must install and configure it to be included in the accounting. The operations team will be happy to help with this.
The education project MARIACHI and more than seven new resources registered with operations this month. Over 100 tickets were generated, of which more than three-quarters were closed during the month. In March we will continue to encourage resources to install OSG 0.4.0 and support centers to consider supporting new VOs.
Leigh Grundhoefer, OSG Operations Coordinator, Indiana University

Image credit: Greg Flint/Purdue University
The Grid Exerciser is an application developed and run by the University of Wisconsin-Madison's Condor team to validate the usability of sites and to check the ability of OSG sites to support "bottom feeder" applications. The application has been available for several years, but site administrators have been wary of it, fearing that it could overload the system or pre-empt other jobs. This should not happen with a properly configured compute element, and documentation on configuring the appropriate policy for PBS and Condor batch systems is available on the OSG TWiki. We welcome feedback on other batch systems.
The Grid Exerciser runs under the GridEx VO and submits a job through the batch queue at each site, testing authentication and the job execution environment. GridEx currently runs against six OSG-ITB sites and four US CMS production sites that run OSG 0.3.4. Once we have driven all the known errors out of these sites, we will consider expanding the target list, but only to sites running newer OSG versions.
Recently the University of Wisconsin team has paid closer attention to the specific errors encountered and is working directly with operations and site administrators to understand and fix them. The Grid Exerciser is proving to be a valuable tool for examining the availability, stability and fault tolerance of site services. We expect this work to continue and eventually ramp up until GridEx becomes a standard part of the validation tools for all OSG sites, increasing the effectiveness of the distributed facility for running submitted jobs.
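The paragraph above says only that GridEx tests authentication and the job execution environment. As a rough illustration of the second part, here is a minimal Python sketch of checks a probe job might run once it lands on a worker node. It is not the Grid Exerciser's code: the variable names `OSG_APP`, `OSG_DATA` and `OSG_WN_TMP`, and the helper functions, are hypothetical choices for the example.

```python
import os
import tempfile
from typing import Dict, List, Mapping, Sequence

# Hypothetical variables a site might be expected to define for grid jobs.
# These names are illustrative assumptions, not taken from the GridEx code.
REQUIRED_VARS = ("OSG_APP", "OSG_DATA", "OSG_WN_TMP")


def missing_env_vars(env: Mapping[str, str],
                     required: Sequence[str] = REQUIRED_VARS) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def scratch_is_writable(path: str) -> bool:
    """Create, write and delete a small temporary file in path; False on any OS error."""
    try:
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"probe")
            handle.flush()
        return True
    except OSError:
        return False


def probe(env: Mapping[str, str]) -> Dict[str, object]:
    """Collect the checks into one report, as a probe job might print to its output."""
    scratch = env.get("OSG_WN_TMP") or tempfile.gettempdir()
    return {
        "missing_vars": missing_env_vars(env),
        "scratch_dir": scratch,
        "scratch_writable": scratch_is_writable(scratch),
    }


# Example: run the checks against the job's actual environment.
print(probe(os.environ))
```

Keeping each check a small pure function makes it easy to add further tests; how GridEx actually reports results back to its submitter is not described here.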
Miron Livny, OSG Facility Coordinator, University of Wisconsin-Madison
Dear OSG Consortium and Friends,
I am very pleased to announce that Bill Kramer has been selected by the Council as its new chairperson. As head of the NERSC computing center, Bill brings a wealth of experience and understanding to our program, and we are already keeping him busy. As one of the new Application Coordinators, Frank Würthwein remains part of the OSG's core team, and we will continue to benefit from his contributions and insights. I look forward to working with each and every member of the Executive Board.
At the beginning of this month we submitted the OSG program of work as an unsolicited proposal to the NSF's Mathematical and Physical Sciences Directorate, and we are in the process of submitting the same proposal to the DOE SciDAC-2 program. The proposal focuses on three key areas: the OSG facility; education, outreach and training; and science-driven extensions.
The Consortium meeting saw the presentation and discussion of many aspects of the use and provisioning of the facility, including the contents and schedule of the next two OSG releases and VDT 1.3.10. The local organizers—Paul Avery, DeeDee Carver and Jorge Rodriguez—did a superb job. CMS is ramping up OSG activity once again and DZero is validating one site at a time to run SAMGrid-based reprocessing jobs. Mike Wilde is working with Soma Mukherjee and UTB on the logistics and schedule for this year's summer school; please contact him if you are interested in contributing.
Sincerely,
Ruth Pordes, OSG Executive Director

Reprocessing status as of February 20.
D0's latest reprocessing of its Run IIa data used several OSG sites, which together processed more than 10 million events. For several years D0 has used resources at collaborating institutions for reprocessing, because its Fermilab resources are busy processing newly collected data and running the Monte Carlo simulations that are always under way. The addition of OSG and LCG resources helped the current reprocessing of about 1.4 billion events finish several weeks ahead of schedule. The reprocessed data will be used for physics analyses over the next year and probably longer.
We reprocessed a total of approximately 72,000 1 GB data files using all resources. At one file per job, that's 72,000 jobs. We required about 4 GB of local scratch space (not NFS mounted) on the worker node, and worker nodes had to have outgoing Internet connectivity (our SAMGrid software can deal with clusters where incoming Internet access is blocked). Jobs typically ran for about 12 hours, each reading and writing 1 GB of data. The data came from Fermilab or an intermediate site and were returned to Fermilab for storage. To sustain running, we required 20 MB/s of aggregate bandwidth, which meant using several sites to spread out the network load.
Jobs ran on OSG sites at Indiana University, the University of Oklahoma and the University of Nebraska, as well as on the CMS, D0 and FermiGrid sites at Fermilab. Our focus will now shift to automated generation of Monte Carlo simulations, where we hope to sustain a rate of more than two million events per week on OSG. To achieve this, we plan to test, together with the SAM team, the new "SAM on the fly" installation that a University of Wisconsin student has been developing.
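The figures in this paragraph allow a quick consistency check. The Python sketch below is a back-of-envelope calculation, not D0's own planning model; it assumes decimal units (1 GB = 1000 MB), steady-state running, and that the 20 MB/s aggregate covers both the input and the output transfers of all running jobs.

```python
# Back-of-envelope figures from the D0 reprocessing numbers quoted above.
# Assumptions (not stated in the article): decimal units (1 GB = 1000 MB),
# steady-state running, and input plus output both counting against the
# aggregate bandwidth.

N_FILES = 72_000          # one 1 GB input file per job
FILE_GB = 1.0             # input size per job; output is also about 1 GB
JOB_HOURS = 12.0          # typical wall-clock time per job
AGGREGATE_MB_S = 20.0     # aggregate bandwidth needed to sustain running


def per_job_rate_mb_s(gb_moved: float = 2 * FILE_GB, hours: float = JOB_HOURS) -> float:
    """Average network rate (MB/s) of one job that moves gb_moved GB over its lifetime."""
    return gb_moved * 1000.0 / (hours * 3600.0)


def concurrent_jobs(aggregate_mb_s: float = AGGREGATE_MB_S) -> float:
    """Number of simultaneously running jobs the aggregate bandwidth can feed."""
    return aggregate_mb_s / per_job_rate_mb_s()


total_job_hours = N_FILES * JOB_HOURS           # wall-clock job-hours
total_tb_moved = N_FILES * 2 * FILE_GB / 1000   # TB read plus TB written

print(f"job-hours: {total_job_hours:,.0f}")
print(f"data moved: {total_tb_moved:.0f} TB")
print(f"per-job rate: {per_job_rate_mb_s() * 1000:.1f} kB/s")
print(f"jobs sustained at {AGGREGATE_MB_S:.0f} MB/s: {concurrent_jobs():.0f}")
```

Under these assumptions each job averages about 46 kB/s, so 20 MB/s can feed roughly 432 jobs running at once, and the whole campaign amounts to about 864,000 job-hours and 144 TB of transfers. If only the input traffic counted against the 20 MB/s, the same bandwidth would support about twice as many concurrent jobs.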
Adam Lyon and Parag Mhashilkar, Fermilab and Joel Snow, Langston University

The OSG Consortium at January's meeting.
In January, the OSG Council elected the first Executive Director and Council Chair, who appointed the remaining members of the Executive Team and Executive Board to their first terms.
The OSG Executive Team is responsible for all aspects of the OSG program of work: deliverables, milestones, activities and finances. The Team includes the Executive Director, the Facility, Application and EOT Coordinators, and the Resource Managers.
The OSG Executive Board, which includes the Executive Team, directs the OSG program of work, draws up policies and represents the OSG Consortium in dealing with other organizations, communities and committees.
| Role | Name | Institution |
|---|---|---|
| Executive Director | Ruth Pordes | Fermilab |
| Council Chair | Bill Kramer | NERSC |
| Facility Coordinator | Miron Livny | University of Wisconsin-Madison |
| Resource Managers | Paul Avery | University of Florida |
| | Albert Lazzarini | Caltech |
| Application Coordinators | Torre Wenaus | Brookhaven National Laboratory |
| | Frank Würthwein | University of California, San Diego |
| EOT Coordinator | Mike Wilde | Argonne National Laboratory |
| Security Officer | Don Petravick | Fermilab |
| Engagement Coordinator | Alan Blatecky | Renaissance Computing Institute |
| Operations Coordinator | Leigh Grundhoefer | Indiana University |
| Middleware Coordinator | Alain Roy | University of Wisconsin-Madison |
| Liaison to European Grid Projects | John Huth | Harvard University |
| Liaison to TeraGrid and U.S. Grid Projects | Mark L. Green | University at Buffalo |
| Deputy to the Executive Director and Integration Coordinator | Rob Gardner | University of Chicago |
| Deputy to the Executive Director, Operations and Security | Doug Olson | Lawrence Berkeley National Laboratory |