Gathering network metrics in OSG, and why that is useful

As many of you have heard, Open Science Grid (OSG) is developing an infrastructure to manage and organize network measurements from perfSONAR instances in OSG and the Worldwide LHC Computing Grid. The perfSONAR project provides a toolkit of standardized network measurement components packaged for easy use. The OSG networking, technology and operations areas are working together on a framework for deploying perfSONAR at sites, managing its configuration, defining regular network tests, and gathering and storing the resulting network metrics centrally.

Why is this important? First, networks differ from many other IT components in that they typically span multiple administrative domains. No one person or group controls, or even has access to, all the components of the end-to-end path. Any problem in “the network” is therefore much harder to find and fix quickly. Second, when an application that uses the network performs poorly, it can be difficult to tell whether the cause lies in the network itself or in the end hosts or application software involved.

The perfSONAR deployments give OSG information on how the network is behaving between specific sites and can help quickly identify when problems arise. OSG’s large set of such measurements could also make it possible to quickly localize which specific “hops” in the network may be at fault. Another important benefit of having these metrics is that they allow higher-level services and applications to make informed decisions about the monitored network paths, for example finding the best combinations of sources and destinations for data transfers.
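
To illustrate the idea of localizing faulty hops from many site-to-site measurements, here is a minimal sketch in Python. It is not OSG's or perfSONAR's actual method or API. The function name `suspect_hops`, the input layout (a mapping from site pairs to the list of hops on the path), and the router names are all hypothetical, chosen only for this illustration. The sketch assumes we already know which site pairs look degraded, and it flags hops that appear on degraded paths but on no healthy path.

```python
from collections import Counter


def suspect_hops(paths, degraded):
    """Rank hops that are likely at fault.

    paths    -- dict mapping (source, destination) to a list of hop names
                (hypothetical layout, e.g. taken from traceroute results)
    degraded -- set of (source, destination) pairs showing problems
    """
    bad_counts = Counter()   # number of degraded paths each hop appears on
    healthy_hops = set()     # hops seen on at least one healthy path

    for pair, hops in paths.items():
        if pair in degraded:
            bad_counts.update(set(hops))
        else:
            healthy_hops.update(hops)

    # A hop that also carries healthy traffic is less likely to be the cause.
    candidates = {h: n for h, n in bad_counts.items() if h not in healthy_hops}

    # Most frequently implicated hops first; ties broken by name.
    return sorted(candidates, key=lambda h: (-candidates[h], h))


if __name__ == "__main__":
    # Made-up example topology with four sites and five routers.
    example_paths = {
        ("A", "B"): ["r1", "r2", "r3"],
        ("A", "C"): ["r1", "r4"],
        ("D", "B"): ["r5", "r2", "r3"],
        ("D", "C"): ["r5", "r4"],
    }
    example_degraded = {("A", "B"), ("D", "B")}
    print(suspect_hops(example_paths, example_degraded))
```

In this made-up example both degraded paths share `r2` and `r3`, while `r1` and `r5` also appear on healthy paths, so only `r2` and `r3` are reported. A real analysis would need to handle asymmetric routes, intermittent problems, and hops that degrade only some traffic.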

The perfSONAR framework in OSG is just ramping up, following the 3.4 release of perfSONAR in October. Watch for tools and monitoring components to start appearing in production over the next few months.

~ Shawn McKee