OSG and Network Awareness

As some of you may know, the OSG project has added networking as a focus area over the last two years. Why is this important to OSG? There are two main reasons: 1) networks are the basis of OSG, tying computing and storage resources to scientists, and 2) network problems can be very difficult to find and fix.

The OSG Networking area has worked with the perfSONAR-PS developers and OSG end-sites to deploy and configure two perfSONAR-PS Toolkit instances to measure latency and bandwidth, as well as the network path, between those end-sites and relevant partner sites across the network. The collaboration between OSG Networking and the perfSONAR-PS developers has resulted in a much more robust, resilient perfSONAR-PS deployment that requires very little maintenance at the sites. The sites benefit from standardized network monitoring that provides visibility into their network status and enables more effective remote support when issues are found.

The OSG Networking area’s vision is to instrument all OSG networks with strategically placed perfSONAR-PS installations that gather network metrics. These metrics form the basis of an OSG Network Service, backed by a datastore that users and applications can query to identify and localize problems in the network. The service also provides “network awareness” to higher-level services so they can make better use of the networks that glue OSG users and resources together. OSG intends to become the network data steward, not only for OSG but globally for the Worldwide LHC Computing Grid community as well.

~ Shawn McKee

Examples of the MaDDash network dashboard, the ESnet-supported dashboard for the OSG Network Service, featuring US ATLAS Tier-2 sites: the left matrix shows bandwidth test results, while the right shows latency/packet-loss results. On the dashboard site, users can “drill down” by clicking on boxes to get further details.