perfSONAR in ATLAS/WLCG
Shawn McKee, Marian Babik
ATLAS Jamboree / Network Section
3rd December 2014
● perfSONAR has been deployed to monitor the network
● The WLCG Networks and Transfer Metrics working group is in the middle of a campaign to get perfSONAR upgraded and properly operating at ALL WLCG Tier-2 (and above) sites
  – Info on the working group is at: https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics
  – perfSONAR install details at: https://twiki.opensciencegrid.org/bin/view/Documentation/DeployperfSONAR
  – Primary challenges: getting the newest version (3.4.1) installed and properly configured, with firewalls not blocking operation
Introduction
Network Monitoring and Metrics WG Meeting
• perfSONAR 3.4 released Oct 14th
• Restructuring support and operations
  – Introduced site-level support via GGUS
• Rewritten documentation
  – https://twiki.opensciencegrid.org/bin/view/Documentation/DeployperfSONAR
• Responded to ShellShock and Poodle
  – Sites advised to terminate their instances
  – Performed security audit and established security procedures
• Testing and validation of the new perfSONAR central configuration is in progress
• Ongoing perfSONAR 3.4 update campaign
  – Includes migration to the new configuration system
  – Security considerations documented
  – Progressing well (119 sonars updated out of 214)
  – See http://grid-monitoring.cern.ch/perfsonar_coverage.txt
  – Deadline 8th January (we start ticketing sites after…)
perfSONAR ops
• Mesh-configuration tool deployed in OSG production
  – Extra slides at end cover this tool
• Provides central interface to reconfigure the entire network
  – All aspects: test parameters, mesh participation
  – List of available sonars taken from GOCDB and OIM
  – Supports hierarchical support model (per-mesh admins)
  – Web interface
  – Connected to perfSONAR infrastructure monitoring
• Site reconfiguration needed to adopt
  – Run as part of 3.4 campaign
• perfSONAR data store status and plans
  – Deployed in OSG ITB; several major issues fixed
  – Scale tests on-going this week; operationally ready for production
  – Will be the “source” of network metrics for OSG/WLCG
  – Plan is to feed -> SSB -> AGIS -> SchedConfigDB continuously
perfSONAR config and store
• Via perfSONAR we gather a number of metrics:
  – Topology/path-information via traceroute
  – One-way delay via OWAMP
  – Packet-loss via OWAMP
  – Usable bandwidth via BWCTL
• ESnet has some nice pages on using perfSONAR to identify problems
  – http://fasterdata.es.net/performance-testing/evaluating-network-performance/
• Some specific examples are discussed in the presentation from last week’s WG meeting: https://indico.cern.ch/event/354593/
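One reason packet loss is worth tracking alongside delay: for loss-limited TCP, the widely used Mathis et al. approximation bounds throughput by roughly MSS / (RTT × sqrt(loss)). The sketch below only illustrates that relationship. The function name and the example numbers are assumptions, not taken from the slides, and the round-trip time has to be supplied separately because OWAMP reports one-way delay.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_fraction):
    """Approximate upper bound on loss-limited TCP throughput, in bits/s.

    Uses the Mathis et al. approximation with the constant factor taken as 1:
        rate <= (MSS / RTT) * (1 / sqrt(loss))
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss_fraction must be in (0, 1]")
    return (mss_bytes * 8 / rtt_seconds) / math.sqrt(loss_fraction)

# Hypothetical path: 1460-byte MSS, 100 ms RTT, 0.01% packet loss.
bound = mathis_throughput_bps(1460, 0.100, 0.0001)
print(f"{bound / 1e6:.2f} Mbit/s")
```

Under these assumed numbers the bound is far below the capacity of a 1 or 10 Gbit/s link, which is why even small loss rates on long paths deserve attention.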
What?: Metrics and Their Use
• Saul Youssef has made a study of FTS transfers (see http://egg.bu.edu/atlas/studies%7btype:egg.Hatch%7d/fts-wan-study-2-for-adc/plot_latest/ )
  – Uses FTS transfer data + traceroute; assumes avg rate/file
  – Identifies problematic FTS channels
  – Can identify problematic hops in the network
  – Update at http://egg.bu.edu/atlas/studies%7btype:egg.Hatch%7d/FTS_November_2014_bonus/
• This technique can be extended to other network metrics (perfSONAR, FAX, etc.)
• It is very similar to PuNDIT’s network tomography algorithm
• We should plan to incorporate this technique in ops:
  – Automate this analysis on specific datasets (perfSONAR, FTS, FAX)
  – Plan to use it to identify problematic paths/links and FIX them!
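The general idea of combining per-transfer rates with traceroute paths can be sketched as follows. This is a deliberately simplified illustration, not the method used in the study or in PuNDIT: each hop is scored by the mean rate of the transfers that cross it, and hops scoring well below the median are flagged. The function names, the 0.5 threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def score_hops(transfers):
    """Return {hop: mean rate of all transfers whose path crosses that hop}."""
    rates_by_hop = defaultdict(list)
    for path, rate in transfers:
        for hop in set(path):  # count each hop once per transfer
            rates_by_hop[hop].append(rate)
    return {hop: mean(rates) for hop, rates in rates_by_hop.items()}

def suspect_hops(transfers, fraction=0.5):
    """Hops whose score is below `fraction` times the median hop score."""
    scores = score_hops(transfers)
    cutoff = fraction * median(scores.values())
    return sorted(hop for hop, score in scores.items() if score < cutoff)

# Hypothetical data: (traceroute hops, average rate per file in MB/s).
transfers = [
    (["siteA", "r1", "r2", "siteB"], 40.0),
    (["siteA", "r1", "r3", "siteC"], 5.0),
    (["siteD", "r3", "siteB"], 4.0),
    (["siteD", "r2", "siteB"], 38.0),
]
print(suspect_hops(transfers))
```

With this sample data the router `r3` and the endpoint `siteC`, which is only reached through `r3`, are flagged. A real analysis would need many more paths to separate a bad hop from a bad endpoint.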
Saul’s Network Study
• We have a number of tools to track, monitor and manage the perfSONAR deployment
  – OMD (Nagios “bundle”) to track service status, versions, configuration
    • https://maddash.aglt2.org/WLCGperfSONAR/check_mk (prototype)
    • Credentials WLCGps/WLCG to “read”
    • Now have a new version respecting x509 certs to be put into prod.
  – MaDDash to visualize metrics
    • http://maddash.aglt2.org/maddash-webui/ (prototype)
  – Summary coverage: http://grid-monitoring.cern.ch/perfsonar_coverage.txt
  – Mesh config/management (see slides at end)
Monitoring/Management Tools
• perfSONAR instances must be upgraded and properly configured.
• WG waiting on input from ATLAS (and others) on use-cases/requirements for network metrics
  – Strawman document ready early next year
• Discussion topics
  – How best to correlate perfSONAR instances with storage?
  – Tuning perfSONAR parameters and coverage
  – Requirements for “user” API for datastore
  – Using perfSONAR data (network tomography; problem location; problem identification)
perfSONAR Related Items
• We need to get perfSONAR data consistently available from all our sites, covering all our paths. Get sites upgraded/configured!
Questions? Discussion, Comments?
Conclusion
Mesh-Config GUI Host Groups
OSG (Soichi) has developed a nice web interface for mesh creation and configuration.
Currently implements access based upon x509 credential. No fine-grained authorization: either ‘admin’ or ‘no access’.
Instances found from perfSONAR registration information from OIM (OSG) or GOCDB (WLCG).
Mesh-Config Parameters
Parameters for perfSONAR tests are controlled centrally. Easy to modify as required
Mesh-Config Meshes
Meshes can be created using this tab. This is the “metadata” needed to organize sets of perfSONAR instances.
Mesh-Config Test Definitions
What tests get run for a mesh? That is controlled by this section.
• Once meshes are defined they are exposed via a URL like: http://myosg.grid.iu.edu/pfmesh/json/name/<mesh-name>?new
• Example for us-atlas: http://myosg.grid.iu.edu/pfmesh/json/name/us-atlas?new
• Status: in production, but a URL without the ?new suffix returns the old “static” values hosted on CERN AFS
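The URL pattern above can be wrapped in a small helper. This is a minimal sketch: the base URL and the `?new` suffix come from the slide, while the function names `mesh_config_url` and `fetch_mesh` are hypothetical. The structure of the returned JSON is not described here, so `fetch_mesh` simply returns whatever the server sends, decoded.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

MESH_BASE = "http://myosg.grid.iu.edu/pfmesh/json/name/"

def mesh_config_url(mesh_name, use_new=True):
    """Build the mesh URL; without '?new' the old static values are served."""
    url = MESH_BASE + quote(mesh_name, safe="")
    return url + "?new" if use_new else url

def fetch_mesh(mesh_name, use_new=True, timeout=10):
    """Download and decode a mesh definition (requires network access)."""
    with urlopen(mesh_config_url(mesh_name, use_new), timeout=timeout) as resp:
        return json.load(resp)

print(mesh_config_url("us-atlas"))
```

Keeping URL construction separate from the download makes the `?new` behaviour easy to check without touching the network.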
Mesh Config URL