Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
The perfSONAR Measurement Framework: Project Update and
Roadmap h>p://www.perfsonar.net
May 17, 2016
What is perfSONAR? • perfSONAR is a tool to:
– Set (hopefully raise) network performance expectaOons – Find network problems (“soR failures”) – Help fix these problems
• All in mulO-‐domain environments • These problems are all harder when mulOple networks are involved
• perfSONAR is provides a standard way to publish acOve and passive monitoring data – This data is interesOng to network researchers as well as network operators
6/2/15 2
Target perfSONAR Users • Network Engineers • Wide-‐Area Network Operators • Distributed Data Managers
– Large distributed science projects (e.g.: LHC) • perfSONAR is not aimed at end-‐users
– Finding the existence of performance problems is not hard with the right tools
– Finding the cause of performance problems is hard, even with the right tools
– perfSONAR is a great tool for skilled network engineers to diagnose problems • If there are enough perfSONAR hosts along the path.
June 9, 2016 3
perfSONAR Dashboard: Raising Expecta:ons and
improving network visibility
Status at-‐a-‐glance • Packet loss • Throughput • Correctness Current live instances at: • h>p://ps-‐dashboard.es.net/ • And many more Drill-‐down capabiliOes: • Test history between hosts • Ability to correlate with other
events • Very valuable for fault
localizaOon and isolaOon
6/2/15 4
When did network change occur?
June 9, 2016 © 2016, h>p://www.perfsonar.net 5
Problem Statement • In pracOce, performance issues are
prevalent and distributed. • When a network is underperforming
or errors occur, it is difficult to idenOfy the source, as problems can happen anywhere, in any domain.
• Local-‐area network tesOng is not sufficient, as errors can occur between networks.
6/2/15 6
Where Are The Problems?
Source Campus
Backbone
S
NREN
Congested or faulty links between domains
Congested intra-‐ campus links
D
DesOnaOon Campus
Latency dependant problems inside domains with small RTT
Regional
6/2/15 7
Source Campus
R&E Backbone
Regional
D S
Des:na:on Campus
Regional
Performance is good when RTT is < ~10 ms
Performance is poor when RTT exceeds ~10 ms
Switch with small buffers
Local TesOng Will Not Find Everything
6/2/15 8
Hard vs. SoR Failures • “Hard failures” are the kind of problems every organizaOon understands
– Fiber cut – Power failure takes down routers – Hardware ceases to funcOon
• Classic monitoring systems are good at alerOng hard failures – i.e., NOC sees something turn red on their screen – Engineers paged by monitoring systems
• “SoR failures” are different and oRen go undetected – Basic connecOvity (ping, traceroute, web pages, email) works – Performance is just poor
6/2/15 9
Sample SoR Failure: failing opOcs
Gb
/s
normal performance
degrading performance
one month
repair
6/2/15 10
A small amount of packet loss makes a huge difference
Metro Area
Local (LAN)
Regional
ConOnental
InternaOonal
Measured (TCP Reno) Measured (HTCP) Theoretical (TCP Reno) Measured (no loss)
With loss, high performance beyond metro distances is essentially impossible
6/2/15 11
perfSONAR CollaboraOon • The perfSONAR collaboraOon is a Open Source project led by ESnet, Internet2, Indiana
University, and GEANT. – Each organizaOon has commi>ed 1.5 FTE effort to the project – Plus addiOonal help from many others in the community (OSG, RNP, SLAC, and more)
• The perfSONAR Roadmap is influenced by – requests on the project issue tracker – annual user surveys sent to everyone on the user list – regular meeOngs with VO using perfSONAR such as the WLCG and OSG – discussions at various perfSONAR related workshops
• Based on the above, every 6-‐12 months the perfSONAR governance group meets to prioriOze features based on: – impact to the community – level of effort required to implement and support – availability of someone with the right skill set for the task
June 9, 2016 © 2016, h>p://www.perfsonar.net 12
Target perfSONAR Users • Network Engineers • Wide-‐Area Network Operators • Distributed Data Managers
– Large distributed science projects (e.g.: LHC) • perfSONAR is not aimed at end-‐users
– Finding the existence of performance problems is not hard with the right tools
– Finding the cause of performance problems is hard, even with the right tools
– perfSONAR is a great tool for skilled network engineers to diagnose problems … • … if there are enough perfSONAR hosts along the path.
June 9, 2016 13
public perfSONAR Servers (May 2016) • Around 1600 publicly registered servers
– Equal number of non-‐registered servers? • ESnet: 50
– mostly 10G, includes a 40G host in Boston – About 50% are now a ‘combined’ throughput/latency host
• GEANT: 22 – 100G host coming soon
• Internet2: 3 – PAS servers are private, used for alarming, but results are available via MADDASH
• Some other top deployments: – Onenet (24), AMPATH (8), bc.net (10), RNP (8), Canarie (13), kreonet (14), NERO(12), AARnet
(19), JGN (17), CENIC (5), KANREN (5)
June 9, 2016 © 2016, h>p://www.perfsonar.net 14
Who is running perfSONAR?
hKp://stats.es.net/ServicesDirectory/ 6/2/15 15
More perfSONAR StaOsOcs • 75% are running latest version
– v3.5.1.3, probably running auto-‐update • 22% of the hosts have an IPV6 Address • 38% are .edu hosts • 58 total top-‐level domains • 736 domains • 40% have MTU =9000 • 49% have a 10G NIC • 2.5% have a 40G NIC
June 9, 2016 © 2016, h>p://www.perfsonar.net 16
perfSONAR Hardware • These days you can get a good 1U host capable of pushing 10Gbps
TCP for around $500 (+10G NIC cost, $750?). – See perfSONAR user list
• And you can get a host capable of 1G for around $150! – Get a mulO-‐core Intel Celeron-‐based host
• ARM is not fast enough! – e.g.: ZBOX by ZOTAC: h>ps://www.zotac.com/us/product/mini_pcs/
zbox-‐ci323-‐nano
• VMs are not recommended – Tools more accurate if can guarantee NIC isolaOon
17
PERFSONAR DETAILS
June 9, 2016 © 2014, h>p://www.perfsonar.net 18
perfSONAR Components
June 9, 2016 © 2016, h>p://www.perfsonar.net 19
Recent Updates • 3.5 (September 2015)
– Re-‐designed Toolkit web interface – Introduced bundles – Debian 7 support – Improved central management features
• 3.5.1 (March 2016) – Updated regular tesOng UI – Updated esmond API – Synchronized package names and file structures between RedHat and Debian – Debian 8 support
June 9, 2016 © 2016, h>p://www.perfsonar.net 20
Common Themes From Users • Central meshes useful but hard to setup • Where should I run tests? • When will perfSONAR support CentOS 7? • I wish my dashboard had alerOng • BWCTL/scheduling issues
– I wish I had more visibility into BWCTL/scheduling – Why doesn’t everything get tracked by BWCTL? – I want more flexible scheduling – How do I add new tools?
June 9, 2016 © 2016, h>p://www.perfsonar.net 21
perfSONAR 4.0 • TargeOng beta in late summer, final in the fall of 2016
• Want to tackle as many issues as we can but need to keep scoped within what’s possible
June 9, 2016 © 2016, h>p://www.perfsonar.net 22
Current perfSONAR development • One of the themes for v4.0 will be “Control and Scalability”
– perfSONAR is successful because of the ‘default open’ model. – BUT, as the number of perfSONAR hosts worldwide grows, we need a way to control
• Who is running tests • How oRen are they allowed to run tests • What hosts can I run tests to? How to I get my host added to someone else’s list of
allowed hosts?
• Working on a new test scheduler (pScheduler): – Shared by all tests and aware of the resources each uses – Containing finer grained controls about who can run tests and what tests they
are allowed to run. – Increased visibility and control as to when tests will be run
June 9, 2016 23
Roadmap for the Next Release • New graphs that allow for easier comparison of mulOple
metrics – based on ESnet Tools team react-‐based ployng tools
• A web interface for creaOng test meshes • Easier selecOon of endpoints based on topology locaOon,
geographic locaOon, accessibility and/or custom searches • Dashboards that support alerOng based on pa>erns across an
enOre mesh • CentOS 7 / Debian 8 support
June 9, 2016 24
New: Endpoint SelecOon • Common feedback is that it’s hard to determine where to test
• We won’t solve all the problems this release, but trying to put some infrastructure in place – Gathering more metadata in lookup service – Leverage exisOng informaOon to find closest endpoint based off of traceroute
– Looking at ways to determine endpoint accessibility (i.e. is there a firewall? Does it block me?)
June 9, 2016 © 2016, h>p://www.perfsonar.net 25
Current: MaDDash • Status at-‐a-‐glance
– Packet loss – Throughput – Correctness
• Current live instances at: – h>p://ps-‐dashboard.es.net/
– And many more • Drill-‐down capabiliOes:
– Test history between hosts – Ability to correlate with other events
• Very valuable for fault localizaOon and isolaOon
• Currently no way to be pushed a noOficaOon of an issue
June 9, 2016 © 2016, h>p://www.perfsonar.net 26
New: Introducing MaDAlert • Developed at University of Michigan • Looks at dashboards and scans for pa>erns
– Example: If every box for a host is orange, good indicaOon host is down • Provides REST API to results • Currently a GUI to look at just the alerts
– h>p://madalert.aglt2.org/ • Working on Nagios checks so can leverage that noOficaOon system
without flooding yourself with emails • Also hoping to integrate with MaDDash UI to make idenOfying common
problems easier
June 9, 2016 © 2016, h>p://www.perfsonar.net 27
New: CentOS 7 • Current toolkits run on CentOS 6 • Not a flashy change, but survey results show the migraOon at many insOtuOons to RedHat/CentOS 7 is already well underway
• Current plan is to provide CentOS 6 AND CentOS 7 RPMs of all the 4.0 packages
• Likely will only be providing CentOS 7 Toolkit ISO for this release
June 9, 2016 © 2016, h>p://www.perfsonar.net 28
PERFSONAR INSTALLATION OPTIONS
June 9, 2016 © 2014, h>p://www.perfsonar.net 29
perfSONAR Toolkit • Currently most people run the perfSONAR Toolkit – Full suite of perfSONAR tools to configure, execute, collect, and visualize measurement results
– CentOS-‐based ISO pre-‐tuned and configured with default system and security seyngs
June 9, 2016 © 2016, h>p://www.perfsonar.net 30
perfSONAR Bundles • perfsonar-‐tools – Just the basics: iperf, iperf3, bwctl, owamp
• perfsonar-‐testpoint – Tools + regular tesOng, LS registraOon
• perfsonar-‐core – Testpoint + esmond (for storing results)
June 9, 2016 © 2014, h>p://www.perfsonar.net 31
Will this host primarily run
regularly scheduled measurements?
Install perfsonar-tools
Do I want to manage each host through the web
UI?
Is this
host going to centrally archive or manage my other
measurement hosts?
Do I want to store
my measurements in an archive that runs
on this host?
Install perfsonar-testpoint
Install perfsonar-centralmanagement
Yes
No
Install perfsonar-toolkitYes
Install perfsonar-coreYes
Yes
NoNo
START
Who answers "No"?
- Central measurement archives
- Data transfer nodes- Hosts that use the
network for purposes beyond just measurement
Who answers "Yes"?
- Central measurement archives
Who answers "Yes"?
- Dedicated measurement hosts solely tasked with performing network measurements
Who answers "No"?
- Hosts part of a large deployment, usually centrally managed by Puppet, CFEngine, etc.
- Hosts running on minimal hardware
Who answers "Yes"?
- Hosts without access to a central archive such as those in a large deployment that do not wish to deal with the extra effort required to run a large central archive
Who answers "No"?- Hosts running on minimal
hardware - Hosts with access to a central
measurement archive- Hosts that are part of a
centrally managed mesh- You want a registered testpoint
that others can run tests to
Who answers "No"?
- Data transfer nodes- Any other host that
uses a network
Who answers "Yes"?
- Hosts part of a small deployment (1-2 hosts)
- Hosts run by new perfSONAR users wanting to explore the full set of features from collection to display
Other Useful Packages:- Dashboard:
- maddash- Host Configuration:
- perfsonar-toolkit-ntp- perfsonar-toolkit-sysctl- perfsonar-toolkit-security
- Nagios- nagios-plugins-perfsonar
perfSONAR Bundle Selection
Guide
No
June 9, 2016 32
Current: MeshConfig • The idea of a central mesh file is to define tests in one place for all your hosts
• Basic process is: 1. Manually create configuraOon file 2. Convert to JSON using provided script 3. Publish JSON on web server 4. Point clients at JSON to figure out what tests to run
(opOonally point MaDDash at JSON to display results)
June 9, 2016 © 2016, h>p://www.perfsonar.net 33
Current: MeshConfig File • Can grow quickly • ESnet has one about 10,000 lines long: – h>ps://github.com/esnet/esnet-‐perfsonar-‐mesh/blob/master/conf/esnet-‐mesh_config.conf
June 9, 2016 © 2016, h>p://www.perfsonar.net 34
New: MeshConfig Admin UI • Replaces need to edit text file by hand – Based on work done for OSG
• AutomaOcally pulls hosts from lookup service • Access control allows you to assign different admins edit rights to different meshes
• AutomaOcally produces URLs to JSON specific for each host (i.e. no need to see tests not involved-‐in)
June 9, 2016 © 2016, h>p://www.perfsonar.net 35
NEW TEST SCHEDULER: PSCHEDULER
June 9, 2016 © 2014, h>p://www.perfsonar.net 36
Current: Test Scheduling • Currently two components are in charge of scheduling and execuOng tests:
– BWCTL – perfSONAR Regular TesOng
• BWCTL has been around a number of years and is good at what it does…but it’s challenging to make it do more
• Lots of requests for: – Greater visibility into scheduler – Make it easier to pipe more tools through it so aren’t unexpected conflicts – Support for different ways to define schedules – Be>er ability to request tests on-‐demand – Greater ability to extend in general
June 9, 2016 © 2016, h>p://www.perfsonar.net 37
3.5 Data CollecOon
June 9, 2016 © 2016, h>p://www.perfsonar.net 38
4.0 Data CollecOon
June 9, 2016 © 2016, h>p://www.perfsonar.net 39
New: pScheduler • Completely new soRware to handle all the stuff BWCTL and regular tesOng could
do…plus more • REST API allows tests to be requested, cancelled, viewed, etc • Plug-‐in framework for wriOng new tools
– Plug-‐ins for all the exisOng tools included at launch – Plug-‐in can be wri>en in any language – System for normalizing output between similar tools
• Plug-‐in framework for wriOng to different archivers • Keeps state in database so maintain schedule between reboots, outages, etc • Working on designing a more flexible limits and resource management
infrastructure
June 9, 2016 © 2016, h>p://www.perfsonar.net 40
Useful URLs • h>p://docs.perfsonar.net/ • h>p://www.perfsonar.net/ • h>p://fasterdata.es.net/ – h>p://fasterdata.es.net/performance-‐tesOng/network-‐troubleshooOng-‐tools/
• h>ps://github.com/perfsonar – h>ps://github.com/perfsonar/project/wiki
41