41
The perfSONAR Measurement Framework: Project Update and Roadmap h>p://www.perfsonar.net May 17, 2016

The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

The  perfSONAR  Measurement  Framework:  Project  Update  and  

Roadmap  h>p://www.perfsonar.net  

May  17,  2016    

Page 2: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

What  is  perfSONAR?  •  perfSONAR  is  a  tool  to:  

–  Set  (hopefully  raise)  network  performance  expectaOons  –  Find  network  problems  (“soR  failures”)  –  Help  fix  these  problems  

•  All  in  mulO-­‐domain  environments  •  These  problems  are  all  harder  when  mulOple  networks  are  involved  

•  perfSONAR  is  provides  a  standard  way  to  publish  acOve  and  passive  monitoring  data  –  This  data  is  interesOng  to  network  researchers  as  well  as  network  operators  

6/2/15   2  

Page 3: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Target  perfSONAR  Users  •  Network  Engineers  •  Wide-­‐Area  Network  Operators  •  Distributed  Data  Managers  

–  Large  distributed  science  projects  (e.g.:  LHC)  •  perfSONAR  is  not  aimed  at  end-­‐users  

–  Finding  the  existence  of  performance  problems  is  not  hard  with  the  right  tools  

–  Finding  the  cause  of  performance  problems  is  hard,  even  with  the  right  tools  

–  perfSONAR  is  a  great  tool  for  skilled  network  engineers  to  diagnose  problems  •  If  there  are  enough  perfSONAR  hosts  along  the  path.  

June  9,  2016   3  

Page 4: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

perfSONAR  Dashboard:  Raising  Expecta:ons  and  

improving  network  visibility    

Status  at-­‐a-­‐glance  •  Packet  loss  •  Throughput  •  Correctness  Current  live  instances  at:  •  h>p://ps-­‐dashboard.es.net/  •  And  many  more  Drill-­‐down  capabiliOes:  •  Test  history  between  hosts  •  Ability  to  correlate  with  other  

events  •  Very  valuable  for  fault  

localizaOon  and  isolaOon  

6/2/15   4  

Page 5: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

When  did  network  change  occur?  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   5  

Page 6: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Problem  Statement  •  In  pracOce,  performance  issues  are  

prevalent  and  distributed.  •  When  a  network  is  underperforming  

or  errors  occur,  it  is  difficult  to  idenOfy  the  source,  as  problems  can  happen  anywhere,  in  any  domain.    

•  Local-­‐area  network  tesOng  is  not  sufficient,  as  errors  can  occur  between  networks.  

6/2/15   6  

Page 7: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Where  Are  The  Problems?  

Source  Campus  

 Backbone  

S  

NREN    

Congested  or  faulty  links  between  domains  

Congested  intra-­‐  campus  links  

D  

DesOnaOon  Campus  

Latency  dependant  problems  inside  domains  with  small  RTT  

Regional  

6/2/15   7  

Page 8: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Source  Campus  

R&E  Backbone  

Regional  

D  S  

Des:na:on  Campus  

Regional  

Performance  is  good  when  RTT  is  <  ~10  ms  

Performance  is  poor  when  RTT  exceeds  ~10  ms  

Switch  with  small  buffers  

Local  TesOng  Will  Not  Find  Everything  

6/2/15   8  

Page 9: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Hard  vs.  SoR  Failures  •  “Hard  failures”  are  the  kind  of  problems  every  organizaOon  understands  

–  Fiber  cut  –  Power  failure  takes  down  routers  –  Hardware  ceases  to  funcOon  

•  Classic  monitoring  systems  are  good  at  alerOng  hard  failures  –  i.e.,  NOC  sees  something  turn  red  on  their  screen  –  Engineers  paged  by  monitoring  systems  

•  “SoR  failures”  are  different  and  oRen  go  undetected  –  Basic  connecOvity  (ping,  traceroute,  web  pages,  email)  works  –  Performance  is  just  poor  

6/2/15   9  

Page 10: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Sample  SoR  Failure:  failing  opOcs  

 Gb

/s  

normal  performance

degrading  performance

one  month  

repair

6/2/15   10  

Page 11: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

A small amount of packet loss makes a huge difference

Metro  Area  

Local  (LAN)  

Regional  

ConOnental  

InternaOonal  

Measured (TCP Reno) Measured (HTCP) Theoretical (TCP Reno) Measured (no loss)

With loss, high performance beyond metro distances is essentially impossible

6/2/15   11  

Page 12: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

perfSONAR  CollaboraOon  •  The  perfSONAR  collaboraOon  is  a  Open  Source  project  led  by  ESnet,  Internet2,  Indiana  

University,  and  GEANT.  –  Each  organizaOon  has  commi>ed  1.5  FTE  effort  to  the  project  –  Plus  addiOonal  help  from  many  others  in  the  community  (OSG,  RNP,  SLAC,  and  more)  

•  The  perfSONAR  Roadmap  is  influenced  by    –  requests  on  the  project  issue  tracker  –  annual  user  surveys  sent  to  everyone  on  the  user  list  –  regular  meeOngs  with  VO  using  perfSONAR  such  as  the  WLCG  and  OSG  –  discussions  at  various  perfSONAR  related  workshops  

•  Based  on  the  above,  every  6-­‐12  months  the  perfSONAR  governance  group  meets  to  prioriOze  features  based  on:  –  impact  to  the  community  –  level  of  effort  required  to  implement  and  support  –  availability  of  someone  with  the  right  skill  set  for  the  task  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   12  

Page 13: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Target  perfSONAR  Users  •  Network  Engineers  •  Wide-­‐Area  Network  Operators  •  Distributed  Data  Managers  

–  Large  distributed  science  projects  (e.g.:  LHC)  •  perfSONAR  is  not  aimed  at  end-­‐users  

–  Finding  the  existence  of  performance  problems  is  not  hard  with  the  right  tools  

–  Finding  the  cause  of  performance  problems  is  hard,  even  with  the  right  tools  

–  perfSONAR  is  a  great  tool  for  skilled  network  engineers  to  diagnose  problems  …  •  …  if  there  are  enough  perfSONAR  hosts  along  the  path.  

June  9,  2016   13  

Page 14: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

public  perfSONAR  Servers  (May  2016)  •  Around  1600  publicly  registered  servers  

–  Equal  number  of  non-­‐registered  servers?  •  ESnet:  50    

–  mostly  10G,  includes  a  40G  host  in  Boston  –  About  50%  are  now  a  ‘combined’  throughput/latency  host  

•  GEANT:  22  –  100G  host  coming  soon  

•  Internet2:  3  –  PAS  servers  are  private,  used  for  alarming,  but  results  are  available  via  MADDASH  

•  Some  other  top  deployments:  –  Onenet  (24),  AMPATH  (8),  bc.net  (10),  RNP  (8),  Canarie  (13),  kreonet  (14),  NERO(12),  AARnet  

(19),  JGN  (17),  CENIC  (5),  KANREN  (5)  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   14  

Page 15: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Who  is  running  perfSONAR?  

hKp://stats.es.net/ServicesDirectory/    6/2/15   15  

Page 16: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

More  perfSONAR  StaOsOcs  •  75%  are  running  latest  version    

–  v3.5.1.3,  probably  running  auto-­‐update  •  22%  of  the  hosts  have  an  IPV6  Address    •  38%  are  .edu  hosts  •  58  total  top-­‐level  domains  •  736  domains  •  40%  have  MTU  =9000  •  49%  have  a  10G  NIC  •  2.5%  have  a  40G  NIC    

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   16  

Page 17: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

perfSONAR  Hardware  •  These  days  you  can  get  a  good  1U  host  capable  of  pushing  10Gbps  

TCP  for  around  $500  (+10G  NIC  cost,  $750?).  –  See  perfSONAR  user  list  

•  And  you  can  get  a  host  capable  of  1G  for  around  $150!  –  Get  a  mulO-­‐core  Intel  Celeron-­‐based  host    

•  ARM  is  not  fast  enough!  –  e.g.:    ZBOX  by  ZOTAC:  h>ps://www.zotac.com/us/product/mini_pcs/

zbox-­‐ci323-­‐nano  

•  VMs  are  not  recommended  –  Tools  more  accurate  if  can  guarantee  NIC  isolaOon  

17  

Page 18: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

PERFSONAR  DETAILS  

June  9,  2016  ©  2014,  h>p://www.perfsonar.net   18  

Page 19: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

perfSONAR  Components  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   19  

Page 20: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Recent  Updates  •  3.5  (September  2015)  

–  Re-­‐designed  Toolkit  web  interface  –  Introduced  bundles  –  Debian  7  support  –  Improved  central  management  features  

•  3.5.1  (March  2016)  –  Updated  regular  tesOng  UI    –  Updated  esmond  API  –  Synchronized  package  names  and  file  structures  between  RedHat  and  Debian  –  Debian  8  support  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   20  

Page 21: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Common  Themes  From  Users  •  Central  meshes  useful  but  hard  to  setup  •  Where  should  I  run  tests?  •  When  will  perfSONAR  support  CentOS  7?  •  I  wish  my  dashboard  had  alerOng  •  BWCTL/scheduling  issues  

–  I  wish  I  had  more  visibility  into  BWCTL/scheduling  –  Why  doesn’t  everything  get  tracked  by  BWCTL?  –  I  want  more  flexible  scheduling  –  How  do  I  add  new  tools?  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   21  

Page 22: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

perfSONAR  4.0  •  TargeOng  beta  in  late  summer,  final  in  the  fall  of  2016  

•  Want  to  tackle  as  many  issues  as  we  can  but  need  to  keep  scoped  within  what’s  possible  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   22  

Page 23: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Current  perfSONAR  development  •  One  of  the  themes  for  v4.0  will  be  “Control  and  Scalability”  

–  perfSONAR  is  successful  because  of  the  ‘default  open’  model.  –  BUT,  as  the  number  of  perfSONAR  hosts  worldwide  grows,  we  need  a  way  to  control  

•  Who  is  running  tests  •  How  oRen  are  they  allowed  to  run  tests  •  What  hosts  can  I  run  tests  to?  How  to  I  get  my  host  added  to  someone  else’s  list  of  

allowed  hosts?  

•  Working  on  a  new  test  scheduler  (pScheduler):  –  Shared  by  all  tests  and  aware  of  the  resources  each  uses  –  Containing  finer  grained  controls  about  who  can  run  tests  and  what  tests  they  

are  allowed  to  run.  –  Increased  visibility  and  control  as  to  when  tests  will  be  run    

June  9,  2016  23  

Page 24: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

 Roadmap  for  the  Next  Release  •  New  graphs  that  allow  for  easier  comparison  of  mulOple  

metrics  –  based  on  ESnet  Tools  team  react-­‐based  ployng  tools    

•  A  web  interface  for  creaOng  test  meshes  •  Easier  selecOon  of  endpoints  based  on  topology  locaOon,  

geographic  locaOon,  accessibility  and/or  custom  searches  •  Dashboards  that  support  alerOng  based  on  pa>erns  across  an  

enOre  mesh  •  CentOS  7  /  Debian  8  support  

June  9,  2016  24  

Page 25: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

New:  Endpoint  SelecOon  •  Common  feedback  is  that  it’s  hard  to  determine  where  to  test  

•  We  won’t  solve  all  the  problems  this  release,  but  trying  to  put  some  infrastructure  in  place  –  Gathering  more  metadata  in  lookup  service  –  Leverage  exisOng  informaOon  to  find  closest  endpoint  based  off  of  traceroute  

–  Looking  at  ways  to  determine  endpoint  accessibility  (i.e.  is  there  a  firewall?  Does  it  block  me?)  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   25  

Page 26: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Current:  MaDDash  •  Status  at-­‐a-­‐glance  

–  Packet  loss  –  Throughput  –  Correctness  

•  Current  live  instances  at:  –  h>p://ps-­‐dashboard.es.net/  

–  And  many  more  •  Drill-­‐down  capabiliOes:  

–  Test  history  between  hosts  –  Ability  to  correlate  with  other  events  

•  Very  valuable  for  fault  localizaOon  and  isolaOon  

•  Currently  no  way  to  be  pushed  a  noOficaOon  of  an  issue  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   26  

Page 27: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

New:  Introducing  MaDAlert  •  Developed  at  University  of  Michigan  •  Looks  at  dashboards  and  scans  for  pa>erns  

–  Example:  If  every  box  for  a  host  is  orange,  good  indicaOon  host  is  down  •  Provides  REST  API  to  results  •  Currently  a  GUI  to  look  at  just  the  alerts  

–  h>p://madalert.aglt2.org/  •  Working  on  Nagios  checks  so  can  leverage  that  noOficaOon  system  

without  flooding  yourself  with  emails  •  Also  hoping  to  integrate  with  MaDDash  UI  to  make  idenOfying  common  

problems  easier  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   27  

Page 28: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

New:  CentOS  7  •  Current  toolkits  run  on  CentOS  6  •  Not  a  flashy  change,  but  survey  results  show  the  migraOon  at  many  insOtuOons  to  RedHat/CentOS  7  is  already  well  underway  

•  Current  plan  is  to  provide  CentOS  6  AND  CentOS  7  RPMs  of  all  the  4.0  packages  

•  Likely  will  only  be  providing  CentOS  7  Toolkit    ISO  for  this  release  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   28  

Page 29: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

PERFSONAR  INSTALLATION  OPTIONS  

June  9,  2016  ©  2014,  h>p://www.perfsonar.net   29  

Page 30: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

perfSONAR  Toolkit  •  Currently  most  people  run  the  perfSONAR  Toolkit  – Full  suite  of  perfSONAR  tools  to  configure,  execute,  collect,  and  visualize  measurement  results  

– CentOS-­‐based  ISO  pre-­‐tuned  and  configured  with  default  system  and  security  seyngs  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   30  

Page 31: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

perfSONAR  Bundles  •  perfsonar-­‐tools  –  Just  the  basics:  iperf,  iperf3,  bwctl,  owamp  

•  perfsonar-­‐testpoint  – Tools  +  regular  tesOng,  LS  registraOon  

•  perfsonar-­‐core  – Testpoint  +  esmond  (for  storing  results)  

June  9,  2016  ©  2014,  h>p://www.perfsonar.net   31  

Page 32: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Will this host primarily run

regularly scheduled measurements?

Install perfsonar-tools

Do I want to manage each host through the web

UI?

Is this

host going to centrally archive or manage my other

measurement hosts?

Do I want to store

my measurements in an archive that runs

on this host?

Install perfsonar-testpoint

Install perfsonar-centralmanagement

Yes

No

Install perfsonar-toolkitYes

Install perfsonar-coreYes

Yes

NoNo

START

Who answers "No"?

- Central measurement archives

- Data transfer nodes- Hosts that use the

network for purposes beyond just measurement

Who answers "Yes"?

- Central measurement archives

Who answers "Yes"?

- Dedicated measurement hosts solely tasked with performing network measurements

Who answers "No"?

- Hosts part of a large deployment, usually centrally managed by Puppet, CFEngine, etc.

- Hosts running on minimal hardware

Who answers "Yes"?

- Hosts without access to a central archive such as those in a large deployment that do not wish to deal with the extra effort required to run a large central archive

Who answers "No"?- Hosts running on minimal

hardware - Hosts with access to a central

measurement archive- Hosts that are part of a

centrally managed mesh- You want a registered testpoint

that others can run tests to

Who answers "No"?

- Data transfer nodes- Any other host that

uses a network

Who answers "Yes"?

- Hosts part of a small deployment (1-2 hosts)

- Hosts run by new perfSONAR users wanting to explore the full set of features from collection to display

Other Useful Packages:- Dashboard:

- maddash- Host Configuration:

- perfsonar-toolkit-ntp- perfsonar-toolkit-sysctl- perfsonar-toolkit-security

- Nagios- nagios-plugins-perfsonar

perfSONAR Bundle Selection

Guide

No

June  9,  2016   32  

Page 33: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Current:  MeshConfig  •  The  idea  of  a  central  mesh  file  is  to  define  tests  in  one  place  for  all  your  hosts  

•  Basic  process  is:  1.  Manually  create  configuraOon  file  2.  Convert  to  JSON  using  provided  script  3.  Publish  JSON  on  web  server  4.  Point  clients  at  JSON  to  figure  out  what  tests  to  run  

(opOonally  point  MaDDash  at  JSON  to  display  results)  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   33  

Page 34: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Current:  MeshConfig  File  •  Can  grow  quickly  •  ESnet  has  one  about  10,000  lines  long:  – h>ps://github.com/esnet/esnet-­‐perfsonar-­‐mesh/blob/master/conf/esnet-­‐mesh_config.conf  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   34  

Page 35: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

New:  MeshConfig  Admin  UI  •  Replaces  need  to  edit  text  file  by  hand  –  Based  on  work  done  for  OSG  

•  AutomaOcally  pulls  hosts  from  lookup  service  •  Access  control  allows  you  to  assign  different  admins  edit  rights  to  different  meshes  

•  AutomaOcally  produces  URLs  to  JSON  specific  for  each  host  (i.e.  no  need  to  see  tests  not  involved-­‐in)  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   35  

Page 36: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

NEW  TEST  SCHEDULER:  PSCHEDULER  

June  9,  2016  ©  2014,  h>p://www.perfsonar.net   36  

Page 37: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Current:  Test  Scheduling  •  Currently  two  components  are  in  charge  of  scheduling  and  execuOng  tests:  

–  BWCTL  –  perfSONAR  Regular  TesOng  

•  BWCTL  has  been  around  a  number  of  years  and  is  good  at  what  it  does…but  it’s  challenging  to  make  it  do  more  

•  Lots  of  requests  for:  –  Greater  visibility  into  scheduler  –  Make  it  easier  to  pipe  more  tools  through  it  so  aren’t  unexpected  conflicts    –  Support  for  different  ways  to  define  schedules  –  Be>er  ability  to  request  tests  on-­‐demand  –  Greater  ability  to  extend  in  general  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   37  

Page 38: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

3.5  Data  CollecOon  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   38  

Page 39: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

4.0  Data  CollecOon  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   39  

Page 40: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

New:  pScheduler  •  Completely  new  soRware  to  handle  all  the  stuff  BWCTL  and  regular  tesOng  could  

do…plus  more  •  REST  API  allows  tests  to  be  requested,  cancelled,  viewed,  etc  •  Plug-­‐in  framework  for  wriOng  new  tools  

–  Plug-­‐ins  for  all  the  exisOng  tools  included  at  launch  –  Plug-­‐in  can  be  wri>en  in  any  language  –  System  for  normalizing  output  between  similar  tools  

•  Plug-­‐in  framework  for  wriOng  to  different  archivers  •  Keeps  state  in  database  so  maintain  schedule  between  reboots,  outages,  etc  •  Working  on  designing  a  more  flexible  limits  and  resource  management  

infrastructure  

June  9,  2016  ©  2016,  h>p://www.perfsonar.net   40  

Page 41: The$perfSONARMeasurement …Hard$vs.$SoR$Failures$ • “Hard$failures”$are$the$kind$of$problems$every$organizaon$understands$ – Fiber$cut – Power$failure$takes$down$routers$

Useful  URLs  •  h>p://docs.perfsonar.net/  •  h>p://www.perfsonar.net/  •  h>p://fasterdata.es.net/  –  h>p://fasterdata.es.net/performance-­‐tesOng/network-­‐troubleshooOng-­‐tools/  

•  h>ps://github.com/perfsonar  –  h>ps://github.com/perfsonar/project/wiki  

41