Page 1: Introduction LHCOPN dashboard (proposal functional design)

Introduction LHCOPN dashboard (proposal functional design)

Monitor Working Group:

• Initiated in Bologna 10th & 11th December 2009

• WLCG MB mandate (see url below)

• First meeting 22nd January 2010

• TC 26th May 2010

• TC 15th June 2010

• Barcelona 28th and 29th June 2010: first proposal

Chairman: John Shade (CERN)

Website: https://twiki.cern.ch/twiki/bin/view/LHCOPN/MonWG

Full version of functional design proposal on above url.

Presenter: Hanno Pet <[email protected]> (NL-T1 / SARA)

SARA Computing & Networking service, 25-6-2010

Page 2: Introduction LHCOPN dashboard (proposal functional design)

The problem

LHC experiments and WLCG users do not have enough insight into the functioning of the LHCOPN because:

• Monitoring is decentralized at T0/T1 sites

• Monitoring is not accessible to them

The dashboard should solve these problems!

Page 3: Introduction LHCOPN dashboard (proposal functional design)

Requirements (1/4)

The requirements of the dashboard are as follows:

• Must only provide information about the LHCOPN keeping in mind the way application layers are using the LHCOPN. This means a full mesh of measurements is required

• Must provide correct and up-to-date information about each site’s IPv4 connectivity in the LHCOPN

• Must be simple for the LHC experiments and the WLCG user community

• Must provide more in-depth information for the router operators at the T0/T1 sites. The router operators must be able to drill down into the dashboard to see which measurements are causing a degraded or down status

Page 4: Introduction LHCOPN dashboard (proposal functional design)

Requirements (2/4)

• Must display a full mesh of end-to-end IPv4 unicast connectivity in the LHCOPN between each T0/T1 site

• Must use the application programming interface (API) of the perfSONAR-MDM measurement points to collect the data which is necessary for the functioning of the dashboard

• Must collect and display One Way Delay data gathered by the perfSONAR-MDM measurement points (and other parameters in the future)

• Must store (historical) data in its own database
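As an illustration of what the full mesh implies, the sketch below enumerates every directed path between the twelve sites named in the example view later in this proposal. One-way delay is measured per direction, so A to B and B to A are separate paths. The function name `full_mesh` is hypothetical and is not part of perfSONAR-MDM.

```python
from itertools import permutations

# Site names as they appear in the example dashboard view of this proposal.
SITES = [
    "CA-TRIUMF", "CH-CERN", "DE-KIT", "ES-PIC", "FR-CC-IN2P3",
    "IT-INFN-CNAF", "NDGF", "NL-T1", "TW-ASGC", "UK-T1-RAL",
    "US-BNL", "US-FNAL-CMS",
]


def full_mesh(sites):
    """Return every ordered (source, destination) pair with source != destination."""
    return list(permutations(sites, 2))


paths = full_mesh(SITES)
print(len(paths))  # 12 sources * 11 destinations = 132 directed paths
```

With twelve sites the dashboard therefore has to track 132 directed paths, which is why a simple top-level view matters.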

Page 5: Introduction LHCOPN dashboard (proposal functional design)

Requirements (3/4)

• Must add new data from perfSONAR-MDM measurement points to its own database every <to be defined> minute(s)

• Must refresh the dashboard status every <to be defined> minute(s)

• Must provide an API for T0/T1 sites to generate alarms in their own NMS

• Must be able to generate end-to-end IPv4 unicast connectivity reports
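The proposal does not define the alarm API. As one possible shape, the sketch below builds a JSON document that a site NMS could poll, listing only the paths that are not in the Normal state. The function name `alarm_payload` and all field names (`timestamp`, `alarms`, `from`, `to`, `status`) are assumptions made for illustration.

```python
import json


def alarm_payload(path_status, timestamp):
    """Build a JSON document listing every path whose status is not "Normal".

    path_status maps (source, destination) tuples to one of the four
    dashboard statuses. All field names below are hypothetical.
    """
    alarms = [
        {"from": src, "to": dst, "status": status}
        for (src, dst), status in sorted(path_status.items())
        if status != "Normal"
    ]
    return json.dumps({"timestamp": timestamp, "alarms": alarms}, indent=2)


example = {
    ("NL-T1", "IT-INFN-CNAF"): "Down",
    ("US-BNL", "CA-TRIUMF"): "Degraded",
    ("CH-CERN", "DE-KIT"): "Normal",
}
payload = alarm_payload(example, "2010-06-17T12:30:00Z")
print(payload)
```

Returning only the non-Normal paths keeps the document small, and an empty `alarms` list tells the NMS that everything is fine.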

Page 6: Introduction LHCOPN dashboard (proposal functional design)

Requirements (4/4)

• Must be accessible via a web (https) interface for the LHC experiments and WLCG users with a grid certificate

• More detailed information will be available for the router operators at the T0/T1 sites with a grid certificate

• Must provide an explanation of the impact if end-to-end IPv4 unicast connectivity between two sites becomes degraded or down, or if no data is available

Page 7: Introduction LHCOPN dashboard (proposal functional design)

Current perfSONAR-MDM implementation in LHCOPN (1/2)

The GEANT application service desk has installed perfSONAR-MDM measurement points at each T0/T1 site with the following applications/tools:

• Weathermap based on End to End Monitoring (E2EMON) information

• E2EMON information (no E2EMON measurement point)

• perfSONAR User Interface (UI)

• Alarm Service (Prototype based on Nagios)

Page 8: Introduction LHCOPN dashboard (proposal functional design)

Current perfSONAR-MDM implementation in LHCOPN (2/2)

• Hades Performance Measurements

• Bandwidth Test Control / Achievable Bandwidth (BWCTL, automated 1 Gbit/s TCP Bandwidth Control Test)

• One Way Delay (OWD) measurements using OWAMP

• One Way Delay Variance / Jitter (OWDV) measurements using OWAMP

• Packet loss (measured between Hades nodes)

• Traceroute (number of hops between Hades nodes)

• Possibly duplicate packets (measured between Hades nodes)

• Possibly out-of-order packets (measured between Hades nodes)

Page 9: Introduction LHCOPN dashboard (proposal functional design)

Current perfSONAR-MDM setup and future dashboard

Page 10: Introduction LHCOPN dashboard (proposal functional design)

Dashboard approach

The first version of the dashboard must be based on:

• The “keep it simple” principle

• The data which perfSONAR-MDM is already collecting at the moment

The proposal is to use One Way Delay (OWD) measurements, made with the One Way Active Measurement Protocol (OWAMP), in the first version of the dashboard to “monitor” end-to-end IPv4 connectivity between each pair of sites in the LHCOPN (full mesh).

So OWAMP is “only” used to monitor connectivity and not yet used to monitor the delay itself.
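One way to turn OWD measurements into a connectivity figure is sketched below. It assumes the measurements are grouped into intervals and that an interval counts as available when at least one probe packet arrived. The delay values themselves are ignored, matching the idea of using OWAMP only for connectivity. The interval grouping, the tuple layout, and the function name `availability` are assumptions; the proposal does not specify them.

```python
def availability(intervals):
    """Return availability as a percentage, or None if there is no data.

    intervals is a list of (packets_sent, packets_received) tuples, one per
    measurement interval. An interval counts as available when at least one
    probe packet arrived; the measured delay values are ignored.
    """
    # Intervals in which nothing was sent carry no information.
    measured = [(sent, recv) for sent, recv in intervals if sent > 0]
    if not measured:
        return None
    up = sum(1 for _, recv in measured if recv > 0)
    return 100.0 * up / len(measured)


# Four intervals, one of which lost every probe packet.
print(availability([(10, 10), (10, 9), (10, 0), (10, 10)]))  # 75.0
```

Partial packet loss within an interval does not lower the figure in this sketch. Treating loss as a separate parameter fits the later dashboard versions described in this proposal.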

Later versions of the dashboard could include parameters that are new(er) to perfSONAR-MDM (i.e. packet loss, traceroute, achievable bandwidth, interface status, BGP status, OWD and OWDV)

Page 11: Introduction LHCOPN dashboard (proposal functional design)

How it might look (1/3) (current view)

End to End IPv4 unicast connectivity availability (current view)

From \ To | CA-TRIUMF | CH-CERN | DE-KIT | ES-PIC | FR-CC-IN2P3 | IT-INFN-CNAF | NDGF | NL-T1 | TW-ASGC | UK-T1-RAL | US-BNL | US-FNAL-CMS
From CA-TRIUMF | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
From CH-CERN | 100% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
From DE-KIT | 100% | 100% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0%
From ES-PIC | 100% | 100% | 100% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
From FR-CC-IN2P3 | 100% | 100% | 100% | 100% | - | 100% | 100% | 100% | 100% | 100% | 100% | 100%
From IT-INFN-CNAF | 100% | 100% | 100% | 100% | 100% | - | 100% | 100% | 75% | 100% | 100% | 100%
From NDGF | 100% | 100% | 100% | 100% | 100% | 100% | - | 100% | 100% | 100% | 100% | 100%
From NL-T1 | 100% | 100% | 100% | 100% | 100% | 0% | 100% | - | 100% | 100% | 100% | 100%
From TW-ASGC | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | - | 100% | 100% | 100%
From UK-T1-RAL | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | - | 100% | 100%
From US-BNL | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | - | 100%
From US-FNAL-CMS | 100% | 100% | 100% | 100% | 100% | No data | 100% | 100% | 100% | 100% | 100% | -

Date and time: 17-6-2010 12:30 UTC

What does "Normal" mean?

What does "Degraded" mean?

What does "Down" mean?

What does "No data" mean?

Page 12: Introduction LHCOPN dashboard (proposal functional design)

How it might look (2/3) (hourly view)

End to End IPv4 unicast connectivity availability, daily view 17-6-2010

Path | Hours 1 to 24 | Availability
From CA-TRIUMF to NL-T1 | (one status cell per hour) | 91%

Page 13: Introduction LHCOPN dashboard (proposal functional design)

How it might look (3/3) (weekly view)

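End to End IPv4 unicast connectivity availability, weekly view 17-6-2010

Path | Days 1 to 7 | Availability
From IT-INFN-CNAF to US-BNL | (one status cell per day) | 95%
From IT-INFN-CNAF to CH-CERN | (one status cell per day) | 100%
From IT-INFN-CNAF to US-FNAL-CMS | (one status cell per day) | 85%
From IT-INFN-CNAF to FR-CC-IN2P3 | (one status cell per day) | 100%
From IT-INFN-CNAF to DE-KIT | (one status cell per day) | 100%
From IT-INFN-CNAF to NDGF | (one status cell per day) | 100%
From IT-INFN-CNAF to NL-T1 | (one status cell per day) | 85%
From IT-INFN-CNAF to ES-PIC | (one status cell per day) | 100%
From IT-INFN-CNAF to UK-T1-RAL | (one status cell per day) | 100%
From IT-INFN-CNAF to TW-ASGC | (one status cell per day) | 100%
From IT-INFN-CNAF to CA-TRIUMF | (one status cell per day) | 96%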

Page 14: Introduction LHCOPN dashboard (proposal functional design)

Status on the dashboard

The status of the end-to-end IPv4 unicast connectivity between sites must be shown on the dashboard in the following way:

• Normal, availability of the end-to-end IPv4 unicast connectivity between sites A and B is 100% in the given timeframe

• Degraded, availability of the end-to-end IPv4 unicast connectivity between sites A and B is less than 100% but more than 0% in the given timeframe

• Down, availability of the end-to-end IPv4 unicast connectivity between sites A and B is 0% in the given timeframe

• No data, the dashboard server can connect to the perfSONAR-MDM measurement point on site but receives no data from the measurement archives.
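The four statuses can be read as a simple mapping from an availability percentage to a label. The sketch below is one possible reading, assuming that a missing value (`None`) stands for "No data" and that 0% is reported as Down rather than Degraded. The function name `classify` is hypothetical.

```python
def classify(availability_pct):
    """Map an availability percentage (or None) to a dashboard status."""
    if availability_pct is None:
        return "No data"
    if availability_pct >= 100:
        return "Normal"
    if availability_pct <= 0:
        return "Down"
    return "Degraded"


for value in (100, 75, 0, None):
    print(value, "->", classify(value))
```

Applied to the example current view, 75% and 50% would show as Degraded and the two 0% cells as Down.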

Page 15: Introduction LHCOPN dashboard (proposal functional design)

Notifications

Notification should be done via:

• E-mail

• RSS-feeds

• API for integration into T0/T1 site NMS systems for raising alarms

• Grid Notifications for LHC experiments

We need to discuss this with grid notification experts at the LHC experiments and ask them how they would integrate this in their dashboards.
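Whatever the channel, a notification is only useful when something changes. The sketch below compares the statuses from two consecutive dashboard refreshes and produces one message per changed path, which could then be sent by e-mail or published in an RSS feed. The function name `status_changes`, the message wording, and the choice to treat a previously unseen path as "No data" are assumptions.

```python
def status_changes(previous, current):
    """Return one message per path whose status differs between two refreshes.

    Both arguments map (source, destination) tuples to a status string.
    A path missing from the previous refresh is treated as "No data".
    """
    messages = []
    for path in sorted(current):
        old = previous.get(path, "No data")
        new = current[path]
        if old != new:
            src, dst = path
            messages.append(f"From {src} to {dst}: status changed from {old} to {new}")
    return messages


before = {("NL-T1", "IT-INFN-CNAF"): "Normal", ("CH-CERN", "DE-KIT"): "Normal"}
after = {("NL-T1", "IT-INFN-CNAF"): "Down", ("CH-CERN", "DE-KIT"): "Normal"}
for message in status_changes(before, after):
    print(message)
```

Reporting changes only, rather than the full state on every refresh, keeps the volume of notifications low.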

Page 16: Introduction LHCOPN dashboard (proposal functional design)

Questions

Interesting to know:

• Is this the right direction for the dashboard?

• Is perfSONAR-MDM able to support this?

• Is it possible to use OWAMP like this?

• Are T0/T1 sites going to use this?

• Are the LHC experiments going to use this?

• Are WLCG users (physicists) going to use this?

• Do we agree on the functional design?

Page 17: Introduction LHCOPN dashboard (proposal functional design)

WRAP UP

Read the full version of the functional design!

Please send your comments about this functional design to [email protected] before the 5th of July 2010!!

Thank you for your attention!
