GlobalNOC Services Update 2015 Internet2 Global Summit

GlobalNOC Services Update 2015 Internet2 Global Summitmeetings.internet2.edu/media/medialibrary/...GlobalNOCServicesUpd… · Service Desk Activity Metrics for 2014 • 1.9 million

GlobalNOC Services Update

2015 Internet2 Global Summit

Annual Report

๏ http://globalnoc.iu.edu/annual-report/2014/

4/28/15

Service Desk

Year in Review:

๏ Welcomed ARE-ON and OSHEAN to the GlobalNOC Family

๏ Consolidated All I2 FootPrints Projects Into 1, Cutting Notifications to 1/5 of the Former Volume

๏ Grew by 4 Staff and 1 Robot

April 28, 2015

Service Desk

Year in Review:

๏ Conducted DR Exercise in Early December 2014 with Positive Result

๏ Created and Implemented a Major Incident Communication Policy

Service Desk

Activity Metrics for 2014
• 1.9 million alarms/year ~ 5200/day
• 30,000 tickets created/year ~ 82/day
• 15,600 phone calls received/year ~ 43/day
• 264,000 e-mails sent and received ~ 720/day
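The per-day figures above follow from dividing each annual total by the number of days in a year. A minimal sketch of that arithmetic, assuming a 365-day year (the dictionary keys and the `per_day` helper are illustrative names, not part of any GlobalNOC tool):

```python
# Annual totals as quoted on the slide.
DAYS_PER_YEAR = 365  # assumption: a non-leap year

annual_totals = {
    "alarms": 1_900_000,
    "tickets created": 30_000,
    "phone calls received": 15_600,
    "e-mails sent and received": 264_000,
}


def per_day(annual_total, days=DAYS_PER_YEAR):
    """Convert an annual count into an average daily rate."""
    return annual_total / days


daily_rates = {name: per_day(total) for name, total in annual_totals.items()}

for name, rate in daily_rates.items():
    print(f"{name}: about {rate:.0f}/day")
```

The slide rounds the results loosely, so the computed values land close to, not exactly on, the quoted daily figures.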


Service Desk

Year Ahead:

๏ Pursuing ISO/IEC 20000 certification
  • Why?
  • By When?
  • What Will the Net Effect Be?

2015 Priorities

2015 Focus Areas

Automation

Goal

๏ Find the worst things to do by hand. Make a machine do those things.
๏ Things that are:
  • Dangerous
  • Slow
  • Annoying

Focus Areas

๏ Business Processes
  • on-call button
  • auto-assign issues
  • auto-notify
  • auto-discover devices in a new network

๏ Reporting
  • How many times did we call an engineer?

๏ Config automation
  • alerting on config drift
  • generate template config for new boxes
  • push & pipeline

๏ Incident Advisor
  • auto-fix
  • hints
  • Annoying
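One way to picture "alerting on config drift" is a plain text diff between the intended configuration and what a device is actually running. The sketch below uses Python's standard `difflib`; the sample configuration text and the `config_drift` helper are hypothetical and do not represent GlobalNOC's tooling.

```python
import difflib


def config_drift(intended: str, running: str) -> list[str]:
    """Return unified-diff lines; an empty list means no drift."""
    return list(
        difflib.unified_diff(
            intended.splitlines(),
            running.splitlines(),
            fromfile="intended",
            tofile="running",
            lineterm="",
        )
    )


# Hypothetical configs: the running MTU no longer matches the intended one.
intended = "hostname rtr1\ninterface xe-0/0/0\n  mtu 9000\n"
running = "hostname rtr1\ninterface xe-0/0/0\n  mtu 1500\n"

drift = config_drift(intended, running)
if drift:
    print("ALERT: config drift detected")
    print("\n".join(drift))
```

A line-based diff is the simplest choice here; a real deployment would probably need to ignore timestamps and other lines that change on every save.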

Service Management

Goal

๏ MINIMIZE
  • unplanned work
  • confusion
  • inconsistency

๏ Stay flexible, agile, and custom

Huh?

๏ STANDARDIZE: for processes where consistency is most important
๏ ORGANIZE: a simple lightweight structure where custom and novel work happens

2 Parts

๏ Part 1: ISO/IEC 20000 Certification
  • Sparked by Internet2 effort, working to reach certification
  • Aligned with ITIL
    • Incident Management
    • Change Management
    • Capacity Management
    • Availability Management
    • etc…

2 Parts

๏ Part 2: Other service-level improvements
  • Service Dashboard (end users, network owners)
  • Prioritize improvements
  • Faster Turn-up
  • Change Management

So what…

๏ It’s not good enough anymore to talk about boxes and circuits. Everything is more complicated now.

๏ We don’t deliver networks, we deliver services
๏ Requires rigor to make sure those services work, and agility to make sure those services evolve quickly

Example

๏ What’s the availability of everyone’s IP Service for Internet2?
๏ complexities:
  • multiple sessions
  • connectors back each other up

๏ Let’s define available!
๏ First, a service is down if packets have to be retransmitted
๏ So:
  • Up = ALL BGP sessions are established, no loss known
  • At Risk = At least 1 session is down, but at least one route is still in the routing table
  • Down = no routes
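The three states defined above can be written as a small classification function. This is only a sketch of the slide's definitions: the function name, its parameters, and the handling of cases the slide does not cover (no sessions configured, or known loss while all sessions are up) are assumptions made for illustration.

```python
from enum import Enum


class ServiceState(Enum):
    UP = "Up"
    AT_RISK = "At Risk"
    DOWN = "Down"


def classify(sessions_established, routes_in_table, loss_known=False):
    """Classify a service from its BGP session states and route count.

    sessions_established: one boolean per BGP session of the service.
    routes_in_table: number of the service's routes still in the routing table.
    """
    # Down = no routes. Treating "no sessions configured" as Down is an
    # assumption; the slide does not cover that case.
    if routes_in_table == 0 or not sessions_established:
        return ServiceState.DOWN
    # Up = ALL BGP sessions are established, no loss known.
    if all(sessions_established) and not loss_known:
        return ServiceState.UP
    # At Risk = at least one session down, but routes remain. Known loss with
    # all sessions up is also mapped here, which is an assumption.
    return ServiceState.AT_RISK


print(classify([True, True], routes_in_table=12).value)
print(classify([True, False], routes_in_table=12).value)
print(classify([False, False], routes_in_table=0).value)
```

Keeping the rule in one function makes the definition of "available" explicit and easy to revisit if network owners agree on a different one.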

Data Model

[Diagram. Labeled elements: Entity; Routed R&E Service; BGP Peering (shown twice); ASN; Peer IP; Reporting Engine; BGP Routing Data; Weekly Report; Routes; Peer State; SLA]
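To make the link between peer state, the weekly report, and the SLA concrete, here is a hedged sketch of how a weekly availability figure could be computed from observed state intervals. The `StateInterval` type, the choice to count "At Risk" time as available, and the 99.9% target are all assumptions for illustration; the slides do not give the actual report logic or SLA values.

```python
from dataclasses import dataclass

WEEK_SECONDS = 7 * 24 * 3600


@dataclass
class StateInterval:
    state: str      # "Up", "At Risk", or "Down"
    seconds: float  # how long the service stayed in that state


def availability_percent(intervals):
    """Share of observed time in which the service was not Down."""
    total = sum(i.seconds for i in intervals)
    if total == 0:
        raise ValueError("no observed time")
    down = sum(i.seconds for i in intervals if i.state == "Down")
    return 100.0 * (total - down) / total


# Hypothetical week: one hour at risk, ten minutes down.
week = [
    StateInterval("Up", WEEK_SECONDS - 3600 - 600),
    StateInterval("At Risk", 3600),
    StateInterval("Down", 600),
]

SLA_TARGET = 99.9  # hypothetical target, not taken from the slides
pct = availability_percent(week)
sla_met = pct >= SLA_TARGET
print(f"availability {pct:.3f}% (SLA met: {sla_met})")
```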

Service Awareness

Corresponding process

[Flowchart. Steps and decisions: report generated; SLA met?; send to NPT; outage in GRNOC control?; recommend changes; Recommended Changes; Approve Changes?; Published Report; Published Report with Outline of Changes. Decision branches are labeled yes/no. Lanes: Sys; NTP; Dir of Op; Network Owner]

Work Management

Goal

๏ Get a coherent system to manage our work
  • systems
  • tools
  • disciplines
  • processes

๏ In other words, track, prioritize, and measure everything we do.

This means

๏ For the people who do work:
  • “Where do I go to see everything I’m supposed to be doing? What should I be doing first?”
๏ For the managers:
  • “Are we too busy? Are we working on the right things?”
๏ For the strategic view:
  • “Are we doing well/better than a year ago?”

How does work get tracked

๏ Tickets
๏ Emails
๏ Post-its
๏ Workflow records
๏ Meeting docs
๏ Many todo lists

The future

๏ Review ticketing
๏ Look at structured processes
๏ Project management
๏ Unified view of workload and results

Recruiting

Goal

๏Make sure we have enough talented people…now and 5 years from now

Parts

๏ Attract & hire
๏ Pipeline
  • Get more students in
  • Improve Development

Attracting

๏ How do we attract experts that fit?
๏ Challenges
  • Scary job descriptions
  • People don’t know what R&E or GlobalNOC does
  • Indiana - No really, it’s a nice place!

Pipeline

๏ Getting people into the pipeline
  • Students have worked very well
  • Summer of Networking
  • How do we get more?

๏ Keeping the talent growing
  • Develop people well
  • Level up!

What’s New With GlobalNOC Software?

SNAPP

๏ High performance SNMP measurement/visualization tool
๏ 3 major revisions, project began in 2002
๏ RRDtool based storage
๏ High performance SNMP data collector
๏ Web-based data browser and Web-services API

SNAPP 4 with TSDS

๏ Moving from RRDtool to a non-relational database
  • “TSDS” Database based on MongoDB
  • Sophisticated query language: TSQL
  • Rich meta-data integrated with data. Allows for powerful queries; long-term longitudinal analysis
๏ General Time Series Data Store, not just SNMP data
  • Ex. NOC activity metrics / key performance indicators; optical characteristics (light levels, loss, etc.); environmental/power data; aggregate flow data; OWAMP; BWCTL
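The idea of metadata stored alongside each measurement can be illustrated with plain Python dictionaries. This sketch is not TSDS and does not use TSQL or MongoDB; the document layout, field names, and the `average_by` helper are assumptions chosen only to show why integrated metadata makes grouping and filtering straightforward.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurement documents: metadata travels with every data point.
points = [
    {"meta": {"node": "rtr1", "intf": "xe-0/0/0", "network": "a"},
     "time": 0, "values": {"input_bps": 100.0}},
    {"meta": {"node": "rtr1", "intf": "xe-0/0/0", "network": "a"},
     "time": 60, "values": {"input_bps": 200.0}},
    {"meta": {"node": "rtr2", "intf": "xe-1/0/0", "network": "b"},
     "time": 0, "values": {"input_bps": 400.0}},
]


def average_by(points, meta_key, value_key, **where):
    """Average value_key grouped by meta_key, keeping points whose metadata matches where."""
    groups = defaultdict(list)
    for p in points:
        if all(p["meta"].get(k) == v for k, v in where.items()):
            groups[p["meta"][meta_key]].append(p["values"][value_key])
    return {key: mean(vals) for key, vals in groups.items()}


print(average_by(points, "node", "input_bps"))
print(average_by(points, "node", "input_bps", network="a"))
```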

Alertmon Improvements

๏ Alert Collapsing
  • Collapse services on a host when host is not reachable
  • Root cause analysis based on dependency graph allows for intelligent collapsing of alerts and suggests root cause of multiple alerts.
  • Monitoring of management VPN endpoints to collapse alerts behind VPN when management network access is impaired
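Dependency-based collapsing can be sketched as a graph walk: an alert is collapsed when something it depends on, directly or indirectly, is also alerting, and the remaining alerts are the suggested root causes. The function names, the item names, and the dictionary-based graph below are hypothetical; this is not Alertmon's implementation, and it assumes the dependency graph has no cycles.

```python
def has_alerting_ancestor(item, alerting, depends_on):
    """True if anything item depends on (transitively) is itself alerting."""
    stack = list(depends_on.get(item, []))
    seen = set()
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.add(parent)
        if parent in alerting:
            return True
        stack.extend(depends_on.get(parent, []))
    return False


def collapse_alerts(alerting, depends_on):
    """Split alerting items into (suggested root causes, collapsed alerts)."""
    alerting = set(alerting)
    roots = sorted(
        a for a in alerting if not has_alerting_ancestor(a, alerting, depends_on)
    )
    collapsed = sorted(alerting - set(roots))
    return roots, collapsed


# Hypothetical dependencies: services depend on hosts, hosts on a VPN endpoint.
depends_on = {
    "rtr1:bgp": ["rtr1"],
    "rtr1:snmp": ["rtr1"],
    "rtr1": ["vpn-endpoint"],
    "rtr2": ["vpn-endpoint"],
}

roots, collapsed = collapse_alerts(
    ["vpn-endpoint", "rtr1", "rtr1:bgp", "rtr2"], depends_on
)
print("root cause:", roots)
print("collapsed:", collapsed)
```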

NOAA Operations Portal

๏ High-level overview of network status
  • Operational Status Map
  • Performance Measurement Overview
  • Operations Calendars
  • Detailed data pulled from other GlobalNOC tools

๏ Multi-network aggregate views

SciPass Science DMZ

๏ Campus Networks are enterprise infrastructure
  • large number of small flows
  • security is a required capability
๏ not elephant flow friendly
๏ could just bypass but that doesn’t provide required security
๏ what about performance assurance?

Approach

๏ Combine
  • OpenFlow Switch
  • Bro
  • perfSONAR

๏ create reactive system
๏ default to secure / slow path
๏ use IDS to control what goes on fast path
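The approach above can be summarized as a small state machine: every flow starts on the secure, slow path, and only a verdict from the IDS moves it to the fast path. The class below is a toy model of that decision logic under stated assumptions; the class name, method names, and flow tuples are hypothetical, and it makes no OpenFlow, Bro, or SciPass API calls.

```python
class ReactiveBypass:
    """Toy model: flows default to the slow path until the IDS trusts them."""

    def __init__(self):
        self.fast_path = set()  # flows currently allowed to bypass the firewall

    def path_for(self, flow):
        # Default to the secure / slow path for anything not explicitly trusted.
        return "fast" if flow in self.fast_path else "slow"

    def on_ids_verdict(self, flow, trusted):
        # The IDS decides what goes on the fast path and can revoke it later.
        if trusted:
            self.fast_path.add(flow)
        else:
            self.fast_path.discard(flow)


# Hypothetical flow key: (source IP, destination IP, protocol, destination port).
flow = ("192.0.2.10", "198.51.100.20", "tcp", 2811)

bypass = ReactiveBypass()
print(bypass.path_for(flow))      # before any verdict
bypass.on_ids_verdict(flow, trusted=True)
print(bypass.path_for(flow))      # after the IDS trusts the flow
bypass.on_ids_verdict(flow, trusted=False)
print(bypass.path_for(flow))      # after the IDS revokes trust
```

In a real deployment the verdict handler would install or remove switch rules; here it only updates a set so the control logic stays visible.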

Reactive Bypass Performance

• 64 ms - time to detect and bypass
• 250 ms - doubled throughput of firewall
• 1.5 sec - same throughput as no firewall

Find Out More

๏ Software Page
  • https://globalnoc.iu.edu/sdn/scipass.html

๏ Code Repository
  • https://github.com/GlobalNOC/SciPass

๏ email
  • [email protected]
  • [email protected]

FlowSpace Firewall

๏ Developed in partnership with Internet2
๏ Open Source Software
๏ OpenFlow Hypervisor
  • “Slice” OpenFlow 1.0 based on VLAN ID
๏ Currently running on Internet2 AL2S
๏ Other deployments growing. We’re interested in helping get FlowSpace Firewall running on your OpenFlow network
๏ More Information/Download: http://globalnoc.iu.edu/sdn/fsfw.html/
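Slicing by VLAN ID means each slice owns a disjoint set of VLANs and may only act on traffic tagged with those IDs. The sketch below models that ownership check in plain Python; the `VlanSlicer` class, slice names, and VLAN numbers are hypothetical and are not taken from FlowSpace Firewall's code or configuration format.

```python
class VlanSlicer:
    """Toy model of VLAN-based slicing: each VLAN ID belongs to at most one slice."""

    def __init__(self):
        self._owner = {}  # VLAN ID -> slice name

    def add_slice(self, name, vlan_ids):
        vlan_ids = set(vlan_ids)
        for vlan in vlan_ids:
            if not 1 <= vlan <= 4094:  # usable 802.1Q VLAN ID range
                raise ValueError(f"invalid VLAN ID {vlan}")
        clash = sorted(v for v in vlan_ids if v in self._owner)
        if clash:
            raise ValueError(f"VLANs already assigned to another slice: {clash}")
        for vlan in vlan_ids:
            self._owner[vlan] = name

    def allowed(self, name, vlan_id):
        """True if the named slice may install rules for this VLAN ID."""
        return self._owner.get(vlan_id) == name


slicer = VlanSlicer()
slicer.add_slice("slice-a", range(100, 200))
slicer.add_slice("slice-b", [200, 201])

print(slicer.allowed("slice-a", 150))
print(slicer.allowed("slice-b", 150))
```

Rejecting overlapping VLAN sets at configuration time keeps the per-rule check a single dictionary lookup.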