Incident Management At The Edge Lisa Phillips VP, SRE @lisaphillips [email protected]

Incident command at the edge | Altitude NYC


Incident Types

• Distributed Denial of Service attacks
• Critical security vulnerabilities
• Software bugs
• Capacity concerns
• Upstream network outages
• Datacenter issues
• “Operator Error”
• Third Party service provider events

Incident Response Framework

1. Develop definitions of impact
2. Define severity
3. Define response and communication requirements
4. Define post-incident activities

Severity levels

Severity  App Delivery Impact  Business Operations Impact  Scope of Impact
SEV0      Critical             Critical                    All Sites Affected
SEV1      Critical             Critical                    Multiple Sites Affected, or Single Site unavailable or suffering from severe degradation
SEV2      Major                Major                       Multiple Sites Affected, Single Site intermittently available or suffering from minor degradation
SEV3      Minor                Minor                       Single Site or limited customer impact
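
The severity table can be captured as data so that tooling and people work from the same definitions. The following is a minimal Python sketch, not the speaker's implementation: the names `Impact`, `SeverityLevel`, `SEVERITY_LEVELS`, and `levels_for_impact` are hypothetical, and only the row contents come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Impact(Enum):
    """Impact ratings used in the severity table."""
    CRITICAL = "Critical"
    MAJOR = "Major"
    MINOR = "Minor"


@dataclass(frozen=True)
class SeverityLevel:
    """One row of the severity table (structure is an assumption)."""
    name: str
    app_delivery_impact: Impact
    business_operations_impact: Impact
    scope: str


# Row contents copied from the slide; nothing else is implied about process.
SEVERITY_LEVELS = [
    SeverityLevel("SEV0", Impact.CRITICAL, Impact.CRITICAL,
                  "All Sites Affected"),
    SeverityLevel("SEV1", Impact.CRITICAL, Impact.CRITICAL,
                  "Multiple Sites Affected, or Single Site unavailable "
                  "or suffering from severe degradation"),
    SeverityLevel("SEV2", Impact.MAJOR, Impact.MAJOR,
                  "Multiple Sites Affected, Single Site intermittently "
                  "available or suffering from minor degradation"),
    SeverityLevel("SEV3", Impact.MINOR, Impact.MINOR,
                  "Single Site or limited customer impact"),
]


def levels_for_impact(impact: Impact) -> List[str]:
    """Return the names of severity levels whose app delivery impact matches."""
    return [lvl.name for lvl in SEVERITY_LEVELS
            if lvl.app_delivery_impact is impact]
```

Under these assumptions, `levels_for_impact(Impact.CRITICAL)` returns both SEV0 and SEV1: the impact rating alone does not separate them, so the scope column has to decide between the two.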

The people

• Engineers are human
• Engage the right people at the right time
• Randomization is expensive

The Situation: No NOC

• Immediate escalation to engineers is required
• Global Customer Service Focused Engineer
• Decentralized on-call and monitoring
• Engineering teams own their own destiny and have control over their alerts
• Empowered to improve

Incident Commander

• Coordinates actions across multiple responders
• Alerts and updates stakeholders
• Evaluates the high-level issue and understands its impact
• Consults with team experts on necessary actions
• Calls off or delays other activities
• Clearly indicates when an incident is resolved and leads follow-up
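
As an illustration of the bookkeeping behind these responsibilities, here is a minimal Python sketch of an incident record that an Incident Commander might keep. It is an assumption made for illustration, not a tool described in the talk: the `Incident` class, its fields, and the message format are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Incident:
    """Hypothetical incident record maintained by the Incident Commander."""
    title: str
    severity: str            # e.g. "SEV2", as in the severity table
    commander: str
    resolved: bool = False
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)

    def _log(self, message: str) -> None:
        # Record every update with a UTC timestamp for the post-incident review.
        self.timeline.append((datetime.now(timezone.utc), message))

    def update_stakeholders(self, message: str) -> str:
        """Format a stakeholder update, log it, and return the text to send."""
        update = f"[{self.severity}] {self.title}: {message}"
        self._log(update)
        return update

    def resolve(self, summary: str) -> str:
        """Mark the incident resolved and produce an explicit resolution notice."""
        self.resolved = True
        update = f"[{self.severity}] {self.title}: RESOLVED - {summary}"
        self._log(update)
        return update
```

A commander could call `update_stakeholders("investigating")` at each regroup and `resolve("rolled back the change")` at the end. Keeping the timeline in one place makes the resolution explicit and gives the follow-up a record to start from.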

Communication: Wide Collaboration

• Involve all parts of the organization
• Quick communication and collaboration - establish next steps and regroup time
• Quickly identify the questions to answer, and communicate effectively to address them
• When in doubt, choose transparency

Lessons

• Start with the basics
• Empower your engineers to deal effectively with workload
• Exercise your engineers
• Continuously Improve
• Partner closely with all stakeholders
• Continue to let incidents teach you

Thank you
Lisa Phillips VP, SRE @lisaphillips [email protected]