War Games: Flight training for ops teams

DevOpsDays, Berlin 2015

David Mytton, Founder, Server Density

Cost of uptime?

$1.21bn (Q2 2015)

$4.1bn (Q1 2015)

$2.9bn (Q1 2015)

How much are you spending?

Expect downtime

• Prepare

• Respond

• Postmortem

Prepare

Incident process

1) Power failure to half of our servers

2) Automated failover unavailable (known failure condition)

3) Manual DNS switch required

Expected impact = 20 min

Actual impact = 43 min
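The gap between the two figures above can be put into numbers. A minimal sketch in Python, using only the two durations from the slide; the function name `impact_overrun` is illustrative and not part of any tool mentioned in the talk.

```python
def impact_overrun(expected_min, actual_min):
    """Return (extra minutes, ratio of actual to expected impact)."""
    if expected_min <= 0:
        raise ValueError("expected_min must be positive")
    return actual_min - expected_min, actual_min / expected_min


# Figures from the incident above: 20 min expected, 43 min actual.
extra, ratio = impact_overrun(20, 43)
print(f"Overrun: {extra} min ({ratio:.2f}x the expected impact)")
```

For this incident the overrun is 23 minutes, a little more than twice the expected impact.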

Human factors

1) Unfamiliarity with DNS failover procedure

2) Pressure of time sensitive event

3) Escalation resulted in delay

Docs

• Searchable

• Independent

Practice = War Games

• Realistic incident simulation

• Practicing general response process

General response process

• First responder

1. Load incident response checklist

blog.serverdensity.com

2. Log into Ops War Room

3. Log incident in JIRA

4. Begin investigation
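The first-responder steps above lend themselves to a checklist that is walked in order and timestamped for later review. The following is a sketch only: the step text comes from the slide, while `ChecklistRun`, its methods, and the decision to reject out-of-order steps are assumptions made for illustration, not a description of Server Density's tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The four first-responder steps, copied from the slide.
FIRST_RESPONDER_STEPS = [
    "Load incident response checklist",
    "Log into Ops War Room",
    "Log incident in JIRA",
    "Begin investigation",
]


@dataclass
class ChecklistRun:
    """Hypothetical helper: tracks which steps were completed and when."""
    steps: list
    completed: list = field(default_factory=list)  # (step, UTC timestamp) pairs

    def next_step(self):
        """Return the next outstanding step, or None when the checklist is done."""
        if len(self.completed) < len(self.steps):
            return self.steps[len(self.completed)]
        return None

    def complete(self, step):
        """Mark a step done; refuse steps taken out of order (an assumption)."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append((step, datetime.now(timezone.utc)))


run = ChecklistRun(FIRST_RESPONDER_STEPS)
while (step := run.next_step()) is not None:
    print(f"[x] {step}")
    run.complete(step)
```

Keeping the timestamps alongside each step gives the later review something objective to work from.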

Practice = War Games

• Practicing specific incident response

• Reveals deficiencies

Human response

• Increases confidence

• Reduces panic

• Better coordination

• Improves time to resolution

Simulation setup

• Replica environment

• Mock command line

• Recording actions

• Run several failure scenarios
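One way to picture a mock command line that records actions is a small wrapper that returns canned output for each command and logs what was typed and when. The slides do not say how the real setup was built, so everything here is an assumption: the `MockShell` class, the scenario commands, and their outputs are invented for illustration.

```python
import time


class MockShell:
    """Toy command line: returns canned output and records every command typed."""

    def __init__(self, canned_outputs):
        self.canned_outputs = canned_outputs  # command -> simulated output
        self.actions = []                     # (seconds since start, command)
        self._start = time.monotonic()

    def run(self, command):
        """Record the command, then return its scripted output (or a 'not found' message)."""
        self.actions.append((time.monotonic() - self._start, command))
        return self.canned_outputs.get(command, f"mock: {command}: command not found")


# Hypothetical failure scenario: commands and outputs are made up.
scenario = {
    "ping db1": "Request timeout for icmp_seq 0",
    "uptime": "up 41 days, load average: 0.12, 0.10, 0.09",
}

shell = MockShell(scenario)
print(shell.run("ping db1"))
print(shell.run("restart everything"))
for elapsed, command in shell.actions:
    print(f"{elapsed:6.2f}s  {command}")
```

Running several failure scenarios then amounts to swapping in a different `scenario` dictionary per exercise while the recording stays the same.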

Simulation goals

• How they would actually respond

• Run real commands

• Training your people

• Training your processes

• Training your tools

Review and repeat

• Objective review of the process

• Suggestions for improvements

• Do it again
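An objective review can start from the recorded timeline of an exercise: the longest pauses between consecutive actions show where time was lost. This is a rough sketch under stated assumptions; the `slowest_gaps` helper and the sample recording are hypothetical and do not come from the talk.

```python
def slowest_gaps(actions, top=3):
    """Given (seconds, label) pairs in time order, return the longest
    pauses between consecutive actions as (gap, before, after) tuples."""
    gaps = [
        (later_t - earlier_t, earlier, later)
        for (earlier_t, earlier), (later_t, later) in zip(actions, actions[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


# Hypothetical recording of one war game, in seconds from the first alert.
recorded = [
    (0, "alert received"),
    (120, "checklist loaded"),
    (180, "joined war room"),
    (900, "DNS switch started"),
    (1020, "service restored"),
]

for seconds, before, after in slowest_gaps(recorded, top=2):
    print(f"{seconds / 60:.0f} min between {before!r} and {after!r}")
```

Comparing the same report across repeated exercises is one way to check whether the suggested improvements actually shortened the slow steps.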

Thank you very much

[email protected]

@davidmytton