http://gapingvoid.com / Sunday, June 20, 2010

The Upside of Downtime (Velocity 2010)

Page 1: The Upside of Downtime (Velocity 2010)

http://gapingvoid.com/

Sunday, June 20, 2010

Page 2: The Upside of Downtime (Velocity 2010)

The Upside of Downtime: Turning disaster into opportunity

Sunday, June 20, 2010

Page 3: The Upside of Downtime (Velocity 2010)

Who’s had a site go down?

Sunday, June 20, 2010

Page 4: The Upside of Downtime (Velocity 2010)

Who hasn’t had a site go down?

Sunday, June 20, 2010

Page 5: The Upside of Downtime (Velocity 2010)

There’s always that one guy!

Sunday, June 20, 2010

Page 6: The Upside of Downtime (Velocity 2010)

Page 7: The Upside of Downtime (Velocity 2010)

Page 8: The Upside of Downtime (Velocity 2010)

Page 9: The Upside of Downtime (Velocity 2010)

Page 10: The Upside of Downtime (Velocity 2010)

Page 11: The Upside of Downtime (Velocity 2010)

Page 12: The Upside of Downtime (Velocity 2010)

Page 13: The Upside of Downtime (Velocity 2010)

Page 14: The Upside of Downtime (Velocity 2010)

Page 15: The Upside of Downtime (Velocity 2010)

Downtime sucks

Source: http://www.motivatedphotos.com/?id=8080

Sunday, June 20, 2010

Page 16: The Upside of Downtime (Velocity 2010)

Why downtime sucks

Business

[Chart: Sales; y-axis $0 to $3,000, x-axis 0 to 22]

Sunday, June 20, 2010

Page 17: The Upside of Downtime (Velocity 2010)

Why downtime sucks

Business

Brand

Sunday, June 20, 2010

Page 18: The Upside of Downtime (Velocity 2010)

Why downtime sucks

Business

Brand

You

Sunday, June 20, 2010

Page 19: The Upside of Downtime (Velocity 2010)

Why downtime sucks

Business

Brand

You

Users

Sunday, June 20, 2010

Page 20: The Upside of Downtime (Velocity 2010)

Downtime = Bad! (Duh)

Sunday, June 20, 2010

Page 21: The Upside of Downtime (Velocity 2010)

Approach #1: Don’t fail

Sunday, June 20, 2010

Page 22: The Upside of Downtime (Velocity 2010)

Source: http://kansansforlife.files.wordpress.com/2009/12/titanic.jpg

Sunday, June 20, 2010

Page 23: The Upside of Downtime (Velocity 2010)

“Everything fails all the time” -- Werner Vogels (Amazon, CTO)

Sunday, June 20, 2010

Page 24: The Upside of Downtime (Velocity 2010)

“Everything fails all the time” -- Werner Vogels (Amazon, CTO)

Sunday, June 20, 2010

Page 25: The Upside of Downtime (Velocity 2010)

Your site will fail

Werner Vogels (Amazon, CTO)

Sunday, June 20, 2010

Page 26: The Upside of Downtime (Velocity 2010)

Why?!?

Sunday, June 20, 2010

Page 27: The Upside of Downtime (Velocity 2010)

Risk Homeostasis

Why Failure Happens

Source: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg

Sunday, June 20, 2010

Page 28: The Upside of Downtime (Velocity 2010)

Risk Homeostasis

Black Swan

Why Failure Happens

Source: Amazon.com

Sunday, June 20, 2010

Page 29: The Upside of Downtime (Velocity 2010)

Risk Homeostasis

Black Swan

Unknown unknowns

Why Failure Happens

Source: http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg

Sunday, June 20, 2010

Page 30: The Upside of Downtime (Velocity 2010)

Risk Homeostasis

Black Swan

Unknown unknowns

Change

Why Failure Happens

Source: http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg

Sunday, June 20, 2010

Page 31: The Upside of Downtime (Velocity 2010)

Risk Homeostasis

Black Swan

Unknown unknowns

Change

Many small failures

Why Failure Happens

Source: http://www.biojobblog.com/uploads/image/dominos.jpg

Sunday, June 20, 2010

Page 32: The Upside of Downtime (Velocity 2010)

Risk Homeostasis

Black Swan

Unknown unknowns

Change

Many small failures

Humans

Why Failure Happens

Source: http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg

Sunday, June 20, 2010

Page 33: The Upside of Downtime (Velocity 2010)

Sunday, June 20, 2010

Page 34: The Upside of Downtime (Velocity 2010)

Sunday, June 20, 2010

Page 35: The Upside of Downtime (Velocity 2010)

Not unusual

Polisher blocked

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 36: The Upside of Downtime (Velocity 2010)

Not unusual Not expected

Polisher blocked

Moisture leaks into air system

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 37: The Upside of Downtime (Velocity 2010)

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected Not good

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 38: The Upside of Downtime (Velocity 2010)

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected

Backup disabled

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 39: The Upside of Downtime (Velocity 2010)

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected

Backup disabled

Indicator blocked

Doh!

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 40: The Upside of Downtime (Velocity 2010)

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected

Backup disabled

Indicator blocked

Relief valve broken

Doh!

Dammit

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 41: The Upside of Downtime (Velocity 2010)

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Not expected

Backup disabled

Indicator blocked

Relief valve broken

Gauge broken

Doh!

Dammit

WTF

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 42: The Upside of Downtime (Velocity 2010)

Not unusual

Polisher blocked

Moisture leaks into air system

Flow of cold water stopped

Meltdown

Not expected

Backup disabled

Indicator blocked

Relief valve broken

Gauge broken

Doh!

Dammit

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm

Sunday, June 20, 2010

Page 43: The Upside of Downtime (Velocity 2010)

Sunday, June 20, 2010

Page 45: The Upside of Downtime (Velocity 2010)

“accidental power failure”

Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/

Sunday, June 20, 2010

Page 46: The Upside of Downtime (Velocity 2010)

“traffic accident damaged a nearby utility transformer”

Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/

Sunday, June 20, 2010

Page 47: The Upside of Downtime (Velocity 2010)

“unfortunate code change”

Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/

Sunday, June 20, 2010

Page 48: The Upside of Downtime (Velocity 2010)

Sunday, June 20, 2010

Page 49: The Upside of Downtime (Velocity 2010)

“Unhappy customers may get some attention, but unhappy networked customers can quickly impact your business”

-- Clay Shirky

Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/

Sunday, June 20, 2010

Page 50: The Upside of Downtime (Velocity 2010)

Page 51: The Upside of Downtime (Velocity 2010)

Page 52: The Upside of Downtime (Velocity 2010)

Page 53: The Upside of Downtime (Velocity 2010)

Page 54: The Upside of Downtime (Velocity 2010)

Page 55: The Upside of Downtime (Velocity 2010)

Page 56: The Upside of Downtime (Velocity 2010)

http://labs.webmetrics.com/crowdsourceduptime

Page 57: The Upside of Downtime (Velocity 2010)

Page 58: The Upside of Downtime (Velocity 2010)

Page 59: The Upside of Downtime (Velocity 2010)

Page 60: The Upside of Downtime (Velocity 2010)

Page 61: The Upside of Downtime (Velocity 2010)

Recap

Sunday, June 20, 2010

Page 62: The Upside of Downtime (Velocity 2010)

Your site will fail

Sunday, June 20, 2010

Page 63: The Upside of Downtime (Velocity 2010)

Your site will fail + Downtime is bad

Sunday, June 20, 2010

Page 64: The Upside of Downtime (Velocity 2010)

Your site will fail + Downtime is bad + Everyone will find out

Sunday, June 20, 2010

Page 65: The Upside of Downtime (Velocity 2010)

Your site will fail + Downtime is bad + Everyone will find out = Screw it, I’ll become a lumberjack

Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg

Sunday, June 20, 2010

Page 66: The Upside of Downtime (Velocity 2010)

“Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.”

-- John Allspaw, VP Tech. Ops at Etsy

Sunday, June 20, 2010

Page 67: The Upside of Downtime (Velocity 2010)

Approach #2: Prepare for downtime

Sunday, June 20, 2010

Page 68: The Upside of Downtime (Velocity 2010)

Disclaimer: Try hard to avoid downtime

Sunday, June 20, 2010

Page 69: The Upside of Downtime (Velocity 2010)

Learning by example...

Sunday, June 20, 2010

Page 70: The Upside of Downtime (Velocity 2010)

Case Study #1: Facebook

Sunday, June 20, 2010

Page 71: The Upside of Downtime (Velocity 2010)

Page 72: The Upside of Downtime (Velocity 2010)

Page 73: The Upside of Downtime (Velocity 2010)

Page 74: The Upside of Downtime (Velocity 2010)

Page 75: The Upside of Downtime (Velocity 2010)

Page 76: The Upside of Downtime (Velocity 2010)

Page 77: The Upside of Downtime (Velocity 2010)

“The larger issue here isn't just that a portion of Facebook's platform has gone down - numerous web services have issues from time to time, including everything from Gmail to Twitter. An outage of this length, however, with no official communication from the company itself is disturbing.”

-- N.Y. Times

Sunday, June 20, 2010

Page 78: The Upside of Downtime (Velocity 2010)

Downtime Disturbing

Facebook

Sunday, June 20, 2010

Page 79: The Upside of Downtime (Velocity 2010)

Sunday, June 20, 2010

Page 80: The Upside of Downtime (Velocity 2010)

Case Study #2: Google App Engine

Sunday, June 20, 2010

Page 81: The Upside of Downtime (Velocity 2010)

Page 82: The Upside of Downtime (Velocity 2010)

Page 83: The Upside of Downtime (Velocity 2010)

Page 84: The Upside of Downtime (Velocity 2010)

Page 85: The Upside of Downtime (Velocity 2010)

Page 86: The Upside of Downtime (Velocity 2010)

Page 87: The Upside of Downtime (Velocity 2010)

Page 88: The Upside of Downtime (Velocity 2010)

Page 89: The Upside of Downtime (Velocity 2010)

Page 90: The Upside of Downtime (Velocity 2010)

Page 91: The Upside of Downtime (Velocity 2010)

Page 92: The Upside of Downtime (Velocity 2010)

Page 93: The Upside of Downtime (Velocity 2010)

Page 94: The Upside of Downtime (Velocity 2010)

Page 95: The Upside of Downtime (Velocity 2010)

Downtime Kudos

Google App Engine

Sunday, June 20, 2010

Page 96: The Upside of Downtime (Velocity 2010)

Case Study #3: Atlassian

Sunday, June 20, 2010

Page 97: The Upside of Downtime (Velocity 2010)

Page 98: The Upside of Downtime (Velocity 2010)

Page 99: The Upside of Downtime (Velocity 2010)

Page 100: The Upside of Downtime (Velocity 2010)

Page 101: The Upside of Downtime (Velocity 2010)

Page 102: The Upside of Downtime (Velocity 2010)

Page 103: The Upside of Downtime (Velocity 2010)

Page 104: The Upside of Downtime (Velocity 2010)

Page 105: The Upside of Downtime (Velocity 2010)

Page 106: The Upside of Downtime (Velocity 2010)

Page 107: The Upside of Downtime (Velocity 2010)

Page 108: The Upside of Downtime (Velocity 2010)

Downtime

Atlassian

Bravo

Sunday, June 20, 2010

Page 109: The Upside of Downtime (Velocity 2010)

http://atlassian.com/

Sunday, June 20, 2010

Page 110: The Upside of Downtime (Velocity 2010)

Downtime: Opportunity to Build Trust

Sunday, June 20, 2010

Page 111: The Upside of Downtime (Velocity 2010)

Downtime: Opportunity to Destroy Trust

Sunday, June 20, 2010

Page 112: The Upside of Downtime (Velocity 2010)

How To: Prepare for Downtime

Sunday, June 20, 2010

Page 113: The Upside of Downtime (Velocity 2010)

Something > Nothing

Sunday, June 20, 2010

Page 114: The Upside of Downtime (Velocity 2010)

Upside of Downtime Framework 1.0

Life is good | Oh crap | That sucked

Time

Sunday, June 20, 2010

Page 115: The Upside of Downtime (Velocity 2010)

Upside of Downtime Framework 1.0

Prepare | Communicate | Explain

Time

Sunday, June 20, 2010

Page 116: The Upside of Downtime (Velocity 2010)

Upside of Downtime Framework 1.0

Prepare | Communicate | Explain

Time

Sunday, June 20, 2010

Page 117: The Upside of Downtime (Velocity 2010)

Upside of Downtime Framework 1.0

Prepare | Communicate | Explain

Time

Sunday, June 20, 2010

Page 118: The Upside of Downtime (Velocity 2010)

Upside of Downtime Framework 1.0

Prepare | Communicate | Explain

Time

Sunday, June 20, 2010

Page 119: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

Sunday, June 20, 2010

Page 120: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel

Sunday, June 20, 2010

Page 121: The Upside of Downtime (Velocity 2010)

1. Communication channel

Something is wrong

Can’t tell if it’s me or you

I’ll assume it’s you

You suck

Prepare | Communicate | Explain

Sunday, June 20, 2010

Page 122: The Upside of Downtime (Velocity 2010)

Something is wrong

Can’t tell if it’s me or you

I’ll assume it’s you

I know it’s you

Tell me when you’re back

You suck a lot less

Prepare | Communicate | Explain

1. Communication channel

Sunday, June 20, 2010

Page 123: The Upside of Downtime (Velocity 2010)

Page 124: The Upside of Downtime (Velocity 2010)

Page 125: The Upside of Downtime (Velocity 2010)

Page 126: The Upside of Downtime (Velocity 2010)

Page 127: The Upside of Downtime (Velocity 2010)

Page 128: The Upside of Downtime (Velocity 2010)

Page 129: The Upside of Downtime (Velocity 2010)

Page 130: The Upside of Downtime (Velocity 2010)

Page 131: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel
   - Easy to find

Sunday, June 20, 2010

Page 132: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel
   - Easy to find
   - Hosted off-site

Sunday, June 20, 2010

Page 133: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel
   - Easy to find
   - Hosted off-site
   - Real-time / automated

Sunday, June 20, 2010

Page 134: The Upside of Downtime (Velocity 2010)

7 keys for public health dashboards

1. Must show current status for each “service”

2. Data must be accurate and timely

3. Must be easy to find

4. Must provide details for events in real time

5. Provide historical uptime and performance data

6. Provide a way to be notified of status changes

7. Provide details on how the data is gathered

Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html
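As a rough illustration of keys 1 and 5 above (current status per service, historical uptime), a status page could be backed by something like the sketch below. The `Status`, `Event`, and `Service` names are hypothetical and not from the talk, and a service with no recorded events is assumed to be up. A real dashboard would also need to be hosted off-site and fed by automated monitoring.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"


@dataclass
class Event:
    at: datetime
    status: Status
    detail: str


@dataclass
class Service:
    name: str
    events: list = field(default_factory=list)

    def record(self, status, detail, at=None):
        """Append a status change; events are assumed to arrive in time order."""
        self.events.append(Event(at or datetime.now(timezone.utc), status, detail))

    def current_status(self):
        # Assumption: no events recorded means the service is up.
        return self.events[-1].status if self.events else Status.UP

    def uptime_fraction(self, start, end):
        """Fraction of the window [start, end) spent in Status.UP."""
        up_seconds = 0.0
        cursor, status = start, Status.UP
        for ev in sorted(self.events, key=lambda e: e.at):
            if ev.at <= start:
                status = ev.status
                continue
            if ev.at >= end:
                break
            if status is Status.UP:
                up_seconds += (ev.at - cursor).total_seconds()
            cursor, status = ev.at, ev.status
        if status is Status.UP:
            up_seconds += (end - cursor).total_seconds()
        return up_seconds / (end - start).total_seconds()


# Example: one hour of downtime inside a four-hour window.
t0 = datetime(2010, 6, 20, tzinfo=timezone.utc)
api = Service("api")
api.record(Status.DOWN, "database failover", at=t0 + timedelta(hours=1))
api.record(Status.UP, "recovered", at=t0 + timedelta(hours=2))
print(api.current_status().value, api.uptime_fraction(t0, t0 + timedelta(hours=4)))
```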

Sunday, June 20, 2010

Page 135: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel
   - Easy to find
   - Hosted off-site
   - Real-time / automated
2. Process

Sunday, June 20, 2010

Page 136: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel
   - Easy to find
   - Hosted off-site
   - Real-time / automated
2. Process
   - Authority

Sunday, June 20, 2010

Page 137: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel
   - Easy to find
   - Hosted off-site
   - Real-time / automated
2. Process
   - Authority
   - Mean-Time-To-Communicate (MTTC)

Sunday, June 20, 2010

Page 138: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel
   - Easy to find
   - Hosted off-site
   - Real-time / automated
2. Process
   - Authority
   - Mean-Time-To-Communicate (MTTC)
   - On-call/drills/escalations/etc.
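The slides name Mean-Time-To-Communicate (MTTC) without defining it. One plausible reading, assumed here, is the average delay between detecting an incident and posting the first public update. The function name and the input shape below are illustrative only.

```python
from datetime import datetime, timedelta


def mean_time_to_communicate(incidents):
    """Average delay from detection to first public update.

    `incidents` is an iterable of (detected_at, first_update_at) datetime
    pairs. This definition of MTTC is an assumption, not the talk's.
    """
    delays = [(posted - detected).total_seconds() for detected, posted in incidents]
    if not delays:
        raise ValueError("need at least one incident")
    return timedelta(seconds=sum(delays) / len(delays))


# Example: updates posted 10 and 20 minutes after detection.
noon = datetime(2010, 6, 20, 12, 0)
history = [
    (noon, noon + timedelta(minutes=10)),
    (noon, noon + timedelta(minutes=20)),
]
print(mean_time_to_communicate(history))
```

Tracking this number alongside time-to-repair gives the process a concrete target to drill against.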

Page 139: The Upside of Downtime (Velocity 2010)

Your servers

Sunday, June 20, 2010

Page 140: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communicate

Sunday, June 20, 2010

Page 141: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communicate
   - Use communication channel

Sunday, June 20, 2010

Page 142: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communicate
   - Use communication channel
   - MTTC

Sunday, June 20, 2010

Page 143: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communicate
   - Use communication channel
   - MTTC
   - Who/what is affected

Sunday, June 20, 2010

Page 144: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communicate
   - Use communication channel
   - MTTC
   - Who/what is affected
   - When the incident started

Sunday, June 20, 2010

Page 145: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communicate
   - Use communication channel
   - MTTC
   - Who/what is affected
   - When the incident started
   - ETA

Sunday, June 20, 2010

Page 146: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communicate
   - Use communication channel
   - MTTC
   - Who/what is affected
   - When the incident started
   - ETA
   - Update regularly

Sunday, June 20, 2010

Page 147: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communicate
   - Use communication channel
   - MTTC
   - Who/what is affected
   - When the incident started
   - ETA
   - Update regularly
2. Fix it!
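The items listed under "Communicate" above (who/what is affected, when the incident started, ETA, regular updates) map naturally onto a small message template. The sketch below is one possible shape; the `IncidentUpdate` class, its field names, and the wording of the rendered message are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class IncidentUpdate:
    affected: str               # who/what is affected
    started_at: datetime        # when the incident started (UTC assumed)
    eta: Optional[str]          # ETA to resolution, if known
    next_update_minutes: int    # commitment to update regularly

    def render(self):
        """Build the plain-text message to post on the status channel."""
        eta = self.eta or "unknown, still investigating"
        return (
            f"Affected: {self.affected}\n"
            f"Started: {self.started_at:%Y-%m-%d %H:%M} UTC\n"
            f"ETA to resolution: {eta}\n"
            f"Next update in {self.next_update_minutes} minutes."
        )


update = IncidentUpdate(
    affected="Login API",
    started_at=datetime(2010, 6, 20, 14, 5, tzinfo=timezone.utc),
    eta=None,
    next_update_minutes=30,
)
print(update.render())
```

Stating "ETA unknown" explicitly follows the "Something > Nothing" point made earlier in the deck: an honest partial update is better than silence.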

Page 148: The Upside of Downtime (Velocity 2010)

Phew, close one!

Sunday, June 20, 2010

Page 149: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem

Sunday, June 20, 2010

Page 150: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem
   - Admit failure

Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/

Sunday, June 20, 2010

Page 151: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem
   - Admit failure
   - Sound like a human

Source: http://www.bureauofcommunication.com/compose/apology

Sunday, June 20, 2010

Page 152: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

“We apologize for any inconvenience this may have caused”

Sunday, June 20, 2010

Page 153: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem
   - Admit failure
   - Sound like a human
   - Start time and end time

Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf

Sunday, June 20, 2010

Page 154: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem
   - Admit failure
   - Sound like a human
   - Start time and end time
   - Who/what was impacted

Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/

Sunday, June 20, 2010

Page 155: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem
   - Admit failure
   - Sound like a human
   - Start time and end time
   - Who/what was impacted
   - What went wrong

Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html

Sunday, June 20, 2010

Page 156: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem
   - Admit failure
   - Sound like a human
   - Start time and end time
   - Who/what was impacted
   - What went wrong
   - Lessons learned

Source: http://graysky.org/2010/02/downtime-postmortem/

Sunday, June 20, 2010

Page 157: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem
   - Admit failure
   - Sound like a human
   - Start time and end time
   - Who/what was impacted
   - What went wrong
   - Lessons learned
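The factual items in the postmortem checklist above can be checked mechanically before a draft is published. The sketch below assumes a draft is a plain dictionary; the key names and the `missing_postmortem_fields` helper are hypothetical. Tone items such as "Admit failure" and "Sound like a human" still need a human reviewer.

```python
# Factual sections from the checklist; key names are illustrative assumptions.
POSTMORTEM_FIELDS = [
    "start_time",
    "end_time",
    "impact",            # who/what was impacted
    "what_went_wrong",
    "lessons_learned",
]


def missing_postmortem_fields(draft):
    """Return the checklist fields that are absent or blank in `draft`."""
    return [
        name
        for name in POSTMORTEM_FIELDS
        if not str(draft.get(name, "")).strip()
    ]


draft = {
    "start_time": "2010-06-20 14:05 UTC",
    "end_time": " ",
    "impact": "Logins failed for all users",
}
print(missing_postmortem_fields(draft))
```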

Sunday, June 20, 2010

Page 158: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

“I was completely overwhelmed by the amount of positive feedback and support I received.”

Sunday, June 20, 2010

Page 159: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Postmortem
   - Admit failure
   - Sound like a human
   - Start time and end time
   - Who/what was impacted
   - What went wrong
   - Lessons learned
2. Improve for the future

Page 160: The Upside of Downtime (Velocity 2010)

“Google is not just saying sorry, they are actually implementing serious changes which probably represents millions of dollars of development to help make sure this doesn't happen again.”

Prepare | Communicate | Explain

Source: http://news.ycombinator.com/item?id=1168493

Sunday, June 20, 2010

Page 161: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf

Sunday, June 20, 2010

Page 162: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

Be human

Sunday, June 20, 2010

Page 163: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

Be authentic

Sunday, June 20, 2010

Page 164: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

Be transparent

Sunday, June 20, 2010

Page 165: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

Accept responsibility

Sunday, June 20, 2010

Page 166: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

Learn and improve

Sunday, June 20, 2010

Page 167: The Upside of Downtime (Velocity 2010)

Trust

Prepare | Communicate | Explain

Sunday, June 20, 2010

Page 168: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel - Easy to find - Off-site - Real-time

2. Process - Give authority - M.T.T.C. - On-call/escalations

1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly

2. Fix it!

1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned

2. Learn and improve

Upside of Downtime Framework 1.0

Sunday, June 20, 2010

Page 169: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel - Easy to find - Off-site - Real-time

2. Process - Give authority - M.T.T.C. - On-call/escalations

Upside of Downtime Framework 1.0

Be Prepared + Be Transparent + Be Human

1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned

2. Learn and improve

1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly

2. Fix it!

Sunday, June 20, 2010

Page 170: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel - Easy to find - Off-site - Real-time

2. Process - Give authority - M.T.T.C. - On-call/escalations

Upside of Downtime Framework 1.0

Be Prepared + Be Transparent + Be Human = Trust

1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned

2. Learn and improve

1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly

2. Fix it!

Sunday, June 20, 2010

Page 171: The Upside of Downtime (Velocity 2010)

Disclaimer: Don’t screw up too often

Sunday, June 20, 2010

Page 172: The Upside of Downtime (Velocity 2010)

Sunday, June 20, 2010

Page 173: The Upside of Downtime (Velocity 2010)

Transparent Not Transparent

Caught

Not Caught

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 174: The Upside of Downtime (Velocity 2010)

Transparent Not Transparent

Caught

Not Caught Win

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 175: The Upside of Downtime (Velocity 2010)

Transparent Not Transparent

Caught

Not Caught

Big Loss

Win

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 176: The Upside of Downtime (Velocity 2010)

Transparent Not Transparent

Caught

Not Caught

Big Win Big Loss

Win

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 177: The Upside of Downtime (Velocity 2010)

Transparent Not Transparent

Caught

Not Caught

Big Win Big Loss

Win Win

Downtime Prisoner’s Dilemma

Sunday, June 20, 2010

Page 178: The Upside of Downtime (Velocity 2010)

              Transparent   Not Transparent
Caught        Big Win       Big Loss
Not Caught    Win           Win

Downtime Prisoner’s Dilemma
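The argument behind the matrix is that being transparent is never worse than hiding the outage, and is much better if you get caught. The sketch below checks that with a weak-dominance test. The numeric payoffs (Big Win = 2, Win = 1, Big Loss = -2) are assumptions chosen only to preserve the ordering on the slide, and the cell placement follows the order in which the slides build up the grid.

```python
# Assumed numeric values; only their ordering matters.
PAYOFF = {"Big Win": 2, "Win": 1, "Big Loss": -2}

# (state, strategy) -> outcome label, as read from the slide's 2x2 grid.
OUTCOMES = {
    ("caught", "transparent"): "Big Win",
    ("caught", "not transparent"): "Big Loss",
    ("not caught", "transparent"): "Win",
    ("not caught", "not transparent"): "Win",
}

STATES = ("caught", "not caught")


def weakly_dominates(a, b):
    """True if strategy `a` is never worse than `b` and strictly better once."""
    diffs = [
        PAYOFF[OUTCOMES[(state, a)]] - PAYOFF[OUTCOMES[(state, b)]]
        for state in STATES
    ]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


print(weakly_dominates("transparent", "not transparent"))
```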

Sunday, June 20, 2010

Page 179: The Upside of Downtime (Velocity 2010)

Benefits

Gain trust

Reduce churn, increase loyalty

Reduce support costs

Ability to control the message

Competitive advantage

More time to focus on the actual problem

Reduce stress

Sunday, June 20, 2010

Page 180: The Upside of Downtime (Velocity 2010)

Change != Easy

Sunday, June 20, 2010

Page 181: The Upside of Downtime (Velocity 2010)

Change != Impossible

Sunday, June 20, 2010

Page 182: The Upside of Downtime (Velocity 2010)

Keys to Adoption

Getting past a culture of “hide the problem”

Sunday, June 20, 2010

Page 183: The Upside of Downtime (Velocity 2010)

Keys to Adoption

Getting past a culture of “hide the problem”

Overriding commitment to want to improve

Sunday, June 20, 2010

Page 184: The Upside of Downtime (Velocity 2010)

Keys to Adoption

Getting past a culture of “hide the problem”

Overriding commitment to want to improve

Available resources to improve

Sunday, June 20, 2010

Page 185: The Upside of Downtime (Velocity 2010)

Keys to Adoption

Getting past a culture of “hide the problem”

Overriding commitment to want to improve

Available resources to improve

Pain

Sunday, June 20, 2010

Page 186: The Upside of Downtime (Velocity 2010)

Keys to Adoption

Getting past a culture of “hide the problem”

Overriding commitment to want to improve

Available resources to improve

Pain

Buy-in

Sunday, June 20, 2010

Page 187: The Upside of Downtime (Velocity 2010)

Product Management

Support

Sales/Marketing

Engineering/Operations

Sunday, June 20, 2010

Page 188: The Upside of Downtime (Velocity 2010)

Product Management

Support

Default: Let’s wait for complaints

Sales/Marketing

Engineering/Operations

Sunday, June 20, 2010

Page 189: The Upside of Downtime (Velocity 2010)

Product Management

Support

Default: Let’s wait for complaints

Reality: Proactiveness => Forgiveness

Sales/Marketing

Engineering/Operations

Sunday, June 20, 2010

Page 190: The Upside of Downtime (Velocity 2010)

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 191: The Upside of Downtime (Velocity 2010)

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 192: The Upside of Downtime (Velocity 2010)

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 193: The Upside of Downtime (Velocity 2010)

Engineering/Operations

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Reality: Opportunity to learn/improve

Sales/Marketing

Default: Let’s wait for complaints

Sunday, June 20, 2010

Page 194: The Upside of Downtime (Velocity 2010)

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Reality: Opportunity to learn/improve

Default: I don’t want my customers to know

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 195: The Upside of Downtime (Velocity 2010)

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Reality: Opportunity to learn/improve

Default: I don’t want my customers to know

Reality: They’ll find out, better from us

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 196: The Upside of Downtime (Velocity 2010)

Product Management

Support

Reality: Proactiveness => Forgiveness

Default: Too much work

Reality: More upfront, less when it matters

Default: Don’t want to look bad

Reality: Opportunity to learn/improve

Default: I don’t want my customers to know

Reality: They’ll find out, better from us

Sales/Marketing

Default: Let’s wait for complaints

Engineering/Operations

Sunday, June 20, 2010

Page 197: The Upside of Downtime (Velocity 2010)

Source: http://delicious.com/lennysan/healthdashboard

Sunday, June 20, 2010

Page 198: The Upside of Downtime (Velocity 2010)

Simple as that!

Sunday, June 20, 2010

Page 199: The Upside of Downtime (Velocity 2010)

Your site will still fail!

Sunday, June 20, 2010

Page 200: The Upside of Downtime (Velocity 2010)

“The measure of a society is how well it transforms pain and suffering into something worthwhile.”

-- Friedrich Nietzsche

Sunday, June 20, 2010

Page 201: The Upside of Downtime (Velocity 2010)

“The measure of a company is how well it transforms pain of downtime into something worthwhile.”

-- Lenny Rachitsky

Source: Original quote inspired by Friedrich Nietzsche

Sunday, June 20, 2010

Page 202: The Upside of Downtime (Velocity 2010)

Bare minimum: Register a Twitter account

Sunday, June 20, 2010

Page 203: The Upside of Downtime (Velocity 2010)

Lenny Rachitsky
@lennysan
http://www.transparentuptime.com/

Webmetrics/Neustar
@webmetrics
http://www.webmetrics.com/

Slides: http://bit.ly/upside-of-downtime

Thank You

Sunday, June 20, 2010

Page 204: The Upside of Downtime (Velocity 2010)

Bonus

Sunday, June 20, 2010

Page 205: The Upside of Downtime (Velocity 2010)

Page 206: The Upside of Downtime (Velocity 2010)

Page 207: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel - Easy to find - Off-site - Real-time

2. Process - Give authority - M.T.T.C. - On-call/escalations

1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly

2. Fix it!

1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned

2. Learn and improve

Upside of Downtime Framework 1.0

Sunday, June 20, 2010

Page 208: The Upside of Downtime (Velocity 2010)

Prepare | Communicate | Explain

1. Communication channel - Easy to find - Off-site - Real-time

2. Process - Give authority - M.T.T.C. - On-call/escalations

1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly

2. Fix it!

1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned

2. Learn and improve

Upside of Downtime Framework 1.0

Sunday, June 20, 2010

Page 209: The Upside of Downtime (Velocity 2010)

CommunicatePrepare Explain1. Communication channel - Easy to find - Off-site - Real-time

2. Process - Give authority - M.T.T.C. - On-call/escalations

1. Communicate - Use channel - M.T.T.C. - Who/what affected - When started - ETA to resolution - Update regularly

2. Fix it!

1. Post-mortem - Admit failure - Sound like a human - Start time and end time - Who/what was impacted - What went wrong - Lessons learned

2. Learn and improve

Upside of Downtime Framework 1.0

Sunday, June 20, 2010

Page 210: The Upside of Downtime (Velocity 2010)

"Unlikely that an accidental surface or subsurface oil spill would occur from the proposed activities"

-- Exploration and environmental impact plan

Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion

Page 219: The Upside of Downtime (Velocity 2010)

“Be not afraid of transparency; some are born transparent, some achieve transparency, and others have transparency thrust upon them.”

-- Borrowed from William Shakespeare

Page 220: The Upside of Downtime (Velocity 2010)

Page 229: The Upside of Downtime (Velocity 2010)

Making change

1. Find the bright spots - (this presentation has a bunch)

2. Script the critical moves - (framework)

3. Point to the destination - (W.W.G.D.)

4. Find the feeling - (how would you feel?)

5. Shrink the change - (start small)

6. Grow your people - (everyone is learning as they go)

7. Tweak the environment - (create a simple process)

8. Build habits - (build process organically)

9. Rally the herd - (get buy-in, the rest will follow)
