Operational Insight
June 15, 2015
Roy Rapoport
@royrapoport / linkedin.com/in/royrapoport / rrapoport@netflix.com
Oh, The Places We’ll Go!
John Boyd
Observe
Orient
Decide
Act
OODA
“This approach favors agility over raw power in dealing with human opponents in any endeavor” - Wikipedia
This Is What We Do
OODA KPI
• Speed
• Effort
• Reliability
Winning
• Speed
• Effort
• Reliability
Implications … for Observation (aka measurement, telemetry, metrics)
• Make It Easy
• Make It Scalable
• Make it pluggable
• (Eventually) Ruthlessly Cull
“What decision will this help me make?”
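One way to read "make it easy" and "make it pluggable" is that recording a measurement should take one line, and where the measurement goes should be swappable. The sketch below is a hypothetical in-process registry; `MetricsRegistry`, `record`, `add_sink`, and the metric name are illustrative assumptions, not the API of Atlas or any other real system.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# A sink is any callable that accepts (name, value, tags).
Sink = Callable[[str, float, Dict[str, str]], None]


class MetricsRegistry:
    """Hypothetical registry: one call to record, any number of sinks."""

    def __init__(self) -> None:
        self._sinks: List[Sink] = []

    def add_sink(self, sink: Sink) -> None:
        # Pluggable: a new backend is just another callable.
        self._sinks.append(sink)

    def record(self, name: str, value: float,
               tags: Optional[Dict[str, str]] = None) -> None:
        # Easy: callers pass a name and a value; tags are optional.
        for sink in self._sinks:
            sink(name, value, dict(tags or {}))


class InMemorySink:
    """Keeps every value per metric name; stands in for a real backend."""

    def __init__(self) -> None:
        self.values: Dict[str, List[float]] = defaultdict(list)

    def __call__(self, name: str, value: float, tags: Dict[str, str]) -> None:
        self.values[name].append(value)


registry = MetricsRegistry()
memory = InMemorySink()
registry.add_sink(memory)
registry.record("requests.latency_ms", 12.5, {"region": "example-region"})
registry.record("requests.latency_ms", 14.0)
```

Culling a metric that supports no decision then means deleting a single `record` call, which keeps the cost of removing it as low as the cost of adding it.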
A Joke
[Chart: % of servers in major region with an even IP address; values shown: 52 and 48]
Implications … for Orientation (aka graphing, visualization)
• First-class product
• Different decisions require different viz
• Low cognitive load better than
  • High refresh rates
  • Deep data density
Better Like This …
Or Better Like That …
Implications … for Decisions (aka alerting, real-time analytics, etc)
• You already have (some of) this
• Incremental improvement
• Sky’s the limit
  • For benefits
  • For cost
Implications … for Action
1. Humans beat bureaucracy
2. Machines beat humans
3. Repeatability beats one-offs
4. Start with humans
5. If IFTTT, deprecate humans
Repeatable machine processes TROUNCE one-off human bureaucracy
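Point 5 suggests that once a human response can be written down as "if this, then that", a machine can run it. A minimal sketch of that idea follows; the `Rule` type, the metric name, the threshold, and the action string are all made up for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical if-this-then-that rule: a condition plus an action."""
    name: str
    condition: Callable[[Metrics], bool]
    action: Callable[[Metrics], str]


def run_rules(rules: List[Rule], metrics: Metrics) -> List[str]:
    """Run the action of every rule whose condition holds; return the results."""
    return [rule.action(metrics) for rule in rules if rule.condition(metrics)]


# Illustrative rule and threshold only.
rules = [
    Rule(
        name="scale-up-on-high-cpu",
        condition=lambda m: m["cpu_percent"] > 80.0,
        action=lambda m: "add instance",
    ),
]

actions = run_rules(rules, {"cpu_percent": 91.0})
```

Because the rule is data rather than tribal knowledge, the same response runs the same way every time, which is the repeatability the slide argues for.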
Decision: Do I Have Enough Instances?
Decision: Is My Canary Good?
Been there. Done that. Manually. Artisanally.
• Started in the Data Center
• Manual, dashboard-driven
Been there. Done that. Manually.
[Dashboard panels: CPU, Requests, Errors]
Been there. Done that. Manually.
• Context vs Precision
• No …
• Repeatability
• Trending
• Manual effort is manual
So Now What?
• Automate Analysis
• Took Some Effort
• Approach and analytics
• Presentation matters
[Diagram, built up over several slides. Components: Version Control System, Build & Deployment System, Automated Canary Analysis, Pretty Pictures, Customers.]
Fleet state across the builds:
• 1000 servers @ 1.0.1
• 1000 servers @ 1.0.1 and 1 server @ 1.0.2
• 1000 servers @ 1.0.1 and 10 servers @ 1.0.2
• 1000 servers @ 1.0.1 and 1000 servers @ 1.0.2
Just The Stats 4-Week View
6309 canary analysis cycles
16% canaries failed
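The slides do not describe how the automated analysis scores a canary. As a rough sketch only, one simple approach is to compare each canary metric with the baseline and pass the canary when enough metrics stay within a tolerance. The metric names echo the CPU, Requests, and Errors dashboard panels from the manual process; the function name, the 10% tolerance, and the 95% pass threshold are assumptions, not Netflix's algorithm.

```python
from statistics import mean
from typing import Dict, List

Samples = Dict[str, List[float]]


def canary_score(baseline: Samples, canary: Samples,
                 tolerance: float = 0.10) -> float:
    """Return the percentage of metrics whose canary mean is within
    `tolerance` (relative difference) of the baseline mean."""
    passed = 0
    for metric, base_values in baseline.items():
        base = mean(base_values)
        cand = mean(canary[metric])
        # Relative difference; fall back to absolute if the baseline is zero.
        diff = abs(cand - base) / abs(base) if base else abs(cand)
        if diff <= tolerance:
            passed += 1
    return 100.0 * passed / len(baseline)


# Made-up sample data for illustration.
baseline = {"cpu": [40.0, 42.0], "requests": [1000.0, 980.0], "errors": [5.0, 5.0]}
canary = {"cpu": [41.0, 43.0], "requests": [990.0, 1010.0], "errors": [9.0, 11.0]}

score = canary_score(baseline, canary)
canary_is_good = score >= 95.0  # assumed pass threshold
```

Here CPU and requests stay close to the baseline while errors double, so two of three metrics pass and the canary is rejected. A production analysis would likely weight metrics and treat the direction of change differently (fewer errors should not count against a canary); this sketch does neither.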
Decision: Do I Have an Outlier?
Outlier Detection
Would You Like to Play a Game?
Spot the Outlier
The Outlier Is “A”
Just The Stats 4-Week View
739 Server Terminations
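The slides do not say how outliers are detected. One common, simple technique is a modified z-score built on the median and the median absolute deviation (MAD). The sketch below uses it to flag servers whose metric sits far from the rest of the cluster; the server names, values, and the 3.5 threshold are illustrative assumptions.

```python
from statistics import median
from typing import Dict, List


def find_outliers(values: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Flag servers whose modified z-score exceeds `threshold`."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread to measure against; this sketch flags nothing
    # 0.6745 scales MAD so the score is comparable to a standard z-score.
    return sorted(
        server for server, v in values.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Made-up per-server error rates for illustration.
error_rates = {"A": 9.8, "B": 1.1, "C": 0.9, "D": 1.0, "E": 1.2}
outliers = find_outliers(error_rates)
```

With these values only server "A" is flagged. Median-based statistics are used here because a single extreme server barely moves them, whereas it would inflate a mean and standard deviation and could mask itself.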
In a Nutshell
Observe: Need this first (http://bit.ly/nflx-atlas-2013, http://metrics20.org)
Orient: Understand the decision (http://bit.ly/nflx-qcon-aca-2014)
Decide: Make it easier for humans
Act: Make machines do it
Higher speed, lower effort, higher reliability
Questions, Attributions, Feedback
@royrapoport / rsr@netflix.com / linkedin.com/in/royrapoport