Presentation for Cloud Tech III on How Netflix Thinks of Metrics
Actionable Metrics
Enabling Decision-Making in Netflix’s Decentralized Environment
Cloud Tech III
October 6, 2012
Roy Rapoport
@royrapoport, [email protected]
Thursday, October 18, 12
Me
• Been in tech for about 20 years
• Systems engineering, networking, software development, QA, release management
• Time at Netflix: 1195 days (3y:3m:1w)
• (Current) job at Netflix: Make things better (Security Monkey, Python Platform, Central Alert Gateway, Breaking Stuff...)
Metrics Humor
% of instances with even public IP addresses
Technology Overview
• SoA, REST, Mostly Java
• Simple overall architecture:
Culture Overview
• Freedom and Responsibility
• Distributed Operations
• Get out of the way of Developers
The Metric Lifecycle
• Send
• Look
• Alert
Systems
• Flexible
• Scalable
• Self-Service
Telemetry
Flexible, Scalable, Self-Service
import netflix.metrics
[...]
self.nm = netflix.metrics.Metrics("core_cag")
[...]
def api(self):
    self.nm.nfCounter("api")
    [...]
    self.nm.nfCounter("application_%s" % application)
[...]
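The `netflix.metrics` module above is internal, and the slide does not describe what it does with the counters. To illustrate the self-service pattern (pick a name, increment it, no registration step), here is a hypothetical in-memory stand-in, `FakeMetrics`, that only mimics the call shape. Joining namespace and counter name with an underscore is an assumption, based on later slides referring to a `core_cag_api` metric; `Api` and the application names are illustrative.

```python
from collections import Counter


class FakeMetrics:
    """Hypothetical in-memory stand-in for netflix.metrics.Metrics.

    Only the call shape from the slide is mimicked; the real module's
    behavior (publishing, aggregation, naming) is not known here.
    """

    def __init__(self, namespace):
        self.namespace = namespace
        self.counts = Counter()

    def nfCounter(self, name, increment=1):
        # Assumed naming: "<namespace>_<name>", e.g. "core_cag_api".
        self.counts["%s_%s" % (self.namespace, name)] += increment


class Api:
    """Illustrative service class following the slide's pattern."""

    def __init__(self, metrics):
        self.nm = metrics

    def api(self, application):
        # One counter for every call, plus one per calling application.
        self.nm.nfCounter("api")
        self.nm.nfCounter("application_%s" % application)


nm = FakeMetrics("core_cag")
service = Api(nm)
for app in ["app_a", "app_a", "app_b"]:
    service.api(app)
print(dict(nm.counts))
# {'core_cag_api': 3, 'core_cag_application_app_a': 2, 'core_cag_application_app_b': 1}
```

Because a counter exists as soon as it is first incremented, a developer can add a per-application breakdown with one line of code and no ticket.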
Visualization
Flexible, Scalable, Self-Service
Alerting
Flexible, Scalable, Self-Service
• Static vs Dynamic Thresholds
• Compare to history
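To make the static-versus-dynamic distinction concrete, here is a minimal sketch under assumed semantics. The static check compares against a fixed number; the dynamic check compares the current value to the median of the same time slot in earlier periods (for example, the same minute on each of the last few weeks). The function names, the median baseline and the 25% `tolerance` are illustrative choices, not the system described in the talk.

```python
from statistics import median


def static_breach(value, threshold):
    """Static threshold: fire whenever the value exceeds a fixed number."""
    return value > threshold


def dynamic_breach(value, history, tolerance=0.25):
    """Dynamic threshold: compare against the median of `history` (values
    seen at the same time slot in earlier periods) and fire if the
    relative deviation exceeds `tolerance` (0.25 means 25%)."""
    baseline = median(history)
    if baseline == 0:
        # No meaningful ratio; treat any non-zero value as a deviation.
        return value != 0
    return abs(value - baseline) / baseline > tolerance


# Peak hour: 1200 is normal, but a static threshold of 1000 still fires.
print(static_breach(1200, 1000), dynamic_breach(1200, [1000, 1100, 1050]))  # True False
# Quiet hour: the same 1200 is six times the usual ~200, so the dynamic check fires.
print(dynamic_breach(1200, [200, 210, 190]))  # True
```

A static threshold has to be set loose enough for peak traffic, which makes it blind at quiet hours; comparing to history adapts the bar to the time of day.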
For Example ...
What the ...
Last 3 hours’ core_tools.core_cag_api
For Example ...
Visualization (Continued)
Last 4 days’ core_tools.core_cag_api
Even more questions!
For Example ...
Visualization (Continued)
Last 10 days’ core_tools.core_cag_api
What caused the spike?
For Example ...
Visualization (Continued)
Show alert volume per application
Someone had a rough few days...
Don’t Like Surprises...
{
  "alerts": [
    {
      "applyTo": "cluster",
      "condition": {
        "minPercent": 90.0,
        "noise": 0.2,
        "maxPercent": 25.0,
        "type": "DoubleExponential"
      },
      "metricName": "core_cag_api",
      "severity": "major"
    }
  ],
  "clusters": [ "core_tools" ]
}
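The `DoubleExponential` condition type suggests a forecast based on double exponential smoothing. The slides do not define `minPercent`, `maxPercent` or `noise`, so the sketch below is only one guess: it uses Holt's linear method (a smoothed level plus a smoothed trend) to forecast the next point, reads `maxPercent` as how far above the forecast a value may go and `minPercent` as how far below, and ignores `noise`. The smoothing constants `alpha` and `beta` are arbitrary.

```python
import json


def holt_forecast(series, alpha=0.5, beta=0.3):
    """One-step-ahead forecast with Holt's double exponential smoothing.

    `level` tracks the smoothed value and `trend` the smoothed slope;
    needs at least two points.
    """
    if len(series) < 2:
        raise ValueError("need at least two points")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend


def evaluate(series, value, condition):
    """Return "high", "low" or None. Assumed reading of the config:
    maxPercent = allowed rise above forecast, minPercent = allowed drop
    below forecast, both in percent; `noise` is ignored here."""
    forecast = holt_forecast(series)
    if value > forecast * (1 + condition["maxPercent"] / 100):
        return "high"
    if value < forecast * (1 - condition["minPercent"] / 100):
        return "low"
    return None


condition = json.loads(
    '{"minPercent": 90.0, "noise": 0.2, "maxPercent": 25.0,'
    ' "type": "DoubleExponential"}'
)
history = [100, 110, 120, 130]  # steady upward trend; forecast is about 140
for value in (140, 200, 10):
    print(value, evaluate(history, value, condition))
# 140 None
# 200 high
# 10 low
```

Under this reading, the example config tolerates a drop of up to 90% below forecast but only a 25% rise above it before alerting.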
Threshold Tuning
• An Abbreviated History ...
Threshold Tuning (in the beginning)
• Systems owned by IT
• Want an alert? Submit a ticket
• Want to tune an alert? Submit a ticket
Some priests offer their prayers to alien creatures best left forgotten. This ill-advised worship twists their minds in odd ways. Overlords find these warped men useful due to the unnatural powers they can channel. The dark priests most favored by their strange gods have powerful protections, and defeating one of them is sure to bring down a terrible curse upon the victor.
- http://www.descentinthedark.com/_d_/dark_priests.php
Threshold Tuning (It gets better)
• You get to configure your own threshold
• Freedom!
• Also, you have to configure your own thresholds
Threshold Tuning (Are we there yet?)
• Play with historical data
• Huge difference
• Still falls short
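Playing with historical data can be as simple as replaying a metric's history through a candidate rule and counting how often it would have fired. A minimal sketch, assuming a "value above threshold for N consecutive points" rule; the rule shape, function name and sample data are illustrative.

```python
def backtest(history, threshold, consecutive=1):
    """Count how many alerts the rule "value > threshold for `consecutive`
    points in a row" would have raised over `history`.

    An episode of breaching points fires once, when it reaches the
    required length; a non-breaching point resets the run.
    """
    alerts = run = 0
    for value in history:
        run = run + 1 if value > threshold else 0
        if run == consecutive:
            alerts += 1
    return alerts


history = [1, 5, 6, 1, 7, 1, 8, 9, 10]  # made-up sample data
for consecutive in (1, 2, 3):
    print(consecutive, backtest(history, threshold=4, consecutive=consecutive))
# 1 3
# 2 2
# 3 1
```

Sweeping the parameters shows the trade-off directly, but a human still has to pick the row, which is one reason this approach can fall short.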
Threshold Tuning (Yeah, that’s the ticket)
• Computers can be good at this
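One simple way to let the computer choose the threshold is to derive a band from recent history, here mean ± k standard deviations. This is a generic technique offered for illustration; the slides do not say which method Netflix used, and `k=3` is an arbitrary default.

```python
from statistics import mean, pstdev


def learn_band(history, k=3.0):
    """Derive (low, high) thresholds as mean +/- k population std devs."""
    mu, sigma = mean(history), pstdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """True if `value` falls outside the band learned from `history`."""
    low, high = learn_band(history, k)
    return value < low or value > high


history = [8, 12, 8, 12]  # mean 10, population std dev 2
print(learn_band(history))  # (4.0, 16.0)
print(is_anomalous(15, history), is_anomalous(17, history))  # False True
```

Recomputing the band as new data arrives means thresholds follow the metric instead of going stale.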
If Time Allows ...
Events vs Metrics
• Irregular Interval
• Point in time
• Lack magnitude
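The three properties above show up clearly when events are turned into a metric. Counting them per fixed interval gives a regular series with a magnitude (the count), at the cost of losing each event's exact time and description. A minimal sketch; the `Event` fields, the seconds-based timestamps and the 60-second bucket size are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A point-in-time occurrence with no numeric magnitude."""
    timestamp: int      # seconds since the epoch (assumed unit)
    description: str    # what happened, as free text (illustrative)


def events_to_metric(events, start, end, interval=60):
    """Count events per `interval`-second bucket in [start, end).

    Empty buckets become 0, so the result is a regular series even
    though the events arrive at irregular times.
    """
    counts = Counter(
        (e.timestamp - start) // interval
        for e in events
        if start <= e.timestamp < end
    )
    n_buckets = -(-(end - start) // interval)  # ceiling division
    return [counts[i] for i in range(n_buckets)]


events = [Event(0, "a"), Event(10, "b"), Event(130, "c"), Event(170, "d")]
print(events_to_metric(events, start=0, end=180))  # [2, 0, 2]
```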
Why Build It?
• Change management
• Vs Change control
• What Changed?
• Better Alerting
Chronos
• Rapidly Prototyped
• Adapters and reporters
• Easy querying
• Alarming
• Something happened
• ... X times in Y minutes
• Something didn’t happen
• Medium volume
• Recursive
• Recursive
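The three alarm shapes listed under "Alarming" map naturally onto a sliding window over event timestamps. This is a hypothetical sketch, not Chronos's actual implementation, which the slides do not show: `times=1` covers "something happened", `times=X, window=Y*60` covers "X times in Y minutes", and `silent_for` covers "something didn't happen". It assumes timestamps in seconds, recorded in non-decreasing order.

```python
from collections import deque


class EventAlarm:
    """Hypothetical sliding-window alarm over event timestamps (seconds).

    Assumes events are recorded in non-decreasing timestamp order and
    that `window` is positive.
    """

    def __init__(self, times=1, window=60):
        self.times = times        # X: events needed to fire
        self.window = window      # Y minutes, expressed in seconds
        self.recent = deque()     # timestamps inside the current window
        self.last_seen = None

    def record(self, ts):
        """Record an event at `ts`; return True while at least `times`
        events fall within the last `window` seconds."""
        self.last_seen = ts
        self.recent.append(ts)
        # Drop events that have slid out of the window.
        while self.recent[0] <= ts - self.window:
            self.recent.popleft()
        return len(self.recent) >= self.times

    def silent_for(self, now):
        """True if no event was recorded in the last `window` seconds
        ("something didn't happen")."""
        return self.last_seen is None or now - self.last_seen >= self.window


# "... 3 times in 5 minutes"
burst = EventAlarm(times=3, window=300)
print([burst.record(t) for t in (0, 100, 200, 700)])  # [False, False, True, False]

# "Something didn't happen": expect an event at least every 10 minutes
heartbeat = EventAlarm(window=600)
heartbeat.record(0)
print(heartbeat.silent_for(300), heartbeat.silent_for(600))  # False True
```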
End Result
• Massive decrease in change control tickets
• Not talking about SOX or PCI
• Better visibility into changes
• Decreased TTR
• Especially for bad code deployments
• You should do this
I Didn’t Mention
• End-to-end testing and alerting
• External availability and performance
• Open Connect
• Jobs
Questions?