The Unrealized Role of:
Monitoring & Alerting
@jasonhand | VictorOps | #DevOpsDays
Jason Hand
DevOps Evangelist
VictorOps
SCaLE 14x
Southern California Linux Expo
2015 Monitoring Survey
Why Are You Collecting This Data?
NOTE: You may choose more than one
» Performance analysis and trending
» Fault and Anomaly detection
» Capacity Planning
» A/B Testing
» We don’t do anything with collected metrics
The Results
NOTE: Respondents may have chosen more than one
» Performance analysis and trending - 63%
» Fault and Anomaly detection - 53%
» Capacity Planning - 45%
» A/B Testing - 11%
» We don’t do anything with collected metrics - 3%
Tyranny of the
S.L.A. (Service Level Agreement)
High Availability
Prediction & Prevention
That's Important
... but ...
Business
Objectives?
Happy Camper
Customers want more than just
99.999% Uptime
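For a sense of scale, the "five nines" figure can be turned into a downtime budget. The sketch below is not from the talk; it assumes a 365-day year, and the function name `allowed_downtime_minutes` is made up for illustration. Under those assumptions, 99.999% leaves roughly 5.26 minutes of downtime per year.

```python
# Minimal sketch: convert an availability target into a yearly downtime budget.
# Assumption: a 365-day (non-leap) year; the function name is hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (1 - availability_percent / 100)


# Print the budget for a few common targets.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.2f} minutes/year")
```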
Where's the
Innovation?
? =
? = Continuous Improvement
How Important is
Learning & Innovation?
The result of underutilizing monitoring & alerting is that the IT department and the organization have no chance to...
learn, improve, or innovate.
Continually understanding & responding to the feedback from
monitoring, logging, & alerting
allows you to use information about events in the past to drive future actions.
Switching Gears
It's not just about
Prediction & Prevention
Respond & Repair... Quickly
Nope
MTTR
Rather Than
MTBF
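One way to see why MTTR (mean time to repair) deserves as much attention as MTBF (mean time between failures) is the common steady-state availability approximation, availability = MTBF / (MTBF + MTTR). The formula and the numbers below are not from the talk; they are illustrative assumptions, with MTBF taken to mean average uptime between failures. In this sketch, repairing four times faster yields the same availability as failing four times less often.

```python
# Minimal sketch of the steady-state availability approximation.
# Assumptions: MTBF is the mean uptime between failures, both values are in
# hours, and all figures below are invented for illustration.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return the fraction of time the service is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours must be >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


baseline = availability(mtbf_hours=720, mttr_hours=4)        # fail monthly, 4 h to repair
faster_repair = availability(mtbf_hours=720, mttr_hours=1)   # same failure rate, 1 h to repair
fewer_failures = availability(mtbf_hours=2880, mttr_hours=4) # 4x fewer failures, 4 h to repair

print(f"baseline:       {baseline:.5f}")
print(f"faster repair:  {faster_repair:.5f}")
print(f"fewer failures: {fewer_failures:.5f}")
```

Cutting repair time is usually within a team's control, through better alerting, runbooks, and practice, whereas preventing every failure is not.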
Failure Is Inevitable
us·er /ˈyoozər/
Distributed fault injection test suite for production.
credit: Leon Fayer (@papa_fire)
Success is a result of
Failure
Understand
Learn
Innovate
re·sil·ient /rəˈzilyənt/
The ability to resist, absorb, recover from or successfully adapt to adversity or a change in conditions
Change can cause failure
but innovation requires
Change
Conflict
Change Required
“Without deviation from the norm, progress is not possible”
- Frank Zappa
What Did You
Learn
From the Recovery Efforts?
(including monitoring & alerting)
Postmortems / Learning Reviews:
Stories of:
What took place leading up to & during the disruption & recovery efforts
Who was
involved?
What did they
see?
What was
said?
What
actions
were taken?
jhand.co/chatopsbook
How do events & actions
correlate
over time?
5 Whys
What is the "cause" of the Problem?
Root Cause is ...
Our...
obsession with
"Root Cause"
Asking "why".. leads to ..
Blame
Blaming leads to..
operators hiding relevant & important information
We must
believe
that our operators are doing their best given the constraints of the "system"
"We are here to"
Learn
From Failure
(and success)
Rather than ..
Avoid Failure
What's the
Story?
Innovate
Learning from both success & failure to develop & implement small incremental improvements is critical.
Learning Organization
Learning does NOT come from
Reading & Listening
Learning comes from
Doing
Real Learning comes from:
Observing
Orienting
Deciding
Acting
John Boyd's OODA Loop
Example:
Learning to play the
Dobro Guitar
Learning
Why?
Go from knowing...
to understanding...
to learning
NOTE: (Requires making mistakes)
“We will trade some uptime in exchange for innovation”
- Dave Hahn (Netflix)
DevOpsDays Boise 2016 (today)
Are We Doing
it Right?
What do your
Postmortems look like?
Are they setting you up to learn?
"The Story"
- Timeline
- Who Was Involved
- Context (Seeing, Saying, Executing)
- Action Items (Small Incremental Improvements)
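The elements of "The Story" can be captured in a small record so that every review collects the same things. The sketch below is only one possible shape, not a VictorOps format; the class names, field names, and the three entry kinds ("saw", "said", "did", standing in for Seeing, Saying, Executing) are all assumptions made for illustration.

```python
# Minimal sketch of a learning-review record. All names are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Assumed mapping of the slide's "Seeing, Saying, Executing" to entry kinds.
KINDS = ("saw", "said", "did")


@dataclass
class TimelineEntry:
    at: datetime   # when it happened
    who: str       # who was involved
    kind: str      # one of KINDS
    detail: str    # what was seen, said, or done


@dataclass
class LearningReview:
    title: str
    timeline: List[TimelineEntry] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)

    def record(self, at: datetime, who: str, kind: str, detail: str) -> None:
        """Add an entry and keep the timeline in chronological order."""
        if kind not in KINDS:
            raise ValueError(f"kind must be one of {KINDS}")
        self.timeline.append(TimelineEntry(at, who, kind, detail))
        self.timeline.sort(key=lambda entry: entry.at)

    def people_involved(self) -> List[str]:
        """Return the distinct people on the timeline, sorted by name."""
        return sorted({entry.who for entry in self.timeline})


review = LearningReview(title="Checkout latency disruption (example)")
review.record(datetime(2016, 1, 1, 10, 5), "bo", "did", "Restarted the worker pool")
review.record(datetime(2016, 1, 1, 10, 0), "alex", "saw", "Latency alert fired")
review.action_items.append("Add a queue-depth alert")
```

Keeping entries ordered by time makes it easier to see how events and actions correlate, and the action items stay small and incremental by being plain one-line entries.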
Shift our gaze from:
maintaining & protecting
Learning
Which leads to...
Improving & Innovating
we increase value
of monitoring & alerting
of the IT teams
of Products & Services
& of the Organization.
Hypothesize
Explore
Stretch
Experiment
Fail
Learn
Try Again
Learning & Innovating leads to uncovering new ways of
building, deploying, and maintaining software & infrastructure
Which leads to...
Resilient Systems
The
By-product
of a highly
resilient system is ...
Highly Available system
The Unrealized Role of:
Monitoring & Alerting is ....
Learning &
Innovation
Thank You
Be Victorious!
References:
Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/
Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg
Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg
Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg
NOC: https://upload.wikimedia.org/wikipedia/commons/
Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg
VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg
Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg
Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg
Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203
Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002
Accident Free: http://www.compliancesigns.com/media/digital-scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif
Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif
change: http://i.imgur.com/EQyC6N3.gif
Hard drive: https://i.imgur.com/pWsKSEf.gif
Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg
Value: https://d13yacurqjgara.cloudfront.net/users/