Slide deck from my presentation at #velocityconf 2014: http://oreil.ly/NtSknc
The Operational Cost of Technical Debt
Kurt Andersen
@drkurta
In our daily work, there are things which slow us down or make us inefficient
INversion
LinkedIn 2002 2011
With awareness and will, you can fix those problems
Kafka 0.7 → 0.8
Upgrading from 0.7
0.8, the release in which added replication, was our first backwards-incompatible release: major changes were made . . .
The upgrade from 0.7 to 0.8.x requires a special tool for migration.
This migration can be done without downtime.
from https://kafka.apache.org/documentation.html
We have all dealt with processes, systems or procedures that are lodged in the past
LinkedIn’s use of memcached
What do we need to solve the problems of technical debt?
Examples in action:
• INversion
• Kafka 0.7 → 0.8 migration
• 6-year-old version of memcached
• wire-line format change from Java-serialized objects via RPC to REST+JSON
If we recognize the problems and evaluate the costs correctly,
we make better decisions about how to spend our efforts
Technical Debt is a Decision
Technical debt accumulates from a series of small choices
"Doing it this way" is good enough for now
INversion
We'll just skip version N+1 and look at N+2 or higher
Changing to version X is going to take a lot of work
"This is the way we do things, it is not open to discussion"
Infrastructure becomes technical debt by focusing on shiny new features
"We can't afford the time to upgrade infrastructure, we have to ship features A + B"
"What have you done for me lately" is more sellable than preventing problems
Y2k
Past decisions become debt unless they are updated to reflect new realities
Assumptions/predictions which are made early in the design process can be way off the mark.
Mary, Mary, quite contrary
How does your system scale?
INversion
“One in a million” happens multiple times per hour or minute at web scale
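A rough back-of-the-envelope check of that claim, as a minimal sketch. The request rates below are hypothetical figures chosen for illustration, not numbers from the talk, and the helper name is made up.

```python
# Expected count of a "one in a million" event at different request rates.
# The rates are illustrative assumptions, not measurements from any real site.

def expected_events_per_hour(requests_per_second: float,
                             probability: float = 1e-6) -> float:
    """Expected number of rare events per hour at a steady request rate."""
    seconds_per_hour = 3600
    return requests_per_second * seconds_per_hour * probability


if __name__ == "__main__":
    for rps in (1_000, 10_000, 100_000):
        per_hour = expected_events_per_hour(rps)
        print(f"{rps:>7} req/s: about {per_hour:.1f} per hour, "
              f"{per_hour / 60:.2f} per minute")
```

Under these assumed rates, the rare case stops being rare well before traffic reaches the top of the range.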
What are the direct costs of technical debt?
System outages and errors increase
The development process became more and more bogged down in conflict resolution under the branch dev model
INversion
Teams develop work-arounds and procedures that are more complicated than the problem
Signs you are dealing with tech debt:
1) “cult” ops
2) “red face” quotient
3) working around problems rather than fixing them
New features are blocked when the infrastructure can’t deal with new loads.
Capacity uplifts become increasingly painful or impossible
Constant rollbacks and rework cause stress on dev, ops and everyone else
What are the indirect costs of technical debt?
Technical debt devalues ops in favor of new feature development
"No one gets promoted for retiring debt"
“Our ops guys are so good, they can make anything work”
Supporting zombies leads to finger-pointing and avoidance
Zombies are unsupported and unsupportable
Zombies require active intervention to stop
Technical debt leads to demoralization
Being constantly reactive is no fun
Friction for teams like customer support makes it harder than necessary to provide excellent support
How do you balance retiring technical debt against other development work?
Recognize debt choices and decisions
Never say "never"
Keep an open mind
Revisit old decisions as usage and requirements change
Measure the right things
Time to Repair and Effort
Impact: frequency, severity and reach
Error rates
Capacity/Headroom
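Three of the measures above (time to repair, error rates, capacity/headroom) can be computed from data most teams already collect. The following is a minimal sketch under stated assumptions: the Incident record and all function names are hypothetical, and a real team would pull these values from its own incident tracker and monitoring system.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record: when it started and when it was fixed."""
    started: datetime
    resolved: datetime


def mean_time_to_repair(incidents: list[Incident]) -> timedelta:
    """Average duration from start to resolution across incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.resolved - i.started for i in incidents), timedelta())
    return total / len(incidents)


def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def headroom(peak_load: float, capacity: float) -> float:
    """Fraction of capacity still unused at peak load."""
    return 1.0 - peak_load / capacity
```

Tracking these per system over time, rather than as one-off numbers, is what makes a growing debt visible.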
If you were implementing package X today, what would you do differently?
Evaluate all the costs: either to fix or to tolerate
Make active decisions
What is your job?
How did our examples turn out?
• INversion
• memcached
• Kafka 0.7 → 0.8
• Rest.li
INversion
1. Check code into trunk
2. Peer review
3. Release from trunk
4. Continuous integration
5. Service owners own their services
6. Canary all deployments
7. New features ramped, not binary
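The last practice in the list, ramping new features instead of flipping them on for everyone at once, can be illustrated with a percentage-based gate. This is a minimal sketch and not LinkedIn's actual mechanism: the function name, the feature key and the hash-bucketing scheme are all assumptions made for illustration.

```python
import hashlib


def in_ramp(feature: str, member_id: str, ramp_percent: float) -> bool:
    """Return True if this member falls inside the feature's current ramp.

    Hypothetical scheme: each (feature, member) pair hashes to a stable
    bucket in 0..9999, and the feature is enabled for buckets below the
    ramp threshold. Raising ramp_percent only ever adds members, so anyone
    who already sees the feature keeps seeing it as the ramp grows.
    """
    digest = hashlib.sha256(f"{feature}:{member_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < ramp_percent * 100


if __name__ == "__main__":
    members = [str(i) for i in range(10_000)]
    for percent in (1, 10, 50, 100):
        enabled = sum(in_ramp("new-feature", m, percent) for m in members)
        print(f"ramp {percent:>3}%: enabled for {enabled} of {len(members)}")
```

Hashing the feature name together with the member ID means different features ramp over different subsets of members, so the same people are not always the first to receive every change.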
Moving beyond the debt crisis
Advance our standards, set upon our foes
Our ancient word of courage, fair Saint George,
Inspire us with the spleen of fiery dragons!
Upon them! victory sits on our helms.
Richard III. act v, sc.3.
Transforming the way the world works.
Kurt Andersen
kurta@linkedin.com
@drkurta
Appendix
Values
Members first
Relationships matter
Be open, honest, and constructive
Demand excellence
Take intelligent risks
Act like an owner