Your Code is Wrong

Nathan Marz (@nathanmarz)

DESCRIPTION

My keynote at NoSQL Now! on August 21st, 2013

Page 1: Your Code is Wrong

Your Code is Wrong

Nathan Marz (@nathanmarz)

Page 2: Your Code is Wrong

Let’s start with an example

Page 3: Your Code is Wrong

Storm’s “reportError” method

Page 4: Your Code is Wrong

(Storm is a realtime computation system, like Hadoop but for realtime)

Page 5: Your Code is Wrong

Storm architecture

Page 6: Your Code is Wrong

Storm architecture

Master node (similar to Hadoop JobTracker)

Page 7: Your Code is Wrong

Storm architecture

Used for cluster coordination

Page 8: Your Code is Wrong

Storm architecture

Run worker processes

Page 9: Your Code is Wrong

Storm’s “reportError” method

Page 10: Your Code is Wrong

Used to show errors in the Storm UI

Page 11: Your Code is Wrong

Error info is stored in Zookeeper

Page 12: Your Code is Wrong

What happens when a user deploys code like this?

Page 13: Your Code is Wrong

Denial-of-service on Zookeeper and cluster goes down

Page 14: Your Code is Wrong

Robust!

Designed input space Actual input space

Page 15: Your Code is Wrong

Your code is wrong

Page 16: Your Code is Wrong

Your code is literally wrong

Page 17: Your Code is Wrong

Your code is wrong

Page 18: Your Code is Wrong
Page 19: Your Code is Wrong

Why do you believe your code is correct?

Page 20: Your Code is Wrong

Your code

Dependency 1

Dependency 2

Dependency 3

Page 21: Your Code is Wrong

Dependency 1

Dependency 4

Dependency 5

Page 22: Your Code is Wrong

Dependency 4

Dependency 6

Dependency 9

Dependency 7

Dependency 8

Page 23: Your Code is Wrong

Dependency 3,000,000

Hardware

Page 24: Your Code is Wrong

Electronics

Page 25: Your Code is Wrong

Chemistry

Page 26: Your Code is Wrong

Atomic physics

Page 27: Your Code is Wrong

Quantum mechanics

Page 28: Your Code is Wrong

I think I can safely say that nobody understands

quantum mechanics.

Richard Feynman

Page 29: Your Code is Wrong

Your code is wrong

Page 30: Your Code is Wrong

Your code

...

Page 31: Your Code is Wrong

All the software you’ve used has had bugs in it

Page 32: Your Code is Wrong

Including the software you’ve written

Page 33: Your Code is Wrong

Your code is sometimes correct

Page 34: Your Code is Wrong

That’s good enough!

Page 35: Your Code is Wrong
Page 36: Your Code is Wrong

Treat code as nondeterministic

Page 37: Your Code is Wrong

Embrace “your code is wrong” to design better software

Page 38: Your Code is Wrong

Robust!

Designed input space Actual input space

Page 39: Your Code is Wrong

Robust!

Designed input space Actual input space

Page 40: Your Code is Wrong

An example

Page 41: Your Code is Wrong

Learning from Hadoop

JobTracker

Job

Job

Job

Page 42: Your Code is Wrong

Learning from Hadoop

JobTracker

Job

Job

Job

Page 43: Your Code is Wrong

Learning from Hadoop

JobTracker

Job

Job

Job

Page 44: Your Code is Wrong

Your code is wrong

Page 45: Your Code is Wrong

So your processes will crash

Page 46: Your Code is Wrong

Storm’s daemons are process fault-tolerant

Page 47: Your Code is Wrong

Storm

Nimbus

Topology

Topology

Topology

Page 48: Your Code is Wrong

Storm

Nimbus

Topology

Topology

Topology

Page 49: Your Code is Wrong

Storm

Nimbus

Topology

Topology

Topology

Page 50: Your Code is Wrong

Storm

Nimbus

Topology

Topology

Topology

Page 51: Your Code is Wrong

Storm

Nimbus

Topology

Topology

Topology

Page 52: Your Code is Wrong

Robust!

Designed input space Actual input space

Page 53: Your Code is Wrong

Robust!

Designed input space Actual input space

Page 54: Your Code is Wrong

The impact of code being wrong

Page 55: Your Code is Wrong

Robust!

Designed input space Actual input space

Failures! Bad performance! Security holes!

Irrelevant!

Page 56: Your Code is Wrong

Design principle #1

Measuring and monitoring are the foundation of solid engineering

Page 57: Your Code is Wrong

Measuring: Under what range of inputs does my software function well?

Page 58: Your Code is Wrong

Monitoring: What’s the actual input space of my software?

Page 59: Your Code is Wrong

Measure & Monitor

Latency
Throughput
Stack traces
Buffer sizes
Memory usage
CPU usage
# threads spawned
...
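
A minimal sketch of what measuring latency and error counts could look like in code. The `Metrics` class and its method names are hypothetical, invented for illustration; a real system would export these numbers to a monitoring service instead of keeping them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """Hypothetical in-memory registry of latencies and error counts."""

    def __init__(self):
        self.latencies = defaultdict(list)  # operation name -> seconds per call
        self.errors = defaultdict(int)      # operation name -> failure count

    @contextmanager
    def timed(self, name):
        """Record how long the wrapped block took, and whether it raised."""
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors[name] += 1
            raise
        finally:
            self.latencies[name].append(time.perf_counter() - start)

    def summary(self, name):
        """Return call count, error count, median and max latency."""
        samples = sorted(self.latencies[name])
        if not samples:
            return {"count": 0, "errors": self.errors[name]}
        return {
            "count": len(samples),
            "errors": self.errors[name],
            "p50": samples[len(samples) // 2],
            "max": samples[-1],
        }
```

Wrapping a call as `with metrics.timed("handle_request"): ...` records one latency sample per call, so the summary reflects the inputs the software actually sees in production.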

Page 60: Your Code is Wrong

How you monitor your software is as important as its functionality

Page 61: Your Code is Wrong

Design principle #2

Embrace immutability

Page 62: Your Code is Wrong

Read/write database

Application

Page 63: Your Code is Wrong

MySQL

Application

Page 64: Your Code is Wrong

MongoDB

Application

Page 65: Your Code is Wrong

Riak

Application

Page 66: Your Code is Wrong

Cassandra

Application

Page 67: Your Code is Wrong

HBase

Application

Page 68: Your Code is Wrong

Your code is wrong

Page 69: Your Code is Wrong

So data will be corrupted

Page 70: Your Code is Wrong

And you may not know why

Page 71: Your Code is Wrong

Views

Immutable, ever-growing data

Application

Architecture based on immutability

Page 72: Your Code is Wrong

Views

Immutable, ever-growing data

Application

Lambda architecture
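
A rough sketch of the idea behind these slides: raw data is only ever appended, and views are derived from it, so a corrupted view can be thrown away and rebuilt. The names `PageView`, `MasterLog`, and `views_per_url` are hypothetical and chosen only for illustration; this is not a full Lambda architecture.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PageView:
    """One immutable fact; frozen so it cannot be modified after creation."""
    url: str
    user: str


class MasterLog:
    """Append-only store: facts are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[PageView] = []

    def append(self, event: PageView) -> None:
        self._events.append(event)

    def events(self) -> Tuple[PageView, ...]:
        # Hand out a tuple so callers cannot mutate the log itself.
        return tuple(self._events)


def views_per_url(events: Iterable[PageView]) -> Dict[str, int]:
    """A view is a pure function of the raw data, so it can always be rebuilt."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event.url] = counts.get(event.url, 0) + 1
    return counts
```

If buggy code corrupts a view, calling `views_per_url(log.events())` again restores it, because the bug never touched the underlying facts.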

Page 73: Your Code is Wrong

Design principle #3

Minimize dependencies

Page 74: Your Code is Wrong

The less that can go wrong, the less that will go wrong

Page 75: Your Code is Wrong

Example: Storm’s usage of Zookeeper

Page 76: Your Code is Wrong

Worker locations stored in Zookeeper

Page 77: Your Code is Wrong

All workers must know locations of other workers to send messages

Page 78: Your Code is Wrong

Two ways to get location updates

Page 79: Your Code is Wrong

1. Poll Zookeeper

Worker Zookeeper

Page 80: Your Code is Wrong

2. Use Zookeeper “watch” feature to get push notifications

Worker Zookeeper

Page 81: Your Code is Wrong

Method 2 is faster but relies on another feature

Page 82: Your Code is Wrong

Storm uses both methods

Worker Zookeeper

Page 83: Your Code is Wrong

If watch feature fails, locations still propagate via polling
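
The combination of push notifications and polling can be sketched as follows. `FakeStore` is a hypothetical stand-in for a coordination service and is not the Zookeeper client API; `LocationCache` and its methods are likewise invented for illustration and are not Storm's code.

```python
class FakeStore:
    """Hypothetical coordination store with a push ("watch") feature."""

    def __init__(self):
        self._data = {}
        self._watchers = []
        self.watches_broken = False  # flip to True to simulate a failing watch feature

    def get(self):
        return dict(self._data)

    def watch(self, callback):
        self._watchers.append(callback)

    def set(self, worker, location):
        self._data[worker] = location
        if not self.watches_broken:
            for callback in self._watchers:
                callback(dict(self._data))


class LocationCache:
    """Keeps worker locations fresh via both push and poll."""

    def __init__(self, store):
        self._store = store
        self.locations = store.get()
        store.watch(self._on_update)  # fast path: push notifications

    def _on_update(self, snapshot):
        self.locations = snapshot

    def poll(self):
        """Slow path: meant to run on a timer; corrects any missed pushes."""
        self.locations = self._store.get()
```

When pushes work, updates arrive immediately; when they silently stop, the next `poll()` still brings the cache up to date, so correctness no longer depends on the watch feature.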

Page 84: Your Code is Wrong

Eliminating the dependence is justified by the small amount of code required

Page 85: Your Code is Wrong

Design principle #4

Explicitly respect functional input ranges

Page 86: Your Code is Wrong

Storm’s “reportError” method

Page 87: Your Code is Wrong

Implement self-throttling to avoid overloading other systems
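
One possible shape for self-throttling, as a sketch only: the transcript does not show Storm's actual implementation, so the class name, parameters, and limits below are all assumptions. The reporter forwards at most a fixed number of errors per time window and drops the rest, so a tight loop of failures cannot flood the downstream system.

```python
import time


class ThrottledReporter:
    """Hypothetical error reporter that caps reports per time window."""

    def __init__(self, sink, max_per_interval=5, interval_secs=10.0,
                 clock=time.monotonic):
        self._sink = sink              # callable that writes to the downstream system
        self._max = max_per_interval
        self._interval = interval_secs
        self._clock = clock            # injectable so the behavior can be tested
        self._window_start = clock()
        self._sent = 0
        self.dropped = 0               # worth monitoring: how much was suppressed

    def report_error(self, error):
        """Forward the error if the budget allows; return whether it was sent."""
        now = self._clock()
        if now - self._window_start >= self._interval:
            # A new window begins: reset the budget.
            self._window_start = now
            self._sent = 0
        if self._sent >= self._max:
            self.dropped += 1
            return False
        self._sent += 1
        self._sink(error)
        return True
```

With `max_per_interval=2`, a user function that calls `report_error` a thousand times in a loop results in two downstream writes and 998 dropped reports for that window.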

Page 88: Your Code is Wrong

Design principle #5

Embrace recomputation

Page 89: Your Code is Wrong

“Your code is wrong” meanings

1. Designed input space differs from actual input space
2. The logic of your code is wrong
3. Requirements are constantly changing

Page 90: Your Code is Wrong

You must be able to change your code to match shifting requirements

Page 91: Your Code is Wrong

Example: blogging software

Page 92: Your Code is Wrong

New requirement: search

Page 93: Your Code is Wrong

Have to build a search index
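
As an illustration of building such an index by recomputation, here is a small sketch that derives an inverted index from the full set of stored posts. The function names and the `(post_id, text)` representation are assumptions made for the example, not part of the talk.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_search_index(posts):
    """Recompute the whole index from scratch: word -> set of post ids."""
    index = defaultdict(set)
    for post_id, text in posts:
        for word in tokenize(text):
            index[word].add(post_id)
    return dict(index)


def search(index, query):
    """Return ids of posts containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)
```

Because the index is a function of all existing posts, the same recomputation also covers posts written before the search requirement existed, and it can be rerun whenever the indexing logic changes or turns out to be wrong.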

Page 94: Your Code is Wrong
Page 95: Your Code is Wrong

Recomputation gives you so much more

Page 96: Your Code is Wrong

Views

Immutable, ever-growing data

Application

Page 97: Your Code is Wrong

Building software is no different from any other engineering

Page 98: Your Code is Wrong

The underlying challenges are the same

Page 99: Your Code is Wrong
Page 100: Your Code is Wrong
Page 101: Your Code is Wrong

What will break it?

Page 102: Your Code is Wrong

What are the limits of my dependencies?

Page 103: Your Code is Wrong

How can I add redundancy to increase robustness?

Page 104: Your Code is Wrong

Can I isolate failures?

Page 105: Your Code is Wrong

Our raw materials are ideas instead of matter

Page 106: Your Code is Wrong

Thank you