189
Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems Jonas Bonér CTO TypEsafe @jboner

Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Embed Size (px)

Citation preview

Page 1: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resiliency, Failures vs Errors,

Isolation, Delegation

and Replication

in Reactive Systems

Jonas Bonér CTO TypEsafe

@jboner

Page 2: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“But it ain’t how hard you’re hit; it’s about how hard you can get hit, and keep moving forward. How much you can take, and keep moving forward. That’s

how winning is done.”- Rocky Balboa

Page 3: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“But it ain’t how hard you’re hit; it’s about how hard you can get hit, and keep moving forward. How much you can take, and keep moving forward. That’s

how winning is done.”- Rocky Balboa

This is Fault Tolerance

Page 4: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilience

is Beyond

Fault Tolerance

Page 5: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilience

“The ability of a substance or object to spring back into shape. The capacity to recover quickly

from difficulties.”-Merriam Webster

Page 6: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Without Resilience

Nothing Else Matters

Page 7: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Without Resilience

Nothing Else Matters

Page 8: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“We can model and understand in isolation. But, when released into competitive nominally

regulated societies, their connections proliferate, their interactions and interdependencies multiply,

their complexities mushroom. And we are caught short.”

- Sidney Dekker

Drift into Failure - Sidney Dekker

Page 9: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

We Need to Study

Resilience in

Complex Systems

Page 10: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Complicated System

Page 11: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Complicated System

Page 12: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Complex System

Page 13: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Complex System

Page 14: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Complex System

Page 15: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Complicated ≠ Complex

Page 16: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“Counterintuitive. That’s [Jay] Forrester’s word to describe complex systems. Leverage points are not intuitive. Or if they are, we intuitively

use them backward, systematically worsening whatever problems we are trying to solve.”

- Donella Meadows

Leverage Points: Places to Intervene in a System - Donella Meadows

Page 17: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Page 18: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Operating at the Edge of Failure

Page 19: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

Operating at the Edge of Failure

Page 20: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Operating at the Edge of Failure

Page 21: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

Operating Point

Accident Boundary

Operating at the Edge of Failure

Page 22: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Operating at the Edge of Failure

Page 23: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

FAILURE

Accident Boundary

Operating at the Edge of Failure

Page 24: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Page 25: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Management Pressure

Towards Economic Efficiency

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Page 26: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Management Pressure

Towards Economic Efficiency

Gradient Towards Least Effort

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Page 27: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Management Pressure

Towards Economic Efficiency

Gradient Towards Least Effort

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Page 28: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Management Pressure

Towards Economic Efficiency

Gradient Towards Least Effort

Counter Gradient For More Resilience

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Page 29: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Operating at the Edge of Failure

Page 30: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Error Margin

Marginal Boundary

Operating at the Edge of Failure

Page 31: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Error Margin

Marginal Boundary

Operating at the Edge of Failure

Page 32: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Economic Failure

Boundary

Unacceptable Workload Boundary

Accident Boundary

Error Margin

Marginal Boundary

Operating at the Edge of Failure

Page 33: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Accident Boundary

Marginal Boundary

Operating at the Edge of Failure

Page 34: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Marginal Boundary

?

Operating at the Edge of Failure

Page 35: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Accident Boundary

Marginal Boundary

Page 36: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Accident Boundary

Marginal Boundary

Page 37: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Accident Boundary

Marginal Boundary

Page 38: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013

Operating at the Edge of Failure

Accident Boundary

Marginal Boundary

Page 39: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Embrace Failure

Page 40: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilience in

Social Systems

Page 41: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Simple Critical Infrastructure Map

Understanding vital services, and how they keep you safe

6 ways to die

3 sets of essential services

7 layers of PROTECTION

Dealing in Security - Mike Bennet, Vinay Gupta

Page 42: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

7 Principles for Building

Resilience in Social Systems

1. Maintain diversity & Redundancy 2. Manage connectivity 3. Manage slow variables & feedback 4. Foster complex adaptive systems thinking 5. Encourage learning 6. Broaden participation 7. Promote polycentric governance

Principles for Building Resilience: Sustaining Ecosystem Services in Social-Ecological Systems - Reinette Biggs et. al.

Page 43: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilience in

Biological Systems

Page 44: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

MeerkatsPuppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk

Page 45: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

What We Can Learn

From Biological Systems

1. Feature Diversity and redundancy 2. Inter-Connected network structure 3. Wide distribution across all scales 4. Capacity to self-adapt & self-organize

Toward Resilient Architectures 1: Biology Lessons - Michael Mehaffy, Nikos A. Salingaros

Page 46: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“Animals show extraordinary social complexity, and this allows them to adapt and

respond to changes in their environment. In three words, in the animal kingdom,

simplicity leads to complexity which leads to resilience.”

- Nicolas Perony

Puppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk

Page 47: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilience in

Computer Systems

Page 48: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems
Page 49: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“Complex systems run in degraded mode.” “Complex systems run as broken systems.”

- richard Cook

How Complex Systems Fail - Richard Cook

Page 50: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilience

is by

Design

Photo courtesy of FEMA/Joselyne Augustino

Page 51: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

We Need to

Manage Failure

Page 52: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“Post-accident attribution to a ‘root cause’ is fundamentally wrong:

Because overt failure requires multiple faults, there is no isolated ‘cause’ of an accident.”

- richard Cook

How Complex Systems Fail - Richard Cook

Page 53: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

There is No

Root Cause

Page 54: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Crash Only

Software

Crash-Only Software - George Candea, Armando Fox

Stop = Crash Safely Start = Recover Fast

Page 55: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“To make a system of interconnected components crash-only, it must be designed so that components

can tolerate the crashes and temporary unavailability of their peers. This means we require: [1] strong

modularity with relatively impermeable component boundaries, [2] timeout-based communication and

lease-based resource allocation, and [3] self-describing requests that carry a time-to-live and

information on whether they are idempotent.”- George Candea, Armando Fox

Crash-Only Software - George Candea, Armando Fox

Page 56: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Recursive Restartability

Turning the Crash-Only Sledgehammer into a Scalpel

Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox

Page 57: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Services need to accept

NO for an answer

Page 58: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

"Software components should be designed such that they can deny service for any request or call.

Then, if an underlying component can say No, apps must be designed to take No for an answer

and decide how to proceed: give up, wait and retry, reduce fidelity, etc.”

- George Candea, Armando Fox

Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox

Learn to take

NO for an answer

Page 59: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“The explosive growth of software has added greatly to systems’ interactive

complexity. With software, the possible states that a system can end up in become

mind-boggling.”

- Sidney Dekker

Drift into Failure - Sidney Dekker

Page 60: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Classification

of State

• Static Data • Scratch Data • Dynamic Data

• Recomputable • not recomputable

Page 61: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Classification

of State

• Static Data • Scratch Data • Dynamic Data

• Recomputable • not recomputable

Critical

Page 62: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

We Need a

Way Out of the

State Tar Pit

Out of the Tar Pit - Ben Moseley , Peter Marks

Page 63: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Essential State

We Need a

Way Out of the

State Tar Pit

Out of the Tar Pit - Ben Moseley , Peter Marks

Page 64: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Essential State

We Need a

Way Out of the

State Tar Pit

Out of the Tar Pit - Ben Moseley , Peter Marks

Essential Logic

Page 65: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Essential State

We Need a

Way Out of the

State Tar Pit

Out of the Tar Pit - Ben Moseley , Peter Marks

Essential Logic

Accidental State and Control

Page 66: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Traditional

State Management

Object

Critical state that needs protection

Client

Thread boundary

Page 67: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Traditional

State Management

Object

Critical state that needs protection

Client

Thread boundary

Page 68: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Traditional

State Management

Object

Critical state that needs protection

Client

Thread boundary

Page 69: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Traditional

State Management

Object

Critical state that needs protection

Client

Thread boundary

Synchronous dispatch Thread boundary

Page 70: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Traditional

State Management

Object

Critical state that needs protection

Client

Thread boundary

Synchronous dispatch Thread boundary

Page 71: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Traditional

State Management

Object

Critical state that needs protection

Client

Thread boundary

Synchronous dispatch Thread boundary

?

Page 72: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Traditional

State Management

Object

Critical state that needs protection

Client

Thread boundary

Synchronous dispatch Thread boundary

?

Utterly broken

Page 73: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Requirements for a

Sane Failure Mode

1. Contained 2. Reified—as messages 3. Signalled—Asynchronously 4. Observed—by 1-N 5. Managed

Failures need to be

Page 74: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

Page 75: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

Page 76: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

Define the message(s) the Actor should be able to respond to

Page 77: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

Define the message(s) the Actor should be able to respond to

Define the Actor class

Page 78: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

Define the message(s) the Actor should be able to respond to

Define the Actor class

Define the Actor’s behavior

Page 79: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

Page 80: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

Page 81: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

Create an Actor system

Page 82: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

Create an Actor systemActor configuration

Page 83: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

Give it a name

Create an Actor systemActor configuration

Page 84: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

Give it a nameCreate the Actor

Create an Actor systemActor configuration

Page 85: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

Give it a nameCreate the ActorYou get an ActorRef back

Create an Actor systemActor configuration

Page 86: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

Page 87: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

greeter ! Greeting("Charlie Parker")

Page 88: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

greeter ! Greeting("Charlie Parker")

Send the message asynchronously

Page 89: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Actors

case class Greeting(who: String)

class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }

val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")

greeter ! Greeting("Charlie Parker")

Page 90: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Bulkhead

Pattern

Page 91: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Bulkhead

Pattern

Page 92: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Bulkhead

Pattern

Page 93: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Enter Supervision

Page 94: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Enter Supervision

Page 95: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

The

Vending

Machine

Pattern

Page 96: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Coffee Machine

Programmer

Page 97: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Coffee Machine

Programmer

Inserts coins

Page 98: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Coffee Machine

Programmer

Inserts coins

Add more coins

Page 99: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Coffee Machine

Programmer

Inserts coins

Gets coffee

Add more coins

Page 100: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

ProgrammerCoffee

Machine

Page 101: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Programmer

Inserts coins

Coffee Machine

Page 102: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Programmer

Inserts coins

Out of coffee beans error Coffee Machine

Page 103: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Programmer

Inserts coins

Out of coffee beans error

WRONGCoffee

Machine

Page 104: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Programmer

Inserts coins

Coffee Machine

Page 105: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Programmer

Inserts coins

Out of coffee beans

failure

Coffee Machine

Page 106: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Programmer

Service Guy

Inserts coins

Out of coffee beans

failure

Coffee Machine

Page 107: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Programmer

Service Guy

Inserts coins

Out of coffee beans

failure

Adds more beans

Coffee Machine

Page 108: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

Programmer

Service Guy

Inserts coins

Gets coffee

Out of coffee beans

failure

Adds more beans

Coffee Machine

Page 109: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

ServiceClient

Page 110: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

ServiceClient

Request

Page 111: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

ServiceClient

Request

Response

Page 112: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

ServiceClient

Request

Response

Validation Error

Page 113: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

ServiceClient

Request

Response

Validation Error

Application Failure

Page 114: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

ServiceClient

Supervisor

Request

Response

Validation Error

Application Failure

Page 115: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Think Vending Machine

ServiceClient

Supervisor

Request

Response

Validation Error

Application Failure

Manages Failure

Page 116: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“Accidents come from relationships not broken parts.”

- Sidney dekker

Drift into Failure - Sidney Dekker

Page 117: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems
Page 118: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Error Kernel

Pattern

Onion-layered state & Failure management

Making reliable distributed systems in the presence of software errors - Joe Armstrong On Erlang, State and Crashes - Jesper Louis Andersen

Page 119: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Object

Critical state that needs protection

Client

Thread boundary

Page 120: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Object

Critical state that needs protection

Client

Thread boundary

Page 121: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Thread boundary

Page 122: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Thread boundary

Page 123: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Thread boundary

Page 124: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 125: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 126: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 127: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 128: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 129: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 130: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 131: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 132: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Onion Layered

State Management

Error Kernel

Object

Critical state that needs protection

Client

Supervision

Supervision

Thread boundary

Page 133: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Supervision in Akka

Every actor has a default supervisor strategy. Which can, and often should, be overridden.

class Supervisor extends Actor { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) { case _: ArithmeticException => Resume case _: NullPointerException => Restart case _: Exception => Escalate }

val worker = context.actorOf(Props[Worker], name = "worker")

def receive = { case number: Int => worker.forward(number) } }

Page 134: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Supervision in Akka

Every actor has a default supervisor strategy. Which can, and often should, be overridden.

class Supervisor extends Actor { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) { case _: ArithmeticException => Resume case _: NullPointerException => Restart case _: Exception => Escalate }

val worker = context.actorOf(Props[Worker], name = "worker")

def receive = { case number: Int => worker.forward(number) } }

Parent actor

Page 135: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Supervision in Akka

Every actor has a default supervisor strategy. Which can, and often should, be overridden.

class Supervisor extends Actor { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) { case _: ArithmeticException => Resume case _: NullPointerException => Restart case _: Exception => Escalate }

val worker = context.actorOf(Props[Worker], name = "worker")

def receive = { case number: Int => worker.forward(number) } }

Parent actor

All its children have their life-cycle managed through this declarative supervision strategy

Page 136: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Supervision in Akka

Every actor has a default supervisor strategy. Which can, and often should, be overridden.

class Supervisor extends Actor { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) { case _: ArithmeticException => Resume case _: NullPointerException => Restart case _: Exception => Escalate }

val worker = context.actorOf(Props[Worker], name = "worker")

def receive = { case number: Int => worker.forward(number) } }

Create a supervised child actor

Parent actor

All its children have their life-cycle managed through this declarative supervision strategy

Page 137: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Monitor through

Death Watch

class Watcher extends Actor { val child = context.actorOf(Props.empty, "child") context.watch(child)

def receive = { case Terminated(`child`) => … // handle child termination } }

Page 138: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Monitor through

Death Watch

class Watcher extends Actor { val child = context.actorOf(Props.empty, "child") context.watch(child)

def receive = { case Terminated(`child`) => … // handle child termination } }

Create a child actor

Page 139: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Monitor through

Death Watch

class Watcher extends Actor { val child = context.actorOf(Props.empty, "child") context.watch(child)

def receive = { case Terminated(`child`) => … // handle child termination } }

Create a child actor

Watch it

Page 140: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Monitor through

Death Watch

class Watcher extends Actor { val child = context.actorOf(Props.empty, "child") context.watch(child)

def receive = { case Terminated(`child`) => … // handle child termination } }

Create a child actor

Watch it

Receive termination signal

Page 141: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Maintain Diversity

and Redundancy

Page 142: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Maintain Diversity

and Redundancy

Page 143: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Routing

akka { actor { deployment { /service/router { router = round-robin-pool resizer { lower-bound = 12 upper-bound = 15 } } } } }

Page 144: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Cluster

akka { actor { deployment { /service/router { router = round-robin-pool resizer { lower-bound = 12 upper-bound = 15 } } }

provider = "akka.cluster.ClusterActorRefProvider" }    cluster { seed-nodes = [ “akka.tcp://[email protected]:2551", “akka.tcp://[email protected]:2552" ]   } }

Page 145: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

The Network

is Reliable

Page 146: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

The Network

is Reliable

NAT

Page 147: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

We are living in the Looming

Shadow of

Impossibility

Theorems

Page 148: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

We are living in the Looming

Shadow of

Impossibility

Theorems

CAP: Consistency is impossible

Page 149: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

We are living in the Looming

Shadow of

Impossibility

Theorems

CAP: Consistency is impossibleFLP: Consensus is impossible

Page 150: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Strong

Consistency

Is the wrong default

Page 151: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

We need Decoupling IN

Time and Space

Page 152: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilient Protocols

Depend on Asynchronous Communication

Eventual Consistency

Page 153: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilient Protocols

• are tolerant to• Message loss• Message reordering• Message duplication

Depend on Asynchronous Communication

Eventual Consistency

Page 154: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilient Protocols

• are tolerant to• Message loss• Message reordering• Message duplication

• Embrace ACID 2.0• Associative

• Commutative

• Idempotent

• Distributed

Depend on Asynchronous Communication

Eventual Consistency

Page 155: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Distributed Data

class DataBot extends Actor with ActorLogging { val replicator = DistributedData(context.system).replicator implicit val node = Cluster(context.system)

val tickTask = context.system.scheduler.schedule( 5.seconds, 5.seconds, self, Add)

val DataKey = ORSetKey[String]("key") replicator ! Subscribe(DataKey, self)

def receive = {

Page 156: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Distributed Data

class DataBot extends Actor with ActorLogging { val replicator = DistributedData(context.system).replicator implicit val node = Cluster(context.system)

val tickTask = context.system.scheduler.schedule( 5.seconds, 5.seconds, self, Add)

val DataKey = ORSetKey[String]("key") replicator ! Subscribe(DataKey, self)

def receive = { case Tick => val s = ThreadLocalRandom.current().nextInt(97, 123).toChar.toString if (ThreadLocalRandom.current().nextBoolean()) replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ + s) else replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ - s) case c @ Changed(DataKey) => val data = c.get(DataKey)

case _: UpdateResponse[_] => // ignore } }

Page 157: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“An escalator can never break: it can only become stairs. You should never see an

Escalator Temporarily Out Of Order sign, just Escalator Temporarily Stairs. Sorry for the

convenience.”- Mitch Hedberg

Page 158: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Graceful

Degradation

Page 159: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Always Rely on Asynchronous

Communication

Page 160: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Always Rely on Asynchronous

Communication

…And If not pOssiblE Always use

Timeouts

Page 161: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Circuit Breaker

Page 162: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Circuit Breaker in Akka

val breaker = new CircuitBreaker( context.system.scheduler, maxFailures = 5, callTimeout = 10.seconds, resetTimeout = 1.minute ).onOpen(…) .onHalfOpen(…)

val result = breaker.withCircuitBreaker(Future(dangerousCall()))

Page 163: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Little’s Law

������

W: Response Time

L: Queue Length

Page 164: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Little’s Law

������

Queue Length = Arrival Rate * Response Time

W: Response Time

L: Queue Length

Page 165: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Little’s Law

�������

Response Time = Queue Length / Arrival Rate

W: Response Time

L: Queue Length

Page 166: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Flow Control

Page 167: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Flow Control

Always Apply Back Pressure

Page 168: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

Page 169: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

Page 170: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge

Page 171: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge

val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._

val in = Source(1 to 10) val out = Sink.ignore

val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2))

val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

}

Page 172: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge

val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._

val in = Source(1 to 10) val out = Sink.ignore

val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2))

val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

}

Set up the context

Page 173: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge

val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._

val in = Source(1 to 10) val out = Sink.ignore

val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2))

val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

}

Set up the context

Create the Source and Sink

Page 174: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge

val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._

val in = Source(1 to 10) val out = Sink.ignore

val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2))

val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

}

Set up the context

Create the Source and Sink

Create the fan out and fan in stages

Page 175: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge

val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._

val in = Source(1 to 10) val out = Sink.ignore

val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2))

val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

}

Set up the context

Create the Source and Sink

Create the fan out and fan in stages

Create a set of transformations

Page 176: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Akka Streams

in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge

val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._

val in = Source(1 to 10) val out = Sink.ignore

val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2))

val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

}

Set up the context

Create the Source and Sink

Create the fan out and fan in stages

Create a set of transformations

Define the graph processing blueprint

Page 177: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Testing

Page 178: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

What can we learn from Arnold?

Page 179: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

What can we learn from Arnold?

Page 180: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

What can we learn from Arnold?

Blow things up

Page 181: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Shoot

Your App

Down

Page 182: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Pull the Plug

…and see what happens

Page 183: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems
Page 184: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Executive

Summary

Page 185: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

“Complex systems run in degraded mode.” “Complex systems run as broken systems.”

- richard Cook

How Complex Systems Fail - Richard Cook

Page 186: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilience

is by

Design

Photo courtesy of FEMA/Joselyne Augustino

Page 187: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

ReferencesAntifragile: Things That Gain from Disorder - http://www.amazon.com/Antifragile-Things-that-Gain-Disorder-ebook/dp/B009K6DKTS Drift into Failure - http://www.amazon.com/Drift-into-Failure-Components-Understanding-ebook/dp/B009KOKXKYHow Complex Systems Fail - http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdfLeverage Points: Places to Intervene in a System - http://www.donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/ Going Solid: A Model of System Dynamics and Consequences for Patient Safety - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1743994/Resilience in Complex Adaptive Systems: Operating at the Edge of Failure - https://www.youtube.com/watch?v=PGLYEDpNu60Dealing in Security - http://resiliencemaps.org/files/Dealing_in_Security.July2010.en.pdfPrinciples for Building Resilience: Sustaining Ecosystem Services in Social-Ecological Systems - http://www.amazon.com/Principles-Building-Resilience-Sustaining-Social-Ecological/dp/110708265X Puppies! Now that I’ve got your attention, Complexity Theory - https://www.ted.com/talks/nicolas_perony_puppies_now_that_i_ve_got_your_attention_complexity_theoryHow Bacteria Becomes Resistant - http://www.abc.net.au/science/slab/antibiotics/resistance.htmTowards Resilient Architectures: Biology Lessons - http://www.metropolismag.com/Point-of-View/March-2013/Toward-Resilient-Architectures-1-Biology-Lessons/Crash-Only Software - https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdfRecursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - http://roc.cs.berkeley.edu/papers/recursive_restartability.pdfOut of the Tar Pit - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.8928Bulkhead Pattern - http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.htmlMaking Reliable Distributed Systems in the Presence of Software Errors - http://www.erlang.org/download/armstrong_thesis_2003.pdfOn Erlang, State and Crashes - http://jlouisramblings.blogspot.be/2010/11/on-erlang-state-and-crashes.htmlAkka Supervision - http://doc.akka.io/docs/akka/snapshot/general/supervision.htmlRelease It!: Design and Deploy Production-Ready Software - https://pragprog.com/book/mnee/release-itHystrix - https://github.com/Netflix/HystrixAkka Circuit Breaker - http://doc.akka.io/docs/akka/snapshot/common/circuitbreaker.html Reactive Streams - http://reactive-streams.orgAkka Streams - http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-introduction.htmlRxJava - https://github.com/ReactiveX/RxJavaFeedback Control for Computer Systems - http://www.amazon.com/Feedback-Control-Computer-Systems-Philipp/dp/1449361692Simian Army - https://github.com/Netflix/SimianArmyGatling - http://gatling.ioAkka MultiNode Testing - http://doc.akka.io/docs/akka/snapshot/dev/multi-node-testing.html

Page 188: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

EXPERT TRAINING Delivered On-site For Spark, Scala, Akka And Play

Help is just a click away. Get in touch with Typesafe about our training courses.

• Intro Workshop to Apache Spark • Fast Track & Advanced Scala • Fast Track to Akka with Java or Scala • Fast Track to Play with Java or Scala • Advanced Akka with Java or Scala

Ask us about local trainings available by 24 Typesafe partners in 14 countries around the world.

CONTACT US Learn more about on-site training

Page 189: Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Thank

You