designing distributed scalable and reliable systems


Mauro Servienti
CTO @ Mastreeno, LTD

[email protected]
@mauroservienti

//milestone.topics.it
//github.com/mauroservienti

NServiceBus trainer/support
RavenDB trainer

Microsoft MVP – Visual C#

Side note: so little time…and I’m Italian :-)

• We have tons of things to say;
• We have tons of slides (ok, my fault :-P);
• My English is the worst thing you’ll ever hear :-)
• If you guys can hold your questions until the break, it will be much easier :-)
  – Especially for me :-P

Resources

• Slides on SlideShare: //slideshare.net/mauroservienti

• Samples on GitHub: //github.com/mauroservienti
  – Repository: Conferences
  – Branches:
    • Full-Stack-Sample
    • 2014/SkillsMatter-InTheBrain-Distributed

TENETS
Articles of faith…

None of the following is true

• Network is reliable;
• Latency is near to zero or irrelevant;
• Bandwidth is unlimited;
• Network is secure;
• Topology doesn’t change;
• Transport cost is irrelevant;
• Network is homogeneous;

DEFINITIONS
Let’s get in touch…

Consistency

The rate of agreement of observers looking at a system at a given point in time.

The more the observers agree on what they see the more the system is consistent.

Coupling

The rate of dependency among parts of a system.

The more a change to one portion of the system impacts other portions, the more coupled the system is.

Temporal Coupling

It’s a special form of coupling.

The more the unavailability of one portion of the system impacts other parts, the more temporally coupled the system is.

Scalability

The ability of a system to handle a growing amount of work in a capable manner.

Scalability is generally difficult to define, and in any particular case it is necessary to define the specific requirements for scalability.

The more we scale, the less we can rely on consistency.

“ACD/C”

Scaling can be achieved once we understand that we need to choose, and accept the consequences of our decisions. Our pillars should be:

- Asynchronous;
- Cached;
- Distributed;
- And not Consistent;

CONSISTENCY?
Are you kidding me :-)

A strange world :-)

• A new order comes in;
• The whole company is informed that a new order will be processed and we need to:
  – Understand if items are in stock;
  – Understand if we need to produce/buy something;
• At the same time production is trying to understand how to schedule the new order, but it is waiting for the warehouse, which is currently being used by the sales department to understand if the order can be shipped within the next week;

DEADLOCK

The real world…

• The obvious and only consequence of trying to scale a monolith is the collapse of the entire system;
• The real world:
  – Does not know at all what transactions are (especially distributed ones);
  – Has really low, if any, coupling among parts;
  – Has no temporal coupling at all;

Transaction boundaries

• We can no longer rely on transactions to guarantee consistency, e.g.:
  1. Update the shopping cart;
  2. Checkout;
  3. Create the order;
  4. Create the shipment request at FedEx;
• “Simply” 1, 2, 3 and 4 can live in different systems on different machines with different databases;
  – And given our tenets we now have a problem
  – And a solution... :-)

EVENTUAL CONSISTENCY

The rate of the agreement

• Will be low or really low;
• Every communication must carry its version (or timestamp) so that receivers can sort what they receive;
• Parts of the system are now free to move independently:
  – They can evolve thanks to the low coupling;
  – They can be available or not, depending on their needs, because there is no temporal coupling;
• Parts…parts…parts…and parts
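The versioning bullet above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `StockLevelChanged` message and an in-memory read model (neither comes from the talk or from any library): because each message carries its own version, a receiver can drop stale or duplicate updates even when messages arrive out of order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockLevelChanged:
    """Hypothetical message: it carries its own version so receivers can sort it."""
    item_id: str
    quantity: int
    version: int


class StockReadModel:
    """Keeps the latest known state per item and ignores stale messages."""

    def __init__(self) -> None:
        self._state = {}

    def handle(self, message: StockLevelChanged) -> bool:
        """Apply the message only if it is newer than what is already known."""
        current = self._state.get(message.item_id)
        if current is not None and current.version >= message.version:
            return False  # stale or duplicate: safe to ignore
        self._state[message.item_id] = message
        return True

    def quantity(self, item_id: str) -> int:
        return self._state[item_id].quantity


def replay(messages) -> StockReadModel:
    """Feed messages, in whatever order they arrived, into a fresh read model."""
    model = StockReadModel()
    for message in messages:
        model.handle(message)
    return model
```

Whatever the arrival order, the read model converges on the state described by the highest version it has seen for each item.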

BOUNDED CONTEXT

Bounded Context

• BCs, easily identified via the Ubiquitous Language, perfectly map to the concept of a part or of a transaction boundary;

• Within a BC the level of consistency is expected to be much higher than across BCs;

• BCs can be isolated and should be able to live on their own;

• BCs are generally a unit of deployment;

MESSAGES
We have async and distributed parts…but…how do they talk to each other?

asynchronous

We cannot rely on RPC calls: the other part is not guaranteed to be there when we need it.

asynchronous

[Diagram: Sender → Queue → Receiver. The Sender sends now; the Receiver handles the message some time in the future.]
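A minimal sketch of the idea, using Python's standard `queue.Queue` as a stand-in for a durable queue (an assumption; a real queue would survive restarts): the sender finishes its work before any receiver exists, and the receiver catches up later.

```python
import queue
import threading


def run_sender_receiver(messages):
    """Send all messages first, then let a receiver drain the queue later."""
    mailbox = queue.Queue()  # stands in for a durable queue (an assumption)
    received = []

    # "Now": the sender enqueues and moves on; no receiver is running yet.
    for message in messages:
        mailbox.put(message)
    mailbox.put(None)  # sentinel used only to stop this sketch

    def receiver():
        # "Some time in the future": the receiver comes online and catches up.
        while True:
            message = mailbox.get()
            if message is None:
                break
            received.append(message)

    worker = threading.Thread(target=receiver)
    worker.start()
    worker.join()
    return received
```

The sender never waits for the receiver, which is what removes the temporal coupling an RPC call would introduce.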

distributed

We need an atomic piece of information:

• we cannot rely on ordering;
• we cannot rely on receiving the information at the same place (it is distributed);

distributed

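[Diagram: Sender → Queue → three Receivers. Messages A, B and C are picked up by different receivers, in an order different from the one they were sent in: any assumption about ordering is broken!]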
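A minimal sketch of several receivers competing on one queue, assuming a hypothetical `OrderLineAdded` message (not from the talk). Each message is self-contained and the handler is keyed by `line_id`, so the outcome does not depend on which receiver gets which message, on arrival order, or on duplicates.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLineAdded:
    """Hypothetical self-contained message: all the needed data is inside it."""
    order_id: str
    line_id: str
    amount: int


def process_with_competing_receivers(messages, receiver_count=3):
    """Let several receivers drain one queue; return the lines seen per order."""
    mailbox = queue.Queue()
    for message in messages:
        mailbox.put(message)

    lines_by_order = {}
    lock = threading.Lock()

    def receiver():
        while True:
            try:
                message = mailbox.get_nowait()
            except queue.Empty:
                return  # nothing left for this receiver
            with lock:
                # Keyed by line_id: handling is order-independent and idempotent.
                lines = lines_by_order.setdefault(message.order_id, {})
                lines[message.line_id] = message.amount

    workers = [threading.Thread(target=receiver) for _ in range(receiver_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return lines_by_order
```

The shared dictionary stands in for whatever storage the receivers write to; the point is only that the final state is the same for every interleaving.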

non-coupled

We need messages to be as small as possible.

The larger the exchanged vocabulary, the more coupling we have.

Changing the vocabulary is hard, think twice about it.

IT’S A LONG WAY TO THE TOP IF YOU WANNA ROCK & ROLL…

Do we really need to go down that route?

problems without context are empty talks

Context & Requirements

• Requirements set the boundary of the problem;
  – A problem not identified by a requirement does not exist;
• Much more: requirements set the boundary of the solution;
  – A solution is valid when it stays within the requirements, otherwise it is over-engineering;

The question should be…
DO WE NEED TO SCALE-OUT?

“keep it simple”

A “small” e-commerce based on the traditional 3-layer architecture: page response time is slow.

• Redesign the system from scratch:
  – Move everything to RavenDB;
  – Introduce Elastic Search;
  – Add a Redis cache on Linux;
  – 6 months, £££££;
  – Can fail;
• Replace the disks on the SQL machine with 2 fast SSDs:
  – 1hr, £;
  – Observe the next 6 months :-)

“re-design”

A delivery system (e.g. your favorite retailer): we want to increase the number of orders handled per unit of time.

• Single batch per request:
  – Trying to scale this fails;
• Multi-batch per request:
  – Re-designing the system guarantees it can scale;

A new hope <cit.>

Moving away from a traditional architecture brings a lot of challenges to the table:
• Everything will be async;
• Non-availability of a system must be managed;
• Handling async failures;
• Handling of synchronized access to shared resources;
• How to correlate messages;
• How to handle versioning and upgrades;
• And more…

Please welcome:
NServiceBus

Concepts

• Message
  – An atomic piece of information that has a semantic meaning in the business;
• Component
  – Something that can handle a message;
• Service
  – A set of components grouped by context;
• Endpoint
  – A set of services grouped by:
    • SLA(s);
    • Infrastructure concerns;
    • Etc.;

Concepts #2

• Command: A message that semantically identifies something to be done (imperative):
  – "CreateNewUser";

• Event: A message that semantically identifies something that has happened and is immutable (past):
  – "NewUserCreated";

• Subscription: The notion that an endpoint is interested in an event;
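The three concepts above can be sketched with a toy in-memory bus. This is not the NServiceBus API: `InMemoryBus` and its methods are hypothetical names used only for illustration, and the "one handler per command" rule is an assumption of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateNewUser:
    """Command: imperative, asks for something to be done."""
    user_name: str


@dataclass(frozen=True)
class NewUserCreated:
    """Event: past tense, states an immutable fact."""
    user_name: str


class InMemoryBus:
    """Toy bus: one handler per command type, many subscribers per event type."""

    def __init__(self):
        self._command_handlers = {}
        self._subscribers = defaultdict(list)

    def register_handler(self, command_type, handler):
        # Assumption of this sketch: a command has exactly one handler.
        if command_type in self._command_handlers:
            raise ValueError("a command has exactly one handler")
        self._command_handlers[command_type] = handler

    def subscribe(self, event_type, subscriber):
        # Subscription: someone declares interest in an event type.
        self._subscribers[event_type].append(subscriber)

    def send(self, command):
        self._command_handlers[type(command)](command)

    def publish(self, event):
        for subscriber in self._subscribers[type(event)]:
            subscriber(event)
```

A handler for `CreateNewUser` would typically do its work and then publish `NewUserCreated`; any number of subscribers can react to that event without the publisher knowing about them.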

DEMO

Transport

• Transport(s): The technology used to connect systems and transport the message:
  – MSMQ
  – RabbitMQ
  – SQL Server
  – Azure ServiceBus & Azure Queues

• Serialization: the way messages are "serialized" in order to be transported on the chosen transport;
  – it is transparent to the transport;
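A minimal sketch of why serialization is transparent to the transport, assuming a JSON envelope and a hypothetical `ListTransport` (both are illustrative choices, not what NServiceBus does internally): the transport only ever moves opaque bytes, so the serializer can change without touching it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NewUserCreated:
    user_name: str


def serialize(message) -> bytes:
    """Wrap the message in an envelope and encode it as JSON bytes."""
    envelope = {"type": type(message).__name__, "body": asdict(message)}
    return json.dumps(envelope).encode("utf-8")


def deserialize(payload: bytes, known_types):
    """Rebuild the message from bytes using a registry of known types."""
    envelope = json.loads(payload.decode("utf-8"))
    message_type = known_types[envelope["type"]]
    return message_type(**envelope["body"])


class ListTransport:
    """Stand-in transport: it only ever sees opaque bytes."""

    def __init__(self):
        self._payloads = []

    def send(self, payload: bytes) -> None:
        self._payloads.append(payload)

    def receive(self) -> bytes:
        return self._payloads.pop(0)
```

The sending side calls `serialize` before `send`, the receiving side calls `deserialize` after `receive`; the transport in the middle needs no knowledge of message types.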

Advanced Concepts

• Saga: An orchestrator for a long-running workflow, with the ability to store the saga state across requests and to handle concurrency;

• Timeout: The way a Saga can take autonomous decisions;

• Retries: First level and Second level retry engine to handle transient failures;

• Error & Audit: error and auditing management
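The two retry levels can be sketched as below. This is an illustration under assumptions, not the NServiceBus implementation: the function name, the default counts, and the linearly growing delay are all hypothetical. The first level retries immediately, the second level waits before starting a new round, and a message that still fails is parked in an error queue.

```python
import time


class TransientError(Exception):
    """Raised by a handler when a retry might succeed."""


def handle_with_retries(message, handler, error_queue,
                        immediate_retries=2, delayed_retries=2,
                        delay_seconds=0.01, sleep=time.sleep):
    """Run the handler with two retry levels; park the message if all fail."""
    for round_number in range(delayed_retries + 1):
        if round_number > 0:
            # Second level: wait a growing amount of time before a new round.
            sleep(delay_seconds * round_number)
        for _ in range(immediate_retries + 1):
            # First level: retry immediately within the same round.
            try:
                handler(message)
                return True
            except TransientError:
                continue
    error_queue.append(message)  # give up: keep the message for inspection
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; only `TransientError` is retried, so any other exception propagates to the caller.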

SCALE OUT & HIGH AVAILABILITY
In an eventually consistent world

Mail & Mail Servers

When we send an email message:
• Is our relationship with the mail server consistent? Yes;
• Is the cross-server relationship consistent? Yes;
• Is the relationship between the last server and the recipient consistent? Yes;

Is the entire system consistent? No

But we have some guarantees:
• Every single hop/node/BC is consistent;
• If something along the way fails we will get, with the same logic, information back telling us whether our request failed or succeeded;
• Do we need distributed transactions? No :-)
• The message alone is enough to guarantee consistency, in the long run.
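The hop-by-hop guarantee can be sketched as a store-and-forward relay. `Hop` and `relay` are hypothetical names, and the model is deliberately simplified: each hop either accepts the message or refuses it, and the previous holder forgets its copy only after the handover succeeds.

```python
class Hop:
    """One mail server: it accepts a message or refuses it, nothing in between."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.stored = []

    def accept(self, message) -> bool:
        if not self.available:
            return False
        self.stored.append(message)  # local, consistent write
        return True


def relay(message, hops):
    """Move the message hop by hop; return (delivered, notice)."""
    if not hops:
        raise ValueError("at least one hop is required")
    holder = None
    for hop in hops:
        if not hop.accept(message):
            # The message stays with the previous holder; the sender is told.
            return False, f"delivery failed at {hop.name}"
        if holder is not None:
            holder.stored.remove(message)  # forget only after the handover
        holder = hop
    return True, f"delivered to {holder.name}"
```

No transaction spans more than one hop, yet at every moment the message is held by at least one hop, and the sender always receives an outcome.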

DEMO

Publish/Subscribe

• Request/Response is generally considered an anti-pattern;
• Events are the easiest way to drive the world:
  – SomethingHappened;
    • DoSomethingToMoveOn;
• Lots of possible listeners and lots of possible publishers:
  – CorrelationID
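A minimal sketch of correlation, assuming two hypothetical events from different publishers (`OrderAccepted`, `PaymentReceived`) and a hypothetical listener (`ShippingPolicy`): the correlation ID is what lets the listener tell which events belong to the same business process, whatever order they arrive in.

```python
import uuid
from dataclasses import dataclass


def new_correlation_id() -> str:
    return str(uuid.uuid4())


@dataclass(frozen=True)
class OrderAccepted:
    correlation_id: str


@dataclass(frozen=True)
class PaymentReceived:
    correlation_id: str


class ShippingPolicy:
    """Hypothetical listener: ships once both events for one order are seen."""

    def __init__(self):
        self._seen = {}    # correlation_id -> set of event types seen so far
        self.shipped = []  # correlation ids that were allowed to move on

    def handle(self, event) -> None:
        seen = self._seen.setdefault(event.correlation_id, set())
        seen.add(type(event))
        if seen == {OrderAccepted, PaymentReceived}:
            # DoSomethingToMoveOn: both facts are known for this correlation id.
            self.shipped.append(event.correlation_id)
            del self._seen[event.correlation_id]
```

The per-correlation state kept here is the kind of state a Saga would persist across messages.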

DEMO

QUESTIONS?
We are all set :-)
