The Data Dichotomy: Rethinking Data & Services with Streams

The Data Dichotomy- Rethinking the Way We Treat Data and Services

Page 1: The Data Dichotomy- Rethinking the Way We Treat Data and Services

The Data Dichotomy: Rethinking Data & Services with Streams

Page 2: The Data Dichotomy- Rethinking the Way We Treat Data and Services
Page 3: The Data Dichotomy- Rethinking the Way We Treat Data and Services

What are microservices really about?

Page 4: The Data Dichotomy- Rethinking the Way We Treat Data and Services

GUI

UI Service

Orders Service

Returns Service

Fulfilment Service

Payment Service

Stock Service

Splitting the Monolith?

Single Process / Code base / Deployment

Many Processes / Code bases / Deployments

Page 5: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Orders Service

Autonomy?

Page 6: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Orders Service

Stock Service

Email Service

Fulfilment Service

Independently Deployable

Independently Deployable

Independently Deployable

Independently Deployable

Page 7: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Independence is where services get their value

Page 8: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Allows Scaling

Page 9: The Data Dichotomy- Rethinking the Way We Treat Data and Services

but not just in terms of data/load/throughput...

Page 10: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Scaling in terms of people

What happens when we grow?

Page 11: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Companies are inevitably a collection of applications. They must work together to some degree.

Page 12: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Interconnection is an afterthought

FTP / Enterprise Messaging etc

Page 13: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Internal World

External world

External world

External World

External world

Service Boundary

Services Force Us To Consider The External World

Page 14: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Internal World

External world

External world

External World

External world

Service Boundary

External World is something we should Design For

Page 15: The Data Dichotomy- Rethinking the Way We Treat Data and Services

But this independence comes at a cost $$$

Page 16: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Orders Object

Statement Object

getOpenOrders()

Less Surface Area = Less Coupling

Encapsulate State

Encapsulate Functionality

Consider Two Objects in one address space

Encapsulation => Loose Coupling

Page 17: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Change 1 Change 2

Redeploy

Singly-deployable apps are easy

Page 18: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Orders Service (Independently Deployable)

Statement Service (Independently Deployable)

getOpenOrders()

Synchronized changes are painful

Page 19: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Services work best where requirements are isolated in a single bounded context

Page 20: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Single Sign On

Business Service

authorise()

It’s unlikely that a Business Service would need the internal SSO state / function to change

Single Sign On

Page 21: The Data Dichotomy- Rethinking the Way We Treat Data and Services

SSO has a tightly bounded context

Page 22: The Data Dichotomy- Rethinking the Way We Treat Data and Services

But business services are different

Page 23: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Catalog Authorisation

Most business services share the same core stream of facts.

Most services live in here

Page 24: The Data Dichotomy- Rethinking the Way We Treat Data and Services

The futures of business services are far more tightly intertwined.

Page 25: The Data Dichotomy- Rethinking the Way We Treat Data and Services

We need encapsulation to hide internal state. Be loosely coupled.

But we need the freedom to slice & dice shared data like any other dataset

Page 26: The Data Dichotomy- Rethinking the Way We Treat Data and Services

But data systems have little to do with encapsulation

Page 27: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Service Database

Data on inside

Data on outside

Data on inside

Data on outside

Interface hides data

Interface amplifies data

Databases amplify the data they hold

Page 28: The Data Dichotomy- Rethinking the Way We Treat Data and Services

The data dichotomy

Data systems are about exposing data. Services are about hiding it.

Page 29: The Data Dichotomy- Rethinking the Way We Treat Data and Services

This affects services in one of two ways

Page 30: The Data Dichotomy- Rethinking the Way We Treat Data and Services

getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)

Either (1) we constantly add to the interface, as datasets grow

Page 31: The Data Dichotomy- Rethinking the Way We Treat Data and Services

getOrder(Id)

getOrder(UserId)

getAllOpenOrders()

getAllOrdersUnfulfilled(Id)

getAllOrders()

(1) Services can end up looking like kooky home-grown databases
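A minimal Python sketch of option (1), under stated assumptions: the `Order` fields, the `OrdersService` class and its method names are hypothetical, loosely modelled on the calls listed on this slide. It only illustrates how every new consumer requirement widens the interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    user_id: int
    fulfilled: bool
    delivery_location: str
    value: float


class OrdersService:
    """Hypothetical service that hides its orders behind query methods."""

    def __init__(self, orders):
        self._orders = list(orders)

    def get_order(self, order_id):
        return next((o for o in self._orders if o.order_id == order_id), None)

    def get_all_open_orders(self):
        return [o for o in self._orders if not o.fulfilled]

    # Each new consumer requirement adds another parameter or method,
    # so the interface drifts towards a hand-written query language.
    def get_open_orders(self, delivery_location=None, min_value=None):
        result = self.get_all_open_orders()
        if delivery_location is not None:
            result = [o for o in result if o.delivery_location == delivery_location]
        if min_value is not None:
            result = [o for o in result if o.value > min_value]
        return result


service = OrdersService([
    Order(1, 10, False, "CA", 150.0),
    Order(2, 11, False, "NY", 80.0),
    Order(3, 10, True, "CA", 300.0),
])
open_ca_over_100 = service.get_open_orders(delivery_location="CA", min_value=100)
```

A database would answer the same question with one ad hoc query; here each variation needs a code change and a redeploy of the owning service.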

Page 32: The Data Dichotomy- Rethinking the Way We Treat Data and Services

...and DATA amplifies this service boundary problem

Page 33: The Data Dichotomy- Rethinking the Way We Treat Data and Services

(2) Give up and move whole datasets en masse

getAllOrders()

Data Copied Internally

Page 34: The Data Dichotomy- Rethinking the Way We Treat Data and Services

This leads to a different problem

Page 35: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Data diverges over time

Orders Service

Stock Service

Email Service

Fulfilment Service

Page 36: The Data Dichotomy- Rethinking the Way We Treat Data and Services

The more mutable copies, the more data will diverge over time
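A small Python sketch of how this divergence can arise, assuming a hypothetical bulk export in the style of getAllOrders(). The dictionaries, keys and values are invented for illustration.

```python
import copy

# The owning service's state (illustrative data only).
orders_service = {"order-1": {"status": "OPEN", "address": "1 Main St"}}

# Bulk export: each consuming service takes its own mutable copy.
email_copy = copy.deepcopy(orders_service)
fulfilment_copy = copy.deepcopy(orders_service)

# Each side then changes its copy independently.
orders_service["order-1"]["address"] = "2 High St"  # updated at the source
fulfilment_copy["order-1"]["status"] = "SHIPPED"    # local change, never sent back

# Count how many different versions of the same fact now exist.
distinct_versions = {
    tuple(sorted(c["order-1"].items()))
    for c in (orders_service, email_copy, fulfilment_copy)
}
```

After two unrelated updates there are three versions of one order, and no copy is obviously the correct one.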

Page 37: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Cycle of inadequacy:

Start here: Nice neat services

Service contract too limiting

Can we change both services together, easily? (YES / NO)

Broaden Contract (Eek $$$)

Frack it! Just give me ALL the data

Data diverges (many different versions of the same facts)

Let's encapsulate

Let's centralise to one copy

Is it a shared database? (YES / NO)

Page 38: The Data Dichotomy- Rethinking the Way We Treat Data and Services

These forces compete in the systems we build

Divergence Accessibility Encapsulation

Page 39: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Shared database

Service Interfaces

Better Accessibility

Better Independence

No perfect solution

Page 40: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Is there a better way?

Page 41: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Data on the Inside

Data on the Outside

Data on the Outside

Data on the Outside

Data on the Outside

Service Boundary

Page 42: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Make data-on-the-outside a 1st class citizen

Page 43: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Separate Reads & Writes with Immutable Streams

Page 44: The Data Dichotomy- Rethinking the Way We Treat Data and Services

CQRS

•  Writes => “Normalized” into each service

•  Reads => “Denormalized” into Services

•  One canonical set of streams

Page 45: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Orders Service

Stock Service

Orders

Fulfilment Service

UI Service

Fraud Service

Writes enter via the Orders Service (which manages the workflow of orders)

Page 46: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Orders Service

Stock Service

Orders

Fulfilment Service

UI Service

Fraud Service

Readers use streams, materialized in each service

Page 47: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Orders Service

Stock Service

Orders

Fulfilment Service

UI Service

Fraud Service

Important: data is not retained, it’s just a view over those centralized streams
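The write/read split on the last few slides can be sketched in plain Python. This is an in-memory illustration, not Kafka's API: `Log`, `OrdersService`, `items_on_order` and the event fields are all hypothetical names chosen for the example.

```python
class Log:
    """Append-only list standing in for a shared, retained stream."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset=0):
        return list(self._events[offset:])


class OrdersService:
    """Write side: the only service that appends to the orders stream."""

    def __init__(self, log):
        self._log = log

    def place_order(self, order_id, item, quantity):
        self._log.append(
            {"type": "OrderPlaced", "order_id": order_id,
             "item": item, "quantity": quantity}
        )


def items_on_order(log):
    """Read side: a view derived from the stream for a hypothetical stock service."""
    totals = {}
    for event in log.read_from(0):
        if event["type"] == "OrderPlaced":
            totals[event["item"]] = totals.get(event["item"], 0) + event["quantity"]
    return totals


orders_log = Log()
orders = OrdersService(orders_log)
orders.place_order(1, "widget", 2)
orders.place_order(2, "gadget", 1)
orders.place_order(3, "widget", 5)

stock_view = items_on_order(orders_log)
rebuilt_view = items_on_order(orders_log)  # the view can be discarded and rebuilt
```

Because the view is a pure function of the stream, a reading service can throw it away and rebuild it at any time; the stream remains the single canonical copy.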

Page 48: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Use a Stream Processing tool-kit

Page 49: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Kafka is a Streaming Platform

Page 50: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Kafka: a Streaming Platform

The Log Connectors Connectors

Producer Consumer

Streaming Engine

Page 51: The Data Dichotomy- Rethinking the Way We Treat Data and Services

The Log

Page 52: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Shard on the way in

Producing Services

Kafka

Consuming Services

Page 53: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Each shard is a queue

Producing Services

Kafka

Consuming Services

Page 54: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Consumers share load

Producing Services

Kafka

Consuming Services
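A rough Python sketch of the last three slides: keys are sharded on the way in, each shard keeps its own order, and the consumers in a group divide the shards between them. The hash function, partition count and round-robin assignment are simplifying assumptions for illustration, not Kafka's actual partitioner or group protocol.

```python
import zlib

NUM_PARTITIONS = 4  # assumed partition count


def partition_for(key: str) -> int:
    # Stable hash so the same key always lands in the same partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign(partition_ids, consumers):
    """Round-robin the partitions across the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partition_ids):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Producing side: shard on the way in.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("order-1", "created"), ("order-2", "created"), ("order-1", "paid")]:
    partitions[partition_for(key)].append((key, value))

# Each shard is a queue: events for one key stay in order.
order_1_events = [v for k, v in partitions[partition_for("order-1")] if k == "order-1"]

# Consuming side: two instances of a service share the load.
assignment = assign(list(range(NUM_PARTITIONS)), ["consumer-a", "consumer-b"])
```

Adding a third consumer only changes the assignment; producers and the events themselves are untouched.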

Page 55: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Scaling becomes a concern of the broker, not the service

Page 56: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Load Balanced Services

Page 57: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Fault Tolerant Services

Page 58: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Build ‘Always On’ Services

Rely on Fault Tolerant Broker

Page 59: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Reset to any point in the shared narrative

Rewind & Replay
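Rewind and replay comes down to a consumer that owns its read position over a retained log. A minimal Python sketch, assuming a plain list as the log and a hypothetical `Consumer` class (these are not the Kafka client's classes):

```python
class Consumer:
    """Reads a retained log from an offset that the consumer itself controls."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        records = self._log[self.offset:]
        self.offset = len(self._log)
        return records

    def seek(self, offset):
        # Rewinding is only a change of position; the log is not modified.
        self.offset = offset


log = ["order-created", "order-paid", "order-shipped"]
consumer = Consumer(log)

first_pass = consumer.poll()  # reads everything once
consumer.seek(1)              # rewind to an earlier point
replayed = consumer.poll()    # replay from there
```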

Page 60: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Compacted Log (retains only latest version)

Version 3

Version 2

Version 1

Version 2

Version 1

Version 5

Version 4

Version 3

Version 2

Version 1
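The idea of a compacted log can be sketched in a few lines of Python. The function below is an illustration of the end result only (the latest record per key survives); it makes no claim about how or when a broker actually performs compaction, and the keys and values are invented.

```python
def compact(log):
    """Keep only the latest record for each key, preserving log order."""
    latest_index = {key: i for i, (key, _) in enumerate(log)}
    return [record for i, record in enumerate(log) if latest_index[record[0]] == i]


log = [
    ("A", "v1"), ("B", "v1"), ("A", "v2"),
    ("C", "v1"), ("B", "v2"), ("A", "v3"),
]
compacted = compact(log)
```

A service that reads the compacted log from the start sees the current value of every key without replaying superseded versions.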

Page 61: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Service Backbone Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Retentive

The Log Connectors Connectors

Producer Consumer

Streaming Engine

Page 62: The Data Dichotomy- Rethinking the Way We Treat Data and Services

A place to keep the data-on-the-outside as an immutable narrative

Page 63: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Now add Stream Processing

Page 64: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Kafka’s Streaming API A general, embeddable streaming engine

The Log Connectors Connectors

Producer Consumer

Streaming Engine

Page 65: The Data Dichotomy- Rethinking the Way We Treat Data and Services

What is Stream Processing?

Page 66: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Max(price) From orders where ccy=‘GBP’ over 1 day window emitting every second

DB Engine designed to process streams
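The query on this slide can be approximated in plain Python as a sliding-window aggregate. This is a sketch under assumptions: timestamps are in seconds and arrive in order, the caller decides when to emit, and `WindowedMax` is a hypothetical name rather than part of any streaming library.

```python
from collections import deque

WINDOW_SECONDS = 24 * 60 * 60  # the 1 day window from the query


class WindowedMax:
    """Max(price) over GBP orders seen within the last window."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self._window = window_seconds
        self._events = deque()  # (timestamp, price) in arrival order

    def add(self, timestamp, ccy, price):
        if ccy == "GBP":  # the "where ccy = 'GBP'" filter
            self._events.append((timestamp, price))

    def emit(self, now):
        # Drop events that have fallen out of the window, then aggregate.
        while self._events and self._events[0][0] <= now - self._window:
            self._events.popleft()
        return max((price for _, price in self._events), default=None)


query = WindowedMax()
query.add(0, "GBP", 10.0)
query.add(10, "USD", 99.0)
query.add(20, "GBP", 7.0)

current_max = query.emit(now=30)
later_max = query.emit(now=WINDOW_SECONDS + 5)
```

Calling `emit` once per second would match the "emitting every second" clause; a streaming engine runs that loop and manages the window state for you.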

Page 67: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Features: similar to database query engine

Join Filter Aggregate

View

Window

Page 68: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Avg(o.time – p.time) From orders, payment, user Group by user.region over 1 day window emitting every second

Stateful Stream Processing

Streams

Stream Processing Engine

Derived “Table”

Table Domain Logic

Page 69: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Stateful Stream Processing

stream

Compacted stream

Join

Stream Data

Stream-Tabular Data

Domain Logic

Tables /

Views

Kafka

Event Driven Service
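A minimal Python sketch of the join in this diagram: a compacted stream is folded into a table, and each event on another stream is enriched from it. The function names, keys and sample data are hypothetical, and the table is built fully before the join for simplicity, whereas a streaming engine would keep it updated continuously.

```python
def materialize(compacted_stream):
    """Fold a keyed changelog into a table holding the latest value per key."""
    table = {}
    for key, value in compacted_stream:
        table[key] = value
    return table


def join_stream_to_table(stream, table):
    """Enrich each stream event with the table row that shares its key."""
    for key, event in stream:
        if key in table:  # inner join: events without a match are skipped
            yield {"customer": key, "order": event, "email": table[key]}


customers_changelog = [
    ("c1", "a@example.com"),
    ("c2", "b@example.com"),
    ("c1", "a2@example.com"),  # later update for c1
]
orders_stream = [("c1", "order-1"), ("c3", "order-2"), ("c2", "order-3")]

customers = materialize(customers_changelog)
enriched = list(join_stream_to_table(orders_stream, customers))
```

The domain logic of an event driven service then works on `enriched` records without calling the customer service at all.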

Page 70: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Join shared streams from many other services

Email Service

Legacy App Orders Payments Stock

KSTREAMS

Kafka

Page 71: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Shared State is only cached in the service, so there is no way to diverge

Email Service

Orders Payments Stock

Lives here !

Cached here!

KSTREAMS

Page 72: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Tool for accessing shared, retentive streams

customer orders

catalogue Payments

Page 73: The Data Dichotomy- Rethinking the Way We Treat Data and Services

But sometimes you have to move data

Page 74: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Replicate it, so both copies are identical

Keep Immutable

Page 75: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Take only the data you need today

Page 76: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Oracle

Confluent’s Connectors make this easier

The Log

Connector Connector

Producer Consumer

Streaming Engine

Cassandra

Page 77: The Data Dichotomy- Rethinking the Way We Treat Data and Services

So…

Page 78: The Data Dichotomy- Rethinking the Way We Treat Data and Services

When building services consider more than just

REST

Page 79: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Remember: the data dichotomy

Data systems are about exposing data. Services are about hiding it.

Page 80: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Shared data is a reality for most organisations

Catalog

Page 81: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Embrace the data that both lives and flows between services

Page 82: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Avoid data services that have complex / amplifying interfaces

Page 83: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Share Data via Immutable Streams

Embed Function into each service

Distributed Log

Event Driven Service

Page 84: The Data Dichotomy- Rethinking the Way We Treat Data and Services

(A) Shared Database (B) Messaging

(C) Service Interfaces

Data & Function Centralized

Data & Function in Owning Services

Data & Function Everywhere

Middle-ground between shared database, messaging and service interfaces

Middle Ground

Page 85: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Shared database

Service Interfaces

Better Accessibility

Better Independence

Balance the Data Dichotomy

Page 86: The Data Dichotomy- Rethinking the Way We Treat Data and Services

RIDER Principles

•  Reactive - Leverage Asynchronicity. Focus on the now.

•  Immutable - Build an immutable, shared narrative.

•  Decentralized - No GOD services. Receiver driven.

•  Evolutionary - Use only the data you need today.

•  Retentive - Rely on the central event log, indefinitely.

Page 87: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Event Driven Service Backbone

The Log (streams & tables)

Ingestion Services

Services with Polyglot Persistence

Simple Services

Streaming Services

Page 88: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Tune in next time

• How do we actually build these things?

• Putting the micro into microservices

Page 89: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Discount code: kafcom17

Use the Apache Kafka community discount code to get $50 off

www.kafka-summit.org

Kafka Summit New York: May 8

Kafka Summit San Francisco: August 28

Presented by

Page 90: The Data Dichotomy- Rethinking the Way We Treat Data and Services

Twitter: @benstopford

Blog of this talk at http://confluent.io/blog