To be relational, or not to be relational? That's NOT the question!

DESCRIPTION

With distributed computing being a reality, whether between clouds or within data center walls, programmers and architects are facing new challenges. In traditional enterprise development, we've been relying on the database to be the keeper of our data and constraints, but since that pattern didn't scale, NoSQL recently came to our rescue. Now what does all that mean? And more importantly, what does it mean to you and your application? This session aims to guide you on the journey from using a single relational database server, to making it as scalable as possible, to choosing proper alternative solutions: explaining the pros and cons of the available tools depending on your needs, their impact on your architecture, the concepts and algorithms under the hood, and when NoSQL should really be Not Only SQL. This session is for developers and architects who plan on delivering applications in the next couple of years, as the future is now and you can't afford to get stuck on Hamletic questions!

Page 1: To be relational, or not to be relational? That's NOT the question!

sergio bossa & alex snaps - @sbtourist & @alexsnaps

To be relational, or not? That's not the question!

@sbtourist aka sergio bossa

@alexsnaps aka alex snaps

Agenda

Act 1 – Relational databases: a difficult love affair.

Interlude 1 – Problems in the modern era: big data and the CAP theorem.

Act 2 – Non-relational databases: love is in the air.

Interlude 2 – Relational or non-relational? Not the correct question.

Act 3 – Building scalable apps.

The End

About us

Sergio Bossa
  Software Engineer at Bwin Italy.
  Long-time open source contributor.
  (Micro)Blogger - @sbtourist.

Alex Snaps
  Software engineer at Terracotta ... after 10 years of industry experience.
  www.codespot.net - @alexsnaps

Act I

Relational databases

A difficult love affair …

The relational model

Defines constraints
  Finite model, aka relation variable, relation or table
  Candidate keys
  Foreign keys

Queries are relations themselves
  Heading & body

SQL differs slightly from the "pure" relational model

The ACID guarantees

Let people easily reason about the problem.

Atomic. We see all changes, or no changes at all.

Consistent. Changes respect our rules and constraints.

Isolated. We see all changes as independently happening.

Durable. We keep the effect of our changes forever.

Fits a simplified model of our reality.
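
To make "all changes, or no changes at all" concrete, here is a minimal sketch using SQLite from the Python standard library. The `account` table, its `CHECK` constraint, and the `transfer` helper are hypothetical names chosen for illustration; they are not from the talk.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a rule: balances never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit was undone too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A transfer of 500 from an account holding 100 fails on the second statement, and the rollback also undoes the credit that had already been applied: atomicity and consistency working together.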

The SQL ubiquity

SQL is everywhere
  Still trying to figure out why my blog uses a relational database, to be honest

SQL is known by everyone
  Raise your hand if you've never written a SQL query

... and if you don't want to
  ORMs are there for you
  ActiveRecord if you're into Rails

Simple persistence for all our objects!

Interlude 1

Problems of the Modern Era

Big Data

What's Big Data for you?

Not a question of quantity.
  Gigabytes?
  Terabytes?
  Petabytes?

It's all about supporting your business growth.
  Growth in terms of schema evolution.
  Growth in terms of data storage.
  Growth in terms of data processing.

Is your data stack capable of handling such growth?

CAP Theorem

Year 2000: Formulated by Eric Brewer.

Year 2002: Proved by Gilbert and Lynch.

Nowadays: A religion for many distributed system guys.

CAP Theorem in a few words:
  Consistency.
  Availability.
  Partition-Tolerance.
  Pick (at most) two.

More later.

Act II

Non-relational databases

Love is in the air

Non-relational databases

From the origins to the current explosion...

How do they differ?

Data Model (1)

Column-family.
  Key-identified rows with a sparse number of columns.
  Columns grouped in families.
  Multiple families for the same key.
  Dynamically add/remove columns.
  Efficiently access same-family columns.

Data Model (2)

Graph.
  Vertices represent your data.
  Edges represent meaningful relations between nodes.
  Key/Value properties attached to both.
  Indexed properties.
  Efficient traversal.
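
As a rough illustration of the graph model, the sketch below keeps properties on vertices and edges and walks labelled edges breadth-first. The `Graph` class and its method names are made up for this example and do not mirror any particular graph database API.

```python
from collections import deque

class Graph:
    def __init__(self):
        self.vertices = {}  # vertex id -> key/value properties
        self.edges = {}     # vertex id -> list of (label, target id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.edges.setdefault(vid, [])

    def add_edge(self, src, label, dst, **props):
        self.edges[src].append((label, dst, props))

    def traverse(self, start, label, max_depth):
        """Vertices reachable from start via `label` edges in at most max_depth hops."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            vid, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the hop limit
            for edge_label, target, _props in self.edges[vid]:
                if edge_label == label and target not in seen:
                    seen.add(target)
                    found.append(target)
                    queue.append((target, depth + 1))
        return found
```

Each hop follows adjacency lists directly instead of joining tables, which is the intuition behind "efficient traversal".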

Data Model (3)

Document.
  Schemaless documents, with denormalized data.
  Stored as a whole unit.
  Clients can update/query contents.

Data Model (4)

Key/Value.
  Opaque values.
  Maximize flexibility.
  Efficiently store and retrieve by key.

Consistency Model (1)

Strict (Sequential) Consistency.
  Every read/write operation acts on either:
    The last value read.
    The last value written.

Consistency Model (2)

Eventual Consistency.
  All read/write operations will eventually reach a consistent state.
  Stale data may be served.
  Versions may diverge.
  Repair may be needed.
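
A toy sketch of stale reads and repair follows. It assumes a simple "highest version wins" rule, which is only one possible policy (real stores often use vector clocks or similar); `Replica` and `repair` are hypothetical names.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        """Accept the write only if it is newer than what this replica holds."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def repair(replicas, key):
    """Anti-entropy pass: copy the newest known version of key to every replica."""
    entries = [r.data[key] for r in replicas if key in r.data]
    if not entries:
        return
    version, value = max(entries, key=lambda entry: entry[0])
    for replica in replicas:
        replica.write(key, value, version)
```

If a version 2 write reaches only one replica, the other keeps serving version 1 (stale data) until `repair` runs, after which both agree.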

Partitioning (1)

Client-side partitioning.
  Every server is self-contained.
  Clients partition data per-request.
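
A minimal sketch of client-side partitioning, assuming a hash-modulo placement rule. The servers are plain dictionaries standing in for remote stores, and `pick_server` and `PartitioningClient` are illustrative names.

```python
import hashlib

def pick_server(key, server_count):
    """Map a key to a server index with a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % server_count

class PartitioningClient:
    def __init__(self, servers):
        # Each "server" is a dict; the servers know nothing about each other.
        self._servers = servers

    def put(self, key, value):
        self._servers[pick_server(key, len(self._servers))][key] = value

    def get(self, key):
        return self._servers[pick_server(key, len(self._servers))].get(key)
```

The weakness of the modulo rule is that changing the number of servers moves most keys to a different server.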

Partitioning (2)

Server-side partitioning.
  Servers automatically partition data.
  Ring-based consistent hashing is the most used approach.
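
The ring idea can be sketched as follows. This is a simplified illustration, not the implementation of any specific store: the `HashRing` class, the use of MD5, and the 64 virtual points per node are all assumptions.

```python
import bisect
import hashlib

def _hash(value):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Several virtual points per node spread its share around the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash("%s#%d" % (node, i)), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key):
        """Return the node owning the first ring position at or after the key's hash."""
        if not self._ring:
            raise LookupError("empty ring")
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a node leaves, only the keys it owned move to another node; every other key keeps its owner, which is the property that modulo-based placement lacks.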

Replication (1)

Master/Slave.
  Master propagates changes to slaves.
    Log replication.
    Statement replication.
  Slaves may take over as master in case of failure.

Replication (2)

N-Replication.
  Nodes are all peers.
    No master, no slaves.
  Each node replicates its data to a subset (N) of nodes.
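
One common way to pick the subset of N peers is to walk a hash ring clockwise from the key's position. The slide does not say how the subset is chosen, so the ring-based rule and the `replicas_for` helper below are assumptions made for illustration.

```python
import bisect
import hashlib

def _hash(value):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

def replicas_for(key, nodes, n):
    """Return up to n distinct nodes: the key's owner plus its clockwise successors."""
    ring = sorted((_hash(node), node) for node in nodes)
    start = bisect.bisect_right(ring, (_hash(key), ""))
    count = min(n, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

Every node can compute the same list independently, so no master is needed to decide where copies live.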

Processing

A distributed system is built by:
  Moving data toward its behavior.
  ... or ...
  Moving behavior toward its data.

An efficient distributed system is built by:
  Moving behavior toward its data.

Map/Reduce is the most common and efficient.
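
A minimal word-count sketch of the Map/Reduce shape: the map step runs where each partition of data lives and returns a small partial result, and only those partials travel to the reduce step. The function names and the in-process "partitions" are illustrative assumptions.

```python
from collections import Counter

def map_phase(partition):
    """Runs next to the data: count words in one partition of documents."""
    counts = Counter()
    for document in partition:
        counts.update(document.lower().split())
    return dict(counts)

def reduce_phase(partials):
    """Merge the per-partition counts into one global result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

def word_count(partitions):
    return reduce_phase(map_phase(partition) for partition in partitions)
```

Shipping the small `map_phase` function to each partition is "moving behavior toward its data"; pulling every document to one place would be the opposite.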

CAP - Problem

CAP Theorem.
  Consistency.
  Availability.
  Partition tolerance.
  Pick two.
  Do you remember?

Makes sense only when dealing with partitions/failures ...

CAP - Trade-offs

Consistency + Availability.
  Requests will wait until partitions heal.
    Full consistency.
    Full availability.
    Max latency.

Consistency + Partition tolerance.
  Some requests will act on partial data.
  Some requests will be refused.
    Full consistency.
    Reduced availability.
    Min latency.

Availability + Partition tolerance.
  All requests will be fulfilled.
    Sacrifice consistency.
    Max availability.
    Min latency.

Interlude II

Relational or non-relational? Not the correct question!

Relational or non-relational? Not the correct question!

Freedom to build the right solution for your problems.

Freedom to scale your solution as your problems scale.

Know your use case.
  Understand the problem domain, then choose technology.

Know your data.
  Understand your data and data access patterns, then choose technology.

Know your tools.
  Understand available tools, don't go blind.

Pick the simple solution.
  Choose the simplest technology that works for your problem.
  Build on that.

Act III

Building scalable apps

Building scalable apps

Relational databases

Scaling out...

Adding app servers will work...
... for a little while!

But at some point we need to scale the database as well.

[Figure: three application servers sending READ and WRITE traffic to a single RDBMS.]

Master-Slave

One master
  gets all the writes
  replicates to slaves

Slaves (& master)
  get the read operations

We didn't really scale writes
Static topology
SPOF remains

Master-Master

Multiple masters
  Writes & reads go to all participants

Writes are replicated to masters
  Synchronously
    Expensive 2PC
  Asynchronously
    Conflicts have to be resolved

Complex
Static topology
Limited performance again
Solved SPOF, sort of…

Vertical Partitioning

Data is split across multiple database servers based on
  The functional area

Joins are moved to the application
  Not relational anymore

SPOF back
What about one functional area growing "out of hand"?
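
To show what "joins are moved to the application" can look like, here is a sketch with two separate SQLite databases, one per functional area. The `customer` and `orders` schemas and the helper names are invented for this example.

```python
import sqlite3

def make_db(schema, insert_sql, rows):
    """Build one in-memory database standing in for one functional area."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn

def orders_with_customer_names(customer_db, order_db):
    """Join orders to customers in application code: one query per database."""
    orders = order_db.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id").fetchall()
    customer_ids = sorted({customer_id for _, customer_id, _ in orders})
    if not customer_ids:
        return []
    placeholders = ",".join("?" for _ in customer_ids)
    names = dict(customer_db.execute(
        "SELECT id, name FROM customer WHERE id IN (%s)" % placeholders,
        customer_ids))
    return [(order_id, names.get(customer_id), total)
            for order_id, customer_id, total in orders]
```

Neither database can enforce the foreign key any more; a missing customer simply shows up as `None` in the result, which is one reading of "not relational anymore".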

Horizontal partitioning

Data is split across multiple database servers based on
  Key sharding

Joins are moved to the application
  Not relational anymore

SPOF back
What about one functional area growing "out of hand"?
Routing required
  Where's Order#123?
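
One way to answer "Where's Order#123?" is a range-based routing table kept by the application. The `RangeRouter` class, the boundary values, and the shard names below are hypothetical; hash-based routing is an equally valid choice.

```python
import bisect

class RangeRouter:
    def __init__(self, boundaries, shards):
        # boundaries[i] is the first order id stored on shards[i + 1],
        # so there must be exactly one more shard than boundaries.
        if len(shards) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        self._boundaries = list(boundaries)
        self._shards = list(shards)

    def shard_for(self, order_id):
        """Return the shard holding the given order id."""
        return self._shards[bisect.bisect_right(self._boundaries, order_id)]
```

With boundaries `[1000, 2000]` and shards `["db-0", "db-1", "db-2"]`, order 123 routes to `db-0` and order 1000 to `db-1`.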

Building scalable apps

Caching

Caching

If going to the database is so expensive...
... we should just avoid it!

Put a cache in front of the database

Data remains closer to processing unit
If needed, distribute
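
A minimal sketch of a cache in front of the database, assuming a read-through design with least-recently-used eviction. `ReadThroughCache` and its `loader` callback are illustrative names, not the Ehcache API.

```python
from collections import OrderedDict

class ReadThroughCache:
    def __init__(self, loader, capacity=128):
        self._loader = loader          # called on a miss, e.g. a database query
        self._capacity = capacity
        self._entries = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as recently used
            return self._entries[key]
        value = self._loader(key)           # miss: go to the database once
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return value

    def invalidate(self, key):
        """Drop an entry, for example after the underlying row changed."""
        self._entries.pop(key, None)
```

Repeated reads of a hot key hit the database only once until the entry is evicted or invalidated.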

Distributed Caching

What if data isn't perfectly partitioned?
How do we keep this all in sync?
Peer-to-peer?

[Figure: three application servers, each with its own cache, sending READ and WRITE traffic to the RDBMS.]

Distributed Caching

Cached data remains close to the processing unit

Central unit orchestrates it all

[Figure: three application servers, each with an L1 cache, a shared L2 cache, and the RDBMS.]

Distributed Caching

[Figure: the L1/L2 cache tiers in front of the RDBMS, shown next to the CPU analogy (Core, L1, L2, RAM), with the axes SLOWER/FASTER and LARGER/SMALLER.]

Distributed Caching

We've scaled our reads
But what about writes?

Write-behind Cache

Rather than write changes directly to the slowest participant
  Write to fastest durable store (persistent queue)
    required for recovery in the face of failure
  Only write to database later
    in batches and/or coalesced
    while still controlling
      the lag
      the load on the database

In a distributed environment, handling failures means we enforce "happens at least once", which loosens the contract vs. "once and only once"!
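
The coalescing and at-least-once behaviour can be sketched as below. This is a simplification: the pending queue here is an in-memory dict, whereas the slide calls for a durable store, and `WriteBehindCache` and `write_batch` are hypothetical names rather than the Ehcache writer API.

```python
class WriteBehindCache:
    def __init__(self, write_batch):
        self._write_batch = write_batch  # callable taking {key: value}; the slow store
        self._cache = {}
        self._pending = {}  # key -> latest value; repeated writes coalesce here

    def put(self, key, value):
        """Update the cache now; the database sees the change on a later flush."""
        self._cache[key] = value
        self._pending[key] = value

    def get(self, key):
        return self._cache.get(key)

    def pending_count(self):
        return len(self._pending)

    def flush(self):
        """Write all pending changes in one batch and return how many were sent."""
        if not self._pending:
            return 0
        batch = dict(self._pending)
        # If this raises, the pending entries are kept and sent again on the
        # next flush: at-least-once, so the database write should be idempotent.
        self._write_batch(batch)
        self._pending.clear()
        return len(batch)
```

Calling `flush` on a timer or when the queue reaches a size limit is how the lag and the database load would be controlled.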

Write-through

[Figure: write-through sequence between Application code, Cache, and RDBMS.]

Write-behind

[Figure: write-behind sequence between Application code, Cache, one or more Writers, and RDBMS.]

Building scalable apps

Non-relational databases

The case for non-relationals

We've seen how to scale our relational database.

We've seen how to add caching.

We've seen how to make caching scale.

So why go non-relational?
  A matter of use case.

Rich Data

Frequent schema changes.
  Relational databases cannot easily handle table modifications.
  Column, Document and Graph databases can.

Tracking/Processing of large relations.
  Relational databases keep and traverse table relations by foreign-key joins.
  Joins are expensive.
  Graph databases provide cheap and fast traversal operations.

Runtime Data

Poorly structured data.
  Relational model doesn't fit unstructured data.
    Unless you want BLOBs.
  Non-relational databases provide more flexible models.

Throw-away data.
  Relational databases provide maximum data durability.
  But not all kinds of data are the same.
  When you can afford to possibly lose some data, non-relational databases let you trade durability for higher flexibility and performance.

Massive Data

Large quantity of data.
  Relational databases are typically single instance.
    Otherwise they cost too much.
  Choose a partitioned non-relational database.

High volume of writes.
  Scaling writes with relational databases is difficult.
    Even if using write-behind caching.
  Choose a partitioned non-relational database.
  Eventual consistency may also help.

Complex, aggregated queries.
  Complex aggregations and queries are expensive in standard relational databases.
    Remember joins?
  Choose a non-relational database supporting distributed data processing.
    Hint: Map/Reduce.

Building scalable apps

Multi-paradigm

A case for multi-paradigm: CQRS

CQRS - Explained

Command Store.
  Strictly modeled after the problem domain.
  Commands model domain changes.
  Domain changes cause events to be published.

Query Store.
  Fed by published events.
  Strictly modeled after the user interface.
  Possibly denormalized to accommodate queries.

Offline Store.
  Fed by published events.
  Strictly modeled after processing needs.
    Business Intelligence.
    Reporting.
    Statistical aggregations, …
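
The flow from commands to events to a query-side view can be sketched in a few lines. The order-placing domain, the in-process `EventBus`, and all class names are assumptions made for illustration; a real system would publish events over a messaging layer.

```python
class EventBus:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)

class CommandStore:
    """Write side: enforces domain rules, then publishes what happened."""
    def __init__(self, bus):
        self._bus = bus
        self._orders = {}

    def place_order(self, order_id, customer, amount):
        if order_id in self._orders:
            raise ValueError("duplicate order")
        self._orders[order_id] = {"customer": customer, "amount": amount}
        self._bus.publish({"type": "OrderPlaced", "order_id": order_id,
                           "customer": customer, "amount": amount})

class QueryStore:
    """Read side: a denormalized view shaped for one screen, fed by events."""
    def __init__(self, bus):
        self.totals_by_customer = {}
        bus.subscribe(self._on_event)

    def _on_event(self, event):
        if event["type"] == "OrderPlaced":
            customer = event["customer"]
            self.totals_by_customer[customer] = (
                self.totals_by_customer.get(customer, 0) + event["amount"])
```

An offline store would be one more subscriber on the same bus, keeping whatever shape suits reporting or aggregation.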

What about stores implementation?

Introducing Ehcache

Started in 2003 by Greg Luck
Apache 2.0 License
Integrated by lots of projects, products
Hibernate Provider implemented 2003
Web Caching 2004
Distributed Caching 2006
REST and SOAP APIs 2008
Acquired by Terracotta Sept. 2009

Introducing Ehcache 2.4

New Search API
New consistency modes
New transactional modes

Still:
  Small memory footprint
  Cache Writers
  JTA
  Bulk load

Grows with your application with only two lines of configuration
  Scale Up - BigMemory (100's of Gig, in process, NO GC)
  Scale Out - Clustering Platform (Up to 2 Terabytes, HA)

Ehcache & Terracotta

Terabyte scale with minimal footprint

Server array with striping for linear scale

High availability

High performance data persistence

In-memory performance as you scale

Introducing Terrastore

Document Store.
  Ubiquitous.
  Consistent.
  Distributed.
  Scalable.

Written in Java.
Based on Terracotta.
Open Source.
Apache-licensed.

Terrastore Architecture

CQRS - Implemented

CQRS - Implemented

Command Store.
  Express your domain as an object model.
  Process and store it with Ehcache and Terracotta.
  Optionally write it back to a relational database.

Query Store.
  Map queries to user (screen) views.
  Map views to JSON documents.
  Store and get them back with Terrastore.

Offline Store.
  Collect data from input commands.
  Keep data denormalized.
  Aggregate and process with Terrastore Map/Reduce.

The end

Be a polyglot!

Conclusion

Be polyglot
  Go out and play with all this
  Know your options (all of them)

Understand your domain
  Its current and future requirements

Be wise, and choose well!

Creative Commons by wwworks

More info

www.ehcache.org

www.terracotta.org

code.google.com/p/terrastore