53
Eberhard Wolff - @ewolff NoSQL & Architectures Eberhard Wolff @ewolff

NoSQL and Architectures

Embed Size (px)

DESCRIPTION

This presentation shows the influence of NoSQL databases on software architectures. It discusses different NoSQL flavors and products and shows how software architects can get the maximum benefit from those databases.

Citation preview

Page 1: NoSQL and Architectures

Eberhard Wolff - @ewolff

NoSQL & Architectures

Eberhard Wolff @ewolff

Page 2: NoSQL and Architectures

Eberhard Wolff - @ewolff

About me

► Eberhard Wolff ► Freelance consultant ► Head technology advisory board at

adesso ► Speaker ► Author

► Blog: http://ewolff.com ► Twitter: @ewolff

Page 3: NoSQL and Architectures

Eberhard Wolff - @ewolff

Back in the Days….

Page 4: NoSQL and Architectures

Eberhard Wolff - @ewolff

NoSQL Is All About the Persistence Question

Page 5: NoSQL and Architectures

Eberhard Wolff - @ewolff

Key-Value Stores

► Maps keys to values ► Just a large globally available Map ► i.e. not very powerful data model

► No complex queries or indices ► Just access by key ► Might add e.g. full text engine

► Redis: Cache + Persistence ► Riak: Massive scale +Solr queries

Key Value 42 Some

data

Page 6: NoSQL and Architectures

Eberhard Wolff - @ewolff

Wide Column ► Add any "column" you like to a row ► key-(column-value) ► Column families like tables ► E.g. in the "Users" column family

>  "someuser" è ("username"è"someuser"), ("email" è"[email protected]")

► Columns named: indexing possible ► So fast queries possible

► Apache Cassandra ► Amazon SimpleDB ► Apache HBase ► All tuned for large data sets

XX

XX XX XX XX

XX XX XX

XX XX XX XX

XX XX XX XX

XX XX XX XX

XX XX XX XX

XX XX

XX XX XX

XX XX XX

XX XX XX

XX XX XX XX

XX XX XX XX

XX xX XX XX XX

Page 7: NoSQL and Architectures

Eberhard Wolff - @ewolff

Document Stores

► Aggregates are typically stored as "documents“ (key-value collection) ► JSON quite common ► No fixed schema ► Indexes possible ► Queries possible

> E.g. "find all baskets that contain the product 123"

► Still great horizontal scalability ► Relations might be modeled as links

► MongoDB, CouchDB

Page 8: NoSQL and Architectures

Eberhard Wolff - @ewolff

Graph ► Nodes with Properties ► Typed relationships with

properties

► Ideal e.g. to model relations in a social network

► Easy to find number of followers, degree of relation etc.

► Hard to scale out

► Neo4j

Page 9: NoSQL and Architectures

Eberhard Wolff - @ewolff

NoSQL Benefits Costs •  Scale out instead of Scale Up •  Cheap Hardware •  Usually Open Source

Flexibility •  Schema in code not in

database •  Easier to upgrade schema •  Easier to handle

heterogeneous data No Object/relational impedance mismatch •  NoSQL database are more OO like

Ops

Dev

Page 10: NoSQL and Architectures

Eberhard Wolff - @ewolff

Drivers Exponential Data

Growth

More Connected Data

Semi Structured Data

Scale Out

Key Value

Wide Column

Document

Graph

Cost

Flexibility

Page 11: NoSQL and Architectures

Eberhard Wolff - @ewolff

Document-oriented Databases are the best NoSQL database

For at least one definition of “best”

Page 12: NoSQL and Architectures

Eberhard Wolff - @ewolff

Document-oriented databases ► Offer scale out

> Unless you need huge amounts of data

► Offer a rich and flexible data model > …and queries

► Other databases have other sweet spots

> Huge data sets > Graph structures > Analyzing data

► Niches or mainstream?

Cost

Flexibility

Page 13: NoSQL and Architectures

Eberhard Wolff - @ewolff

Financial System

► Different financial products

► Mapping objects / database

► Inheritance

Page 14: NoSQL and Architectures

Eberhard Wolff - @ewolff

E/R Model

Investment

Stock Zero Bond Option

Country > 20 database tables Up to 25 attributes

Currency

Page 15: NoSQL and Architectures

Eberhard Wolff - @ewolff

#SRLSY??

Page 16: NoSQL and Architectures

Eberhard Wolff - @ewolff

Investment Type ID

Zero Bond

Interest Rate

Fixed Rate Bond

Interest Rate

Stock Option

Preferred Underlying asset

Country Price Country

Currency

Page 17: NoSQL and Architectures

Eberhard Wolff - @ewolff

Polyglot Persistence in Ecommerce Application

Financial Data

RDBMS

Product Catalog

Document Store

Shopping Cart

Key / Value

Recommendation

Graph

Needs transactions & reports. Data fit well in tables.

High Performance & Scalability No complex queries

Complex document-like data structures and complex queries

Based on friends, their purchases and reviews

Page 18: NoSQL and Architectures

Eberhard Wolff - @ewolff

The NoSQL Game

Financial Data

RDBMS

Product Catalog

Document Store

Shopping Cart

Key / Value

Recommendation

Graph

Needs transactions & reports. Data fit well in tables.

High Performance & Scalability No complex queries

Complex document-like data structures and complex queries

Based on friends, their purchases and reviews

0 1000

900 800

2700 High Score!

Page 19: NoSQL and Architectures

Eberhard Wolff - @ewolff

Just Like the Patterns Game! Points for each Pattern used Extra points if one class implements multiple Pattern

Page 20: NoSQL and Architectures

Eberhard Wolff - @ewolff

This is not howSoftware Architecture works.

Page 21: NoSQL and Architectures

Eberhard Wolff - @ewolff

Why not? More is worse!

More hardware More Ops Trouble •  Installation • Backup • Disaster Recovery • Monitoring • Optimizations

More Developer Skills Not necessarily bad

Page 22: NoSQL and Architectures

Eberhard Wolff - @ewolff

But: Polyglot Persistence Has a Point

► Object-oriented Databases did it wrong ► Strategy: Replace RDBMS ► Enterprises will stick to RDBMS ► Pure technology migration basically

never happens ► …only vendors think differently

Page 23: NoSQL and Architectures

Eberhard Wolff - @ewolff

Archive

Current Data

RDBMS

Archive

Document Store

Classic approach for current data NoSQL for the archive

Page 24: NoSQL and Architectures

Eberhard Wolff - @ewolff

Archives for Insurances

► Legacy migration ► Querying and visualizing not migrated

data ► i.e. old contracts ► Legacy hard- and software can be

switched off ► Flexibility: Host data formats ► Cost: Inexpensively handling large data

volumes

Page 25: NoSQL and Architectures

Eberhard Wolff - @ewolff

Complex Document Processing System

MongoDB Redis elastic search

Document-oriented

Meta Data for quick access

Search index

Documents

Key/value in memory Search

engine

Page 26: NoSQL and Architectures

Eberhard Wolff - @ewolff

Alternative: Only elasticsearch

elastic search

•  Stores original documents as well

•  (like a key/value store) •  Support for complex queries •  Very powerful features also for

data mining / analytics •  Not well suited for update heavy

operations •  Backup / disaster recovery? •  Written in Java

Page 27: NoSQL and Architectures

Eberhard Wolff - @ewolff

Scaling elasticsearch

Server Server Server

Shard 1 Replica 1

Replica 2 Shard 2

Replica 3 Shard 3

Page 28: NoSQL and Architectures

Eberhard Wolff - @ewolff

Alternative: Only MongoDB

MongoDB

•  Now with (limited beta) fulltext search

•  Excellent support for updates •  Quite fast – memory mapped

files •  Also fast for updates •  Disaster recovery possible •  Map/Reduce support •  Written in C++

Page 29: NoSQL and Architectures

Eberhard Wolff - @ewolff

Scaling MongoDB

Replica 1

Shard 1

Replica 2

Replica 3

Shard 2

Replica 1

Replica 2

Replica 3

Page 30: NoSQL and Architectures

Eberhard Wolff - @ewolff

Scaling MongoDB

Replica 1

Shard 1

Replica 2

Replica 3

Replica 1

Shard 2

Replica 2

Replica 3

Replica 1

Shard 3

Replica 2

Replica 3

Page 31: NoSQL and Architectures

Eberhard Wolff - @ewolff

What about Redis?

Redis

•  MongoDB uses memory mapped files – Why Redis?

•  Like a Swiss Knife •  Cache •  Messaging •  Central coordination in a

distributed environment •  Written in C

Page 32: NoSQL and Architectures

Eberhard Wolff - @ewolff

Scaling Redis

Asynchronous replication built in

Server

Replica

Replica

Page 33: NoSQL and Architectures

Eberhard Wolff - @ewolff

Alternative: Riak

•  Key / value store •  But includes Solr for fulltext

search •  What is the difference to a

document store then? •  Map/reduce possible •  Written in Erlang •  Smart scaling

Page 34: NoSQL and Architectures

Eberhard Wolff - @ewolff

Scaling Riak

Server A Shard1 Shard3

Shard4

Server B Shard2 Shard1

Shard4

Server D Shard4 Shard2

Shard3

Server C Shard3 Shard2

Shard1

Page 35: NoSQL and Architectures

Eberhard Wolff - @ewolff

Scaling Riak

Server A Shard1 Shard3

Shard4

Server B Shard2 Shard1

Shard4

Server D Shard4 Shard2

Shard3

Server C Shard3 Shard2

Shard1

Page 36: NoSQL and Architectures

Eberhard Wolff - @ewolff

Scaling Riak

Server A Shard1 Shard3

Shard4

Server B Shard2 Shard1

Shard4

Server D Shard4 Shard2

Shard3

Server C Shard3 Shard2

Shard1

New Server

Page 37: NoSQL and Architectures

Eberhard Wolff - @ewolff

Document-oriented Databases are the best NoSQL database

For at least one definition of “best”

Key/Value!

Page 38: NoSQL and Architectures

Eberhard Wolff - @ewolff

Your Choice – a trade off!

Typical architecture decision

MongoDB Redis elastic search riak

Page 39: NoSQL and Architectures

Eberhard Wolff - @ewolff

Data Access: RDBMS

RDBMS

•  Schema •  Stored Procedures

Data Model

Data Access •  Queries •  Other code

Architect/ Developer

Optimizations

•  Indices •  Tables

spaces No need to change code •  …

DBA

Page 40: NoSQL and Architectures

Eberhard Wolff - @ewolff

RDBMS separate data from data access

Joins and normalization allow flexible data access

patterns

Indices

Page 41: NoSQL and Architectures

Eberhard Wolff - @ewolff

Sacrifice Joins for Scalability ► Join: Combine tables to retrieve results ► Need transactions spanning multiple

tables ► Example: Customer table + addresses ► Inserts need locks and consistency

across both tables

► Limits scalability ► Global and distributed locks are nasty ► Consistency limits either availability or

partition tolerance

Page 42: NoSQL and Architectures

Eberhard Wolff - @ewolff

CAP Theorem

► Consistency > All nodes see the same data > Not the ACID Consistency

► Availability > Node failure do not prevent survivors from operating

► Partition Tolerance > System continues to operate despite arbitrary message loss

► Can at max have two ► Or rather: If network fail – choose A or C.

C

P A

Page 43: NoSQL and Architectures

Eberhard Wolff - @ewolff

Consistency

Partition Tolerance

Availability

RDBMS

2 Phase Commit

DNS Replication

Quorum

CAP Theorem

Page 44: NoSQL and Architectures

Eberhard Wolff - @ewolff

BASE ► Basically Available Soft state Eventually consistent ► I.e. trade consistency for availability

► Pun concerning ACID… ► Not the same C, however!

Page 45: NoSQL and Architectures

Eberhard Wolff - @ewolff

BASE ► Eventually consistent ► If no updates are sent for a while all previous updates will eventually propagate through the system ► Then all replicas are consistent ► Can deal with network partitioning: Message will be transferred later ► All replicas are always available

► Pun concerning ACID… ► Not the same C, however!

Page 46: NoSQL and Architectures

Eberhard Wolff - @ewolff

Banking is BASE

► ATMs relax rules on providing cash if network partitioned

► Your account is only guaranteed to be consistent by the end of the year

Page 47: NoSQL and Architectures

Eberhard Wolff - @ewolff

No Joins - What now? ► Customer and addresses must be

consistent! ► Solution: Store both as one entity ► Atomic changes easily possible ► Queries might be distributed across

multiple notes

► “NoSQL does not support transactions / ACID” is wrong

> NoSQL does not support Joins is better > Atomic changes still possible > Schema design different

Page 48: NoSQL and Architectures

Eberhard Wolff - @ewolff

Data Access MongoDB

MongoDB

•  Influences access patterns

Data Model

Data Access •  WriteConcerns

how much do love your data?

•  Shard key •  Consistency

Architect/ Developer

Optimizations

•  Only basic indices

Other optimizations must be done incode

DBA

Page 49: NoSQL and Architectures

Eberhard Wolff - @ewolff

Cluster: RDBMS

► Transparent to developers

► How many nodes?

► A special setup of hardware and RDBMS software

DBA

Page 50: NoSQL and Architectures

Eberhard Wolff - @ewolff

Cluster: MongoDB ►  CAP theorem

> If the network is down choose > Consistency xor > Availabilty

►  Deals with replication ►  MongoDB has

master / slave replication

Architect/ Developer

MongoDB

►  Write Concerns: > Unacknowledged > Acknowledged > Journaled > Some nodes in the replica set

►  Queries might go to master only or also slaves

►  Influences consistency

Page 51: NoSQL and Architectures

Eberhard Wolff - @ewolff

More Power and more Responsibility

DB Admin

Architect

Page 52: NoSQL and Architectures

Eberhard Wolff - @ewolff

Architects ►  Architecture has always been a multi-dimensional problem

►  Need to choose persistence technology

►  Need to think about operations

►  Needs to do DBA work

Page 53: NoSQL and Architectures

Eberhard Wolff - @ewolff

NoSQL Is All About the Persistence Question