NoSQL Databases

These training presentations were prepared as part of the Istanbul Big Data Education and Research Center Project (no. TR10/16/YNY/0036), carried out under the Istanbul Development Agency’s 2016 Innovative and Creative Istanbul Financial Support Program. Bahçeşehir University bears sole responsibility for the content, which does not reflect the views of İSTKA or the Ministry of Development.

Adapted partially from Jimmy Lin’s slides (at UMD)



The Fundamental Problem

We want to keep track of mutable state in a scalable manner

Assumptions:

State organized in terms of many “records”

State unlikely to fit on single machine, must be distributed

MapReduce won’t do!


Three Core Ideas

Partitioning (sharding)

For scalability

For latency

Replication

For robustness (availability)

For throughput

Caching

For latency


We got 99 problems…

How do we keep replicas in sync?

How do we synchronize transactions across multiple partitions?

What happens to the cache when the underlying data changes?

What do RDBMSs provide?

Relational model with schemas

Powerful, flexible query language

Transactional semantics: ACID

Rich ecosystem, lots of tool support

How do RDBMSs do it?

Transactions on a single machine: (relatively) easy!

Partition tables to keep transactions on a single machine

Example: partition by user

What about transactions that require multiple machines?

Example: transactions involving multiple users

Solution: Two-Phase Commit

2PC: Sketch

Coordinator: Okay everyone, PREPARE!

Subordinates: YES, YES, YES

Coordinator: Good. COMMIT!

Subordinates: ACK! ACK! ACK!

Coordinator: DONE!

2PC: Sketch

Coordinator: Okay everyone, PREPARE!

Subordinates: YES, YES, NO

Coordinator: ABORT!

2PC: Sketch

Coordinator: Okay everyone, PREPARE!

Subordinates: YES, YES, YES

Coordinator: Good. COMMIT!

Subordinates: ACK! ACK!

Coordinator: ROLLBACK!


2PC: Assumptions and Limitations

Assumptions:

Persistent storage and write-ahead log at every node

The write-ahead log (WAL) is never permanently lost

Limitations:

It’s blocking and slow

What if the coordinator dies?


Must design up front, painful to evolve

Note: Flexible design doesn’t mean no design!

What do RDBMSs provide?

Relational model with schemas

Powerful, flexible query language

Transactional semantics: ACID

Rich ecosystem, lots of tool support

What if we want a la carte?

Source: www.flickr.com/photos/vidiot/18556565/


Features a la carte?

What if I’m willing to give up consistency for scalability?

What if I’m willing to give up the relational model for something more flexible?

What if I just want a cheaper solution?

Enter… NoSQL!


Source: geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html

NoSQL (Not only SQL)

1. Horizontally scale “simple operations”

2. Replicate/distribute data over many servers

3. Simple call interface

4. Weaker concurrency model than ACID

5. Efficient use of distributed indexes and RAM

6. Flexible schemas

Source: Cattell (2010). Scalable SQL and NoSQL Data Stores. SIGMOD Record.


(Major) Types of NoSQL databases

Key-value stores

Column-oriented databases

Document stores

Graph databases


Key-Value Stores: Data Model

Stores associations between keys and values

Keys are usually primitives

For example, ints, strings, raw bytes, etc.

Values can be primitive or complex: usually opaque to the store

Primitives: ints, strings, etc.

Complex: JSON, HTML fragments, etc.


Key-Value Stores: Operations

Very simple API:

Get – fetch value associated with key

Put – set value associated with key

Optional operations:

Multi-get

Multi-put

Range queries

Consistency model:

Atomic puts (usually)

Cross-key operations: who knows?
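
The API listed above can be sketched as a small in-memory class. This is an illustrative sketch only: the class name `KeyValueStore` and its method names are assumptions, values are treated as opaque bytes, and nothing here addresses cross-key consistency.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class KeyValueStore:
    """Hypothetical in-memory store; values are opaque bytes to the store."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # Fetch the value associated with the key (None if absent).
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        # Set the value associated with the key; one dict assignment per key.
        self._data[key] = value

    def multi_get(self, keys: Iterable[str]) -> Dict[str, Optional[bytes]]:
        return {k: self._data.get(k) for k in keys}

    def multi_put(self, items: Dict[str, bytes]) -> None:
        # No guarantee is made across keys: each put is applied on its own.
        for k, v in items.items():
            self.put(k, v)

    def range_query(self, start: str, end: str) -> List[Tuple[str, bytes]]:
        # Keys in [start, end), in sorted order.
        return [(k, self._data[k]) for k in sorted(self._data) if start <= k < end]


store = KeyValueStore()
store.put("user:1", b'{"name": "Ada"}')
store.multi_put({"user:2": b"x", "user:3": b"y"})
```

Note that `range_query` only makes sense when keys are kept in a sortable order, which is one reason it is listed as optional.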


Key-Value Stores: Implementation

Non-persistent:

Just a big in-memory hash table

Persistent:

Wrapper around a traditional RDBMS

What if data doesn’t fit on a single machine?


Simple Solution: Partition!

Partition the key space across multiple machines

Let’s say, hash partitioning

For n machines, store key k at machine h(k) mod n

Okay… But:

1. How do we know which physical machine to contact?

2. How do we add a new machine to the cluster?

3. What happens if a machine fails?

See the problems here?
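
One way to see the second problem concretely is to count how many keys change machines when n goes from 4 to 5 under h(k) mod n. The sketch below assumes MD5 as the hash function and made-up key names; with a uniform hash, a key stays put only when h(k) mod 4 equals h(k) mod 5, which happens for about one key in five.

```python
import hashlib


def h(key: str) -> int:
    # MD5 is an assumption here; any well-mixed hash would do.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def machine_for(key: str, n: int) -> int:
    # Hash partitioning: store key k at machine h(k) mod n.
    return h(key) % n


keys = [f"user{i}" for i in range(10_000)]  # hypothetical keys
moved = sum(machine_for(k, 4) != machine_for(k, 5) for k in keys)
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.0%} of keys change machines when growing from 4 to 5")
```

Most of the data has to be reshuffled for a single added machine, which motivates the approach on the next slides.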


Clever Solution

Hash the keys

Hash the machines also!

Distributed hash tables! (following combines ideas from several sources…)

[Figure: hash ring with positions running from h = 0 to h = 2^n – 1]
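
Hashing both keys and machines onto the same ring can be sketched as follows. This is a minimal illustration under stated assumptions: the `HashRing` class, the 32-bit ring size, the MD5 hash, and the machine names are all hypothetical, and a key is assigned to the first machine at or after its position, wrapping around at the end of the ring.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed ring size: positions 0 .. 2**RING_BITS - 1


def ring_hash(name: str) -> int:
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class HashRing:
    def __init__(self, machines):
        # Each machine sits at the ring position given by its own hash.
        self._points = sorted((ring_hash(m), m) for m in machines)

    def add(self, machine: str) -> None:
        bisect.insort(self._points, (ring_hash(machine), machine))

    def owner(self, key: str) -> str:
        # First machine at or after the key's position, wrapping past zero.
        i = bisect.bisect_left(self._points, (ring_hash(key), ""))
        return self._points[i % len(self._points)][1]


ring = HashRing(["m1", "m2", "m3", "m4"])
keys = [f"user{i}" for i in range(2000)]
before = {k: ring.owner(k) for k in keys}
ring.add("m5")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this scheme the only keys that move when `m5` joins are the ones `m5` takes over from its successor; every other assignment is unchanged.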

[Figure: hash ring with positions running from h = 0 to h = 2^n – 1]

Routing: Which machine holds the key?

Each machine holds pointers to its predecessor and successor

Send a request to any node; it gets routed to the correct one in O(n) hops

Can we do better?

[Figure: hash ring with positions running from h = 0 to h = 2^n – 1]

Routing: Which machine holds the key?

Each machine holds pointers to its predecessor and successor, plus a “finger table” (+2, +4, +8, ...)

Send a request to any node; it gets routed to the correct one in O(log n) hops
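
Finger-table routing can be sketched with a small simulation in the spirit of Chord. Everything here is an illustrative assumption: the 6-bit ring, the node positions, and the `ChordRing` class, which builds all tables centrally instead of over a network. Each node keeps fingers at offsets +1, +2, +4, ... and forwards a request to the farthest finger that still precedes the key.

```python
import bisect

M = 6          # identifier bits (assumed small for readability)
RING = 2 ** M  # ring positions 0 .. 2**M - 1


def in_open_closed(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], walking clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wraps past zero; a == b means the whole ring


class ChordRing:
    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)
        # fingers[p][i] = first node at or after position p + 2**i
        self.fingers = {
            p: [self.successor((p + 2 ** i) % RING) for i in range(M)]
            for p in self.nodes
        }

    def successor(self, ident: int) -> int:
        i = bisect.bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start: int, key: int):
        """Return (owner, hops) for a request that enters at node `start`."""
        node, hops = start, 0
        while True:
            succ = self.fingers[node][0]
            if in_open_closed(key, node, succ):
                return succ, hops + 1  # final hand-off to the owner
            # Forward to the farthest finger that still precedes the key.
            for f in reversed(self.fingers[node]):
                if f != key and in_open_closed(f, node, key):
                    node, hops = f, hops + 1
                    break


chord = ChordRing([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
owner, hops = chord.lookup(start=8, key=54)
```

Each forward at least halves the remaining distance to the key's predecessor, which is where the logarithmic hop count comes from.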

[Figure: hash ring with positions running from h = 0 to h = 2^n – 1]

Routing: Which machine holds the key?

Simpler Solution: Service Registry

[Figure: hash ring with positions running from h = 0 to h = 2^n – 1]

New machine joins: What happens?

How do we rebuild the predecessor, successor, finger tables?

Stoica et al. (2001). Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM.

Cf. Gossip Protocols

[Figure: hash ring with positions running from h = 0 to h = 2^n – 1]

Machine fails: What happens?

Solution: Replication

N = 3, replicate +1, –1

[Figure callouts: “Covered!”, “Covered!”]

How to actually replicate?

Later…

CAP “Theorem” (Brewer, 2000)

Consistency

Availability

Partition tolerance

… pick two


CAP Tradeoffs

CA = consistency + availability

E.g., parallel databases that use 2PC

AP = availability + tolerance to partitions

E.g., DNS, web caching

Is this helpful?

CAP not really even a “theorem” because of vague definitions

More precise formulation came a few years later

Wait a sec, that doesn’t sound right!

Source: Abadi (2012) Consistency Tradeoffs in Modern Distributed Database System Design. IEEE Computer, 45(2):37-42


Abadi Says…

CP makes no sense!

CAP says, in the presence of P, choose A or C

But you’d want to make this tradeoff even when there is no P

Fundamental tradeoff is between consistency and latency

Not available = (very) long latency


Replication possibilities

Update sent to all replicas at the same time

To guarantee consistency you need something like Paxos

Update sent to a master

Replication is synchronous

Replication is asynchronous

Combination of both

Update sent to an arbitrary replica

All these possibilities involve tradeoffs!

“eventual consistency”


CAP Theorem


Three Core Ideas

Partitioning (sharding)

For scalability

For latency

Replication

For robustness (availability)

For throughput

Caching (quick look at this)

For latency


Facebook Architecture

Source: www.facebook.com/note.php?note_id=23844338919

MySQL

memcached

Read path:

Look in memcached

Look in MySQL

Populate in memcached

Write path:

Write in MySQL

Remove in memcached

Subsequent read:

Look in MySQL

Populate in memcached
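
The read and write paths above follow what is often called a cache-aside pattern. The sketch below uses plain dictionaries as stand-ins for MySQL and memcached; the class name `CachedProfileStore` and the `db_reads` counter are assumptions added for illustration, not part of the architecture described in the source.

```python
class CachedProfileStore:
    """Hypothetical cache-aside wrapper; dicts stand in for MySQL and memcached."""

    def __init__(self):
        self.db = {}       # stand-in for MySQL
        self.cache = {}    # stand-in for memcached
        self.db_reads = 0  # counts how often a read falls through to the database

    def read(self, key):
        # Read path: look in the cache, then the database, then populate the cache.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Write path: write to the database, then remove the cached entry.
        self.db[key] = value
        self.cache.pop(key, None)


profiles = CachedProfileStore()
profiles.write("jsobel:first_name", "Jason")
first = profiles.read("jsobel:first_name")   # miss: goes to the database
second = profiles.read("jsobel:first_name")  # hit: served from the cache
profiles.write("jsobel:first_name", "Monkey")
third = profiles.read("jsobel:first_name")   # miss again after invalidation
```

Removing the entry on write, instead of updating it, means the next read repopulates the cache from the database.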


Facebook Architecture: Multi-DC

1. User updates first name from “Jason” to “Monkey”.

2. Write “Monkey” in master DB in CA, delete memcached entry in CA and VA.

3. Someone goes to the profile in Virginia, reads the VA slave DB, gets “Jason”.

4. Update VA memcache with first name as “Jason”.

5. Replication catches up. “Jason” stuck in memcached until another write!

Source: www.facebook.com/note.php?note_id=23844338919

[Figure: MySQL and memcached in California; MySQL and memcached in Virginia; replication lag between the two sites]
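
The five steps of this race can be replayed with dictionaries standing in for the two databases and the two caches. This is only a sketch of the ordering problem: the variable names are assumptions, and replication is reduced to a single manual copy.

```python
key = "jsobel:first_name"

master_db = {key: "Jason"}    # California MySQL (master)
replica_db = dict(master_db)  # Virginia MySQL (slave), initially in sync
ca_cache, va_cache = {}, {}   # memcached in each data center

# Steps 1-2: the write goes to the master; cache entries are deleted in CA and VA.
master_db[key] = "Monkey"
ca_cache.pop(key, None)
va_cache.pop(key, None)

# Steps 3-4: a Virginia read misses the cache, reads the lagging replica,
# and repopulates the Virginia cache with the old value.
va_cache[key] = replica_db[key]

# Step 5: replication catches up, but nothing invalidates the Virginia cache.
replica_db[key] = master_db[key]

stale_value = va_cache[key]
fresh_value = replica_db[key]
```

After step 5 the Virginia database holds the new name while the Virginia cache still serves the old one.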


Facebook Architecture

Source: www.facebook.com/note.php?note_id=23844338919

Solution: Piggyback on replication stream, tweak SQL

REPLACE INTO profile (`first_name`) VALUES ('Monkey')
WHERE `user_id`='jsobel' MEMCACHE_DIRTY 'jsobel:first_name'

[Figure: MySQL and memcached in California; MySQL and memcached in Virginia; replication = stream of SQL statements]


Cassandra

• Open-source database management system (DBMS)

• Decentralized design

o Each node has the same role

• Peer to Peer Cluster

o No single point of failure

o Avoids issues of master-slave DBMSs

• No bottlenecking


General Design Features

Emphasis on performance over analysis

● Still has support for analysis tools such as Hadoop

Organization

● Rows are organized into tables

● First component of a table’s primary key is the partition key

● Rows are clustered by the remaining columns of the key

● Columns may be indexed separately from the primary key

● Tables may be created, dropped, altered at runtime without blocking queries

Language

● CQL (Cassandra Query Language) introduced, similar to SQL (flattened learning curve)

Performance

• Core architectural designs allow Cassandra to outperform its competitors

• Very good read and write throughputs

o Consistently ranks as fastest amongst comparable NoSQL DBMSs with large data sets

Scalability

Read and write throughput increase linearly as more machines are added


Fault Tolerance/Durability

Failures happen all the time with multiple nodes

● Hardware Failure

● Bugs

● Operator error

● Power Outage, etc.

Solution: Buy cheap, redundant hardware, replicate, maintain consistency

• Replication

o Data is automatically replicated to multiple nodes

o Allows failed nodes to be immediately replaced

• Distribution of data to multiple data centers

o An entire data center can go down without data loss occurring

Comparisons

|                     | Apache Cassandra                                     | Google Bigtable                                                   | Amazon DynamoDB                |
|---------------------|------------------------------------------------------|-------------------------------------------------------------------|--------------------------------|
| Storage Type        | Column                                               | Column                                                            | Key-Value                      |
| Best Use            | Write often, read less                               | Designed for large scalability                                    | Large database solution        |
| Concurrency Control | MVCC (Multiversion concurrency control)              | Locks                                                             | ACID                           |
| Characteristics     | High Availability, Partition Tolerance, Persistence  | Consistency, High Availability, Partition Tolerance, Persistence  | Consistency, High Availability |

Key Point: Cassandra offers a healthy cross between BigTable and Dynamo.

Eventual Consistency

BASE: Basically Available, Soft state, Eventual consistency (versus ACID)

If all writers (to a key) stop, then all its values (replicas) will converge eventually.

If writes continue, the system always tries to keep converging.

A moving “wave” of updated values lags behind the latest values sent by clients, but is always trying to catch up.

A read is guaranteed to see the latest write when R + W > N

R = number of replicas read, W = number of replicas written, N = replication factor

Consistency Levels:

ONE -> R or W is 1

QUORUM -> R or W is ceiling((N + 1) / 2)

ALL -> R or W is N

If you want to write with a Consistency Level of ONE and get the same data when you read, you need to read with a Consistency Level of ALL
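
The R + W > N rule and the three consistency levels can be checked with a few lines of arithmetic. The function names below are assumptions for illustration and are not a Cassandra API; the quorum size is computed as N // 2 + 1, which gives the same value as ceiling((N + 1) / 2) for every positive integer N.

```python
import math


def quorum(n: int) -> int:
    # Majority of n replicas; equal to ceil((n + 1) / 2) for positive integers.
    return n // 2 + 1


def replicas_for(level: str, n: int) -> int:
    """Number of replicas that R or W stands for at a given consistency level."""
    return {"ONE": 1, "QUORUM": quorum(n), "ALL": n}[level]


def read_overlaps_write(read_level: str, write_level: str, n: int) -> bool:
    # R + W > N means every read set shares at least one replica with every write set.
    return replicas_for(read_level, n) + replicas_for(write_level, n) > n


N = 3  # replication factor
write_one_read_all = read_overlaps_write("ALL", "ONE", N)
write_one_read_one = read_overlaps_write("ONE", "ONE", N)
quorum_both = read_overlaps_write("QUORUM", "QUORUM", N)
```

With N = 3, writing at ONE requires reading at ALL to guarantee overlap, while QUORUM on both sides (2 + 2 > 3) also works.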