
NOSQL Databases: Topics


• Introduction

• Rationale

• Key-value stores

• MapReduce

• Implementations


Introduction

• NOSQL := Not Only SQL

• Acronym introduced in 2009

  ◦ as the name of a meetup about open-source, distributed, non-relational databases

• The message was misunderstood as "no SQL", giving birth to “NoSQL”


Rationale (1)

• Performance

• Scalability

• Flexibility

• Kind of Data


Rationale (2)

• Brewer’s CAP Theorem

• A distributed system cannot guarantee more than two of:

  ◦ Consistency

  ◦ Availability

  ◦ Partition tolerance


Implementations

NOSQL families and example systems:

• KV volatile: Memcached, Redis

• Document store: eXist, CouchDB, MongoDB

• Column store: MonetDB, Infobright

• KV durable: Dynamo, Voldemort, Riak

• Graph: Neo4j, HyperGraphDB


Key-Value Stores

• A global collection of key/value pairs

• Multiple types

  ◦ In memory (Redis, Memcached)

  ◦ On disk (BerkeleyDB)

  ◦ Eventually consistent (Cassandra, Dynamo, Voldemort)


Document Databases

• Similar to a key/value database, with whole documents as values

• Flexible schema

• Documents are serialized

• Examples: CouchDB, MongoDB


Column Family Databases

• Similar to a key/value database, with multiple attributes (columns) as values

• Not to be confused with column-oriented DBMSs


Graph Databases

• Inspired by Graph Theory

• Gaining popularity as RDF stores

• Examples: Neo4j, InfiniteGraph


Other

• Many others exist:

  ◦ Any database outside the relational model

• Object databases

• File systems


Key-Value Stores

• Basic Idea

• Mapping Tables to KV pairs

• Consistent Hashing


Basic Idea

• Very simple data model

• {key, value} pairs with unique keys

  ◦ {student_id: student_name}

  ◦ {part_id: part_manufacturer}

  ◦ {child_id: parent_id}

• Values have no type constraint


API

• put(key, value)

• get(key)

  ◦ value = get(key)

• The value is usually composite

  ◦ Opaque blob (e.g. Tokyo Cabinet)

  ◦ Directly supported (e.g. MongoDB)
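To make the put/get API concrete, here is a minimal in-memory sketch in Java. The class name `TinyKV` and the choice of `String` keys with opaque `byte[]` values are illustrative assumptions; a real store would also persist its data, e.g. in a B-tree or hash table as on the next slide.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal key-value store: unique String keys, opaque byte[] values. */
class TinyKV {
    private final Map<String, byte[]> table = new HashMap<>();

    /** put(key, value): stores the value, replacing any previous one (keys are unique). */
    void put(String key, byte[] value) {
        table.put(key, value.clone()); // copy, so the caller cannot mutate stored bytes
    }

    /** get(key): returns a copy of the stored value, or null if the key is absent. */
    byte[] get(String key) {
        byte[] value = table.get(key);
        return value == null ? null : value.clone();
    }
}
```

The store never looks inside the bytes: interpreting the value (e.g. as a student name) is entirely up to the application, which is the "opaque blob" case above.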


Implementation

• Usually B-trees or extensible hash tables

• Well-known structures in the RDBMS world


Mapping Tables to KV pairs


CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username VARCHAR(64),
    password VARCHAR(64)
);

CREATE TABLE follows (
    follower INTEGER REFERENCES user(id),
    followed INTEGER REFERENCES user(id)
);

CREATE TABLE tweets (
    id INTEGER,
    user INTEGER REFERENCES user(id),
    message VARCHAR(140),
    timestamp TIMESTAMP
);


Mapping Tables to KV pairs — Redis

• Creating a user

INCR global:nextUserId => 1000

SET uid:1000:username john_smith

SET uid:1000:password sunnyEvening

• Enabling log-in (mapping the username back to its uid)

SET username:john_smith:uid 1000

• Following:

uid:1000:followers => Set of uids

uid:1000:following => Set of uids


Mapping Tables to KV pairs — Redis

• Messages by user:

uid:1000:posts => a List of post ids

• Adding a new message:

SET post:10343 "$owner_id|$time|I'm having fun"
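The key layout above can be imitated in plain Java to see how each relational table turns into a key-naming convention. This is a hypothetical in-memory stand-in, not a Redis client: one map plays the role of Redis strings, one of sets and one of lists. The `global:nextPostId` counter and the LPUSH-style prepend to `uid:<id>:posts` are assumptions added to complete the "new message" step, and unlike the slide the counters start at 1.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical imitation of the Redis key layout above (not a Redis client). */
class MiniRetwis {
    final Map<String, String> strings = new HashMap<>();      // like Redis strings
    final Map<String, Set<String>> sets = new HashMap<>();    // like Redis sets
    final Map<String, Deque<String>> lists = new HashMap<>(); // like Redis lists

    /** Like INCR: a missing key counts as 0. */
    private long incr(String key) {
        long next = Long.parseLong(strings.getOrDefault(key, "0")) + 1;
        strings.put(key, Long.toString(next));
        return next;
    }

    long createUser(String username, String password) {
        long uid = incr("global:nextUserId");
        strings.put("uid:" + uid + ":username", username);
        strings.put("uid:" + uid + ":password", password);
        strings.put("username:" + username + ":uid", Long.toString(uid)); // reverse lookup for log-in
        return uid;
    }

    void follow(long follower, long followed) {
        sets.computeIfAbsent("uid:" + follower + ":following", k -> new HashSet<>()).add(Long.toString(followed));
        sets.computeIfAbsent("uid:" + followed + ":followers", k -> new HashSet<>()).add(Long.toString(follower));
    }

    long post(long owner, long time, String message) {
        long postId = incr("global:nextPostId"); // assumed counter, not on the slide
        strings.put("post:" + postId, owner + "|" + time + "|" + message);
        lists.computeIfAbsent("uid:" + owner + ":posts", k -> new ArrayDeque<>())
             .addFirst(Long.toString(postId)); // newest first, like LPUSH
        return postId;
    }
}
```

Note how the `follows` table becomes two sets per user, so both directions can be read without a join.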


Consistent Hashing

• Huge amounts of data

  ◦ Naive approach:

server_id = hash(key) % number_of_servers

  ◦ Hash function: anything → int

• Distribution?
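One problem with the naive approach shows up as soon as the number of servers changes. The Java sketch below counts how many keys land on a different server when going from 4 to 5 servers. It assumes integer keys that serve as their own hash, which keeps the result exact and reproducible; `ModuloPlacement` is a hypothetical name.

```java
/** Hypothetical helper measuring key movement under naive modulo placement. */
class ModuloPlacement {
    /** server_id = hash(key) % number_of_servers; floorMod keeps it non-negative. */
    static int serverFor(int keyHash, int servers) {
        return Math.floorMod(keyHash, servers);
    }

    /** Counts keys 0..keys-1 (each used as its own hash) whose server changes. */
    static int movedKeys(int keys, int before, int after) {
        int moved = 0;
        for (int k = 0; k < keys; k++) {
            if (serverFor(k, before) != serverFor(k, after)) moved++;
        }
        return moved;
    }
}
```

For 10,000 keys, `movedKeys(10000, 4, 5)` returns 8000: a key keeps its server only when `k % 4 == k % 5`, i.e. for 4 residues out of every 20, so 80% of the data would have to be transferred just to add one server.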


Consistent Hashing — Circle

• Assume int to be an 8-bit unsigned integer

• We have hash(key) ∈ {0, …, 255}

• We can represent these values on a circle and:

  ◦ Assign a position to each server

  ◦ Compute the position of each key

  ◦ Assume a key k belongs to the next server on the circle (clockwise)


Consistent Hashing — Circle

• Each node (server) is assigned a random value

• The hash of this value gives the position of the server on the circle

• A server is responsible for the arc before its position
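A minimal Java sketch of the circle, assuming the 8-bit positions from the previous slide. For reproducibility, server positions are passed in explicitly instead of being derived from the hash of a random value, and `hash8` (the low 8 bits of `String.hashCode`) is a stand-in hash; `HashRing` is a hypothetical name. `TreeMap.ceilingEntry` finds the next server clockwise, and the lookup wraps around past 255.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hashing ring on the 8-bit circle 0..255. */
class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // position -> server

    /** Stand-in 8-bit hash: keeps the low 8 bits of String.hashCode. */
    static int hash8(String s) {
        return s.hashCode() & 0xFF;
    }

    void addServer(String server, int position) {
        ring.put(position, server);
    }

    /** The owner of a position is the first server at or after it, clockwise. */
    String serverForPosition(int position) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(position);
        if (e == null) e = ring.firstEntry(); // past the last server: wrap around to 0
        return e.getValue();
    }

    String serverFor(String key) {
        return serverForPosition(hash8(key));
    }
}
```

With servers A, B and C at positions 64, 128 and 192, position 90 belongs to B. Adding a node D at 96 moves only the arc (64, 96] from B to D; every other key keeps its server, in contrast with the modulo scheme.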


• Adding a node


Virtual nodes
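With virtual nodes, each physical server is placed at many positions on the circle, which evens out the load and means that a departing server's keys are spread over the remaining servers instead of all landing on one neighbour. The Java sketch below assumes 32-bit positions, a stand-in mixing hash, and hypothetical `"server#i"` labels for the virtual positions; `VirtualNodeRing` is an illustrative name.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical ring where each physical server owns several virtual positions. */
class VirtualNodeRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // position -> server
    private final int vnodesPerServer;

    VirtualNodeRing(int vnodesPerServer) {
        this.vnodesPerServer = vnodesPerServer;
    }

    /** Mixes the bits of String.hashCode; any well-spread 32-bit hash would do. */
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }

    void addServer(String server) {
        for (int i = 0; i < vnodesPerServer; i++) {
            ring.putIfAbsent(hash(server + "#" + i), server); // skip rare collisions
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < vnodesPerServer; i++) {
            ring.remove(hash(server + "#" + i), server); // only remove positions this server owns
        }
    }

    /** First virtual position clockwise from the key, wrapping around. */
    String serverFor(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        if (e == null) e = ring.firstEntry();
        return e.getValue();
    }
}
```

Removing a server only reassigns the keys it owned; keys owned by the other servers stay where they are.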


Moving nodes


Replication

• Coordinator: the node responsible for the key, as defined previously

• In charge of replication to other nodes (e.g. the N next ones)

• Parameters:

  ◦ Number of replicas (N)

  ◦ Minimal number of successful writes (W)

  ◦ Minimal number of successful reads (R)

  ◦ Must respect R + W > N (Why?)

• Repair-on-read
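One way to explore the R + W > N condition: a read is only guaranteed to see the latest write if every set of R replicas read shares at least one replica with every set of W replicas written. The brute-force Java check below (a hypothetical helper, practical for small N only) enumerates all such sets as bitmasks.

```java
/** Hypothetical brute-force check of read/write quorum overlap. */
class QuorumCheck {
    /** True if every write set of size w shares a replica with every read set of size r. */
    static boolean alwaysOverlap(int n, int w, int r) {
        int all = 1 << n; // each bitmask below 2^n is a subset of the n replicas
        for (int ws = 0; ws < all; ws++) {
            if (Integer.bitCount(ws) != w) continue;
            for (int rs = 0; rs < all; rs++) {
                if (Integer.bitCount(rs) != r) continue;
                if ((ws & rs) == 0) return false; // this read could miss every fresh replica
            }
        }
        return true;
    }
}
```

For N = 3, W = R = 2 every read meets the last write, whereas W = 1, R = 2 (so R + W = N) admits a disjoint pair, matching the rule on the slide.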


MapReduce

• Parallel processing model

• Introduced to tackle computations over very large datasets

• Based on the well-known divide-and-conquer approach

  ◦ Large problem divided into many small problems

  ◦ Each tackled by one “processor” (Map)

  ◦ Results are then combined (Reduce)

• Reference: Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce (textbook), 2010


Parallelism?

• Not a new problem

  ◦ E.g. threads, MPI, sockets, remote shell, …

  ◦ Generally tackles computation distribution, not data distribution

  ◦ The developer is in charge of the implementation details

• MapReduce abstracts many of these mechanisms by imposing a structure on the program


MapReduce Concepts

[Figure: the Data is split across several Mappers; their output feeds several Reducers, which produce the Output]


Origins of Map

• Map originally comes from the functional programming world

• Basic idea:

for (int i = 0; i < arr.length; i++) {
    result[i] = function(arr[i]);
}

• where function is a function in the mathematical sense (its result depends only on its argument, with no side effects)


Origins of Map

• Idea: isolate the loop, so we can write:

result = map(function, arr);

• What if you could pass functions around as values?

  ◦ map could be a function that takes as arguments

    – a sequence

    – a function

  and that returns a new sequence where every element is the result of applying the function to the corresponding element of the original sequence

• map can abstract many for loops
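In Java, where functions can be passed around as `java.util.function.Function` values, map can be written once as a generic method. This is a sketch with a hypothetical class name `MapDemo`; Java's streams already provide the same operation as `stream().map(f)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class MapDemo {
    /** Returns a new list whose i-th element is f applied to the i-th element of xs. */
    static <A, B> List<B> map(Function<A, B> f, List<A> xs) {
        List<B> result = new ArrayList<>(xs.size());
        for (A x : xs) {
            result.add(f.apply(x)); // the loop lives here, once and for all
        }
        return result;
    }
}
```

For example, `MapDemo.map((Integer x) -> x * x, Arrays.asList(1, 2, 3))` returns `[1, 4, 9]`, and the input list is left untouched.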


Origins of Reduce

• map does not cover all for loops

• For example, when you gradually aggregate the results:

int total = 0;
for (int i = 0; i < arr.length; i++) {
    total = total + arr[i];
}

• More generally:

for (int i = 0; i < arr.length; i++) {
    total = function(total, arr[i]);
}

• reduce covers these loops:

total = reduce(function, arr);
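The same generalization for reduce, as a Java sketch with a hypothetical class name `ReduceDemo`. Unlike `reduce(function, arr)` above, it takes the starting value of `total` explicitly (the `0` in the summing loop), which is an assumption of this sketch; Java's `Stream.reduce(identity, accumulator)` has the same shape.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class ReduceDemo {
    /** Left fold: total starts at initial and becomes f(total, x) for each x, in order. */
    static <T> T reduce(BinaryOperator<T> f, T initial, List<T> xs) {
        T total = initial;
        for (T x : xs) {
            total = f.apply(total, x);
        }
        return total;
    }
}
```

`ReduceDemo.reduce(Integer::sum, 0, Arrays.asList(1, 2, 3, 4))` reproduces the summing loop and returns 10.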


map and reduce in MapReduce

• In the context of MapReduce, the mapped function must return key-value pairs:

map(function, [data, …]) → [(key, value), …]

• Before the reduction, the data has to be aggregated by key:

[(key1, value1), (key1, value2), …] → (key1, value1, value2, …)

• The reduce step acts on the values of each key:

reduce(key1, value1, value2, …) → (key1, value)


Example

• Counting the words in a text

• map: word → (word, 1)

Pair make_pair(String word) {
    return new Pair(word, 1);
}

• Aggregation: (word, 1, 1, 1, …)

• reduce:

Pair compute_sum(String word, List<Integer> values) {
    int sum = 0;
    for (int i : values) {
        sum += i;
    }
    return new Pair(word, sum);
}
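The three steps can be strung together in a single-process Java sketch: map emits (word, 1), a `TreeMap` groups the values by key (the aggregation step), and reduce sums them. `makePair` and `computeSum` play the roles of `make_pair` and `compute_sum`, but return `Map.Entry` because the slide's `Pair` class is not specified; splitting the text on non-word characters is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCount {
    // map: word -> (word, 1)
    static Map.Entry<String, Integer> makePair(String word) {
        return new SimpleEntry<>(word, 1);
    }

    // reduce: (word, [1, 1, ...]) -> (word, sum)
    static Map.Entry<String, Integer> computeSum(String word, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return new SimpleEntry<>(word, sum);
    }

    /** Runs map, grouping by key, and reduce one after the other in one process. */
    static Map<String, Integer> count(String text) {
        // Map phase
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) pairs.add(makePair(w));
        }
        // Aggregation: group values by key (TreeMap keeps the keys sorted)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        // Reduce phase
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
            Map.Entry<String, Integer> r = computeSum(g.getKey(), g.getValue());
            result.put(r.getKey(), r.getValue());
        }
        return result;
    }
}
```

`WordCount.count("The cat saw the dog.")` yields `{cat=1, dog=1, saw=1, the=2}`. In a real deployment each phase would be spread over many machines.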


Parallelization

• map:

  ◦ Trivial: absolutely no side effect

  ◦ (or not: what about transfer times?)

• reduce:

  ◦ Not fully parallelizable (each step needs the result of the previous step)


Parallelizing reduce

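• Reduce needs to be idempotent

  ◦ Mathematically: f(f(x)) = f(x), i.e. reduce can be applied again to its own partial results

• Computation can be tree-shaped:

[Figure: summing four 1s as a tree: 1 + 1 = 2 and 1 + 1 = 2 in parallel, then 2 + 2 = 4]

• log N steps instead of N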
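The tree shape can be sketched in Java by combining neighbours pairwise, level by level. Within a level every combination is independent, so each could run on a different processor; this sketch runs them sequentially. It assumes the combining function is associative (as sum is), which is what makes regrouping the operands safe. `TreeReduce` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

class TreeReduce {
    /** Combines neighbours pairwise until one value remains; assumes f is associative. */
    static <T> T reduce(BinaryOperator<T> f, List<T> xs) {
        if (xs.isEmpty()) throw new IllegalArgumentException("empty input");
        List<T> level = xs;
        while (level.size() > 1) {
            List<T> next = new ArrayList<>();
            for (int i = 0; i + 1 < level.size(); i += 2) {
                next.add(f.apply(level.get(i), level.get(i + 1))); // independent: parallelizable
            }
            if (level.size() % 2 == 1) next.add(level.get(level.size() - 1)); // odd one out
            level = next;
        }
        return level.get(0);
    }

    /** Number of levels needed for n inputs, i.e. ceil(log2 n). */
    static int depth(int n) {
        int levels = 0;
        while (n > 1) {
            n = (n + 1) / 2;
            levels++;
        }
        return levels;
    }
}
```

Four 1s are summed in 2 levels (as in the figure) instead of 3 sequential additions; 1,000 inputs need only 10 levels.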


We lied!

• There is still one step to discuss: how do we aggregate values by key?

• Naive idea: put a barrier between map and reduce

  ◦ Wait for all maps to complete

  ◦ Get all results in one place, sort them

  ◦ Redistribute them for reduce

[Figure: Data → Mappers → Barrier → Reducers → Output]


Parallelizing aggregation

• The naive approach:

  ◦ is simple

  ◦ does not require an idempotent reduce

  ◦ is not as parallel as it could be

• Other idea: consistent hashing and idempotence

  ◦ Can compute results incrementally (idempotence)

  ◦ No barrier: better parallelism (hashing)

  ◦ Can display current results (idempotence)

• Note: usually, the implementation sorts by key both the intermediate key-value pairs generated by map and the final results. This can be exploited by choosing a meaningful key.
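The "no barrier" idea can be sketched by routing every intermediate pair straight to a reducer chosen from its key, and letting each reducer fold values into a running result as they arrive (which relies on reduce being re-applicable to partial results). For simplicity the sketch uses plain `hash(key) mod R` over a fixed number of reducers instead of a full consistent-hashing ring; `HashShuffle` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HashShuffle {
    /** Picks the reducer for a key; floorMod avoids negative hash codes. */
    static int partition(String key, int reducers) {
        return Math.floorMod(key.hashCode(), reducers);
    }

    /** Each reducer keeps a running count per key, updated as (word, 1) pairs arrive. */
    static List<Map<String, Integer>> run(List<String> words, int reducers) {
        List<Map<String, Integer>> partial = new ArrayList<>();
        for (int r = 0; r < reducers; r++) partial.add(new TreeMap<>());
        for (String w : words) {
            // Incremental reduce: merge the new (w, 1) pair into the current sum.
            partial.get(partition(w, reducers)).merge(w, 1, Integer::sum);
        }
        return partial;
    }
}
```

All pairs with the same key reach the same reducer, so no global sort is needed, and each reducer's map is a valid partial result at any moment.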


Example: Sorting people by name

• map: person → (person.name, person)

• reduce: (person.name, person1, person2, …) → (person.name, person1, person2, …)

• The result is sorted by virtue of the MapReduce machinery itself.
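A Java sketch of this example, in which a `TreeMap` plays the role of the framework's sorted grouping step and reduce is the identity. The nested `Person` class is a hypothetical stand-in for whatever record type the data uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortByName {
    /** Hypothetical record type standing in for "person". */
    static final class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    static List<Person> sort(List<Person> people) {
        // map: person -> (person.name, person); the TreeMap groups and sorts by key
        TreeMap<String, List<Person>> byName = new TreeMap<>();
        for (Person p : people) {
            byName.computeIfAbsent(p.name, k -> new ArrayList<>()).add(p);
        }
        // reduce: identity, emitting every person of each group in key order
        List<Person> out = new ArrayList<>();
        for (Map.Entry<String, List<Person>> e : byName.entrySet()) {
            out.addAll(e.getValue());
        }
        return out;
    }
}
```

The reduce function does no work at all: the ordering comes entirely from sorting the intermediate keys.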


Example: Finding all (author, book) pairs

• There can be multiple authors per book!

• map

  ◦ We need a polymorphic map function, say f, such that:

    – f(author) → (author.name, author)

    – f(book) → [(book.author.name, book), …], one pair per author of the book

• Aggregation: (author.name; book*, author, book*), i.e. books may arrive before or after their author

• In the following code, Value is a superclass of Author, Book and List.


Example: Finding all (author, book) pairs

• reduce

void reduce(String authorName, List<Value> values) {
    Author a = null;
    Book prevbook = null;                      // a book seen before its author
    List<Pair> list = new ArrayList<>();
    for (Value value : values) {
        if (value instanceof Author) {
            a = (Author) value;
            if (prevbook != null) {
                list.add(new Pair(a, prevbook));
                prevbook = null;
            }
        } else if (value instanceof Book && a == null) {
            if (prevbook != null) emit(prevbook);  // re-emit for a later reduce
            prevbook = (Book) value;
        } else if (value instanceof Book && a != null) {
            list.add(new Pair(a, (Book) value));
        } else if (value instanceof List) {       // output of an earlier reduce
            list.addAll((List<Pair>) value);
            a = list.get(0).author;
            if (prevbook != null) {
                list.add(new Pair(a, prevbook));
                prevbook = null;
            }
        }
    }
    if (prevbook != null) emit(prevbook);
    if (!list.isEmpty()) emit(list);
}


Implementations

• LightCloud

• MongoDB

• Cassandra


LightCloud

• LightCloud is a distributed key-value store

  ◦ Implements distributed storage

  ◦ “On-site” (local) storage is provided by Tokyo Tyrant or Redis

• Tokyo Tyrant is a local key-value store

  ◦ Implements database management functions

    – Network interface and concurrency control

    – Database replication

  ◦ Actual storage is provided by Tokyo Cabinet

• Tokyo Cabinet

  ◦ Implements storage of key/value pairs

  ◦ Over a single file, for a single client


LightCloud

[Figure: LightCloud distributing data over three nodes, each running Tokyo Tyrant on top of Tokyo Cabinet]


Tokyo Cabinet/Tyrant

• Tokyo Cabinet/Tyrant provide a very raw interface for storing key/value pairs in a given single file

• The desired on-disk layout must be chosen

  ◦ Extensible hash map, B-tree, fixed-size records, …

  ◦ Parameters of these structures can be tweaked for better performance

  ◦ Very demanding on the user

• The API consists of get and put and a few variants

  ◦ The data are opaque, unstructured blobs!


LightCloud

• Adds (horizontal) scalability to Tokyo Tyrant nodes by means of consistent hashing

  ◦ Mitigates the distribution problem

  ◦ However, no replication is performed; consistency is preferred over availability.

• The API is still get and put, over strings.


MongoDB

• MongoDB is a document-oriented database

  ◦ JSON documents

{
    "name": "John Smith",
    "address": {
        "city": "Owatonna",
        "street": "Lily Road",
        "number": 32,
        "zip": 55060
    },
    "hobbies": ["yodeling", "ice skating"]
}


Database Organisation

• Databases contain collections

• Collections contain documents and indexes


Physical layout

• Documents are stored as binary blobs (BSON)

  ◦ Documents are opaque for the database

  ◦ As a result of a query, they are retrieved in their entirety

• Indexes are B-trees referencing these documents

  ◦ They allow finding documents based on the values they contain without explicitly opening each whole document
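The index idea can be sketched in Java with a `TreeMap` standing in for the B-tree: it maps a field value to the ids of the documents containing it, so a lookup only fetches the matching blobs. `DocStoreWithIndex` is hypothetical and indexes only `address.city`; because it never parses the stored blobs, the caller passes the indexed value at insert time, an assumption of this sketch and not how MongoDB extracts fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical document store with one secondary index on address.city. */
class DocStoreWithIndex {
    // Documents kept as opaque serialized strings, addressed by id (stand-in for BSON blobs).
    private final List<String> blobs = new ArrayList<>();
    // Index: city -> ids of the documents with that city (TreeMap as a B-tree stand-in).
    private final TreeMap<String, List<Integer>> cityIndex = new TreeMap<>();

    int insert(String blob, String city) {
        int id = blobs.size();
        blobs.add(blob);
        cityIndex.computeIfAbsent(city, k -> new ArrayList<>()).add(id);
        return id;
    }

    /** Analogue of find({"address.city": city}): index lookup, then whole documents fetched. */
    List<String> findByCity(String city) {
        List<String> out = new ArrayList<>();
        for (int id : cityIndex.getOrDefault(city, Collections.emptyList())) {
            out.add(blobs.get(id));
        }
        return out;
    }
}
```

Without the index, answering the same query would require opening every stored document.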


Advanced querying

• Simple queries can be performed efficiently when an index is available

  ◦ E.g. db.employee.find({"address.city": "Owatonna"}) with an index on "address.city"

• Larger jobs can be done by means of map-reduce

  ◦ map maps a document to the needed key-value pair


Advanced querying

• However, there is no facility for:

  ◦ Joining documents

  ◦ Quantifying over other documents (i.e. EXISTS in SQL)

• Such operations are left to the user of the database!

  ◦ Processing outside the database is costly!

  ◦ It is therefore important to design the data model in such a way that it returns the appropriate data directly.


Sharding

• MongoDB can shard documents over multiple servers

  ◦ Data are split into chunks

  ◦ A chunk has a starting and an ending value

  ◦ A server is responsible for multiple chunks

• Individual collections, not whole databases, are sharded


• Example: sharding Persons over the Age field on 3 servers

Server 1    Server 2    Server 3
1–10        11–20       22–29
21–22       30–41       42–50
51–72       73+

• To be efficient, each server must keep roughly the same amount of data.

• MongoDB provides automated balancing (auto-sharding) as much as possible

• Shards are created explicitly by the database administrator

  ◦ shard = (collection, key)

  ◦ A well-chosen key can improve query performance

  ◦ Otherwise, the load of each server can be very unbalanced
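Routing a query on the sharding key then amounts to finding the chunk whose range contains the value. A sorted map keyed by each chunk's lower bound does this with a single `floorEntry` lookup. The sketch assumes each chunk runs from its lower bound up to (but excluding) the next chunk's lower bound, so the 21–22 / 22–29 boundary in the table is read as [21, 22) and [22, 29). `RangeShards` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical range-based chunk routing on an integer shard key. */
class RangeShards {
    // Chunk lower bound -> server; a chunk ends where the next one starts.
    private final TreeMap<Integer, String> chunks = new TreeMap<>();

    void addChunk(int lowerBound, String server) {
        chunks.put(lowerBound, server);
    }

    /** Routes a shard-key value to the server holding the chunk that contains it. */
    String serverFor(int age) {
        Map.Entry<Integer, String> e = chunks.floorEntry(age);
        if (e == null) throw new IllegalArgumentException("no chunk covers " + age);
        return e.getValue();
    }

    /** Chunks per server: the quantity a balancer tries to even out. */
    Map<String, Integer> chunkCounts() {
        Map<String, Integer> counts = new TreeMap<>();
        for (String s : chunks.values()) counts.merge(s, 1, Integer::sum);
        return counts;
    }
}
```

Loaded with the chunks from the table, ages 5, 25 and 80 route to Server 1, Server 3 and Server 2, and the chunk counts are 3, 3 and 2.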


Cassandra

• Introduction and history

• Data model and layout

• Distribution

  ◦ Replication

  ◦ Adding nodes

  ◦ Handling problems

  ◦ Timestamping


Cassandra — Introduction

• Created by Facebook

  ◦ Based on Dynamo

  ◦ Lead Dynamo engineer hired by Facebook

• Released as an Apache project

  ◦ Source code released in July 2008

  ◦ Adopted by Apache in March 2009

  ◦ Became a top-level Apache project in February 2010


Cassandra — Data model

• Databases are conceptually two-dimensional

• Disks are one-dimensional

• The table

1 2
3 4

can be stored either row-oriented (1, 2, 3, 4) or column-oriented (1, 3, 2, 4); Cassandra is column-oriented

• No cost for NULL entries

• Easy column creation

• Structure:

  ◦ Column family ∼ table

  ◦ Super column ∼ group of columns

  ◦ Column ∼ column

• May be seen as a hash table with 4 or 5 dimensions:

get(keyspace, key, column_family[, super_column], column)
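Both ideas on this slide can be sketched in Java: the two ways of linearizing the 2×2 table, and the "nested hash table" view of get. Super columns are left out (the 4-dimensional case), and `ColumnFamilySketch`, its method names and the sample keyspace and column names are hypothetical; this is not Cassandra's client API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ColumnFamilySketch {
    /** Row-oriented layout: the cells of each row are contiguous. */
    static int[] rowOriented(int[][] table) {
        int rows = table.length, cols = table[0].length;
        int[] out = new int[rows * cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++) out[r * cols + c] = table[r][c];
        return out;
    }

    /** Column-oriented layout: the cells of each column are contiguous. */
    static int[] columnOriented(int[][] table) {
        int rows = table.length, cols = table[0].length;
        int[] out = new int[rows * cols];
        for (int c = 0; c < cols; c++)
            for (int r = 0; r < rows; r++) out[c * rows + r] = table[r][c];
        return out;
    }

    // keyspace -> key -> column family -> column -> value (super columns left out)
    private final Map<String, Map<String, Map<String, Map<String, String>>>> data = new HashMap<>();

    void put(String keyspace, String key, String columnFamily, String column, String value) {
        data.computeIfAbsent(keyspace, k -> new HashMap<>())
            .computeIfAbsent(key, k -> new HashMap<>())
            .computeIfAbsent(columnFamily, k -> new HashMap<>())
            .put(column, value);
    }

    /** A column that was never written has no entry at all: NULLs cost nothing. */
    String get(String keyspace, String key, String columnFamily, String column) {
        return data.getOrDefault(keyspace, Collections.emptyMap())
                   .getOrDefault(key, Collections.emptyMap())
                   .getOrDefault(columnFamily, Collections.emptyMap())
                   .get(column);
    }
}
```

Adding a new column is just another `put`, with no schema change, which matches the "easy column creation" point.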


Cassandra — Distribution

• CAP Theorem:

  ◦ (Consistency)

  ◦ Availability

  ◦ Partition tolerance

• Design goals

  ◦ Scalability

  ◦ Simplicity

  ◦ Speed

  ◦ Uniformity between nodes

• Consistent hashing on a ring

  ◦ No virtual nodes

  ◦ Random placement


Cassandra — Replication and consistency

• Availability ⇒ more than one node needs a copy of each pair

  ◦ The responsible node chooses N other nodes to hold copies

  ◦ The way those are chosen can be changed

    – Next ones on the ring, different geographic location, etc.

• The assignment table is copied to each node

• R and W values can be chosen


Cassandra — Timestamping

• Every value has an associated timestamp

• Every key actually has an associated vector of (timestamp, value) pairs (truncated)

• Used to reach consistency with repair-on-read

  ◦ Query sequence:

    – Identify the nodes that own the data for the key

    – Route the request to those nodes and wait for the responses

    – If the replies do not arrive within the configured timeout, fail

    – Figure out the latest response based on timestamps

    – Schedule a repair if needed

• Repair algorithm can be customized
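The query sequence above can be sketched for a single key as follows. The `Replica` and `Versioned` types are hypothetical stand-ins for remote nodes, the timeout step is omitted, and the repair is performed immediately rather than scheduled, to keep the example short; the latest-timestamp-wins rule follows the slide.

```java
import java.util.List;

/** Hypothetical single-key repair-on-read over in-memory replicas. */
class RepairOnRead {
    /** A value together with the timestamp of the write that produced it. */
    static final class Versioned {
        final long timestamp;
        final String value;
        Versioned(long timestamp, String value) { this.timestamp = timestamp; this.value = value; }
    }

    /** Stand-in for a remote node holding one key (stored may be null). */
    static final class Replica {
        Versioned stored;
        Replica(Versioned stored) { this.stored = stored; }
    }

    /** Returns the newest value among the replicas and overwrites stale copies. */
    static String read(List<Replica> replicas) {
        Versioned latest = null;
        for (Replica r : replicas) {
            if (r.stored != null && (latest == null || r.stored.timestamp > latest.timestamp)) {
                latest = r.stored;
            }
        }
        if (latest == null) return null; // no replica knows the key
        for (Replica r : replicas) {
            if (r.stored == null || r.stored.timestamp < latest.timestamp) {
                r.stored = latest; // the repair
            }
        }
        return latest.value;
    }
}
```

After a read, every contacted replica holds the newest version, so later reads from any of them agree.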


Cassandra — Adding a node

• Gossip

  ◦ Each node must know the position of every other node (and all replicas)

  ◦ Whenever a node moves or changes its replicas, it tells a number of other nodes, sending its whole replication table

  ◦ Routing information thus propagates

  ◦ Some nodes are preferred (seeds)

• When a new node is inserted, we must give it a keyspace and the address of a seed

  ◦ It chooses its position at random

  ◦ It contacts the seed to get a view of the current state

  ◦ It begins to move its data
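To get a feel for how quickly routing information spreads, the following Java simulation models a simplified push gossip: in each round, every node that already knows about a change forwards it to `fanout` randomly chosen other nodes. Rounds, fanout and uniform peer choice are modelling assumptions, not Cassandra's actual protocol parameters; `GossipSim` is a hypothetical name.

```java
import java.util.Random;

/** Hypothetical push-gossip simulation; node 0 makes the change. */
class GossipSim {
    /** Returns the number of rounds until all n nodes know about the change. */
    static int roundsToSpread(int n, int fanout, long seed) {
        Random rnd = new Random(seed);
        boolean[] knows = new boolean[n];
        knows[0] = true;
        int informed = 1, rounds = 0;
        while (informed < n) {
            boolean[] next = knows.clone();
            for (int node = 0; node < n; node++) {
                if (!knows[node]) continue; // only nodes informed before this round send
                for (int k = 0; k < fanout; k++) {
                    int peer = rnd.nextInt(n - 1);
                    if (peer >= node) peer++; // any node except ourselves
                    if (!next[peer]) {
                        next[peer] = true;
                        informed++;
                    }
                }
            }
            knows = next;
            rounds++;
        }
        return rounds;
    }
}
```

With fanout 2, the number of informed nodes can at most triple per round, so 100 nodes need at least 5 rounds; in practice push gossip reaches everyone in a number of rounds that grows roughly logarithmically with the cluster size.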


Cassandra — Problem solving

• Overloaded node

  ◦ Causes

    – The keys are not uniformly distributed

    – Some keys are accessed more than others

    – The node runs on inferior hardware

  ◦ Solution

    – Overloaded nodes may move on the ring

• Unresponsive node

  ◦ Causes

    – The machine has crashed

    – There is too much latency on the network

  ◦ Solution

    – Each node assigns a score to each of its neighbours

    – Logarithmic scale: 1 means a 10% chance that the node wakes up, 2 means 1%, etc.

    – Define a threshold beyond which the node is removed

• Can be mostly automated
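The scoring rule can be made concrete in the spirit of a φ-accrual failure detector. The sketch assumes, as a modelling choice not stated on the slide, that gaps between heartbeats are exponentially distributed with a known mean; then the chance that a silent node is merely slow is exp(−silence / mean), and the score is −log10 of that probability. `SuspicionScore` and its parameter names are hypothetical.

```java
/** Hypothetical helper implementing the logarithmic suspicion score. */
class SuspicionScore {
    /**
     * Score after `silence` time units without a heartbeat, assuming exponentially
     * distributed heartbeat gaps with mean `meanGap`: P(still alive) = exp(-silence / meanGap)
     * and score = -log10(P), computed directly as silence / (meanGap * ln 10) to avoid underflow.
     */
    static double score(double silence, double meanGap) {
        return silence / (meanGap * Math.log(10));
    }

    /** Inverse mapping: score 1 -> 0.1 (10%), score 2 -> 0.01 (1%). */
    static double chanceStillAlive(double score) {
        return Math.pow(10, -score);
    }

    /** The neighbour is declared dead once its score reaches the threshold. */
    static boolean shouldRemove(double score, double threshold) {
        return score >= threshold;
    }
}
```

With a mean gap of 1 second, a neighbour silent for about 2.3 seconds (ln 10) scores 1, a 10% chance of waking up; the threshold trades detection speed against false removals.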