NoSQL Data Stores Luca Rossi [email protected]

NoSQL Data Stores - Roma Tre Universitytorlone/bd2/noSQL-2.pdf · NoSQL Systems > Timeline 2003 Memcached 2006 Google BigTable 2007 Amazon Dynamo 2007 HBase 2008 Cassandra, CouchDB



NoSQL Systems > Timeline

2003 Memcached
2006 Google BigTable
2007 Amazon Dynamo
2007 HBase
2008 Cassandra, CouchDB
2009 Project Voldemort, Redis, Riak, MongoDB

30/05/2011 Sistemi NoSQL 2


NoSQL Systems > Memcached

• What is memcached?
  – Caching system intended to alleviate database load.
  – In-memory key-value store for small chunks of data.
• Extremely successful
  – Facebook, Yahoo, Wikipedia, eBay, Digg, …


Memcached > How does it work


Super simple!

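    // Look in the cache first.
    v = memcachedClient.get(key);
    if (v == NULL) {
        // Cache miss: run the slow query, then cache the result for next time.
        v = db.query( SOME SLOW QUERY );
        memcachedClient.set(key, v);
    }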

Key-value cache
1. Keys are hashed
2. The hash table spans an arbitrary number of servers
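The two steps above can be sketched as follows. This is only an illustration, assuming plain modulo placement over a fixed server list; real memcached clients often use consistent hashing instead. The server names and the `server_for` helper are hypothetical.

```python
import hashlib

# Hypothetical server addresses, for illustration only.
SERVERS = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]


def server_for(key, servers=SERVERS):
    """Pick the server that owns `key`.

    Step 1: hash the key. Step 2: map the hash onto the server list.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[bucket]
```

Every client that shares the same server list computes the same placement, so no central directory is needed. The drawback of modulo placement is that changing the number of servers remaps most keys.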


NoSQL Systems > Google BigTable


• BigTable is a distributed storage system for managing structured data that is designed to scale to a very large size.
• Petabytes of data across thousands of commodity servers.
• Built on top of Google File System

Google BigTable > Data Model

Row Id   ColumnFamily1        ColumnFamily2        …   ColumnFamilyN
rowid1   qualifier1 = "abc"   qualifier1 = "xyz"   …   …
         qualifier2 = "def"   qualifier5 = "fgh"
         qualifier3 = "123"
         …
rowid2   …

• Column Families are (the only things) defined in the schema
• Qualifiers are added dynamically.
• Simple queries
  – Get a row by key
  – Get a range of rows by (start key, end key)
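A toy in-memory model may make this data model concrete. It is a sketch, not BigTable's API: the `ToyTable` class and its method names are hypothetical, and treating the end key of a scan as exclusive is an assumption.

```python
import bisect


class ToyTable:
    """Toy model of a BigTable-style table (illustrative only)."""

    def __init__(self, column_families):
        # Column families are the only things fixed by the schema.
        self.column_families = set(column_families)
        self._rows = {}          # row key -> {family -> {qualifier -> value}}
        self._sorted_keys = []   # row keys kept sorted, for range scans

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = {}
        # Qualifiers are created on the fly, independently for each row.
        self._rows[row_key].setdefault(family, {})[qualifier] = value

    def get(self, row_key):
        """Get a row by key (None if the row does not exist)."""
        return self._rows.get(row_key)

    def scan(self, start_key, end_key):
        """Get the rows with start_key <= key < end_key, in key order."""
        lo = bisect.bisect_left(self._sorted_keys, start_key)
        hi = bisect.bisect_left(self._sorted_keys, end_key)
        return [(k, self._rows[k]) for k in self._sorted_keys[lo:hi]]
```

Keeping the row keys sorted is what makes the range query cheap: a scan is two binary searches plus a slice.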

Google BigTable > Data Model > Example

• Student – Course
  – 1 student > many courses
  – 1 course > many students

Students:       id (PK), name, email, birthdate
Course:         id (PK), title, description, teacher_id
Student2Course: student_id, course_id


De‐normalized data

Single key‐space
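One possible reading of these two labels, applied to the Student – Course example, is sketched below. The row-key prefixes, family names and sample values are invented for illustration; the slide's actual layout is not recoverable from the text.

```python
# One table, one key-space: row keys carry a type prefix (an assumption
# made for this sketch).
rows = {
    "course:c1": {
        "info": {"title": "Databases", "teacher_id": "t9"},
        # Enrolled students are stored inside the course row.
        "students": {"s1": "Ada", "s2": "Alan"},
    },
    "student:s1": {
        "info": {"name": "Ada", "email": "ada@example.org"},
        # No join table: course titles are copied into the student row.
        "courses": {"c1": "Databases"},
    },
    "student:s2": {
        "info": {"name": "Alan", "email": "alan@example.org"},
        "courses": {"c1": "Databases"},
    },
}


def courses_of(student_id):
    """Titles of a student's courses, read from a single row (no join)."""
    return sorted(rows[f"student:{student_id}"]["courses"].values())


def students_of(course_id):
    """Names of a course's students, read from a single row (no join)."""
    return sorted(rows[f"course:{course_id}"]["students"].values())
```

The price of de-normalization is duplicated data: renaming a course means updating the course row and every student row that copies its title.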


Google BigTable > Infrastructure

• Partition model: sharding on the row key
  – Data is divided into tablets
  – Each tablet is defined by the range of row keys it is responsible for (start key – end key)
  – Each tablet is served by one tablet server at a time
  – Each tablet server may serve (has the lock for) many tablets.
• Distributed locking service called Chubby
  – Manages the tablet servers' lifecycle
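Range-based sharding can be sketched as a sorted list of tablet start keys plus a binary search. The tablet boundaries, server names and helper functions below are made up for illustration and say nothing about how BigTable implements the lookup.

```python
import bisect

# Each tablet covers [its start key, the next tablet's start key);
# the last tablet is open-ended. All values are hypothetical.
TABLET_START_KEYS = ["", "g", "n", "t"]
TABLET_SERVERS = ["ts-1", "ts-2", "ts-1", "ts-3"]  # one server, many tablets


def tablet_index(row_key):
    """Index of the tablet whose key range contains row_key."""
    return bisect.bisect_right(TABLET_START_KEYS, row_key) - 1


def tablet_server_for(row_key):
    """The single tablet server currently serving row_key."""
    return TABLET_SERVERS[tablet_index(row_key)]
```

Because neighbouring row keys fall into the same tablet, a range scan usually touches one tablet server or a few adjacent ones.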


• Three-level hierarchy to store tablet location
  – Analogous to a B+ Tree

[Figure labels: Master Server, Tablet Servers]

[Figure: Client, Master Server and four Tablet Servers, connected by request and response arrows]

• Strong consistency
  – Only one tablet server is responsible for a given piece of data.
  – Replication is handled on the GFS layer
• Trade-off with availability
  – If a tablet server fails, its portion of data is temporarily unavailable until a new server is assigned


NoSQL Systems > Amazon Dynamo

“An extra tenth of a second in response times will cost us 1% in sales” - Amazon

• Dynamo: Highly available key-value store
• Challenge: reliability at massive scale
  – Tens of millions of customers.
  – Tens of thousands of servers.


Amazon Dynamo > Data Model

• Binary objects (i.e. blobs) identified by unique keys
• Query model:
  – Simple read and write operations to data retrieved by primary key
  – No operations span multiple data items


Amazon Dynamo > Infrastructure

• Partitioning similar to P2P (Chord, Pastry, etc.)
  – Keys are hashed.
  – The range of the hash function is treated as a circular space (ring).
  – Each node is responsible for a region of the ring.
  – Distributed Hash Table (DHT)
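The ring described above can be sketched in a few lines. This is a generic consistent-hashing illustration, not Dynamo's implementation: the 32-bit ring size, the use of MD5, and the `HashRing` class are assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(name):
    """Hash a key or node name onto the ring (a 32-bit circular space here)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes):
        # Place every node on the ring and keep the ring sorted by position.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def node_for(self, key):
        """First node met when walking clockwise from the key's position."""
        i = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring
```

With this layout, adding a node only takes over keys from the node that follows it on the ring; every other key stays where it was.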

NoSQL Systems > Amazon Dynamo

[Figure: hash ring showing node A, the labels N=1, N=2 and N=3, and the value “AE107FB…”]

• Each node is responsible for the region between it and its N predecessors.
• N is tuned on a per-node basis
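Seen from the key's side, the same rule says that a key is stored on the first nodes met when walking clockwise from its position. The sketch below assumes a single replica count `n` for the whole ring (the slide tunes N per node) and counts the first node as one of the `n`; `replicas_for` is a hypothetical helper.

```python
import bisect
import hashlib


def ring_position(name):
    """Hash a key or node name onto a 32-bit ring (an assumption)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def replicas_for(key, nodes, n):
    """The n distinct nodes met walking clockwise from the key's position."""
    ring = sorted((ring_position(node), node) for node in nodes)
    positions = [p for p, _ in ring]
    start = bisect.bisect_left(positions, ring_position(key))
    count = min(n, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

For example, `replicas_for("some-key", ["A", "B", "C", "D"], 3)` returns three distinct node names, starting with the node that owns the key's region.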


• Replication
  – Each data item is replicated at many hosts
• Eventual consistency
  – Updates are propagated to replicas asynchronously
  – The system eventually reaches a consistent state
• Trade-off between consistency and availability
  – Number of replicas is crucial
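A small simulation shows what asynchronous propagation means for readers. It is deliberately simplified and is not Dynamo's protocol: conflicts are resolved by a last-writer-wins counter (Dynamo itself uses vector clocks), and the `Replica` and `Cluster` classes are hypothetical.

```python
from collections import deque


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Keep only the newest version (last-writer-wins, an assumption).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class Cluster:
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = deque()  # updates not yet delivered to a replica
        self.clock = 0

    def write(self, key, value):
        """Acknowledge once one replica has the value; queue the others."""
        self.clock += 1
        first, *others = self.replicas
        first.apply(key, self.clock, value)
        for replica in others:
            self.pending.append((replica, key, self.clock, value))

    def propagate(self):
        """Deliver every queued update: the 'eventually' in the name."""
        while self.pending:
            replica, key, version, value = self.pending.popleft()
            replica.apply(key, version, value)
```

Between `write` and `propagate`, a read served by the second or third replica returns stale data. That window is the consistency given up in exchange for accepting writes without waiting for every replica.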


Case Study > Facebook Messages


• Real-time, reliable messaging system that combines chat, messages and emails.
• 135+ billion messages per month
• Two main usage patterns
  – A short set of temporal data that tends to be volatile
  – An ever-growing set of data that rarely gets accessed
• Candidate systems:
  – MySQL
  – Apache Cassandra
  – Apache HBase


Facebook Messages > MySQL

• Attractive choice:
  + Facebook's core infrastructure is MySQL-based
    • It is indeed a giant LAMP application
  + The Facebook team has extensive knowledge in running and managing MySQL
• But…
  – MySQL clustering is hard to maintain (and scale)
  – MySQL performance suffers with large indexes and data sets


Facebook Messages > Apache HBase

• BigTable's open-source clone
  – Extensible record store
  – Strong consistency
    • Availability trade-off
• Part of the Hadoop ecosystem
  – Built on top of HDFS
  – Integrates with Hive, ZooKeeper, etc.


Facebook Messages > Apache Cassandra

• Marriage between BigTable and Dynamo
  – Data model: Extensible record store (BigTable)
  – Infrastructure: Distributed Hash Table (Dynamo)
    • Eventual consistency
    • High availability
• Developed by Facebook itself
  – To serve the (old) inbox system


Facebook Messages > Evaluation results

• MySQL soon discarded
• HBase vs Cassandra

            Data model                Consistency model      Availability
HBase       Extensible record store   Strong consistency     - Replicas managed by HDFS
                                                             - Region servers are single points of failure
Cassandra   Extensible record store   Eventual consistency   - No single point of failure


• HBase won
  – Strong consistency is a better match for real-time systems


NoSQL Systems > Overview

• We have seen:
  – Extensible record stores
    • BigTable, HBase, Cassandra
  – Key-value stores
    • Dynamo
• There's more to it!
  – Document stores


NoSQL Systems > Document stores

• Systems that store collections of documents
• What is a document?
  – Generally, an object with a number of fields, whose values can be scalars, lists, or nested documents as well
    • e.g.: XML, JSON
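As a concrete illustration of this definition, the snippet below builds one such document and walks its fields. The field names and values are invented for the example; `field_paths` is a hypothetical helper, not part of any document store's API.

```python
import json

# A hypothetical document: a scalar, a list and nested documents.
article = {
    "title": "NoSQL Data Stores",                  # scalar
    "tags": ["nosql", "databases"],                # list
    "author": {                                    # nested document
        "name": "Jane Doe",
        "contact": {"email": "jane@example.org"},  # nested one level deeper
    },
}


def field_paths(doc, prefix=""):
    """Yield the dotted path of every non-document field in a document."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from field_paths(value, path + ".")
        else:
            yield path


# The document round-trips through JSON text unchanged.
assert json.loads(json.dumps(article)) == article
```

Here `list(field_paths(article))` yields `title`, `tags`, `author.name` and `author.contact.email`; dotted paths of this kind are a common way to address fields inside nested documents.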


Case Study > Guardian.co.uk


Guardian.co.uk > 2005‐09

Modern Java application
  – Strong model in Java
  – Oracle RDBMS
  – Database abstracted with ORM

Problems: increasing complexity
  – Complex Hibernate binding (10,000+ lines of XML config)
  – Lots of optimisations
  – Complex caching strategy
  – Load becoming an issue
  – …


Guardian.co.uk > 2009‐10

• Introduce yet more caching
  – Memcached
• Decouple applications from the db by building APIs
  – Power APIs using scalable technologies (Apache Solr)
  – JSON results

[Figure label: DB Load]


Three models now (a headache):
  – RDBMS tables
  – Java objects
  – JSON API

JSON model is very simple:
  – Multiple domain objects expressed in a single doc
  – Can be designed in a forwardly extensible way
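Both properties of the JSON model can be shown with two versions of a made-up API response. The field names (`headline`, `standfirst`, `section`) and the `headline_and_tags` reader are assumptions for the example, not the Guardian's actual API.

```python
import json

# Two versions of a hypothetical response. Each holds several domain
# objects (an article and its tags) in one document; V2 adds new fields.
V1 = '{"id": "a1", "headline": "Hello", "tags": [{"id": "t1", "name": "news"}]}'
V2 = ('{"id": "a1", "headline": "Hello", "standfirst": "Hi", '
      '"tags": [{"id": "t1", "name": "news", "section": "uk"}]}')


def headline_and_tags(raw):
    """Read only the fields this client knows about; ignore the rest."""
    doc = json.loads(raw)
    return doc["headline"], [tag["name"] for tag in doc.get("tags", [])]
```

A client written against V1 keeps working when the server starts sending V2, because it never looks at fields it does not know. That is one sense in which such a model can be extended without breaking consumers.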

[Figure labels: Article, Tags]

What if the JSON API was the primary model?
• CouchDB
• MongoDB


NoSQL Systems > MongoDB vs CouchDB

                    CouchDB                    MongoDB
Data Model          Collections of JSON docs   Collections of BSON docs
Queries             Low-level query language   Rich, declarative query language
Consistency Model   Eventual Consistency       Strong Consistency (tunable though)
Replication         Master-Master              Master-Slave
Scalability         Through replication        Sharding


• MongoDB was chosen:
  – Can easily express complex queries
  – Good if you come from RDBMS
  – No need for extreme scalability (where CouchDB shines)


NoSQL Systems > Links and References

• Rick Cattell – Scalable SQL and NoSQL Datastores
• R. Cattell, M. Stonebraker – Ten Rules for Scalable Performance in “Simple Operation” Datastores
• M. Stonebraker – SQL vs NoSQL Databases
• A. Popescu – MyNoSQL Blog
• Chang et al. – Google BigTable
• DeCandia et al. – Amazon Dynamo
• We have encountered:
  – Cassandra - cassandra.apache.org
  – HBase - hbase.apache.org
  – CouchDB - couchdb.apache.org
  – MongoDB - http://www.mongodb.org
  – Memcached - memcached.org
