NoSQL Session, GlueCon, May 2010

Overview of NoSQL, given at GlueCon by Dwight Merriman of 10gen/MongoDB.

NoSQL : Channeling the Data Explosion

Dwight Merriman
CEO, 10gen

@dmerr dmerr.tumblr.com

GlueCon 2010

The database world is changing
No longer one-size-fits-all

NoSQL = non-relational, next-generation operational data stores and databases

Scaling Out

no joins + light transactional semantics = horizontally scalable architectures

Why?

http://www.globalnerdy.com/2007/09/07/multicore-musings/

cloud

commodity

How the NoSQL Products Vary

• What’s the same
  – No joins
  – No complex transactions

• What varies
  – Scale-out model
  – Consistency model
  – Data model

Scaling Out

distribution & query models

Consistent hashing
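
The slide names consistent hashing without showing it. As a rough illustration only, here is a minimal sketch of a hash ring with virtual nodes. The `HashRing` class, the node names, the `vnodes` count and the FNV-1a hash are all assumptions made for this example; they are not taken from the talk or from any particular product.

```javascript
// FNV-1a 32-bit hash; chosen here only because it is short and dependency-free.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical ring: every node is placed at `vnodes` pseudo-random points.
class HashRing {
  constructor(nodes, vnodes = 64) {
    this.vnodes = vnodes;
    this.ring = []; // array of { point, node }, kept sorted by point
    nodes.forEach((n) => this.addNode(n));
  }

  addNode(node) {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: fnv1a(node + "#" + i), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // The owner is the first ring point at or after the key's hash, wrapping
  // around to the start of the ring. Linear scan for brevity.
  nodeFor(key) {
    const h = fnv1a(key);
    const hit = this.ring.find((e) => e.point >= h);
    return (hit || this.ring[0]).node;
  }
}

// Compare key ownership before and after a fourth node joins.
const keys = Array.from({ length: 1000 }, (_, i) => "user:" + i);
const before = new HashRing(["node-a", "node-b", "node-c"]);
const after = new HashRing(["node-a", "node-b", "node-c", "node-d"]);
const moved = keys.filter((k) => before.nodeFor(k) !== after.nodeFor(k));
console.log(moved.length + " of " + keys.length + " keys changed owner");
```

The property worth noticing is that every key that changes owner moves to the new node; keys never shuffle between the existing nodes, which is what makes adding capacity cheap.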

Order preserving range chunking
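
To make the contrast with hashing concrete, the following is a minimal sketch of order-preserving range chunking. The chunk boundaries and shard names are hypothetical; a real system would split and migrate chunks dynamically, which is not modeled here.

```javascript
// Hypothetical chunk table: each chunk owns keys in [min, max) on one shard.
const chunks = [
  { min: "", max: "g", shard: "shard-1" },
  { min: "g", max: "p", shard: "shard-2" },
  { min: "p", max: null, shard: "shard-3" }, // null = unbounded upper end
];

// Binary search for the last chunk whose lower bound is <= key.
function chunkFor(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (chunks[mid].min <= key) lo = mid;
    else hi = mid - 1;
  }
  return chunks[lo];
}

// Shards that can hold keys in the inclusive range [from, to].
function shardsForRange(from, to) {
  const out = [];
  for (const c of chunks) {
    const startsBeforeEnd = c.min <= to;
    const endsAfterStart = c.max === null || c.max > from;
    if (startsBeforeEnd && endsAfterStart) out.push(c.shard);
  }
  return out;
}
```

Because key order is preserved, a range query only has to visit the chunks that overlap the range, whereas a hashed layout would scatter the same keys across every node.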

Scatter gather
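
Scatter-gather can be sketched in a few lines. The example below is an assumption-laden illustration: the `shards` arrays stand in for remote servers, and the calls run synchronously, whereas a real router would send the sub-queries in parallel over the network.

```javascript
// Hypothetical shards: each holds an arbitrary subset of the documents.
const shards = [
  [{ id: 1, ts: 50 }, { id: 4, ts: 20 }],
  [{ id: 2, ts: 90 }, { id: 5, ts: 10 }],
  [{ id: 3, ts: 70 }],
];

function scatterGather(shardList, predicate, compare, limit) {
  // Scatter: every shard filters, sorts and truncates its own data.
  const partials = shardList.map((docs) =>
    docs.filter(predicate).sort(compare).slice(0, limit)
  );
  // Gather: merge the partial results, then sort and truncate once more.
  return [].concat(...partials).sort(compare).slice(0, limit);
}

// Newest two documents with ts >= 20, across all shards.
const newest = scatterGather(shards, (d) => d.ts >= 20, (a, b) => b.ts - a.ts, 2);
console.log(newest.map((d) => d.id));
```

Each shard only needs to return its own top `limit` results, since no document outside a shard's local top `limit` can appear in the global top `limit`.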

Data models

no joins + light transactional semantics = horizontally scalable architectures

Important side effect : new data models = improved ways to develop apps

Data Models

• Key/value
• Column-oriented “bigtable-style”
• Document-oriented (JSON)
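
As a rough way to compare the three models listed above, the sketch below stores the same blog post in each shape using plain in-memory structures. The variable names and layouts are illustrative assumptions, not the storage format of any specific product.

```javascript
// Key/value: the store sees an opaque value; only lookup by key is possible.
const kv = new Map();
kv.set(
  "post:1",
  JSON.stringify({ title: "Too Big to Fail", tags: ["finance", "economy"] })
);

// Column-oriented ("bigtable-style"): row key -> column family -> column -> value.
const wide = {
  "post:1": {
    meta: { title: "Too Big to Fail", author: "John S" },
    tags: { finance: "", economy: "" }, // here the column names carry the data
  },
};

// Document-oriented: a nested structure that can be queried by its fields.
const docs = [
  { _id: 1, title: "Too Big to Fail", tags: ["finance", "economy"] },
];

const fromKv = JSON.parse(kv.get("post:1")).title; // client must decode the value
const fromWide = wide["post:1"].meta.title; // addressed by row, family, column
const fromDocs = docs.find((d) => d.tags.includes("economy")).title; // query by content
```

The practical difference is where the structure lives: in the key/value case only the client understands the value, while in the other two the store can address individual columns or fields.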

Data Models

{
  title: 'Too Big to Fail',
  author: 'John S',
  ts: Date("05-Nov-09 10:33"),
  comments: [
    { author: 'Ian White', comment: 'Great article!' },
    { author: 'Joe Smith', comment: 'But how fast is it?',
      replies: [ { author: 'Jane Smith', comment: 'scalable?' } ] }
  ],
  tags: ['finance', 'economy']
}

db.posts.find( { tags : 'economy' } ).sort( { ts : -1 } ).limit(10).skip(10)

db.posts.find( { "comments.author" : "Ian White" } )
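
For readers without a MongoDB instance at hand, the matching rules used by the two queries above can be approximated on a plain array. This is only a sketch of the semantics: the sample `posts`, the numeric `ts` values and the helper names `findByTag` and `findByCommentAuthor` are hypothetical, and nothing here touches a real database.

```javascript
const posts = [
  { title: "Too Big to Fail", ts: 3, tags: ["finance", "economy"],
    comments: [{ author: "Ian White", comment: "Great article!" }] },
  { title: "Scaling Out", ts: 2, tags: ["databases"], comments: [] },
  { title: "Commodity Hardware", ts: 1, tags: ["economy"],
    comments: [{ author: "Joe Smith", comment: "But how fast is it?" }] },
];

// Like find({ tags: value }): a scalar matches if the array field contains it.
function findByTag(docs, tag, { skip = 0, limit = Infinity } = {}) {
  return docs
    .filter((d) => d.tags.includes(tag))
    .sort((a, b) => b.ts - a.ts) // sort({ ts: -1 }): newest first
    .slice(skip, skip + limit); // skip first, then limit
}

// Like find({ "comments.author": value }): match any element of the embedded array.
function findByCommentAuthor(docs, author) {
  return docs.filter((d) => d.comments.some((c) => c.author === author));
}

console.log(findByTag(posts, "economy").map((d) => d.title));
console.log(findByCommentAuthor(posts, "Ian White").map((d) => d.title));
```

Note that in the shell query the skip is applied before the limit even though `.limit(10)` is written first, so the query returns the second page of ten results.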

Influences

CAP

It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:
• Availability
• Atomic consistency
in all fair executions (including those in which messages are lost).

Consistency Models - CAP

• Choices are AP or CP
• Write availability, not read availability, is the main question
• It’s not all about CAP
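
The AP versus CP choice, and why it is mostly a question of write availability, can be shown with a deliberately tiny model. The sketch below assumes exactly two replicas of a single register and a boolean `partitioned` flag; the `write` function and the mode strings are inventions for this example.

```javascript
// One replica of a single register.
class Replica {
  constructor() {
    this.value = null;
  }
}

// Attempt a write at `local`; returns true if the write was accepted.
function write(mode, local, remote, partitioned, value) {
  if (!partitioned) {
    local.value = value;
    remote.value = value; // replicate synchronously while the link is up
    return true;
  }
  if (mode === "CP") {
    return false; // refuse: replicas stay consistent, write availability is lost
  }
  local.value = value; // "AP": accept locally, replicas may now disagree
  return true;
}

const cpA = new Replica();
const cpB = new Replica();
const cpAccepted = write("CP", cpA, cpB, true, "x");

const apA = new Replica();
const apB = new Replica();
const apAccepted = write("AP", apA, apB, true, "x");
```

In both modes each replica can still answer reads from its local copy during the partition; what differs is whether a write is accepted and whether the copies are allowed to diverge.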

Eventual consistency makes these non-availability aspects better:
• Multi data center
• Speed
• Even load distribution

Eventual Consistency

Read(x) : 1, 2, 2, 4, 4, 4, 4 …

Could we get this?

Read(x) : 1, 2, 1, 4, 2, 4, 4, 4 …
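
The difference between the two read sequences above is whether observed values ever go backwards. A minimal checker, assuming that a larger value means a newer write, could look like this (the function and constant names are illustrative):

```javascript
// True if the observed values never go backwards.
// Assumption: a larger value corresponds to a newer write.
function isMonotonic(reads) {
  for (let i = 1; i < reads.length; i++) {
    if (reads[i] < reads[i - 1]) return false;
  }
  return true;
}

const firstSequence = [1, 2, 2, 4, 4, 4, 4];
const secondSequence = [1, 2, 1, 4, 2, 4, 4, 4];

console.log(isMonotonic(firstSequence), isMonotonic(secondSequence));
```

The slide leaves the question open. In a store where successive reads may be served by replicas that have caught up to different points, the second sequence is possible unless the client or the store enforces monotonic reads.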

Terms

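• R: number of replicas contacted before a read returns
• W: number of replicas that must acknowledge a write
• N: number of replicas of each item
  – R+W>N has nice properties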

• Sloppy quorum
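
One of the "nice properties" of R+W>N is that every read set shares at least one replica with every write set, so a read always reaches a replica that saw the latest acknowledged write. The brute-force sketch below checks this by enumerating all subsets; the helper names are hypothetical and it models strict quorums only, not sloppy ones.

```javascript
// All k-element subsets of the replica ids 0 .. n-1.
function subsets(n, k, start = 0) {
  if (k === 0) return [[]];
  const out = [];
  for (let i = start; i <= n - k; i++) {
    for (const rest of subsets(n, k - 1, i + 1)) out.push([i, ...rest]);
  }
  return out;
}

// True if every possible read set of size r intersects every write set of size w.
function quorumsAlwaysOverlap(n, r, w) {
  return subsets(n, r).every((readSet) =>
    subsets(n, w).every((writeSet) =>
      readSet.some((id) => writeSet.includes(id))
    )
  );
}

console.log(quorumsAlwaysOverlap(3, 2, 2)); // R+W = 4 > 3
console.log(quorumsAlwaysOverlap(3, 1, 2)); // R+W = 3, not > 3
```

With N=3, the common choice R=2, W=2 guarantees overlap, while R=1, W=2 does not: a read of a single replica can land on the one that missed the write.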

R+W>N

If R+W > N and all the data centers are equal peers, we can’t have both fast local reads and fast local writes at the same time.

Network Partitions

Trivial Network Partitions

Sometimes we need global state / more consistency

• Unique key constraints
  – User registration

• ACL changes
• Are we surprising the user?
  – read-your-own-writes
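
Read-your-own-writes can be provided per session without making every read strongly consistent. The sketch below is one possible approach under stated assumptions: a single primary takes all writes, each replica exposes a `version` counter, and the `Session` class is a hypothetical client-side helper rather than any product's API.

```javascript
// Hypothetical client session that remembers the version of its last write.
class Session {
  constructor(primary, secondaries) {
    this.primary = primary;
    this.secondaries = secondaries;
    this.lastWritten = 0; // highest version this session has written
  }

  write(key, value) {
    this.primary.version += 1;
    this.primary.data[key] = value;
    this.lastWritten = this.primary.version;
  }

  // Read from a secondary only if it has caught up to this session's own
  // last write; otherwise fall back to the primary.
  read(key) {
    const fresh = this.secondaries.find((s) => s.version >= this.lastWritten);
    return (fresh || this.primary).data[key];
  }
}

const primary = { version: 0, data: {} };
const secondary = { version: 0, data: {} }; // has not replicated anything yet
const session = new Session(primary, [secondary]);

session.write("name", "dmerr");
const seen = session.read("name"); // served by the primary: the secondary lags
```

Other sessions may still read stale data from the secondary; only the writer is guaranteed to see its own update, which is usually enough to avoid surprising the user.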

Could it be the case that…

uptime( CP + average developer ) >= uptime( AP + average developer )

where uptime:= system is up and non-buggy?

Predictions

• JSON will be the most popular building block for non-relational data models
• Tunable consistency in all the products
• Some SQL in these products!

Questions?
Thank you

dwight@10gen.com
@dmerr
dmerr.tumblr.com
@mongodb
Download : www.mongodb.org
10gen is hiring in SF and NYC - info@10gen.com