NoSQL in Perspective
Jeff Smith [email protected]
NoSQL on Wikipedia
92 databases 8 types 6 sub-types
Easy Questions
Is this a graph?
Do I already have XML or JSON?
Is this a caching problem?
Paul Graham on Programming Languages
Lisp
C
Math Problem
Lisp is just math. Math doesn't get stale. What in databases is just math?
Putting the R in RDBMSes
Relation
Attributes
Tuples
Database Analogy
C is to Lisp as Relational Algebra is to Relational Calculus
C : Lisp :: Relational Algebra : Relational Calculus
Relational Algebra in Action
Relational Algebra:
R ⋉ S = { t : t ∈ R, s ∈ S, Fun(t ∪ s) }
SQL:
SELECT * FROM audience WHERE clue > 0;
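As a rough illustration of the two operations on this slide, here is a minimal Python sketch. Relations are modeled as lists of dicts, and Fun(t ∪ s) is read as "t and s agree on every attribute they share". The function name `semijoin`, the `clued_in` relation, and all sample rows are hypothetical and exist only for this example.

```python
def semijoin(r, s):
    """Return the tuples of r that agree with at least one tuple of s
    on every attribute the two tuples have in common."""
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                result.append(t)
                break  # one matching tuple in s is enough; keep t once
    return result


# Hypothetical sample relations.
audience = [
    {"name": "Ann", "clue": 3},
    {"name": "Bob", "clue": 0},
    {"name": "Cy", "clue": 1},
]
clued_in = [{"name": "Ann"}, {"name": "Cy"}]

# Semijoin: audience rows that have a partner in clued_in.
matched = semijoin(audience, clued_in)

# Selection, mirroring SELECT * FROM audience WHERE clue > 0;
selected = [t for t in audience if t["clue"] > 0]

print(matched)
print(selected)
```

Note that the SQL statement on the slide is a selection, not a semijoin; with this sample data the two happen to return the same rows.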
Relational Calculus in Action?
Relational Calculus:
{ t : {name} | ∃ s : {name, wage} ( Employee(s) ∧ s.wage = 50.000 ∧ t.name = s.name ) }
Relevant Implemented Language:
This space under construction.
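A set comprehension is only a loose analogue of the calculus expression above, not an implementation of relational calculus, but it shows the same declarative shape: describe the tuples you want rather than the steps to find them. This sketch assumes a hypothetical `employee` relation and reads the slide's `50.000` as the number 50000, which is a guess at the intended notation.

```python
from collections import namedtuple

# Hypothetical relation with the attributes {name, wage}.
Employee = namedtuple("Employee", ["name", "wage"])

employee = {
    Employee("Ann", 50000),
    Employee("Bob", 42000),
    Employee("Cy", 50000),
}

# "The set of names t.name such that some s in Employee has
# s.wage = 50000 and t.name = s.name."
names = {s.name for s in employee if s.wage == 50000}

print(sorted(names))
```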
Relational Model Utility
Essentially, all models are wrong, but some are useful.
- George E. P. Box
When relations are wrong
Sparse data
Irregular data
Poorly understood interrelationships
No definable indexes
Big data
No vertically scalable hardware
Papers Read Around the World
Google's BigTable: http://research.google.com/archive/bigtable.html
Amazon's Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Lessons from Functional Programming
MapReduce: http://research.google.com/archive/mapreduce.html
MapReduce
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
[1]
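The word-count pseudocode above can be approximated in a single process. This is a minimal sketch, not the distributed system the paper describes: the names `map_fn`, `reduce_fn`, and `run_mapreduce` and the sample documents are hypothetical, and the "shuffle" step is just an in-memory dictionary. Counts stay as strings to mirror the pseudocode.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield word, "1"


def reduce_fn(key, values):
    # key: a word, values: an iterable of counts (as strings)
    return str(sum(int(v) for v in values))


def run_mapreduce(documents):
    # Shuffle: group every intermediate value under its key.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    # Reduce: collapse each group to a single result.
    return {word: reduce_fn(word, counts)
            for word, counts in intermediate.items()}


# Hypothetical input documents.
docs = {"a.txt": "to be or not to be", "b.txt": "be quick"}
counts = run_mapreduce(docs)
print(counts)
```

Because `map_fn` and `reduce_fn` keep no shared state, the same two functions could in principle be run on many machines, which is the lesson borrowed from functional programming.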
CAP Theorem
Consistency
Availability
Partition Tolerance
CAP Theorem?
Consistency
Availability
Partition Tolerance
Sacrifice Availability
Consistency Partition Tolerance
Then, sacrifice what?
Consistency Partition Tolerance
Availability Availability
PACELC
In the event of a Partition, does the system prioritize Availability or Consistency?
Else, does the system prioritize Latency or Consistency?
PACELC as a Tree
Partition: Availability or Consistency
Else: Latency or Consistency
Traditional RDBMSes: PC/EC
Partition: Consistency
Else: Consistency
Eventually Consistent: PA/EL
Partition: Availability
Else: Latency
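The PACELC labels on these slides are built from two independent choices. The sketch below only encodes that naming scheme; the function name `pacelc` and the dictionary keys are hypothetical, and the two classifications are the ones stated on the slides.

```python
def pacelc(on_partition, otherwise):
    """Build a PACELC label such as 'PA/EL'.

    on_partition: 'A' (availability) or 'C' (consistency)
    otherwise:    'L' (latency) or 'C' (consistency)
    """
    if on_partition not in ("A", "C"):
        raise ValueError("on_partition must be 'A' or 'C'")
    if otherwise not in ("L", "C"):
        raise ValueError("otherwise must be 'L' or 'C'")
    return f"P{on_partition}/E{otherwise}"


labels = {
    "traditional RDBMS": pacelc("C", "C"),
    "eventually consistent store": pacelc("A", "L"),
}

for system, label in labels.items():
    print(f"{system}: {label}")
```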
ELC: Replication Options
1. Update all nodes
2. Update the master node first
3. Update an arbitrary node first
Best of both worlds?
SQL
HadoopDB
MySQL Cluster
Riak Demo
N: persisted copies
R: read copies
W: write copies
Strong Consistency: R + W > N
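The inequality works because any R replicas read and any W replicas written must share at least one replica whenever R + W > N, so a read always touches a copy holding the latest acknowledged write. A minimal sketch of that check follows; the function name `is_strongly_consistent` is hypothetical and this is plain arithmetic, not a call into Riak.

```python
def is_strongly_consistent(n, r, w):
    """Return True when every read quorum overlaps every write quorum.

    n: persisted copies, r: copies read, w: copies written.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


# With 3 copies, reading 2 and writing 2 always overlap in one replica.
quorum = is_strongly_consistent(n=3, r=2, w=2)

# Reading 1 and writing 1 may hit different replicas.
fast = is_strongly_consistent(n=3, r=1, w=1)

print(quorum, fast)
```

Lower R and W trade that guarantee for latency, which is the "Else" branch of PACELC.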
Thanks
Jeff Smith [email protected]