Zahid Mian
Part of the Brown-bag Series
Could be “Not SQL”
Could be “Not Only SQL”
Could be “Not SQL Yet”
Essentially any database system that doesn’t require storage in tables/rows/columns
Development as early as the 1960s
A NoSQL DB is often tailored to a specific need
TRADITIONAL RDBMS
Record persistence
Well-defined schemas
SQL querying tools
ACID support
Atomic
Consistent
Isolated
Durable
Doesn’t scale horizontally (easily)
NOSQL
Multiple formats
No schema
No single querying language
BASE support
Basically Available
Soft-state
Eventual consistency
Scales horizontally
Traditional RDBMSs weren’t designed for high rates of growth (emails, tweets, message boards, etc.)
Not all data is relational
SQL can be too cumbersome (joins, subqueries, etc.)
Defining a schema limits growth
So … NoSQL offers a solution to all these issues
KEY-VALUE
Dynamo, Cassandra, SimpleDB, etc.
Essentially store a key and a corresponding value
Simple to program
Easy to distribute across clusters
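To make "easy to distribute across clusters" concrete, here is a minimal sketch of a key-value store that picks a node from the key's hash. The class `PartitionedKeyValueStore` and its methods are hypothetical names for illustration only; this is not the API of Dynamo, Cassandra, or SimpleDB, and each "node" is just an in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: not the API of any real key-value product.
final class PartitionedKeyValueStore {
    // Each map stands in for one node of the cluster.
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PartitionedKeyValueStore(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // The key alone decides which node owns a record, so no node
    // needs to know about the data held by any other node.
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    // Returns null when the key has never been stored.
    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }
}
```

Plain modulo hashing is used here only because it is short. Production systems typically prefer consistent hashing, so that adding a node moves only a small share of the keys.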
DOCUMENT STORES
MongoDB, CouchDB
Similar to a key-value store, but maps keys to documents in either XML or JSON format
No need for joins because a single document contains all the related information
A little more technical than Key-Value, but still easier than relational
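The "no joins" point can be sketched with plain Java collections. The order document below is made-up sample data, and `DocumentExample` is a hypothetical helper, not part of MongoDB or CouchDB. It only shows that the customer and the line items travel inside one document, so reading them needs no second lookup.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration using plain collections (requires Java 9+ for Map.of/List.of).
final class DocumentExample {
    // One self-contained order, shaped like the JSON a document store might hold.
    static Map<String, Object> orderDocument() {
        return Map.of(
            "_id", "order-1001",
            "customer", Map.of("name", "Bob Smith", "phone", "408 555 5555"),
            "items", List.of(
                Map.of("sku", "A-1", "qty", 2),
                Map.of("sku", "B-7", "qty", 1)));
    }

    // Reads a nested field straight from the document; a relational design
    // would usually join an orders table to a customers table here.
    @SuppressWarnings("unchecked")
    static String customerName(Map<String, Object> order) {
        Map<String, Object> customer = (Map<String, Object>) order.get("customer");
        return (String) customer.get("name");
    }
}
```

The trade-off is duplication: if the same customer appears in many orders, the name is stored in each document and must be updated in each.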
ORACLE NOSQL (KEY-VALUE) USING JAVA
import java.util.ArrayList;
import java.util.List;
import oracle.kv.Key;
import oracle.kv.Value;

// Define the major and minor path components for the key
List<String> majorComponents = new ArrayList<String>();
List<String> minorComponents = new ArrayList<String>();
majorComponents.add("Smith");
majorComponents.add("Bob");
minorComponents.add("phonenumber");

// Create the key
Key myKey = Key.createKey(majorComponents, minorComponents);

String data = "408 555 5555";

// Create the value. Notice that we serialize the contents of the
// String object when we create the value.
Value myValue = Value.createValue(data.getBytes());

// Now put the record. Note that we do not show the creation of the
// kvstore handle here.
kvstore.put(myKey, myValue);
MONGODB USING C#
MongoCollection<BsonDocument> employees = database.GetCollection("employee");

for (int i = 1; i <= 5; i++)
{
    BsonDocument employee = new BsonDocument
    {
        { "name", "Employee " + i },
        { "email", String.Format("email{0}@email.com", i) },
        { "createddate", DateTime.Now }
    };
    employees.Insert(employee);
}
PROS
No Schema leads to faster changes in application
Various options available; Open Source
Scales well
API-driven interaction doesn’t require SQL queries
CONS
No Schema can lead to unmanageable code
Vendors may not be around in the future
Truly large databases still need planning
API interaction is a little more complex than SQL
Very few tools to support reporting/analytics
Replicas: ensure availability even if a replica is lost
For reads, if one replica doesn’t respond, go to the second replica
For updates, send the data to all replicas
Two Implementations:
Eventual Consistency
Majority Write/Majority Read
Resync Replicas when available
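The majority write/majority read option can be sketched as follows. `QuorumRegister` is a hypothetical single-value store made up for this illustration, with replicas simulated in memory and failures toggled by hand. The idea it shows: if a write must reach a majority and a read must hear from a majority, the two groups always overlap in at least one replica, so the read sees the latest successful write.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: replicas are in-memory objects, not real servers.
final class QuorumRegister {
    // One replica's copy of the value, tagged with a version number.
    private static final class Replica {
        long version = 0;
        String value = null;
        boolean up = true;
    }

    private final List<Replica> replicas = new ArrayList<>();
    private final int majority;
    private long nextVersion = 1;

    QuorumRegister(int replicaCount) {
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new Replica());
        }
        majority = replicaCount / 2 + 1;
    }

    // Simulate a replica failing (up = false) or coming back (up = true).
    void setUp(int index, boolean up) {
        replicas.get(index).up = up;
    }

    // Send the update to all replicas; report success only if a majority took it.
    // A failed write is not rolled back in this sketch.
    boolean write(String value) {
        long version = nextVersion++;
        int acks = 0;
        for (Replica r : replicas) {
            if (r.up) {
                r.version = version;
                r.value = value;
                acks++;
            }
        }
        return acks >= majority;
    }

    // Ask every reachable replica; if a majority answered, return the newest value.
    String read() {
        int answers = 0;
        Replica newest = null;
        for (Replica r : replicas) {
            if (!r.up) {
                continue;
            }
            answers++;
            if (newest == null || r.version > newest.version) {
                newest = r;
            }
        }
        if (answers < majority) {
            throw new IllegalStateException("no read quorum");
        }
        return newest.value;
    }
}
```

With three replicas the majority is two, so one lost replica blocks neither reads nor writes. A replica that missed an update can still answer a read, because the version number lets the reader discard its stale copy.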
PostgreSQL is a hybrid
Oracle NoSQL
Microsoft Azure NoSQL offerings: DocumentDB, Tables, HBase
But …
Splice Machine offers RDBMS support for Hadoop
FoundationDB offers a SQL engine for Key-Value support
What if RDBMS vendors support JSON and KV?
If they add KV and document search capabilities?
Game over for NoSQL databases?
A bit more complex than that (the underlying architecture is the problem)
Eliminates the “wasted” processing from RDBMS
No Disk
▪ Break the DB into RAM-sized chunks and distribute them across the cluster
No Locking
No Concurrency Control
▪ One transaction at a time per partition
▪ Easier to do when db is broken into multiple partitions
No Disk Logging
▪ Recover from existing replicas
▪ Limited, efficient logging to disk
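The "one transaction at a time per partition" idea can be sketched with one single-threaded executor per partition. `PartitionedCounters` is a hypothetical class invented for this illustration, and the data lives only in memory. Because each partition's map is touched by exactly one thread, the code needs no locks, yet different partitions still run in parallel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: in-memory partitions, each served by one thread.
final class PartitionedCounters {
    private final List<Map<String, Long>> data = new ArrayList<>();   // one RAM chunk per partition
    private final List<ExecutorService> workers = new ArrayList<>();  // one thread per partition

    PartitionedCounters(int partitions) {
        for (int i = 0; i < partitions; i++) {
            data.add(new HashMap<>());
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    private int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), data.size());
    }

    // Queue a transaction on the owning partition. Its worker thread runs
    // queued transactions one after another, so the plain HashMap is safe.
    Future<Long> increment(String key) {
        int p = partitionOf(key);
        Callable<Long> txn = () -> data.get(p).merge(key, 1L, Long::sum);
        return workers.get(p).submit(txn);
    }

    // Convenience for callers that want the new counter value right away.
    long incrementAndWait(String key) {
        try {
            return increment(key).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lets queued transactions finish, then stops the worker threads.
    void shutdown() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
    }
}
```

A transaction that touches keys in more than one partition does not fit this model directly and would need extra coordination, which this sketch leaves out.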