Summary on NOSQL databases.
Manu Cohen-Yashar
The Cloud, Big Data and
NoSQL
Agenda
Data boom
Problems with RDBMS
NoSQL
Big Data
What’s next
Understand NoSQL
Types of databases
Primary usage
Data model
Pros and Cons
Lots of Data
Data doubles every 18 months
Pictures
Web site
emails
Sensors
Geo Information
Financial Information
Science
Art
. . . (Infinite list)
No Limits
With the cloud it is now possible to provision a
cluster of any size and run any computation at
any scale.
The one who will make sense of all available
data will rule the world.
The conclusion:
Use the cloud to analyze data at large scale.
Let's talk about data
When we think of data we think of …
Data has many forms
Yet data comes in many forms and shapes
Graphs
Documents
Time Series
Blobs
Geo
Sensors
Unstructured
Structured
Web
Problems with RDBMS
Does not scale very well
Sharding
Replication
Models data according to the relational model
Is this the best model for all data types?
Complex and Expensive
Requires a DBA
Expensive to buy
Oracle
SQL
Non-Relational
Not all types of data fit well into the relational
world.
Not all data use cases fit well into the ACID
convention
The relational model does not scale very well
Difficult to distribute
Difficult to replicate
The CAP Theorem
RDBMS
Replicated NoSQL
Sharded NoSQL
During a network partition, a distributed system must choose either Consistency or Availability.
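The trade-off above can be illustrated with a toy model. This is only a sketch: the `Replica` class and its "CP"/"AP" modes are hypothetical names invented here, not the behavior of any specific database. A CP replica refuses requests while it is cut off from its peers; an AP replica keeps answering and accepts that replicas may diverge.

```python
class Replica:
    """One node of a hypothetical replicated store (illustration only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True while the peers are unreachable

    def _check_available(self):
        # A CP node gives up availability to stay consistent.
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during network partition")

    def write(self, value):
        self._check_available()
        self.value = value

    def read(self):
        self._check_available()
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for node in (cp, ap):
    node.write("v1")
    node.partitioned = True  # simulate the network partition

ap.write("v2")  # accepted locally; other replicas may still hold "v1"
```

During the partition, `cp.read()` raises an error while `ap.read()` still returns a value that other replicas may not have seen yet.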
NoSQL
Large family of databases
No Schema
No relations enforced
Designed for high scale and distribution
Types of NoSQL DB
Key Value
Wide Columns
Documents
Graph
Motivation for NoSQL
Large Scale and Distribution
Simplicity
Low cost
Good fit with the data model
Volume, Velocity and Variety
What Is No Schema
Some data is structured, and some is not.
NoSQL databases do not ENFORCE a
schema the way RDBMS systems do.
You can leverage data structure by creating
indexes and smart queries.
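A minimal sketch of the idea, using plain Python dictionaries rather than any real database: the records share no schema, yet an index can still be built over whichever records happen to have a given field. The record contents and the `build_index` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical schemaless records: no two need to share the same fields.
records = {
    "u1": {"name": "Dana", "country": "Israel"},
    "u2": {"name": "Luc", "country": "France", "email": "luc@example.com"},
    "u3": {"title": "Sensor 7", "readings": [1, 2, 3]},
}


def build_index(records, field):
    """Map each value of `field` to the keys of the records that contain it."""
    index = defaultdict(list)
    for key, record in records.items():
        if field in record:  # records without the field are simply skipped
            index[record[field]].append(key)
    return dict(index)


country_index = build_index(records, "country")
```

Nothing forces `u3` to have a `country` field; the index just leaves it out.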
Types of NoSQL Databases
Key values
Wide column
Document
Graph
Key values
Data is organized as key - value pairs
Query by key and values
Simple indexes (by partition key)
Examples
Azure Table Storage
Amazon DynamoDB
Key1 Key2 Value1 Value2 Value3 Value4 Value5
Israel 1234 1 2 3
France 2345 4 5 8
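The table above can be modeled as a two-level lookup: a partition key, then a row key, then whatever values the row carries. This is a sketch in plain Python, not the Azure Table Storage or DynamoDB API; the `KeyValueTable` class is hypothetical, and assigning the three numbers of each row to Value1..Value3 is an assumption.

```python
class KeyValueTable:
    """Toy key-value table addressed by (partition key, row key)."""

    def __init__(self):
        self._partitions = {}

    def put(self, partition_key, row_key, **values):
        # Rows may carry different sets of values; nothing enforces a schema.
        self._partitions.setdefault(partition_key, {})[row_key] = dict(values)

    def get(self, partition_key, row_key):
        # Point lookup by the full key; returns None if the row is absent.
        return self._partitions.get(partition_key, {}).get(row_key)

    def scan_partition(self, partition_key):
        # All rows that share one partition key.
        return dict(self._partitions.get(partition_key, {}))


table = KeyValueTable()
# Assumption: the three numbers in each row fill Value1..Value3.
table.put("Israel", 1234, Value1=1, Value2=2, Value3=3)
table.put("France", 2345, Value1=4, Value2=5, Value3=8)
```

A lookup needs both keys, for example `table.get("Israel", 1234)`.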
Demo
DynamoDB and Azure Tables
Wide column / Column Families
Data is organized as key - value groups
Store data by column
A column family is how the data is stored on disk
Query by key or key range only
No indexes (on some databases)
Examples
Google Bigtable
Cassandra
HBase
Example – Cassandra Data Model
Column
Key value
Super Column
Collection of columns
Column Family
Dictionary of columns
Super Column Family
Dictionary of Column Families
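The nesting described above can be sketched with plain Python dictionaries. This is an illustration of the layering only, not Cassandra's API; the row keys, column names and the `get_column` helper are made up.

```python
# Column: a single name/value pair.
column = ("email", "dana@example.com")

# Column family: row key -> dictionary of columns.
users = {
    "dana": {"email": "dana@example.com", "country": "Israel"},
    "luc": {"email": "luc@example.com"},  # rows need not share columns
}

# Super column: a named collection of columns.
home_address = {"city": "Haifa", "zip": "12345"}

# Super column family: one more dictionary level on top.
addresses = {
    "dana": {
        "home": home_address,
        "work": {"city": "Tel Aviv"},
    },
}


def get_column(column_family, row_key, name, default=None):
    """Read one column of one row; a missing row or column yields `default`."""
    return column_family.get(row_key, {}).get(name, default)


city = addresses["dana"]["home"]["city"]
```

Reads always start from the row key, which matches the "query by key" restriction of this family of databases.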
Demo
Cassandra
Document Database
Data is organized as key - document pairs
Query by key and document content
Use indexes
Examples
MongoDB
RavenDB
CouchDB / Couchbase
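What separates a document database from a plain key-value store is that a query can look inside the document. A minimal sketch in plain Python, not the query API of any of the products above; the documents, the dotted-path convention and the `find` helper are hypothetical.

```python
# Hypothetical documents stored under their keys.
documents = {
    "order-1": {"customer": {"name": "Dana", "country": "Israel"}, "total": 120},
    "order-2": {"customer": {"name": "Luc", "country": "France"}, "total": 80},
    "order-3": {"customer": {"name": "Noa", "country": "Israel"}},
}


def get_path(doc, path):
    """Follow a dotted path such as 'customer.country'; None if it is absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(documents, path, value):
    """Return the keys of all documents whose field at `path` equals `value`."""
    return [key for key, doc in documents.items() if get_path(doc, path) == value]


israeli_orders = find(documents, "customer.country", "Israel")
```

This version scans every document; a real document database would use an index on the queried field to avoid the scan.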
Demo
Graph databases
Data is organized as elements and relations.
Query by relations
Supports complex mathematical graph
computations
Examples
Neo4j
Stardog (used for the semantic web)
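Querying by relations means following edges instead of looking up keys. A minimal sketch in plain Python, not the query language of any graph database; the edge list, the relation names and both helper functions are made up for illustration.

```python
from collections import deque

# Hypothetical graph: (source element, relation, target element).
edges = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Carol"),
    ("Carol", "KNOWS", "Dave"),
    ("Alice", "WORKS_AT", "Acme"),
]


def neighbors(node, relation):
    """Elements reachable from `node` over one edge of the given relation."""
    return [t for s, r, t in edges if s == node and r == relation]


def shortest_path(start, goal, relation):
    """Breadth-first search that follows only edges of `relation`."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1], relation):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of `relation` edges connects the two elements


path = shortest_path("Alice", "Dave", "KNOWS")
```

The edges here are directed, so a path exists from Alice to Dave but not the other way round.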
RDF and OWL
Triple
Subject - Predicate - Object
Define facts
RDF (Resource Description Framework)
Defines some extra structure for triples.
Example: "rdf:type" is used to say that things are of certain types.
Schema: Defines some classes which represent the concept of subjects, objects, predicates etc.
Enables making statements about classes of things and types of relationships.
OWL
Adds semantics to the schema.
Expressed in triples.
Example: "If A isMarriedTo B" then this implies "B isMarriedTo A".
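The isMarriedTo example can be sketched as a tiny inference step over a set of triples. This is plain Python, not an RDF library or an OWL reasoner; the `infer_symmetric` helper and the `Person` class name are assumptions made for illustration. In OWL itself such a property would be declared as an `owl:SymmetricProperty`.

```python
# Facts as (subject, predicate, object) triples.
triples = {
    ("A", "rdf:type", "Person"),
    ("B", "rdf:type", "Person"),
    ("A", "isMarriedTo", "B"),
}

# Predicates that are declared symmetric (the semantics OWL would add).
symmetric_properties = {"isMarriedTo"}


def infer_symmetric(triples, symmetric_properties):
    """Return the triples plus the mirrored fact for every symmetric predicate."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in symmetric_properties:
            inferred.add((obj, predicate, subject))
    return inferred


all_facts = infer_symmetric(triples, symmetric_properties)
```

The fact "B isMarriedTo A" was never stored; it follows from the stored triple and the declared semantics.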
Demo
There is no single NoSQL solution for all
use cases
Important
There are more than 150 possible offerings…
Replication and Sharding
NoSQL databases can span a large cluster
Replication
Copy the data to multiple servers
Usually each data element is copied 3 times
One master, two slaves
Result: High Availability
Sharding
Split the data between servers
Horizontal partitioning of the data
Result: Horizontal scale
Replication and Sharding can be done together
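A sketch of how the two can be combined: hash the key to pick a shard, then copy the element to the next servers in the list until there are three copies. The server names, the MD5-modulo scheme and the `placement` helper are assumptions made for illustration; real databases use their own partitioning schemes.

```python
import hashlib

SERVERS = ["server-0", "server-1", "server-2", "server-3", "server-4"]
REPLICAS = 3  # one master copy plus two more copies, as on the slide


def shard_for(key, shard_count):
    """Sharding: a stable hash of the key decides which shard owns it."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def placement(key, servers=SERVERS, replicas=REPLICAS):
    """Replication: the owning server first, then its successors in the list."""
    first = shard_for(key, len(servers))
    return [servers[(first + i) % len(servers)] for i in range(replicas)]


copies = placement("Israel")  # first entry is the master, the rest are copies
```

Because the hash is stable, every client computes the same placement for the same key without asking a central directory.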
The Cloud and NoSQL
All cloud providers have NoSQL solutions
Azure Tables
Google Bigtable
Amazon DynamoDB
NoSQL databases are deployed on a cluster
There is a large number of cloud hosting offerings for
NoSQL clusters
MongoHQ (MongoDB)
Cassandra on Google Compute Engine
Many more
Example – Mongo in Azure
Check your schema
Be open to using NoSQL data stores
Identify your use-case and find the right
database for you
Create a simple POC
Questions