Data, data, data. I cannot make bricks without clay. Sherlock Holmes, Sherlock Holmes [2009]

NoSQL


DESCRIPTION

My presentation in the Architects forum


Page 1: NoSQL

Data, data, data. I cannot make bricks without clay. Sherlock Holmes, Sherlock Holmes [2009]

Page 2: NoSQL

Data

• Qualitative or quantitative attributes of a variable or set of variables
• The lowest level of abstraction, from which information and then knowledge are derived
• Representation of a fact, figure, or idea

Page 3: NoSQL

A well-organized newspaper or a clumsy, cluttered one?

Page 4: NoSQL

Data explosion

From gigabytes to terabytes to petabytes to perhaps (I’m out of nomenclature)-bytes

Page 5: NoSQL

NoSQL = Not Only SQL

!= No to SQL

!= Never SQL

Page 6: NoSQL

Open Source

An abridged version of this presentation and notes will be available for everyone.

Distributed under no License

FREE AS IN SPEECH AND BEER

Page 7: NoSQL

Necessity is the mother of Invention

DDBMS

OODB

RDBMS performance

Cloud Computing

WEB 2.0

RnD

Multiple Solutions

Page 8: NoSQL

SQL Databases, the ‘Hammer’

It’s a wonderful tool

Page 9: NoSQL

Commercial SQL Databases

Even Gods use it

Design

Ergonomics

Warranty

Upgrades

Power

Ease of use

Features

Apart from: a hole in the pocket

Page 10: NoSQL

Nail is a nail, Screw is a screw

Hammering a screw or Screw driving a nail is FOOLISHNESS!

Page 11: NoSQL

What?

NoSQL is a new look at data to deliver:
• High performance
• Unlimited horizontal scalability
  • Economical, common, unreliable hardware
  • Auto-sharding
• Support for a wide range of data
  • Recursive, hierarchical
  • Non-rigid
• High availability

Non-relational, next-generation operational data stores and databases

Page 12: NoSQL

What? (Continued…)
• Partly or completely independent of RDBMS concepts
• No specific implementation
  • Breakthrough approaches

Key:
• Non-relational approach
• Non-ACIDness

A STEP BACKWARDS, THEN MANY STEPS FORWARD

Page 13: NoSQL

NoSQL, the ‘screwdriver’

Yet another tool in our repository to go along with the hammer

Page 14: NoSQL

NoSQL is about choice

Not all problems are nails. Not all screws are the same.

GOOD PROGRAMMING PRACTICE: Know your tools and use them appropriately

Page 15: NoSQL

SQL Databases

• Data
  • Relational
  • Tabular – rows/columns
• Interface
  • SQL
• Basic design inspiration
  • Set theory
• ACID design
• Scale-up design

• Oracle
• MySQL
• Teradata
• SQLite
• SQL Server

And many more

Page 16: NoSQL

Why?
• Is all data really relational?
• If consistency is ensured, do we have to enforce/check it again at the database level?
• Are RDBMSs ready for the challenges of the future, such as:
  • Dynamic schema/metadata
  • Huge amounts of data, handled through horizontal auto-scaling
  • Ability to handle complex data types: images, videos, audio, and much more

Not Really!

Page 17: NoSQL

Why? (Continued…)
• RDBMS drawbacks:
  • Scalability
  • CRUD
  • Performance
    • Write overhead
    • Limited by single-disk architecture
    • Lack of in-memory design
  • Rigid schema design

And more…

Page 18: NoSQL

HAMMERS

Are under some hammering

Page 19: NoSQL

DRAWBACKS DEEP DIVE

Page 20: NoSQL

Scalability
• True scalability
  • Horizontal scaling
  • Transparency to the application
  • No single point of failure
• Problems with SQL databases
  • Vertical scaling
  • Partitioning, aka sharding
  • Read slaves
• Anti-patterns
  • Normalized data
  • Joins
  • ACID transactions

Page 21: NoSQL

No Breadcrumbs
• CRUD is crude
  • The delete/update strategy is improper: it overwrites or removes the previous state
• CRA!
  • Create, Read, Archive – the way to go

Audit information is lost in CRUD but not in CRA
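The Create, Read, Archive idea can be sketched in Python as an append-only store. This is only an illustration under assumptions: the `CraStore` class and its method names are hypothetical and do not come from any particular NoSQL product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class CraStore:
    """Hypothetical append-only store: nothing is updated or deleted in place."""

    # Every key maps to the full list of versions ever written for it.
    _history: Dict[str, List[Any]] = field(default_factory=dict)
    # Keys that were archived: hidden from reads, but their history is kept.
    _archived: set = field(default_factory=set)

    def create(self, key: str, value: Any) -> None:
        # A "change" is a new version appended after the old ones.
        self._history.setdefault(key, []).append(value)
        self._archived.discard(key)

    def read(self, key: str) -> Optional[Any]:
        if key in self._archived or key not in self._history:
            return None
        return self._history[key][-1]

    def archive(self, key: str) -> None:
        # Used instead of DELETE: mark the key as archived, keep its history.
        if key in self._history:
            self._archived.add(key)

    def audit(self, key: str) -> List[Any]:
        # The trail that an in-place UPDATE or DELETE would have destroyed.
        return list(self._history.get(key, []))


store = CraStore()
store.create("user:1", {"city": "Pune"})
store.create("user:1", {"city": "Mumbai"})
store.archive("user:1")
```

After the archive call, `store.read("user:1")` returns nothing, while `store.audit("user:1")` still lists both versions.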

Page 22: NoSQL

Naive Data Support
• Not designed for
  • Complex data structures
    • Recursive
    • Hierarchical
    • Ordered list
    • Circular
  • Dynamic metadata

Page 23: NoSQL

Logical/Physical separation concerns

• Relational model -> logical model
• RDBMSs implement it at the physical level
  • Using multiple indices
• Artificial overhead in managing the database
• Frequent dropping and re-creating of indexes to make the DB perform

Page 24: NoSQL

Spinning Disk Storage

• Design flaw for most RDBMS systems
  • With cheaper memory, a memory-based approach should also be included in the design
• Defiance of Moore’s law
  • Disk read speeds grew only 12.5 times in about 50 years
  • Disk write speeds grew much less
• Disk writes are expensive
  • RDBMSs make things worse by writing more
• ACID rains are UNHEALTHY

Page 25: NoSQL

Think ‘Out of the ROM’

Page 26: NoSQL

At Snail’s pace
• RDBMS engine growth – SLOW
  • Optimizations have been minor since the initial days
• Majority of growth due to Moore’s law
  • Faster hardware
  • Slightly faster storage
  • Faster memory
• What happens when Moore’s law diminishes, thanks to external factors like the heat generated?

Page 27: NoSQL

Database size limits

• RDBMSs are too slow
  • On multi-terabyte and petabyte databases
• Purpose-designed parallel processing would be needed to handle such volumes of data in an RDBMS.

Page 28: NoSQL

RDBMS has been around for years and is proven technology

What about NoSQL?

Page 29: NoSQL

RDBMS grew fast, but growth slowed down over time and might eventually stagnate

NoSQL, inarguably a new and immature tool, has been growing faster than RDBMS ever did and is being supported by the Big Players

Page 30: NoSQL

Did you say

BIG PLAYERS!

WHO?

Page 31: NoSQL

NoSQL Real World Implementations

• Google – BigTable
• Facebook – HBase
• Digg – Cassandra
• Amazon – Dynamo
• Trend Micro – HBase
• Netflix – Amazon SimpleDB
• Shutterfly – MongoDB
• LinkedIn – Voldemort

and more

Microsoft is considering NoSQL for Azure services as well, and so is Twitter

Are we next?

Major IT companies have implemented or, even better, created their own NoSQL systems to manage huge data stores that couldn’t be managed by SQL databases.

Page 32: NoSQL

We are used to SQL and relatedness, so why can’t they just fix RDBMS to handle Big Data?

Big Data can be handled via Scale Out/Partitionability across Multiple Nodes

STORAGE SEEK RATES

Large writes and ACID being a huge limitation

Page 33: NoSQL

CAP Theorem

Applies to distributed shared-data systems

Page 34: NoSQL

CAP THEOREM

Consistency

Partitionability

Availability

Page 35: NoSQL

A Deeper Look
• Consistency: the system is in a consistent state after an operation
  • All clients see the same data
  • Strong consistency (ACID) vs. eventual consistency (BASE)
• Availability: ‘Always On’ mode, no downtime
  • All clients can find some available replica
  • Software/hardware upgrade tolerance
• Partition tolerance: the system continues to function even when split into disconnected subsets (by a network disruption)
  • Reads and writes combined

Page 36: NoSQL

[Figure: CAP triangle with Consistency, Partitionability, and Availability at its corners; RDBMS, NoSQL, and Paxos are placed on the diagram]

• CA: single-site clusters
• CP: some data may be inaccessible, but the rest is accurate/consistent
  • Sharded databases
  • Teradata comes here
• AP: the system is still available under partitioning, but some of the data returned may be inaccurate

Page 37: NoSQL

Atomicity: All of the operations in the transaction will complete, or none will.

Consistency: The database will be in a consistent state when the transaction begins and ends.

Isolation: The transaction will behave as if it is the only operation being performed upon the database.

Durability: Upon completion of the transaction, the operation will not be reversed.

Page 38: NoSQL

Basically Available

Soft State

Eventually Consistent

When Availability and Partitionability are prioritized over Consistency, think in terms of BASE

Page 39: NoSQL

Eventual Consistency
• If no new updates are made to the object, eventually all accesses will return the last updated value.

Ex: Domain Name System (DNS)
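A toy Python sketch of eventual consistency, assuming three in-memory replicas and a last-writer-wins rule based on a version number. The `write`, `read`, and `anti_entropy` helpers are hypothetical names chosen for illustration; real systems propagate and resolve updates in more elaborate ways.

```python
from typing import Dict, List, Optional, Tuple

# Each replica maps key -> (version, value); a higher version is a newer write.
Replica = Dict[str, Tuple[int, str]]


def write(replica: Replica, key: str, version: int, value: str) -> None:
    """Accept a write on one replica only; the others are temporarily stale."""
    replica[key] = (version, value)


def read(replica: Replica, key: str) -> Optional[str]:
    entry = replica.get(key)
    return entry[1] if entry else None


def anti_entropy(replicas: List[Replica]) -> None:
    """One background sync round: every replica adopts the newest version of each key."""
    newest: Replica = {}
    for replica in replicas:
        for key, (version, value) in replica.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for replica in replicas:
        replica.update(newest)


replicas: List[Replica] = [{}, {}, {}]
write(replicas[0], "example.com", 1, "10.0.0.1")
anti_entropy(replicas)
write(replicas[1], "example.com", 2, "10.0.0.2")  # newer write lands on one replica

stale = read(replicas[2], "example.com")  # read before the sync: old value
anti_entropy(replicas)                    # no new updates, replicas converge
converged = [read(r, "example.com") for r in replicas]
```

Between the second write and the next sync round, a client reading from the third replica still sees the old address, much like a cached DNS record; once updates stop and a sync round runs, all replicas agree.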

Page 40: NoSQL

Types of Eventual Consistency

• Read-your-writes consistency
  • Session consistency
• Monotonic read consistency
• Monotonic write consistency
• Causal consistency

Practically, Read-your-write consistency and monotonic read consistency are desirable in an eventually consistent system

Page 41: NoSQL

Hash()
• Different apps – different CAP requirements
  • Prioritize among
    • Consistency – Availability
    • Availability – Partitionability
    • Consistency – Partitionability

Page 42: NoSQL

WHERE?

So will NoSQL eventually replace RDBMSs everywhere?

No, RDBMSs are here to stay. NoSQL is here to help.

Page 43: NoSQL

Wherever you want to take advantage of NoSQL

Page 44: NoSQL

Big Data

Denormalize

Shard

Scale Out

And look no further than NoSQL

Page 45: NoSQL

Write Intensive Applications

IOPS of the best storage device <<< n * IOPS of relatively cheaper storage devices

In simple terms: ‘HARNESS THE POWER OF YOUR CLOUD’

Page 46: NoSQL

Fast Key-Value Access

NoSQL – ‘User, you are looking for $value’

RDBMS – ‘Query executing ….’

An O(1) hash operation or O(log n) B/B+ tree traversals
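The two access patterns can be compared with a small Python sketch. It makes some assumptions: a `dict` stands in for the hash-based store, and a binary search over sorted keys stands in for a tree traversal (it is not a real B/B+ tree, it only shares the O(log n) behaviour). All names are illustrative.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

rows: List[Tuple[str, str]] = [(f"user{i:04d}", f"value{i}") for i in range(1000)]

# Hash-style access: one hash computation, O(1) on average.
hash_index = dict(rows)

# Tree-style access, approximated by binary search over sorted keys: O(log n).
sorted_keys = sorted(key for key, _ in rows)
sorted_values = [hash_index[key] for key in sorted_keys]


def tree_lookup(key: str) -> Optional[str]:
    """Binary search for the key; returns None when it is absent."""
    pos = bisect_left(sorted_keys, key)
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        return sorted_values[pos]
    return None
```

Both `hash_index["user0042"]` and `tree_lookup("user0042")` find the same value; the difference is the number of steps taken as the data grows. The sorted structure keeps one advantage the hash lacks: range scans over neighbouring keys.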

Page 47: NoSQL

Flexible Schema and Data Types

‘I once was an integer, then a string, then a date; what am I?’ - Field

RDBMS – ‘WTH! Whatever you are, You are beyond my scope’

Page 48: NoSQL

Transient Data

~μs

~ms

Data – ‘I’m here only for a while and want to get my work done fast’

RDBMS – ‘You are data and you shall be treated like the rest’

NoSQL – ‘Okay, I’ll allot you space in RAM using Memcached if available; otherwise you still have my cloud’

Page 49: NoSQL

High Write Availability

Warning - Incoming data ….

NoSQL – ‘Anytime you like, user’

RDBMS – ‘This is insane, I’m already busy with other things’

Page 50: NoSQL

ECONOMICS

RDBMS – ‘I’m powered by a wonderful, beautiful rabbit’

NoSQL – ‘I’m powered by many cute little hamsters’

Page 51: NoSQL

No Single Point of Failure

• Designed to run on
  • Economical
  • Commonly available
  • Unreliable hardware

Page 52: NoSQL

Full table scan operations

• MapReduce:
  • Map:
    • Break your problem into sub-problems that can be computed in parallel and reduced later
  • Reduce:
    • Merge the sub-problem solutions into the result

Divide and Conquer your way to Victory

Powered by MapReduce! Or something similar
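The map and reduce steps can be sketched with a word count in plain Python. This is a single-process illustration under assumptions: the function names and the `shuffle` grouping step are chosen for the example, and no distributed framework is involved.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: turn one chunk of input into independent (key, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate pairs by key so each key can be reduced on its own."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, count in pairs:
        grouped[key].append(count)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: merge the partial results for each key into one value."""
    return {key: reduce(lambda a, b: a + b, counts) for key, counts in grouped.items()}


chunks = ["SQL is a hammer", "NoSQL is a screwdriver"]
# Each chunk could be mapped on a different machine; here they run sequentially.
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(intermediate))
```

Because each chunk is mapped without looking at any other chunk, the map calls are the part that can be spread over many cheap machines.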

Page 53: NoSQL

Ability to restore, maintain, repair itself

No-DBA-required design

Page 54: NoSQL

HOW?

Let us welcome

Keys, Values, Collections, Data Structures, Objects, Documents, Graphs

Page 55: NoSQL

NoSQL View

The basic approach to data:
• Key/value store
• Runs on multiple machines
• Partitioning and replication across these machines
• Relaxed consistency
  • Aim at eventual consistency
  • Asynchronous replication

But not all NoSQL systems take the same path.

Page 56: NoSQL

NoSQL

Key-Value Store

Document Store

BigTable Clones

Graph Stores

Multivalue

Object

Tuple Store

Page 57: NoSQL

Key-Value Stores
• One key, one value, no duplicates, and crazy fast
  • Distributed hash tables
• The value is stored as a binary object – BLOB
  • The DB doesn’t understand it and doesn’t want to

Ex: Amazon Dynamo, MemcacheDB
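A minimal Python sketch of a key-value store spread over several nodes. Assumptions: the `ShardedKeyValueStore` class is hypothetical, each node is an in-memory dict, and keys are placed with a simple hash-modulo rule, which is a simplification rather than the placement scheme of Dynamo or any other product.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedKeyValueStore:
    """Hypothetical store: keys are spread over nodes by hash; values are opaque bytes."""

    def __init__(self, node_count: int) -> None:
        # Each dict stands in for one machine in the cluster.
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> int:
        # A stable hash (unlike Python's per-process hash()) picks the node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, value: bytes) -> None:
        # One key, one value: a second put overwrites the first.
        self.nodes[self._node_for(key)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.nodes[self._node_for(key)].get(key)


store = ShardedKeyValueStore(node_count=4)
store.put("Key1", b'{"name": "Matt"}')  # the store never parses this
store.put("Key1", b"\x00\x01\x02")      # any bytes are accepted
```

The store only ever looks at the key; whether the value is JSON, an image, or random bytes makes no difference to it, which is why lookups by anything other than the key are not possible here.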

Page 58: NoSQL

Key/Value store doesn’t know what is in here

Key4

Key3

Key2

Key1

Page 59: NoSQL

Document Store
• Key-value store, but the value is structured and understood by the DB
• Querying data is possible
  • Not just on the key

Ex: MongoDB, CouchDB, Riak, etc.

Page 60: NoSQL

• Each database has collections
  • Each collection has a set of documents
• They are well designed for access through applications
  • Suitable for web applications
• A few document databases now provide a SQL-like query interface

Page 61: NoSQL

Name: $Name
Value: $Value
Version: $Version
Type: $Type

Key4

Key3

Key2

Key1

Emb Object2

Emb Object1

Objects inside Objects

CRAZY!

Page 62: NoSQL

BigTable & its Clones
• Database, tables, rows, columns, and ‘SuperColumn’
• A row consists of columns and SuperColumns
  • A few SuperColumns can be made mandatory
• Each SuperColumn – an arbitrary set of columns
• Rows are typically versioned by a system-assigned timestamp.

Page 63: NoSQL

• Intended for tables with a huge number of columns
  • Millions can be supported very easily
• ‘A sparse, distributed multi-dimensional sorted map’
• Also referred to as wide-column stores
• Ex: Google BigTable, Cassandra, HBase, Voldemort, Azure Tables
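The ‘sparse, multi-dimensional sorted map’ idea can be sketched in Python as a dictionary keyed by (row, column, timestamp). This is a single-process illustration under assumptions: the helper names are hypothetical, nothing is distributed, and sorting happens at read time rather than in the storage layout.

```python
from typing import Dict, List, Optional, Tuple

# (row key, column name, timestamp) -> value; absent cells take no space (sparse).
Cell = Tuple[str, str, int]
table: Dict[Cell, str] = {}


def put(row: str, column: str, timestamp: int, value: str) -> None:
    table[(row, column, timestamp)] = value


def get_latest(row: str, column: str) -> Optional[str]:
    """Return the value with the highest timestamp for this row and column."""
    versions = [(ts, v) for (r, c, ts), v in table.items() if r == row and c == column]
    return max(versions)[1] if versions else None


def scan_row(row: str) -> List[Tuple[str, int, str]]:
    """Return the row's cells in sorted (column, timestamp) order."""
    return sorted((c, ts, v) for (r, c, ts), v in table.items() if r == row)


put("Key1", "Column1", 1, "a")
put("Key1", "Column1", 2, "b")         # newer version of the same cell
put("Key2", "SuperColumn1:x", 1, "c")  # Key2 has a column that Key1 lacks
```

Rows do not have to share columns, and old versions stay addressable by timestamp, which is what lets such tables grow very wide without reserving space for empty cells.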

Page 64: NoSQL

Column1

Column2

Column3

SuperColumn1

Key1

Key2

Key3

1 2 3 4

1 2 3

1 2 3 4 5 6 7

Page 65: NoSQL

Graph Databases
• Nodes, edges, properties
  • Replace traditional tables, columns, rows
• A graph database can be implemented in different ways
  • Key/value store, columnar, BigTable clone, or even a combination of these
• Fields are used to directly store the ID of another entity, forming the edge

Page 66: NoSQL

• A graph database is a multi-relational graph
  • No need for secondary indexes
• Relationships in RDBMS are ‘weak’
• Relationships in graphs are ‘strong’
• The rest don’t really care about relations at the DB level
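A small Python sketch of nodes, edges, and properties. The node names and edge labels echo the deck's example diagram, but which edge connects which pair of nodes is an assumption made here for illustration, as are the `neighbours` and `who` helpers.

```python
from typing import Dict, List, Tuple

# Nodes carry properties (assumed here: the age belongs to Matt).
nodes: Dict[str, Dict[str, object]] = {
    "Matt": {"Age": 32},
    "April": {},
    "Honda": {},
}

# Edges are stored as (source, label, target): the relationship itself is data.
edges: List[Tuple[str, str, str]] = [
    ("Matt", "Spouse", "April"),
    ("Matt", "owns", "Honda"),
    ("April", "Drives", "Honda"),
]


def neighbours(node: str, label: str) -> List[str]:
    """Follow outgoing edges with the given label from one node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


def who(label: str, target: str) -> List[str]:
    """Follow edges backwards: which nodes point at the target with this label?"""
    return [src for src, lbl, dst in edges if lbl == label and dst == target]
```

Questions such as "who drives the Honda?" become a walk along labelled edges (`who("Drives", "Honda")`) rather than a join across tables.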

Page 67: NoSQL

Matt

April

Honda

Is related to

Drives

owns

Age: 32

SSN

Address

Mobile

City

Model

registration

Spouse

Page 68: NoSQL

Complexity

Size

Key-Value Store

Document Store

BigTable Clone

Graph Databases

Page 69: NoSQL

The Menu
• Document store: CouchDB, Lotus Notes, MongoDB
• Graph: AllegroGraph, Neo4j, DEX
• Tabular: BigTable, HBase, Hypertable
• On disk: BigTable, Membase, Tokyo Cabinet
• In RAM: Memcached, Velocity
• Eventually consistent: Cassandra, Dynamo, Riak
• Hierarchical: GT.M
• Ordered: Berkeley DB, NMDB, C-ISAM
• Multivalue: eXe, OpenQM

The list isn’t even a quarter of the whole

Page 70: NoSQL

_theOpenSourceIssue
• Most of them are open source
  • Thus forkable, like Linux
• The first of the lot
  • Google’s BigTable
  • Amazon’s Dynamo
• All in all, there are about 10 roots, with 4 major ones.

Page 71: NoSQL

No single database to rule them all

Page 72: NoSQL

MongoDB

• Document store
• JSON storage
• REST … not out of the box
• Map/Reduce
• Master-slave replication
• Strong suite of query APIs
• Good support for SQL
• Work in progress:
  • Auto-sharding-based scalability
  • Failover support

Open Source
Non-Relational
Scalable
Schemaless
Queryable

Page 73: NoSQL

Document Oriented
• Mongo stores documents in collections
  • Documents are slightly enhanced JSON objects
  • Complex data structures are very much possible
• Data modelling is a more natural process

Page 74: NoSQL

Embeddable Objects
• Complexity.begin()
  • Embed objects within a single document
  • A document is an enhanced form of object, as mentioned earlier
  • The same thing can be achieved in an RDBMS using multiple tables and joining them together
• Consider a requirement to store a blog post with this information:
  • Post content
  • Post title
  • Post author
  • Comments
    • Comment order
    • Comment content
    • Comment author

Page 75: NoSQL

RDBMS solution

Post table: PostID, PostName, PostContent, PostAuthor

Comment table: PostID, CommentID, Author, Content, PreCommentID

Page 76: NoSQL

MongoDB Solution
• Documents … each one of them is a post

{
  Name: $name,
  Author: $author,
  Comment: [
    { Author: $author1, Comment: $comment1 },
    { Author: $author2,
      Comment: $comment2,
      Replies: [
        { Author: $author3, Comment: $comment3 }
      ]
    }
  ]
}
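The same post can be mirrored as a plain Python dictionary to show how nested comments are read back without joins. This is only a sketch: the sample values replace the `$` placeholders, `walk_comments` is a hypothetical helper, and no MongoDB driver or server is involved.

```python
from typing import Any, Dict, Iterator, Tuple

# The whole post, comments and replies included, is a single nested document.
post: Dict[str, Any] = {
    "Name": "Why NoSQL?",
    "Author": "author",
    "Comment": [
        {"Author": "author1", "Comment": "comment1"},
        {
            "Author": "author2",
            "Comment": "comment2",
            "Replies": [{"Author": "author3", "Comment": "comment3"}],
        },
    ],
}


def walk_comments(comments: list, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, author, text) for every comment and nested reply, in order."""
    for comment in comments:
        yield depth, comment["Author"], comment["Comment"]
        # Replies are optional; a comment without them contributes nothing more.
        yield from walk_comments(comment.get("Replies", []), depth + 1)


thread = list(walk_comments(post["Comment"]))
```

Comment order comes from list position and reply nesting from the `Replies` field, so the `PreCommentID`-style column a relational design needs is not required here.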

Page 77: NoSQL

RDBMS Viewpoint

ID: 01
Name: $name
Author: $author
Content: $content
Comments:

  Author   | Comment   | Replies
  $author1 | $comment1 | NULL
  $author2 | $comment2 | (nested table, below)

  Replies to $comment2:

  Author | Comment | Replies
  Auth3  | comm3   | NULL

Page 78: NoSQL

ODF

MongoDB’ed

Page 79: NoSQL

HEADER LEVEL

TIE LEVEL

LINE LEVEL

Page 80: NoSQL

Schema-less
• No database-enforced schema
  • Addition and deletion of columns are simple
  • It’s about how the application uses the APIs
• The data definition need not be fixed up front.
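A short Python sketch of what schema-less means for the application. Assumptions: a list of dictionaries stands in for a collection, and `find` and `fields_in_use` are hypothetical helpers rather than any database's API.

```python
from typing import Any, Dict, List

# A "collection" is a list of documents; nothing forces them to share fields.
collection: List[Dict[str, Any]] = []

collection.append({"name": "Matt", "age": 32})
# A later document adds a field and changes a type, with no ALTER TABLE step.
collection.append({"name": "April", "age": "thirty", "city": "Pune"})


def find(field: str, value: Any) -> List[Dict[str, Any]]:
    """Return documents whose field equals value; documents lacking the field are skipped."""
    return [doc for doc in collection if field in doc and doc[field] == value]


def fields_in_use() -> List[str]:
    """The effective 'schema' is whatever the application has written so far."""
    return sorted({key for doc in collection for key in doc})
```

The flip side is that checks a relational schema would enforce, such as `age` always being a number, now live in the application code that reads and writes the documents.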

Page 81: NoSQL

Other Features
• Data tagging
• Caching
• Real-time analytics
• Image storage
• Dynamic queries
• Binary storage

Page 82: NoSQL

Try MongoDB @

http://try.mongodb.org/

Page 83: NoSQL

EOL

\n

Page 84: NoSQL

Calm down!

Eventually Answered System

All your questions will be answered eventually