kirandanduprolu
DESCRIPTION
My presentation in the Architects forum
"Data, data, data. I cannot make bricks without clay."
Sherlock Holmes, Sherlock Holmes [2009]
Data
• Qualitative or quantitative attributes of a variable or set of variables
• Lowest level of abstraction, from which information and then knowledge are derived
• Representation of a fact, figure, or idea
A well organized newspaper or
a clumsy, cluttered one?
Data explosion
From gigabytes to terabytes to petabytes to perhaps (I’m out of nomenclature)-bytes
NoSQL = Not Only SQL
!= No to SQL
!= Never SQL
Open Source
An abridged version of this presentation and notes will be available to everyone.
Distributed under no license
FREE AS IN SPEECH AND BEER
Necessity is the mother of Invention
DDBMS
OODB
RDBMS performance
Cloud Computing
WEB 2.0
RnD
Multiple Solutions
SQL Databases, the ‘Hammer’
It’s a wonderful tool
Commercial SQL Databases
Even Gods use it
Design
Ergonomics
Warranty
Upgrades
Power
Apart from
Hole in the Pocket
Ease of use
Features
Nail is a nail, Screw is a screw
Hammering a screw or Screw driving a nail is FOOLISHNESS!
What?
NoSQL is a new look at data to deliver:
• High performance
• Unlimited horizontal scalability
  • Economic, common, unreliable hardware
  • Auto sharding
• Support for a wide range of data
  • Recursive, hierarchical
  • Non-rigid
• High availability
Non-relational, next-generation operational data stores and databases
What? (Continued…)
• Partly or completely independent of RDBMS concepts
• No specific implementation
  • Breakthrough approaches
Key:
• Non-relational approach
• Non-ACIDness
A STEP BACKWARDS, THEN MANY STEPS FORWARD
NoSQL, the ‘screwdriver’
Yet another tool in our repository to go along with the hammer
NoSQL is about choice
Not all problems are nails. Not all screws are the same.
GOOD PROGRAMMING PRACTICE: Know your tools and use them appropriately
SQL Databases
• Data
  • Relational
  • Tabular: rows/columns
• Interface
  • SQL
• Basic design inspiration
  • Set theory
• ACID design
• Scale-up design
• Oracle • MySQL • Teradata • SQLite • SQL Server
And many more
Why?
Is all data really relational?
If consistency is ensured, do we have to enforce/check it again at the database level?
Are RDBMS ready for the challenges of the future, like:
• Dynamic schema/metadata
• Huge amounts of data
  • Through horizontal auto scaling
• Ability to handle complex data types
  • Images, videos, audio and much more
Not Really!
Why? (Continued…)
• RDBMS drawbacks:
  • Scalability
  • CRUD
  • Performance
    • Write overhead
    • Limited by single-disk architecture
    • Lack of in-memory design
  • Rigid schema design
And more …..
HAMMERS
Are under some
Hammering
DRAWBACKS: DEEP DIVE
Scalability
• True scalability
  • Horizontal scaling
  • Transparency to the application
  • No single point of failure
• Problems with SQL databases
  • Vertical scaling
  • Partitioning, aka sharding
  • Read slaves
• Anti-patterns
  • Normalized data
  • Joins
  • ACID transactions
No Breadcrumbs
• CRUD is crude
  • Delete/Update strategy is improper
• CRA!
  • Create, Read, Archive: the way to go ahead
Audit information is lost in CRUD but not in the case of CRA
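
The Create/Read/Archive idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `CraStore` and its methods `create`, `read`, `retire` and `history` are hypothetical names, not the API of any real database. A change is modelled as a new create that pushes the previous version into an archive, so the audit trail survives.

```python
from dataclasses import dataclass, field


@dataclass
class CraStore:
    """Hypothetical Create/Read/Archive store: nothing is overwritten or deleted."""
    current: dict = field(default_factory=dict)   # live version of each key
    archive: list = field(default_factory=list)   # (key, old_value) audit trail

    def create(self, key, value):
        # A "change" is a new create; the previous version moves to the archive.
        if key in self.current:
            self.archive.append((key, self.current[key]))
        self.current[key] = value

    def read(self, key):
        return self.current.get(key)

    def retire(self, key):
        # Instead of deleting, move the live version into the archive.
        if key in self.current:
            self.archive.append((key, self.current.pop(key)))

    def history(self, key):
        # Every archived version of the key, oldest first.
        return [value for k, value in self.archive if k == key]


store = CraStore()
store.create("price", 10)
store.create("price", 12)   # archives 10
store.retire("price")       # archives 12, nothing is lost
```

After these calls the key is no longer readable as live data, yet `store.history("price")` still holds both earlier values, which is the audit information a plain update or delete would have discarded.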
Naive Data Support
• Not designed for
  • Complex data structures
    • Recursive
    • Hierarchical
    • Ordered list
    • Circular
  • Dynamic metadata
Logical/Physical separation concerns
• Relational model -> logical model
• RDBMS implement it at the physical level
  • Using multiple indices
• Artificial overhead in managing the database
• Frequent dropping and re-creating of indexes to make the DB perform
Spinning Disk Storage
• Design flaw for most RDBMSs
  • With cheaper memory, a memory-based approach should also be included in the design
• Defiance of Moore’s law
  • Disk reads grew only 12.5 times in about 50 years
  • Disk writes grew much less
• Disk writes are expensive
  • RDBMS make things worse by writing more
  • ACID rains are UNHEALTHY
Think ‘Out of the ROM’
At Snail’s pace
• RDBMS engine growth: SLOW
  • Optimizations have been minor since the initial days
  • Majority of growth is due to Moore’s law
    • Faster hardware
    • Slightly faster storage
    • Faster memory
• What happens when Moore’s law diminishes, thanks to external factors like the heat generated?
Database size limits
• RDBMS are too slow
  • Over multi-terabyte and petabyte databases
• Purpose-designed parallel processing would be needed to handle such capacities of data in an RDBMS.
RDBMS has been around for years
and is a proven technology
What about
NoSQL
RDBMS grew fast but
growth slowed down over time and might eventually reach a stale point
NoSQL,
unarguably a new, immature tool,
has been growing faster than RDBMS ever did
and is being supported by the Big Players
Did you say
BIG PLAYERS!
WHO?
NoSQL Real World Implementations
• Google – BigTable
• Facebook – HBase
• Digg – Cassandra
• Amazon – Dynamo
• Trend Micro – HBase
• Netflix – Amazon SimpleDB
• Shutterfly – MongoDB
• LinkedIn – Voldemort
and more
Microsoft is considering NoSQL as well for Azure services, and so is Twitter
Are we next?
Major IT companies have implemented or, even better, created their own NoSQL systems to manage huge data stores which couldn’t be managed by SQL databases.
We are used to SQL and relatedness,
why can’t they just fix RDBMS
to handle Big Data?
Big Data can be handled via Scale Out/Partitionability across Multiple Nodes
STORAGE SEEK RATES
Large writes and ACID being a huge limitation
CAP Theorem
Applies to distributed shared-data systems
CAP THEOREM
Consistency
Availability
Partitionability
A Deeper look
• Consistency: the system is in a consistent state after an operation
  • All clients see the same data
  • Strong consistency (ACID) vs. eventual consistency (BASE)
• Availability: ‘Always On’ mode, no downtime
  • All clients can find some available replica
  • Software/hardware upgrade tolerance
• Partition tolerance: the system continues to function even when split into disconnected subsets (by a network disruption)
  • Reads and writes combined
[Diagram: CAP triangle with corners Consistency, Availability and Partitionability; labels RDBMS, NoSQL and Paxos]
CA
- Single-site clusters
CP
- Some data may be inaccessible, but the rest is accurate/consistent
- Sharded database
- TERADATA comes here
AP
- System is still available under partitioning, but some of the data returned may be inaccurate
Atomicity: All of the operations in the transaction will complete, or none will.
Consistency: The database will be in a consistent state when the transaction begins and ends.
Isolation: The transaction will behave as if it is the only operation being performed upon the database.
Durability: Upon completion of the transaction, the operation will not be reversed.
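
Atomicity in particular is easy to see in a small experiment. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, the names `alice` and `bob`, and the helper `transfer` are illustrative assumptions, not part of the presentation. A transfer that would overdraw an account fails halfway through, and the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.execute("INSERT INTO accounts VALUES ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; returns False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This statement violates the CHECK constraint when src is overdrawn.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Even though the credit to `bob` ran first, the failed debit undoes it: both balances are unchanged afterwards, which is the "all or none" guarantee.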
Basically
Available
Soft State
Eventually Consistent
When Availability and Partitionability are prioritized over Consistency,
think in terms of BASE
Eventual Consistency
• If no new updates are made to the object, eventually all accesses will return the last updated value.
Ex: Domain Name System (DNS)
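
A toy simulation may help make this definition concrete. It is a sketch under assumptions, not a model of any particular product: `Replica`, `EventuallyConsistentStore`, `write`, `read` and `propagate` are hypothetical names, and replication is reduced to an in-process queue. A write is acknowledged by one replica and reaches the others later.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy model: one replica accepts writes, the others catch up asynchronously."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = deque()  # replication messages not yet delivered

    def write(self, key, value):
        # The write is acknowledged after reaching a single replica.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Deliver queued updates in order; with no new writes, replicas converge.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore()
store.write("example.org", "192.0.2.1")
stale = store.read("example.org", 2)   # replica 2 has not seen the write yet
store.propagate()
fresh = store.read("example.org", 2)   # now every replica agrees
```

The read before `propagate()` returns nothing from the lagging replica; once updates stop and the queue drains, every replica returns the last written value, much as DNS caches converge after a record changes.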
Types of Eventual Consistency
• Read-your-writes consistency
• Session consistency
• Monotonic read consistency
• Monotonic write consistency
• Causal consistency
In practice, read-your-writes consistency and monotonic read consistency are desirable in an eventually consistent system
Hash()
• Different apps, different CAP requirements
• Prioritize among
  • Consistency – Availability
  • Availability – Partitionability
  • Consistency – Partitionability
WHERE?
So will NoSQL eventually replace RDBMSs everywhere?
No, RDBMS are here to stay. NoSQL is here to help.
Wherever you want to take
Advantage of
NoSQL
Big Data
Denormalize
Shard
Scale Out
And look no further than NoSQL
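
"Shard" and "Scale Out" boil down to routing each key to one of several nodes. The following is a minimal sketch assuming simple hash-modulo placement; `shard_for` and `ShardedStore` are hypothetical names, and each shard is just an in-process dictionary standing in for a separate machine.

```python
import hashlib


def shard_for(key: str, n_shards: int) -> int:
    """Pick a shard from a stable hash of the key (hash-modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


class ShardedStore:
    """Each dict stands in for one node; the application sees a single store."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
for i in range(1000):
    store.put(f"user:{i}", i)
sizes = [len(shard) for shard in store.shards]
```

The caller never names a shard, which is the transparency to the application mentioned earlier. One known limitation of plain modulo placement is that changing the number of shards remaps most keys; consistent hashing is the usual remedy.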
Write Intensive Applications
IOPS of the best storage device <<< n * IOPS of relatively cheaper storage devices
In simple terms: ‘HARNESS THE POWER OF YOUR CLOUD’
Fast Key-Value Access
NoSQL – ‘User, you are looking for $value’
RDBMS – ‘Query executing ….’
An O(1) hash operation versus O(log n) B/B+ tree traversals
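
The difference between the two lookup costs can be illustrated in Python. This is a rough stand-in, not a database benchmark: a `dict` plays the hash index, and a binary search over a sorted key list plays the role of a B/B+ tree traversal (both are O(log n), although a real B-tree has a much larger branching factor). The names `hash_lookup` and `sorted_lookup` are illustrative.

```python
# 100,000 zero-padded keys, so lexicographic order equals numeric order.
keys = [f"key{i:06d}" for i in range(100_000)]
values = list(range(100_000))

hash_index = dict(zip(keys, values))


def hash_lookup(key):
    # Average-case O(1): hash the key and go straight to its slot.
    return hash_index.get(key)


def sorted_lookup(key):
    """Binary search; returns (value, number of keys probed)."""
    lo, hi, probes = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(keys) and keys[lo] == key:
        return values[lo], probes
    return None, probes


value, probes = sorted_lookup("key000123")
```

With 100,000 keys the ordered search needs at most 17 probes, while the hash lookup does not grow with the number of keys. The ordered structure keeps one advantage the hash lacks: range scans over neighbouring keys.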
Flexible Schema and Data types
‘I once was an integer, then a string, then a date; what am I?’ - Field
RDBMS – ‘WTH! Whatever you are, you are beyond my scope’
Transient Data
~μs
~ms
Data – ‘I’m here only for a while and want to get my work done fast’
RDBMS – ‘You are data and you shall be treated like the rest’
NoSQL – ‘Okay, I’ll allot you space in the RAM using Memcached if available; otherwise you still have my cloud’
High Write Availability
Warning - Incoming data ….
NoSQL – ‘Anytime you like, user’
RDBMS – ‘This is insane, I’m already busy with other things’
ECONOMICS
RDBMS – ‘I’m powered by a wonderful, beautiful rabbit’
NoSQL – ‘I’m powered by many cute little hamsters’
No Single Point of Failure
• Designed to run over
  • Economic
  • Commonly available
  • Unreliable hardware
Full table scan operations
• MapReduce:
  • Map:
    • Break your problem into sub-problems which can be computed in parallel and reduced later
  • Reduce:
    • Merge the solutions of the sub-problems into the result
Divide and Conquer your way to Victory
Powered by MapReduce! Or something similar
Ability to restore, maintain, repair itself
No DBA required Design
HOW?
Let us welcome
Keys, Values, Collections, Data Structures, Objects, Documents, Graphs
NoSQL View
The basic approach to data:
• Key/value store
• Run on multiple machines
• Partitions and replication across these machines
• Relax consistency
  • Aim at eventual consistency
  • Asynchronous replication
But not all NoSQL take the same path.
NoSQL
Key-Value Store
Document Store
BigTable Clones
Graph Stores
Multivalue
Object
Tuple Store
Key-Value Stores
• One key, one value, no duplicates, and crazy fast
• Distributed hash tables
• The value is stored as a binary object (BLOB)
  • The DB doesn’t understand it and doesn’t want to
Ex: Amazon Dynamo, MemcacheDB
Key/Value store doesn’t know what is in here
Key4
Key3
Key2
Key1
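
A minimal sketch of that contract, assuming a hypothetical in-memory `KeyValueStore` class: the store accepts only bytes and never looks inside them, so serialization (JSON here, chosen only for illustration) is entirely the application's business.

```python
import json


class KeyValueStore:
    """Hypothetical store: one key maps to one opaque blob of bytes."""

    def __init__(self):
        self._table = {}

    def put(self, key: str, blob: bytes):
        if not isinstance(blob, bytes):
            raise TypeError("values must be bytes; the store does not interpret them")
        self._table[key] = blob  # a second put on the same key replaces the value

    def get(self, key: str):
        return self._table.get(key)


kv = KeyValueStore()
# The application decides the encoding; the store just keeps the bytes.
kv.put("Key1", json.dumps({"name": "Matt"}).encode("utf-8"))
profile = json.loads(kv.get("Key1").decode("utf-8"))
```

Since the store cannot see fields inside the blob, the only query it can answer is "give me the value for this key", which is exactly what keeps it simple and fast.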
Document Store
• Key-value store, but the value is structured and understood by the DB
• Querying data is possible
  • On not just the key
Ex: MongoDB, CouchDB, Riak etc.
• Each database has collections
  • Each collection has a set of documents
• They are well designed for access through applications
  • Suitable for web applications
• A few document databases now provide an SQL-like query interface
Name: $Name
Value: $Value
Version: $Version
Type: $Type
Key4
Key3
Key2
Key1
Emb Object2
Emb Object1
Objects inside Objects
CRAZY!
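
What "understood by the DB" buys you can be shown with a toy collection. This sketch is not the API of MongoDB or CouchDB; `Collection`, `insert` and `find` are hypothetical names, and the sample field values are made up around the Name/Value/Version/Type fields shown on the slide.

```python
class Collection:
    """Toy document collection: values are structured, so fields can be queried."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        return self._docs.get(key)

    def find(self, **criteria):
        # Match on any top-level field, not just the key.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(name) == wanted for name, wanted in criteria.items())
        ]


docs = Collection()
docs.insert("Key1", {"Name": "timeout", "Value": 30, "Version": 2, "Type": "int"})
docs.insert("Key2", {
    "Name": "host", "Value": "db01", "Version": 1, "Type": "str",
    "Owner": {"Name": "ops", "Contact": {"Mail": "ops@example.org"}},  # objects inside objects
})
matches = docs.find(Type="int")
```

Compared with a pure key-value store, the same data can now be retrieved by `Type`, `Version` or any other field, and documents in one collection need not share the same set of fields.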
BigTable & its Clones
• Database, tables, rows, columns and ‘SuperColumn’
• A row consists of columns and SuperColumns
  • A few supercolumns can be made mandatory
• Each supercolumn: an arbitrary set of columns
• Rows are typically versioned by a system-assigned timestamp.
• Intended for tables with a huge number of columns
  • Millions can also be supported very easily
• ‘a sparse, distributed multi-dimensional sorted map’
• Also referred to as wide column stores
• Ex: Google BigTable, Cassandra, HBase, Azure Tables
[Diagram: a sparse table with rows Key1 to Key3 and columns Column1 to Column3 plus SuperColumn1; the rows hold differing numbers of cells]
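
The quoted phrase, "a sparse, distributed multi-dimensional sorted map", can be made tangible with a small single-machine sketch (so the "distributed" part is left out). `SparseTable` and its methods are hypothetical names; cells are addressed by row key, column name and timestamp, and absent cells simply take no space.

```python
import bisect


class SparseTable:
    """Toy wide-column table: (row, column) -> versions sorted by timestamp."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, value, timestamp):
        versions = self._cells.setdefault((row, column), [])
        bisect.insort(versions, (timestamp, value))  # keep versions time-ordered

    def get(self, row, column, timestamp=None):
        versions = self._cells.get((row, column), [])
        if timestamp is not None:
            versions = [v for v in versions if v[0] <= timestamp]
        return versions[-1][1] if versions else None  # newest visible version

    def columns(self, row):
        # Only columns that were actually written exist for a row.
        return sorted(col for (r, col) in self._cells if r == row)


table = SparseTable()
table.put("Key1", "Column1", "a", timestamp=1)
table.put("Key1", "Column1", "b", timestamp=2)
table.put("Key1", "SuperColumn1:x", "nested", timestamp=1)
table.put("Key2", "Column3", "c", timestamp=1)
latest = table.get("Key1", "Column1")
```

Here the `SuperColumn1:x` naming is just one way to imitate a supercolumn as a prefix; each row ends up with its own set of columns, and older versions stay readable by timestamp.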
Graph Databases
• Nodes, edges, properties
  • Replace traditional tables, columns, rows
• A graph database can be implemented in different ways
  • Key/value store, columnar, BigTable clone, or even a combination of these
• Fields are used to directly store the ID of another entity, forming the edge
• A graph database is a multi-relational graph
  • No need for secondary indexes
• Relationships in RDBMS are ‘weak’
• Relationships in graphs are ‘strong’
• The rest don’t really care about relations at the DB level
[Diagram: nodes Matt, April and Honda; edges ‘Is related to’, ‘Drives’ and ‘owns’; properties include Age: 32, SSN, Address, Mobile, City, Model, registration and Spouse]
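
Nodes, edges and properties map naturally onto a few Python structures. The sketch below is an assumption-laden illustration: the `Graph` class is hypothetical, and since the flattened diagram does not say which edge joins which nodes or whose age is 32, the wiring chosen here (Matt is related to April, Matt owns the Honda, April drives it) is a guess made only for the example.

```python
class Graph:
    """Toy property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.edges = []   # (source, label, target)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        # The edge stores the ids of both endpoints directly.
        self.edges.append((source, label, target))

    def neighbours(self, node_id, label=None):
        return [
            target for source, edge_label, target in self.edges
            if source == node_id and (label is None or edge_label == label)
        ]


g = Graph()
g.add_node("Matt", Age=32)            # assumed: the Age property belongs to Matt
g.add_node("April")
g.add_node("Honda", Model="unknown")  # placeholder value, not from the slide
g.add_edge("Matt", "Is related to", "April")
g.add_edge("Matt", "owns", "Honda")
g.add_edge("April", "Drives", "Honda")
```

Following a relationship is a direct walk from one node to the next, for example `g.neighbours("April", "Drives")`, with no join and no secondary index involved.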
[Chart: Complexity versus Size, placing Key-Value Store, Document Store, BigTable Clone and Graph Databases]
The Menu
• Document Store: CouchDB, Lotus Notes, MongoDB
• Graph: AllegroGraph, Neo4j, DEX
• Tabular: BigTable, HBase, HyperTable
• On Disk: BigTable, Membase, Tokyo Cabinet
• In RAM: Memcached, Velocity
• Eventually Consistent: Cassandra, Dynamo, Riak
• Hierarchical: GT.M
• Ordered: Berkeley DB, NMDB, C-ISAM
• Multivalue: eXe, OpenQM
The list isn’t even a quarter of the whole
_theOpenSourceIssue
• Most of them are open source
  • Thus fork-able, like Linux
• The first of the lot
  • Google’s BigTable
  • Amazon’s Dynamo
• All in all, there are about 10 roots, with 4 major ones.
No single database to rule them all
MongoDB
• Document store
• JSON storage
• REST ….. not out of the box
• Map/Reduce
• Master-slave replication
• Strong suite of query APIs
• Good support for SQL
• Work in progress:
  • Autosharding-based scalability
  • Failover support
Open Source
Non-Relational
Scalable
Schemaless
Queryable
Document Oriented
• Mongo stores documents in collections
  • Documents are slightly enhanced JSON objects
  • Complex data structures are very much possible
• Data modelling is a more natural process
Embeddable Objects
• Complexity.begin()
  • Embed objects within a single document
  • A document is an enhanced form of object, as mentioned earlier
  • The same thing in an RDBMS can be achieved using multiple tables and joining them together
• Consider a requirement to store a blog post with this information
  • Post content
  • Post title
  • Post author
  • Comments
    • Comment order
    • Comment content
    • Comment author
RDBMS solution
Table 1: PostID, PostName, PostContent, PostAuthor
Table 2: PostID, CommentID, Author, Content, PreCommentID
MongoDB Solution
• Documents …. Each one of them is a post
{ Name: $name,
  Author: $author,
  Comment: [
    { Author: $author1, Comment: $comment1 },
    { Author: $author2, Comment: $comment2,
      Replies: [ { Author: $author3, Comment: $comment3 } ] }
  ]
}
ID: 01
Name: $name
Author: $author
Content: $content
Comments:
  Author     Comment     Replies
  $author1   $comment1   NULL
  $author2   $comment2   (Author: Auth3, Comment: comm3, Replies: NULL)
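
The nested post above translates directly into plain Python data, which is roughly what an application works with after reading such a document. The sketch uses ordinary dictionaries rather than a MongoDB driver; the literal strings stand in for the `$name`, `$author` and `$comment` placeholders, and `walk_comments` is a hypothetical helper.

```python
post = {
    "Name": "name",
    "Author": "author",
    "Comment": [
        {"Author": "author1", "Comment": "comment1"},
        {
            "Author": "author2",
            "Comment": "comment2",
            "Replies": [{"Author": "author3", "Comment": "comment3"}],
        },
    ],
}


def walk_comments(comments, depth=0):
    """Yield (depth, author, text) for every comment, replies included."""
    for comment in comments:
        yield depth, comment["Author"], comment["Comment"]
        # Replies are optional; a comment without them simply omits the field.
        yield from walk_comments(comment.get("Replies", []), depth + 1)


flat = list(walk_comments(post["Comment"]))
```

The whole thread arrives in one document and keeps its order and nesting; the relational layout would need the second table and a self-reference through PreCommentID to rebuild the same tree.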
RDBMS Viewpoint
ODFMongodb’ed
HEADER LEVEL
TIE LEVEL
LINE LEVEL
Schema-less
• No database-enforced schema
  • Addition and deletion of columns are simple
  • It’s about how the application uses the APIs
• The data definition need not be fixed up front.
Other Features
• Data tagging
• Caching
• Real-time analytics
• Image storage
• Dynamic queries
• Binary storage
Try MongoDB @
http://try.mongodb.org/
EOL
\n
Calm down!
Eventually Answered System
All your questions will be answered eventually