The Whys of NoSQL

DataStax: The Whys of NoSQL

1 Jargon Galore

2 Schema

3 Modeling and Internals

4 Deployment

5 Conclusion

© 2015. All Rights Reserved.

©2015 DataStax Confidential. Do not distribute without consent.

SQL Jargon

NoSQL Noise?

Schema

Rigid Schema

Schema Free

Schema on read

Schema Easy to change

Inflexible

Writes are schema free, reads are freaking slow

Reads/writes are schema aware

Schema changes are O(1) operations

BLOBs

Too Slow

Optimized for Agility of change when needed, not theoretical extremes
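
A minimal sketch of the two ends of that trade-off, assuming two hypothetical in-memory stores (`SchemaFreeStore` and `SchemaAwareStore` are illustrative names, not DataStax or Cassandra APIs). The schema-free store accepts any blob and pays for interpretation on every read; the schema-aware store checks writes against a small piece of metadata, and adding a column only touches that metadata.

```python
import json


class SchemaFreeStore:
    """Hypothetical store: writes accept any blob, reads must interpret it."""

    def __init__(self):
        self._blobs = []

    def write(self, record):
        # No validation: anything JSON-serializable is accepted.
        self._blobs.append(json.dumps(record))

    def read_column(self, name):
        # Schema on read: every blob is parsed and may lack the field.
        return [json.loads(blob).get(name) for blob in self._blobs]


class SchemaAwareStore:
    """Hypothetical store: reads and writes both consult schema metadata."""

    def __init__(self, columns):
        self._columns = set(columns)
        self._rows = []

    def add_column(self, name):
        # Metadata-only change: stored rows are not rewritten, so the
        # cost does not grow with the number of rows.
        self._columns.add(name)

    def write(self, record):
        unknown = set(record) - self._columns
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self._rows.append(dict(record))

    def read_column(self, name):
        if name not in self._columns:
            raise KeyError(name)
        # Rows written before the column existed simply have no value.
        return [row.get(name) for row in self._rows]
```

Rows written before `add_column("email")` read back as `None` for that column, which is one way a schema change can stay cheap without rewriting existing data.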

Normalization, Joins, Referential Integrity

Database normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy.

Referential integrity is a property of data which, when satisfied, requires every value of one column of a table to exist as a value of another column in a different table.

A JOIN is a means for combining fields from two tables (or more) by using values common to each.

Source - https://en.wikipedia.org/
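
The three definitions above can be seen side by side in a small sketch using Python's built-in `sqlite3` module. The table and column names (`users`, `orders`, `orders_by_user`) are made up for illustration. The normalized pair needs a JOIN and is protected by a foreign key; the denormalized table repeats the user's name on every row and answers the same question from a single table.

```python
import sqlite3


def build_demo():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE users (
            user_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            user_id  INTEGER NOT NULL REFERENCES users(user_id),
            item     TEXT NOT NULL
        );
        -- Denormalized copy: the user's name is repeated on every order.
        CREATE TABLE orders_by_user (
            order_id  INTEGER PRIMARY KEY,
            user_id   INTEGER NOT NULL,
            user_name TEXT NOT NULL,
            item      TEXT NOT NULL
        );
    """)
    conn.execute("INSERT INTO users VALUES (1, 'Ada')")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(10, 1, "disk"), (11, 1, "ssd")])
    conn.executemany("INSERT INTO orders_by_user VALUES (?, ?, ?, ?)",
                     [(10, 1, "Ada", "disk"), (11, 1, "Ada", "ssd")])
    conn.commit()
    return conn


def orders_normalized(conn, user_id):
    # Normalized read: combine two tables on their common user_id value.
    return conn.execute(
        "SELECT u.name, o.item FROM orders o "
        "JOIN users u ON u.user_id = o.user_id "
        "WHERE u.user_id = ? ORDER BY o.order_id", (user_id,)).fetchall()


def orders_denormalized(conn, user_id):
    # Denormalized read: one table, no join, at the price of redundancy.
    return conn.execute(
        "SELECT user_name, item FROM orders_by_user "
        "WHERE user_id = ? ORDER BY order_id", (user_id,)).fetchall()


def rejects_orphan_order(conn):
    # Referential integrity: an order for a non-existent user is refused.
    try:
        conn.execute("INSERT INTO orders VALUES (12, 999, 'tape')")
    except sqlite3.IntegrityError:
        return True
    return False
```

Both read functions return the same rows; the difference is where the work happens, at write time (redundant copies) or at read time (the join).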

Not all Data Access is equal

1:168K random vs. sequential

1:10 random vs. sequential

Source - https://queue.acm.org/detail.cfm?id=1563874

Disk Density

Source http://silvertonconsulting.com/blog/2010/04/22/save-the-planet-buy-fatter-disks-and-flash/#sthash.sh2nwqtX.dpbs

[Chart: Disk Price / GB (HDD), log scale from $0.01 to $1,000,000.00, years 1980 to 2014]

Minimize Data Redundancy?

OS Cache

C* Read and Write paths

Memtable 1 Memtable 2 Memtable N

SSTable 1 SSTable 2 SSTable N

Commit Log

Persistent Storage

Off Heap

In Process Memory

Reads (memtable + N SSTables where N >= 1)

Mandatory Flush

Writes

Max # of SSTables = N (based on compaction)

Creation of new memtable during flush operation (cleanup tombstones, cleanup token ranges, etc.)

Time (memtable_flush_in_ms controls the frequency)

Accounting

SSTable Compacted

RANDOM ACCESS

SEQUENTIAL ACCESS
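
The read and write paths in the diagram can be approximated with a toy log-structured store. This is a sketch under simplifying assumptions, not Cassandra's implementation: `ToyLSM`, `memtable_limit` and `TOMBSTONE` are hypothetical names, everything lives in process memory, flushing is triggered by size only, and compaction merges all SSTables at once.

```python
TOMBSTONE = object()  # marker for a deleted key


class ToyLSM:
    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # append-only, stands in for sequential disk writes
        self.memtable = {}    # mutable, in-memory
        self.sstables = []    # sorted runs, oldest first, never modified in place

    def write(self, key, value):
        # No read before write: append to the log, update memory, done.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write, of a tombstone.
        self.write(key, TOMBSTONE)

    def flush(self):
        if not self.memtable:
            return
        # Persist the memtable as a new sorted run and start a fresh one.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        # Reads consult the memtable, then SSTables from newest to oldest.
        for table in [self.memtable, *reversed(self.sstables)]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all runs (newer values win) and drop tombstoned keys.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [live] if live else []
```

Every flush adds one more SSTable that a read may have to consult, which is why compaction has to be planned for rather than treated as optional.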

Execution Engine

Key takeaways

Optimal utilization of physical resources (random access, sequential IO and CPU)

No Read before Write (well mostly!)

Plan for Compaction (like commercial paper, you need a regular pay back)

De-Normalize for optimal application response (use 2NF instead of 3NF)

Deployment Semantics

[Diagram: deployment options. Labels: R/W, R; Single Box, DR, GR, GR + DR; Scale Up by Sharding; Replication; San Francisco, New York, Stockholm; DC1, DC2]
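
One common way to combine sharding and replication is to hash each key onto a ring of nodes and store it on the next few nodes clockwise. The sketch below assumes that scheme; `Ring`, `token` and the node names are hypothetical, MD5 is used only as a convenient stable hash, and data-center awareness is left out.

```python
import bisect
import hashlib


def token(value):
    # Stable 64-bit position on the ring derived from a string.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("at least one node is required")
        # Never ask for more replicas than there are nodes.
        self.rf = min(replication_factor, len(nodes))
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key):
        # First node clockwise from the key's token, then the next rf - 1.
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.rf)]
```

For example, `Ring(["sf-1", "ny-1", "sto-1", "sf-2"]).replicas("user:42")` returns three distinct node names, and it returns the same three on every call, so any client can locate a key without a central lookup.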

Linear Scaling

http://www.datastax.com/apache-cassandra-leads-nosql-benchmark

End Point Report Excerpt: Balanced Read/Write YCSB Test

So what's the catch?

Conclusion

Best in class performance, backed by physics

Enables pragmatic business agility

Delivering delightful customer experience

Always on, Linear Scale architecture delivering optimal ROI

Thank you