1 Jargon Galore
2 Schema
3 Modeling and Internals
4 Deployment
5 Conclusion
© 2015. All Rights Reserved.
Schema
©2015 DataStax Confidential. Do not distribute without consent.
Rigid Schema: inflexible
Schema Free: BLOBs
Schema on read: writes are schema free, reads are freaking slow (too slow)
Schema Easy to change: reads/writes are schema aware; schema changes are O(1) operations
Optimized for agility of change when needed, not theoretical extremes
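A toy illustration of why a schema change can be O(1): if each row stores only the cells it was written with and the schema lives as separate metadata, adding a column rewrites no existing data, while reads and writes still check the schema. The SparseTable class is hypothetical and is not how Cassandra lays data out on disk.

```python
from typing import Any, Dict, List, Optional


class SparseTable:
    """Hypothetical table: schema is metadata, rows hold only the cells written."""

    def __init__(self, columns: List[str]) -> None:
        self.columns: List[str] = list(columns)    # schema metadata only
        self.rows: Dict[Any, Dict[str, Any]] = {}  # primary key -> {column: value}

    def add_column(self, name: str) -> None:
        # Cost does not depend on the number of rows: no row is touched.
        if name not in self.columns:
            self.columns.append(name)

    def write(self, key: Any, **cells: Any) -> None:
        # Writes are schema aware: unknown columns are rejected.
        unknown = set(cells) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        self.rows.setdefault(key, {}).update(cells)

    def read(self, key: Any) -> Dict[str, Optional[Any]]:
        # Reads are schema aware: columns never written come back as None.
        row = self.rows.get(key, {})
        return {col: row.get(col) for col in self.columns}


if __name__ == "__main__":
    t = SparseTable(["user_id", "name"])
    t.write(1, user_id=1, name="ada")
    t.add_column("email")  # existing rows are untouched
    print(t.read(1))       # {'user_id': 1, 'name': 'ada', 'email': None}
```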
Normalization, Joins, Referential Integrity
Database normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy.
Referential integrity is a property of data which, when satisfied, requires every value of one column of a table to exist as a value of another column in a different table.
A JOIN is a means for combining fields from two or more tables by using values common to each.
Source - https://en.wikipedia.org/
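To make the contrast concrete, here is a minimal in-memory sketch, assuming plain Python dicts and lists stand in for tables. The normalized form stores each user's name once and JOINs at read time; the denormalized form copies the name into a table keyed by the query (orders per user) at write time, so a read is a single lookup. All names (users, orders, orders_by_user, record_order) are hypothetical.

```python
from typing import Dict, List, Tuple

Order = Tuple[int, str, int]  # (order_id, user_name, total)

# Normalized: the name is stored once; orders reference it by user_id.
users: Dict[int, str] = {1: "ada", 2: "alan"}
orders: List[Tuple[int, int, int]] = []  # (order_id, user_id, total)

# Denormalized: one table per query, keyed by what the query looks up.
orders_by_user: Dict[int, List[Order]] = {}  # user_id -> rows with name copied in


def record_order(order_id: int, user_id: int, total: int) -> None:
    if user_id not in users:  # referential integrity check
        raise KeyError(f"no such user {user_id}")
    orders.append((order_id, user_id, total))
    # Write-time duplication: copy the name into the query table.
    orders_by_user.setdefault(user_id, []).append((order_id, users[user_id], total))


def read_joined(user_id: int) -> List[Order]:
    # Read-time JOIN: scan orders and look up the user for each match.
    return [(oid, users[uid], t) for oid, uid, t in orders if uid == user_id]


def read_denormalized(user_id: int) -> List[Order]:
    # Single lookup: the data is already shaped for this query.
    return orders_by_user.get(user_id, [])


for oid, uid, total in [(100, 1, 30), (101, 2, 12), (102, 1, 7)]:
    record_order(oid, uid, total)

print(read_joined(1))        # [(100, 'ada', 30), (102, 'ada', 7)]
print(read_denormalized(1))  # [(100, 'ada', 30), (102, 'ada', 7)]
```

The price of the denormalized read is redundancy: renaming a user means rewriting every copied name.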
Not all Data Access is equal
Disk: 1:168K random vs. sequential
Memory: 1:10 random vs. sequential
Source - https://queue.acm.org/detail.cfm?id=1563874
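A crude cost model shows where a gap of this size comes from on spinning disk: random reads pay a seek per record, sequential reads pay one seek and then stream. The seek time, bandwidth, and record size below are hypothetical round numbers, not the figures behind the cited measurements, so the resulting ratio differs from 1:168K; the point is only that the gap spans orders of magnitude.

```python
def read_time_s(n_records: int, record_bytes: int, seek_s: float,
                bandwidth_bytes_per_s: float, sequential: bool) -> float:
    """Crude disk model: each seek costs seek_s, data streams at a fixed bandwidth."""
    transfer_s = n_records * record_bytes / bandwidth_bytes_per_s
    if sequential:
        return seek_s + transfer_s          # one seek, then stream the whole run
    return n_records * seek_s + transfer_s  # one seek per record


if __name__ == "__main__":
    # Hypothetical spinning-disk figures, chosen only for illustration.
    args = dict(n_records=1_000_000, record_bytes=100, seek_s=0.010,
                bandwidth_bytes_per_s=100e6)
    seq = read_time_s(**args, sequential=True)
    rnd = read_time_s(**args, sequential=False)
    print(f"sequential {seq:.2f}s, random {rnd:,.0f}s, ratio {rnd / seq:,.0f}x")
    # sequential 1.01s, random 10,001s, ratio 9,902x
```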
Disk Density
Source - http://silvertonconsulting.com/blog/2010/04/22/save-the-planet-buy-fatter-disks-and-flash/
Minimize Data Redundancy?
[Figure: HDD price per GB, log scale from $0.01 to $1,000,000, for the years 1980 to 2014]
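The question can be answered with simple arithmetic: the extra disk spent on denormalization is the dataset size times the number of additional copies times the price per GB. The dataset size, number of query tables, and the $0.05/GB price below are hypothetical inputs, not values read off the chart.

```python
def extra_storage_cost(dataset_gb: float, query_tables: int, price_per_gb: float) -> float:
    """Disk cost of the extra copies when the same data is written to several query tables."""
    return dataset_gb * (query_tables - 1) * price_per_gb


if __name__ == "__main__":
    # Hypothetical: 1,000 GB of data written to 3 query tables at $0.05/GB.
    print(f"${extra_storage_cost(1000, 3, 0.05):,.2f}")  # $100.00
```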
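C* Read and Write paths
[Diagram: Cassandra read and write paths across in-process memory, off-heap memory, the OS cache and persistent storage]
Writes: appended to the Commit Log on persistent storage (sequential access) and applied to the current memtable in memory (in process or off heap).
Flush: memtables (Memtable 1 ... Memtable N) are flushed to SSTables (SSTable 1 ... SSTable N) on persistent storage (sequential access). The flush is mandatory and is triggered by time (memtable_flush_period_in_ms controls the frequency) or by memory accounting; a new memtable is created during the flush operation.
Compaction: SSTables are compacted, which bounds the number of SSTables (max # of SSTables = N, based on compaction) and cleans up tombstones, token ranges, etc.
Reads: served from the memtable plus N SSTables (N >= 1), with the OS cache in front of the SSTables (random access).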
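The flow above (append to a log, buffer in a memtable, flush to immutable sorted SSTables, read newest-first, compact to bound the SSTable count and purge tombstones) can be sketched as a toy log-structured store. This is a minimal in-memory sketch, assuming a Python list stands in for the commit log file and a hypothetical memtable_limit (an entry count) replaces Cassandra's time and memory flush triggers; it is not Cassandra's implementation.

```python
import bisect
from typing import Dict, List, Optional, Tuple

TOMBSTONE = object()  # marker written by deletes


class TinyLSM:
    """Toy log-structured store; memtable_limit is a hypothetical flush trigger."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: List[Tuple[str, object]] = []     # append-only (sequential IO)
        self.memtable: Dict[str, object] = {}
        self.sstables: List[List[Tuple[str, object]]] = []  # oldest first, each sorted
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: object) -> None:
        # No read before write: log the mutation, then overwrite in the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.write(key, TOMBSTONE)  # a delete is just a write of a tombstone

    def flush(self) -> None:
        # Persist the memtable as an immutable sorted SSTable and start a new one.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key: str) -> Optional[object]:
        # Memtable first, then SSTables from newest to oldest.
        value: object = None
        if key in self.memtable:
            value = self.memtable[key]
        else:
            for table in reversed(self.sstables):
                i = bisect.bisect_left(table, (key,))  # (key,) sorts before (key, v)
                if i < len(table) and table[i][0] == key:
                    value = table[i][1]
                    break
        return None if value is TOMBSTONE else value

    def compact(self) -> None:
        # Merge every SSTable; newer values win and tombstones are dropped.
        # Dropping is safe here only because all SSTables are merged at once
        # (Cassandra additionally waits for gc_grace_seconds).
        merged: Dict[str, object] = {}
        for table in self.sstables:
            merged.update(table)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live] if live else []


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    db.write("a", 1)
    db.write("b", 2)  # second write triggers a flush
    db.write("a", 3)
    db.delete("b")    # flushes a second SSTable
    print(len(db.sstables), db.read("a"), db.read("b"))  # 2 3 None
    db.compact()
    print(len(db.sstables), db.sstables[0])  # 1 [('a', 3)]
```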
Key takeaways
Optimal utilization of physical resources (random access, sequential IO and CPU)
No Read before Write (well, mostly!)
Plan for Compaction (like commercial paper, you need a regular payback)
De-Normalize for optimal application response (use 2NF instead of 3NF)
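"No Read before Write" works because a write never needs the current value: each write carries a timestamp, and conflicting versions are reconciled later, on read or during compaction, by keeping the newest one. A minimal last-write-wins sketch, assuming integer timestamps; the Cell type and reconcile function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None stands in for a deletion (tombstone) in this sketch
    timestamp: int        # write timestamp supplied by the writer


def reconcile(versions: Iterable[Cell]) -> Cell:
    # Last write wins: keep the version with the highest timestamp.
    # Ties go to whichever version max() sees first; a real system needs a
    # deterministic tie-break rule, which this sketch does not implement.
    return max(versions, key=lambda c: c.timestamp)


if __name__ == "__main__":
    # Clients write the same column blindly, without reading it first.
    replica_a = [Cell("v1", 100), Cell("v3", 300)]
    replica_b = [Cell("v2", 200)]
    print(reconcile(replica_a + replica_b).value)  # v3
```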
Deployment Semantics
©2014 DataStax Confidential. Do not distribute without consent.
[Diagram: deployment options Single Box; Scale Up by Sharding; Replication; DR; GR; GR + DR. Data centers DC1 and DC2 at San Francisco, New York and Stockholm, labeled R/W and R]
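Sharding and multi-data-center replication can be sketched with a token ring: each node owns several tokens, a key is hashed to a token, and replicas are chosen by walking the ring clockwise until each data center has its requested replica count, in the spirit of Cassandra's NetworkTopologyStrategy but ignoring racks. The MD5-based token function stands in for Cassandra's Murmur3 partitioner, and the node names, DC assignment, and vnode count are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def token(key: str) -> int:
    # Stand-in partitioner: first 8 bytes of MD5 as an unsigned integer.
    # (Cassandra's default partitioner is Murmur3; MD5 keeps this stdlib-only.)
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Toy token ring with per-data-center replica placement (racks ignored)."""

    def __init__(self, node_dc: Dict[str, str], vnodes: int = 8) -> None:
        # node_dc maps a node name to its data center; each node owns `vnodes` tokens.
        self.node_dc = dict(node_dc)
        self.ring: List[Tuple[int, str]] = sorted(
            (token(f"{node}#{i}"), node) for node in node_dc for i in range(vnodes)
        )
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, rf_per_dc: Dict[str, int]) -> List[str]:
        # Start at the first token >= the key's token (wrapping around) and walk
        # clockwise, taking distinct nodes until every DC has its replica count.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        need = dict(rf_per_dc)
        chosen: List[str] = []
        for step in range(len(self.ring)):
            if all(n <= 0 for n in need.values()):
                break
            node = self.ring[(start + step) % len(self.ring)][1]
            dc = self.node_dc[node]
            if node not in chosen and need.get(dc, 0) > 0:
                chosen.append(node)
                need[dc] -= 1
        return chosen


if __name__ == "__main__":
    # Hypothetical cluster: two nodes in DC1, three in DC2.
    ring = Ring({"dc1-a": "DC1", "dc1-b": "DC1",
                 "dc2-a": "DC2", "dc2-b": "DC2", "dc2-c": "DC2"})
    print(ring.replicas("user:42", {"DC1": 2, "DC2": 2}))
```

Adding a node only inserts its tokens into the ring, so placement for most keys is unchanged; the replication factor per DC is chosen independently for each data center.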
Linear Scaling
http://www.datastax.com/apache-cassandra-leads-nosql-benchmark
End Point Report Excerpt: Balanced Read/Write YCSB Test
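The benchmark itself is not reproduced here. One ingredient that makes incremental scaling practical is that adding a node to a token ring moves only about 1/(N+1) of the keys, all onto the new node, whereas naive modulo placement reshuffles most keys. A minimal sketch, assuming an MD5-based stand-in partitioner, 64 tokens per node, and hypothetical node and key names.

```python
import bisect
import hashlib
from typing import List, Tuple

RingT = Tuple[List[int], List[str]]  # (sorted tokens, owner of each token)


def token(key: str) -> int:
    # Stand-in partitioner (MD5 prefix), not Cassandra's Murmur3.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(nodes: List[str], vnodes: int) -> RingT:
    entries = sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    return [t for t, _ in entries], [n for _, n in entries]


def owner(ring: RingT, key: str) -> str:
    tokens, owners = ring
    # The first token >= the key's token owns it; wrap around past the end.
    return owners[bisect.bisect_left(tokens, token(key)) % len(tokens)]


def moved_fraction_ring(n_nodes: int, n_keys: int = 20_000, vnodes: int = 64) -> float:
    nodes = [f"node{i}" for i in range(n_nodes)]
    before = build_ring(nodes, vnodes)
    after = build_ring(nodes + [f"node{n_nodes}"], vnodes)
    keys = [f"key{k}" for k in range(n_keys)]
    return sum(owner(before, k) != owner(after, k) for k in keys) / n_keys


def moved_fraction_modulo(n_nodes: int, n_keys: int = 20_000) -> float:
    # Naive placement: node = token % n_nodes. Adding a node reshuffles most keys.
    keys = [f"key{k}" for k in range(n_keys)]
    return sum(token(k) % n_nodes != token(k) % (n_nodes + 1) for k in keys) / n_keys


if __name__ == "__main__":
    for n in (4, 8):
        print(f"{n} -> {n + 1} nodes: ring moves {moved_fraction_ring(n):.3f}, "
              f"modulo moves {moved_fraction_modulo(n):.3f}")
```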
Conclusion
Best in class performance, backed by physics
Enables pragmatic business agility
Delivers a delightful customer experience
Always on
Linear scale architecture delivering optimal ROI