The data model is dead, long live the data model!!
Patrick McFadin
Senior Solutions Architect
DataStax
Thursday, May 2, 13
Bridging the divide
The era of relational everything is over
The era of Polyglot Persistence* has begun
* http://www.martinfowler.com/bliki/PolyglotPersistence.html
Coming from a relational world
Tradeoffs are hard
Feature RDBMS Cassandra
Single Point of Failure
Cross Datacenter
Linear Scaling
Data modeling
Background - The data model
• The data model is alive and well
• Models define the business requirements
• Define the structure of your data
• Relational is just one type (Network model anyone?)
Wait? I thought NoSQL meant no model?
Background - ACID vs CAP
ACID
• Atomic - All or none
• Consistency - Only valid data is written
• Isolation - One operation at a time
• Durability - Once committed, it stays that way

CAP - Pick two
• Consistency - All nodes in the cluster see the same data
• Availability - Cluster always accepts writes
• Partition tolerance - Cluster keeps working when nodes can't talk to each other
Cassandra lets you tune this
Relational Background - Normal forms
• This IS the relational model
• 5 normal forms
• Need foreign keys
• Need joins

Employees
id  First    Last
1   Edgar    Codd
2   Raymond  Boyce

Department
id  Dept
1   Engineering
2   Math
Background - How Cassandra Stores Data
• Model brought from Bigtable*
• Row Key and a lot of columns
• Column names sorted (UTF8, Int, Timestamp, etc)

[Diagram: one Row Key followed by columns numbered 1 to 2 billion; each column holds a Column Name, Column Value, Timestamp, and TTL]
* http://research.google.com/archive/bigtable.html
Background - How Cassandra Stores Data
• Rows belong to a node and are replicated
• Row lookups are fast
• Randomly distributed in cluster

[Diagram: RowKey1 through RowKey12 spread across the nodes of the cluster; a lookup for RowKey5 goes to the node that holds RowKey5]
Relational Concept - Sequences
• Handy feature for auto-creation of Ids
• Guaranteed unique
• Depends on a single source of truth (one server)

INSERT INTO user (id, firstName, LastName)
VALUES (seq.nextVal(), 'Ted', 'Codd')
Cassandra Concept - No sequences
• Difficult in a distributed system
• Requires a lock (perf killer)
• What to do?
  - Use part of the data to create a unique index, or...
  - UUID to the rescue!
Concept - UUID
• Universally Unique ID
• 128 bit number represented in character form
• Easily generated on the client
• Same as GUID for the MS folks
99051fe9-6a9c-46c2-b949-38ef78858dd0
RFC 4122 if you want a reference
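As a rough illustration of "easily generated on the client", here is a minimal Python sketch using the standard library `uuid` module. The helper name `new_video_id` is made up for this example, and the choice of a random (version 4) UUID is an assumption; the slides do not say which UUID version to use.

```python
import uuid


def new_video_id() -> uuid.UUID:
    """Return a random (version 4) UUID; no server round trip or lock needed."""
    # Assumption: version 4 is used here only because it needs no shared state.
    return uuid.uuid4()


video_id = new_video_id()
print(video_id)                 # text form: 8-4-4-4-12 hex digits
print(len(video_id.bytes) * 8)  # size of the underlying number in bits
```

Because each client can produce ids independently, nothing has to play the role of the single sequence server.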
Cassandra Concept - Entity model
• User table (!!)
• Username is the unique key
• Static but can be changed dynamically without downtime

CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email varchar,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);
ALTER TABLE users ADD city text;
Relational Concept - De-normalization
• To combine relations into a single row
• Used in relational modeling to avoid complex joins

Employees
id  First    Last
1   Edgar    Codd
2   Raymond  Boyce

Department
id  Dept
1   Engineering
2   Math

SELECT e.First, e.Last, d.Dept
FROM Department d, Employees e
WHERE 1 = e.id
AND e.id = d.id
Take this and then...
Relational Concept - De-normalization
• Combine table columns into a single view
• No joins
• All in how you set the data for fast reads

SELECT First, Last, Dept
FROM employees
WHERE id = '1'

Employees
id  First    Last   Dept
1   Edgar    Codd   Engineering
2   Raymond  Boyce  Math
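The step from two tables to one can be sketched in a few lines of Python. This is only an in-memory illustration using plain dicts as stand-ins for tables; the function name `denormalize` is invented for the example, and matching rows on the shared id mirrors the join shown on the previous slide.

```python
# Stand-ins for the two normalized tables, keyed by id.
employees = {1: ("Edgar", "Codd"), 2: ("Raymond", "Boyce")}
departments = {1: "Engineering", 2: "Math"}


def denormalize(employees, departments):
    """Fold the department name into each employee row so reads need no join."""
    return {
        emp_id: {"First": first, "Last": last, "Dept": departments[emp_id]}
        for emp_id, (first, last) in employees.items()
    }


employees_wide = denormalize(employees, departments)
print(employees_wide[1])  # one lookup by id returns name and department together
```

The work of combining the data happens once, when it is written, instead of on every read.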
Cassandra Concept - One-to-Many
• Relationship without being relational
• Users have many videos
• Wait? Where is the foreign key?

Users
username  firstname  lastname  email
tcodd     Edgar      Codd      [email protected]
rboyce    Raymond    Boyce     [email protected]

Videos
videoid   videoname     username  description             tags
99051fe9  My funny cat  tcodd     My cat plays the piano  cats,piano,lol
b3a76c6b  Math          tcodd     Now my dog plays        dogs,piano,lol
Cassandra Concept - One-to-many
• Static table to store videos
• UUID for unique video id
• Add username to denormalize

CREATE TABLE videos (
  videoid uuid,
  videoname varchar,
  username varchar,
  description varchar,
  tags varchar,
  upload_date timestamp,
  PRIMARY KEY (videoid)
);
Cassandra Concept - One-to-Many
• Lookup video by username
• Write in two tables at once for fast lookups

CREATE TABLE username_video_index (
  username varchar,
  videoid uuid,
  upload_date timestamp,
  video_name varchar,
  PRIMARY KEY (username, videoid)
);

SELECT video_name
FROM username_video_index
WHERE username = 'tcodd'
AND videoid = '99051fe9'
Creates a wide row!
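The "write in two tables at once" idea can be sketched in Python with two dicts standing in for the `videos` and `username_video_index` tables. This is an in-memory illustration only, not Cassandra driver code; the function names `add_video` and `videos_for_user` are invented for the example.

```python
import uuid
from datetime import datetime, timezone

videos = {}                # stand-in for the videos table: videoid -> row
username_video_index = {}  # stand-in for the index table: username -> {videoid: row}


def add_video(username, videoname, description):
    """Write the video row and its per-user index entry in the same step."""
    videoid = uuid.uuid4()
    upload_date = datetime.now(timezone.utc)
    videos[videoid] = {
        "videoname": videoname,
        "username": username,
        "description": description,
        "upload_date": upload_date,
    }
    # All of one user's videos live under a single key: the "wide row".
    username_video_index.setdefault(username, {})[videoid] = {
        "upload_date": upload_date,
        "video_name": videoname,
    }
    return videoid


def videos_for_user(username):
    """Read every video name for a user from the index alone, with no join."""
    return [row["video_name"] for row in username_video_index.get(username, {}).values()]


add_video("tcodd", "My funny cat", "My cat plays the piano")
add_video("tcodd", "Math", "Now my dog plays")
print(videos_for_user("tcodd"))
```

The extra write is the price paid for a lookup by username that touches only one key.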
Cassandra concept - Many-to-many
• Users and videos have many comments
Users
username  firstname  lastname  email
tcodd     Edgar      Codd      [email protected]
rboyce    Raymond    Boyce     [email protected]

Videos
videoid   videoname     username  description             tags
99051fe9  My funny cat  tcodd     My cat plays the piano  cats,piano,lol
b3a76c6b  Math          tcodd     Now my dog plays        dogs,piano,lol

Comments
username  videoid   comment
tcodd     99051fe9  Sweet!
rboyce    b3a76c6b  Boring :(
Cassandra concept - Many-to-many
• Model both sides of the view
• Insert both when comment is created
• View from either side

CREATE TABLE comments_by_video (
  videoid uuid,
  username varchar,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (videoid, username)
);

CREATE TABLE comments_by_user (
  username varchar,
  videoid uuid,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (username, videoid)
);
Don’t be afraid of writes. Bring it!
Relational Concept - Transactions
• Built in and easy to use
• Can be slow and heavy so don't use them all the time
• Normal forms force ACID writes into many tables

lock
  - change table one
  - change table two
  - change table three
commit

-or-

lock
  - change table one
  - change table two
  - change table three
rollback
Crazy Concept - Do you need a transaction?
• Since they were easy in RDBMS, was it just default?
• Read this article: http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
• In a nutshell: asynchronous transaction
  - Cashier takes your money
  - Barista makes your coffee
  - Error? Barista deals with it
Cassandra Concept - Transaction quality
• Requires a lock, which is costly in distributed systems
• Cassandra features can be used to advantage
  - Row level isolation
  - Atomic batches
Cassandra Concept - Transaction
• Track that something happened
• Use time stamps to preserve order
• Rectify when any doubt (just like banks do)

CREATE TABLE credit_transaction (
  username varchar,
  type varchar,
  datetime timestamp,
  credits int,
  PRIMARY KEY (username, datetime, type)
) WITH CLUSTERING ORDER BY (datetime DESC, type ASC);

Create this table. Sort the columns in reverse order so the last action is first on the list.
Cassandra Concept - Transaction
• All transactions are stored
• Think RPN calculator, latest first

Row key: tcodd
Column (type:datetime)          Credits
ADD:2013-04-25 21:10:32.745     $20
REMOVE:2013-04-25 15:45:22.813  $5
ADD:2013-04-25 07:15:12.542     $100

Rectify account:
  + $100
  -   $5
  +  $20
  ------
  = $115 Current balance
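The rectify arithmetic above can be written as a short Python sketch. The three ledger entries are the ones shown on the slide; the list-of-tuples layout and the function name `rectify` are assumptions made for the example, not part of the original deck.

```python
from datetime import datetime

# The three stored transactions for tcodd, latest first as on the slide.
ledger = [
    (datetime(2013, 4, 25, 21, 10, 32, 745000), "ADD", 20),
    (datetime(2013, 4, 25, 15, 45, 22, 813000), "REMOVE", 5),
    (datetime(2013, 4, 25, 7, 15, 12, 542000), "ADD", 100),
]


def rectify(entries):
    """Replay every stored transaction, oldest first, to rebuild the balance."""
    balance = 0
    for _timestamp, kind, credits in sorted(entries):
        if kind == "ADD":
            balance += credits
        elif kind == "REMOVE":
            balance -= credits
        else:
            raise ValueError(f"unknown transaction type: {kind}")
    return balance


print(rectify(ledger))  # 115
```

Since every transaction is kept, the balance can always be recomputed from scratch when a stored total is in doubt.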
Cassandra Concept - Transaction
[Flowchart with two parallel flows, Add Credit and Remove credit]

Add Credit
1. Create credit_transaction record with ADD + Timestamp
2. Read user record total_credits and credit_timestamp
3. Check: user credit_timestamp < credit_transaction timestamp?
4. Set back in user record credit_timestamp and incremented total_credits

Remove credit
1. Create credit_transaction record with REMOVE + Timestamp
2. Read user record total_credits and credit_timestamp
3. Check: user credit_timestamp < credit_transaction timestamp?
4. Set back in user record credit_timestamp and decremented total_credits

Outcomes: Success, or Fail transaction and rectify
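One way to read the flowchart is sketched below in Python. Several things here are assumptions rather than anything the slides confirm: that a failed timestamp check leads to "Fail transaction and rectify", that rectifying means replaying the stored transactions, and all class and function names (`UserRecord`, `Account`, `apply_credit`). It is an in-memory illustration, not Cassandra code.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class UserRecord:
    total_credits: int = 0
    credit_timestamp: datetime = datetime.min


@dataclass
class Account:
    user: UserRecord = field(default_factory=UserRecord)
    credit_transaction: list = field(default_factory=list)  # (timestamp, type, credits)


def apply_credit(account, kind, credits, ts):
    """Record an ADD or REMOVE, then update the running total if it is in order."""
    # Step 1: always store the transaction record first.
    account.credit_transaction.append((ts, kind, credits))
    # Step 2: read the user record.
    user = account.user
    # Step 3: only accept the update if it is newer than the last one applied.
    if user.credit_timestamp < ts:
        # Step 4: set back the new timestamp and the adjusted total.
        user.total_credits += credits if kind == "ADD" else -credits
        user.credit_timestamp = ts
        return True  # Success
    # Assumed failure path: rebuild the total from every stored transaction.
    ordered = sorted(account.credit_transaction)
    user.total_credits = sum(c if k == "ADD" else -c for _t, k, c in ordered)
    user.credit_timestamp = ordered[-1][0]
    return False  # Fail transaction and rectify


account = Account()
apply_credit(account, "ADD", 100, datetime(2013, 4, 25, 7, 15, 12))
apply_credit(account, "ADD", 20, datetime(2013, 4, 25, 21, 10, 32))
# This REMOVE arrives late, with an older timestamp than the last applied update.
in_order = apply_credit(account, "REMOVE", 5, datetime(2013, 4, 25, 15, 45, 22))
print(in_order, account.user.total_credits)
```

The stored transactions stay the source of truth; the total on the user record is a convenience that gets rebuilt whenever ordering is in doubt.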
And if that doesn’t work...
• Lightweight transactions coming soon.
• Cassandra 2.0
• See CASSANDRA-5062
But wait there is more!!
• The next in this series: May 16th
  Become a super modeler
• Final will be at the Cassandra Summit: June 11th
  The world's next top data model
Be there!!!
Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it.
Thank You
Q&A