Upload
patrick-mcfadin
View
5.261
Download
2
Tags:
Embed Size (px)
DESCRIPTION
My
Citation preview
#CASSANDRA13
Patrick McFadin | Solution Architect, DataStax
The World's Next Top Data Model
Monday, June 24, 13
#CASSANDRA13
The saga continues!★ Data model is dead, long live the data
model.★ Bridging from Relational to Cassandra
★ Become a Super Modeler★ Core data modeling techniques using
CQL
Monday, June 24, 13
#CASSANDRA13
Because I love talking about this
Just to recap...
Monday, June 24, 13
#CASSANDRA13
Why does this matter?* Cassandra lives closer to your users or applications
* Not a hammer for all use case nails
* Proper use case, proper model...
* Get it wrong and...
Monday, June 24, 13
#CASSANDRA13
When to use Cassandra*
* Need to be in more than one datacenter. active-active
* Scaling from 0 to, uh, well... we’re not really sure.
* Need as close to 100% uptime as possible.
* Getting these from any other solution would just be mega $
and...
*nutshell version. These are all ORs not ANDs
Monday, June 24, 13
#CASSANDRA13
You get the data model right!
Monday, June 24, 13
#CASSANDRA13
So let’s do that* Four real world examples
* Use case, what they were avoiding and model to accomplish
* You may think this is you, but it isn’t. I hear these all the time.
* All examples are in CQL3
Monday, June 24, 13
#CASSANDRA13
But wait you say
CQL doesn’t do dynamic wide rows!
Monday, June 24, 13
#CASSANDRA13
Yes it can!* CQL does wide rows the same way you did them in Thrift
* No really
* Read this blog post
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
...or just trust me and I’ll show you how
Monday, June 24, 13
#CASSANDRA13
Customers giving you money is a good reason for uptime
Shopping Cart Data Model
Monday, June 24, 13
#CASSANDRA13
Shopping cart use case* Store shopping cart data reliably
* Minimize (or eliminate) downtime. Multi-dc
* Scale for the “Cyber Monday” problem
* Every minute off-line is lost $$
* Online shoppers want speed!
The bad
Monday, June 24, 13
#CASSANDRA13
Shopping cart data model* Each customer can have one or more shopping carts
* De-normalize data for fast access
* Shopping cart == One partition (Row Level Isolation)
* Each item a new column
Monday, June 24, 13
#CASSANDRA13
Shopping cart data modelCREATE TABLE user (! username varchar,! firstname varchar,! lastname varchar,! shopping_carts set<varchar>,! PRIMARY KEY (username));
CREATE TABLE shopping_cart (! username varchar,! cart_name text! item_id int,! item_name varchar,
description varchar,! price float,! item_detail map<varchar,varchar>! PRIMARY KEY ((username,cart_name),item_id));
INSERT INTO shopping_cart (username,cart_name,item_id,item_name,description,price,item_detail)VALUES ('pmcfadin','Gadgets I want',8675309,'Garmin 910XT','Multisport training watch',349.99,{'Related':'Timex sports watch','Volume Discount':'10'});
INSERT INTO shopping_cart (username,cart_name,item_id,item_name,description,price,item_detail)VALUES ('pmcfadin','Gadgets I want',9748575,'Polaris Foot Pod','Bluetooth Smart foot pod',64.00{'Related':'Timex foot pod','Volume Discount':'25'});
One partition (storage row) of data
Item details. Flexible for whatev
Partition row key for one users cart
Creates partition row key
Monday, June 24, 13
#CASSANDRA13
Watching users, making decisions. Freaky, but cool.
User Activity Tracking
Monday, June 24, 13
#CASSANDRA13
User activity use case* React to user input in real time
* Support for multiple application pods
* Scale for speed
* Losing interactions is costly
* Waiting for batch(hadoop) is to long
The bad
Monday, June 24, 13
#CASSANDRA13
User activity data model* Interaction points stored per user in short table
* Long term interaction stored in similar table with date partition
* Process long term later using batch
* Reverse time series to get last N items
Monday, June 24, 13
#CASSANDRA13
User activity data modelCREATE TABLE user_activity (! username varchar,! interaction_time timeuuid,! activity_code varchar,! detail varchar,! PRIMARY KEY (username, interaction_time)) WITH CLUSTERING ORDER BY (interaction_time DESC);
CREATE TABLE user_activity_history (! username varchar,! interaction_date varchar,! interaction_time timeuuid,! activity_code varchar,! detail varchar,! PRIMARY KEY ((username,interaction_date),interaction_time));
INSERT INTO user_activity (username,interaction_time,activity_code,detail)VALUES ('pmcfadin',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login')USING TTL 2592000;
INSERT INTO user_activity_history (username,interaction_date,interaction_time,activity_code,detail)VALUES ('pmcfadin','20130605',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login');
Reverse order based on timestamp
Expire after 30 days
Monday, June 24, 13
#CASSANDRA13
Data model usage
username | interaction_time | detail | activity_code----------+--------------------------------------+------------------------------------------+------------------ pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry | 301 pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202 pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want | 205 pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want | 201 pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login | 100
select * from user_activity limit 5;
Maybe put a sale item for flowers too?
Monday, June 24, 13
#CASSANDRA13
Machines generate logs at a furious pace. Be ready.
Log collection/aggregation
Monday, June 24, 13
#CASSANDRA13
Log collection use case* Collect log data at high speed
* Cassandra near where logs are generated. Multi-datacenter
* Dice data for various uses. Dashboard. Lookup. Etc.
* The scale needed for RDBMS is cost prohibitive
* Batch analysis of logs too late for some use cases
The bad
Monday, June 24, 13
#CASSANDRA13
Log collection data model* Use Flume to collect and fan out data to various tables
* Tables for lookup based on source and time
* Tables for dashboard with aggregation and summation
Monday, June 24, 13
#CASSANDRA13
Log collection data model
CREATE TABLE log_lookup (! source varchar,! date_to_minute varchar,! timestamp timeuuid,! raw_log blob,! PRIMARY KEY ((source,date_to_minute),timestamp));
CREATE TABLE login_success (! source varchar,! date_to_minute varchar, ! successful_logins counter, ! PRIMARY KEY (source,date_to_minute)) WITH CLUSTERING ORDER BY (date_to_minute DESC);
CREATE TABLE login_failure (! source varchar,! date_to_minute varchar, ! failed_logins counter, ! PRIMARY KEY (source,date_to_minute)) WITH CLUSTERING ORDER BY (date_to_minute DESC);
Consider storing raw logs as GZIP
Monday, June 24, 13
#CASSANDRA13
Log dashboard
0
25
50
75
100
10:01 10:03 10:05 10:07 10:09 10:11 10:13 10:15 10:17 10:19
Sucessful LoginsFailed Logins
SELECT date_to_minute,successful_loginsFROM login_successLIMIT 20;
SELECT date_to_minute,failed_loginsFROM login_failureLIMIT 20;
Monday, June 24, 13
#CASSANDRA13
Because mistaks mistakes happen
User Form Versioning
Monday, June 24, 13
#CASSANDRA13
Form versioning use case* Store every possible version efficiently
* Scale to any number of users
* Commit/Rollback functionality on a form
* In RDBMS, many relations that need complicated join
* Needs to be in cloud and local data center
The bad
Monday, June 24, 13
#CASSANDRA13
Form version data model* Each user has a form
* Each form needs versioning
* Separate table to store live version
* Exclusive lock on a form
Monday, June 24, 13
#CASSANDRA13
Form version data model
CREATE TABLE working_version (! username varchar,! form_id int,! version_number int,! locked_by varchar,! form_attributes map<varchar,varchar> ! PRIMARY KEY ((username, form_id), version_number)) WITH CLUSTERING ORDER BY (version_number DESC);
INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)VALUES ('pmcfadin',1138,1,'',{'FirstName<text>':'First Name: ','LastName<text>':'Last Name: ','EmailAddress<text>':'Email Address: ','Newsletter<radio>':'Y,N'});
UPDATE working_version SET locked_by = 'pmcfadin'WHERE username = 'pmcfadin'AND form_id = 1138AND version_number = 1;
INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)VALUES ('pmcfadin',1138,2,null,{'FirstName<text>':'First Name: ','LastName<text>':'Last Name: ','EmailAddress<text>':'Email Address: ','Newsletter<checkbox>':'Y'});
1. Insert first version
2. Lock for one user
3. Insert new version. Release lock
Monday, June 24, 13
#CASSANDRA13
That’s it!
“Mind what you have learned. Save you it can.”
- Yoda. Master Data Modeler
Monday, June 24, 13
#CASSANDRA13
Your data model is next!* Try out a few things
* See what works
* All else fails, engage an expert in the community
* Want more? Follow me on twitter: @PatrickMcFadin
Monday, June 24, 13