The data model is dead, long live the data model!!
Patrick McFadin
Senior Solutions Architect, DataStax
Thursday, May 2, 2013


Page 3: The data model is dead, long live the data model

Bridging the divide

The era of relational everything is over

The era of Polyglot Persistence* has begun

* http://www.martinfowler.com/bliki/PolyglotPersistence.html


Page 4: The data model is dead, long live the data model

Coming from a relational world

Tradeoffs are hard

Feature (RDBMS vs. Cassandra):
• Single Point of Failure
• Cross Datacenter
• Linear Scaling
• Data modeling

Page 5: The data model is dead, long live the data model

Background - The data model

• The data model is alive and well
• Models define the business requirements
• Models define the structure of your data
• Relational is just one type (Network model anyone?)

Wait? I thought NoSQL meant no model?

Page 6: The data model is dead, long live the data model

Background - ACID vs CAP

ACID
Atomic - All or none
Consistency - Only valid data is written
Isolation - One operation at a time
Durability - Once committed, it stays that way

CAP - Pick two
Consistency - All nodes in the cluster see the same data
Availability - Cluster always accepts writes
Partition tolerance - Cluster keeps working when nodes can’t talk to each other


Page 8: The data model is dead, long live the data model

Background - ACID vs CAP

CAP - Pick two

Cassandra lets you tune this
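One way to picture the tuning: each read and write chooses how many replicas must answer. The sketch below is an illustration, not Cassandra code. The function names are hypothetical; the rule it encodes is the usual one that reads see the latest write when read replicas plus write replicas exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas (what a QUORUM level asks for)."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set overlaps every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor for the example

# QUORUM writes + QUORUM reads: the replica sets always overlap.
print(is_strongly_consistent(RF, quorum(RF), quorum(RF)))

# ONE write + ONE read: faster, but a read may miss the latest write.
print(is_strongly_consistent(RF, 1, 1))
```

Lower numbers favor availability and latency; higher numbers favor consistency.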

Page 9: The data model is dead, long live the data model

Relational Background - Normal forms

• This IS the relational model
• 5 normal forms
• Need foreign keys
• Need joins

Employees
id  First    Last
1   Edgar    Codd
2   Raymond  Boyce

Department
id  Dept
1   Engineering
2   Math

Page 10: The data model is dead, long live the data model

Background - How Cassandra Stores Data

• Model brought from Bigtable*
• Row Key and a lot of columns
• Column names sorted (UTF8, Int, Timestamp, etc)

[Diagram: one Row Key followed by columns numbered 1 to 2 billion; each column holds a Column Name, Column Value, Timestamp, and TTL]

* http://research.google.com/archive/bigtable.html
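A rough mental model of that layout, sketched in Python: a row is a key plus columns kept sorted by name, and each column carries a value, a timestamp, and an optional TTL. The `Column` and `Row` classes are hypothetical names for illustration only, and keeping the newest timestamp on conflict is the usual last-write-wins behavior rather than something shown on the slide.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Column:
    name: str
    value: Any
    timestamp: int              # write time of this value
    ttl: Optional[int] = None   # seconds to live; None means no expiry


@dataclass
class Row:
    key: str
    names: list = field(default_factory=list)    # column names, kept sorted
    columns: dict = field(default_factory=dict)  # name -> Column

    def put(self, column: Column) -> None:
        existing = self.columns.get(column.name)
        if existing is None:
            bisect.insort(self.names, column.name)  # keep names in sorted order
        elif existing.timestamp > column.timestamp:
            return  # an older write never replaces a newer one
        self.columns[column.name] = column

    def slice(self, start: str, end: str) -> list:
        """Columns whose names fall in [start, end], in sorted order."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, end)
        return [self.columns[n] for n in self.names[lo:hi]]


row = Row(key="tcodd")
row.put(Column("lastname", "Codd", timestamp=1))
row.put(Column("firstname", "Edgar", timestamp=1))
row.put(Column("created_date", "2013-05-02", timestamp=1))
row.put(Column("firstname", "Ted", timestamp=0))  # older write, ignored

print([c.name for c in row.slice("a", "g")])
```

Because names are sorted, a range of columns can be read as one contiguous slice.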

Page 11: The data model is dead, long live the data model

Background - How Cassandra Stores Data

• Rows belong to a node and are replicated
• Row lookups are fast
• Randomly distributed in cluster

[Diagram: RowKey1 through RowKey12 scattered across the nodes of the cluster; a lookup of RowKey5 goes to the node holding it]
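The sketch below shows why lookups are fast and placement looks random: hashing the row key decides which node owns the row, so any client can compute the owner without searching. It is a simplification under stated assumptions. The node names and replication factor are made up, and real Cassandra assigns token ranges to nodes instead of taking a modulo.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
REPLICATION_FACTOR = 3                        # assumed for the example


def token(row_key: str) -> int:
    """Hash the row key into a large integer token."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def replicas(row_key: str, nodes=NODES, rf=REPLICATION_FACTOR) -> list:
    """Owner node plus the next nodes in order, rf nodes in total."""
    first = token(row_key) % len(nodes)
    count = min(rf, len(nodes))
    return [nodes[(first + i) % len(nodes)] for i in range(count)]


for key in ("RowKey1", "RowKey5", "RowKey12"):
    print(key, "->", replicas(key))
```

The same key always maps to the same nodes, which is what makes a single-row lookup a direct hop.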

Page 12: The data model is dead, long live the data model

Relational Concept - Sequences

• Handy feature for auto-creation of Ids
• Guaranteed unique
• Depends on a single source of truth (one server)

INSERT INTO user (id, firstName, LastName)
VALUES (seq.nextVal(), 'Ted', 'Codd')

Page 13: The data model is dead, long live the data model

Cassandra Concept - No sequences

• Difficult in a distributed system
• Requires a lock (perf killer)
• What to do?
  - Use part of the data to create a unique index, or...
  - UUID to the rescue!

Page 14: The data model is dead, long live the data model

Concept - UUID

• Universally Unique ID
• 128 bit number represented in character form
• Easily generated on the client
• Same as GUID for the MS folks

99051fe9-6a9c-46c2-b949-38ef78858dd0

RFC 4122 if you want a reference
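Generating one on the client takes a single call in most languages. A minimal Python sketch using the standard `uuid` module; the helper name `new_video_id` is hypothetical, and a random (version 4) UUID is assumed because the example above is one.

```python
import uuid


def new_video_id() -> uuid.UUID:
    """Create a random (version 4) UUID on the client, no server round trip."""
    return uuid.uuid4()


video_id = new_video_id()
print(video_id)          # 36 characters: 32 hex digits plus 4 hyphens
print(video_id.version)  # 4

# The example from the slide parses as a version 4 UUID as well.
slide_example = uuid.UUID("99051fe9-6a9c-46c2-b949-38ef78858dd0")
print(slide_example.version)
```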

Page 15: The data model is dead, long live the data model

Cassandra Concept - Entity model

• User table (!!)
• Username is the unique key
• Static but can be changed dynamically without downtime

CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email varchar,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);

ALTER TABLE users ADD city text;

Page 16: The data model is dead, long live the data model

Relational Concept - De-normalization

• To combine relations into a single row
• Used in relational modeling to avoid complex joins

Employees
id  First    Last
1   Edgar    Codd
2   Raymond  Boyce

Department
id  Dept
1   Engineering
2   Math

SELECT e.First, e.Last, d.Dept
FROM Department d, Employees e
WHERE 1 = e.id
AND e.id = d.id

Take this and then...

Page 17: The data model is dead, long live the data model

Relational Concept - De-normalization

• Combine table columns into a single view
• No joins
• It’s all in how you lay out the data for fast reads

SELECT First, Last, Dept
FROM employees
WHERE id = '1'

Employees
id  First    Last   Dept
1   Edgar    Codd   Engineering
2   Raymond  Boyce  Math
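To see the two shapes side by side, here is a small sketch using Python's built-in `sqlite3` with the sample rows from the slides. The table name `employees_denorm` is a hypothetical name chosen so both versions can live in one database; the point is only that the join is paid once at write time and the read becomes a single-table lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (id INTEGER PRIMARY KEY, First TEXT, Last TEXT);
    CREATE TABLE Department (id INTEGER PRIMARY KEY, Dept TEXT);
    INSERT INTO Employees VALUES (1, 'Edgar', 'Codd'), (2, 'Raymond', 'Boyce');
    INSERT INTO Department VALUES (1, 'Engineering'), (2, 'Math');

    -- Denormalized copy: the join is done once, when the data is written.
    CREATE TABLE employees_denorm AS
        SELECT e.id, e.First, e.Last, d.Dept
        FROM Employees e JOIN Department d ON e.id = d.id;
""")

# Normalized read: needs the join every time.
joined = conn.execute(
    "SELECT e.First, e.Last, d.Dept FROM Department d, Employees e "
    "WHERE e.id = 1 AND e.id = d.id"
).fetchall()

# Denormalized read: one table, no join.
flat = conn.execute(
    "SELECT First, Last, Dept FROM employees_denorm WHERE id = 1"
).fetchall()

print(joined)
print(flat)
```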

Page 18: The data model is dead, long live the data model

Cassandra Concept - One-to-Many

• Relationship without being relational
• Users have many videos
• Wait? Where is the foreign key?

Users
username  firstname  lastname  email
tcodd     Edgar      Codd      [email protected]
rboyce    Raymond    Boyce     [email protected]

Videos
videoid   videoname     username  description             tags
99051fe9  My funny cat  tcodd     My cat plays the piano  cats,piano,lol
b3a76c6b  Math          tcodd     Now my dog plays        dogs,piano,lol

Page 19: The data model is dead, long live the data model

Cassandra Concept - One-to-many

• Static table to store videos
• UUID for unique video id
• Add username to denormalize

CREATE TABLE videos (
  videoid uuid,
  videoname varchar,
  username varchar,
  description varchar,
  tags varchar,
  upload_date timestamp,
  PRIMARY KEY (videoid)
);

Page 20: The data model is dead, long live the data model

Cassandra Concept - One-to-Many

• Lookup video by username
• Write in two tables at once for fast lookups

CREATE TABLE username_video_index (
  username varchar,
  videoid uuid,
  upload_date timestamp,
  video_name varchar,
  PRIMARY KEY (username, videoid)
);

SELECT video_name
FROM username_video_index
WHERE username = 'tcodd'
AND videoid = '99051fe9'

Creates a wide row!
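The "write in two tables at once" idea, sketched with plain Python dictionaries standing in for the two tables. This is an in-memory illustration, not driver code; `add_video` and `videos_for` are hypothetical helper names. Each username maps to one wide row whose entries are keyed by videoid.

```python
import uuid
from datetime import datetime, timezone

videos = {}                # videoid -> full video row
username_video_index = {}  # username -> {videoid -> index entry} (the wide row)


def add_video(username, videoname, description, tags):
    """Write the video and its per-user index entry in the same step."""
    videoid = uuid.uuid4()
    upload_date = datetime.now(timezone.utc)
    videos[videoid] = {
        "videoname": videoname,
        "username": username,
        "description": description,
        "tags": tags,
        "upload_date": upload_date,
    }
    wide_row = username_video_index.setdefault(username, {})
    wide_row[videoid] = {"upload_date": upload_date, "video_name": videoname}
    return videoid


def videos_for(username):
    """Read one wide row: no join, no scan over the videos table."""
    return [entry["video_name"]
            for entry in username_video_index.get(username, {}).values()]


add_video("tcodd", "My funny cat", "My cat plays the piano", "cats,piano,lol")
add_video("tcodd", "Math", "Now my dog plays", "dogs,piano,lol")
print(videos_for("tcodd"))
```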

Page 21: The data model is dead, long live the data model

Cassandra concept - Many-to-many

• Users and videos have many comments

Users
username  firstname  lastname  email
tcodd     Edgar      Codd      [email protected]
rboyce    Raymond    Boyce     [email protected]

Videos
videoid   videoname     username  description             tags
99051fe9  My funny cat  tcodd     My cat plays the piano  cats,piano,lol
b3a76c6b  Math          tcodd     Now my dog plays        dogs,piano,lol

Comments
username  videoid   comment
tcodd     99051fe9  Sweet!
rboyce    b3a76c6b  Boring :(

Page 22: The data model is dead, long live the data model

Cassandra concept - Many-to-many

• Model both sides of the view
• Insert both when comment is created
• View from either side

CREATE TABLE comments_by_video (
  videoid uuid,
  username varchar,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (videoid, username)
);

CREATE TABLE comments_by_user (
  username varchar,
  videoid uuid,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (username, videoid)
);

Page 23: The data model is dead, long live the data model

Cassandra concept - Many-to-many


Don’t be afraid of writes. Bring it!
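A sketch of "insert both when comment is created", again with dictionaries standing in for the two tables. The helper `add_comment` is a hypothetical name. The nesting mirrors the two primary keys, so a second comment by the same user on the same video replaces the first, just as the key definitions imply.

```python
import uuid
from datetime import datetime, timezone

comments_by_video = {}  # videoid -> {username -> comment row}
comments_by_user = {}   # username -> {videoid -> comment row}


def add_comment(videoid, username, comment):
    """Two writes per comment, one for each side of the relationship."""
    row = {"comment_ts": datetime.now(timezone.utc), "comment": comment}
    comments_by_video.setdefault(videoid, {})[username] = row
    comments_by_user.setdefault(username, {})[videoid] = row


cat_video = uuid.UUID("99051fe9-6a9c-46c2-b949-38ef78858dd0")
add_comment(cat_video, "tcodd", "Sweet!")
add_comment(cat_video, "rboyce", "Boring :(")

# View from the video side.
print({user: row["comment"] for user, row in comments_by_video[cat_video].items()})

# View from the user side.
print({str(vid): row["comment"] for vid, row in comments_by_user["tcodd"].items()})
```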


Page 24: The data model is dead, long live the data model

Relational Concept - Transactions

• Built in and easy to use
• Can be slow and heavy so don’t use them all the time
• Normal forms force ACID writes into many tables

lock
  - change table one
  - change table two
  - change table three
commit

-or-

lock
  - change table one
  - change table two
  - change table three
rollback

Page 25: The data model is dead, long live the data model

Crazy Concept - Do you need a transaction?

• Since they were easy in RDBMS, was it just default?
• Read this article: http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
• In a nutshell, an asynchronous transaction:
  - Cashier takes your money
  - Barista makes your coffee
  - Error? Barista deals with it

Page 26: The data model is dead, long live the data model

Cassandra Concept - Transaction quality

• A transaction requires a lock, which is costly in distributed systems
• Cassandra features can be used to advantage
  - Row level isolation
  - Atomic batches

Page 27: The data model is dead, long live the data model

Cassandra Concept - Transaction

• Track that something happened
• Use time stamps to preserve order
• Rectify when in any doubt (just like banks do)

Create this table:

CREATE TABLE credit_transaction (
  username varchar,
  type varchar,
  datetime timestamp,
  credits int,
  PRIMARY KEY (username, datetime, type)
) WITH CLUSTERING ORDER BY (datetime DESC, type ASC);

Sort the columns in reverse order so last action is first on the list

Page 28: The data model is dead, long live the data model

Cassandra Concept - Transaction

• All transactions are stored
• Think RPN calculator, latest first

Row tcodd:
ADD:2013-04-25 21:10:32.745     $20
REMOVE:2013-04-25 15:45:22.813  $5
ADD:2013-04-25 07:15:12.542     $100

Rectify account:
  + $100
  -   $5
  +  $20
---------
= $115 Current balance
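The rectify step above is just a replay of the stored actions. A minimal Python sketch using the three sample rows; the `rectify` function name and the tuple layout are assumptions made for the example.

```python
from datetime import datetime

# (type, datetime, credits), stored latest first as in the row above.
transactions = [
    ("ADD", datetime(2013, 4, 25, 21, 10, 32, 745000), 20),
    ("REMOVE", datetime(2013, 4, 25, 15, 45, 22, 813000), 5),
    ("ADD", datetime(2013, 4, 25, 7, 15, 12, 542000), 100),
]


def rectify(rows):
    """Replay every stored action, oldest first, to rebuild the balance."""
    balance = 0
    for kind, _when, credits in sorted(rows, key=lambda row: row[1]):
        if kind == "ADD":
            balance += credits
        elif kind == "REMOVE":
            balance -= credits
        else:
            raise ValueError(f"unknown transaction type: {kind}")
    return balance


print(rectify(transactions))  # 115
```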

Page 29: The data model is dead, long live the data model

Cassandra Concept - Transaction

[Flowchart with two parallel paths, Add Credit and Remove credit]

Add Credit:
1. Create credit_transaction record with ADD + Timestamp
2. Read user record total_credits and credit_timestamp
3. Check: user credit_timestamp < credit_transaction timestamp?
4. Set back in user record credit_timestamp and incremented total_credits

Remove credit:
1. Create credit_transaction record with REMOVE + Timestamp
2. Read user record total_credits and credit_timestamp
3. Check: user credit_timestamp < credit_transaction timestamp?
4. Set back in user record credit_timestamp and decremented total_credits

The check leads either to step 4 and Success, or to Fail transaction and rectify.
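A rough sketch of that flow in Python. Two things here are assumptions rather than something the flowchart text states: the update is applied when the user's credit_timestamp is older than the new transaction's timestamp, and "fail and rectify" is modeled as replaying every stored transaction. The names `change_credit` and `rectify`, and the 10:00 late-arriving timestamp, are hypothetical.

```python
from datetime import datetime

user = {"total_credits": 0, "credit_timestamp": datetime.min}
credit_transaction = []  # every (type, timestamp, credits) ever recorded


def rectify() -> None:
    """Rebuild the user record by replaying all stored transactions."""
    total = 0
    for kind, _when, credits in credit_transaction:
        total += credits if kind == "ADD" else -credits
    user["total_credits"] = total
    user["credit_timestamp"] = max(when for _kind, when, _c in credit_transaction)


def change_credit(kind: str, when: datetime, credits: int) -> str:
    if kind not in ("ADD", "REMOVE"):
        raise ValueError(f"unknown transaction type: {kind}")
    credit_transaction.append((kind, when, credits))   # step 1: record it
    total = user["total_credits"]                      # step 2: read user record
    last_seen = user["credit_timestamp"]
    if last_seen < when:                               # step 3: the check (assumed branch)
        delta = credits if kind == "ADD" else -credits
        user["total_credits"] = total + delta          # step 4: set back
        user["credit_timestamp"] = when
        return "success"
    rectify()                                          # out of order: rectify
    return "rectified"


results = [
    change_credit("ADD", datetime(2013, 4, 25, 7, 15, 12), 100),
    change_credit("REMOVE", datetime(2013, 4, 25, 15, 45, 22), 5),
    change_credit("ADD", datetime(2013, 4, 25, 10, 0, 0), 20),  # arrives late
]
print(results, user["total_credits"])
```

Because every action is stored before the balance is touched, the balance can always be rebuilt from the transaction rows.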

Page 30: The data model is dead, long live the data model

And if that doesn’t work...

• Lightweight transactions coming soon.
• Cassandra 2.0
• See CASSANDRA-5062

Page 31: The data model is dead, long live the data model

But wait there is more!!

• The next in this series: May 16th
  Become a super modeler
• Final will be at the Cassandra Summit: June 11th
  The world’s next top data model

Page 32: The data model is dead, long live the data model

Be there!!!


Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it.


Page 33: The data model is dead, long live the data model

Thank You

Q&A
