Real-Time Big Data in practice with Cassandra
Michaël Figuière @mfiguiere
©2012 DataStax
Speaker
Michaël Figuière
@mfiguiere
Ring Architecture
[Diagram: a Cassandra cluster drawn as a ring of nodes]
Ring Architecture
[Diagram: the same ring, with three of the nodes marked as replicas]
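The slides show the ring only as a picture, so here is a minimal Python sketch of how a row key could be mapped to three replicas on a ring. It assumes a deliberately simplified placement rule (one token per node, MD5 as the hash, replicas taken clockwise from the owner); the names `token`, `Ring` and `replicas` are hypothetical and are not a Cassandra API.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: one token per node, replicas chosen clockwise (an assumption)."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort the (token, node) pairs so the ring can be walked clockwise.
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, row_key):
        """Return the owner of row_key followed by the next nodes clockwise."""
        positions = [t for t, _ in self.tokens]
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(positions, token(row_key)) % len(self.tokens)
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring([f"node{i}" for i in range(6)])
print(ring.replicas("jbellis"))  # three distinct node names
```

Adding a node to such a ring only changes the replica sets of keys near its token, which is the property the ring layout is meant to illustrate.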
Linear Scalability
[Chart: Client Writes/s by Node Count - Replication Factor = 3]
Client / Server Communication
[Diagram: four clients facing a ring of nodes, three of which are replicas for the requested data; the node a client should contact is marked "Node?"]
Client / Server Communication
[Diagram: four clients, one of them connected to a node of the ring, which relays to three replicas]
Coordinator node: forwards all R/W requests to the corresponding replicas
Tunable Consistency
3 replicas
[Diagram: timeline on which all three replicas hold A A A]
Tunable Consistency
Write 'B': write and wait for an acknowledgement from one node
[Diagram: over time the replicas go from A A A to B A A]
Tunable Consistency
R + W < N
Write and wait for an acknowledgement from one node
Read waiting for one node to answer
[Diagram: over time the replicas go from A A A to B A A; the read may reach a replica that still holds A]
Tunable Consistency
R + W = N
Write and wait for acknowledgements from two nodes
Read waiting for one node to answer
[Diagram: over time the replicas go from A A A to B B A; the single-node read may still reach the replica that holds A]
Tunable Consistency
R + W > N
Write and wait for acknowledgements from two nodes
Read waiting for two nodes to answer
[Diagram: over time the replicas go from A A A to B B A; any read of two nodes includes at least one replica that holds B]
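The three cases above (R + W < N, R + W = N, R + W > N) can be checked by brute force. The sketch below, in Python, enumerates every possible set of W acknowledged replicas and every possible set of R answering replicas, and reports whether they always share at least one node. The function name `always_overlap` is hypothetical, and the model ignores timing and repair mechanisms.

```python
from itertools import combinations


def always_overlap(n, r, w):
    """True if every set of w written replicas shares a node with every set of r read replicas."""
    replicas = range(n)
    return all(
        set(written) & set(read)  # an empty intersection is falsy
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


# The (R, W) pairs used on the slides, with N = 3 replicas.
for r, w in [(1, 1), (1, 2), (2, 2)]:
    print(f"N=3 R={r} W={w} overlap guaranteed: {always_overlap(3, r, w)}")
```

Only the last pair, where R + W > N, guarantees that a read touches at least one replica holding the latest acknowledged write.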
Tunable Consistency
R = W = QUORUM
QUORUM = (N / 2) + 1, with N / 2 rounded down
[Diagram: the same timeline, with both the write and the read waiting for a quorum of the three replicas]
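As a small worked example of the quorum formula, the Python sketch below computes the quorum for a few replication factors and checks that reading and writing at QUORUM always satisfies R + W > N. The helper name `quorum` is hypothetical.

```python
def quorum(n: int) -> int:
    """Quorum size for n replicas: (n / 2) + 1 with integer division."""
    return n // 2 + 1


for n in (1, 2, 3, 5, 6):
    q = quorum(n)
    # With R = W = QUORUM, R + W is always strictly greater than N.
    print(f"N={n} QUORUM={q} R+W>N: {q + q > n} replicas that may be down: {n - q}")
```

With N = 3 the quorum is 2, so quorum reads and writes keep working while one replica is unavailable.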
Request Path
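[Diagram: request path between four clients and a ring of nodes]
1. A client sends its request to the coordinator node
2. The coordinator node forwards the request to the three replicas
3. The replicas answer the coordinator node
4. The coordinator node answers the client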
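The request path can be sketched as a toy simulation in Python. Everything here is an assumption made for illustration: the classes `Replica` and `Coordinator` are hypothetical, writes to a replica that is down are simply lost, and a read takes the first `r` live replicas in a fixed order and returns the value with the highest timestamp.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        if not self.up:
            return False  # a down replica sends no acknowledgement
        self.data[key] = (timestamp, value)
        return True

    def read(self, key):
        return self.data.get(key, (0, None))


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value, timestamp, w):
        # Steps 2 and 3: forward to every replica and count acknowledgements.
        acks = sum(rep.write(key, value, timestamp) for rep in self.replicas)
        if acks < w:
            raise RuntimeError(f"only {acks} acknowledgement(s), {w} required")
        return acks  # step 4: answer the client

    def read(self, key, r):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < r:
            raise RuntimeError(f"only {len(live)} replica(s) answered, {r} required")
        # Wait for r answers only, then keep the most recent value.
        answers = [rep.read(key) for rep in live[:r]]
        return max(answers, key=lambda answer: answer[0])[1]


a, b, c = Replica("a"), Replica("b"), Replica("c")
coordinator = Coordinator([a, b, c])
coordinator.write("k", "A", timestamp=1, w=3)
a.up = False                                    # replica a misses the next write
coordinator.write("k", "B", timestamp=2, w=2)
a.up = True
print(coordinator.read("k", r=1))  # stale value read from replica a
print(coordinator.read("k", r=2))  # latest value, since R + W > N
```

The scenario mirrors the tunable consistency slides: a write acknowledged by two replicas is visible to a two-node read, but not necessarily to a one-node read.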
Column Family Data Model
Row Key    Columns
jbellis    name: Jonathan | address: 123 main | state: TX
dhutch     name: Daria | address: 45 2nd st | state: CA
egilmore   name: Eric | email: [missing]
Column Family Data Model
Row Key    Columns
jbellis    dhutch | egilmore | datastax | mzcassie
dhutch     egilmore
egilmore   datastax | mzcassie
CQL3 Data Model
Timeline Table
user_id    tweet_id  author       body
gmason     1765      phenry       Give me liberty or give me death
gmason     1742      gwashington  I chopped down the cherry tree
ahamilton  1797      jadams       A government of laws, not men
ahamilton  1742      gwashington  I chopped down the cherry tree
Partition Key: user_id
Remaining Key: tweet_id
CQL3 Data Model
Timeline Table (CQL)
CREATE TABLE timeline (
  user_id varchar,
  tweet_id uuid,
  author varchar,
  body varchar,
  PRIMARY KEY (user_id, tweet_id)
);
CQL3 Data Model
Timeline Physical Layout
gmason
  [1742, author]  gwashington
  [1742, body]    I chopped down the...
  [1765, author]  phenry
  [1765, body]    Give me liberty or give...
ahamilton
  [1742, author]  gwashington
  [1742, body]    I chopped down the...
  [1797, author]  jadams
  [1797, body]    A government of laws...
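The step from the Timeline Table to its physical layout can be sketched in Python: each logical row is folded into its partition, and every non-key column becomes a cell named by the pair [tweet_id, column name]. The function `physical_layout` and its parameter names are hypothetical; the data is the one shown on the slides.

```python
from collections import defaultdict

rows = [
    {"user_id": "gmason", "tweet_id": 1765, "author": "phenry",
     "body": "Give me liberty or give me death"},
    {"user_id": "gmason", "tweet_id": 1742, "author": "gwashington",
     "body": "I chopped down the cherry tree"},
    {"user_id": "ahamilton", "tweet_id": 1797, "author": "jadams",
     "body": "A government of laws, not men"},
    {"user_id": "ahamilton", "tweet_id": 1742, "author": "gwashington",
     "body": "I chopped down the cherry tree"},
]


def physical_layout(rows, partition_key, remaining_key):
    """Group rows by partition key; cells are keyed by (remaining key, column name)."""
    partitions = defaultdict(dict)
    for row in rows:
        for column, value in row.items():
            if column in (partition_key, remaining_key):
                continue  # key columns are encoded in the row key and cell names
            partitions[row[partition_key]][(row[remaining_key], column)] = value
    # Cells inside a partition are kept sorted by their composite name.
    return {pk: dict(sorted(cells.items())) for pk, cells in partitions.items()}


layout = physical_layout(rows, "user_id", "tweet_id")
for row_key, cells in layout.items():
    print(row_key)
    for name, value in cells.items():
        print("  ", list(name), value)
```

Because the cells of one user sit in a single sorted partition, reading a user's timeline is a lookup of one row key followed by a sequential scan.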
Denormalized Data Model
Data duplicated over several tables
Real-Time Analytics
Google Analytics gives you immediate statistics about your website traffic
Web Analytics Data Model
Analytics Table
url             time   views  from_search  direct  from_referrer
/index.html     12:00  354    300          20      34
/index.html     12:01  402    333          25      44
/contacts.html  12:00  23     3            0       20
/contacts.html  12:01  20     4            1       15

CQL
CREATE TABLE analytics (
  url varchar,
  time timestamp,
  views counter,
  from_search counter,
  direct counter,
  from_referrer counter,
  PRIMARY KEY (url, time)
);
Web Analytics Data Model
Analytics Table (CQL)
UPDATE analytics
SET views = views + 1, from_search = from_search + 1
WHERE url = '/index.html'
AND time = '2012-10-06 12:00';
Web Analytics Data Model
Analytics Table (CQL)
SELECT * FROM analytics
WHERE url = '/index.html';
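To make the semantics of the UPDATE and SELECT statements concrete, here is an in-memory Python model of the analytics table. It is only a sketch: the class `AnalyticsTable` and its methods are hypothetical, it runs in a single process, and it says nothing about how Cassandra implements distributed counters.

```python
from collections import Counter, defaultdict


class AnalyticsTable:
    """Toy model: url (partition key) -> time (remaining key) -> counter columns."""

    def __init__(self):
        self.partitions = defaultdict(lambda: defaultdict(Counter))

    def update(self, url, time, **increments):
        # Like "SET views = views + 1": counters are incremented, never overwritten.
        self.partitions[url][time].update(increments)

    def select(self, url):
        # Like "SELECT * ... WHERE url = ...": one partition, rows ordered by time.
        partition = self.partitions.get(url, {})
        return [{"url": url, "time": t, **dict(c)} for t, c in sorted(partition.items())]


table = AnalyticsTable()
table.update("/index.html", "2012-10-06 12:00", views=1, from_search=1)
table.update("/index.html", "2012-10-06 12:00", views=1, direct=1)
table.update("/index.html", "2012-10-06 12:01", views=1, from_referrer=1)

for row in table.select("/index.html"):
    print(row)
```

Each page view turns into one increment, and the statistics for a URL are read back from a single partition, which is what keeps the figures available in real time.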
Online Business Intelligence
[Diagram: Application, Cassandra, Hadoop]
Storage for application in production
Using results in production
Distributed batch processing
Storage for results
Stay Tuned!
@mfiguiere
blog.datastax.com