Real-Time Big Data in practice with Cassandra
Michaël Figuière, @mfiguiere
©2012 DataStax
Speaker
Michaël Figuière
@mfiguiere
Ring Architecture
[Diagram: a Cassandra cluster whose nodes are arranged in a ring]
Ring Architecture
[Diagram: a ring of nodes, three of which hold replicas of the same data]
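The ring-and-replica picture can be sketched in a few lines of Java. This is a simplified illustration, not Cassandra's actual placement code: the `TokenRing` class, its integer tokens, and the "walk clockwise to the next nodes" rule are assumptions made for the example, loosely modelled on simple ring placement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical, simplified token ring: each node owns one integer token. */
final class TokenRing {
    // Sorted map from token to node name, so the ring can be walked in order.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String name) {
        ring.put(token, name);
    }

    /** Returns the nodes holding replicas for a key whose hash is keyToken. */
    List<String> replicasFor(int keyToken, int replicationFactor) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        int count = Math.min(replicationFactor, ring.size());
        List<String> replicas = new ArrayList<>(count);
        // First replica: the first node whose token is >= the key's token,
        // wrapping around to the start of the ring if there is none.
        Integer token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey();
        }
        // Remaining replicas: the next nodes clockwise on the ring.
        while (replicas.size() < count) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode(0, "A");
        ring.addNode(25, "B");
        ring.addNode(50, "C");
        ring.addNode(75, "D");
        System.out.println(ring.replicasFor(30, 3));
    }
}
```

With a replication factor of 3, every key lands on three consecutive nodes of the ring, which is what the diagram shows.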
Linear Scalability
[Chart: client writes/s by node count, replication factor = 3]
Client / Server Communication
[Diagram: four clients and a ring of nodes, three of which are replicas; one node is labelled "Node?"]
Client / Server Communication
[Diagram: four clients connected to one node of a ring that contains three replicas]
Coordinator node: forwards all R/W requests to the corresponding replicas
Tunable Consistency
3 replicas
[Diagram: timeline of replica state. Initial state: A A A]
Tunable Consistency
Write 'B' and wait for an acknowledgement from one node
[Diagram: timeline of replica state. A A A, then B A A after the write]
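Tunable Consistency
R + W < N
Write and wait for an acknowledgement from one node
Read waiting for one node to answer
[Diagram: timeline of replica state. A A A, then B A A after the write; the state is still B A A at read time, so the read may be answered by a replica that holds the old value A]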
Tunable Consistency
R + W = N
Write and wait for acknowledgements from two nodes
Read waiting for one node to answer
[Diagram: timeline of replica state. A A A, then B B A once two replicas have acknowledged the write]
Tunable Consistency
R + W > N
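Write and wait for acknowledgements from two nodes
Read waiting for two nodes to answer
[Diagram: timeline of replica state during the write of 'B' and the following read; because R + W > N, at least one replica that answers the read has also acknowledged the write]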
Tunable Consistency
R = W = QUORUM
[Diagram: timeline of replica state with quorum writes and quorum reads]
QUORUM = (N / 2) + 1, using integer division (for N = 3, QUORUM = 2)
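The arithmetic behind these slides is small enough to check in code. The sketch below is only an illustration; the class and method names (`ConsistencyMath`, `quorum`, `readOverlapsWrite`) are invented for this example and are not part of any Cassandra API.

```java
/** Illustrative helpers for the R + W versus N rule from the slides. */
final class ConsistencyMath {
    private ConsistencyMath() {
    }

    /** QUORUM = (N / 2) + 1, where N is the replication factor (integer division). */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /**
     * True when any R replicas read must include at least one of the
     * W replicas that acknowledged the write, i.e. when R + W > N.
     */
    static boolean readOverlapsWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("N = " + n + ", QUORUM = " + q);
        System.out.println("R = 1, W = 1 overlaps: " + readOverlapsWrite(1, 1, n));
        System.out.println("R = 1, W = 2 overlaps: " + readOverlapsWrite(1, 2, n));
        System.out.println("R = W = QUORUM overlaps: " + readOverlapsWrite(q, q, n));
    }
}
```

Reading and writing at QUORUM always satisfies R + W > N, since two majorities of the same N replicas must share at least one replica.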
Request Path
[Diagram: request path. (1) A client sends a request to the coordinator node. (2) The coordinator forwards it to the three replicas. (3) The replicas answer the coordinator. (4) The coordinator answers the client.]
Column Family Data Model
Row Key     Columns
jbellis     name: Jonathan    address: 123 main     state: TX
dhutch      name: Daria       address: 45 2nd st    state: CA
egilmore    name: Eric        email
Column Family Data Model
Row Key     Columns
jbellis     dhutch, egilmore, datastax, mzcassie
dhutch      egilmore
egilmore    datastax, mzcassie
CQL3 Data Model
Timeline Table
user_id (Partition Key) | tweet_id (Remaining Key) | author      | body
gmason                  | 1765                     | phenry      | Give me liberty or give me death
gmason                  | 1742                     | gwashington | I chopped down the cherry tree
ahamilton               | 1797                     | jadams      | A government of laws, not men
ahamilton               | 1742                     | gwashington | I chopped down the cherry tree
CQL3 Data Model
CQL:
    CREATE TABLE timeline (
        user_id varchar,
        tweet_id uuid,
        author varchar,
        body varchar,
        PRIMARY KEY (user_id, tweet_id)
    );
CQL3 Data Model
Timeline Physical Layout
Row key gmason:
    [1742, author]: gwashington
    [1742, body]:   I chopped down the...
    [1765, author]: phenry
    [1765, body]:   Give me liberty or give...
Row key ahamilton:
    [1742, author]: gwashington
    [1742, body]:   I chopped down the...
    [1797, author]: jadams
    [1797, body]:   A government of laws...
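To make the mapping from table rows to the physical layout concrete, here is a small in-memory sketch. It is not how Cassandra stores data internally; the `TimelineLayout` class and its nested sorted maps are assumptions chosen to mimic the slide: one physical row per partition key, with cells named by [tweet_id, column name] and kept in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the timeline table's physical layout. */
final class TimelineLayout {
    // partition key (user_id) -> remaining key (tweet_id) -> column name -> value
    private final Map<String, TreeMap<Integer, TreeMap<String, String>>> partitions =
            new TreeMap<>();

    /** Stores one CQL row as two cells inside the partition of userId. */
    void insert(String userId, int tweetId, String author, String body) {
        TreeMap<String, String> cells = partitions
                .computeIfAbsent(userId, k -> new TreeMap<>())
                .computeIfAbsent(tweetId, k -> new TreeMap<>());
        cells.put("author", author);
        cells.put("body", body);
    }

    /** Flattens one partition into "[tweet_id, column]: value" cells, in sorted order. */
    List<String> physicalRow(String userId) {
        List<String> cells = new ArrayList<>();
        TreeMap<Integer, TreeMap<String, String>> rows = partitions.get(userId);
        if (rows == null) {
            return cells;
        }
        rows.forEach((tweetId, columns) ->
                columns.forEach((name, value) ->
                        cells.add("[" + tweetId + ", " + name + "]: " + value)));
        return cells;
    }

    public static void main(String[] args) {
        TimelineLayout layout = new TimelineLayout();
        layout.insert("gmason", 1765, "phenry", "Give me liberty or give me death");
        layout.insert("gmason", 1742, "gwashington", "I chopped down the cherry tree");
        layout.physicalRow("gmason").forEach(System.out::println);
    }
}
```

All tweets of one user sit next to each other in a single sorted row, so reading a user's timeline is one sequential read of one partition.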
Real-Time Analytics
Google Analytics gives you immediate statistics about your website traffic.
Web Analytics Data Model
Analytics Table
url            | time  | views | from_search | direct | from_referrer
/index.html    | 12:00 | 354   | 300         | 20     | 34
/index.html    | 12:01 | 402   | 333         | 25     | 44
/contacts.html | 12:00 | 23    | 3           | 0      | 20
/contacts.html | 12:01 | 20    | 4           | 1      | 15
CQL:
    CREATE TABLE analytics (
        url varchar,
        time timestamp,
        views counter,
        from_search counter,
        direct counter,
        from_referrer counter,
        PRIMARY KEY (url, time)
    );
Web Analytics Data Model
CQL:
    UPDATE analytics
    SET views = views + 1, from_search = from_search + 1
    WHERE url = '/index.html'
    AND time = '2012-10-06 12:00';
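An application would issue one such UPDATE per page hit, with the hit time rounded down to the minute so that all hits in the same minute increment the same row. The sketch below builds that statement as a string. The `AnalyticsUpdates` class, the `Source` enum, and the use of UTC are assumptions for illustration; production code would more likely bind values to a prepared statement than concatenate strings.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that builds the counter UPDATE for one page hit. */
final class AnalyticsUpdates {
    /** Traffic sources, each mapped to its counter column in the analytics table. */
    enum Source {
        SEARCH("from_search"), DIRECT("direct"), REFERRER("from_referrer");

        final String column;

        Source(String column) {
            this.column = column;
        }
    }

    // Minute precision: seconds are dropped, so one row per (url, minute).
    // UTC is an assumption made for this sketch.
    private static final DateTimeFormatter MINUTE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm").withZone(ZoneOffset.UTC);

    private AnalyticsUpdates() {
    }

    static String updateFor(String url, Instant hit, Source source) {
        String bucket = MINUTE.format(hit);
        // Double any single quote so the URL stays a valid CQL string literal.
        String safeUrl = url.replace("'", "''");
        return "UPDATE analytics SET views = views + 1, "
                + source.column + " = " + source.column + " + 1"
                + " WHERE url = '" + safeUrl + "'"
                + " AND time = '" + bucket + "';";
    }

    public static void main(String[] args) {
        Instant hit = Instant.parse("2012-10-06T12:00:42Z");
        System.out.println(updateFor("/index.html", hit, Source.SEARCH));
    }
}
```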
Web Analytics Data Model
CQL:
    SELECT * FROM analytics
    WHERE url = '/index.html';
Connect and Write
    // Connect to the cluster through two contact points
    Cluster cluster = Cluster.builder()
        .addContactPoints("127.0.0.1", "127.0.0.2")
        .build();
    Session session = cluster.connect();

    // Write a row with a CQL INSERT
    session.execute(
        "INSERT INTO user (user_id, name, email) " +
        "VALUES (12345, 'johndoe', '[email protected]')");
Read
    // Run a query and fetch all rows of the result
    ResultSet rs = session.execute("SELECT * FROM user");
    List<CQLRow> rows = rs.fetchAll();

    // Read each column by name
    for (CQLRow row : rows) {
        String userId = row.getString("user_id");
        String name = row.getString("name");
        String email = row.getString("email");
    }
Object Mapping
    public enum Gender {
        @EnumValue("m") MALE,
        @EnumValue("f") FEMALE;
    }

    @Table("user_and_messages")
    public class User {
        @Column("user_id")
        private String userId;
        private String name;
        private String email;
        private Gender gender;
    }
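How an annotation-driven mapper of this kind could work can be sketched with plain Java reflection. This is not the driver's mapping implementation: the `MappingSketch` class, its own `@Column` annotation, and the `map` method are hypothetical and defined here only for the example. The row is modelled as a `Map` from column name to value.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reflection-based mapper; not the driver's API. */
final class MappingSketch {
    /** Overrides the column name used for a field (defined for this sketch only). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String value();
    }

    static class User {
        @Column("user_id")
        String userId;
        String name;
        String email;
    }

    private MappingSketch() {
    }

    /** Copies row values into a new instance, matching columns to fields by name. */
    static <T> T map(Map<String, Object> row, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isSynthetic()) {
                    continue;
                }
                // Use the annotation value if present, else the field name.
                Column column = field.getAnnotation(Column.class);
                String name = column != null ? column.value() : field.getName();
                if (row.containsKey(name)) {
                    field.setAccessible(true);
                    field.set(target, row.get(name));
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("user_id", "12345");
        row.put("name", "johndoe");
        User user = map(row, User.class);
        System.out.println(user.userId + " " + user.name);
    }
}
```

Fields without an annotation fall back to their own name as the column name, which matches how `name` and `email` are declared without `@Column` on the slide.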
Aggregation
    @Table("user_and_messages")
    public class User {
        @Column("user_id")
        private String userId;
        private String name;
        private String email;
        @GroupBy("user_id")
        private List<Message> messages;
    }

    public class Message {
        private String title;
        private String body;
    }
Inheritance
    @Table("catalog")
    @Inheritance({Phone.class, TV.class})
    @InheritanceColumn("product_type")
    public abstract class Product {
        @Column("product_id")
        private String productId;
        private float price;
        private String vendor;
        private String model;
    }

    @InheritanceValue("tv")
    public class TV extends Product {
        private float size;
    }
Online Business Intelligence
[Diagram: Application, Cassandra, and Hadoop. Cassandra provides storage for the application in production and storage for results; Hadoop performs distributed batch processing; the application uses the results in production.]
@mfiguiere
blog.datastax.com
Stay Tuned!