Scaling Out Without Flipping Out
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
Who are you?!
• Evangelist with a focus on the .NET Community
• Long-time developer (mostly with relational databases)
• Recently presented at Cassandra Summit 2014 with Microsoft
2
1 More Users, More Problems
2 With Great Power Comes Great Responsibility
3 Leaving the Relational Past Behind
4 Cassandra and Azure, BFFs
3
More Users, More Problems
Scaling and Availability
• We all want applications and
services that are scalable and highly
available
• Scaling our app tier is usually pretty
painless, especially with cloud
infrastructure
– App tier tends to be stateless
Ways We Scale our Relational Databases
6
SELECT array_agg(players), player_teams
FROM (
  SELECT DISTINCT t1.t1player AS players, t1.player_teams
  FROM (
    SELECT p.playerid AS t1id,
           concat(p.playerid, ':', p.playername, ' ') AS t1player,
           array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t1
  INNER JOIN (
    SELECT p.playerid AS t2id,
           array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t2 ON t1.player_teams = t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams
Scaling Up
SELECT * FROM denormalized_view
Denormalization
Ways We Scale Our Relational Databases
7
Replication
[Diagram: the Client sends Write Requests to a Primary and Read Requests to Replica 1 and Replica 2, which hold copies of the Users Data. A Failover Process monitors the Primary and fails over to a replica if it dies. Reads from replicas can return stale data because of Replication Lag.]
Ways We Scale Our Relational Databases
7
Sharding and Replication (and probably Denormalization)
[Diagram: a Router directs Client requests to shards of the Users Data split by key range (A-F, G-M, N-T, U-Z). Each shard still needs Replication and a Failover Process with monitoring, and still suffers Replication Lag.]
What is Cassandra?
• A Linearly Scaling and Fault Tolerant Distributed Database
• Fully Distributed
– Data spread over many nodes
– All nodes participate in a cluster
– All nodes are equal
– No SPOF (shared nothing)
– Run on commodity hardware
22
What is Cassandra?
• Linearly Scaling
– Have More Data? Add more nodes.
– Need More Throughput? Add more nodes.
• Fault Tolerant
– Nodes Down != Database Down
– Datacenter Down != Database Down
23
What is Cassandra?
• Fully Replicated
• Clients write local
• Data syncs across WAN
• Replication Factor per DC
24
US Europe
Client
Cassandra and the CAP Theorem
• The CAP Theorem limits what distributed systems can do
• Consistency
• Availability
• Partition Tolerance
• Limits? “Pick 2 out of 3”
• Cassandra is an AP system that is Eventually Consistent
25
With Great Power Comes Great
Responsibility
You Control the Fault Tolerance of Cassandra
• Replication Factor
– You set this on the server-side in
Cassandra
• Consistency Level
– You set this on the client-side in
your application
– Choose this for each read and
write you do against Cassandra
27
Replication Factor (server-side)
• How many copies of the data should exist?
28
[Ring diagram: the Client writes A with RF=3, so A is stored on three of the four nodes — B(A,D), C(A,B), A(C,D), D(B,C).]
Consistency Level (client-side)
• How many replicas do we need to hear from before we
acknowledge?
29
[Ring diagram: with CL=QUORUM, the Client's Write A is acknowledged after a majority of the three replicas respond; with CL=ONE, after a single replica responds. Nodes: B(A,D), C(A,B), A(C,D), D(B,C).]
Consistency Levels
• Applies to both Reads and Writes (i.e. is set on each query)
• ONE – one replica from any DC
• LOCAL_ONE – one replica from local DC
• QUORUM – 51% of replicas from any DC
• LOCAL_QUORUM – 51% of replicas from local DC
• ALL – all replicas
• TWO – two replicas from any DC
30
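The quorum arithmetic behind these levels is worth making explicit: a quorum is a strict majority of replicas, floor(RF/2) + 1. A minimal sketch (function names are illustrative, not from any driver):

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read/write: a strict majority."""
    return replication_factor // 2 + 1

def tolerable_failures(replication_factor):
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)

# RF=3 -> quorum of 2, tolerates 1 replica down; RF=5 -> quorum of 3, tolerates 2.
for rf in (1, 3, 5):
    print(rf, quorum(rf), tolerable_failures(rf))
```

This is why RF=3 with QUORUM is such a common pairing: it survives one replica failure on every read and write.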
Consistency Level and Speed
• How many replicas we need to hear from can affect how quickly
we can read and write data in Cassandra
31
[Ring diagram: Read A at CL=QUORUM. Replica acks arrive at different speeds (e.g. 5 µs, 12 µs, 12 µs, 300 µs), so waiting on more replicas means waiting on the slower ones.]
Consistency Level and Availability
• Consistency Level choice affects availability
• For example, QUORUM can tolerate one replica being down and
still be available (in RF=3)
32
[Ring diagram: Read A at CL=QUORUM with RF=3. One replica holding A=2 is down, but the two remaining replicas (both A=2) form a quorum, so the read still succeeds.]
Consistency Level and Eventual Consistency
• Cassandra is an AP system that is Eventually Consistent so
replicas may disagree
• Column values are timestamped
• In Cassandra, Last Write Wins (LWW)
33
[Ring diagram: Read A at CL=QUORUM. One replica returns the newer A=2, another returns the older A=1; the timestamps identify A=2 as the winner.]
Christos from Netflix: “Eventual Consistency != Hopeful Consistency”
https://www.youtube.com/watch?v=lwIA8tsDXXE
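Last Write Wins can be sketched in a few lines: each replica returns a (timestamp, value) pair and the reader keeps the value with the highest timestamp (a hypothetical illustration of the rule, not driver code):

```python
def reconcile(replica_responses):
    """Pick the winning value under Last Write Wins: highest write timestamp.

    Tuples compare timestamp-first, so max() also breaks timestamp ties by
    comparing the values themselves.
    """
    timestamp, value = max(replica_responses)
    return value

# Two replicas already saw A=2 (written at t=1001); one still has the older A=1.
responses = [(1001, 2), (1000, 1), (1001, 2)]
print(reconcile(responses))  # the newer value, 2
```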
Leaving the Relational Past Behind
34
KillrVideo, a Video Sharing Site (like YouTube)
• Live demo available at http://www.killrvideo.com
– Written in C# and JavaScript
– Live demo running in Azure, backed by a DataStax Enterprise cluster
– Open source: https://github.com/luketillman/killrvideo-csharp
Data Structures
• A Keyspace is like an RDBMS Database or Schema
• Like an RDBMS, Cassandra uses Tables to store data
• Partitions can have one row (narrow) or multiple rows (wide)
36
Keyspace
Tables
Partitions
Rows
Schema Definition (DDL)
• Easy to define tables for storing data
• First part of Primary Key is the Partition Key
CREATE TABLE videos (
  videoid uuid,
  userid uuid,
  name text,
  description text,
  preview_image_location text,
  tags set<text>,
  added_date timestamp,
  PRIMARY KEY (videoid)  -- videoid is the Partition Key
);
37
Partition Key Determines Data Distribution
• The Partition Key determines node placement
38
videoid       name                 description              ...
689d56e5-…    Keyboard Cat         Keyboard Cat is the ...  ...
93357d73-…    Nyan Cat             Check out Nyan cat ...   ...
d978b136-…    Original Grumpy Cat  Visit Grumpy Cat's …     ...
Partition Key – Hashing
• The Partition Key is hashed using a consistent hashing function
(Murmur 3) and the output is used to place the data on a node
• The data is also replicated to RF-1 other nodes
39
[Diagram: the row (videoid=689d56e5-…, name='Keyboard Cat', …) has its Partition Key run through Murmur3, which places it on node A; with RF=3 it is also replicated to two more nodes. Ring: B(A,D), C(A,B), A(C,D), D(B,C).]
Hashing – Back to Reality
• Back in reality, Partition Keys actually hash to 128 bit numbers
• Nodes in Cassandra own token ranges (i.e. hash ranges)
40
[Ring: B(A,D), C(A,B), A(C,D), D(B,C)]

Range  Start          End
A      0xC000000..1   0x0000000..0
B      0x0000000..1   0x4000000..0
C      0x4000000..1   0x8000000..0
D      0x8000000..1   0xC000000..0

Partition Key videoid=689d56e5-… → Murmur3 → 0xadb95e99da887a8a4cb474db86eb5769
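The token-range lookup can be sketched with a stand-in hash function — Murmur3 isn't in Python's standard library, so this illustration uses MD5 truncated to 128 bits; the node names and ranges follow the slide, and everything else is assumed for the example:

```python
import hashlib

# Ring of 4 nodes; each owns a range of the 128-bit token space.
# Listed in ring (token) order: B, C, D, then A, whose range wraps around zero.
RING = [
    ("B", 0x00000000000000000000000000000001, 0x40000000000000000000000000000000),
    ("C", 0x40000000000000000000000000000001, 0x80000000000000000000000000000000),
    ("D", 0x80000000000000000000000000000001, 0xC0000000000000000000000000000000),
]

def token(partition_key):
    """Hash the partition key to a 128-bit token (MD5 stands in for Murmur3)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")

def owner(partition_key):
    """Find the node whose token range contains the key's token."""
    t = token(partition_key)
    for node, start, end in RING:
        if start <= t <= end:
            return node
    return "A"  # wrap-around range 0xC000...1 through 0x0000...0 belongs to A

def replicas(partition_key, rf=3):
    """Owner plus the next RF-1 nodes clockwise around the ring (RF=3 default)."""
    names = ["B", "C", "D", "A"]  # ring order
    i = names.index(owner(partition_key))
    return [names[(i + k) % len(names)] for k in range(rf)]

print(owner("689d56e5"), replicas("689d56e5"))
```

The same key always hashes to the same token, so every client independently computes the same owner and replica set with no central router — the contrast with the sharding diagram earlier.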
Clustering Columns
• Second part of Primary Key is Clustering Column(s)
• Clustering columns affect ordering of data inside a partition (and on disk)
• Ascending/Descending order is possible
41
CREATE TABLE comments_by_video (
  videoid uuid,
  commentid timeuuid,
  userid uuid,
  comment text,
  PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
Clustering Columns – Wide Rows
• Use of Clustering Columns (and the layout on disk) is where the
term “Wide Rows” comes from
42
[Partition videoid='0fe6a...' with rows ordered newest first:
commentid='82be1...' (10/1/2014 9:36AM): userid='ac346...', comment='Awesome!'
commentid='765ac...' (9/17/2014 7:55AM): userid='f89d3...', comment='Garbage!']

CREATE TABLE comments_by_video (
  videoid uuid,
  commentid timeuuid,
  userid uuid,
  comment text,
  PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
Inserts and Updates
• Use INSERT or UPDATE to add and modify data
• Both will overwrite data (no constraints like RDBMS)
• INSERT and UPDATE functionally equivalent
43

INSERT INTO comments_by_video (videoid, commentid, userid, comment)
VALUES ('0fe6a...', '82be1...', 'ac346...', 'Awesome!');

UPDATE comments_by_video
SET userid = 'ac346...', comment = 'Awesome!'
WHERE videoid = '0fe6a...' AND commentid = '82be1...';
TTL and Deletes
• Can specify a Time to Live (TTL) in seconds when doing an
INSERT or UPDATE
• Use DELETE statement to remove data
• Can optionally specify columns to remove part of a row
44
INSERT INTO comments_by_video ( ... ) VALUES ( ... ) USING TTL 86400;
DELETE FROM comments_by_video WHERE videoid = '0fe6a...' AND commentid = '82be1...';
Querying
• Use SELECT to get data from your tables
• Always include Partition Key and optionally Clustering Columns in queries
• Can use ORDER BY (on Clustering Columns) and LIMIT
• Use range queries (for example, by date) to slice partitions
45
SELECT * FROM comments_by_video WHERE videoid = 'a67cd...' LIMIT 10;
Breaking the Relational Mindset
• How do we data model when we have to query by the Partition Key (and optionally Clustering Columns)?
• Denormalize all the things!
• Disk is cheap now and writes in Cassandra are FAST
• Data modeling is very much query driven
• Many times we end up with a “table per query”
46
Users – The Relational Way
• Single Users table with all user data and an Id Primary Key
• Add an index on email address to allow queries by email
[Flow: User logs into site → find user by email address; show basic information about user → find user by id]
47
Users – The Cassandra Way
[Flow: User logs into site → find user by email address; show basic information about user → find user by id]
CREATE TABLE user_credentials (
  email text,
  password text,
  userid uuid,
  PRIMARY KEY (email)
);

CREATE TABLE users (
  userid uuid,
  firstname text,
  lastname text,
  email text,
  created_date timestamp,
  PRIMARY KEY (userid)
);
48
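The login flow above becomes two single-partition lookups, one per table. A sketch with in-memory dicts standing in for the two Cassandra tables (all names and values here are made up for illustration):

```python
# Stand-in for user_credentials, partitioned by email: serves the login query.
user_credentials = {
    "luke@example.com": {"password": "<hash>", "userid": "u-123"},
}
# Stand-in for users, partitioned by userid: serves the profile query.
users = {
    "u-123": {"firstname": "Luke", "lastname": "Tillman",
              "email": "luke@example.com"},
}

def log_in(email):
    """Like: SELECT userid FROM user_credentials WHERE email = ? — one partition."""
    return user_credentials[email]["userid"]

def get_profile(userid):
    """Like: SELECT * FROM users WHERE userid = ? — another single partition."""
    return users[userid]

userid = log_in("luke@example.com")
print(get_profile(userid)["firstname"])  # Luke
```

Each query hits exactly one partition on its own table — the "table per query" trade: the email is stored twice, but no index or join is ever needed.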
Cassandra and Azure, BFFs
Cassandra and Azure: Languages and Platforms
50
[Logos: open-source drivers across many languages and platforms — https://github.com/datastax and https://github.com/Azure]
Notes:
• DataStax also offers a C++ driver
• Over 20% of Azure VMs run Linux
Deploying Cassandra in Azure
• IOPs are super important, choosing can be tricky
[Diagram: A7 instances on Azure Storage (Blob) — more safety, less performance. G3/G4 instances on local SSD — less safety, more performance. Options: run Blob-backed and SSD-backed nodes as Logical DC1 and Logical DC2 with Multi-DC Replication, or run on SSDs and take Frequent Snapshots.]
Scripted Setup from the Command Line
• Finer control over the number of VMs and configuration
– Customize Bash and Powershell scripts to fit your scenario
• Provision VMs and configure them with scripts, then use
OpsCenter to deploy Cassandra
• Detailed instructions and scripts available:
– https://academy.datastax.com/demos/enterprise-deployment-microsoft-
azure-cloud
57
Marketplace Deployment from Preview Portal
• Configure VM size in the Portal UI, click a button (yes, that easy)
• What you get:
– 8 VMs configured for use as DataStax Enterprise nodes
– 1 VM with OpsCenter
• Decommission any nodes you don't want/need, then use
OpsCenter to deploy Cassandra
– More detailed instructions:
http://www.tonyguid.net/2014/11/Datastax_now_what/
59
OpsCenter: Management and Monitoring
60
OpsCenter: Creating a Cluster and Adding Nodes
61
Picking a Distribution: Apache Cassandra
• Get the latest bleeding-edge
features
• File JIRAs
• Support via community on
mailing list and IRC
• Perfect for hacking
62
http://cassandra.apache.org
Picking a Distribution: DataStax Enterprise
• Integrated Multi-DC Solr
• Integrated Spark
• Extended support
• Additional QA
• Focused on stable releases for
enterprise
• Free for startups
– < 3MM revenue and < 30MM funding
63
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/startups
Some Parting Thoughts
• Spending time re-architecting or changing infrastructure to
meet scale challenges doesn't add business value
• Can you make infrastructure and architecture decisions now that
will help you scale in the future?
• Learn more
– Apache Cassandra: http://planetcassandra.org
– DataStax Enterprise, Free Tools: http://www.datastax.com
– Azure: http://azure.microsoft.com
64
Questions?
Follow me for updates or to ask questions later: @LukeTillman
Slides: http://www.slideshare.net/LukeTillman
65