Scaling Out Without Flipping Out
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
Who are you?!
• Evangelist with a focus on the .NET Community
• Long-time developer (mostly with relational databases)
• Recently presented at Cassandra Summit 2014 with Microsoft
2
1 More Users, More Problems
2 With Great Power Comes Great Responsibility
3 Leaving the Relational Past Behind
4 Cassandra and Azure, BFFs
3
More Users, More Problems
Scaling and Availability
• We all want applications and
services that are scalable and highly
available
• Scaling our app tier is usually pretty
painless, especially with cloud
infrastructure
– App tier tends to be stateless
Ways We Scale our Relational Databases
6
SELECT array_agg(players), player_teams
FROM (
  SELECT DISTINCT t1.t1player AS players, t1.player_teams
  FROM (
    SELECT p.playerid AS t1id,
           concat(p.playerid, ':', p.playername, ' ') AS t1player,
           array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t1
  INNER JOIN (
    SELECT p.playerid AS t2id,
           array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
    FROM player p
    LEFT JOIN plays pl ON p.playerid = pl.playerid
    GROUP BY p.playerid, p.playername
  ) t2 ON t1.player_teams = t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams
Scaling Up
SELECT * FROM denormalized_view
Denormalization
Ways We Scale Our Relational Databases
7
Replication
[Diagram: the Client sends Write Requests to a Primary and Read Requests to Replica 1 and Replica 2, which hold copies of the Users Data. A Failover Process monitors the Primary and fails over to a replica if it dies. Reads from replicas can return stale data because of Replication Lag.]
Ways We Scale Our Relational Databases
7
Sharding and Replication (and probably Denormalization)
[Diagram: a Router directs Client requests to shards of the Users Data split by key range (A-F, G-M, N-T, U-Z). Each shard still needs Replication and a Failover Process with monitoring, and still suffers Replication Lag.]
What is Cassandra?
• A Linearly Scaling and Fault Tolerant Distributed Database
• Fully Distributed
– Data spread over many nodes
– All nodes participate in a cluster
– All nodes are equal
– No SPOF (shared nothing)
– Run on commodity hardware
22
What is Cassandra?
• Linearly Scaling
– Have More Data? Add more nodes.
– Need More Throughput? Add more nodes.
• Fault Tolerant
– Nodes Down != Database Down
– Datacenter Down != Database Down
23
What is Cassandra?
• Fully Replicated
• Clients write local
• Data syncs across WAN
• Replication Factor per DC
24
US Europe
Client
Cassandra and the CAP Theorem
• The CAP Theorem limits what distributed systems can do
• Consistency
• Availability
• Partition Tolerance
• Limits? “Pick 2 out of 3”
• Cassandra is an AP system that is Eventually Consistent
25
With Great Power Comes Great
Responsibility
You Control the Fault Tolerance of Cassandra
• Replication Factor
– You set this on the server-side in
Cassandra
• Consistency Level
– You set this on the client-side in
your application
– Choose this for each read and
write you do against Cassandra
27
Replication Factor (server-side)
• How many copies of the data should exist?
28
[Ring diagram: the Client writes A with RF=3, so A is stored on three of the four nodes — B(A,D), C(A,B), A(C,D), D(B,C).]
Consistency Level (client-side)
• How many replicas do we need to hear from before we
acknowledge?
29
[Ring diagram: with CL=QUORUM, the Client's Write A is acknowledged after a majority of the three replicas respond; with CL=ONE, after a single replica responds. Nodes: B(A,D), C(A,B), A(C,D), D(B,C).]
Consistency Levels
• Applies to both Reads and Writes (i.e. is set on each query)
• ONE – one replica from any DC
• LOCAL_ONE – one replica from local DC
• QUORUM – 51% of replicas from any DC
• LOCAL_QUORUM – 51% of replicas from local DC
• ALL – all replicas
• TWO – two replicas from any DC
30
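The quorum arithmetic behind these levels is worth making explicit: a quorum is a strict majority of replicas, floor(RF/2) + 1. A minimal sketch (function names are illustrative, not from any driver):

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read/write: a strict majority."""
    return replication_factor // 2 + 1

def tolerable_failures(replication_factor):
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)

# RF=3 -> quorum of 2, tolerates 1 replica down; RF=5 -> quorum of 3, tolerates 2.
for rf in (1, 3, 5):
    print(rf, quorum(rf), tolerable_failures(rf))
```

This is why RF=3 with QUORUM is such a common pairing: it survives one replica failure on every read and write.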
Consistency Level and Speed
• How many replicas we need to hear from can affect how quickly
we can read and write data in Cassandra
31
[Ring diagram: Read A at CL=QUORUM. Replica acks arrive at different speeds (e.g. 5 µs, 12 µs, 12 µs, 300 µs), so waiting on more replicas means waiting on the slower ones.]
Consistency Level and Availability
• Consistency Level choice affects availability
• For example, QUORUM can tolerate one replica being down and
still be available (in RF=3)
32
[Ring diagram: Read A at CL=QUORUM with RF=3. One replica holding A=2 is down, but the two remaining replicas (both A=2) form a quorum, so the read still succeeds.]
Consistency Level and Eventual Consistency
• Cassandra is an AP system that is Eventually Consistent so
replicas may disagree
• Column values are timestamped
• In Cassandra, Last Write Wins (LWW)
33
[Ring diagram: Read A at CL=QUORUM. One replica returns the newer A=2, another returns the older A=1; the timestamps identify A=2 as the winner.]
Christos from Netflix: “Eventual Consistency != Hopeful Consistency”
https://www.youtube.com/watch?v=lwIA8tsDXXE
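Last Write Wins can be sketched in a few lines: each replica returns a (timestamp, value) pair and the reader keeps the value with the highest timestamp (a hypothetical illustration of the rule, not driver code):

```python
def reconcile(replica_responses):
    """Pick the winning value under Last Write Wins: highest write timestamp.

    Tuples compare timestamp-first, so max() also breaks timestamp ties by
    comparing the values themselves.
    """
    timestamp, value = max(replica_responses)
    return value

# Two replicas already saw A=2 (written at t=1001); one still has the older A=1.
responses = [(1001, 2), (1000, 1), (1001, 2)]
print(reconcile(responses))  # the newer value, 2
```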
Leaving the Relational Past Behind
34
KillrVideo, a Video Sharing Site (like YouTube)
• Live demo available at http://www.killrvideo.com
– Written in C# and JavaScript
– Live demo running in Azure, backed by a DataStax Enterprise cluster
– Open source: https://github.com/luketillman/killrvideo-csharp
Data Structures
• A Keyspace is like an RDBMS Database or Schema
• Like an RDBMS, Cassandra uses Tables to store data
• Partitions can have one row (narrow) or multiple rows (wide)
36
Keyspace
Tables
Partitions
Rows
Schema Definition (DDL)
• Easy to define tables for storing data
• First part of Primary Key is the Partition Key
CREATE TABLE videos (
  videoid uuid,
  userid uuid,
  name text,
  description text,
  preview_image_location text,
  tags set<text>,
  added_date timestamp,
  PRIMARY KEY (videoid)  -- videoid is the Partition Key
);
37
Partition Key Determines Data Distribution
• The Partition Key determines node placement
38
videoid       name                 description              ...
689d56e5-…    Keyboard Cat         Keyboard Cat is the ...  ...
93357d73-…    Nyan Cat             Check out Nyan cat ...   ...
d978b136-…    Original Grumpy Cat  Visit Grumpy Cat's …     ...
Partition Key – Hashing
• The Partition Key is hashed using a consistent hashing function
(Murmur 3) and the output is used to place the data on a node
• The data is also replicated to RF-1 other nodes
39
[Diagram: the row (videoid=689d56e5-…, name='Keyboard Cat', …) has its Partition Key run through Murmur3, which places it on node A; with RF=3 it is also replicated to two more nodes. Ring: B(A,D), C(A,B), A(C,D), D(B,C).]
Hashing – Back to Reality
• Back in reality, Partition Keys actually hash to 128 bit numbers
• Nodes in Cassandra own token ranges (i.e. hash ranges)
40
[Ring: B(A,D), C(A,B), A(C,D), D(B,C)]

Range  Start          End
A      0xC000000..1   0x0000000..0
B      0x0000000..1   0x4000000..0
C      0x4000000..1   0x8000000..0
D      0x8000000..1   0xC000000..0

Partition Key videoid=689d56e5-… → Murmur3 → 0xadb95e99da887a8a4cb474db86eb5769
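The token-range lookup can be sketched with a stand-in hash function — Murmur3 isn't in Python's standard library, so this illustration uses MD5 truncated to 128 bits; the node names and ranges follow the slide, and everything else is assumed for the example:

```python
import hashlib

# Ring of 4 nodes; each owns a range of the 128-bit token space.
# Listed in ring (token) order: B, C, D, then A, whose range wraps around zero.
RING = [
    ("B", 0x00000000000000000000000000000001, 0x40000000000000000000000000000000),
    ("C", 0x40000000000000000000000000000001, 0x80000000000000000000000000000000),
    ("D", 0x80000000000000000000000000000001, 0xC0000000000000000000000000000000),
]

def token(partition_key):
    """Hash the partition key to a 128-bit token (MD5 stands in for Murmur3)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")

def owner(partition_key):
    """Find the node whose token range contains the key's token."""
    t = token(partition_key)
    for node, start, end in RING:
        if start <= t <= end:
            return node
    return "A"  # wrap-around range 0xC000...1 through 0x0000...0 belongs to A

def replicas(partition_key, rf=3):
    """Owner plus the next RF-1 nodes clockwise around the ring (RF=3 default)."""
    names = ["B", "C", "D", "A"]  # ring order
    i = names.index(owner(partition_key))
    return [names[(i + k) % len(names)] for k in range(rf)]

print(owner("689d56e5"), replicas("689d56e5"))
```

The same key always hashes to the same token, so every client independently computes the same owner and replica set with no central router — the contrast with the sharding diagram earlier.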
Clustering Columns
• Second part of Primary Key is Clustering Column(s)
• Clustering columns affect ordering of data inside a partition (and on disk)
• Ascending/Descending order is possible
41
CREATE TABLE comments_by_video (
  videoid uuid,
  commentid timeuuid,
  userid uuid,
  comment text,
  PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
Clustering Columns – Wide Rows
• Use of Clustering Columns (and the layout on disk) is where the
term “Wide Rows” comes from
42
[Partition videoid='0fe6a...' with rows ordered newest first:
commentid='82be1...' (10/1/2014 9:36AM): userid='ac346...', comment='Awesome!'
commentid='765ac...' (9/17/2014 7:55AM): userid='f89d3...', comment='Garbage!']

CREATE TABLE comments_by_video (
  videoid uuid,
  commentid timeuuid,
  userid uuid,
  comment text,
  PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
Inserts and Updates
• Use INSERT or UPDATE to add and modify data
• Both will overwrite data (no constraints like RDBMS)
• INSERT and UPDATE functionally equivalent
43

INSERT INTO comments_by_video (videoid, commentid, userid, comment)
VALUES ('0fe6a...', '82be1...', 'ac346...', 'Awesome!');

UPDATE comments_by_video
SET userid = 'ac346...', comment = 'Awesome!'
WHERE videoid = '0fe6a...' AND commentid = '82be1...';
TTL and Deletes
• Can specify a Time to Live (TTL) in seconds when doing an
INSERT or UPDATE
• Use DELETE statement to remove data
• Can optionally specify columns to remove part of a row
44
INSERT INTO comments_by_video ( ... ) VALUES ( ... ) USING TTL 86400;
DELETE FROM comments_by_video WHERE videoid = '0fe6a...' AND commentid = '82be1...';
Querying
• Use SELECT to get data from your tables
• Always include Partition Key and optionally Clustering Columns in queries
• Can use ORDER BY (on Clustering Columns) and LIMIT
• Use range queries (for example, by date) to slice partitions
45
SELECT * FROM comments_by_video WHERE videoid = 'a67cd...' LIMIT 10;
Breaking the Relational Mindset
• How do we data model when we have to query by the Partition Key (and optionally Clustering Columns)?
• Denormalize all the things!
• Disk is cheap now and writes in Cassandra are FAST
• Data modeling is very much query driven
• Many times we end up with a “table per query”
46
Users – The Relational Way
• Single Users table with all user data and an Id Primary Key
• Add an index on email address to allow queries by email
[Flow: User logs into site → find user by email address; show basic information about user → find user by id]
47
Users – The Cassandra Way
[Flow: User logs into site → find user by email address; show basic information about user → find user by id]
CREATE TABLE user_credentials (
  email text,
  password text,
  userid uuid,
  PRIMARY KEY (email)
);

CREATE TABLE users (
  userid uuid,
  firstname text,
  lastname text,
  email text,
  created_date timestamp,
  PRIMARY KEY (userid)
);
48
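The login flow above becomes two single-partition lookups, one per table. A sketch with in-memory dicts standing in for the two Cassandra tables (all names and values here are made up for illustration):

```python
# Stand-in for user_credentials, partitioned by email: serves the login query.
user_credentials = {
    "luke@example.com": {"password": "<hash>", "userid": "u-123"},
}
# Stand-in for users, partitioned by userid: serves the profile query.
users = {
    "u-123": {"firstname": "Luke", "lastname": "Tillman",
              "email": "luke@example.com"},
}

def log_in(email):
    """Like: SELECT userid FROM user_credentials WHERE email = ? — one partition."""
    return user_credentials[email]["userid"]

def get_profile(userid):
    """Like: SELECT * FROM users WHERE userid = ? — another single partition."""
    return users[userid]

userid = log_in("luke@example.com")
print(get_profile(userid)["firstname"])  # Luke
```

Each query hits exactly one partition on its own table — the "table per query" trade: the email is stored twice, but no index or join is ever needed.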
Cassandra and Azure, BFFs
Cassandra and Azure: Languages and Platforms
50
[Logos: open-source drivers across many languages and platforms — https://github.com/datastax and https://github.com/Azure]
Notes:
• DataStax also offers a C++ driver
• Over 20% of Azure VMs run Linux
Deploying Cassandra in Azure
• IOPs are super important, choosing can be tricky
[Diagram: A7 instances on Azure Storage (Blob) — more safety, less performance. G3/G4 instances on local SSD — less safety, more performance. Options: run Blob-backed and SSD-backed nodes as Logical DC1 and Logical DC2 with Multi-DC Replication, or run on SSDs and take Frequent Snapshots.]
Scripted Setup from the Command Line
• Finer control over the number of VMs and configuration
– Customize Bash and Powershell scripts to fit your scenario
• Provision VMs and configure them with scripts, then use
OpsCenter to deploy Cassandra
• Detailed instructions and scripts available:
– https://academy.datastax.com/demos/enterprise-deployment-microsoft-
azure-cloud
57
Marketplace Deployment from Preview Portal
• Configure VM size in the Portal UI, click a button (yes, that easy)
• What you get:
– 8 VMs configured for use as DataStax Enterprise nodes
– 1 VM with OpsCenter
• Decommission any nodes you don't want/need, then use
OpsCenter to deploy Cassandra
– More detailed instructions:
http://www.tonyguid.net/2014/11/Datastax_now_what/
59
OpsCenter: Management and Monitoring
60
OpsCenter: Creating a Cluster and Adding Nodes
61
Picking a Distribution: Apache Cassandra
• Get the latest bleeding-edge
features
• File JIRAs
• Support via community on
mailing list and IRC
• Perfect for hacking
62
http://cassandra.apache.org
Picking a Distribution: DataStax Enterprise
• Integrated Multi-DC Solr
• Integrated Spark
• Extended support
• Additional QA
• Focused on stable releases for
enterprise
• Free for startups
– < 3MM revenue and < 30MM funding
63
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/startups
Some Parting Thoughts
• Spending time re-architecting or changing infrastructure to
meet scale challenges doesn't add business value
• Can you make infrastructure and architecture decisions now that
will help you scale in the future?
• Learn more
– Apache Cassandra: http://planetcassandra.org
– DataStax Enterprise, Free Tools: http://www.datastax.com
– Azure: http://azure.microsoft.com
64
Questions?
Follow me for updates or to ask questions later: @LukeTillman
Slides: http://www.slideshare.net/LukeTillman
65