Cassandra SF meetup, October 2013
Virtual nodes: Operational Aspirin
Nicolas Favre-Felix
[email protected] @yowgi
Thursday, 24 October 13
1-minute recap on Cassandra distribution
• Nodes are clustered in a “ring”
• Each node has a token in [0, 2^127 - 1]:
• 0
• 42535295865117307932921825928971026432
• 85070591730234615865843651857942052864
• 127605887595351923798765477786913079296
• Keys are hashed using MD5 (now Murmur3)
• Each node owns a share of the key-space
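The single-token layout above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's code: the helper names (`evenly_spaced_tokens`, `key_token`, `owner_index`) are hypothetical, and reducing the MD5 digest modulo the ring size is an assumption made for brevity; the real partitioner derives the token from the digest differently in detail.

```python
import bisect
import hashlib

RING_SIZE = 2**127  # tokens live in [0, 2**127 - 1]

def evenly_spaced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

def key_token(key):
    """Hash a key onto the ring (simplified: MD5 digest modulo the ring size)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def owner_index(tokens, token):
    """A node owns the range ending at its token, so the owner is the first
    node token at or after `token`, wrapping around to the first node."""
    return bisect.bisect_left(tokens, token) % len(tokens)

tokens = evenly_spaced_tokens(4)
for t in tokens:
    print(t)
print("owner of 'user:42':", owner_index(tokens, key_token("user:42")))
```

For four nodes, `evenly_spaced_tokens(4)` gives the four tokens listed above.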
Cassandra distribution limitations
• Operational complexity
• Rebuild cost for capacity bound clusters
• Impact on maintenance operations
• Impact on topology changes
• No native support for heterogeneous hardware
Add/remove node
• Need to rebalance ranges between nodes
• Move more data than is optimal
• (optimal would be 1/N)
• Impacts at most RF nodes
• (prefer to spread load across cluster)
• Manual, tedious, error-prone, painful...
Removing a node
• nodetool removetoken (removenode from 1.2)
• Dead host's token removed from ring
• Next host in ring assumes range
• Replica count restored
• Involves at most 2 * RF - 1 nodes
• If we can make it faster, we can store more data!
Virtual nodes in Cassandra 1.2+
• More than one token per node
• Random token assignment
• Incremental cluster resize, one node at a time
• Streaming to/from all nodes, not just neighbors
• Only random partitioners are supported
• Multi-DC support still works in the same way
Different virtual nodes strategies
Strategy                  Number of partitions   Partition size
Random (Cassandra 1.2+)   O(N)                   O(B/N)
Fixed (Riak)              O(1)                   O(B)
Auto-sharding (MongoDB)   O(B)                   O(1)
N = number of nodes, B = size of dataset
(read more at http://bit.ly/virtualnodes)
Virtual Nodes!
• New in 1.2
• Enabled by default in 2.0
→ set num_tokens: 256 in cassandra.yaml
Adding nodes to a cluster
• From a single node...
• Multiple tokens
• Ranges of different sizes
Adding nodes to a cluster
• We add a second node
• “Steals” ranges from the existing node
Adding nodes to a cluster
• And a third one...
• “Steals” ranges from the existing nodes
• Distribution is close to 1/3 each
Bootstrap
• Assign a new host T random tokens (T=256)
• New tokens split ranges from existing nodes
• Each existing node contributes to the bootstrap
• Optimal data movement
• No need to rebalance, or double cluster size
• No need to calculate tokens
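The bootstrap behaviour described here can be illustrated with a small simulation. It is a sketch under stated assumptions, not Cassandra's implementation: replication is ignored, tokens are drawn uniformly from a signed 64-bit space (as with Murmur3), and the names `random_tokens`, `ownership` and `bootstrap` are hypothetical helpers.

```python
import bisect
import random
from collections import defaultdict

RING = 2**64  # assumed token space: signed 64-bit, as with Murmur3

def random_tokens(rng, count):
    """Draw `count` random tokens in [-2**63, 2**63 - 1]."""
    return [rng.randrange(-2**63, 2**63) for _ in range(count)]

def ownership(ring):
    """ring maps token -> node. Return node -> fraction of the ring owned."""
    tokens = sorted(ring)
    shares = defaultdict(float)
    for i, token in enumerate(tokens):
        prev = tokens[i - 1]  # i == 0 wraps around to the last token
        width = (token - prev) % RING or RING  # a lone token owns everything
        shares[ring[token]] += width / RING
    return dict(shares)

def bootstrap(ring, new_node, rng, num_tokens=256):
    """Give new_node random tokens; return the nodes whose ranges it splits."""
    existing = sorted(ring)
    new_tokens = random_tokens(rng, num_tokens)
    donors = set()
    for token in new_tokens:
        # The split range belonged to the owner of the next existing token.
        i = bisect.bisect_left(existing, token) % len(existing)
        donors.add(ring[existing[i]])
    for token in new_tokens:
        ring[token] = new_node
    return donors

rng = random.Random(42)
ring = {}
for node in ("n1", "n2", "n3", "n4"):
    for token in random_tokens(rng, 256):
        ring[token] = node
donors = bootstrap(ring, "n5", rng)
shares = ownership(ring)
print("donors:", sorted(donors))
print({node: round(share, 3) for node, share in sorted(shares.items())})
```

With 256 tokens per node, every existing node ends up donating ranges, and the new node's share lands near 1/5 without any token calculation.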
Removing a node
• nodetool removetoken is replaced by nodetool removenode
• nodetool removenode <host_id>
• Dead host's tokens removed from ring
• Ranges recalculated & data moved
• All nodes participate!
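To see why all nodes participate, the following sketch compares which nodes inherit a dead node's ranges with one token per node and with 256 tokens per node. It is an illustration only: replication is ignored (as if RF=1), tokens are random signed 64-bit integers, and `build_ring` and `heirs` are hypothetical names.

```python
import random

def build_ring(node_count, tokens_per_node, rng):
    """Return a sorted list of (token, node) pairs with random 64-bit tokens."""
    return sorted((rng.randrange(-2**63, 2**63), node)
                  for node in range(node_count)
                  for _ in range(tokens_per_node))

def heirs(ring, dead_node):
    """Nodes that take over the dead node's ranges (replication ignored)."""
    survivors = [(t, n) for t, n in ring if n != dead_node]
    result = set()
    for token, node in ring:
        if node != dead_node:
            continue
        # The range ending at `token` passes to the next surviving token
        # clockwise, wrapping around to the first survivor.
        nxt = next((n for t, n in survivors if t > token), survivors[0][1])
        result.add(nxt)
    return result

rng = random.Random(7)
single_token_heirs = heirs(build_ring(6, 1, rng), dead_node=0)
vnode_heirs = heirs(build_ring(6, 256, rng), dead_node=0)
print("1 token per node:   ", sorted(single_token_heirs))
print("256 tokens per node:", sorted(vnode_heirs))
```

With a single token, one neighbour absorbs the whole range; with 256 random tokens the dead node's ranges are scattered, so the five surviving nodes each pick up a part.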
nodetool ring becomes useless...
$ nodetool ring
Datacenter: datacenter1
==========
Address        Rack   Status  State   Load      Owns    Token
                                                        9080863078500373906
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -9213883331796139815
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -9144950523687551651
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8996170981218131496
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8983323746604361628
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8982914591092048007
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8834964592535450767
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8794005378459731469
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8731574464309751995
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8683340393587441432
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8617209272936614380
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8520769804095723522
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8513488815084100031
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8511017804983511458
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8496547308245734122
nodetool status
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns   Host ID                               Rack
UN  192.168.100.2  48.18 KB  256     19.5%  bb84e34e-b929-41c2-a5b1-31614a5c0bda  rack1
UN  192.168.100.3  48.21 KB  256     19.9%  50e8c1b1-a28f-431a-85ed-74f25cdff61b  rack1
UN  192.168.100.1  46.19 KB  256     21.3%  67bdd989-1b34-4bbd-a5b8-df3f69e2d9b9  rack1
UN  192.168.100.5  48.13 KB  256     18.9%  1a88e040-84fd-4461-805c-bc3a3ea0edfa  rack1
UN  192.168.100.4  48.15 KB  256     20.5%  3be40484-8225-467c-8280-6e83ea43f521  rack1
[Figure: distribution of range sizes. x-axis: range size (arbitrary units); y-axis: frequency; marker: mean range size. Modeled with simulated token assignment.]
How does this lead to balanced load?!
• Each host has the same distribution of range sizes
• So each host takes on a roughly equal portion of the key-space
• Modelled with simulated data inserted into ranges...
[Figure: normalised data load per virtual node. x-axis: virtual node (location in key-space); y-axis: normalised data load.]
How balanced is balanced?
[Figure: distribution of normalised load. x-axis: normalised load (arbitrary units); y-axis: frequency.]
A balanced cluster
• Keys are randomly distributed
• Each v-node partition takes on load proportional to its size
• Load tends towards balance as the number of nodes increases
• 2 nodes: 48.4% and 51.6%
• 3 nodes: 34.3%, 33.0%, 32.7%
• 4 nodes: 24.3%, 25.2%, 24.9%, 25.6%
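Percentages of this kind can be reproduced in spirit with a short simulation. The sketch below is not the model used for the slides: it assumes a unit-length ring, uniformly random key hashes, 256 tokens per node and no replication, and `simulate_load` is a hypothetical helper. The exact figures depend on the random seed.

```python
import bisect
import random

def simulate_load(node_count, keys=100_000, num_tokens=256, seed=1):
    """Return each node's fraction of `keys` random keys."""
    rng = random.Random(seed)
    # Each node picks num_tokens random positions on a unit-length ring [0, 1).
    ring = sorted((rng.random(), node)
                  for node in range(node_count)
                  for _ in range(num_tokens))
    tokens = [t for t, _ in ring]
    counts = [0] * node_count
    for _ in range(keys):
        # A key belongs to the first token at or after its hash, wrapping around.
        i = bisect.bisect_left(tokens, rng.random()) % len(tokens)
        counts[ring[i][1]] += 1
    return [c / keys for c in counts]

for n in (2, 3, 4):
    print(n, "nodes:", ", ".join(f"{share:.1%}" for share in simulate_load(n)))
```

Each node's share should come out within a few percentage points of 1/N, in line with the figures above.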
Performance testing
• 17-node cluster of EC2 m1.large instances
• Inserted 460 million keys at RF=3
• Timed removenode and then bootstrap
• Results at http://bit.ly/vnodesperf
Performance testing
[Figure: bar chart of time (seconds, axis from 0 to 500) for removenode and bootstrap, comparing Cassandra 1.2 with Cassandra 1.1.]
Migration path for a non-vnode cluster
• Several techniques to migrate to vnodes
• The “simplest” is to rebuild your cluster
• With downtime: restore from backup
• Without downtime: twice the hardware
• “shuffle” is the proposed alternative
• Migrate all nodes to vnodes, shuffle ranges
• Very few success stories
Conclusion
• You should already be using virtual nodes!
• Token management is a thing of the past
• Embrace the randomness
• Scale up and down without pain