Virtual nodes: Operational Aspirin

Nicolas Favre-Felix, [email protected], @yowgi

Thursday, 24 October 13


DESCRIPTION

Cassandra SF meetup, October 2013


1-minute recap on Cassandra distribution

• Nodes are clustered in a “ring”

• Each node has a token in [0, 2^127 - 1]:

  • 0

  • 42535295865117307932921825928971026432

  • 85070591730234615865843651857942052864

  • 127605887595351923798765477786913079296

• Keys are hashed using MD5 (RandomPartitioner); Murmur3 (Murmur3Partitioner) is the default from 1.2

• Each node owns a share of the key-space
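The recap above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's code: `token_for_key` reduces an MD5 digest onto the ring with a plain modulo, which is an assumption standing in for the real partitioner, and all function names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # tokens live in [0, 2**127 - 1]

def evenly_spaced_tokens(node_count):
    """One token per node, spaced evenly around the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

def token_for_key(key):
    """Simplified stand-in for the partitioner: MD5 digest reduced onto the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def owner_index(tokens, key_token):
    """A node owns the range (previous token, own token], wrapping past the last token."""
    return bisect.bisect_left(tokens, key_token) % len(tokens)

tokens = evenly_spaced_tokens(4)
for token in tokens:
    print(token)  # the four tokens listed on the slide

key = "user:42"  # hypothetical row key
print(key, "is owned by node index", owner_index(tokens, token_for_key(key)))
```

With four nodes the computed tokens match the four values listed on the slide.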


Cassandra distribution limitations

• Operational complexity

• Rebuild cost for capacity bound clusters

• Impact on maintenance operations

• Impact on topology changes

• No native support for heterogeneous hardware


Adding a node to an existing cluster


Insert the new node...


Recalculate ranges and rebalance by hand


Usually just double the number of nodes


Add/remove node

• Need to rebalance ranges between nodes

• Move more data than is optimal

• (optimal would be 1/N)

• Impacts at most RF nodes

• (prefer to spread load across cluster)

• Manual, tedious, error-prone, painful...
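To put a number on "more data than is optimal", here is a minimal sketch that models the key-space as a unit ring and compares ownership before and after rebalancing by hand from 4 to 5 evenly spaced single-token nodes. It assumes one replica (RF=1) and that existing nodes keep their order on the ring; the helper names are hypothetical.

```python
from fractions import Fraction

def even_ring(node_count):
    """Map token -> node id for evenly spaced tokens on a unit ring."""
    return {Fraction(i, node_count): i for i in range(node_count)}

def owner(ring, point):
    """Node holding the first token at or after `point`, wrapping to the lowest token."""
    candidates = [token for token in ring if token >= point]
    return ring[min(candidates)] if candidates else ring[min(ring)]

def fraction_moved(before, after):
    """Share of the key-space whose owner differs between the two rings."""
    cuts = sorted(set(before) | set(after) | {Fraction(1)})
    moved, previous = Fraction(0), Fraction(0)
    for cut in cuts:
        # Every point in (previous, cut] has a single owner in each ring,
        # so checking the owner at `cut` covers the whole segment.
        if owner(before, cut) != owner(after, cut):
            moved += cut - previous
        previous = cut
    return moved

moved = fraction_moved(even_ring(4), even_ring(5))
print(moved, "moved; optimal would be", Fraction(1, 5))  # 7/20 moved; optimal would be 1/5
```

In this model 35% of the key-space changes owner, against the 20% that the new node actually needs to receive.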


Removing a node

• nodetool removetoken (removenode from 1.2)

• Dead host's token removed from ring

• Next host in ring assumes range

• Replica count restored

• Involves at most 2 * RF - 1 nodes

• If we can make it faster, we can store more data!


Virtual Nodes!


Virtual nodes in Cassandra 1.2+

• More than one token per node

• Random token assignment

• Incremental cluster resize, one node at a time

• Streaming to/from all nodes, not just neighbors

• Only random partitioners are supported

• Multi-DC support still works in the same way


Different virtual nodes strategies

Strategy                  Number of partitions   Partition size
Random (Cassandra 1.2+)   O(N)                   O(B/N)
Fixed (Riak)              O(1)                   O(B)
Auto-sharding (MongoDB)   O(B)                   O(1)

N = number of nodes
B = size of dataset

(read more at http://bit.ly/virtualnodes)
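The scaling behaviour in the table can be made concrete with a small sketch. The constants used here (256 tokens per node, a 64-partition fixed ring, 64 MiB chunks) are illustrative assumptions chosen for the arithmetic, not settings taken from the talk, and the function names are hypothetical.

```python
import math

def random_vnodes(node_count, dataset_bytes, tokens_per_node=256):
    """Random tokens: partition count grows with N, partition size is B / (N * T)."""
    partitions = node_count * tokens_per_node
    return partitions, dataset_bytes / partitions

def fixed_ring(node_count, dataset_bytes, ring_partitions=64):
    """Fixed ring: partition count is constant, partition size grows with B."""
    return ring_partitions, dataset_bytes / ring_partitions

def auto_sharding(node_count, dataset_bytes, chunk_bytes=64 * 2 ** 20):
    """Auto-sharding: partition size is constant, partition count grows with B."""
    return math.ceil(dataset_bytes / chunk_bytes), chunk_bytes

TIB = 2 ** 40
for nodes, data in [(4, TIB), (8, TIB), (8, 2 * TIB)]:
    print(f"N={nodes}, B={data // TIB} TiB")
    for strategy in (random_vnodes, fixed_ring, auto_sharding):
        count, size = strategy(nodes, data)
        print(f"  {strategy.__name__}: {count} partitions of {size / 2 ** 20:.0f} MiB")
```

Doubling N only changes the first strategy's numbers, while doubling B changes partition size for the first two and partition count for the third.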


Virtual Nodes!

New in 1.2, enabled by default in 2.0

→ set num_tokens: 256 in cassandra.yaml


Adding nodes to a cluster

• From a single node...

• Multiple tokens

• Ranges of different sizes


Adding nodes to a cluster

• We add a second node

• “Steals” ranges from the existing node


Adding nodes to a cluster

• And a third one...

• “Steals” ranges from the existing nodes

• Distribution is close to 1/3 each


An ideal distribution


Actually more like this


Bootstrap

• Assign a new host T random tokens (T=256)

• New tokens split ranges from existing nodes

• Each existing node contributes to the bootstrap

• Optimal data movement

• No need to rebalance, or double cluster size

• No need to calculate tokens
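A bootstrap with random tokens can be simulated in a few lines. This is a sketch under stated assumptions: a 64-bit ring of arbitrary size, one replica, ownership measured as share of the ring rather than bytes, and hypothetical node names. It does not model streaming itself.

```python
import random
from collections import Counter

RING_SIZE = 2 ** 64  # arbitrary ring size for this sketch

def build_ring(assignments):
    """Flatten {node: [tokens]} into a sorted list of (token, node) pairs."""
    return sorted((token, node) for node, tokens in assignments.items() for token in tokens)

def ownership(ring):
    """Share of the ring per node; each token owns (previous token, token]."""
    shares = Counter()
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # index -1 wraps around to the last token
        shares[node] += ((token - previous) % RING_SIZE) / RING_SIZE
    return shares

rng = random.Random(2013)  # fixed seed so the run is repeatable
T = 256
cluster = {f"node{i}": [rng.randrange(RING_SIZE) for _ in range(T)] for i in range(1, 5)}
before = ownership(build_ring(cluster))

# Bootstrap: the new host picks T random tokens, each splitting an existing range.
cluster["node5"] = [rng.randrange(RING_SIZE) for _ in range(T)]
after = ownership(build_ring(cluster))

for node in sorted(before):
    print(f"{node} gives up {before[node] - after[node]:.1%} of the ring")
print(f"node5 now owns {after['node5']:.1%}")
```

Existing nodes only ever lose the slices that the new tokens cut out of their ranges, so the total moved equals what the new node ends up owning.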


Removing nodes from a cluster

Removing the node with blue ranges:


Removing nodes from a cluster


Removing a node

• nodetool removenode (formerly removetoken)

• nodetool removenode <host_id>

• Dead host's tokens removed from ring

• Ranges recalculated & data moved

• All nodes participate!
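The difference between the two removal cases can be sketched by asking which nodes inherit the leaving node's ranges. The model below assumes one replica (RF=1), where each orphaned range passes to the owner of the next surviving token; with higher replication more nodes are involved. The `heirs` helper and node names are hypothetical.

```python
import bisect
import random

RING_SIZE = 2 ** 64  # arbitrary ring size for this sketch

def heirs(ring, leaving):
    """Nodes that take over the leaving node's ranges (RF=1 simplification)."""
    survivors = [(token, node) for token, node in ring if node != leaving]
    survivor_tokens = [token for token, _ in survivors]
    gaining = set()
    for token, node in ring:
        if node == leaving:
            # The range ending at `token` passes to the next surviving token.
            i = bisect.bisect_right(survivor_tokens, token) % len(survivors)
            gaining.add(survivors[i][1])
    return gaining

nodes = [f"node{i}" for i in range(1, 6)]

# One evenly spaced token per node, as in a pre-vnode cluster.
single = sorted((i * RING_SIZE // len(nodes), node) for i, node in enumerate(nodes))

# 256 random tokens per node, as with vnodes.
rng = random.Random(2013)
vnodes = sorted((rng.randrange(RING_SIZE), node) for node in nodes for _ in range(256))

print("single token:", sorted(heirs(single, "node3")))
print("vnodes:      ", sorted(heirs(vnodes, "node3")))
```

With a single token only the next node on the ring gains data, while with 256 random tokens the orphaned ranges are scattered across the other nodes.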


nodetool ring becomes useless...

$ nodetool ring

Datacenter: datacenter1
==========
Address        Rack   Status  State   Load      Owns    Token
                                                        9080863078500373906
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -9213883331796139815
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -9144950523687551651
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8996170981218131496
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8983323746604361628
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8982914591092048007
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8834964592535450767
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8794005378459731469
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8731574464309751995
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8683340393587441432
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8617209272936614380
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8520769804095723522
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8513488815084100031
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8511017804983511458
192.168.100.2  rack1  Up      Normal  48.18 KB  19.46%  -8496547308245734122


nodetool status

$ nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns   Host ID                               Rack
UN  192.168.100.2  48.18 KB  256     19.5%  bb84e34e-b929-41c2-a5b1-31614a5c0bda  rack1
UN  192.168.100.3  48.21 KB  256     19.9%  50e8c1b1-a28f-431a-85ed-74f25cdff61b  rack1
UN  192.168.100.1  46.19 KB  256     21.3%  67bdd989-1b34-4bbd-a5b8-df3f69e2d9b9  rack1
UN  192.168.100.5  48.13 KB  256     18.9%  1a88e040-84fd-4461-805c-bc3a3ea0edfa  rack1
UN  192.168.100.4  48.15 KB  256     20.5%  3be40484-8225-467c-8280-6e83ea43f521  rack1


Heterogeneity

Fewer tokens, less data!
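A quick simulation shows how the token count steers a node's share of the ring. It is a sketch assuming random token placement on an arbitrary 64-bit ring and one replica; the node names and the 512/256/128 split are hypothetical values picked for illustration.

```python
import random
from collections import Counter

RING_SIZE = 2 ** 64  # arbitrary ring size for this sketch

def ownership(tokens_per_node, seed=2013):
    """Simulated share of the ring per node, given each node's token count."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randrange(RING_SIZE), node)
        for node, count in tokens_per_node.items()
        for _ in range(count)
    )
    shares = Counter()
    for i, (token, node) in enumerate(ring):
        # Each token owns (previous token, token]; index -1 wraps around.
        shares[node] += ((token - ring[i - 1][0]) % RING_SIZE) / RING_SIZE
    return shares

plan = {"big": 512, "medium": 256, "small": 128}  # hypothetical hardware tiers
shares = ownership(plan)
total_tokens = sum(plan.values())
for node, count in plan.items():
    print(f"{node}: {count} tokens, expected {count / total_tokens:.1%}, simulated {shares[node]:.1%}")
```

The simulated shares land near each node's fraction of the total token count, which is why a smaller num_tokens on weaker hardware translates into less data.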


[Figure: histogram of range sizes, modeled with simulated token assignment. x-axis: range size (arbitrary units); y-axis: frequency; the mean range size is marked.]


How does this lead to balanced load?!

• Each host has the same distribution of range sizes

• So each host will assume a roughly equal portion of the key-space

• Modelled with simulated data inserted into ranges...


[Figure: data load per virtual node. x-axis: virtual node (location in key-space); y-axis: normalised data load.]


How balanced is balanced?

[Figure: histogram of normalised load. x-axis: normalised load (arbitrary units); y-axis: frequency.]


A balanced cluster

• Keys are randomly distributed

• Each vnode partition will assume load proportional to its size

• Load tends towards balance as the number of nodes increases

• 2 nodes: 48.4% and 51.6%

• 3 nodes: 34.3%, 33.0%, 32.7%

• 4 nodes: 24.3%, 25.2%, 24.9%, 25.6%
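Figures of this kind can be reproduced approximately with a short simulation. The sketch below assumes 256 random tokens per node on an arbitrary 64-bit ring and measures share of the ring, not stored bytes. The seed is arbitrary, so the exact percentages will differ from the ones on the slide.

```python
import random
from collections import Counter

RING_SIZE = 2 ** 64  # arbitrary ring size for this sketch

def simulated_shares(node_count, tokens_per_node=256, seed=2013):
    """Ring share of each node when every node picks random tokens."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(node_count)
        for _ in range(tokens_per_node)
    )
    shares = Counter()
    for i, (token, node) in enumerate(ring):
        # Each token owns (previous token, token]; index -1 wraps around.
        shares[node] += ((token - ring[i - 1][0]) % RING_SIZE) / RING_SIZE
    return [shares[node] for node in range(node_count)]

for node_count in (2, 3, 4):
    formatted = ", ".join(f"{share:.1%}" for share in simulated_shares(node_count))
    print(f"{node_count} nodes: {formatted}")
```

Changing `seed` or `tokens_per_node` is an easy way to see how much the split varies from one random assignment to the next.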


Performance testing

• 17-node cluster on EC2 m1.large instances

• Inserted 460 million keys

• at RF=3

• Timed removenode and then bootstrap

• Results at http://bit.ly/vnodesperf


Performance testing

[Figure: bar chart of time in seconds (axis from 0 to 500) for removenode and bootstrap, comparing Cassandra 1.2 with Cassandra 1.1.]


Migration path for a non-vnode cluster

• Several techniques to migrate to vnodes

• The “simplest” is to rebuild your cluster

• With downtime: restore from backup

• Without downtime: twice the hardware

• “shuffle” is the proposed alternative

• Migrate all nodes to vnodes, shuffle ranges

• Very few success stories


Conclusion

• You should already be using virtual nodes!

• Token management is a thing of the past

• Embrace the randomness

• Scale up and down without pain


Thanks!

@yowgi @acunu


We’re hiring!

• Acunu suggested and developed virtual nodes

• Patches by @samoverton and @jericevans

• Eric Evans also contributed much of CQL

• We are looking for developers to work on Apache Cassandra, contributing features and enhancements to the Open-Source project
