Bucket your partitions wisely
Markus Höfer, IT Consultant
Cassandra Summit 2016



Recap c* partitions


• A partition determines which c* node(s) the data resides on
• Identified by the partition key
• Nodes "own" token ranges, which are directly related to partitions
• Tokens are calculated by hashing the partition key
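The token mechanics can be illustrated with a toy ring. This is a sketch under simplifying assumptions, not how Cassandra is implemented: the default Murmur3Partitioner hashes with Murmur3, whereas this example uses MD5 (the hash the older RandomPartitioner used) because it ships with Python's standard library. The node names and their tokens are made up, and replication and vnodes are ignored.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 here, purely for illustration)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")  # 0 <= token < 2**128


class TokenRing:
    """Toy ring: each node owns the range (previous node's token, its own token]."""

    def __init__(self, node_tokens: dict[str, int]) -> None:
        # Sort nodes by token so ownership can be found with a binary search.
        ordered = sorted(node_tokens.items(), key=lambda item: item[1])
        self.nodes = [name for name, _ in ordered]
        self.tokens = [token for _, token in ordered]

    def owner(self, partition_key: str) -> str:
        token = token_for(partition_key)
        # First node whose token is >= the key's token; wrap around past the end.
        index = bisect.bisect_left(self.tokens, token)
        return self.nodes[index % len(self.nodes)]


# Hypothetical three-node cluster splitting the 128-bit token space evenly.
ring = TokenRing({
    "node-a": 2**128 // 3,
    "node-b": 2 * 2**128 // 3,
    "node-c": 2**128 - 1,
})

# Every note of notebook "n1" shares the partition key "n1", so they all map to one node.
print(ring.owner("n1"))
```

Whichever node comes out, it is the same for every note in notebook "n1", because only the partition key feeds the hash. In a real cluster the replicas are also derived from that token, so one oversized partition keeps concentrating its load on a fixed set of nodes.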


• DataStax recommendations for partitions:
  • Maximum number of rows: hundreds of thousands
  • Disk size: hundreds of MB


What's the problem with big partitions?

• Every request for such a partition hits the same nodes -> not scalable!
• Frequent deletes create tombstones that slow down your reads or even lead to TombstoneOverwhelmingExceptions


Use case "notebook"


Use case - Environment

[Diagram: three keyspaces, each labeled "Keyspace µ"]


Use case – Load and requirements

• Many concurrent processes
• Scalability is important
• Load peaks will happen!


• A user (owner) can create a notebook

• An owner can create notes belonging to a notebook

• Users can fetch notes (ideally only once), not necessarily in a particular order

• Users can delete notes

Note_by_notebook
P Notebook [text]
C Title [text]
  Comment [text]
  Owner [text]
  CreatedOn [timestamp]

(P = partition key column, C = clustering column)


First things first:

Dev: "How many notes per notebook?"

PO: "I assume a maximum of 100,000 notes."


Use case – Let's do the math

How many values do we store with 100,000 rows per notebook?

$$\text{values\_per\_notebook} = \text{num\_rows} \times \text{num\_regular\_columns} + \text{num\_static\_columns}$$

$$= 100{,}000 \times 3 + 0 = 300{,}000$$


Use case – Size assumptions

Note_by_notebook
P Notebook [text]         16 bytes
C Title [text]            60 bytes
  Comment [text]         200 bytes
  Owner [text]            16 bytes
  CreatedOn [timestamp]    8 bytes


Use case – Let's do the math

OK, so how much data is that on disk?

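$$\text{bytes\_per\_partition} = \text{sizeof}(P) + \text{sizeof}(S) + \text{num\_rows} \times \big(\text{sizeof}(C) + \text{sizeof}(\text{regular\_column})\big) + 8 \times \text{num\_values}$$

$$= 16 + 0 + 100{,}000 \times (60 + 224) + 8 \times 300{,}000 = 30{,}800{,}016 \text{ bytes}$$

Here sizeof(regular_column) = 200 + 16 + 8 = 224 bytes (Comment, Owner, CreatedOn), and the last term adds 8 bytes of metadata per stored value.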
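The estimate can be reproduced in a few lines of Python. This is only a transcription of the slide's formula with the slide's size assumptions; the constant names are illustrative, and real on-disk sizes (compression, SSTable overhead) will differ.

```python
# Size assumptions from the "Size assumptions" slide, in bytes.
PARTITION_KEY_BYTES = 16             # Notebook [text]
STATIC_BYTES = 0                     # no static columns
CLUSTERING_BYTES = 60                # Title [text]
REGULAR_COLUMN_BYTES = 200 + 16 + 8  # Comment + Owner + CreatedOn = 224
NUM_REGULAR_COLUMNS = 3
NUM_STATIC_COLUMNS = 0
VALUE_OVERHEAD_BYTES = 8             # the formula's 8 bytes per stored value


def values_per_partition(num_rows: int) -> int:
    """num_rows * num_regular_columns + num_static_columns"""
    return num_rows * NUM_REGULAR_COLUMNS + NUM_STATIC_COLUMNS


def bytes_per_partition(num_rows: int) -> int:
    """sizeof(P) + sizeof(S) + num_rows * (sizeof(C) + sizeof(regular_column)) + 8 * num_values"""
    return (
        PARTITION_KEY_BYTES
        + STATIC_BYTES
        + num_rows * (CLUSTERING_BYTES + REGULAR_COLUMN_BYTES)
        + VALUE_OVERHEAD_BYTES * values_per_partition(num_rows)
    )


for rows in (100_000, 300_000, 20_000_000):
    print(f"{rows:,} rows -> {bytes_per_partition(rows):,} bytes")
# 100,000 rows -> 30,800,016 bytes
# 300,000 rows -> 92,400,016 bytes
# 20,000,000 rows -> 6,160,000,016 bytes
```

The three rows correspond to the three scenarios the PO brings up during the use case.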


Dev: "31 MB for 100,000 rows on a partition."

PO: "Sorry 'bout that, but it's going to be 300,000 rows. Is that a problem?"


Use case – Let's do the math

How much data do we store with 300,000 rows per notebook?

$$16 + 0 + 300{,}000 \times (60 + 224) + 8 \times 900{,}000 = 92{,}400{,}016 \text{ bytes}$$


Dev: "That might be OK if we don't delete too much. It'll be around 93 MB for 300,000 rows on a partition."

PO: "Small mistake on my side... It could actually happen that someone inserts 20 million notes."

Well, that escalated quickly.


Use case – Let's do the math

OK, just for fun: how much data is that on disk?

$$\text{bytes\_per\_partition} = \sum\text{sizeof}(P) + \sum\text{sizeof}(S) + \text{num\_rows} \times \Big(\sum\text{sizeof}(C) + \sum\text{sizeof}(\text{regular\_column})\Big) + 8 \times \text{num\_values}$$

$$= 16 + 0 + 20{,}000{,}000 \times (60 + 224) + 8 \times 60{,}000{,}000 = 6{,}160{,}000{,}016 \text{ bytes}$$


Bucketing strategies


Bucketing strategies – Incrementing bucket ID

Incrementing bucket "counter" based on the row count inside the partition

Insert note -> bucket full? no: write to the current bucket; yes: bucket++

notebook | bucket
n1       | 0
n1       | 1

+ Good if the client is able to track the count
- Not very scalable
- Possibly unreliable counter

Note_by_notebook
P Notebook [text]
P Bucket [int]
C Title [text]
...
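A minimal client-side sketch of the incrementing strategy. It assumes a single writer per notebook that tracks the row count in memory; the class and the `max_rows_per_bucket` parameter are hypothetical. That in-memory counter is exactly the weak spot listed above: with several writers or a client restart, the count is no longer reliable.

```python
from collections import defaultdict


class IncrementingBucketer:
    """Hypothetical helper: picks the (notebook, bucket) partition key for new notes."""

    def __init__(self, max_rows_per_bucket: int) -> None:
        self.max_rows = max_rows_per_bucket
        self.current_bucket: dict[str, int] = defaultdict(int)  # notebook -> bucket id
        self.rows_in_bucket: dict[str, int] = defaultdict(int)  # notebook -> rows written

    def partition_for_insert(self, notebook: str) -> tuple[str, int]:
        # Bucket full? Then bucket++ and start counting from zero again.
        if self.rows_in_bucket[notebook] >= self.max_rows:
            self.current_bucket[notebook] += 1
            self.rows_in_bucket[notebook] = 0
        self.rows_in_bucket[notebook] += 1
        return notebook, self.current_bucket[notebook]

    def partitions_to_read(self, notebook: str) -> list[tuple[str, int]]:
        # Readers have to query every bucket from 0 up to the current one.
        return [(notebook, b) for b in range(self.current_bucket[notebook] + 1)]


bucketer = IncrementingBucketer(max_rows_per_bucket=2)
keys = [bucketer.partition_for_insert("n1") for _ in range(5)]
print(keys)  # [('n1', 0), ('n1', 0), ('n1', 1), ('n1', 1), ('n1', 2)]
```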


Bucketing strategies – Unique bucketing

Identify buckets using UUIDs

Insert note -> bucket full? no: write to the current bucket; yes: create a new bucket (uuid2)

notebook | bucket
n1       | uuid1
n1       | uuid2

+ Good if clients are able to track the count
+ More scalable
- Possibly unreliable counter
- Lookup table(s) needed

Note_by_notebook
P Notebook [text]
P Bucket [uuid]
C Title [text]
...
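The same idea with UUID bucket IDs, again as a hypothetical client-side sketch. Each new bucket gets a fresh time-based UUID (`uuid.uuid1()`, comparable to a CQL `timeuuid`), and a plain dict stands in for the lookup table that records which buckets exist per notebook. Because every writer can mint its own bucket, no shared integer counter has to be coordinated.

```python
import uuid
from collections import defaultdict


class UniqueBucketer:
    """Hypothetical helper: one open UUID bucket per notebook on this client."""

    def __init__(self, max_rows_per_bucket: int) -> None:
        self.max_rows = max_rows_per_bucket
        self.open_bucket: dict[str, uuid.UUID] = {}
        self.rows_in_bucket: dict[str, int] = defaultdict(int)
        # Stand-in for the lookup table: notebook -> every bucket ever created.
        self.lookup_table: dict[str, list[uuid.UUID]] = defaultdict(list)

    def _new_bucket(self, notebook: str) -> uuid.UUID:
        bucket = uuid.uuid1()  # time-based UUID
        self.open_bucket[notebook] = bucket
        self.rows_in_bucket[notebook] = 0
        self.lookup_table[notebook].append(bucket)  # register it so readers can find it
        return bucket

    def partition_for_insert(self, notebook: str) -> tuple[str, uuid.UUID]:
        bucket = self.open_bucket.get(notebook)
        if bucket is None or self.rows_in_bucket[notebook] >= self.max_rows:
            bucket = self._new_bucket(notebook)
        self.rows_in_bucket[notebook] += 1
        return notebook, bucket


bucketer = UniqueBucketer(max_rows_per_bucket=2)
keys = [bucketer.partition_for_insert("n1") for _ in range(3)]
print(len(bucketer.lookup_table["n1"]))  # 2
```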


Bucketing strategies – Time based bucketing

Split partitions into discrete time frames, e.g. a new bucket every 10 minutes

Time        | notebook | bucket
0:00 – 0:10 | n1       | 0
0:10 – 0:20 | n1       | 1
0:20 – 0:30 | n1       | 2

+ Number of buckets per day is defined
+ Fast solution on insert
- Not very scalable

Note_by_notebook
P Notebook [text]
P Bucket [int]
C Title [text]
...
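A sketch of time based bucket computation. One assumption beyond the slide: instead of restarting at 0 every midnight, this version counts 10-minute windows since the Unix epoch (UTC), so a (notebook, bucket) key never repeats on the next day. A per-day index like the slide's 0, 1, 2 would need the date in the partition key as well. Function names are illustrative.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 10 * 60  # the slide's example: a new bucket every 10 minutes
BUCKETS_PER_DAY = 24 * 60 * 60 // BUCKET_SECONDS  # 144


def time_bucket(created_on: datetime) -> int:
    """Number of whole 10-minute windows since the Unix epoch (timezone-aware input)."""
    return int(created_on.timestamp()) // BUCKET_SECONDS


def buckets_between(start: datetime, end: datetime) -> range:
    """All buckets a reader must query to cover notes created in [start, end]."""
    return range(time_bucket(start), time_bucket(end) + 1)


midnight = datetime(2016, 9, 7, tzinfo=timezone.utc)
print(time_bucket(midnight.replace(minute=15)) - time_bucket(midnight))  # 1
print(len(buckets_between(midnight, midnight.replace(hour=23, minute=59))))  # 144
```

Writers only need the current time, which is why inserts are fast; the flip side is that a busy 10-minute window lands entirely in one partition, which is where the "not very scalable" rating comes from.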


Bucketing strategies – Hash based bucketing

Calculate buckets from the primary key

Note_by_notebook (before)
P Notebook [text]
C Title [text]
...

Example: number of buckets = 2000

hash(primary key) = 9523 -> 9523 % 2000 = 1523
hash(primary key) = 7723 -> 7723 % 2000 = 1723

notebook | bucket
n1       | 1523
n1       | 1723

+ Number of buckets is defined
+ Deterministic
+ Fast solution
- Not possible if the number of rows is unknown

Note_by_notebook (with bucket)
P Notebook [text]
P Bucket [int]
C Title [text]
...
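A sketch of hash based bucketing. Assumptions: the bucket is computed from the note's primary key (notebook and title), SHA-256 stands in for whatever stable hash a client would use, and the function names are illustrative.

```python
import hashlib

NUM_BUCKETS = 2000  # the slide's example value


def stable_hash(text: str) -> int:
    # Python's built-in hash() is salted per process for strings, so use a digest
    # to get the same bucket for the same key on every client and every run.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_bucket(notebook: str, title: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Bucket for a note, derived from its primary key (notebook, title)."""
    return stable_hash(f"{notebook}\x00{title}") % num_buckets


def partitions_to_read(notebook: str, num_buckets: int = NUM_BUCKETS) -> list[tuple[str, int]]:
    """Reading a whole notebook means querying every possible bucket."""
    return [(notebook, bucket) for bucket in range(num_buckets)]


# The slide's arithmetic, taking 9523 and 7723 as the hash values:
print(9523 % NUM_BUCKETS, 7723 % NUM_BUCKETS)  # 1523 1723
```

Because the bucket count is part of every key, it has to be chosen up front from the expected number of rows; changing it later would map existing notes to different buckets.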


Bucketing strategies – Comparison

Strategies compared (in column order): Incrementing | Time based | Unique | Hash based

Unknown number of notes: - + -
Scalable: - - + -
No lookup tables needed: - - +
Fast for writing: + + +
Number of buckets known: - + - +


Datamodel "notebook"


Datamodel – Unique bucketing

note_by_notebook
P Notebook [text]
P Bucket [timeuuid]
C Title [text]
  Comment [text]
  Creator [text]
  CreatedOn [timestamp]

notebook_partitions_by_name
P Notebook [text]
C Bucket [timeuuid]

notebook_partitions_by_note
P Notebook [text]
P Note_title [text]
  Bucket [timeuuid]

Problems:
● How to make sure partitions don't grow too big?
● How to make sure notes are not picked twice?


How to make sure partitions don't grow too big?

● Client-side caching for writing
● A client instance "owns" a partition for a certain amount of time
● It creates a new partition after this time
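A sketch of the writer side, assuming each client instance keeps its current bucket in memory (the "client-side caching" above) and switches to a fresh `timeuuid` bucket once its ownership window expires. The class and parameter names are hypothetical, and the injectable clock exists only to make the behaviour easy to test.

```python
import time
import uuid
from collections.abc import Callable


class OwnedWriteBucket:
    """Hypothetical writer: owns one timeuuid bucket of a notebook for a fixed window."""

    def __init__(self, notebook: str, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.notebook = notebook
        self.window_seconds = window_seconds
        self.clock = clock
        self._open_new_bucket()

    def _open_new_bucket(self) -> None:
        # A real client would also register the bucket in notebook_partitions_by_name.
        self.bucket = uuid.uuid1()
        self.opened_at = self.clock()

    def partition_for_write(self) -> tuple[str, uuid.UUID]:
        # Once the ownership window has passed, stop writing to the old bucket for good.
        if self.clock() - self.opened_at >= self.window_seconds:
            self._open_new_bucket()
        return self.notebook, self.bucket


# Usage with a fake clock so the rotation is visible.
now = [0.0]
writer = OwnedWriteBucket("n1", window_seconds=60, clock=lambda: now[0])
first = writer.partition_for_write()
now[0] = 30.0
second = writer.partition_for_write()  # same bucket, still inside the window
now[0] = 61.0
third = writer.partition_for_write()   # window expired, new bucket
```

Since no other client writes into a bucket it does not own, a bucket holds at most (this client's write rate) × (window length) notes, which is what keeps partition sizes bounded.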


How to make sure notes are not picked twice?

● Fetch the whole partition, not only one note

● The partition is "owned" by one client instance for a certain amount of time

● After that time it can be fetched again
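The reader side can be sketched as a lease per bucket: the client that acquires it fetches the whole partition and keeps it exclusive until the lease expires, after which another client may fetch it again. The slides do not say where this ownership is recorded; the dict below is only a placeholder for shared state that all client instances can see, and all names are hypothetical.

```python
import time
import uuid
from collections.abc import Callable


class ReadLeases:
    """Hypothetical in-memory record of which client currently owns a bucket for reading."""

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.leases: dict[uuid.UUID, tuple[str, float]] = {}  # bucket -> (client, expires_at)

    def try_acquire(self, bucket: uuid.UUID, client_id: str) -> bool:
        now = self.clock()
        current = self.leases.get(bucket)
        if current is not None and current[0] != client_id and current[1] > now:
            return False  # another client still owns this bucket
        self.leases[bucket] = (client_id, now + self.lease_seconds)
        return True


now = [0.0]
leases = ReadLeases(lease_seconds=60, clock=lambda: now[0])
bucket = uuid.uuid1()
got_a = leases.try_acquire(bucket, "client-a")        # True: a fetches the whole bucket
got_b = leases.try_acquire(bucket, "client-b")        # False: still owned by a
now[0] = 61.0
got_b_later = leases.try_acquire(bucket, "client-b")  # True: a's lease has expired
```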


Conclusion

● Scalable
● Partition sizes of around 1,000 notes per notebook partition
● Fast writes
● Fast enough reads


Lessons learned


• Annoy your PO!

• Be sure about your data model before going to production!

• Do the math!

• Be aware of the problems caused by partitions that are too big and by tombstones!

• Delete partitions, not rows when possible!


Questions?

Markus Höfer
IT Consultant
[email protected]

www.codecentric.de
blog.codecentric.de/en

HashtagMarkus