Cassandra Jeopardy Best Practices

Page 1: Cassandra Jeopardy Best Practices

Cassandra Anti-Patterns

by Victor Anjos, [email protected] - http://victoranjos.com

twitter.com/victorfanjos

This slideshow is for educational use only and is not affiliated with Jeopardy!® or Sony Pictures Digital Inc. Jeopardy! is a registered trademark of Jeopardy Productions, Inc. ©2005 Jeopardy Productions, Inc. All rights reserved.

Page 2: Cassandra Jeopardy Best Practices

JEOPARDY BOARD

Math is Hard    tl;dr RTFM    So 20th Century    On the catwalk    Physics

$100 $100 $100 $100 $100

$200 $200 $200 $200 $200

$300 $300 $300 $300 $300

$400 $400 $400 $400 $400

$500 $500 $500 $500 $500

FINAL JEOPARDY

Page 3: Cassandra Jeopardy Best Practices

In a cluster containing 5 nodes, I am the very important number 3.

Math is Hard - $100 Question

Page 4: Cassandra Jeopardy Best Practices

What is Quorum?

Math is Hard - $100 Answer

Page 5: Cassandra Jeopardy Best Practices

With a Replication Factor of 2 (RF = 2) and a Consistency Level of Quorum (CL = Quorum), many operations were failing when a single node was down because of this simple math equation.

Math is Hard - $200 Question

Page 6: Cassandra Jeopardy Best Practices

What is 1 + 1 = 2? (Quorum = floor(RF / 2) + 1, so with RF = 2 both replicas must respond.)

Math is Hard - $200 Answer
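
The arithmetic behind the two quorum slides can be sketched as follows. This is a minimal illustration, assuming the usual definition of quorum as a majority of replicas (computed from the replication factor, not from the number of nodes in the cluster); the function names `quorum` and `tolerated_failures` are made up for this example.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """How many replicas can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerated failures={tolerated_failures(rf)}")
```

With RF = 2 the quorum is 2, so no replica may be down; RF = 3 is the smallest replication factor at which QUORUM survives the loss of one replica.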

Page 7: Cassandra Jeopardy Best Practices

This was very high because, as the cfhistograms output showed, the majority of reads were hitting hundreds of SSTables.

Math is Hard - $300 Question

Page 8: Cassandra Jeopardy Best Practices

What is Read Latency?

Math is Hard - $300 Answer

Page 9: Cassandra Jeopardy Best Practices

Updating 90% of your rows with TTL = 0 to purge data because disk space was running low was this.

Math is Hard - $400 Question

Page 10: Cassandra Jeopardy Best Practices

What is a REALLY BAD IDEA?

(we will also accept “What is Incredibly Stupid?”)

Math is Hard - $400 Answer

Page 11: Cassandra Jeopardy Best Practices

Adding this one word to session.execute in your application code will greatly enhance your application’s perceived performance.

Math is Hard - $500 Question

Page 12: Cassandra Jeopardy Best Practices

What is Async? (session.executeAsync)

Math is Hard - $500 Answer
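
Why asynchronous execution helps can be sketched without a cluster. The example below is only a simulation: `fake_query` is a hypothetical stand-in for one blocking round trip, and a thread pool plays the role of the driver's asynchronous API (the slide refers to `session.executeAsync`; the DataStax Python driver exposes a similar `Session.execute_async`). The point is that independent queries overlap their waiting time instead of paying for each round trip in sequence.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(key: int, latency_s: float = 0.05) -> int:
    """Hypothetical stand-in for one blocking round trip to the cluster."""
    time.sleep(latency_s)
    return key * 2


def run_sequential(keys):
    """Issue each query only after the previous one has returned."""
    return [fake_query(k) for k in keys]


def run_concurrent(keys, max_in_flight: int = 8):
    """Submit all queries up front, then collect results in submission order."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(fake_query, k) for k in keys]
        return [f.result() for f in futures]


if __name__ == "__main__":
    keys = list(range(8))
    for runner in (run_sequential, run_concurrent):
        start = time.perf_counter()
        runner(keys)
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```

Both functions return the same results; only the elapsed time differs, which is why the slide speaks of perceived performance.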

Page 13: Cassandra Jeopardy Best Practices

2GB is too small and 20GB is too large for this, but people have used both in production and had horrible things happen.

RTFM - $100 Question

Page 14: Cassandra Jeopardy Best Practices

What is Heap Size?

RTFM - $100 Answer
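
For a sense of where a sensible heap size lands between those two extremes, here is a sketch of the rule of thumb that, to the best of my recollection, the stock `cassandra-env.sh` of that era used when no heap size was set explicitly: max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)). Treat the formula as an assumption to verify against your own version's script; the function name `default_max_heap_mb` is made up for this example.

```python
def default_max_heap_mb(system_ram_mb: int) -> int:
    """Assumed default: max(min(RAM/2, 1024MB), min(RAM/4, 8192MB))."""
    half_capped = min(system_ram_mb // 2, 1024)
    quarter_capped = min(system_ram_mb // 4, 8192)
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    for ram_gb in (2, 8, 16, 64):
        heap = default_max_heap_mb(ram_gb * 1024)
        print(f"{ram_gb}GB RAM -> {heap}MB heap")
```

Under this assumed rule the heap never exceeds 8GB no matter how much RAM the machine has, which is consistent with the slide's warning about 20GB heaps.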

Page 15: Cassandra Jeopardy Best Practices

The docs recommend that you turn this off; we can go further and tell you it’s really a requirement.

RTFM - $200 Question

Page 16: Cassandra Jeopardy Best Practices

What is Swap?

RTFM - $200 Answer

Page 17: Cassandra Jeopardy Best Practices

These should be used for atomicity, not performance.

RTFM - $300 Question

Page 18: Cassandra Jeopardy Best Practices

What are Batches?

RTFM - $300 Answer

Page 19: Cassandra Jeopardy Best Practices

As of Cassandra 1.2, the best way to start thinking of wide or dynamic rows is to use these.

RTFM - $400 Question

Page 20: Cassandra Jeopardy Best Practices

What are Compound Keys?

(Can also accept “What are Collections?”)

RTFM - $400 Answer

Page 21: Cassandra Jeopardy Best Practices

By setting up a DC in New York and another one in sunny Rio de Janeiro, then using the SimpleSnitch, you are sure to have this happen.

RTFM - $500 Question

Page 22: Cassandra Jeopardy Best Practices

What is everything timing out?

RTFM - $500 Answer

Page 23: Cassandra Jeopardy Best Practices

This storage mechanism introduces a single point of failure and totally negates the performance benefits of Cassandra.

So 20th Century - $100 Question

Page 24: Cassandra Jeopardy Best Practices

What is a SAN?

So 20th Century - $100 Answer

Page 25: Cassandra Jeopardy Best Practices

A commonly used RDBMS object that should only be used on low-cardinality columns and not as a band-aid solution for a poorly designed data model.

So 20th Century - $200 Question

Page 26: Cassandra Jeopardy Best Practices

What is a Secondary Index?

So 20th Century - $200 Answer

Page 27: Cassandra Jeopardy Best Practices

Sometimes referred to as Brewer’s Theorem: traditional RDBMSs sit on one edge and most web-scale DBs sit on the other two of this triangle.

So 20th Century - $300 Question

Page 28: Cassandra Jeopardy Best Practices

What is CAP Theorem?

So 20th Century - $300 Answer

Page 29: Cassandra Jeopardy Best Practices

Using these types of machines is an expensive way to make your Cassandra clusters inefficient.

So 20th Century - $400 Question

Page 30: Cassandra Jeopardy Best Practices

What are fat nodes?

So 20th Century - $400 Answer

Page 31: Cassandra Jeopardy Best Practices

You should not use this to find out exactly how many rows you have in your Cassandra database.

So 20th Century - $500 Question

Page 32: Cassandra Jeopardy Best Practices

What is SELECT COUNT(*)?

So 20th Century - $500 Answer

Page 33: Cassandra Jeopardy Best Practices

Cassandra is often referred to as this specific type of database, which drives its data model paradigm.

On the catwalk - $100 Question

Page 34: Cassandra Jeopardy Best Practices

What is Column-oriented?

On the catwalk - $100 Answer

Page 35: Cassandra Jeopardy Best Practices

Combining several fields to create one key.

On the catwalk - $200 Question

Page 36: Cassandra Jeopardy Best Practices

What are composite keys?

On the catwalk - $200 Answer

Page 37: Cassandra Jeopardy Best Practices

When creating data models for Cassandra, one should focus on this.

On the catwalk - $300 Question

Page 38: Cassandra Jeopardy Best Practices

What is Query Optimization?

On the catwalk - $300 Answer

Page 39: Cassandra Jeopardy Best Practices

A concept borrowed from mathematical logic, used to perform extra writes on a CF (column family), thus making querying more flexible.

On the catwalk - $400 Question

Page 40: Cassandra Jeopardy Best Practices

What is a truth table?

On the catwalk - $400 Answer

Page 41: Cassandra Jeopardy Best Practices

Data seems to disappear from collections due to this.

On the catwalk - $500 Question

Page 42: Cassandra Jeopardy Best Practices

What is a 64K item limit?

On the catwalk - $500 Answer

Page 43: Cassandra Jeopardy Best Practices

These can only spin so fast, so even though C* was designed to minimize disk seeks, SSDs are still a better choice for SLAs that demand low read latency.

Physics - $100 Question

Page 44: Cassandra Jeopardy Best Practices

What are rotational disks?

Physics - $100 Answer

Page 45: Cassandra Jeopardy Best Practices

These were timing out on 20GB partitions with 15MB slices.

Physics - $200 Question

Page 46: Cassandra Jeopardy Best Practices

What are reads?

Physics - $200 Answer

Page 47: Cassandra Jeopardy Best Practices

You need to consider this constant when looking at the time for replication across a WAN.

Physics - $300 Question

Page 48: Cassandra Jeopardy Best Practices

What is the Speed of Light?

Physics - $300 Answer
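
A back-of-the-envelope lower bound on WAN round-trip time can be sketched as below. The assumptions are mine, not the slides': light in optical fiber travels at roughly two thirds of its vacuum speed, the cable follows the great-circle path, and New York to Rio de Janeiro is taken as roughly 7,750 km. Real links are longer and add switching delay, so actual latency will be higher.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum


def min_round_trip_ms(distance_km: float, fiber_factor: float = 2 / 3) -> float:
    """Lower bound on round-trip time over a link of the given length.

    fiber_factor is the assumed fraction of vacuum light speed in the medium.
    """
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * fiber_factor)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    ny_to_rio_km = 7_750  # rough great-circle distance, an assumption
    print(f"NY <-> Rio: at least {min_round_trip_ms(ny_to_rio_km):.1f} ms per round trip")
```

No tuning removes this floor, so any consistency level that waits on a remote data center pays at least this much on every request.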

Page 49: Cassandra Jeopardy Best Practices

Having more is better, specifically as of Cassandra 2.1, since memtables are (nearly) off heap.

Physics - $400 Question

Page 50: Cassandra Jeopardy Best Practices

What is RAM?

Physics - $400 Answer

Page 51: Cassandra Jeopardy Best Practices

Using a 50MB network pipe to move 300MB inserts is this.

Physics - $500 Question

Page 52: Cassandra Jeopardy Best Practices

What is a really poor idea?

Physics - $500 Answer
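
The cost of that mismatch is easy to estimate. The sketch below assumes the "50MB pipe" means 50 megabits per second, which the slide does not specify; if it means megabytes per second, divide the result by 8. It also ignores protocol overhead and competing traffic, so it is a best case. The function name `transfer_seconds` is made up for this example.

```python
def transfer_seconds(payload_megabytes: float, link_megabits_per_s: float) -> float:
    """Best-case time to push a payload through a link, ignoring overhead."""
    payload_megabits = payload_megabytes * 8
    return payload_megabits / link_megabits_per_s


if __name__ == "__main__":
    seconds = transfer_seconds(300, 50)
    print(f"A 300MB insert needs at least {seconds:.0f}s on a 50Mbit/s link")
```

Under either reading, a single insert occupies the link for several seconds, far longer than typical client or coordinator timeouts.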

Page 53: Cassandra Jeopardy Best Practices

Topic: Consensus

FINAL

Page 54: Cassandra Jeopardy Best Practices

As of Cassandra 2.0, using the “IF NOT EXISTS” directive at the end of an INSERT/UPDATE gives us these.

Final Jeopardy Question

Page 55: Cassandra Jeopardy Best Practices

What are Lightweight Transactions?

Final Jeopardy Answer
