Cassandra Anti-Patterns
by Victor - http://victoranjos.com
twitter.com/victorfanjos
This slideshow is for educational use only and is not affiliated with Jeopardy!® or Sony Pictures Digital Inc. Jeopardy! is a registered trademark of Jeopardy Productions, Inc. ©2005 Jeopardy Productions, Inc. All rights reserved.
JEOPARDY BOARD
Math is Hard
tl;dr RTFM
So 20th Century
On the catwalk
Physics
$100 $100 $100 $100 $100
$200 $200 $200 $200 $200
$300 $300 $300 $300 $300
$400 $400 $400 $400 $400
$500 $500 $500 $500 $500
FINAL JEOPARDY
In a cluster containing 5 nodes, I am the very important number 3.
Math is Hard - $100 Question
What is Quorum?
Math is Hard - $100 Answer
With a Replication Factor of 2 (RF = 2) and a
Consistency Level of Quorum (CL = QUORUM), many operations were
failing whenever a single node was down because of this
simple math equation.
Math is Hard - $200 Question
What is 1 + 1 = 2?
(quorum = floor(RF / 2) + 1 = 2, so both replicas must be up)
Math is Hard - $200 Answer
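The arithmetic behind the two quorum clues above can be sketched in a few lines. Quorum is derived from the replication factor as floor(RF / 2) + 1. The function names are illustrative, and the 5-node clue is read here as implying RF = 5, which is an assumption because the slide only states the node count.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at CL = QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)


# Compare a few replication factors side by side.
for rf in (2, 3, 5):
    print(
        f"RF={rf}: quorum={quorum(rf)}, "
        f"tolerated failures={tolerated_replica_failures(rf)}"
    )
```

With RF = 2 the quorum is 2, so no replica can be lost; RF = 3 is the smallest setting where QUORUM survives a single node failure.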
This was very high, and CFHistograms showed why:
the majority of reads were hitting hundreds of
SSTables.
Math is Hard - $300 Question
What is Read Latency?
Math is Hard - $300 Answer
Updating 90% of your rows with TTL = 0 to
purge data because disk space was running low
was this.
Math is Hard - $400 Question
What is a REALLY BAD IDEA?
(we will also accept “What is Incredibly Stupid?”)
Math is Hard - $400 Answer
Adding this one word to session.execute in your
application code will greatly enhance your
application’s perceived performance.
Math is Hard - $500 Question
What is Async? (session.executeAsync)
Math is Hard - $500 Answer
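To see why issuing queries asynchronously improves perceived performance, here is a minimal simulation. It does not use a Cassandra driver: `fake_execute` is a hypothetical stand-in that sleeps for an assumed round-trip time, and a thread pool plays the role that futures from `session.executeAsync` play in a real application.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.05  # assumed per-query round trip, for illustration only


def fake_execute(query: str) -> str:
    """Hypothetical blocking call; sleeps instead of touching a cluster."""
    time.sleep(SIMULATED_LATENCY_S)
    return f"done: {query}"


def run_sequential(queries):
    """Issue each query only after the previous one has returned."""
    start = time.perf_counter()
    results = [fake_execute(q) for q in queries]
    return results, time.perf_counter() - start


def run_concurrent(queries):
    """Issue all queries at once and then collect the results in order."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        results = list(pool.map(fake_execute, queries))
    return results, time.perf_counter() - start


queries = [f"SELECT * FROM t WHERE id = {i}" for i in range(8)]
seq_results, seq_time = run_sequential(queries)
con_results, con_time = run_concurrent(queries)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The total work is the same in both cases; the concurrent version simply overlaps the waiting, which is why the gain is in perceived rather than per-query performance.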
2GB is too small and 20GB is too large for this,
but people have used both in production and
had horrible things happen.
RTFM - $100 Question
What is Heap Size?
RTFM - $100 Answer
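For a sense of what sits between "too small" and "too large", the sketch below reproduces the default sizing heuristic documented in the comments of cassandra-env.sh for Cassandra releases of this era: max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)). The function name is illustrative, and the formula should be checked against the cassandra-env.sh shipped with your version before relying on it.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Heuristic: max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8GB)), in megabytes."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


# Show the resulting heap for a few machine sizes.
for ram_gb in (2, 8, 32, 128):
    heap_mb = default_max_heap_mb(ram_gb * 1024)
    print(f"{ram_gb:>3} GB RAM -> {heap_mb} MB max heap")
```

Under this heuristic the heap never exceeds 8GB, however much RAM the machine has.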
The docs recommend that you turn this off; we can
go further and tell you it’s really a requirement.
RTFM - $200 Question
What is Swap?
RTFM - $200 Answer
These should be used for atomicity, not performance.
RTFM - $300 Question
What are Batches?
RTFM - $300 Answer
As of Cassandra 1.2, the best way to think about wide or dynamic rows
is to use these.
RTFM - $400 Question
What are Compound Keys?
(Can also accept “What are Collections?”)
RTFM - $400 Answer
By setting up a DC in New York and another one in
sunny Rio de Janeiro, then using SimpleSnitch, you are sure to
have this happen.
RTFM - $500 Question
What is everything timing out?
RTFM - $500 Answer
This storage mechanism introduces a single point
of failure and totally negates the performance
benefits of Cassandra.
So 20th Century - $100 Question
What is a SAN?
So 20th Century - $100 Answer
A commonly used RDBMS object that should only be
used on low-cardinality columns and not as a band-aid solution for a poorly designed data
model.
So 20th Century - $200 Question
What is a Secondary Index?
So 20th Century - $200 Answer
Sometimes referred to as Brewer’s Theorem,
traditional RDBMSs sit on one edge and most web
scale DBs sit on the other two of this triangle.
So 20th Century - $300 Question
What is CAP Theorem?
So 20th Century - $300 Answer
Using this type of machine is an expensive
way to make your Cassandra clusters
inefficient.
So 20th Century - $400 Question
What are fat nodes?
So 20th Century - $400 Answer
You should not use this to find out exactly how
many rows you have in your Cassandra database.
So 20th Century - $500 Question
What is SELECT COUNT(*)?
So 20th Century - $500 Answer
Cassandra is often referred to as this specific type of database, which
drives its data model paradigm.
On the catwalk - $100 Question
What is Column-oriented?
On the catwalk - $100 Answer
Combining several fields to create one key.
On the catwalk - $200 Question
What are composite keys?
On the catwalk - $200 Answer
When creating data models for Cassandra,
one should focus on this.
On the catwalk - $300 Question
What is Query Optimization?
On the catwalk - $300 Answer
A concept borrowed from mathematical logic used
to perform extra writes on a CF, thus making
querying more flexible.
On the catwalk - $400 Question
What is a truth table?
On the catwalk - $400 Answer
Data seems to disappear from collections due to
this.
On the catwalk - $500 Question
What is a 64K item limit?
On the catwalk - $500 Answer
These can only spin so fast, so even though C*
was designed to minimize disk seeks, SSDs are still a better choice for low
read latency SLAs.
Physics - $100 Question
What are rotational disks?
Physics - $100 Answer
These were timing out on 20GB partitions with
15MB slices.
Physics - $200 Question
What are reads?
Physics - $200 Answer
You need to consider this constant when looking at the time for replication
across a WAN.
Physics - $300 Question
What is the Speed of Light?
Physics - $300 Answer
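A back-of-the-envelope calculation shows how the speed of light sets a floor on WAN replication time. The sketch below assumes light in optical fiber travels at roughly two thirds of its vacuum speed and uses a rough New York to Rio de Janeiro distance of 7,750 km. Both figures are assumptions for illustration, and real routes add routing and equipment delay on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_VELOCITY_FACTOR = 2 / 3      # assumed: light in fiber is roughly 2/3 c
NEW_YORK_TO_RIO_KM = 7_750         # assumed rough great-circle distance


def min_one_way_ms(distance_km: float,
                   velocity_factor: float = FIBER_VELOCITY_FACTOR) -> float:
    """Lower bound on one-way propagation delay, in milliseconds."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * velocity_factor) * 1000


def min_round_trip_ms(distance_km: float,
                      velocity_factor: float = FIBER_VELOCITY_FACTOR) -> float:
    """Lower bound on request plus acknowledgement, in milliseconds."""
    return 2 * min_one_way_ms(distance_km, velocity_factor)


one_way = min_one_way_ms(NEW_YORK_TO_RIO_KM)
round_trip = min_round_trip_ms(NEW_YORK_TO_RIO_KM)
print(f"one way >= {one_way:.1f} ms, round trip >= {round_trip:.1f} ms")
```

Any consistency level that waits on a replica in the remote DC pays at least the round-trip figure on every operation, no matter how fast the hardware is.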
Having more is better, specifically as of
Cassandra 2.1 since memtables are (nearly)
off heap.
Physics - $400 Question
What is RAM?
Physics - $400 Answer
Using a 50MB network pipe to move 300MB
inserts is this.
Physics - $500 Question
What is a really poor idea?
Physics - $500 Answer
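The cost of pushing large inserts through a small pipe is simple division. The slide does not say whether "50MB" means megabits or megabytes per second, so this sketch computes both readings; the function name and the unit interpretations are assumptions.

```python
def transfer_seconds(payload_megabytes: float, link_megabits_per_s: float) -> float:
    """Best-case time to move a payload over a link, ignoring protocol overhead."""
    return payload_megabytes * 8 / link_megabits_per_s


PAYLOAD_MB = 300

# Reading 1: the pipe is 50 megabits per second.
as_megabits = transfer_seconds(PAYLOAD_MB, 50)

# Reading 2: the pipe is 50 megabytes per second, i.e. 400 megabits per second.
as_megabytes = transfer_seconds(PAYLOAD_MB, 50 * 8)

print(f"50 Mbit/s: {as_megabits:.0f} s per insert")
print(f"50 MB/s:   {as_megabytes:.0f} s per insert")
```

Under either reading a single insert occupies the link for seconds, which is already far too long for a write expected to complete within a normal request timeout.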
Topic: Consensus
FINAL
As of Cassandra 2.0, using the “IF NOT EXISTS”
directive at the end of an INSERT/UPDATE gives us these.
Final Jeopardy Question
What are Lightweight Transactions?
Final Jeopardy Answer
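The difference between a plain INSERT and a conditional one can be illustrated with a toy in-memory model. This is only a sketch of the semantics: `ToyTable` and its methods are hypothetical, it runs in a single process, and it does not model the consensus round that Cassandra performs across replicas to make the condition safe.

```python
class ToyTable:
    """Single-process stand-in for a table keyed by primary key."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, value):
        """Plain INSERT: an upsert that silently overwrites any existing row."""
        self._rows[key] = value

    def insert_if_not_exists(self, key, value):
        """Models INSERT ... IF NOT EXISTS: returns (applied, current_value)."""
        if key in self._rows:
            return False, self._rows[key]
        self._rows[key] = value
        return True, value

    def get(self, key):
        return self._rows.get(key)


users = ToyTable()
first = users.insert_if_not_exists("victor", "first signup")
second = users.insert_if_not_exists("victor", "second signup")
print(first)   # the first write is applied
print(second)  # the second write is rejected and the existing value is returned
```

The point of the conditional form is the applied flag: the caller learns whether its write won, which a plain upsert never reports.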