Tuesday, April 20, 2010
“NoSQL”
Reliability
Scaling
B&D
Performance
Performance
Each Cassandra node manages its storage locally. Not limited by obsolete systems, and not slowed by layering on top of a DFS.
b-trees
• read-before-write
• index in RAM
• random I/O
Memtable / SSTable
Durable
• Write to commitlog
• fsync is cheap since it’s append-only
• Write to memtable
• [amortized] flush memtable to sstable
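The write path in the bullets above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Cassandra's implementation: the class name `TinyStore`, the tab-separated file layout, and the `memtable_limit` threshold are all hypothetical.

```python
import os
import tempfile


class TinyStore:
    """Toy write path: commitlog append, then memtable, then flush to a sorted file."""

    def __init__(self, directory, memtable_limit=3):
        self.directory = directory
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.sstables = []
        # Append-only log: every write lands at the end of the file, no seeks.
        self.commitlog = open(os.path.join(directory, "commitlog"), "a")

    def write(self, key, value):
        # 1. Make the write durable before acknowledging it.
        self.commitlog.write("%s\t%s\n" % (key, value))
        self.commitlog.flush()
        os.fsync(self.commitlog.fileno())
        # 2. Apply it to the in-memory table.
        self.memtable[key] = value
        # 3. Once in a while, flush the whole memtable as one sequential write.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        path = os.path.join(self.directory, "sstable-%d" % len(self.sstables))
        with open(path, "w") as f:
            for key in sorted(self.memtable):  # rows are written in key order
                f.write("%s\t%s\n" % (key, self.memtable[key]))
        self.sstables.append(path)
        self.memtable = {}


store = TinyStore(tempfile.mkdtemp(), memtable_limit=3)
for key, value in [("b", "2"), ("a", "1"), ("c", "3")]:
    store.write(key, value)
```

The third write triggers a flush, so the memtable ends up empty and one sorted file exists on disk. Nothing in this path reads existing data before writing, which is the contrast with the b-tree slide.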
Cassandra is one of the few NoSQL systems that is suitable for use when data loss is unacceptable.
SSTable format, briefly
Data: <row data 0><row data 1>...<row data 127>...<row data 255>...
Index: <key 127><key 255>...
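One way to read the layout above: rows are stored in key order, and a much smaller index holds one key per 128 rows. A lookup searches the index first and then scans a single block. The sketch below assumes that reading. The function names and the in-memory lists standing in for files are hypothetical.

```python
import bisect

INDEX_INTERVAL = 128  # assumed from the slide: index entries at rows 127, 255, ...


def build_sparse_index(sorted_keys):
    """Keep (key, position) for the last row of each full 128-row block."""
    return [(sorted_keys[i], i)
            for i in range(INDEX_INTERVAL - 1, len(sorted_keys), INDEX_INTERVAL)]


def lookup(sorted_keys, rows, index, key):
    """Binary-search the small index, then scan at most one block of rows."""
    index_keys = [k for k, _ in index]
    # First index entry whose key is >= the wanted key identifies the block.
    block = bisect.bisect_left(index_keys, key)
    start = block * INDEX_INTERVAL
    end = min(start + INDEX_INTERVAL, len(sorted_keys))
    for pos in range(start, end):
        if sorted_keys[pos] == key:
            return rows[pos]
    return None


keys = ["key%04d" % i for i in range(300)]
rows = ["row data %d" % i for i in range(300)]
index = build_sparse_index(keys)
found = lookup(keys, rows, index, "key0200")
```

Only the index has to be searched in memory. The data itself is touched in one contiguous block.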
Scaling
How managing our own data helps scaling
Scaling
• Facebook: grew from fewer than 80 machines to 150+
• SimpleGeo: from 20 EC2 Large instances to 50+
How it works
[Figure: ring of four nodes with tokens A, L, T, W]
[Figure: the same ring with a new node F added]
[Figure: ring with nodes A, F, L, T, W; key ranges (A-F] and (F-L] labeled]
[Figure: ring with nodes A, F, L, T, W; key “C” placed on the ring]
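The ring in these figures can be sketched as a sorted list of tokens, where each node owns the range from the previous token (exclusive) up to its own token (inclusive). The sketch assumes keys are compared directly with tokens, as the single-letter labels suggest. The slides do not say which partitioner is in use, and `owner` is a hypothetical helper name.

```python
import bisect


def owner(tokens, key):
    """Return the token of the node owning `key`: the first token >= key,
    wrapping around to the smallest token past the end of the ring."""
    ring = sorted(tokens)
    i = bisect.bisect_left(ring, key)
    return ring[i % len(ring)]


before = owner(["A", "L", "T", "W"], "C")       # ring before F joins
after = owner(["A", "F", "L", "T", "W"], "C")   # ring after F joins
wrapped = owner(["A", "F", "L", "T", "W"], "X")  # past W, wraps to A
```

Adding F only splits the range that L used to own into (A-F] and (F-L]. Keys owned by the other nodes do not move.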
Reliability
• No single points of failure
• Multiple datacenters
• Monitorable
Design
The opposite of heroes
• “If your software wakes people up at 4 AM to fix it, you’re doing it wrong.”
[Figure: ring of four nodes A, L, T, W]
Every node is equal
[Figure: ring with nodes A, F, L, P, T, U, W, Y; key “C” placed on the ring]
Always at least one copy in each datacenter.
Alternate datacenters on the ring.
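A rough sketch of why alternating datacenters on the ring helps: if replicas are simply the next nodes clockwise from the key, any two consecutive nodes sit in different datacenters. The assignment of nodes to `dc1` and `dc2` below is an assumption made for illustration, since the slides do not label the nodes. The `replicas` helper is hypothetical and is not Cassandra's replica placement code.

```python
import bisect

# Assumed layout: tokens alternate between two datacenters around the ring.
RING = {
    "A": "dc1", "F": "dc2", "L": "dc1", "P": "dc2",
    "T": "dc1", "U": "dc2", "W": "dc1", "Y": "dc2",
}


def replicas(ring, key, n):
    """Return the n nodes found walking clockwise from the key's position."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, key)
    return [tokens[(start + i) % len(tokens)] for i in range(n)]


nodes_for_c = replicas(RING, "C", 2)
datacenters_for_c = {RING[node] for node in nodes_for_c}
```

With this layout, two replicas of key "C" land on F and L, one in each datacenter.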
Monitorable
Events
JMX
Bondage & Discipline
• Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.”
ColumnFamilies
Columns
SuperColumns
Twissandra
User = {
    'a4a70900-24e1-11df-8924-001ff3591711': {
        'id': 'a4a70900-24e1-11df-8924-001ff3591711',
        'username': 'ericflo',
        'password': '****',
    },
}
Followers = {
    'a4a70900-24e1-11df-8924-001ff3591711': {
        # friend id: timestamp of when the followership was added
        '10cf667c-24e2-11df-8924-001ff3591711': '1267413962580791',
        '343d5db2-24e2-11df-8924-001ff3591711': '1267413990076949',
        '3f22b5f6-24e2-11df-8924-001ff3591711': '1267414008133277',
    },
}
Tweet = {
    '7561a442-24e2-11df-8924-001ff3591711': {
        'id': '89da3178-24e2-11df-8924-001ff3591711',
        'user_id': 'a4a70900-24e1-11df-8924-001ff3591711',
        'body': 'Trying out Twissandra. This is awesome!',
        '_ts': '1267414173047880',
    },
}
Timeline = {
    'a4a70900-24e1-11df-8924-001ff3591711': {
        # timestamp of tweet: tweet id
        1267414247561777: '7561a442-24e2-11df-8924-001ff3591711',
        1267414277402340: 'f0c8d718-24e2-11df-8924-001ff3591711',
        1267414305866969: 'f9e6d804-24e2-11df-8924-001ff3591711',
        1267414319522925: '02ccb5ec-24e3-11df-8924-001ff3591711',
    },
}
Denormalize
Userline = {
    'a4a70900-24e1-11df-8924-001ff3591711': {
        # timestamp of tweet: tweet id
        1267414247561777: '7561a442-24e2-11df-8924-001ff3591711',
        1267414277402340: 'f0c8d718-24e2-11df-8924-001ff3591711',
        1267414305866969: 'f9e6d804-24e2-11df-8924-001ff3591711',
        1267414319522925: '02ccb5ec-24e3-11df-8924-001ff3591711',
    },
}
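The denormalization these structures imply can be sketched with plain Python dictionaries standing in for the column families: one tweet is written once to Tweet, once to the author's Userline, and once to each follower's Timeline, so reads never need a join. This is a minimal sketch under stated assumptions, not Twissandra's actual code or any Cassandra client API. The function `post_tweet` and the short ids `'alice'`, `'bob'`, `'carol'` are hypothetical, and whether the author's own Timeline also receives the tweet is left out here.

```python
import time
import uuid

# Plain dicts standing in for the column families shown in the slides.
Followers = {"alice": {"bob": "1267413962580791", "carol": "1267413990076949"}}
Tweet, Timeline, Userline = {}, {}, {}


def post_tweet(user_id, body):
    """Write one tweet to every place it will later be read from."""
    tweet_id = str(uuid.uuid1())
    ts = int(time.time() * 1e6)  # microseconds, like the column names above
    Tweet[tweet_id] = {
        "id": tweet_id, "user_id": user_id, "body": body, "_ts": str(ts),
    }
    # The author's own list of tweets.
    Userline.setdefault(user_id, {})[ts] = tweet_id
    # Fan out: one extra write per follower instead of a join at read time.
    for follower_id in Followers.get(user_id, {}):
        Timeline.setdefault(follower_id, {})[ts] = tweet_id
    return tweet_id


tweet_id = post_tweet("alice", "Trying out Twissandra. This is awesome!")
```

Reading a timeline is then a single lookup by user id followed by fetching the listed tweet ids.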
A note on UUIDs
• TimeUUID = Version 1 UUID
• LexicalUUID = any UUID
• usually version 4
UUIDs are better than timestamps
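One plausible reading of this point, shown with Python's standard `uuid` module: a version 1 UUID carries a timestamp, so it can still be ordered by time, but it also carries node and clock-sequence bits, so two events in the same instant get distinct names where a bare timestamp would collide. The helper `time_uuid_micros` is a hypothetical name, and the constant is the offset between the UUID epoch (1582-10-15) and the Unix epoch in 100-nanosecond units.

```python
import time
import uuid

# 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_uuid_micros(u):
    """Recover microseconds since the Unix epoch from a version 1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) // 10


first = uuid.uuid1()    # version 1: what the slide calls a TimeUUID
second = uuid.uuid1()   # generated right after, still a distinct value
random_id = uuid.uuid4()  # version 4: random, carries no time information

recovered = time_uuid_micros(first)
now_micros = int(time.time() * 1e6)
```

Used as column names in structures like Timeline, such ids would keep time order without two tweets from the same microsecond overwriting each other.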
Questions