State of Cassandra, August 2010

Keynote from the 2010 Cassandra Summit

Professional Cassandra support and services

Tuesday, August 10, 2010

Cassandra: Present & Future

Jonathan Ellis

@spyced

Cassandra 0.6 & 0.7

Jonathan Ellis

@spyced

Quiet change of policy

• 0.5.1 was bug fixes only

• Too early to be strict about bugfix-only policy in stable branch, especially w/ 0.7 being longer/more break-y

• Maybe after 1.0?

[Chart: mails sent per month, y-axis 0 to 1500; x-axis Jan (0.5), Feb (0.5.1), Mar, Apr (0.6, 0.6.1), May (0.6.2), Jun (0.6.3), Jul (0.6.4)]

Lots of bug fixes

• 85 issues marked Resolved/Fixed in 0.6 branch after 0.6 released

Runtime configuration

• concurrent reads, writes (0.6.2)

• making it easier to bandage your foot after you shoot it

• PhiConvictThreshold (0.6.2)
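
PhiConvictThreshold controls how suspicious the failure detector must become before it marks a node down. The following is a minimal sketch of that idea, not Cassandra's actual code: it assumes heartbeat intervals are exponentially distributed, and the function names and the default threshold of 8.0 are illustrative assumptions.

```python
import math

def phi(seconds_since_last_heartbeat, mean_interval):
    """Suspicion level for a node.

    Assuming exponentially distributed heartbeat intervals, the chance
    that a heartbeat is merely late is exp(-t / mean). phi is -log10 of
    that probability, which simplifies to t / (mean * ln 10).
    """
    return seconds_since_last_heartbeat / (mean_interval * math.log(10))

def should_convict(seconds_since_last_heartbeat, mean_interval, threshold=8.0):
    # threshold plays the role of PhiConvictThreshold (8.0 is an assumed value)
    return phi(seconds_since_last_heartbeat, mean_interval) > threshold
```

Under these assumptions, raising the threshold makes the detector slower to convict a node, which trades detection speed for fewer false alarms on a noisy network.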

Performance

• JVM GC defaults (0.6.2)

• Faster commitlog (0.6.2)

• Faster range slice, Hadoop jobs (0.6.1, 0.6.2)

• Better parallelization of multiget (0.6.4)

• UTF8Type, UUIDType optimizations (0.6.5)

Bulletproofing

• Hinted handoff (HH) disable (0.6.2)

• compaction priority (0.6.3)

• HH hourly scan (0.6.3)

• JMX metrics for row-level bloom filters (0.6.3)

• Flow control (0.6.4, 0.6.5)

• HH paging (0.6.5)

• Dynamic snitch (0.6.5)

Hinted Handoff

• 0.6.0: send hints to natural replicas

• 0.6.0: fix row-level concurrency bottleneck

• 0.6.2: option to disable entirely

• 0.6.3: remove hourly scan

• 0.6.4: lower priority

• 0.6.5: paging of large hinted rows

• 0.7.0: large rows

Why keep HH around?

https://www.cloudkick.com/blog/2010/jan/12/visual-ec2-latency/

Compaction priority

    -XX:+UseThreadPriorities \
    -XX:ThreadPriorityPolicy=42 \
    -Dcassandra.compaction.priority=1 \

Extended to HH in 0.6.4

http://www.javamex.com/tutorials/threads/priority_what.shtml

JMX for bloom filters

• o.a.c.db:ColumnFamilyStores

• getBloomFilterFalsePositives

• [not in nodetool yet]

Flow control in 0.5

• Why backpressure doesn’t fit Cassandra

Flow Control in 0.6.4

• Replica nodes drop hopeless requests on the floor

• Coordinator node is unaffected

• TimedOutException signals client to back off

• Requires enough memory to buffer RPCTimeout’s worth of requests

• (In the short term, you’re still screwed)
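
The "drop hopeless requests" behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the 0.6.4 implementation: the class name, the 10-second timeout, and the injectable clock are all hypothetical. A replica discards any queued request that has already waited longer than the RPC timeout, because the coordinator will have given up on it.

```python
import time
from collections import deque

RPC_TIMEOUT_SECONDS = 10.0  # assumed value, for illustration only

class ReplicaQueue:
    """Hypothetical replica-side queue that skips requests nobody is waiting for."""

    def __init__(self, rpc_timeout=RPC_TIMEOUT_SECONDS, clock=time.monotonic):
        self.rpc_timeout = rpc_timeout
        self.clock = clock
        self.pending = deque()  # holds (enqueue_time, request) pairs
        self.dropped = 0

    def enqueue(self, request):
        self.pending.append((self.clock(), request))

    def next_request(self):
        """Return the next request still worth serving, or None if there is none."""
        while self.pending:
            enqueued_at, request = self.pending.popleft()
            if self.clock() - enqueued_at > self.rpc_timeout:
                # The coordinator has already timed out; doing the work is wasted effort.
                self.dropped += 1
                continue
            return request
        return None
```

Note that the queue itself is unbounded in this sketch, which matches the slide's caveat: the node still needs enough memory to hold a timeout's worth of requests.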

Flow Control, 0.6.4

[Diagram: IncomingTcpConnection, Message Deserializer, Mutation and Read stages; queue labels "Uncapped" and "Capped at 4096"]

[Diagram: IncomingTcpConnection, Message Deserializer, Mutation, Read, and Gossip stages]

Flow Control, 0.6.5

[Diagram: IncomingTcpConnection, Mutation, Read, and Gossip stages; label "Uncapped"]

Dynamic snitch

• sortByProximity
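
One way to picture a latency-aware sortByProximity is the sketch below. It is a simplified illustration, not the 0.6.5 dynamic snitch: the class, the sliding-window average, and the choice to score unmeasured endpoints as 0 are all assumptions.

```python
from collections import defaultdict, deque

class LatencyScores:
    """Hypothetical scorer that ranks replicas by recently observed latency."""

    def __init__(self, window=100):
        # Keep only the most recent `window` samples per endpoint.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, endpoint, latency_ms):
        self.samples[endpoint].append(latency_ms)

    def score(self, endpoint):
        window = self.samples.get(endpoint)
        if not window:
            return 0.0  # assumption: endpoints with no samples are tried first
        return sum(window) / len(window)

    def sort_by_proximity(self, endpoints):
        # sorted() is stable, so endpoints with equal scores keep their input order.
        return sorted(endpoints, key=self.score)
```

For example, after recording latencies of 50 ms and 70 ms for one replica and 5 ms for another, the faster replica sorts ahead of the slower one.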

Open problems

• Linux/mmap/swap unholy trio (0.6.5)

• Memory fragmentation (0.6.5?)

• Compaction effect on caches (0.7.1?)

mmap and swap

• The problem

• Mitigations

• mmap_index_only

• swappiness=0

• turn off swap

• mlockall at startup (Xms=Xmx)

GC Fragmentation

• Culprit of infamous CASSANDRA-1014?

• Mitigation: tune with much larger new generation / tenuring threshold?

Compaction and caches

• Compaction wrecks the OS fs cache

• Wrecks Cassandra key cache, too

• (but not row cache)

0.7

New in 0.7

• live schema changes

• large rows

• secondary indexes

• efficient Streaming

• DatacenterStrategy

Large rows

• 0.6: smaller of {2GB, memory limit}

• 0.7: in_memory_compaction_limit_in_mb

Secondary indexes

Streaming in 0.6

[Diagram: ring with nodes A, L, T, W, F; range label (A-L]]

[Diagram: ring with nodes A, L, T, W, F; range labels (A-F], (F-L], (A-F]]

[Diagram: ring with nodes A, L, T, W, F; labels Data, Index, Filter]

Streaming in 0.7

[Diagram: ring with nodes A, L, T, W, F; labels Index, Filter]

DatacenterStrategy

• RackAwareStrategy is tuned for 3 replicas and 2 data centers

• DS allows configuring replicas per data center, per Keyspace
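
Per-data-center replica counts can be illustrated with a small placement sketch. This is not DatacenterStrategy's real algorithm: the function name, the integer tokens, and the simple clockwise walk are assumptions made for illustration, and the ring is assumed to be non-empty and sorted by token.

```python
from bisect import bisect_left

def pick_replicas(ring, datacenter_of, replicas_per_dc, key_token):
    """Walk the ring clockwise from the key's token, taking endpoints until
    every data center has its configured number of replicas.

    ring: non-empty list of (token, endpoint) pairs sorted by token
    datacenter_of: dict mapping endpoint -> data center name
    replicas_per_dc: dict mapping data center name -> replica count
    """
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping past the end.
    start = bisect_left(tokens, key_token) % len(ring)
    needed = dict(replicas_per_dc)
    chosen = []
    for i in range(len(ring)):
        _, endpoint = ring[(start + i) % len(ring)]
        dc = datacenter_of[endpoint]
        if needed.get(dc, 0) > 0:
            chosen.append(endpoint)
            needed[dc] -= 1
        if not any(needed.values()):
            break  # every data center is satisfied
    return chosen
```

Because the counts are passed in as an argument, a caller could hold a different `replicas_per_dc` mapping for each keyspace, which is the flexibility the slide describes.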

Minor features in 0.7

• read_repair_chance

• per-keyspace request scheduling

• Hadoop OutputFormat

• Per-CF settings that used to be global (gc_grace_seconds, memtable thresholds)

0.7 API changes

• String keys become byte[]

• Thrift keyspace argument moved to set_keyspace

• i64 timestamp becomes Clock

• SlicePredicate for _count methods

0.7 performance

• Reads roughly 100% faster, thanks largely to removing String creation

• Row-cached reads up to 8x faster after optimizations by tjake and jbellis

• Optimizations for reads of large rows

• 0.7.1? ~20% improvement everywhere from Thrift optimizations

Thrift

• OOMs on malformed packets

• Python Unicode string issues

• PHP support is buggy and maintainerless

After 0.7.0

• IndexOperator.GT

• Triggers / plugins

• Avro?

• On-disk data format improvements (compression, hierarchical data?)

• Auth

Questions
