Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)

Jonathan Ellis / @spyced

Pain points for Java databases

✤ GC
✤ GC
✤ GC

Pain points for Java databases

✤ GC
✤ Platform-specific code

GC

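✤ Concurrent and compacting: choose one (in HotSpot, CMS collects the old generation concurrently but does not compact it; the parallel collector compacts but stops the world)
✤ G1
✤ Azul C4 / Zing?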

Fragmentation

✤ Bloom filter arrays
✤ Compression offsets
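
Large, long-lived arrays like these are hard on a non-compacting old generation: a single big long[] needs one contiguous free block, which a fragmented heap may not have. One common remedy is to split such arrays into small pages. The sketch below is a hypothetical illustration of that idea, not Cassandra's actual class; the name PagedLongArray and the 4096-long page size are assumptions. The bit helpers show how a bloom filter's bit set could sit on top of it.

```java
/**
 * Hypothetical sketch: a long[] split into fixed-size pages, so no single
 * allocation needs a large contiguous run of old-generation space.
 */
public final class PagedLongArray {
    private static final int PAGE_SHIFT = 12;                 // assumption: 4096 longs (about 32 KB) per page
    private static final int PAGE_SIZE = 1 << PAGE_SHIFT;
    private static final int PAGE_MASK = PAGE_SIZE - 1;

    private final long[][] pages;
    private final long size;

    public PagedLongArray(long size) {
        if (size < 0) throw new IllegalArgumentException("negative size: " + size);
        this.size = size;
        int pageCount = (int) ((size + PAGE_SIZE - 1) >>> PAGE_SHIFT);
        pages = new long[pageCount][];
        for (int i = 0; i < pageCount; i++) {
            long remaining = size - ((long) i << PAGE_SHIFT);
            pages[i] = new long[(int) Math.min(PAGE_SIZE, remaining)];   // last page may be short
        }
    }

    public long size() { return size; }

    public long get(long index) {
        checkIndex(index);
        return pages[(int) (index >>> PAGE_SHIFT)][(int) (index & PAGE_MASK)];
    }

    public void set(long index, long value) {
        checkIndex(index);
        pages[(int) (index >>> PAGE_SHIFT)][(int) (index & PAGE_MASK)] = value;
    }

    // Bit-level helpers, as a bloom filter would use them (64 bits per long).
    public void setBit(long bit) {
        long word = bit >>> 6;
        set(word, get(word) | (1L << (bit & 63)));
    }

    public boolean getBit(long bit) {
        return (get(bit >>> 6) & (1L << (bit & 63))) != 0;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
    }
}
```

Each page is small enough for the collector to place in whatever free gap it finds; the cost is one extra indirection per access.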

Fragmentation, 2

✤ Arena allocation for memtables
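
The arena idea can be sketched as follows. This is a hypothetical, single-threaded illustration under stated assumptions (1 MB regions, and values above 256 KB copied into their own buffer); the class and method names are invented for the example. Instead of leaving each memtable value as its own small object scattered through the heap, values are copied into a few large regions, which the old generation then holds, and eventually frees, as whole blocks once the memtable is flushed.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical arena ("slab") allocator: copies small values into large
 * fixed-size regions so the heap holds a few big blocks instead of many
 * small, scattered objects. Not thread-safe.
 */
public final class ArenaAllocator {
    private static final int REGION_SIZE = 1 << 20;             // assumption: 1 MB regions
    private static final int MAX_CLONED_SIZE = REGION_SIZE / 4; // assumption: bigger values get their own buffer

    private ByteBuffer currentRegion = ByteBuffer.allocate(REGION_SIZE);
    private int regionsAllocated = 1;

    /** Copies the remaining bytes of value into the arena; value's own position is left unchanged. */
    public ByteBuffer allocateCopy(ByteBuffer value) {
        int size = value.remaining();
        if (size > MAX_CLONED_SIZE) {
            // Packing very large values would waste most of a region.
            ByteBuffer copy = ByteBuffer.allocate(size);
            copy.put(value.duplicate());
            copy.flip();
            return copy;
        }
        if (currentRegion.remaining() < size) {
            // Start a new region; the old one stays alive while slices of it are referenced.
            currentRegion = ByteBuffer.allocate(REGION_SIZE);
            regionsAllocated++;
        }
        int start = currentRegion.position();
        currentRegion.put(value.duplicate());        // bump-pointer allocation
        ByteBuffer view = currentRegion.duplicate();
        view.limit(start + size);
        view.position(start);
        return view.slice();                         // view over just this value's bytes
    }

    public int regionsAllocated() { return regionsAllocated; }
}
```

A real memtable has many concurrent writers, so a production version would need to claim space in the current region atomically rather than with the plain position bump used here.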

(Memtables?)

[Diagram: the write path, drawn across two layers, Memory and Hard drive. A write such as write(k1, c1:v1) is appended to the commit log on disk and applied to the memtable in memory. Further writes, write(k1, c2:v), write(k2, c1:v c2:v) and write(k1, c1:v c3:v), are likewise logged and merged into the memtable row by row. A flush then writes the memtable to disk as an SSTable (k1 c1:v c2:v c3:v; k2 c1:v c2:v) plus an index, and the commit log is cleaned up.]

“Java is a memory hog”

✤ Large overhead for typical objects and collections
✤ How large?
✤ java.lang.instrument.Instrumentation

✤ JAMM: Java Agent for Memory Measurements
✤ https://github.com/jbellis/jamm
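
The Instrumentation instance is only handed to a Java agent, so measuring objects this way starts with a small agent class. A minimal sketch follows; the class name is hypothetical, and it assumes the class is packaged in a jar whose manifest declares Premain-Class: ObjectSizeAgent and that the JVM is started with -javaagent pointing at that jar. getObjectSize reports a shallow size (the object's header and fields, not the objects it references); JAMM, linked above, extends this kind of measurement to whole object graphs.

```java
import java.lang.instrument.Instrumentation;
import java.util.ArrayList;
import java.util.HashMap;

/** Hypothetical minimal agent exposing Instrumentation.getObjectSize. */
public final class ObjectSizeAgent {
    private static volatile Instrumentation instrumentation;

    // Invoked by the JVM before main() when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    public static boolean isLoaded() {
        return instrumentation != null;
    }

    /** Shallow size in bytes: header plus fields, excluding referenced objects. */
    public static long shallowSize(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) throw new IllegalStateException("start the JVM with -javaagent");
        return inst.getObjectSize(o);
    }

    public static void main(String[] args) {
        if (!isLoaded()) {
            System.out.println("agent not loaded; run with -javaagent");
            return;
        }
        System.out.println("Integer:         " + shallowSize(Integer.valueOf(42)));
        System.out.println("byte[16]:        " + shallowSize(new byte[16]));
        System.out.println("empty HashMap:   " + shallowSize(new HashMap<String, String>()));
        System.out.println("empty ArrayList: " + shallowSize(new ArrayList<String>()));
    }
}
```

The exact numbers depend on the JVM, pointer width and compressed-oops settings, which is precisely why measuring beats guessing.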

org.apache.cassandra.cache.SerializingCache

✤ Live objects are about 85% JVM bookkeeping, so the cache keeps rows serialized, off-heap
✤ org.apache.cassandra.cache.FreeableMemory, using reference counting
✤ Considering doing reference-counted, off-heap memtables as well
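
Reference counting is what makes explicitly freed memory safe to share between a cache and concurrent readers. The sketch below is a hypothetical class, not FreeableMemory itself, and it assumes a direct ByteBuffer as a stand-in for natively allocated memory: here "freeing" only drops the reference, whereas native memory would be released with an explicit free call. The key property is that reference() never resurrects an entry whose count has already reached zero, so a reader racing with eviction simply sees a miss.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted holder for off-heap memory. */
public final class RefCountedBuffer {
    // The creator (e.g. the cache) owns the initial reference.
    private final AtomicInteger references = new AtomicInteger(1);
    private volatile ByteBuffer memory;

    public RefCountedBuffer(int size) {
        memory = ByteBuffer.allocateDirect(size);
    }

    /** Tries to acquire a reference; returns false if the memory has already been freed. */
    public boolean reference() {
        while (true) {
            int n = references.get();
            if (n <= 0) return false;                         // freed: never resurrect
            if (references.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Releases one reference; releasing the last one frees the memory. */
    public void unreference() {
        int n = references.decrementAndGet();
        if (n == 0) free();
        else if (n < 0) throw new IllegalStateException("unreferenced more often than referenced");
    }

    /** Only valid while the caller holds a reference. */
    public ByteBuffer buffer() {
        ByteBuffer m = memory;
        if (m == null) throw new IllegalStateException("already freed");
        return m.duplicate();
    }

    public boolean isFreed() {
        return references.get() <= 0;
    }

    private void free() {
        // Stand-in for a native free(): drop the only pointer to the memory.
        memory = null;
    }
}
```

A reader would wrap each access as "if (entry.reference()) { try { read } finally { entry.unreference(); } }", and the cache calls unreference() exactly once when it evicts the entry.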

Don’t forget about young gen

✤ Young-gen collections are always stop-the-world (~100ms)
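
Young-generation pauses are easy to overlook because each one is short and they happen constantly. One standard way to keep an eye on them from inside the process is the java.lang.management garbage-collector beans, sketched below; the class name is hypothetical. Collector names depend on the GC in use (for example ParNew alongside CMS, or "G1 Young Generation" under G1), and getCollectionTime() is an approximate cumulative figure, so the average is a rough indicator rather than a pause histogram.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one summary line per collector known to this JVM. */
public final class GcPauseReport {
    public static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if the JVM does not track it
            long millis = gc.getCollectionTime();   // approximate, cumulative
            String avg = count > 0
                    ? String.format("%.1f ms/collection", (double) millis / count)
                    : "n/a";
            lines.add(gc.getName() + ": " + count + " collections, " + millis + " ms total, " + avg);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Churn through short-lived garbage so the young generation has work to do.
        long bytes = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] garbage = new byte[256];
            bytes += garbage.length;
        }
        System.out.println("allocated ~" + (bytes >> 20) + " MB");
        report().forEach(System.out::println);
    }
}
```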

Platform-specific code

✤ OS
✤ JVM

m[un]map

✤ Log-structured storage wants to remove old files post-compaction; some platforms (notably Windows) disallow deleting files that are still open or mapped

✤ Old workaround (pre-1.0): use PhantomReference to tell when an mmap’d file has been GC’d (and hence unmapped)
✤ Poor user experience and messy corner cases

✤ New workaround: invoke the buffer's cleaner directly, via Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner")
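
The cleaner trick can be packaged as a helper like the one below. It is a hedged sketch that relies on unsupported JDK internals, and the class name is hypothetical. On Java 8 and earlier it calls DirectBuffer.cleaner().clean() reflectively, as described above; on Java 9 and later, where sun.nio.ch is encapsulated, it assumes sun.misc.Unsafe.invokeCleaner(ByteBuffer) from the jdk.unsupported module instead. Touching a buffer after it has been unmapped can crash the JVM, which is exactly the segfault risk the platform was trying to avoid.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

/** Hypothetical helper that releases a direct or mmap'd buffer without waiting for GC. */
public final class Unmapper {
    /** Returns true if a cleaner ran. Never touch the buffer again afterwards. */
    public static boolean unmap(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) return false;
        try {
            // Java 9+: sun.misc.Unsafe.invokeCleaner(ByteBuffer).
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            invokeCleaner.invoke(theUnsafe.get(null), buffer);
            return true;
        } catch (NoSuchMethodException e) {
            return unmapViaDirectBuffer(buffer);    // older JVMs: the reflective cleaner() call
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }

    private static boolean unmapViaDirectBuffer(ByteBuffer buffer) {
        try {
            Method cleanerMethod = Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner");
            Object cleaner = cleanerMethod.invoke(buffer);
            if (cleaner == null) return false;      // e.g. a slice or duplicate, which owns no mapping
            cleaner.getClass().getMethod("clean").invoke(cleaner);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }
}
```

In a compaction, a caller would unmap every buffer over an obsolete data file and only then delete the file with Files.delete, which is what lets deletion succeed on platforms that refuse to remove mapped files.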

mmap part 2

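✤ 2GB limit: ByteBuffer is int-indexed, so a single mapping can't exceed Integer.MAX_VALUE bytes: public abstract byte get(int index)

✤ Workaround: MmappedSegmentedFile, which maps a file as multiple segments: public Iterator<DataInput> iterator(long position)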
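
The segmenting idea can be sketched like this. It is a simplified, hypothetical version: it cuts the file into fixed-size segments, maps each one separately, and translates a 64-bit position into a segment number plus an int offset. The class and method names are assumptions, and MmappedSegmentedFile itself exposes an iterator of DataInput rather than single-byte reads.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch: a file mapped as several segments so that it can be
 * larger than one int-indexed ByteBuffer allows. Every segment except the
 * last must be exactly segmentSize bytes long.
 */
public final class SegmentedMappedFile {
    private final ByteBuffer[] segments;
    private final long segmentSize;
    private final long length;

    public SegmentedMappedFile(ByteBuffer[] segments, long segmentSize) {
        this.segments = segments.clone();
        this.segmentSize = segmentSize;
        long total = 0;
        for (ByteBuffer segment : segments) total += segment.capacity();
        this.length = total;
    }

    /** Maps the whole file read-only, in segments of at most segmentSize bytes. */
    public static SegmentedMappedFile map(Path path, int segmentSize) throws IOException {
        if (segmentSize <= 0) throw new IllegalArgumentException("segmentSize must be positive");
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            int count = (int) ((size + segmentSize - 1) / segmentSize);
            ByteBuffer[] segments = new ByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = (long) i * segmentSize;
                segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, start,
                        Math.min(segmentSize, size - start));
            }
            // Closing the channel does not invalidate the mappings.
            return new SegmentedMappedFile(segments, segmentSize);
        }
    }

    public long length() { return length; }

    /** Reads one byte at a 64-bit position: pick the segment, then the int offset inside it. */
    public byte get(long position) {
        if (position < 0 || position >= length) throw new IndexOutOfBoundsException("position " + position);
        int segment = (int) (position / segmentSize);
        int offset = (int) (position % segmentSize);
        return segments[segment].get(offset);
    }
}
```

With segments capped at Integer.MAX_VALUE bytes, each individual ByteBuffer stays within the int-indexed API while the file as a whole can be arbitrarily large.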

link

✤ Used for snapshots
✤ Old workaround: JNA
✤ New workaround: supported directly by Java 7 (Files.createLink)
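
With Java 7, a hard link needs no native code: java.nio.file.Files.createLink(link, existing) creates one directly. A snapshot in this style might be sketched as below; the class name and directory layout are hypothetical. Linking is cheap because no data is copied, and it is safe because SSTables are immutable once written. Both paths must be on the same file system, and file systems without hard-link support make createLink throw UnsupportedOperationException.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: snapshot a data directory by hard-linking its files. */
public final class HardLinkSnapshot {
    /** Hard-links every regular file in dataDir into snapshotDir; returns the number of links made. */
    public static int snapshot(Path dataDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir)) {
            for (Path file : files) {
                // Skip subdirectories, including a snapshot directory nested under dataDir.
                if (!Files.isRegularFile(file)) continue;
                // New directory entry for the same underlying file; no bytes are copied.
                Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                linked++;
            }
        }
        return linked;
    }
}
```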

mlockall

✤ swappiness: pissing off database developers since 2001 (?)
✤ mlockall(MCL_CURRENT)

Low-level i/o

✤ posix_fadvise
✤ mincore/fincore
✤ fcntl

✤ ... JNA

A plug for JNA

✤ https://github.com/twall/jna

static { try { Native.register("c"); ...

private static native int mlockall(int flags) throws LastErrorException;

The fallacy of choosing portability over power

✤ Applets have been dead for years
✤ Python gets it right

✤ import readline (a standard-library module that is not available on every platform)

The fallacy of choosing safety over power

✤ Allowing munmap would expose developers to segfaults
✤ But relying on the GC to clean up external resources is a well-known antipattern
✤ File.close: files are closed explicitly, not left to the GC

✤ We need munmap badly enough that we resort to unnatural and unportable code to get it
✤ You haven’t kept us from risking segfaults, you’ve just made us miserable

Compatibility through obscurity?

✤ sun.misc.Unsafe
✤ Used by high-profile libraries like high-scale-lib
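
The pattern that libraries like high-scale-lib depend on looks roughly like this: obtain the Unsafe singleton through reflection on its private theUnsafe field, then compare-and-swap a field by its raw offset. This is a hypothetical, minimal counter for illustration only. None of it is a supported API, which is the point of the slide; since Java 9, java.lang.invoke.VarHandle offers the same compare-and-set through a public interface.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical lock-free counter built directly on sun.misc.Unsafe. */
public final class UnsafeCounter {
    private static final Unsafe UNSAFE;
    private static final long VALUE_OFFSET;

    static {
        try {
            // Unsafe.getUnsafe() rejects application callers, hence the reflective grab.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            VALUE_OFFSET = UNSAFE.objectFieldOffset(UnsafeCounter.class.getDeclaredField("value"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile int value;

    /** Classic CAS retry loop against the field's raw memory offset. */
    public int incrementAndGet() {
        while (true) {
            int current = value;
            if (UNSAFE.compareAndSwapInt(this, VALUE_OFFSET, current, current + 1)) {
                return current + 1;
            }
        }
    }

    public int get() {
        return value;
    }
}
```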

... even public options

http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card

Too negative?

Still true

✤ "Many concurrent algorithms are very easy to write with a GC and totally hard (to downright impossible) using explicit free." -- Cliff Click