Dealing with JVM limitations in Apache Cassandra
Jonathan Ellis / @spyced
Pain points for Java databases
✤ GC
✤ GC
✤ GC
Pain points for Java databases
✤ GC
✤ Platform-specific code
GC
✤ Concurrent and compacting: choose one
✤ G1
✤ Azul C4 / Zing?
Fragmentation
✤ Bloom filter arrays
✤ Compression offsets
Automatic mitigation?
✤ http://www.research.ibm.com/people/d/dfb/papers/Bacon03Controlling.pdf
✤ http://researcher.ibm.com/files/us-hirzel/pldi10-arraylets.pdf
Fragmentation, 2
✤ Arena allocation for memtables
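
The arena idea can be sketched roughly as follows. This is a minimal illustration and not Cassandra's actual allocator: the class name `SlabArena`, the method names, and the region size are all hypothetical. Small values are copied into a few large regions, so when a memtable is discarded the heap gets back whole regions instead of many scattered small holes.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of arena (slab) allocation; not Cassandra's real allocator.
final class SlabArena {
    private final int regionSize;
    private final List<ByteBuffer> regions = new ArrayList<>();
    private ByteBuffer current;

    SlabArena(int regionSize) {
        this.regionSize = regionSize;
    }

    // Copies the value into the current region and returns a slice over the copy.
    synchronized ByteBuffer copyOf(byte[] value) {
        if (value.length > regionSize) {
            // Oversized values get their own buffer rather than wasting a region.
            return ByteBuffer.wrap(value.clone());
        }
        if (current == null || current.remaining() < value.length) {
            // Current region is full: start a new one.
            current = ByteBuffer.allocate(regionSize);
            regions.add(current);
        }
        int start = current.position();
        current.put(value);
        ByteBuffer slice = current.duplicate();
        slice.position(start);
        slice.limit(start + value.length);
        return slice.slice();
    }

    synchronized int regionCount() {
        return regions.size();
    }
}
```

Dropping the arena (and every slice handed out from it) releases all regions together, which is the property that helps against fragmentation.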
(Memtables?)
[Diagram sequence: memory holds the memtable; the hard drive holds the commit log. Each write is appended to the commit log and applied to the memtable:
write(k1, c1:v)
write(k1, c2:v)
write(k2, c1:v c2:v)
write(k1, c1:v c3:v)
The memtable then holds k1: c1:v c2:v c3:v and k2: c1:v c2:v. A flush writes the memtable to disk as an SSTable with an index, followed by cleanup.]
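
The write path in the memtable diagrams could be modeled along these lines. This is a toy sketch under stated assumptions: `WritePathSketch` and its members are hypothetical names, in-memory lists stand in for the on-disk commit log and SSTables, each call writes a single column (the slides also show multi-column writes), and clearing the commit log on flush is only a guess at what "cleanup" covers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the write path in the diagram; names are illustrative.
final class WritePathSketch {
    // Stand-in for the on-disk commit log: an append-only list of records.
    final List<String> commitLog = new ArrayList<>();
    // Memtable: rows sorted by key, columns sorted by name within each row.
    private TreeMap<String, TreeMap<String, String>> memtable = new TreeMap<>();
    // Stand-in for flushed SSTables: sorted snapshots, treated as read-only.
    final List<Map<String, TreeMap<String, String>>> sstables = new ArrayList<>();

    void write(String key, String column, String value) {
        commitLog.add(key + " " + column + ":" + value); // log first, for durability
        memtable.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    Map<String, String> memtableRow(String key) {
        return memtable.getOrDefault(key, new TreeMap<>());
    }

    void flush() {
        sstables.add(memtable);      // the sorted memtable becomes an SSTable
        memtable = new TreeMap<>();  // start a fresh memtable
        commitLog.clear();           // assumed cleanup: flushed writes need no replay
    }
}
```

Replaying the four writes from the slides leaves row k1 with columns c1, c2 and c3 in the memtable until `flush()` moves it to the SSTable list.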
“Java is a memory hog”
✤ Large overhead for typical objects and collections
✤ How large?
✤ java.lang.instrument.Instrumentation
✤ JAMM: Java Agent for Memory Measurements
✤ https://github.com/jbellis/jamm
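
As a rough illustration of the `java.lang.instrument.Instrumentation` route, a measuring agent might look like the sketch below. `SizeAgent` and `shallowSizeOf` are hypothetical names and this is not JAMM's API. It assumes the class is packaged in a jar whose manifest names it as `Premain-Class`, and that the JVM is started with `-javaagent` pointing at that jar.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class; assumes it is named as Premain-Class in an agent jar manifest.
final class SizeAgent {
    private static volatile Instrumentation instrumentation;

    // Called by the JVM before main() when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size of one object: its header and fields, not the objects it references.
    static long shallowSizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("start the JVM with -javaagent to enable measurement");
        }
        return inst.getObjectSize(o);
    }
}
```

`getObjectSize` reports an implementation-specific approximation for a single object, so measuring a whole collection means walking its references and summing the results.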
org.apache.cassandra.cache.SerializingCache
✤ Live objects are about 85% JVM bookkeeping
✤ org.apache.cassandra.cache.FreeableMemory using reference counting
✤ Considering doing reference-counted, off-heap memtables as well
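
Reference counting for off-heap memory can be sketched as below. This is not Cassandra's `FreeableMemory`: `RefCountedMemory` and its methods are hypothetical, and a direct `ByteBuffer` stands in for natively allocated memory, so "freeing" here only drops the reference instead of calling a native free.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of reference-counted off-heap memory; not Cassandra's FreeableMemory.
final class RefCountedMemory {
    private final AtomicInteger references = new AtomicInteger(1); // the owner's reference
    private volatile ByteBuffer buffer;

    RefCountedMemory(int size) {
        buffer = ByteBuffer.allocateDirect(size); // allocated outside the Java heap
    }

    // Takes a reference; returns false if the memory has already been released.
    boolean reference() {
        while (true) {
            int n = references.get();
            if (n <= 0) return false;
            if (references.compareAndSet(n, n + 1)) return true;
        }
    }

    // Drops a reference; the last one out releases the memory.
    void unreference() {
        if (references.decrementAndGet() == 0) {
            buffer = null; // a real implementation would free native memory here
        }
    }

    ByteBuffer buffer() {
        ByteBuffer b = buffer;
        if (b == null) throw new IllegalStateException("memory already released");
        return b;
    }
}
```

A reader calls `reference()` before touching the buffer and `unreference()` afterwards, so an eviction that drops the owner's reference cannot pull the memory out from under a concurrent read.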
Don’t forget about young gen
✤ Always stop-the-world for ~100ms
Platform-specific code
✤ OS
✤ JVM
m[un]map
✤ Log-structured storage wants to remove old files post-compaction; some platforms disallow deleting open files
✤ Old workaround (pre-1.0):
  ✤ Use PhantomReference to tell when an mmap’d file has been GC’d (hence unmapped)
  ✤ Poor user experience and messy corner cases
✤ New workaround:
  ✤ Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner")
mmap part 2
✤ 2GB limit via ByteBuffer: public abstract byte get(int index)
✤ Workaround: MmappedSegmentedFile
  public Iterator<DataInput> iterator(long position)
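
The segmenting workaround can be sketched as follows. This is a simplified illustration, not Cassandra's `MmappedSegmentedFile`: the class `SegmentedMmap` and its single-byte `get(long)` accessor are hypothetical, whereas the real class exposes an iterator of `DataInput` as shown above. The file is mapped as several buffers that each fit within `int` indexing, and a `long` position is translated into a segment number plus an offset.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of splitting a large file into several mappings.
final class SegmentedMmap {
    private final MappedByteBuffer[] segments;
    private final long segmentSize;

    SegmentedMmap(Path file, long segmentSize) throws IOException {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segment size must fit in an int");
        }
        this.segmentSize = segmentSize;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long length = channel.size();
            int count = (int) ((length + segmentSize - 1) / segmentSize);
            segments = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long offset = i * segmentSize;
                long size = Math.min(segmentSize, length - offset);
                // Mappings stay valid after the channel is closed.
                segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
            }
        }
    }

    // Reads one byte at a long file position by picking the segment that holds it.
    byte get(long position) {
        int segment = (int) (position / segmentSize);
        int offset = (int) (position % segmentSize);
        return segments[segment].get(offset);
    }
}
```

A multi-byte read that straddles a segment boundary needs extra handling, which this sketch leaves out.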
link
✤ Used for snapshots
✤ Old workaround: JNA
✤ New workaround: supported directly by Java7
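
With the Java 7 file API, a hard-link snapshot could look roughly like this. `SnapshotLinks` and `snapshot` are hypothetical names, and the sketch assumes the filesystem supports hard links and that both paths live on the same volume.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical snapshot helper built on the Java 7 java.nio.file API.
final class SnapshotLinks {
    // Hard-links a data file into the snapshot directory; no data is copied.
    static Path snapshot(Path dataFile, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        Path link = snapshotDir.resolve(dataFile.getFileName());
        return Files.createLink(link, dataFile);
    }
}
```

Because the link shares the file's data blocks, deleting the original file later leaves the snapshot readable.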
mlockall
✤ swappiness: pissing off database developers since 2001 (?)
✤ mlockall(MCL_CURRENT)
Low-level i/o
✤ posix_fadvise
✤ mincore/fincore
✤ fcntl
✤ ... JNA
A plug for JNA
✤ https://github.com/twall/jna
static { try { Native.register("c"); ...
private static native int mlockall(int flags) throws LastErrorException;
The fallacy of choosing portability over power
✤ Applets have been dead for years
✤ Python gets it right
  ✤ import readline
The fallacy of choosing safety over power
✤ Allowing munmap would expose developers to segfaults
✤ But, relying on the GC to clean up external resources is a well-known antipattern
  ✤ File.close
✤ We need munmap badly enough that we resort to unnatural and unportable code to get it
✤ You haven’t kept us from risking segfaults, you’ve just made us miserable
Compatibility through obscurity?
✤ sun.misc.Unsafe
✤ Used by high-profile libraries like high-scale-lib
... even public options
http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
Too negative?
Still true
✤ "Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free." -- Cliff Click