WORK WITH MULTIPLE HOT TERABYTES IN JVMS
PER MINBORG, @PMINBORG
CTO, SPEEDMENT, INC.
See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

IMC Summit 2016 Breakout - Per Minborg - Work with Multiple Hot Terabytes in JVMs

ABOUT PER

SCENARIO

[Diagram labels: >1 TB, Application, Source of Truth, In-JVM-Cache, In-Memory Solution]

Web Shop, Stock Trade, Bank, Machine learning, etc.

PROS OF IN-MEMORY

• Improved performance
• Consistent performance
• Cost reduction (server, AWS and licenses)

CHALLENGES OF IN-MEMORY

• Optimized Speed
• Cost and size of Memory
• Consistency, Restart, DB impact, etc.
• Organization and size of JVMs

OPTIMIZED SPEED

• No matter how advanced a database you use, it is data locality that really counts
• Eventually, memory will cost less than x $/GB (pick any x)

LATENCIES USING THE SPEED OF LIGHT

• Database query (1 s)
• Disk Seek - LA
• TCP (DC) - SJ
• SSD - Oakland
• Main Memory
• CPU L3 Cache
• CPU L2 Cache
• CPU L1 Cache
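
The slides relate each latency to a distance. One way to make that analogy concrete is to compute how far light travels in a vacuum during a single operation. The Java sketch below does this. Only the 1 s database query comes from the slides; the other latencies are rough orders of magnitude assumed here for illustration and do not try to reproduce the cities named above. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LightDistance {

    // Speed of light in a vacuum, in meters per second (exact by definition).
    static final double SPEED_OF_LIGHT_M_PER_S = 299_792_458.0;

    // Distance in meters that light covers during the given latency.
    static double metersTravelled(double latencySeconds) {
        return SPEED_OF_LIGHT_M_PER_S * latencySeconds;
    }

    public static void main(String[] args) {
        // Only the database query latency is from the slides; the rest are assumed.
        Map<String, Double> latencies = new LinkedHashMap<>();
        latencies.put("Database query", 1.0);
        latencies.put("Disk seek", 10e-3);     // assumed, about 10 ms
        latencies.put("Main memory", 100e-9);  // assumed, about 100 ns
        latencies.put("CPU L1 cache", 1e-9);   // assumed, about 1 ns

        latencies.forEach((name, seconds) ->
            System.out.printf("%-15s %,18.2f m%n", name, metersTravelled(seconds)));
    }
}
```

With these assumed numbers, a one-second query corresponds to roughly 300,000 km, while an L1 cache hit corresponds to about 30 cm.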

HOW MUCH DOES 1 GB COST?

BACK TO THE FUTURE

$ 5

$ 0.04

$ 720,000

$ 67,000,000,000

Source: http://www.jcmit.com/memoryprice.htm

CACHE SYNCHRONIZATION STRATEGIES

DUMP AND LOAD
• Dumps are reloaded periodically
• All data elements are reloaded
• Data remains unchanged between reloads
• System restart is just a reload

POLL
• Data is evicted, refreshed or marked as old
• Evicted elements are reloaded
• Data changes all the time
• System restart either warms up the cache or uses a cold cache
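
The poll strategy can be sketched as a map whose entries are reloaded once they exceed a maximum age. This is a minimal illustration under assumptions, not code from the talk: `PollCache`, its `loader` function (standing in for a database lookup) and the injectable `clock` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class PollCache<K, V> {

    // A cached value together with the time it was loaded.
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // hypothetical stand-in for a database lookup
    private final long maxAgeMillis;      // the "eviction time"
    private final LongSupplier clock;     // injectable clock, which eases testing

    PollCache(Function<K, V> loader, long maxAgeMillis, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    // Returns the cached value, reloading it when it is missing or too old.
    // Concurrent callers may both reload the same key; the sketch accepts that.
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMillis >= maxAgeMillis) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

A cache built with `new PollCache<>(db::lookup, 60_000, System::currentTimeMillis)` (where `db::lookup` is again hypothetical) would serve data that is at most one minute old, and every reload after expiry costs one database round trip.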

REACTIVE PERSISTENT CACHING
• Changed data is captured in the database
• Changed data events are pushed into the cache
• Events are grouped in transactions
• Cache updates are persisted
• Data changes all the time
• On system restart, the missed events are replayed
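
The reactive approach can be sketched as a cache that applies change events one transaction at a time and remembers the last transaction it has seen, so that a restart only needs to replay what was missed. This is an illustration under assumptions, not Speedment's implementation: all class names are hypothetical, the event log is an in-memory list, and persistence of the cache is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReactiveCache {

    enum Op { INSERT, UPDATE, DELETE }

    // One captured row change; the value is ignored for DELETE.
    static final class Change {
        final Op op;
        final String key;
        final String value;

        Change(Op op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    // Changes committed together, tagged with a monotonically increasing id.
    static final class Transaction {
        final long id;
        final List<Change> changes;

        Transaction(long id, List<Change> changes) {
            this.id = id;
            this.changes = new ArrayList<>(changes);
        }
    }

    private final Map<String, String> data = new HashMap<>();
    private long lastAppliedId = 0;  // a real cache would persist this with the data

    // Applies a whole transaction; transactions already seen are skipped.
    synchronized void apply(Transaction tx) {
        if (tx.id <= lastAppliedId) {
            return;
        }
        for (Change change : tx.changes) {
            if (change.op == Op.DELETE) {
                data.remove(change.key);
            } else {
                data.put(change.key, change.value);
            }
        }
        lastAppliedId = tx.id;
    }

    // After a restart, replays the buffered log in order; only missed transactions take effect.
    synchronized void replay(List<Transaction> log) {
        for (Transaction tx : log) {
            apply(tx);
        }
    }

    synchronized String get(String key) {
        return data.get(key);
    }

    synchronized long lastAppliedId() {
        return lastAppliedId;
    }
}
```

Because each transaction is applied under one lock, readers never observe a half-applied transaction, and the restart cost depends on how many changes were missed rather than on the total data size.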

COMPARISON

                             Dump and Load Caching    Poll Caching                                  Reactive Persistent Caching
Max Data Age                 Dump period              Eviction time                                 Replication latency (ms)
Lookup Performance           Consistently instant     ~20% slow                                     Consistently instant
Consistency                  Eventually consistent    Inconsistent (stale data)                     Eventually consistent
Database Cache Update Load   Total size               Depends on eviction time and access pattern   Rate of change
Restart                      Complete reload          Eviction time                                 Down time update rate -> 10% of down time

BIG JVMS WITH TERABYTES OF DATA

Scale Up
• One large JVM handles all data
• Map memory to (SSD backed) files
• Several JVMs can share data via the file system
• Instant restart

Scale Out
• Have several JVMs in a network
• Use sharding between nodes
• Redundant nodes
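
Mapping memory to a file, as in the scale-up list, can be tried with the standard `FileChannel.map` call. The sketch below is a small illustration rather than anything from the talk: the class `MappedCounter` is hypothetical, and it maps only eight bytes to show that a value written through the mapping is still there when the file is mapped again, which is the basis for a fast restart.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedCounter {

    // Maps the first 8 bytes of the file and increments the long stored there.
    static long increment(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            long next = buffer.getLong(0) + 1;
            buffer.putLong(0, next);
            buffer.force();  // ask the OS to write the change to the backing device
            return next;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-counter", ".bin");
        file.toFile().deleteOnExit();
        System.out.println(increment(file));  // first mapping of the file
        System.out.println(increment(file));  // a fresh mapping sees the earlier value
    }
}
```

A single `MappedByteBuffer` cannot exceed `Integer.MAX_VALUE` bytes, so a terabyte-sized file would have to be covered by many mappings or by a library that manages them.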

CONVENTIONAL JAVA APPLICATIONS

• Java objects live on the heap and are garbage collected periodically
• Garbage collection times increase with the Java heap size
• Garbage collection times increase with the Java heap mutation rate
• “The app has hit the GC wall”
• Hard to meet reasonable SLAs with JVMs larger than about 16 GB
• 10 TB of data and 10 GB JVMs -> ~1000 JVMs

OFF HEAP STORAGE

• Stores data outside of the Java heap
• The Garbage Collector does not see the content
• Scales up to terabytes of main memory in a single JVM
• Use any number of nodes for scale out solutions
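
The simplest standard way to keep data outside the Java heap is a direct `ByteBuffer`. The sketch below wraps one as a fixed-size array of `long` values; `OffHeapLongArray` is a hypothetical name, and the off-heap stores mentioned in this talk may well use other mechanisms.

```java
import java.nio.ByteBuffer;

final class OffHeapLongArray {

    private final ByteBuffer buffer;

    OffHeapLongArray(int length) {
        // Direct buffers are allocated outside the garbage collected heap.
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(length, Long.BYTES));
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }

    void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value);
    }

    int length() {
        return buffer.capacity() / Long.BYTES;
    }
}
```

The collector only tracks the small `ByteBuffer` object, not the bytes behind it. One direct buffer is limited to `Integer.MAX_VALUE` bytes, and the total is capped by the `-XX:MaxDirectMemorySize` setting, so reaching terabytes takes many buffers or memory-mapped files.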

PERSISTENT SCALE OUT CACHE

• Persists data in files or memory mapped files
• SSD backing device recommended
• 1.3 GB/s reload per node: 10 GB in 6 s, 100 GB in 1 min, 1 TB in 10 min
• 6.5 GB/s reload in a system with 10 nodes (1 active and 1 backup): 10 GB in 1 s, 100 GB in 12 s, 1 TB in 2 min
• 65 GB/s reload in a system with 100 nodes: 1 TB in 12 s
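
The reload figures follow from dividing data size by reload rate. The sketch below redoes that arithmetic under two assumptions that are not stated on the slide: 1 TB is taken as 1000 GB, and "1 active and 1 backup" is read as half of the nodes being active. The class name is hypothetical.

```java
final class ReloadTime {

    static final double NODE_RELOAD_GB_PER_S = 1.3;  // per-node rate from the slide

    // Aggregate rate when every active node is paired with one backup node (assumed reading).
    static double systemRateGbPerS(int totalNodes) {
        int activeNodes = totalNodes / 2;
        return activeNodes * NODE_RELOAD_GB_PER_S;
    }

    // Seconds needed to reload dataGb at the given rate.
    static double reloadSeconds(double dataGb, double rateGbPerS) {
        return dataGb / rateGbPerS;
    }

    public static void main(String[] args) {
        double[] sizesGb = {10, 100, 1000};
        for (double size : sizesGb) {
            System.out.printf("1 node, %.0f GB: %.1f s%n",
                size, reloadSeconds(size, NODE_RELOAD_GB_PER_S));
        }
        for (int nodes : new int[] {10, 100}) {
            double rate = systemRateGbPerS(nodes);
            System.out.printf("%d nodes at %.1f GB/s, 1000 GB: %.1f s%n",
                nodes, rate, reloadSeconds(1000, rate));
        }
    }
}
```

The plain division reproduces the 6.5 GB/s and 65 GB/s aggregate rates. The resulting times come out somewhat higher than the slide's figures (for example about 7.7 s rather than 6 s for 10 GB on one node), so the slide values are best read as rounded estimates.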

COMPRESSED OOPS IN JAVA 8

Using -XX:+UseCompressedOops (on by default) together with -XX:ObjectAlignmentInBytes=16 (the default alignment is 8)

A 64-bit JVM can use “compressed” memory references. This allows the heap to be up to 64 GB without the overhead of 64-bit object references. As all objects must be 8- or 16-byte aligned, the lower 3 or 4 bits of the address are always zero and don't need to be stored. With 16-byte alignment, this allows the heap to reference 4 billion * 16 bytes, or 64 GB.

Uses 32-bit references.
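
The heap limit quoted here is the number of values a 32-bit reference can take multiplied by the object alignment. A short sketch of that arithmetic, with a hypothetical class name:

```java
final class CompressedOops {

    // Largest heap addressable by a 32-bit reference scaled by the object alignment.
    static long maxHeapBytes(int objectAlignmentInBytes) {
        return (1L << 32) * objectAlignmentInBytes;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        System.out.println(maxHeapBytes(8) / gib);   // prints 32
        System.out.println(maxHeapBytes(16) / gib);  // prints 64
    }
}
```

With 8-byte alignment the limit is 32 GB, and 16-byte alignment doubles it to 64 GB at the price of more padding per object.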

JVM SIZE SWEET SPOT

• 50 GB off heap per node
• 20 nodes per terabyte
• 40 nodes per terabyte with minimum redundancy
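
The node counts are simple divisions. The sketch below reproduces them, reading "minimum redundancy" as two copies of the data and 1 TB as 1000 GB; both readings are assumptions, and the class name is hypothetical.

```java
final class ClusterSizing {

    // Nodes needed to hold dataGb when each node stores nodeGb and the data is kept in 'copies' copies.
    static long nodesNeeded(long dataGb, long nodeGb, int copies) {
        long perCopy = (dataGb + nodeGb - 1) / nodeGb;  // ceiling division
        return perCopy * copies;
    }

    public static void main(String[] args) {
        System.out.println(nodesNeeded(1_000, 50, 1));   // prints 20
        System.out.println(nodesNeeded(1_000, 50, 2));   // prints 40
        System.out.println(nodesNeeded(10_000, 10, 1));  // prints 1000
    }
}
```

The last line is the conventional on-heap case of 10 TB spread over 10 GB JVMs, which shows why larger off-heap nodes reduce the number of JVMs to operate.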

CONCLUSIONS

• Get speed by keeping your data close to the application
• RAM is cheap, getting bigger and ever cheaper
• Consistent solution with Reactive Persistent Caching
• Reactive Persistent Caching imposes minimum load on restart and on the DB
• Scale up solutions can be in the terabytes with virtual memory or file mapped memory
• Scale out solutions can use nodes of around 50 GB

SOLUTION

[Diagram labels: >1 TB, Application, In-JVM-Cache, Source of Truth]

Web Shop, Stock Trade, Bank, Machine learning, etc.

SPEEDMENT

• Java Application Development Tool
• In-JVM-memory cache
• Database SQL Reflector (CDC, Change Data Capture)
• Pluggable storage engines (Speedment, Chronicle Map, Hazelcast, GridGain, etc.)
• Code generation tool -> Automatic domain model extraction from databases
• Transaction-aware

SPEEDMENT SCALE UP ULTRA-LOW LATENCY CACHE

• Ultra-low latency (runs in the same JVM as the application)
• Millions of TPS
• Latencies measured in microseconds
• Supports file mapping
• Terabytes of data
• O(1) for equality operations
• O(log(N)) for other operations
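
The two complexity classes match what a hash index and a sorted index provide. The sketch below uses the standard `HashMap` and `TreeMap` to illustrate the difference; it makes no claim about how Speedment stores its data, and `TwoIndexStore` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class TwoIndexStore {

    private final Map<Long, String> hashIndex = new HashMap<>();
    private final NavigableMap<Long, String> sortedIndex = new TreeMap<>();

    void put(long id, String value) {
        hashIndex.put(id, value);
        sortedIndex.put(id, value);
    }

    // Equality lookup: expected O(1) through the hash index.
    String findEqual(long id) {
        return hashIndex.get(id);
    }

    // Range lookup: O(log N) to locate the start in the sorted index, plus the result size.
    List<String> findBetween(long fromInclusive, long toExclusive) {
        return new ArrayList<>(
            sortedIndex.subMap(fromInclusive, true, toExclusive, false).values());
    }
}
```

Keeping both indexes costs extra memory per entry, which is the usual trade for serving both equality and range queries quickly.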

SPEEDMENT SQL REFLECTOR

• Detects changes in a database
• Buffers the changes
• Can replay the changes later on
• Will preserve order
• Will preserve transactions
• Sees data as it was persisted
• Detects changes from any source

[Diagram labels: Database; INSERT, UPDATE, DELETE]

DOWNLOAD TRIAL @ WWW.SPEEDMENT.COM

CONNECT TO YOUR EXISTING SQL DB

AUTOMATIC SCHEMA ANALYSIS

PUSH AND PLAY

OFFERINGS

• Complete solutions for in-memory hot big data
• Software licenses
• Service and support
• Consulting