Traditionally, big data is mostly read from disks and then processed. However, most big data systems are latency bound, which means the CPU often sits idle waiting for data to arrive. This problem is most prevalent in use cases like graph searches that need to randomly access different parts of a dataset. In-memory computing proposes an alternative model where data is loaded into or kept in memory and processed there, instead of being processed from disk. Although such designs cost more in terms of memory, the resulting systems can be orders of magnitude faster (e.g. 1000X), which can lead to savings in the long run. With rapidly falling memory prices, this cost difference is shrinking by the day. Furthermore, in-memory computing can enable use cases, like ad hoc analysis over a large dataset, that were not possible earlier. This talk will provide an overview of in-memory technology and discuss how WSO2 technologies like complex event processing can be used to build in-memory solutions. It will also provide an overview of upcoming improvements in the WSO2 platform.
In-Memory Computing
Srinath Perera Director, Research
WSO2 Inc.
Performance Numbers (based on Jeff Dean’s numbers)

Operation                            Mem Ops / Sec    If memory access were a second
L1 cache reference                   0.05             1/20th sec
Main memory reference                1                1 sec
Send 2K bytes over 1 Gbps network    200              3 min
Read 1 MB sequentially from memory   2,500            41 min
Disk seek                            1*10^5           27 hours
Read 1 MB sequentially from disk     2*10^5           2 days
Send packet CA->Netherlands->CA      1.5*10^6         17 days

Operation               Speed (MB/sec)
Hadoop Select           3
Terasort benchmark      18
Complex Query (Hadoop)  0.2
CEP                     60
CEP Complex             2.5
SSD                     300-500
Disk                    50-100
Most Big Data Apps are Latency-Bound!
Often, your app wastes CPU cycles waiting for data to arrive
Latency Lags Bandwidth
• Observation in Prof. Patterson’s 2004 keynote
• Bandwidth improves, but latency does not
• The same holds now, and the gap is widening with new systems
Handling Speed Differences in the Memory Hierarchy
1. Caching
   – E.g. processor caches, file cache, disk cache, permission cache
2. Replication
   – E.g. RAID, Content Distribution Networks (CDN), web caches
3. Prediction
   – Predict what data will be needed and prefetch it
   – Trades bandwidth for latency
   – E.g. disk caches, Google Earth
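Technique 1, caching, can be sketched in a few lines. The following is an illustrative LRU cache in plain Python, the same idea a processor cache or file cache uses to keep a small hot working set close to the CPU; it is a toy, not how any real cache is implemented.

```python
# Illustrative LRU cache: keeps the most recently used items, evicts the
# least recently used when capacity is exceeded.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None             # cache miss -> caller must go to slow storage
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # capacity exceeded -> evicts "b"
print(cache.get("b"))  # None -- a miss
```

The eviction step is exactly why caching only helps when the working set fits: once live data exceeds capacity, every access becomes a miss.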
The Above Three Do Not Always Work
• Limitations
  – Caching works only if the working set is small
  – Prefetching works only when access patterns are predictable
  – Replication is expensive and limited by the receiving machines
• Suppose you are reading and filtering 10 GB of data (at 6 bytes per record, about 1.7 billion records)
  – ~3 minutes to read the data from disk
  – 35 ms to filter a chunk on my laptop => about 1 minute to process all the data
  – Keeping the data in memory can therefore give about a 30X speedup
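The arithmetic behind that estimate can be checked back-of-envelope style. This sketch assumes ~50 MB/s sequential disk throughput (the Disk row in the table above) and the slides' figure of 35 ms to filter one 60 MB chunk in memory; with those assumptions the disk-bound pipeline is roughly 30X slower than the CPU-bound one.

```python
# Back-of-envelope: time to filter 10 GB when bound by disk reads vs.
# when the data is already in memory. Numbers are the slides' assumptions.
DATA_GB = 10
CHUNK_MB = 60
DISK_MB_PER_SEC = 50          # sequential disk throughput
FILTER_SEC_PER_CHUNK = 0.035  # 35 ms to filter one 60 MB chunk in memory

chunks = DATA_GB * 1024 / CHUNK_MB

disk_read_sec = DATA_GB * 1024 / DISK_MB_PER_SEC  # pipeline bound by disk reads
in_memory_sec = chunks * FILTER_SEC_PER_CHUNK     # pipeline bound by CPU only

speedup = disk_read_sec / in_memory_sec
print(f"disk-bound: {disk_read_sec:.0f} s, "
      f"in-memory: {in_memory_sec:.1f} s, speedup ~{speedup:.0f}X")
```

With these inputs the disk pass takes about 205 seconds versus about 6 seconds of pure filtering, i.e. a speedup in the mid-30s, consistent with the slide's "about 30X".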
Data Access Patterns in Big Data Applications
• Read from disk, process once (basic analytics)
  – Data can be prefetched; batch loading is only about 100 times faster
  – OK if processing time > data read time
• Read from disk, process iteratively (machine learning algorithms, e.g. k-means)
  – Need to load data from disk once and process it repeatedly (e.g. Spark supports this)
• Interactive (OLAP)
  – Queries are random and data may be scattered; once a query has started, the data can be loaded into memory and processed
• Random access (e.g. graph processing)
  – Very hard to optimize
• Realtime access
  – Process data as it arrives
In-Memory Computing
Four Myths
• Myths
  – It is too expensive: a 1TB RAM cluster costs 20-40k (about 1$/GB)
  – It is not durable
  – Flash is fast enough
  – It is only about in-memory DBs
• From Nikita Ivanov’s post
  – http://gridgaintech.wordpress.com/2013/09/18/four-myths-of-in-memory-computing/
Let us look at each big data access pattern and where in-memory computing can make a difference.
Access Pattern 1: Read from Disk, Process Once
• If Tp = 35 ms vs. Td = 1.2 sec with 60 MB chunks, keeping all data in memory gives about a 30X speedup
• However, this benefit shrinks as the computation gets more complex (e.g. sort)
Access Pattern 2: Read from Disk, Process Iteratively
• Very common pattern for machine learning algorithms (e.g. k-means)
• In this case, the advantages are greater
  – If we cannot hold the data fully in memory, we must offload it and read it again on every pass
  – The cost of repeatedly loading and processing is very high, so in-memory computing is much faster
• Spark lets you load the data fully into memory and process it there
Spark
• New programming model built on functional programming concepts
• Can be much faster for iterative use cases
• Has a complete stack of products

file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
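For readers unfamiliar with the Spark operators, the snippet above is a word count. The following plain-Python sketch computes the same thing without a cluster, just to show what flatMap (split lines into words), map (pair each word with 1), and reduceByKey (sum counts per word) add up to; it is illustrative only and has none of Spark's distribution or fault tolerance.

```python
# Plain-Python equivalent of the Spark word-count pipeline above.
from collections import Counter

lines = ["to be or not to be", "to see or not to see"]

# flatMap: split every line into words, flattening into one list
words = [word for line in lines for word in line.split(" ")]

# map + reduceByKey: pair each word with 1 and sum the counts per word
counts = Counter(words)

print(counts["to"])  # 4: each sample line contributes two "to"s
```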
Access Pattern 3: Interactive Queries
• Need to be responsive: < 10 sec
• Harder to predict what data is needed
• Queries tend to be simpler
• Can be made faster by a RAM cloud
  – SAP Hana
  – VoltDB
• With smaller queries, disk may still be OK; Apache Drill is an alternative
VoltDB Story
• The VoltDB team (Michael Stonebraker et al.) observed that 92% of the work in a DB is related to disk
• By building a completely in-memory database cluster, they made it 20x faster!
Distributed Cache (e.g. Hazelcast)
• Stores the data partitioned and replicated across many machines
• Used as a cache that spans multiple machines
• Key-value access
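The partition-and-replicate idea can be sketched as hashing each key to a primary node plus backups. This is a deliberately minimal illustration; real data grids such as Hazelcast use consistent hashing, partition tables, and migration logic, none of which is shown here.

```python
# Minimal sketch: assign each key to a primary node and one backup by
# hashing. Node names and replica count are made up for illustration.
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]
REPLICAS = 2  # primary plus one backup copy

def owners(key: str) -> list:
    """Return the nodes responsible for a key: primary, then backups."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    primary = h % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICAS)]

print(owners("user:42"))  # two distinct nodes, deterministic per key
```

Because the mapping is deterministic, any client can compute where a key lives without asking a coordinator, which is what makes key-value access across many machines cheap.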
Access Pattern 4: Random Access
• E.g. graph traversal
• This is the hardest use case
• In easy cases there is a small working set that a cache can serve (e.g. checking users against a blacklist); that is not the case for most graph operations like traversal
• In the hard cases, in-memory computing is the only real solution
• Can be 1000x faster or more
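Graph traversal shows why this pattern resists caching and prefetching: a breadth-first search touches neighbor lists in an order that depends on the graph itself, so each hop is effectively a random access. A toy sketch, with a made-up four-node graph held entirely in memory:

```python
# BFS over an in-memory adjacency list. Each graph[node] lookup is a
# random access; on disk, every such hop could cost a seek.
from collections import deque

graph = {  # toy graph, entirely in memory
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def bfs(start: str) -> list:
    """Return nodes in breadth-first visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:  # unpredictable access pattern
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

print(bfs("A"))  # ['A', 'B', 'C', 'D']
```

At memory speed each hop is ~100 ns; at disk-seek speed it is ~10 ms, which is where the 1000x-or-more figure for in-memory graph processing comes from.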
Access Pattern 5: Realtime Processing
• This is already an in-memory technology, using tools like complex event processing (e.g. WSO2 CEP) or stream processing (e.g. Apache Storm)
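The essence of this pattern is that state lives in memory and events are processed as they arrive, never touching disk. A minimal sketch of one such query, a sliding-window average, of the kind a CEP engine or Storm topology would evaluate continuously (the window size and values are made up):

```python
# Sliding-window average over a stream of events, all state in memory.
from collections import deque

WINDOW = 3
window = deque(maxlen=WINDOW)  # keeps only the last WINDOW events

def on_event(value: float) -> float:
    """Ingest one event and return the current windowed average."""
    window.append(value)
    return sum(window) / len(window)

for v in [10, 20, 30, 40]:
    print(on_event(v))  # 10.0, 15.0, 20.0, 30.0
```

Because the state is a fixed-size in-memory window, each event costs a few arithmetic operations, which is how such systems sustain very high event rates.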
Faster Access to Data
• In-memory databases (e.g. VoltDB, MemSQL)
  – Provide the same SQL interface
  – Can be thought of as a fast database
  – VoltDB has been shown to be about 20X faster than MySQL
• Distributed cache
  – Can be integrated as a large cache
Load the Data Set into Memory and Analyze
• Used with interactive and random-access use cases
• Can be up to 1000x faster for some use cases
• Tools
  – Spark
  – Hazelcast
  – SAP Hana
Realtime Processing
• Realtime analytics tools
  – CEP (e.g. WSO2 CEP)
  – Stream processing (e.g. Storm)
• Can generate results within a few milliseconds to seconds
• Can process tens of thousands to millions of events per second
• Not all algorithms can be implemented this way
In-Memory Computing with the WSO2 Platform
Thank You