SSDs, IMDGs and All the Rest

A short intro into how SSDs are powering the data revolution

Uri Cohen, Head of Product @ GigaSpaces (@uri1803)

#jaxlondon 2014


The Data Processing Hierarchy

But Data Amounts Just Keep Growing

But We Have a Performance Gap

In-Memory Computing to the Rescue?

Not enough anymore…

• Average GigaSpaces XAP cluster size grew 5-10 fold since 2008

• We’re in the realm of terabytes, not gigabytes

SSD to Save the Day!


(It Actually Looks More Like This)


Some Numbers

Level            Random Access Time            Typical Size
Registers        instantaneous                 under 1KB
Level 1 Cache    1-3 ns                        64KB per core
Level 2 Cache    3-10 ns                       256KB per core
Level 3 Cache    10-20 ns                      2-20 MB per chip
Main Memory      30-60 ns                      4-32 GB per system
SSD              under 1,000,000 ns (< 1 ms)   128GB - 2TB
Hard Disk        3,000,000-10,000,000 ns       over 1TB
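To put those numbers in perspective, here is a quick back-of-the-envelope comparison. The figures are lifted from the table above; the cache values are midpoints and the SSD/HDD values are coarse bounds, not measurements.

```python
# Rough latency figures (ns) taken from the table above; cache values
# are midpoints and SSD/HDD values are coarse bounds, not measurements.
latency_ns = {
    "L1 cache": 2,
    "L2 cache": 6,
    "L3 cache": 15,
    "DRAM": 45,
    "SSD": 1_000_000,       # upper bound from the table: under 1 ms
    "HDD": 6_500_000,
}

for level, ns in latency_ns.items():
    ratio = ns / latency_ns["L1 cache"]
    print(f"{level}: ~{ratio:,.0f}x an L1 cache hit")
```

Note where the big cliffs are: SSD is four orders of magnitude slower than DRAM, but still roughly 6x faster than spinning disk at random access.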

Performance Is All the Rage

http://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/

Is It All Roses and Daisies?

Step Back: How SSDs Work

The Foundation - NAND Chips

NAND Traits

• Space-efficient (takes ~60% less space than NOR); effectively only NAND is used commercially

• Can only read and write whole pages, 4096 or 8192 bytes at a time; modern file systems work this way anyway (but keep that in mind for later)

• Limited life span (5K-10K write/erase cycles), so the load needs to be distributed evenly across all blocks

• A page cannot be updated "in place"; so why not delete it and write a new one instead? Because you can only erase whole blocks

Typical Update Cycle

• Updating 4096 bytes (or fewer) of data can result in 2MB of data moving around on the SSD

• This phenomenon is called Write Amplification
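The 4KB-update-moves-2MB arithmetic can be sketched directly. The block geometry below is an assumption for illustration; real drives vary.

```python
# Toy NAND geometry (an assumption; real drives vary): 4KB pages,
# 512 pages per 2MB erase block.
PAGE_BYTES = 4096
PAGES_PER_BLOCK = 512
BLOCK_BYTES = PAGE_BYTES * PAGES_PER_BLOCK   # 2 MiB

def write_amplification(user_bytes: int) -> float:
    """Worst case without a smarter FTL: the whole erase block holding
    the updated page is read, erased and rewritten."""
    return BLOCK_BYTES / user_bytes

print(write_amplification(4096))   # 512.0: a 4KB update moves 2MB
```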

Controllers to the Rescue

Write Caching

Garbage Collection (Grrrrrr….)

Compacts fragmented disk blocks, but at a performance cost:

• Modern SSDs try to do this in the background...

• When no empty blocks are available, GC must be done before ANY write can go through
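A minimal sketch of what the collector does, assuming a toy FTL where each used block just tracks valid vs. stale pages (all names here are invented):

```python
# Toy FTL garbage collector (all names invented): each used block is a
# (valid_pages, stale_pages) pair; GC picks the block with the fewest
# valid pages, since those are the pages it must copy elsewhere.
free_blocks = []
used_blocks = [(1, 3), (3, 1), (0, 4)]

def gc_one_block() -> int:
    victim = min(used_blocks, key=lambda blk: blk[0])
    used_blocks.remove(victim)
    free_blocks.append("erased")     # block returns to the free pool
    return victim[0]                 # pages that had to be rewritten

moved = gc_one_block()
print(moved, len(free_blocks))       # 0 1: the all-stale block is free
```

The cost of a GC pass is exactly those copied valid pages, which is why a nearly-full drive (few all-stale blocks) gets slow.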

Striping

Wear Leveling

A bag of techniques the controller uses to keep all of the flash cells at roughly the same level of use
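The simplest technique in that bag can be sketched in a few lines: when an erased block is needed, hand out the one with the fewest erase cycles so far (block names and counts are made up):

```python
# The simplest wear-leveling policy (counts are made up): hand out the
# erased block with the fewest erase cycles so far.
erase_counts = {"block0": 120, "block1": 87, "block2": 301, "block3": 88}

def pick_block(counts: dict) -> str:
    return min(counts, key=counts.get)

target = pick_block(erase_counts)
erase_counts[target] += 1            # charge the new erase cycle to it
print(target)                        # block1, the least-worn block
```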

Dedupe & Compression
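Page-level deduplication can be sketched as fingerprint-and-refcount: identical pages are stored once and reference-counted. This is a toy model, not any real controller's logic.

```python
import hashlib

# Page-level deduplication sketched as fingerprint-and-refcount
# (a toy model, not any real controller's logic).
store = {}   # fingerprint -> (page bytes, reference count)

def write_page(page: bytes) -> str:
    fp = hashlib.sha256(page).hexdigest()
    data, refs = store.get(fp, (page, 0))
    store[fp] = (data, refs + 1)     # duplicates just bump the refcount
    return fp

write_page(b"A" * 4096)
write_page(b"A" * 4096)              # duplicate: no new flash page used
write_page(b"B" * 4096)
print(len(store))                    # 2 physical pages, 3 logical writes
```

Fewer physical writes means less wear and less write amplification, which is the controller's motivation here.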

Databases, Charge Ahead!


The Naive: MySQL (or PostgreSQL, Oracle, Mongo, …)

Let’s just use it! (and write data in place FTW)


• They all buffer writes in memory before flushing to disk

• ... but those flushes are still RANDOM writes
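The contrast can be sketched as two write-offset policies: a B-tree-style engine flushes pages at key-dependent offsets, while a log-structured engine always appends at the tail. Both functions below are invented stand-ins, not any engine's real code.

```python
# Two toy write-offset policies (both functions are invented stand-ins):
# a B-tree-style engine flushes pages at key-dependent offsets, while a
# log-structured engine always appends at the tail.
PAGE = 4096

def btree_write_offset(key: int) -> int:
    return (hash(key) % 10_000) * PAGE    # scattered across the file

log_tail = 0
def log_append(nbytes: int) -> int:
    global log_tail
    offset = log_tail
    log_tail += nbytes
    return offset                         # strictly sequential

offsets = [log_append(PAGE) for _ in range(3)]
print(offsets)                            # [0, 4096, 8192]
```

Scattered flush targets mean partially-dirty erase blocks all over the drive, which is exactly what feeds write amplification.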

Source: Anandtech


Cassandra: Already Optimized (But for What?)

Cassandra Write Path

http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives


C* Observations (for SSDs)

• All disk writes are sequential and append only

• Compaction is applied when merging SSTables

• SSTables are immutable once written

No write amplification
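The write path above can be condensed into a toy log-structured sketch: append to a commit log, buffer in a memtable, flush sequentially to an immutable SSTable. Names and sizes are illustrative, not Cassandra's actual API.

```python
# Toy log-structured write path (names are illustrative, not
# Cassandra's API): commit log + memtable + immutable SSTables.
commit_log = []
memtable = {}
sstables = []                     # list of immutable, sorted snapshots
MEMTABLE_LIMIT = 2

def write(key, value):
    commit_log.append((key, value))        # sequential append, crash safety
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:    # flush: one big sequential write
        sstables.append(dict(sorted(memtable.items())))
        memtable.clear()

def read(key):
    if key in memtable:                    # newest data first
        return memtable[key]
    for table in reversed(sstables):       # then SSTables, newest first
        if key in table:
            return table[key]
    return None

write("a", 1); write("b", 2); write("a", 3)
print(read("a"), len(sstables))            # 3 1
```

The `read` function also shows why the read path is complex: a key may live in the memtable or in any of several SSTables, which is what compaction exists to tame.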

But Still…

• Read path is complex

• Compaction can cause performance variations

Why Do We Treat SSDs the Same as HDDs?

Software Optimizations

Direct access:

• No kernel-space overhead

• TRIM

• Multithreading

• Caching in DRAM

• On-disk and DRAM indexing
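"Caching in DRAM" can be sketched as a small LRU read cache in front of the flash store. The `ssd` dict below is a stand-in for the device and the capacity is made up; this is a pattern sketch, not any product's implementation.

```python
from collections import OrderedDict

# Small LRU read cache in DRAM in front of a flash store; `ssd` is a
# stand-in for the device, and the capacity is deliberately tiny.
ssd = {n: f"page-{n}" for n in range(100)}
cache = OrderedDict()
CACHE_CAPACITY = 3
hits = misses = 0

def cached_read(page_no):
    global hits, misses
    if page_no in cache:
        hits += 1
        cache.move_to_end(page_no)        # mark most-recently-used
        return cache[page_no]
    misses += 1
    value = ssd[page_no]                  # ~100us on flash vs ~50ns in DRAM
    cache[page_no] = value
    if len(cache) > CACHE_CAPACITY:
        cache.popitem(last=False)         # evict least-recently-used
    return value

for n in (1, 2, 1, 3, 1, 4):
    cached_read(n)
print(hits, misses)                        # 2 4
```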

Flash-Optimized APIs

How We Did It


Raw Performance Numbers

RAM only: ~1M read txns/sec

RAM + SSD: 242K read txns/sec

Looking at It from a Cost Perspective


While Reducing Servers by 50%

- 1KB object size and uniform distribution

- 2-socket 2.8GHz CPU with 24 cores total, CentOS 5.8, 2 FusionIO SLC PCIe cards in RAID

- YCSB measurements performed by SanDisk

- Assumptions: 1TB Flash = $2K; 1TB RAM = $20K

Provides 2x – 3.6x Better TPS/$
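One way to sanity-check that range, using only the stated assumptions (1TB Flash = $2K, 1TB RAM = $20K) and the throughput numbers above. Charging each setup just for its 1TB data tier is a deliberate simplification that ignores server count, so treat this as a rough check, not the slide's actual methodology.

```python
# Back-of-the-envelope TPS/$ from the slide's stated assumptions
# (1TB Flash = $2K, 1TB RAM = $20K) and the throughput numbers above,
# charging each setup only for its 1TB data tier (a simplification
# that ignores server count).
ram_tps, hybrid_tps = 1_000_000, 242_000
ram_cost, flash_cost = 20_000, 2_000              # $ per TB

ram_tps_per_dollar = ram_tps / ram_cost           # 50.0
hybrid_tps_per_dollar = hybrid_tps / flash_cost   # 121.0

print(round(hybrid_tps_per_dollar / ram_tps_per_dollar, 2))   # 2.42
```

Even this crude model lands inside the quoted 2x-3.6x range: flash gives up ~4x in throughput but costs ~10x less per TB.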

Resources

• http://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/

• http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives

• http://www.sandisk.com/enterprise/zetascale/

• http://www.gigaspaces.com/xap-memoryxtend-flash-performance-big-data

Thank You!