SSDs, IMDGs and All the Rest
A short intro into how SSDs are powering the data revolution
Uri Cohen Head of Product @ GigaSpaces@uri1803
#jaxlondon 2014
In Memory Computing to the Rescue?
Not enough anymore…
• Average GigaSpaces XAP cluster size grew 5-10 fold since 2008
• We’re in the realm of terabytes, not gigabytes
Some Numbers
Level | Random Access Time | Typical Size
Registers | instantaneous | under 1 KB
Level 1 Cache | 1-3 ns | 64 KB per core
Level 2 Cache | 3-10 ns | 256 KB per core
Level 3 Cache | 10-20 ns | 2-20 MB per chip
Main Memory | 30-60 ns | 4-32 GB per system
SSD | < 1,000,000 ns | 128 GB - 2 TB
Hard Disk | 3,000,000-10,000,000 ns | over 1 TB
Performance Is All the Rage
http://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/
NAND Traits
Can only read and write whole pages, 4096 or 8192 bytes at a time
Modern file systems work this way anyway (but keep that in mind for later)
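The page granularity above is worth making concrete: however small the logical update, the device operates on whole pages. A minimal sketch (the function name and page size are illustrative, not from any SSD API):

```python
PAGE_SIZE = 4096  # NAND is read/written in whole pages, 4096 or 8192 bytes

def pages_touched(offset, length, page_size=PAGE_SIZE):
    """Number of whole pages a byte-range update actually touches."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1

# A 10-byte update straddling a page boundary still costs two full pages:
assert pages_touched(4090, 10) == 2
# Even a 1-byte write rewrites an entire 4 KB page:
assert pages_touched(100, 1) == 1
```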
NAND Traits
Limited life span (5K-10K write/erase cycles per block)
The load must be distributed evenly across all blocks
Typical Update Cycle
• Updating 4096 bytes (or fewer) of data can result in 2MB of data moving around on the SSD
• It’s called Write Amplification
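The factor between what the host asked to write and what the flash actually absorbed is the write amplification factor; the slide's numbers pencil out as follows (a sketch, assuming the full 2 MB erase block gets rewritten):

```python
def write_amplification(host_bytes, flash_bytes):
    """Write amplification = bytes physically written to flash
    divided by bytes the host asked to write."""
    return flash_bytes / host_bytes

# The slide's example: a 4096-byte logical update that forces a whole
# 2 MB erase block to move around on the device.
wa = write_amplification(4096, 2 * 1024 * 1024)
assert wa == 512.0  # every host byte cost 512 flash bytes
```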
Garbage Collection (Grrrrrr….)
• Compacts fragmented flash blocks, but at a performance cost
• Modern SSDs try to do this in the background...
• When no empty blocks are available, GC must run before ANY write can go through
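The compaction step can be sketched in a few lines: pack the still-valid pages of fragmented blocks into fresh blocks so the old ones can be erased (a toy model, assuming 4 pages per block; real controllers work in firmware with far more state):

```python
def garbage_collect(blocks, pages_per_block=4):
    """Toy GC: pack the still-valid pages from fragmented blocks into
    fresh blocks, freeing the old ones for erasure.
    Each block is a list of pages; None marks an invalidated page."""
    valid = [p for block in blocks for p in block if p is not None]
    compacted = [valid[i:i + pages_per_block]
                 for i in range(0, len(valid), pages_per_block)]
    freed = len(blocks) - len(compacted)
    return compacted, freed

blocks = [["a", None, "b", None],
          [None, "c", None, None],
          ["d", "e", None, "f"]]
compacted, freed = garbage_collect(blocks)
assert freed == 1  # 3 fragmented blocks compact into 2, freeing one for erase
```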
Wear Leveling
A bag of techniques the controller uses to keep all of the flash cells at roughly the same level of wear
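The simplest technique in that bag is to route the next write/erase to the least-worn block. A minimal sketch (illustrative only; real controllers also migrate cold data and track wear in firmware):

```python
def pick_block(erase_counts):
    """Toy wear leveling: choose the block with the fewest erase
    cycles so wear stays roughly uniform across the device."""
    return min(range(len(erase_counts)), key=erase_counts.__getitem__)

counts = [7, 3, 5, 3]
b = pick_block(counts)   # picks a block with only 3 erases
counts[b] += 1           # record the new erase cycle
assert counts == [7, 4, 5, 3]
```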
Databases, Charge Ahead!
http://cdn.pcworld.idg.com.au/article/images/740x500/dimg/larry-mario_500.jpg
The Naive: MySQL (or PostgreSQL, Oracle, Mongo, …)
• They all buffer writes before flushing to disk
• ... but the flushes are still RANDOM writes
Cassandra Write Path
http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
C* Observations (for SSDs)
• All disk writes are sequential and append only
• Compaction is applied when merging SSTables
• SSTables are immutable once written
No write amplification
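The three observations above can be sketched as a toy log-structured store: writes append to a sequential commit log and land in a memtable, which is flushed as an immutable "SSTable" when full (illustrative only; names and the flush threshold are made up, and real Cassandra does far more):

```python
class ToyLSM:
    """Toy append-only write path in the spirit of Cassandra's:
    sequential commit log + in-memory table + immutable flushed tables."""
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # sequential, append-only: no random writes
        self.memtable = {}
        self.sstables = []     # immutable once written
        self.limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # always an append, never a seek
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.sstables.append(dict(self.memtable))  # sequential flush
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            if key in table:
                return table[key]
        return None

db = ToyLSM()
db.put("k1", "v1"); db.put("k2", "v2"); db.put("k1", "v3")
assert db.get("k1") == "v3" and db.get("k2") == "v2"
```

Because updates never rewrite old data in place, the device sees only sequential appends, which is exactly what sidesteps write amplification.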
Software Optimizations
• Direct access: no kernel-space overhead
• TRIM
• Multithreading
• Caching in DRAM
• On-disk and DRAM indexing
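Of the optimizations above, DRAM caching is the easiest to sketch: serve hot reads from RAM and fall back to flash on a miss, evicting the least recently used entry when full (a toy LRU sketch; the real engines' cache policies differ):

```python
from collections import OrderedDict

class DramCache:
    """Toy DRAM read cache in front of flash: hot keys are served
    from RAM; misses hit the slow backing read."""
    def __init__(self, capacity, backing_read):
        self.capacity = capacity
        self.backing_read = backing_read    # e.g. a read from the SSD
        self.cache = OrderedDict()

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as recently used
            return self.cache[key]
        value = self.backing_read(key)      # slow path: hit the flash
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value

flash_reads = []
cache = DramCache(2, lambda k: flash_reads.append(k) or f"data-{k}")
cache.read("a"); cache.read("a"); cache.read("b"); cache.read("a")
assert flash_reads == ["a", "b"]  # repeated reads never touched flash
```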
Looking at It from a Cost Perspective
Provides 2x - 3.6x Better TPS/$ While Reducing Servers by 50%
- 1KB object size and uniform distribution
- 2-socket 2.8GHz CPU with 24 cores total, CentOS 5.8, 2 FusionIO SLC PCIe cards in RAID
- YCSB measurements performed by SanDisk
Assumptions: 1TB Flash = $2K; 1TB RAM = $20K
Resources
• http://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work/
• http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
• http://www.sandisk.com/enterprise/zetascale/
• http://www.gigaspaces.com/xap-memoryxtend-flash-performance-big-data