In-Memory Computing
Did You Know?
Today, a CPU core can cycle three billion times in one second.
In about one second, light travels to the moon …
… but during one CPU cycle, light travels only about 10 cm.
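Those two figures are easy to check; a quick sketch (the 3 GHz clock is the slide's figure, the speed of light a known constant):

```python
# How far light travels during a single cycle of a 3 GHz CPU core.
SPEED_OF_LIGHT_M_S = 299_792_458      # metres per second
CLOCK_HZ = 3_000_000_000              # three billion cycles per second

cycle_time_s = 1 / CLOCK_HZ                        # ~0.33 ns per cycle
distance_m = SPEED_OF_LIGHT_M_S * cycle_time_s     # ~0.1 m

print(f"Light per cycle: {distance_m * 100:.1f} cm")   # prints "Light per cycle: 10.0 cm"
```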
Did You Know?
A motherboard with eight 16-core CPUs will soon be available …
That is 128x the computing power of a single core …
… or over 400 billion CPU cycles per second on a single server blade.
But …
… most of that computing power will be wasted …
… waiting for data.
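The aggregate figure also checks out; a minimal sketch, where the core counts are the slide's and the ~3.3 GHz per-core clock is an assumption:

```python
# Aggregate cycles per second for a blade with eight 16-core CPUs.
cpus = 8
cores_per_cpu = 16
clock_hz = 3.3e9                       # assumed per-core clock

total_cores = cpus * cores_per_cpu     # 128 cores
cycles_per_second = total_cores * clock_hz

print(f"{total_cores} cores, {cycles_per_second / 1e9:.0f} billion cycles/s")  # 128 cores, 422 billion cycles/s
```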
[Diagram: CPU – RAM – FLASH – DISK – NIC | 2010 – 2022: 128x increase in transistors per chip]
Moore’s Law will continue for at least 10 years
Transistors per area will double roughly every two years
128x increase in ~12 years
2022: 512 Gbit per DRAM chip, 8 Tbit per Flash chip
Frequency gains are difficult
Pollack’s rule: power scales quadratically with clock speed
Parallelism with more cores is a must
2014: 64 cores, 2016: 128 cores, 2022: 1024 cores
Memory and I/O bandwidth need to grow with processing power
Disks cannot follow!
                    2010      2022
Cores per chip      16        1024
Memory bandwidth    40 GB/s   2.5 TB/s
I/O bandwidth       2 GB/s    250 GB/s

• No big change: single-core clock rate (will stay < 5 GHz)
• But impressive overall computing power: 5000 (cores × GHz)
Challenging! But needed to feed the cores!
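The "5000 (cores × GHz)" figure above follows directly; a sketch assuming 1024 cores clocked just under the 5 GHz ceiling:

```python
# Overall computing power expressed in core * GHz, as on the slide.
cores = 1024
clock_ghz = 4.9            # assumed clock, just under the < 5 GHz ceiling

core_ghz = cores * clock_ghz
print(f"~{core_ghz:.0f} core*GHz")     # ~5018, i.e. roughly 5000 core*GHz
```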
Disks are Tape
Forget hard disks!
Disks cannot go faster
Disks cannot follow bandwidth requirements
Random-read scanning of 1 TB of disk space today takes 15 – 150 days (!)
To reach 1 TB/s you would need 10,000 disks in parallel
Disks can only be archives any more (sequential access)
DRAM, Flash and PCM will be the replacement
“Spinning Rust”
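A rough sketch of where those numbers come from. The per-disk figures here are assumptions typical for spinning disks (~200 random IOPS, 4 KiB per random read, ~100 MB/s sequential); smaller reads or slower disks push the scan toward the 150-day end:

```python
# Why random-read scanning 1 TB takes weeks, and why 1 TB/s needs ~10,000 disks.
TB = 10**12

iops = 200                     # assumed random reads per second for one disk
read_size = 4 * 1024           # assumed 4 KiB per random read

reads_needed = TB / read_size
scan_days = reads_needed / iops / 86_400
print(f"Random scan of 1 TB: ~{scan_days:.0f} days")   # ~14 days at these assumptions

seq_throughput = 100 * 10**6   # assumed 100 MB/s sequential per disk
disks_for_1tb_s = TB / seq_throughput
print(f"Disks needed for 1 TB/s: {disks_for_1tb_s:.0f}")   # 10000
```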
No big change: Latency
NIC:
• NICs move to PCI Express; may move onto the CPU chip
• 10 – 100 Gbit/s already today
• Latency in a cluster of ~1 µs possible (InfiniBand / optical Ethernet)
• LAN/WAN latency: 0.1 – 100 ms
Flash latency and bandwidth:
• Throughput doubles every year
• Access time falls by 50% per year
• Interface goes from SATA to PCI Express
Two determining factors, which won’t change:
• RAM – CPU latency: ~0.1 µs
• NIC latency via LAN or WAN: 0.1 – 100 ms
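Converting those two fixed latencies into CPU cycles makes the gap concrete (assuming the 3 GHz clock from earlier):

```python
# Convert wall-clock latency into CPU cycles at an assumed 3 GHz clock.
CLOCK_HZ = 3e9

def cycles(latency_s):
    return latency_s * CLOCK_HZ

print(f"RAM-CPU (0.1 us): {cycles(1e-7):,.0f} cycles")   # 300 cycles
print(f"WAN (100 ms):     {cycles(0.1):,.0f} cycles")    # 300,000,000 cycles
```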
Did You Know?
A CPU accesses Level 1 cache memory in 1 – 2 cycles.
It accesses Level 2 cache memory in 6 – 20 cycles.
It accesses RAM in 100 – 400 cycles.
It accesses Flash memory in 5,000 cycles.
It accesses disk storage in 1,000,000 cycles.
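The same hierarchy expressed as relative cost; a sketch using assumed midpoints of the slide's ranges:

```python
# Access cost of each memory tier in CPU cycles (midpoints of the slide's ranges).
latency_cycles = {
    "L1 cache": 2,
    "L2 cache": 13,
    "RAM": 250,
    "Flash": 5_000,
    "Disk": 1_000_000,
}

l1 = latency_cycles["L1 cache"]
for tier, cycles in latency_cycles.items():
    # A disk access costs half a million L1 hits.
    print(f"{tier:>8}: {cycles:>9,} cycles  ({cycles // l1:>7,}x L1)")
```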
Translate cycles to miles and assume you were a CPU core …
… then Level 1 cache would be in the building …
Level 2 cache would be at the edge of this city …
RAM would be in a different state …
Flash memory would be in a different country …
… and disk storage would be the planet Mars.
Software Implications
Roundtrip latency (in CPU cycles):
• RAM: 500 cycles
• Flash: 5,000 cycles
• Disk (archive): 1,000,000 cycles
• NIC: 1,000 – 500,000,000 cycles
Latency and locality are the determining factors.
What could that mean?
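One concrete software implication: with roundtrip latency fixed, batching is what drives throughput. A sketch, where the 1,000,000-cycle disk roundtrip is the slide's figure and the batch sizes are illustrative:

```python
# Amortizing a fixed roundtrip latency over a batch of requests.
ROUNDTRIP_CYCLES = 1_000_000       # one disk roundtrip, from the slide

for batch_size in (1, 100, 10_000):
    # The fixed roundtrip cost is shared by every item in the batch.
    cycles_per_item = ROUNDTRIP_CYCLES / batch_size
    print(f"batch of {batch_size:>6,}: {cycles_per_item:>11,.0f} cycles per item")
```

Fetching 10,000 items in one roundtrip costs 100 cycles each; fetching them one at a time costs a million each.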
Systems may just get smaller!
More users for transaction processing on a single machine – isn’t that great?
Already today, most customers could run the ERP load of a company on a single blade
Commodity hardware becomes sufficient for ERP
No threat! (… or maybe becoming a commodity is a threat?)
Why Bother?
… or?
Think in opportunities …