© 2013 IBM Corporation High Performance Microprocessor Design, and Automation: Challenges and Opportunities Ruchir Puri IBM Fellow, VLSI Systems IBM Thomas J Watson Research Center, Yorktown Heights, NY

Challenges and Opportunities Ruchir Puri - TAU Workshop · High Performance Microprocessor Design, and Automation: Challenges and Opportunities Ruchir Puri IBM Fellow, VLSI Systems



High Performance Microprocessor Design, and Automation:

Challenges and Opportunities

Ruchir Puri IBM Fellow, VLSI Systems

IBM Thomas J Watson Research Center, Yorktown Heights, NY


First, something about lunch and technical work…

[Cartoon labels: LUNCH; ISPD/TAU attendee]


The Technology Era: Frequency Scaling

Once upon a time, life used to be great, when technology was the superman and design tagged along for the ride, and even EDA grabbed designer legs for the fun!


Characteristics of Single Thread Era

[Chart: Frequency (GHz), scale 0 to 6, for POWER4, POWER5, POWER6]

[Chart: TXs per core, scale 0 to 500, for POWER4, POWER5, POWER6]

Dennard Scaling

Exponential Frequency Growth

Optical Scaling / Node Migration

Expanding uArch Complexity


Single Thread Era EDA: Transistor Analysis & Optimization

Static timing analysis of complex circuits

Transistor Level timing optimization

[Figure: timing diagram of a dynamic circuit over one cycle (evaluate, precharge) with signals clkin, w0, w2, w3, w3_int, and fbk; cases fbk = 0 and fbk = 1, with 1st and 2nd timing for w2 = 1]

[Chart: # of paths (0 to 1200) vs. slack (ps, -6 to 8), pre-tuning and post-tuning]
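
As a rough illustration of what the slack histogram on this slide measures, the sketch below computes arrival times, required times, and slack on a tiny gate-level graph. It is a minimal sketch, not the transistor-level flow the slide refers to: the netlist, the delays, the 100 ps clock period, and the function name `compute_slacks` are all hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def compute_slacks(fanin, delay, clock_period):
    """Return slack (ps) per node for a combinational DAG.

    fanin: node -> list of predecessor nodes (hypothetical netlist)
    delay: node -> delay in ps
    """
    # static_order() yields predecessors before their successors.
    order = list(TopologicalSorter(fanin).static_order())

    # Forward pass: latest arrival time at each node's output.
    arrival = {}
    for node in order:
        preds = fanin.get(node, [])
        arrival[node] = delay[node] + max((arrival[p] for p in preds), default=0.0)

    # Build fanout lists for the backward pass.
    fanout = {node: [] for node in order}
    for node, preds in fanin.items():
        for p in preds:
            fanout[p].append(node)

    # Backward pass: latest time each output may settle without violating timing.
    required = {}
    for node in reversed(order):
        if fanout[node]:
            required[node] = min(required[s] - delay[s] for s in fanout[node])
        else:
            required[node] = clock_period  # endpoints must settle within the cycle

    return {node: required[node] - arrival[node] for node in order}


example_fanin = {"a": [], "b": [], "g1": ["a", "b"], "g2": ["g1"], "g3": ["b"], "out": ["g2", "g3"]}
example_delay = {"a": 0, "b": 0, "g1": 30, "g2": 40, "g3": 20, "out": 10}
slacks = compute_slacks(example_fanin, example_delay, clock_period=100)
```

With these made-up numbers the path through g1 and g2 is critical with 20 ps of slack, while g3 has 70 ps to spare. Tuning aims to move paths with negative slack to the positive side of such a histogram.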


End of Frequency Scaling: The Power Wall

[Chart: Power density (W/cm2, log scale 0.001 to 1000) vs. gate length (microns, 1 down to 0.01), 1994 to 2004: active power and passive power against the air cooling limit]

Inability to scale Oxide thickness & lower voltage resulted in a power wall for single thread performance
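
The power-wall argument can be checked with back-of-the-envelope arithmetic. The sketch below uses the standard first-order model for active power, P = C * V^2 * f, together with the classic constant-field scaling assumptions (capacitance shrinks by 1/k, frequency rises by k, area shrinks by 1/k^2). The function name and the choice k = 2 are illustrative assumptions, and passive (leakage) power is not modeled.

```python
def power_density_ratio(k, voltage_scales):
    """Active power density after one scaling step, relative to before.

    k: linear scaling factor (> 1 means smaller devices)
    voltage_scales: True if supply voltage also drops by 1/k
    """
    capacitance = 1.0 / k                          # per-device capacitance
    frequency = k                                  # faster switching
    voltage = 1.0 / k if voltage_scales else 1.0
    power_per_device = capacitance * voltage ** 2 * frequency
    area_per_device = 1.0 / k ** 2
    return power_per_device / area_per_device


ideal = power_density_ratio(2, voltage_scales=True)    # voltage follows the node
stuck = power_density_ratio(2, voltage_scales=False)   # voltage cannot be lowered
print(f"voltage scales: {ideal:.2f}x, voltage fixed: {stuck:.2f}x")
```

Under these assumptions, density stays flat while voltage scales and grows by k^2 once it cannot, which is the shape of the active-power curve approaching the air cooling limit.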


Frequency Scaling : POWER6 (65nm, 2007)

5+ GHz operation, >790M transistors, 341mm2 die

65nm SOI with 10 levels of Cu interconnect

Same pipeline depth & power @ 2x frequency versus POWER5

[Die photo labels: four 2 MB L2 regions, two L2 Dir, two Mem. Cntl., SMP Fabric, L3 Controller, and Core 1 units LSU, IFU / IDU, VMX, FXU, BFU, DFU, RU]


Technology Tantrums

[Cartoon labels: Designers, Technology, Design]

Shock and awe of 65nm: wire delays overtaking gate delays

End of frequency scaling with technology

Squeezing the design hard

Unfortunately, the liner for copper interconnects doesn’t scale

[Figure labels: Liner, Copper]


Multi-Core Era

Multi-Core

End of frequency scaling ushered in a new era of innovation with multi-core design


POWER Processors Began the Multi-Core / Multi-Thread Era

Power 4 (2001): Introduced first dual core

Power 5 (2004): Dual core; introduces SMT (4 threads)

Power 6 (2007): Dual core, 4 threads; enhances SMT efficiency


Life starts to become interesting: Technology ride very bumpy

[Chart: IBM Transistor Performance Improvement. Relative % improvement (0% to 100%) by node (180nm, 130nm, 90nm, 65nm, 45nm, 32nm), split into gain by traditional scaling and gain by innovation]

[Figure labels: eDRAM cell with WL, passing WL, BL, BOX, deep trench cap, node; 18fF storage node, 3fF BL (32 cells), 4.0 um. High-K Metal Gate]

Innovation playing a dominant role


POWER7 Processor Chip

567mm2, 45nm SOI w/ eDRAM

1.2B transistors – Equivalent function of 2.7B (eDRAM)

Eight processor cores w/ 4 way SMT – 32 threads / chip

32MB on chip eDRAM shared L3

Dual DDR3 Memory Controllers w/ 100GB/s sustained Memory bandwidth

Scalability up to 32 Sockets – 360GB/s SMP bandwidth/chip – 20,000 coherent operations in flight

[Figure: 4 Way SMT. Execution units FX0, FX1, FP0, FP1, LS0, LS1, BRX, CRL, shaded by which thread is executing (Thread 0, Thread 1, Thread 2, Thread 3, or no thread)]


zEC12 (32nm, 2012)

[Die photo labels: six cores, each with L2 D and L2 I; L3 Cache; MCU I/Os; GX I/Os; SC I/Os]

32nm high-k CMOS

597 mm2

5.5 GHz

2.75B transistors

15 levels of metal


Multi-Core Era Limiters

[Chart: log(performance) vs. Technology Node (90, 65, 45, 32, 22, 14, 10): Ideal Growth vs. Likely Multi-Core Path]

Technology complexity & rising costs

Power

SW parallelism

Socket BW
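
The SW parallelism limiter is commonly quantified with Amdahl's law, which the slide does not name but which is the standard first-order model. A minimal sketch, assuming a hypothetical workload that is 90% parallelizable:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 90% of the runtime parallelizes perfectly.
for cores in (1, 2, 4, 8, 16, 32, 64):
    print(f"{cores:>2} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

However many cores are added, the speedup in this example stays below 10x, one way the likely multi-core path can fall below ideal growth.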


Innovation Drive: Architecture & Productivity Innovation


Compute Throughput Potential

Socket Throughput Limitation (Power, memory bandwidth)

Need to amplify effective socket throughput to achieve potential

Multi-Core Advantage


Compute Throughput Potential

Socket Throughput Limitation (Power, memory bandwidth)

eDRAM = large, low power cache

High bandwidth memory buffer

Low-Power Off-Chip Signaling Technology

Coherence innovation to minimize socket-to-socket communication

High performance uP Designs: Extending Multi-Core Gains (Power processor)


Innovation Drive : System Level Technologies

3D Stacking with Through Silicon Vias

FPGA Accelerators

Flash Memory / SSD

Silicon Photonics

Single Processor–Memory Socket

Heterogeneous systems on Chip

Specialized functions

Specialized cores:

Single thread focused

Throughput focused


Productivity Innovation: Structured Synthesis and Large Block Synthesis

P/Z server Macro Quad FPU

Customs take a large amount of resources, and productivity is key

Merge the domains of customs and synthesis, targeting design productivity and improved quality

– Through merging of custom and synthesis hierarchy, with structure in synthesis (not random logic any more)

– Global optimization view; targeted structured data paths and synthesis

– A methodology with numerous algorithmic and practical innovations, spanning from incremental logic design processing, to data paths, to structured clocking, to merged custom-synthesis techniques.


Productivity Innovation : Reduce Custom Design (Structured Synthesis)

[Chart: # of Customs over Time, normalized scale 0 to 1]

Synthesis results w/ custom-like data flow alignment.

>10x reduction over 5 generations

*ISPD 2013 Best Paper Award: “Network Flow Based Datapath Bit Slicing” H.Xiang et al.

Milestone: Digital logic in 22nm server class microprocessors is 99% synthesized and signed off by gate level signoff


Productivity Innovation: Reduced # of Design Partitions (Large Block Synthesis)

60 logic macros, 25 customs, 14 unique arrays/RFs

1 macro, 0 customs, 9 unique arrays/RFs

Reduced area & power; equal cycle time

[Chart: # of Macros over Time, normalized scale 0 to 1]


Productivity & TAT Innovation: Gate Level Analysis & Signoff

[Chart: TX Level vs. Gate Level analysis in arbitrary units (0 to 1), compared on Runtime, Cleanup Work, and Accuracy]

Large speedup

Reduced cleanup

Similar accuracy


Productivity & TAT Innovation: Hierarchical Abstraction & Multi-Threading

[Chart: Projected Chip Timing Runtime (hrs., 0 to 50) for Base, Cleanup, Coarse Parallelism, Hierarchical abstracts, and Multi-threading]

Fast global analysis tools allow designers to iterate more often, resulting in improved final designs.

Hierarchical abstraction & multi-threading are the most promising ways to minimize TAT.

– Applies to all disciplines (timing, verification, etc.)


Productivity Innovation Challenges: Retiming

Significant fraction of logic designer effort spent in optimizing cycle boundaries

Retiming enables physical synthesis to optimally place latches in logic cones to balance timing/area/power

Invention is required to seamlessly handle divergence between functional RTL (Verilog/VHDL) and physical implementation throughout methodology.

[Figure: three latch placements in a logic cone, labeled Doesn’t meet cycle time, Area/Power too high, and Optimal]
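
The three outcomes in the figure (misses cycle time, too much area/power, optimal) can be mimicked with a toy model. This is a minimal sketch under strong assumptions, not the retiming algorithm used in the methodology described here: logic is a simple chain of levels, a single latch row is placed between two levels, and the number of signals crossing the cut stands in for latch area/power. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cut:
    position: int      # latch row sits after this many logic levels
    cycle_time: float  # delay of the slower of the two resulting stages (ps)
    latches: int       # signals crossing the cut, a proxy for area/power


def evaluate_cuts(level_delays, level_widths):
    """level_delays[i]: delay of level i; level_widths[i]: signals leaving level i."""
    cuts = []
    for pos in range(1, len(level_delays)):
        before = sum(level_delays[:pos])
        after = sum(level_delays[pos:])
        cuts.append(Cut(pos, max(before, after), level_widths[pos - 1]))
    return cuts


def best_cut(level_delays, level_widths, target_cycle):
    """Cheapest latch position that still meets the cycle time, or None."""
    feasible = [c for c in evaluate_cuts(level_delays, level_widths)
                if c.cycle_time <= target_cycle]
    if not feasible:
        return None
    return min(feasible, key=lambda c: (c.latches, c.cycle_time))


delays = [30, 40, 20, 50, 30]   # ps per logic level (made up)
widths = [64, 8, 32, 16, 4]     # signals leaving each level (made up)
choice = best_cut(delays, widths, target_cycle=100)
```

In this example the cuts after level 1 and level 4 miss the 100 ps target, the cut after level 3 meets it but needs 32 latches, and the cut after level 2 meets it with 8.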


Challenges Back-end: Scaling and Interconnects

Continued pressure on wiring. Innovation needed.

[Figure labels: Liner, Copper]

High-performance designs will not be able to tolerate such large RC increases

• Push for more wiring interconnect layers (coarse-pitch)

• Will still need some number of fine-pitch layers for short run local connections

Improved DA tools (routers) needed*

• Optimize wire plane usage to limit technology complexity

• Negotiate through special design rules for the finest levels

• Via optimization, especially at driver end

• Tricky performance vs wireability tradeoffs

• Many wires will need “special” treatment

• Increase width, push higher, add buffers, etc.
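
To see why resistance increases hurt long wires and why adding buffers helps, consider a first-order distributed-RC (Elmore) estimate, in which an unbuffered wire's delay grows with the square of its length. This is a minimal sketch with made-up values: the function name, the per-micron resistance and capacitance, and the fixed 10 ps buffer delay are assumptions, and buffer input capacitance and drive resistance are ignored.

```python
def wire_delay_ps(r_ohm_per_um, c_ff_per_um, length_um, segments=1, buffer_delay_ps=0.0):
    """First-order delay of a wire split into equal buffered segments."""
    segment_length = length_um / segments
    # Distributed RC: 0.5 * R_total * C_total per segment; ohm * fF = 1e-3 ps.
    segment_delay = 0.5 * r_ohm_per_um * c_ff_per_um * segment_length ** 2 * 1e-3
    return segments * segment_delay + (segments - 1) * buffer_delay_ps


unbuffered = wire_delay_ps(2.0, 0.2, 1000)
buffered = wire_delay_ps(2.0, 0.2, 1000, segments=4, buffer_delay_ps=10.0)
doubled_r = wire_delay_ps(4.0, 0.2, 1000)   # same wire with twice the resistance
print(unbuffered, buffered, doubled_r)
```

In this model, doubling the per-micron resistance doubles the unbuffered delay, while splitting the wire into four buffered segments cuts the wire term by a factor of four at the cost of three buffer delays.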

[Chart: Mx Res (ohm/um), scale 0 to 7, for 9s, 10s, 11s, 12s, 13s]

[Chart: Mx Via Res, NOM (ohm) and WC (ohm), scale 0 to 20, for 9s (Cu11), 10s (Cu08), 11s (Cu65), 12s (Cu45)]

*J.Warnock’s talk earlier yesterday at ISPD


Innovation at Technology, Design Interface: Double/Triple Patterning

[Chart: Need for Double / Triple Patterning. Pitch (nm, 0 to 150) vs. Technology Node (32nm, 20nm, Future): Device Pitch and Metal Pitch against the Single Exposure Limit and the Double Patterning Limit; EUV?]

Ripe field for physical design innovation
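
One well-known formulation of double patterning in physical design treats mask assignment as 2-coloring of a conflict graph: shapes closer together than the single exposure limit must go on different masks. The sketch below is a generic illustration of that formulation, not a method stated on the slide; the shape indices and conflict pairs are hypothetical.

```python
from collections import deque


def assign_masks(num_shapes, conflicts):
    """Assign each shape to mask 0 or 1 so that conflicting shapes differ.

    conflicts: pairs (a, b) of shape indices that are too close to share a mask.
    Returns a list of mask ids, or None if two masks are not enough.
    """
    neighbors = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        neighbors[a].append(b)
        neighbors[b].append(a)

    mask = [None] * num_shapes
    for start in range(num_shapes):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:                      # breadth-first 2-coloring
            shape = queue.popleft()
            for other in neighbors[shape]:
                if mask[other] is None:
                    mask[other] = 1 - mask[shape]
                    queue.append(other)
                elif mask[other] == mask[shape]:
                    return None           # odd cycle: not decomposable into two masks
    return mask


row_of_wires = assign_masks(4, [(0, 1), (1, 2), (2, 3)])
triangle = assign_masks(3, [(0, 1), (1, 2), (0, 2)])
```

A row of neighboring wires alternates masks, while three mutually conflicting shapes return `None`, the kind of case that calls for a layout change or a third mask.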


How We Stage Designs to Market: Current vs. Desired

[Figure: two staging timelines built from Innovation, Implement, and Go to Market phases. Current: significant effort spent on implementation. Desired: significant effort spent on innovation.]


Innovation: The sweet spot in this new era

[Chart: share of Designer Time (0% to 100%) split among Wait for Tools, Implement, Plan, and Innovate; two bars]


New Frontier: Innovating up the Value Stack

Design Implementation Innovation

IP Design Process Innovation

IP Design content creation innovation

[Arrow label: System value moving up the stack]


Innovation Drive : Hardware-Software Co-Optimization

[Chart: Benefit of Specialized HW Acceleration (0 to 100), Base vs. w/ HW Accel, for an encryption application]

[Diagram: Dynamic Re-compilation with Profile Data. Blocks: Original program, Software Optimizer, Dynamically Optimized program, Execution Engine, Profiling Hardware]

Heterogeneous design: Specialized HW acceleration viable in many areas

• Graphics • Compression • Cryptography…

FPGA Accelerators

Innovation on acceleration in PD/timing domain?


Hardware Programming

[Diagram: two flows targeting LUT, FF, and RAM resources. Traditional: VHDL / Verilog, Synthesis, Place & Route; 1000s of RTL designers. High-level: HLL (C/C++, LiMe, OpenCL), HL Compiler, VHDL / Verilog, Synthesis, Place & Route; millions of software designers.]

Hardware

- Hardware mistakes cannot be patched

- Significant barriers to be overcome between RTL, physical design domain and high level languages.

- Correlation between higher level spec. and lower level design.

There is more religion in logic design than the number of priests in religion


Liquid Metal (A high level hardware compiler)

[Diagram labels: Lime, Lime Compiler, Artifact Store (bytecode, bitfile, binary), Lime Virtual Machine (LVM), “Bus”, CPU, GPU, FPGA]

http://www.research.ibm.com/liquidmetal


Architectural Synthesis

Successive Refinement

Functional (C/C++ model). Metric: cache miss rate etc.

Cycle Accurate (C/C++ model). Metrics: performance models, CPI etc.

RTL: VHDL/Verilog, with Back End PD Implementation and Analysis. Metrics: electrical, timing, area, noise etc.


Capturing the Vibrant spirit of DA Researcher

Once upon a time there lived three men: a doctor, a chemist, and a DA Researcher. For some reason all three offended the king and were sentenced to die on the same day.

The day of the execution arrived, and the doctor was led up to the guillotine.


As he strapped the doctor to the guillotine, the executioner asked, "Head up or head down?" "Head up," said the doctor.

"Blindfold or no blindfold?" "No blindfold."

So the executioner raised the axe, z-z-z-z-ing! Down came the blade--and stopped barely an inch above the doctor's neck. Well, the law stated that if an execution didn't succeed the first time the prisoner had to be released, so the doctor was set free.


Then the chemist was led up to the guillotine. "Head up or head down?" said the executioner.

"Head up."

"Blindfold or no blindfold?" "No blindfold."

So the executioner raised his axe, z-z-z-z-ing! Down came the blade--and stopped an inch above the chemist's neck. Well, the law stated that if the execution didn't succeed the first time the prisoner had to be released, so the chemist was set free.


Finally the DA Researcher was led up to the guillotine.

"Head up or head down?" "Head up."

"Blindfold or no blindfold?" "No blindfold."

So the executioner raised his axe, but before he could cut the rope, ever curious Chandu yelled out:

"WAIT! I see what the problem is!".


Role of Memory


Systems view of Data processing

[Diagram: the storage hierarchy (Processor, Cache Levels, Main Memory, Storage Class Memory, Storage) under two models]

Traditional Computing: data moves across all levels of the storage hierarchy before and after being processed at the processor

Hierarchical Data Processing: processing at the processor, in memory, at SCM, and at storage, with increasing opportunity to exploit parallelism


More Memory Capacity per Socket through More DRAM Chips per Socket

[Chart: Memory Capacity per Socket (GBytes, 0 to 4500), 2005 to 2016: capacity per socket, Moore's Law trend line, and capacity per socket with pure DRAM]

• Memory is becoming pervasive throughout, and bits are becoming more plentiful and cheaper every day (< 50c/GB for flash)

- Memory technology continues to evolve even if scaling slows down.

Scope of memories is expanding


Storage Class Memory – Market View

[Figure labels: Bit Line, Bit Line Complement, Word Line]

Phase Change Memory (PCM)

– Vendors researching: Micron, Hynix, Samsung

– Projected date of enterprise availability: Low Density 2015+; High Density 2018+

– Initial characteristics (density, cost, latency): = DRAM; 0.25x NAND, parity in 2015; 4x NAND; 3x DRAM reads, 10x DRAM writes

– Risk comments: Most advanced technology. Productization and aggressive dev. on hold until market is identified. Current very low density NOR Flash offerings exist for a small market.

Spin Torque Transfer MRAM (STT-MRAM)

– Vendors researching: Micron (partnership with IBM but also pursuing other DRAM replacement tech), Hynix (allied w/ Toshiba), Samsung (acquired start-up Grandis)

– Projected date of enterprise availability: 2016/2017+

– Initial characteristics (density, cost, latency): = DRAM; target DRAM parity; = DRAM in both reads & writes

– Risk comments: Early in technology dev. Promising scalability. Manufacturability unexplored.

Resistive RAM (RRAM)

– Vendors researching: Micron, Hynix (partnered with HP), Samsung, Sandisk/Toshiba

– Projected date of enterprise availability: Low Density 2016+; High Density 2018+

– Initial characteristics (density, cost, latency): 2x NAND; target NAND parity; > DRAM, < NAND (~us)

– Risk comments: Early in technology dev. Promising scalability & performance. Manufacturability unexplored.


Scope of memories: The whole spectrum

[Figure: memory technologies placed along an axis of typical access latency in processor cycles (@ 4 GHz), with ticks at odd powers of two from 2^1 to 2^23. Labels: L1 (SRAM), eDRAM, DRAM, HDD, NAND, PCM?, RRAM?, STT-MRAM, PCM. Regions: Memory System and Storage (IO)]
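
To make the latency axis concrete, the sketch below converts a latency of 2^n processor cycles at the slide's 4 GHz clock into wall-clock time. It is a minimal worked conversion; the function names are hypothetical, and no technology is assigned to a specific tick because that mapping is not recoverable from the figure.

```python
def latency_seconds(log2_cycles, clock_hz=4e9):
    """Convert a latency of 2**log2_cycles processor cycles to seconds."""
    return (2 ** log2_cycles) / clock_hz


def pretty(seconds):
    """Format a duration using the largest unit that keeps the value >= 1."""
    for unit, scale in (("s", 1.0), ("ms", 1e-3), ("us", 1e-6), ("ns", 1e-9)):
        if seconds >= scale:
            return f"{seconds / scale:.3g} {unit}"
    return f"{seconds / 1e-9:.3g} ns"


# Tick marks on the slide's axis: odd exponents from 2^1 to 2^23.
for exponent in range(1, 24, 2):
    print(f"2^{exponent:<2} cycles = {pretty(latency_seconds(exponent))}")
```

The axis therefore spans from half a nanosecond at 2^1 cycles to roughly two milliseconds at 2^23 cycles, more than six orders of magnitude.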


Computing with Memory

• Potential to solve a systems bottleneck of memory bandwidth in a flexible way for data intensive applications/tasks

– Reconfigurable on demand (compilation step)

– Robust (ECC protected)

– Verification advantage

– Fast and enables implementation of explicit parallelism

• Utilizes bit level parallelism.

• One must think about computational memory from a systems and software perspective (how and where will computational memory fit in, and what functions can it implement). Active ongoing research…

Innovating in IP Content Creation


Life Cycle: Technology Trigger, Hype, Disillusionment, Enlightenment, and Commoditization


Summary

• Information technology landscape is changing dramatically

– Value is in innovating across the entire stack and increasingly higher up in the stack

– Key problems remain to be solved in technology, design and automation as technology continues to scale

– Significant emerging opportunities in new ways to solve system bottlenecks at every level: Logic, Architecture, Memory.

– In the last several years, life became very challenging but also very interesting as the ride has gotten a lot choppier…

– With challenges and opportunities abounding, winners and losers will be decided by which organizations grab these challenges and innovate their way out of the current dilemmas…

[Charts repeated from earlier slides: designer time split among Wait for Tools, Implement, Plan, and Innovate; Design Implementation Innovation, IP Design Process Innovation, and IP Design content creation innovation, with system value moving up the stack]


Memoirs of a DA Engineer

A lawyer is flying in a hot air balloon over Lake Tahoe and realizes he is lost.

He reduces height and spots a man down below. He lowers the balloon further and shouts, "Excuse me, can you tell me where I am?"


The man below said, "Yes, you're in a hot air balloon, hovering 30 feet above this field." "You must be a DA engineer," said the lawyer. "I am," replied the man. "How did you know?"

"Well," said the lawyer, "everything you have told me is technically correct, but it's of absolutely no use to anyone." The man below said, "You must be a lawyer." "I am," replied the lawyer, "but how did you know?"

"Well," said the man, "you don't know where you are, or where you're going, but you expect me to be able to help. You're in the same position you were before we met, but now it's my fault."
