15
October 12, 2004 Thomas Sterling - Caltech & JPL 1 Roadmap and Change How Much and How Fast Thomas Sterling California Institute of Technology and NASA Jet Propulsion Laboratory October 12, 2004 Presentation to 2004 Workshop on Extreme Supercomputing Panel:

Roadmap and Change How Much and How Fast

  • Upload
    ornice

  • View
    31

  • Download
    0

Embed Size (px)

DESCRIPTION

Presentation to 2004 Workshop on Extreme Supercomputing Panel :. Roadmap and Change How Much and How Fast. Thomas Sterling California Institute of Technology and NASA Jet Propulsion Laboratory October 12, 2004. 2 9 Years Ago Today. Linpack Zettaflops in 2032. 10 Zflops. 1 Zflops. - PowerPoint PPT Presentation

Citation preview

Page 1: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 1

Roadmap and Change

How Much and How Fast

Thomas SterlingCalifornia Institute of Technology

and

NASA Jet Propulsion Laboratory

October 12, 2004

Presentation to 2004 Workshop on Extreme Supercomputing Panel:

Page 2: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 2

29 Years Ago Today

Page 3: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 3

Linpack Zettaflops in 2032

100 Mflops

1 Gflops

10 Gflops

100 Gflops

1 Tflops

10 Tflops

100 Tflops

1 Pflops

10 Pflops

100 Pflops

1 Eflops

10 Eflops

2000 2010 2040 20431995 2005 20301993

SUM N=1 N=500

Court

esy

of

Thom

as

Ste

rlin

g

100 Eflops

1 Zflops

10 Zflops

2015 2020 2025 2035

Page 4: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 4

Page 5: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 5

The Way We Were: 1974

IBM 370 market mainstream Approx. 1 Mflops

DEC PDP-11 geeks delight Seymour Cray started

working on Cray-1 Approx. 100 Mflops

2nd generation microprocessor

e.g. Intel 8008

Core memory 1103 1Kx1 DRAM chips Punch cards, paper tapes,

teletypes, selectrics

Page 6: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 6

What Will Be Different

Moore’s Law will have flatlined Nano-scale atomic level devices

Assuming we solve lithography problem

Local clock rates ~100 GHz Fastest today is > 700 GHz

Local actions strongly preferential to global actions

Non-conventional technologies may be employed

Optical Quantum dots Rapid Single Flux Quantum

(RSFQ) gates

JJ1 JJ2

L1

Page 7: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 7

What we will need

1 nano-watt per Megaflops Energy received from Tau

Ceti (per m2)

Approximately 1 square meter for 1 Zetaflops ALUs

10 billion execution sites

> 10 billion-way parallelism Including memory and

communications: 2000 m2

3-D packaging (4m)3

Global latency of ~ 10,000 cycles

Including average latency, => 1 trillion-way parallelism

Page 8: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 8

Parcel Simulation Latency Hiding Experiment

Remote Memory Requests

Local Memory

Process Driven Node Parcel Driven Node

Local Memory

ALU

ALU

Input Parcels

Output Parcels

Remote Memory Requests

Remote Memory Requests

Remote Memory Requests

Flat

Network NodesNodes

Nodes

NodesControl Experiment

Test Experiment

Page 9: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 9

Latency Hiding with Parcelswith respect to System Diameter in cycles

Sensitivity to Remote Latency and Remote Access Fraction16 Nodes

deg_parallelism in RED (pending parcels @ t=0 per node)

0.1

1

10

100

1000

Remote Memory Latency (cycles)

Tota

l tra

nsac

tiona

l wor

k do

ne/T

otal

pro

cess

wor

k do

ne

1/4%

1/2%

1%

2%

4%

16

4

1

64256

2

Page 10: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 10

Latency Hiding with ParcelsIdle Time with respect to Degree of

ParallelismIdle Time/Node

(number of nodes in black)

0.E+00

1.E+05

2.E+05

3.E+05

4.E+05

5.E+05

6.E+05

7.E+05

8.E+05

Parallelism Level (parcels/node at time=0)

Idle

tim

e/n

od

e (c

ycle

s)

Process

Transaction

12 4 8

16 32 64 128 256

Page 11: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 11

Architecture Innovation

Extreme memory bandwidth Active latency hiding Extreme parallelism Message-driven split-transaction computations (parcels) PIM

e.g. Kogge, Draper, Sterling, … Very high memory bandwidth Lower memory latency (on chip) Higher execution parallelism (banks and row-wide)

Streaming Dally, Keckler, … Very high functional parallelism Low latency (between functional units) Higher execution parallelism (high ALU density)

Page 12: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 12

Continuum Computer Architecture

Merges state, logic, and communication in single building block

Parcel driven computation Fine grain split transaction computing Move data through vectors of instructions in store Move instruction stream through vector of data Gather-scatter an intrinsic Very efficient Futures for produces-multi-consumer computing

Combines strengths of PIM and Streaming All register architecture (fully associative) Functional units within a cycle of neighbors Extreme parallelism Intrinsic latency hiding

Page 13: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 13

Inst. Reg.

Control

ALU

Assoc. Memory

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

ALU

Page 14: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 14

Conclusions

Zettaflops at nano-scale technology is possible Size requirements tolerable

But packaging is a challenge; Latency challenge does not sink the idea

Major obstacles Power Latency Parallelism Reliability Programming

Architecture can address many of these Continuum Computing Architecture

Combines advantages of PIM and streaming Strong candidate for future Zetaflops computer

Page 15: Roadmap and Change How Much and How Fast

October 12, 2004 Thomas Sterling - Caltech & JPL 15