General Purpose Processors as Processor Arrays Peter Cappello UC, Santa Barbara

Page 1: General Purpose Processors as Processor Arrays

General Purpose Processors as Processor Arrays

Peter Cappello, UC Santa Barbara

Page 2: General Purpose Processors as Processor Arrays

VLSI Design Forces in 1986

“Nature, to be commanded, must be obeyed.” – Sir Francis Bacon

• High performance → parallelism

Page 4: General Purpose Processors as Processor Arrays

VLSI Design Forces in 1986

• Power is scarce → limit resistive delay → limit long communication
• Area is scarce → limit wire crossing

Page 9: General Purpose Processors as Processor Arrays

VLSI Design Forces in 1986

• $$ are scarce → design is expensive → reuse components

Page 14: General Purpose Processors as Processor Arrays

VLSI Design Forces in 1986

In 2D systolic arrays, clock skew is an issue → wavefront arrays

Islands of synchrony in an ocean of asynchrony

Page 15: General Purpose Processors as Processor Arrays

Processor Array Properties

1. Have multiple processors
2. Neighbors abut (no long wires)
3. Only neighbors communicate directly
4. Have a constant # of processor types
5. Scale: larger problems → larger arrays
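As an illustration of these properties (not taken from the slides), here is a minimal Python sketch that simulates a linear processor array running odd-even transposition sort. The function name and the one-value-per-cell layout are assumptions made for the example. Every cell is the same type, each cell exchanges data only with an abutting neighbor, and a longer input simply uses a longer array.

```python
def odd_even_transposition_sort(values):
    """Sort on a simulated linear processor array.

    Cell i holds one value and exchanges only with cell i + 1,
    so all communication is nearest-neighbor.
    """
    cells = list(values)              # one value per (hypothetical) processor
    n = len(cells)
    for phase in range(n):            # n phases suffice for n cells
        start = phase % 2             # even phases pair (0,1),(2,3)...; odd phases pair (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            # Compare-exchange between abutting neighbors i and i + 1.
            if cells[i] > cells[i + 1]:
                cells[i], cells[i + 1] = cells[i + 1], cells[i]
    return cells


if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 2, 9, 1, 7, 3]))
```

The inner loop is written sequentially here; on a real array all compare-exchanges within one phase would run in parallel.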

Page 20: General Purpose Processors as Processor Arrays

No 3D PA Has Properties 1 - 5

Enclose 3D PA in minimal sphere of radius r.

Scale PA in all 3 dimensions.

1. Power consumption = Θ(r³).
2. Heat dissipation via surface = Θ(r²).

Power grows faster than the surface can dissipate it, so the array cannot keep scaling (property 5 fails).
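The scaling argument can be checked numerically. The sketch below assumes, purely for illustration, that processors fill the sphere at a uniform power density (the `power_density` parameter is hypothetical and not from the slides). It shows that the power that must leave through each unit of surface area grows linearly with r.

```python
import math


def heat_flux_ratio(r, power_density=1.0):
    """Power generated inside a sphere of radius r divided by its surface area.

    power_density is an assumed constant (power per unit volume).
    """
    power = power_density * (4.0 / 3.0) * math.pi * r ** 3   # Theta(r^3)
    surface = 4.0 * math.pi * r ** 2                         # Theta(r^2)
    return power / surface                                   # equals power_density * r / 3


for r in (1, 2, 4, 8):
    print(r, heat_flux_ratio(r))
```

Doubling the radius doubles the required heat flux per unit area, whatever constant is chosen for the density.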

Page 24: General Purpose Processors as Processor Arrays

VLSI Design Forces in 2006

“Nature, to be commanded, must be obeyed.” – Sir Francis Bacon

• Power is scarce → limit clock frequency → parallelism
• Power is scarce → limit resistive delay → limit long communication

Page 25: General Purpose Processors as Processor Arrays

Trends in GPP in 2006

• Chip multiprocessors (CMP)

• Vector IRAM

• Cell

• TRIPS

• RAW

Page 26: General Purpose Processors as Processor Arrays

Trends in GPP in 2006

Chip Multiprocessors (CMP)
– Parallel processors
– Crossbar

Page 27: General Purpose Processors as Processor Arrays

Trends in GPP in 2006

Vector IRAM – Vector Intelligent RAM
• For mobile multimedia devices
  Stream data processing
• Combine GPP and DSP
  – Parallel – linear array
  – Crossbar

Page 29: General Purpose Processors as Processor Arrays

Trends in GPP in 2006

Cell processor

“The Department of Energy said Wednesday that it had awarded I.B.M. a contract to build a supercomputer capable of 1,000 trillion calculations a second, using an array of 16,000 Cell processor chips that I.B.M. designed for the coming PlayStation 3 video game machine.” Sept. 7, 2006. NY Times.

• Parallel processors
– BIU – Bus interface unit
– RMT – Replacement management table
– SL1 – 1st-level cache
– PPE – PowerPC Element
– SPE – Synergistic Processor Element
– Element interconnect bus

Page 31: General Purpose Processors as Processor Arrays

Trends in GPP in 2006

• TRIPS: Tera-op, Reliable, Intelligently adaptive Processing System

The following slides are taken from a talk: "The Design and Implementation of the TRIPS Prototype Chip," HotChips 17, Palo Alto, CA, August, 2005.

Page 32: General Purpose Processors as Processor Arrays

• E – execution tile

• R – register bank

• D – 8KB data cache

• I – instruction cache

• G – global control

Page 33: General Purpose Processors as Processor Arrays

• Instructions execute as a data flow graph
– An instruction’s output is another instruction’s input.
– Minimize use of register/cache for intermediate values

• Register reads/writes access the register banks

• Loads/stores access the data cache banks
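To make the data-flow idea concrete, here is a small Python sketch of data-flow firing. It is an illustrative model only, not the TRIPS mechanism: the `run_dataflow` function, the dictionary encoding of instructions, and the example program are all assumptions. An instruction fires as soon as its operands exist, and its result is handed directly to the instructions that consume it.

```python
import operator


def run_dataflow(instructions, inputs):
    """Execute a data-flow graph.

    instructions: {name: (op, [operand names])}
    inputs:       {name: value} for values available at the start
    An instruction fires once all of its operands have been produced.
    """
    values = dict(inputs)                 # results available so far
    pending = dict(instructions)
    fired_order = []
    while pending:
        ready = [name for name, (_, srcs) in pending.items()
                 if all(s in values for s in srcs)]
        if not ready:
            raise ValueError("deadlock: some operand is never produced")
        for name in ready:
            op, srcs = pending.pop(name)
            values[name] = op(*(values[s] for s in srcs))  # result becomes visible to consumers
            fired_order.append(name)
    return values, fired_order


# (a + b) * (a - b): 'mul' consumes the outputs of 'add' and 'sub' directly.
program = {
    "mul": (operator.mul, ["add", "sub"]),
    "add": (operator.add, ["a", "b"]),
    "sub": (operator.sub, ["a", "b"]),
}
values, order = run_dataflow(program, {"a": 7, "b": 3})
print(values["mul"], order)
```

Program order in the dictionary does not matter: `mul` is listed first but fires last, because firing is driven by operand availability.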

Page 38: General Purpose Processors as Processor Arrays

Trends in GPP in 2006

RAW (MIT)

The following slides are taken from a RAW talk: "Evaluating The Raw Microprocessor: Scalability and Versatility," presented at the International Symposium on Computer Architecture, June 21, 2004.

Page 39: General Purpose Processors as Processor Arrays

[Figure: 16 ALUs and a register file (RF)]

Replace the crossbar with a point-to-point, pipelined, routed network.
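A point-to-point routed network moves a value tile by tile instead of across one shared crossbar. The sketch below assumes a 2D mesh and dimension-ordered (x first, then y) routing. The slide does not state the routing policy, so that choice, the `xy_route` name, and the (x, y) tile coordinates are assumptions for illustration.

```python
def xy_route(src, dst):
    """Return the tiles visited going from src to dst on a 2D mesh.

    Moves along x first, then along y (dimension-ordered routing).
    Tiles are (x, y) tuples; every step goes to an adjacent tile.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (3, 2))
print(path, "hops:", len(path) - 1)
```

Each hop covers only the short wire between neighboring tiles, so the per-hop delay stays constant as the array grows; the cost of distance shows up as hop count instead.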

Page 40: General Purpose Processors as Processor Arrays

Distribute the Register File

[Figure: 16 ALUs, each with its own register file (RF)]

Page 41: General Purpose Processors as Processor Arrays

Distribute the rest.

[Figure: 16 ALUs with register files (RF); labels: Control, Wide Fetch (16 inst), Unified Load/Store Queue; a PC, I$, and D$ for each ALU]

[ISCA99]

Page 42: General Purpose Processors as Processor Arrays

Tiles!

[Figure: 16 tiles, each with an ALU, RF, PC, I$, and D$]

Page 43: General Purpose Processors as Processor Arrays

Conclusions

• VLSI Scalable microprocessors are possible. Constant factors are beginning to give way to asymptotics:
  - 16 ALU Raw – Oct 2002
  - 64 ALU Raw – Now
  - 1,024 ALU Raw – 2010
  - 32,768 ALU Raw – If Moore’s Law makes it to 2 nm
• There is an opportunity to make processors more “versatile,” i.e., steal applications from custom chips.
• Tiled Processor Architectures are a promising approach and merit further research.

Page 44: General Purpose Processors as Processor Arrays

GPP Predictions: In 10 Years

• Encapsulate registers/cache/processor into an array (RAW)
• Partition off-chip memory: Encapsulate memory & processor. Safely increase parallel access (concurrent programming)
• For non-recursive applications GPP (mobile multimedia):
  – no bus; quasi-nearest neighbor networks.
• For recursive applications GPP (gaming, control):
  – replace bus w/ lean on-chip short-diameter communication network.
  – 1 network-on-chip routes register/cache/instruction/control.
  – Need >= 1K processors/chip to justify network-on-chip.
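To see why diameter matters at around 1K processors, the sketch below compares the worst-case hop count of a square 2D mesh with that of a complete binary tree over the same number of processors. The tree is only an assumed example of a short-diameter network, since the slides do not name a specific topology here, and the function names are hypothetical.

```python
import math


def mesh_diameter(n):
    """Max hop count between two nodes of a sqrt(n) x sqrt(n) mesh."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square")
    return 2 * (side - 1)


def tree_diameter(n):
    """Max hop count between two leaves of a complete binary tree with n leaves."""
    depth = n.bit_length() - 1
    if n < 1 or (1 << depth) != n:
        raise ValueError("n must be a power of two")
    return 2 * depth


for n in (16, 64, 1024):
    print(n, mesh_diameter(n), tree_diameter(n))
```

For 16 processors the mesh is actually shorter (6 hops against 8), while at 1,024 the tree needs 20 hops against 62 for the mesh. The comparison counts hops only and ignores wire length and bandwidth near the tree root.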

Page 45: General Purpose Processors as Processor Arrays

Predictions

• Increasing complexity of:
  – Applications
  – Technology
  → Increasing specialization of labor
• Rate of change of increase in complexity is increasing over time → Increasing adaptability is important!

Page 47: General Purpose Processors as Processor Arrays

Yet another taxonomy!

Rows: ARCHITECTURAL SPECIFICITY. Columns: RECONFIGURABILITY.

            STATIC    DYNAMIC
SPECIFIC    ASIC      PROTOTYPE ASIC
GENERAL     GPP       CCM

Page 49: General Purpose Processors as Processor Arrays

[Figure: taxonomy with three axes: RECONFIGURABILITY (STATIC, DYNAMIC), APPLICATION SPECIFICITY (SPECIFIC, GENERAL), COMMUNICATION LATENCY (TP, DP). Entries: ASIC, PROTOTYPE ASIC, GPP, CCM.]

Page 50: General Purpose Processors as Processor Arrays

DP Communication Topology

[Figure: array of 16 FPGAs]

• EDGE ISA (2D VLIW)
• With cores: FFT, RISC
• High throughput (iterative) communication topology

Page 51: General Purpose Processors as Processor Arrays

TP Communication Topology

[Figure: array of 16 FPGAs]

• EDGE ISA (2D VLIW)
• With cores: RAM, RISC
• Low latency (recursive) communication topology

Page 52: General Purpose Processors as Processor Arrays

[Figure: layered stack annotated with two columns, DISCIPLINE and PROCESS]

Layers: General Purpose Language; Domain Specific Language; Computational Model; Compute Substrate; Communicate Substrate; Configurable Hardware; Static Hardware; Fabrication Technology

Processes: Circuit layout; Processor architecting; Compiling (CS); Application program (DE); Processor layout (CE, EE); FPGA/Circuit design; Language design; Fabrication process; Compute model design

Discipline labels: CS, DE; CS, CE; CE, EE; EE; EE, ME

Page 53: General Purpose Processors as Processor Arrays

Conclusion

• Last 20 years witnessed dramatic advances.
• Next 20 years will witness even more dramatic advances.

Page 55: General Purpose Processors as Processor Arrays

Spare slides follow

Page 56: General Purpose Processors as Processor Arrays

Recursive Computation via a Tree of Meshes Network?

Page 57: General Purpose Processors as Processor Arrays

Quasi-Scalable

[Figure: labels RF, D$, GLOBAL, LOCAL, ADDRESS]

Page 60: General Purpose Processors as Processor Arrays

Interleave Memory & Processor Tiles

• Slightly more chips
• Compiler localizes memory accesses
• EDGE ISA deals with variable access times (TRIPS).

Page 62: General Purpose Processors as Processor Arrays

Cell architecture

Page 63: General Purpose Processors as Processor Arrays

Specialization of Labor

Layers: High Level / Domain-Specific Language; Computational Model (Exposes Comm. Topology); ISA Network; FPGA; Fabrication

Roles: APPLICATION PROGRAMMER; COMPILER; COMPUTER ARCHITECT; COMPUTER ENGINEER; ELECTRICAL & COMPUTER ENGINEER