Page 1: Dynamically Programmable Array Architecture

Confidential

Dynamically Programmable Array Architecture


Robert Heaton

Obsidian Technology

Page 2: Dynamically Programmable Array Architecture

Mesh of Trees

- Busses are bidirectional
- 2 cycles to exchange data
- Separate X and Y dimensions
- Diagonal routing not directly supported
- PUs are difficult to program to take advantage of the structure

[Figure: 16 PUs connected as a mesh of trees]

Page 3: Dynamically Programmable Array Architecture

Two Dimensional Mesh

[Figure: 16 PUs connected as a two-dimensional mesh]

Page 4: Dynamically Programmable Array Architecture

4x4 Hierarchical Cluster

[Figure: 16 PUs in four groups of four; each group has its own RU, and a fifth RU joins the four groups]

Page 5: Dynamically Programmable Array Architecture

Simple 4x4 Cluster Wiring

- Bus width = 140u for 16 bit busses
- That is a lot of wires!
- Budget 4x4 cluster area is 1 mm2

[Figure: four PUs wired through a switch joint; labels: N, Hin1, Hadr, Hout1, 6*N Wires, M2 Pitch, 1.4]
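As a rough check of the bus-width figure, the sketch below multiplies a wire count by a metal pitch. It assumes that the figure labels "6*N Wires", "M2 Pitch" and "1.4" mean 6*N wires laid out at a 1.4u metal-2 pitch; the slide does not state this outright, and the function name and parameters are illustrative only.

```python
# Rough bus-width estimate for the 4x4 cluster wiring slide.
# ASSUMPTION: the figure labels mean 6*N wires at a 1.4u metal-2 pitch.

def bus_width_um(n_bits, wires_per_bit=6, m2_pitch_um=1.4):
    """Width of a wire bundle: one routing track (one pitch) per wire."""
    return wires_per_bit * n_bits * m2_pitch_um

width = bus_width_um(16)
print(f"16 bit busses: {width:.1f}u")
```

Under these assumptions a 16 bit bus needs about 134u, which is in line with the 140u quoted on the slide.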

Page 6: Dynamically Programmable Array Architecture

Routing Hierarchy

- 256 PUs
- 4 levels of hierarchy
- Hadr: up level till
- L0adr: local address
- L1adr: level 1 address
- L2adr: level 2 address
- L3adr: level 3 address

[Figure: 256 PUs in groups of four, each group with its own RU; four groups share an RU1, four RU1 clusters share an RU2, and the four RU2 clusters meet at RU3]

Address fields: Hadr | L0adr | L1adr | L2adr | L3adr
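A minimal sketch of how 256 PUs could be addressed with four 2 bit level addresses. It assumes that a PU index 0..255 splits into base-4 digits, with L0adr as the least significant digit, and that the hierarchy level a route must reach is the highest level at which source and destination differ. Both the index mapping and the function names are assumptions made for illustration, not taken from the slides.

```python
# ASSUMPTION: PU index = L3adr:L2adr:L1adr:L0adr, two bits per level.

def level_addresses(pu_index):
    """Split a PU index (0..255) into (L0adr, L1adr, L2adr, L3adr)."""
    if not 0 <= pu_index < 256:
        raise ValueError("PU index must be in 0..255")
    return tuple((pu_index >> (2 * level)) & 0b11 for level in range(4))

def common_level(src, dst):
    """Lowest routing level (0 = RU, 1 = RU1, 2 = RU2, 3 = RU3) shared by src and dst."""
    s, d = level_addresses(src), level_addresses(dst)
    for level in range(3, 0, -1):
        if s[level] != d[level]:
            return level
    return 0  # same group of four PUs: the local RU is enough

print(level_addresses(0b11100100))  # digits of PU 228
print(common_level(0, 3), common_level(0, 4), common_level(0, 64))
```

With four levels, the level number fits in two bits, which matches the 2 bit width that the instruction field table gives for the hierarchical routing level address.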

Page 7: Dynamically Programmable Array Architecture

Weeks Investigation (9/12/97)

- Investigate routing structures
- Dynamic routing assignment/programming
- Compromise between area and flexibility
- Support for tree of trees

Not a complete story yet!

Page 8: Dynamically Programmable Array Architecture

Routing Unit

- Full duplex connect busses
- Each PU node controls its source port via a 2 bit local or 6 bit hierarchical address
- Broadcast support
- Any node may listen to any other input to the cluster
- Hierarchical node addressing must not clash

[Figure: four Process Units (PU) connected to one Routing Unit (RU)]

Page 9: Dynamically Programmable Array Architecture

Routing Unit PU Port Detail

- Port numbering is clockwise & relative to each PU port
- HBUS port is always at port 3

[Figure: one PU port of the routing unit. The PU input is selected from "from port 0", "from port 1", "from port 2" and "from port H" by the PU input address (select lines s0, s1); the N bit PU output goes to the other ports. Other labels: 6, 2, 4, &]
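The local source selection can be sketched as a 4-way mux. The slide states only that port numbering is clockwise and relative to each PU port and that the HBUS is always port 3. The exact mapping used here (relative port 0 is the next PU clockwise) and the function name are assumptions.

```python
HBUS_PORT = 3  # from the slide: the HBUS port is always at port 3

def select_source(pu, port, pu_outputs, hbus_value):
    """Value seen at the input of `pu` for a 2 bit local port address."""
    if not 0 <= pu < 4 or not 0 <= port < 4:
        raise ValueError("pu and port must be in 0..3")
    if port == HBUS_PORT:
        return hbus_value
    # ASSUMPTION: ports 0..2 count clockwise, starting at the next PU.
    return pu_outputs[(pu + 1 + port) % 4]

outputs = ["a", "b", "c", "d"]  # outputs of PU0..PU3, clockwise
print(select_source(0, 0, outputs, "h"))  # PU0 listens to its clockwise neighbour
# Broadcast from PU0: every other PU picks the relative port that lands on PU0.
print([select_source(p, (0 - p - 1) % 4, outputs, "h") for p in (1, 2, 3)])
```

Because each PU chooses its own source, a broadcast needs no special hardware in this model: several PUs simply select the same sender.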

Page 10: Dynamically Programmable Array Architecture

PU Overview

- Simple data path functionality
- Primitive control options
- Wide instructions control data path function and operand routing
- Conditions may be inverted for "repeat until" or "Branch If" control
- Very primitive address arithmetic
- 32 or fewer instructions in program

Page 11: Dynamically Programmable Array Architecture

N Bit Functional Unit

- Logic functions: OR, XOR, AND, 0, 1
- Arithmetic: Add, subtract, Multiply
- Shifts: single bit left and right
- Conditional detection: 0, -1, <0, >0
- More optimization needed
- Routing issues need more work

[Figure: functional unit data path. Blocks: ALU/MULT, DFF, Bit Shift, Carry Logic, Const bit. Signals: ALUCTL, SFTCTL, mux0, mux1, mux2, A, F, Cin, Cout, LSin, RSin]

Page 12: Dynamically Programmable Array Architecture

N Bit Functional Unit (V2)

- Logic functions: OR, XOR, AND, 0, 1
- Arithmetic: Add, subtract
- Shifts: right and left shifts
- Conditional detection: 0, <0, >0, OF
- Memory mapped RAM access to operands

[Figure: V2 functional unit data path. Blocks: ALU, DFF, B Shift, Carry Logic, two N bit RAMs (operands), Multiply Sequencer. Signals: ALUCTL, SFTCTL, mux0, mux1, mux2, Out, Cin, Cout, LSin, RSin]
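A behavioural sketch of the V2 function list may help when reading the condition flags. It assumes two's-complement arithmetic, takes the logic function "1" to mean all bits set, and uses flag names of its own ("zero", "neg", "pos", "of") for the 0, <0, >0 and OF conditions. None of these choices is confirmed by the slides.

```python
def alu(op, a, b, n_bits=16):
    """Return (result, flags) for an n_bits wide two's-complement ALU model."""
    mask = (1 << n_bits) - 1
    sign = 1 << (n_bits - 1)
    a &= mask
    b &= mask
    if op == "OR":
        r = a | b
    elif op == "XOR":
        r = a ^ b
    elif op == "AND":
        r = a & b
    elif op == "0":
        r = 0
    elif op == "1":
        r = mask  # ASSUMPTION: logic 1 on every bit
    elif op == "ADD":
        r = a + b
    elif op == "SUB":
        r = a - b
    else:
        raise ValueError(f"unknown op {op!r}")
    r &= mask
    # Signed overflow: ADD of like signs or SUB of unlike signs whose
    # result sign differs from the sign of a.
    overflow = False
    if op == "ADD":
        overflow = bool(~(a ^ b) & (a ^ r) & sign)
    elif op == "SUB":
        overflow = bool((a ^ b) & (a ^ r) & sign)
    negative = bool(r & sign)
    flags = {"zero": r == 0, "neg": negative,
             "pos": r != 0 and not negative, "of": overflow}
    return r, flags

print(alu("ADD", 127, 1, n_bits=8))  # wraps to the most negative value and sets OF
```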

Page 13: Dynamically Programmable Array Architecture

Instruction Fields

?? + XN Bits per context

Field         Comment                                      Bits
ALU_CTL       Control of basic ALU functions               5
SHIFT_CTL     Control of the operand shift                 2
MUX_CTL       Control operand muxes                        3
BRANCH_ADR    Next address if condition true               2
COND_MSK      Condition mask                               5
COND_FLD      Condition field                              5
EXT_COND_SRC  Select source for external condition inputs  2
HEIR_ADDR     Hierarchical routing level address           2
L0_ADDR       Level 0 source address                       2
L1_ADDR       Level 1 source address                       2
L2_ADDR       Level 2 source address                       2
L3_ADDR       Level 3 source address                       2
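The field widths can be exercised with a small pack/unpack sketch. The names and widths come from the slide; the ordering of fields within the word (first listed field in the most significant bits) and the helper names are assumptions.

```python
# (name, bits) as listed on the slide.
# ASSUMPTION: the first field occupies the most significant bits.
FIELDS = [
    ("ALU_CTL", 5), ("SHIFT_CTL", 2), ("MUX_CTL", 3), ("BRANCH_ADR", 2),
    ("COND_MSK", 5), ("COND_FLD", 5), ("EXT_COND_SRC", 2), ("HEIR_ADDR", 2),
    ("L0_ADDR", 2), ("L1_ADDR", 2), ("L2_ADDR", 2), ("L3_ADDR", 2),
]
TOTAL_BITS = sum(bits for _, bits in FIELDS)

def pack(values):
    """Pack a {field: value} dict into one integer; missing fields are 0."""
    word = 0
    for name, bits in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        word = (word << bits) | value
    return word

def unpack(word):
    """Inverse of pack(): split an integer back into its fields."""
    out = {}
    for name, bits in reversed(FIELDS):
        out[name] = word & ((1 << bits) - 1)
        word >>= bits
    return out

print(TOTAL_BITS)
print(unpack(pack({"ALU_CTL": 0b10101, "L3_ADDR": 0b11}))["ALU_CTL"])
```

The listed widths add up to 34 bits, slightly more than the 32 bit word shown on the instruction types slide, which fits the slide's own open total ("?? + XN").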

Page 14: Dynamically Programmable Array Architecture

PU Instruction Types

Instruction word: 32 bits

Type          Opcode  Fields
Data Process  00      ALU_CTL, SFT_CTL, MUX_CTL, ROUTE_CTL
Move          01
Multiply      100
Attention     101     Options, Flag, Condition, Branch_Adr
Branch        110     Options, Link, Condition, Branch_Adr

Other field labels in the figure: Immediate Operand, Operand_Value, OP_SEL, R/W, Options

Condition Field (15 bits): Invert | +ve | OF | -ve | zero | X1 | X0 | Condition Mask | Ext' Source Sel

ROUTE_CTL Field: Hadr | L0adr | L1adr | L2adr | L3adr

Page 15: Dynamically Programmable Array Architecture

Condition Field

- X[1:0] are external condition bits & may be sourced from:
  - Operand bits
  - Global synchronization bus
  - Nearest neighbour condition outputs
- Condition Mask is ANDed with flag bits

Condition Field (15 bits): Invert | +ve | OF | -ve | zero | X1 | X0 | Condition Mask | Ext' Source Sel
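One way to read the condition field is sketched below. The slide says only that the condition mask is ANDed with the flag bits, and the PU overview says conditions may be inverted. That the masked flags are then ORed together, that there is one mask bit per flag, and the function name are all assumptions.

```python
FLAG_ORDER = ("+ve", "OF", "-ve", "zero", "X1", "X0")  # as listed on the slide

def condition_true(flags, mask, invert=False):
    """AND each flag with its mask bit, OR the results, then optionally invert.

    ASSUMPTION: OR-reduction of the masked flags; the slide does not say
    how the masked bits are combined.
    """
    hit = any(flags.get(name, False) and mask.get(name, False)
              for name in FLAG_ORDER)
    return hit != invert

flags = {"zero": True}
print(condition_true(flags, {"zero": True}))               # "Branch If" zero
print(condition_true(flags, {"zero": True}, invert=True))  # "repeat until" zero: stop looping
print(condition_true(flags, {}, invert=True))              # empty mask, inverted: always true
```

In this model an empty mask with the invert bit set gives a condition that is always true, which is one possible way to encode an unconditional branch.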

Page 16: Dynamically Programmable Array Architecture

Static Program

- PU never changes function
- Branch is set to always true
- Just two instructions

[Figure: two-instruction loop, a Data Process instruction followed by a Branch (Always); label: Adr +1]

Page 17: Dynamically Programmable Array Architecture

More Typical Program

Page 18: Dynamically Programmable Array Architecture

Open Issues

- PU data path width
- Complexity of shift operations
- RU trunking
- Number of contexts per PU
- Flexible context RAM partitioning
- Improve PU synchronization

Page 19: Dynamically Programmable Array Architecture

Shifter Instructions

Page 20: Dynamically Programmable Array Architecture

Design Tools

- PU Assembler
- Architecture mapping
- Global resource allocation

Page 21: Dynamically Programmable Array Architecture

Conditional N Bit PU Cell

[Figure: PU cell data path with control. Blocks: ALU/MULT, DFF, Bit Shift, Carry Logic, Const bit, RAM, Col Sel, Condition Logic, Address Logic. Signals: ALUCTL, SFTCTL, mux0, mux1, mux2, A, B, F, Cin, Cout, LSin, RSin, EXT[1:0], Branch, Input, Out, Port address]

Page 22: Dynamically Programmable Array Architecture

Commercial Viability

- 5x performance improvement over conventional solutions (mix of cost & power)
- Conceptually simple
- Clearly defined target applications
- Simple systems connections
- Scalable
- Support hardware & software standards

Page 23: Dynamically Programmable Array Architecture

Conditional N Bit DPA Cell

[Figure: DPA cell surrounded by four Routing Matrix blocks. Blocks: ALU, DFF, Bit Shift, Carry Logic, Const bit, RAM, Col Sel, Condition Logic, Address Logic. Signals: ALUCTL, SFTCTL, mux0, mux1, mux2, A, B, F, Cin, Cout, LSin, RSin, EXT[1:0], Branch]

4 Bit Cell: 180 Gates, 112 Bits RAM

Page 24: Dynamically Programmable Array Architecture

N Bit Wide DPA

[Figure: three identical slices, each with an N bit wide FU (operands A, B, C), Condition Logic, Status Reg, FU Decode, M Plane RAM and Program Storage]

Page 25: Dynamically Programmable Array Architecture

N Bit Wide PU Block

[Figure: PU block. Blocks: N bit wide ALU (operands A, B), N bit wide Shift, Status Reg, Condition Logic, I Decode, Addr Logic, Inst RAM, Local RAM, two Arbit blocks. Busses: State, HierBus, BusW, BusX, two PipeBus]

Instruction Format: Status Msk | Source A | Source B | Shift Op | OP Code

NOTES/QUESTIONS
- Inst has no const, but has offsets
- Inst RAM can be small. 64 words?
- Note counter takes 3 instructions.
- How much subroutine support? None?
- Simplified 16 bit or full 32 bit instructions.
- 2 or 4 local area busses?
- Synchronization issue: Master states accessible, Cond mask use.
- Option to break or combine N bit DP elements?
- Resource pool on busses? E.g. MULT?
- Approx. size of 32 bit FU 800u x 500u?
- If so a 16x8 processor array is possible.
- I.e. 128 processors at 100MHz = 12800MIPS
- Turn off till global state instruction for power reduction
- Handling of interrupts (if at all)
- Handle global signal interrupts how?
- Multiple bit wide segmentation through masks? E.g. 2 counters in one PU?

Page 26: Dynamically Programmable Array Architecture

Potential Configuration

- 128 32 Bit "Pico" Process Units
- 12800MIPS @ 100MHz
- 80mm2 in 0.35u CMOS
- Concept of hierarchical hardware scope
- Very fast streaming operations
- Simple PU programming model
- Applications:
  - Video processing
  - LAN Routing
  - DSP
  - Fast Prototyping

[Figure: chip floorplan with a 16 x 8 PU ARRAY, MUX/DMA/FIFO, RAMBUS Interface, Controller and 256 Global Ram]
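The headline numbers can be cross-checked with a few lines of arithmetic. The array size, clock and die area come from this slide; the 800u x 500u unit size is the tentative figure from the notes on the PU block slide, and one instruction per PU per cycle is an assumption.

```python
rows, cols = 16, 8            # "16 x 8 PU ARRAY"
clock_mhz = 100               # "@ 100MHz"
pu_w_mm, pu_h_mm = 0.8, 0.5   # "800u x 500u?" (posed as a question in the notes)
die_mm2 = 80                  # "80mm2 in 0.35u CMOS"

n_pus = rows * cols
peak_mips = n_pus * clock_mhz             # ASSUMPTION: 1 instruction per PU per cycle
pu_area_mm2 = n_pus * pu_w_mm * pu_h_mm   # area taken by the PUs alone

print(n_pus, peak_mips)
print(f"PU area {pu_area_mm2:.1f} mm2 of {die_mm2} mm2")
```

On these figures the PUs alone take about 51 mm2, leaving roughly 29 mm2 of the 80 mm2 for routing, global RAM, the controller and the external interface.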

Page 27: Dynamically Programmable Array Architecture

PU Program Environment

- Operands: BusW, BusX, Accumulator, HierBus, PipeBus, Local Ram
- PU typically runs a small program
  - May be as little as two instructions
  - 64 words of code maximum
- Instruction types:
  - Arithmetic, logical
  - Data moving
  - Interrupt

Function             Instructions
Arithmetic           1
Counter              1-2
Mux                  1
Multiply Accumulate  3
FIFO Stage           3
Multiport Register   1
Shift Register       2

Page 28: Dynamically Programmable Array Architecture

Architecture Figures of Merit

- Average density vs application specific cells
- Speed of applications vs hardwired logic
- Percentage reuse

Page 29: Dynamically Programmable Array Architecture

Next Steps

- VHDL modeling of architecture
- Primitive assembler tools for PUs
- Selection, coding and simulation of applications
- Architecture tuning
- Layout and verification of complete DPA

Page 30: Dynamically Programmable Array Architecture

Design Tools

- Tanner:
  - Schematic entry, logic simulation, custom layout, layout verification.
  - Circuit simulation.
  - PC & Sun platforms.
  - MOSIS libraries.
- Mentor Graphics:
  - VHDL compilation and simulation.

Page 31: Dynamically Programmable Array Architecture

Basic FU Routing

[Figure: 12 FUs and the basic routing between them]