Design Methodology for Customizable Programmable Processors

Berkeley – Finland Day, Oct. 18, 2002

Prof. Jarmo Takala
Institute of Digital and Computer Systems
Tampere University of Technology
Tampere, Finland
Tel: +358 – 33115 3879; Email: [email protected]

J.Takala/TUT Berkeley – Finland Day, Oct.18, 2002

Outline

- Motivation
- Transport Triggered Architecture (TTA)
- Design Methodology for TTAs
- Research at TUT
- Conclusions

Motivation

- Programmable processors are often used in products based on digital signal processing (DSP):
  - flexibility
  - ease of verification
- Traditionally, DSP processor architectures have been developed for good average performance over a set of benchmark tasks (~100)
  - user applications often contain only a subset of these benchmarks
- Efficiency can be improved by customizing the architecture for the given tasks

Motivation

- DSP applications are often hard real-time constrained
  - execution should be deterministic
  - dynamic run-time behaviour should be avoided
  - static scheduling is therefore well suited to DSP
- Current design complexity calls for an increase in designer productivity
  - high-level languages should be used
- DSP algorithms contain inherent parallelism
  - instruction-level parallelism (ILP) should be maximized

What is needed?

- An application-driven design process with easy design space exploration
- Replacing hardware complexity with software complexity
  - a compiler-driven process
- A templated architecture that is
  - flexible: heterogeneous function units
  - modular: scalable
  - orthogonal: compiler friendly

Choices for Architecture Template

[Figure: ILP architectures. The steps Frontend → Determine Dependencies → Determine Independencies → Bind Function Units → Bind Datapaths & Execute are divided between compilation time (software) and run time (hardware) at a different point for each class: sequential (superscalar), dependence (dataflow), independence (EPIC), and independence (VLIW).]

VLIW Gained Popularity in DSP

[Figure: VLIW CPU with instruction memory, instruction fetch, instruction decode, function units FU-1 to FU-5, a shared register file, a bypassing network, and data memory.]

Transport Triggered Architecture

- VLIW drawbacks:
  - bypass complexity
  - register file complexity
  - register file design restricts FU flexibility
  - operation encoding format restricts FU flexibility
- Reverse programming paradigm [H. Corporaal, 94]: data transport → operation, i.e., the program specifies data transports, and operations are triggered as a side effect of those transports
- The instruction set contains only a single instruction: move

From VLIW to TTA

[Figure: VLIW and TTA block diagrams side by side, each with instruction memory, instruction fetch, instruction decode, register file, bypassing network, data memory, and function units FU-1 to FU-5.]

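TTA Datapath

[Figure: TTA datapath with two integer ALUs, a float ALU, Boolean, float, and integer register files, two load/store units connected to data memory, an immediate unit, and an instruction unit connected to instruction memory; the units are attached to the transport buses through sockets.]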

Function Units

- Operands are written to operand registers (O)
- The operation is performed when the last operand is written to the trigger register (T)
- The pipeline is synchronized with control bits (C)
- Standard interface: FU_ready, Result_ready, Global_lock

[Figure: FU structure with operand register O, trigger register T, an optional shadow register, pipelined logic stages controlled by bits C, and result register R.]
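
The operand/trigger/result behaviour above can be sketched as a small Python model. The class name `TTAFunctionUnit`, the `latency` parameter, and the cycle-stepping `tick()` method are hypothetical choices for illustration; the control bits (C), the optional shadow register, and the FU_ready/Result_ready/Global_lock handshake are not modelled, and only one operation is allowed in flight.

```python
import operator
from typing import Callable, Optional


class TTAFunctionUnit:
    """Hypothetical FU with operand (O), trigger (T) and result (R) registers.

    Simplification: one operation in flight; a new trigger write replaces
    any operation that has not completed yet.
    """

    def __init__(self, op: Callable[[int, int], int], latency: int = 1) -> None:
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        self.op = op
        self.latency = latency
        self.operand = 0                    # operand register (O)
        self.result: Optional[int] = None   # result register (R)
        self._pending: Optional[int] = None
        self._cycles_left = 0

    def write_operand(self, value: int) -> None:
        # A move to O only stores the value; nothing is started.
        self.operand = value

    def write_trigger(self, value: int) -> None:
        # A move to T starts op(O, T); R is updated after `latency` cycles.
        self._pending = self.op(self.operand, value)
        self._cycles_left = self.latency

    def tick(self) -> None:
        # Advance the unit by one clock cycle.
        if self._cycles_left > 0:
            self._cycles_left -= 1
            if self._cycles_left == 0:
                self.result = self._pending


# r1 -> mul.o; r2 -> mul.t; (wait for the latency); mul.r -> r3
mul = TTAFunctionUnit(operator.mul, latency=2)
mul.write_operand(6)
mul.write_trigger(7)
mul.tick()
print(mul.result)  # None: the multiply is still in the pipeline
mul.tick()
print(mul.result)  # 42
```

Writing O has no side effect, so the operand can be delivered in any earlier instruction; only the write to T fixes when the operation starts.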

ILP Architectures

[Figure: the ILP architecture classification extended with TTA. The steps Frontend → Determine Dependencies → Determine Independencies → Bind Function Units → Bind Datapaths → Execute are divided between compilation time and run time for sequential (superscalar), dependence (dataflow), independence (EPIC), independence (VLIW), and independence (TTA); for TTA, datapath binding is also done at compilation time, leaving only execution to run time.]

TTA Characteristics: HW

- Modular: can be constructed from standard building blocks
- Very flexible and scalable
  - FU functionality can be arbitrary
  - supports user-defined special function units (SFUs)
- Lower complexity
  - fewer register file ports
  - reduced bypass complexity and bypass connectivity
  - reduced register pressure
  - trivial decoding (implies long instructions)
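
Why trivial decoding implies long instructions can be shown with a back-of-the-envelope estimate. The sketch assumes a hypothetical, deliberately simple encoding: every bus has its own move slot containing one source-socket ID and one destination-socket ID, opcodes are assumed to be folded into the destination IDs, and guards and long immediates are ignored. The socket counts are made-up example values.

```python
from math import ceil, log2


def field_bits(n_choices: int) -> int:
    """Bits needed to select one of n_choices items (at least 1)."""
    return max(1, ceil(log2(n_choices)))


def instruction_bits(n_buses: int, n_sources: int, n_destinations: int) -> int:
    """Instruction width under an assumed one-move-slot-per-bus encoding.

    Each slot holds a source-socket ID and a destination-socket ID; opcodes
    are assumed to be part of the destination ID, guards and long
    immediates are ignored.
    """
    slot_bits = field_bits(n_sources) + field_bits(n_destinations)
    # Every slot is decoded independently and in parallel, which keeps
    # decoding trivial but multiplies the width by the number of buses.
    return n_buses * slot_bits


# Hypothetical machine: 4 buses, 20 output sockets, 30 input sockets.
print(instruction_bits(4, 20, 30))  # 40
# Doubling the buses doubles the instruction width.
print(instruction_bits(8, 20, 30))  # 80
```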

TTA Characteristics: SW

Traditional operation-triggered instruction:

    mul r1,r2,r3;

Transport-triggered instruction:

    r1 → mul.o; r2 → mul.t; mul.r → r3;

or, with both operand moves in the same instruction:

    r1 → mul.o, r2 → mul.t; mul.r → r3;

This resembles dataflow and time-stationary coding.
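
The mapping from an operation-triggered instruction to transports can be sketched as a tiny translator. The function names `to_transports` and `render` and the string format are hypothetical; following the notation above, `.o`, `.t`, and `.r` name the operand, trigger, and result ports, `,` separates moves issued in the same instruction, and `;` ends an instruction. A one-cycle latency is assumed, so the result move is placed in the next instruction.

```python
def to_transports(op: str, src1: str, src2: str, dst: str,
                  parallel_operands: bool = True) -> list[list[str]]:
    """Translate `op src1,src2,dst` into TTA moves, grouped per instruction.

    Each inner list is one instruction whose moves are issued in parallel.
    """
    operand_move = f"{src1} → {op}.o"   # operand register: no side effect
    trigger_move = f"{src2} → {op}.t"   # trigger register: starts the op
    result_move = f"{op}.r → {dst}"     # assumed 1-cycle latency
    if parallel_operands:
        return [[operand_move, trigger_move], [result_move]]
    # Sequential form: the operand must arrive no later than the trigger.
    return [[operand_move], [trigger_move], [result_move]]


def render(instructions: list[list[str]]) -> str:
    # ',' separates moves within an instruction; ';' ends an instruction.
    return " ".join(", ".join(moves) + ";" for moves in instructions)


print(render(to_transports("mul", "r1", "r2", "r3")))
# r1 → mul.o, r2 → mul.t; mul.r → r3;
print(render(to_transports("mul", "r1", "r2", "r3", parallel_operands=False)))
# r1 → mul.o; r2 → mul.t; mul.r → r3;
```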

TTA Design Tools

Design tools based on the TTA architecture template have been developed at Delft University of Technology (DUT), Delft, the Netherlands, in the MOVE project led by Prof. Henk Corporaal:

- fully parametric C/C++ compiler (buses, connections, function units, register files, etc.)
- design space explorer
- processor generator

Code Generation Trajectory (MOVE Project at DUT)

[Figure: an application (C/C++) is translated by the compiler frontend (GCC or SUIF) into sequential code, which the sequential simulator executes with I/O to produce profiling data; the compiler backend combines the sequential code, the profiling data, and an architecture description into parallel code, which is run on the parallel simulator with I/O.]

TTA Specific Optimizations

TTA allows extra scheduling optimizations, e.g., software bypassing:

- Bypassing can eliminate the need for register file (RF) accesses
- However, it makes scheduling more difficult

Example:

    r1 → add.o, r2 → add.t;
    add.r → r3;
    r3 → sub.o, r4 → sub.t;
    sub.r → r5;

Translates to (assuming r3 is not read later, so its write can also be dropped):

    r1 → add.o, r2 → add.t;
    add.r → sub.o, r4 → sub.t;
    sub.r → r5;
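
To check that the bypassed code computes the same value with fewer RF accesses, the sketch below executes both move sequences. It assumes a hypothetical machine model: every operation has a one-cycle latency, all moves of an instruction read their sources before any destination is written, and a trigger write starts the operation using the operand value written in the same or an earlier instruction. The function name `run` and the register values are illustrative only.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub}
Move = tuple[str, str]  # (source, destination)


def run(program: list[list[Move]], regs: dict[str, int]) -> tuple[dict[str, int], int]:
    """Execute a TTA move program; return final registers and RF access count.

    Names without a dot are registers (e.g. "r3"); "unit.o", "unit.t" and
    "unit.r" are the operand, trigger and result ports of a function unit.
    """
    regs = dict(regs)
    operand: dict[str, int] = {}
    result: dict[str, int] = {}
    rf_accesses = 0
    for instruction in program:
        # Phase 1: all moves of an instruction read their sources in parallel.
        transports = []
        for src, dst in instruction:
            if "." in src:
                value = result[src.split(".")[0]]  # FU result port: no RF access
            else:
                value = regs[src]
                rf_accesses += 1                   # RF read
            transports.append((value, dst))
        # Phase 2: write destinations; trigger writes start operations last.
        triggers = []
        for value, dst in transports:
            unit, _, port = dst.partition(".")
            if port == "o":
                operand[unit] = value
            elif port == "t":
                triggers.append((unit, value))
            else:
                regs[dst] = value
                rf_accesses += 1                   # RF write
        for unit, value in triggers:
            # Assumed 1-cycle latency: the result is readable next instruction.
            result[unit] = OPS[unit](operand[unit], value)
    return regs, rf_accesses


original = [
    [("r1", "add.o"), ("r2", "add.t")],
    [("add.r", "r3")],
    [("r3", "sub.o"), ("r4", "sub.t")],
    [("sub.r", "r5")],
]
bypassed = [
    [("r1", "add.o"), ("r2", "add.t")],
    [("add.r", "sub.o"), ("r4", "sub.t")],
    [("sub.r", "r5")],
]
init = {"r1": 10, "r2": 5, "r4": 3}  # arbitrary example values
regs_a, acc_a = run(original, init)
regs_b, acc_b = run(bypassed, init)
print(regs_a["r5"], acc_a, len(original))  # 12 6 4
print(regs_b["r5"], acc_b, len(bypassed))  # 12 4 3
```

The bypassed version saves one instruction and two RF accesses (the write and the later read of r3), which is why the scheduler must know that r3 is not needed afterwards.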

Design Space Exploration (MOVE Project at DUT)

[Figure: two exploration phases. Resource optimization (Select Resources): the application (C/C++) passes through the frontend and is mapped and scheduled onto candidate resources (machine); a simulator, FU models, and cost functions evaluate each design point. Connectivity optimization (Reduce Connections): the connections of a chosen design point are reduced, and the code is again mapped, scheduled, and simulated, yielding further design points.]
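
The resource-optimization phase can be illustrated with a deliberately crude model. The exploration described above maps, schedules, and simulates the code for every candidate; in this sketch a resource-bound estimate replaces those steps (each resource class needs at least ⌈work / units⌉ cycles, and the busiest one dominates). The operation profile, the unit costs, and the assumption of three moves per operation are all hypothetical, not MOVE's cost functions.

```python
from itertools import product
from math import ceil

# Hypothetical operation profile of the application (e.g. from profiling data).
OP_COUNTS = {"alu": 1200, "lsu": 600, "mul": 300}
# Hypothetical relative area cost of each resource.
UNIT_COST = {"alu": 1.0, "lsu": 1.5, "mul": 3.0, "bus": 0.5}
MOVES_PER_OP = 3  # assumption: two operand moves + one result move, no bypassing

Config = dict[str, int]


def estimate_cycles(config: Config) -> int:
    """Resource-bound estimate: the most heavily loaded resource dominates."""
    bounds = [ceil(count / config[kind]) for kind, count in OP_COUNTS.items()]
    total_moves = MOVES_PER_OP * sum(OP_COUNTS.values())
    bounds.append(ceil(total_moves / config["bus"]))  # transports share the buses
    return max(bounds)


def estimate_cost(config: Config) -> float:
    return sum(UNIT_COST[kind] * n for kind, n in config.items())


def explore(max_units: int = 3, max_buses: int = 8) -> list[tuple[float, int, Config]]:
    """Enumerate configurations and return (cost, cycles, config) design points."""
    points = []
    for alu, lsu, mul in product(range(1, max_units + 1), repeat=3):
        for bus in range(1, max_buses + 1):
            config = {"alu": alu, "lsu": lsu, "mul": mul, "bus": bus}
            points.append((estimate_cost(config), estimate_cycles(config), config))
    return points


points = explore()
cheapest = min(points, key=lambda p: (p[0], p[1]))
fastest = min(points, key=lambda p: (p[1], p[0]))
print(len(points))  # 216
print(cheapest)     # (6.0, 6300, {'alu': 1, 'lsu': 1, 'mul': 1, 'bus': 1})
print(fastest)      # (10.5, 788, {'alu': 2, 'lsu': 1, 'mul': 1, 'bus': 8})
```

With these made-up numbers the bus bound dominates once there are two ALUs, which hints at why connectivity, and not only the FU mix, is explored.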

Exploration: Resource Optimization (MOVE Project at DUT)

- The Pareto curve represents the lower bound of the architecture configurations found
- One architecture is selected for further optimization

[Figure: Pareto curve of the explored architecture configurations, with the selected architecture's blocks labelled IRU, ALU, IU, and LSU.]
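
Extracting the Pareto curve from explored design points is a standard non-dominated filter. A minimal sketch, assuming each design point is reduced to a (cost, cycle count) pair where lower is better for both; the function name `pareto_front` and the values are made up for illustration.

```python
def pareto_front(points: list[tuple[float, int]]) -> list[tuple[float, int]]:
    """Return the non-dominated (cost, cycles) points, sorted by cost.

    A point is dominated if another point is no worse in both cost and
    cycles and strictly better in at least one.
    """
    front: list[tuple[float, int]] = []
    best_cycles = float("inf")
    # Sweep in order of increasing cost (ties: fewer cycles first) and keep
    # only points that beat every cheaper point on cycle count.
    for cost, cycles in sorted(set(points)):
        if cycles < best_cycles:
            front.append((cost, cycles))
            best_cycles = cycles
    return front


# Hypothetical design points: (relative area cost, cycle count).
design_points = [(6.0, 6300), (7.0, 3150), (8.0, 3300), (9.5, 1575),
                 (10.5, 788), (12.0, 788), (9.5, 2000)]
print(pareto_front(design_points))
# [(6.0, 6300), (7.0, 3150), (9.5, 1575), (10.5, 788)]
```

Sorting once and sweeping costs O(n log n), instead of the O(n²) needed to compare every pair of points.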

Exploration: Connectivity Optimization (MOVE Project at DUT)

- Reducing the number of connections decreases bus delay
- Critical connections have been removed

[Figure: the selected architecture (blocks labelled IRU, ALU, IU, LSU) after connection reduction.]
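
A simple way to see why connectivity can be reduced: a given schedule uses only some of the bus-to-socket connections of a fully connected machine. The sketch below lists the connections one schedule actually uses, reports the rest as removable without changing that schedule, and counts sockets per bus as a rough proxy for bus load. The bus and socket names, the schedule, and the load proxy are hypothetical; a real explorer may also remove used connections and reschedule, which is not modelled here.

```python
from itertools import product

BUSES = ["b0", "b1", "b2"]
OUTPUT_SOCKETS = ["rf.r", "alu.r", "lsu.r"]          # move sources
INPUT_SOCKETS = ["rf.w", "alu.o", "alu.t", "lsu.t"]  # move destinations

# Hypothetical schedule: per instruction, bus -> (source socket, destination socket).
SCHEDULE = [
    {"b0": ("rf.r", "alu.o"), "b1": ("rf.r", "alu.t")},
    {"b0": ("alu.r", "lsu.t")},
    {"b2": ("lsu.r", "rf.w")},
]

Connection = tuple[str, str]


def all_connections() -> set[Connection]:
    # Fully connected: every output socket drives every bus,
    # and every bus feeds every input socket.
    full = set(product(OUTPUT_SOCKETS, BUSES))
    full |= set(product(BUSES, INPUT_SOCKETS))
    return full


def used_connections(schedule: list[dict[str, tuple[str, str]]]) -> set[Connection]:
    used: set[Connection] = set()
    for instruction in schedule:
        for bus, (src, dst) in instruction.items():
            used.add((src, bus))  # output socket drives the bus
            used.add((bus, dst))  # bus feeds the input socket
    return used


def bus_load(connections: set[Connection]) -> dict[str, int]:
    # Sockets attached to each bus: an assumed proxy for its load and delay.
    return {bus: sum(1 for a, b in connections if bus in (a, b)) for bus in BUSES}


used = used_connections(SCHEDULE)
removable = all_connections() - used
print(len(all_connections()), len(used), len(removable))  # 21 8 13
print(bus_load(all_connections()))  # {'b0': 7, 'b1': 7, 'b2': 7}
print(bus_load(used))               # {'b0': 4, 'b1': 2, 'b2': 2}
```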

Topics to be Investigated

- Poor code density
  - a good target for code compression techniques: a priori information about the application is available, so instruction probabilities are known
- Estimations
  - power estimation
  - fast estimations with sufficient accuracy
- Flexibility, reuse
  - applications may change, so additional resources need to be assigned even though the original application does not need them
  - tool-assisted special function unit generation
- Analysis support
  - model creation support
  - characterization support
- Parameterized processor generator
  - interconnections, control, etc. may be realized in several ways depending on the target
  - low-power optimizations
- Clustered TTAs
  - interprocessor communication schemes

These topics are considered in the FlexDSP Project at TUT.
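
The code-compression point can be illustrated with Huffman coding, one standard way to exploit known symbol probabilities (chosen here as an example; the slide does not name a technique). The sketch assumes whole instruction words are the symbols and that their probabilities come from profiling; the symbol names and probabilities are hypothetical. With four symbols a fixed-length code needs 2 bits each, while this Huffman code averages 1.75 bits.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs: dict[str, float]) -> dict[str, int]:
    """Return the Huffman codeword length (in bits) for each symbol."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(weight, next(tiebreak), {sym: 0}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical instruction-word probabilities obtained from profiling.
profile = {"insn_a": 0.5, "insn_b": 0.25, "insn_c": 0.125, "insn_d": 0.125}
lengths = huffman_code_lengths(profile)
avg_bits = sum(profile[sym] * lengths[sym] for sym in profile)
print(lengths)   # {'insn_a': 1, 'insn_b': 2, 'insn_c': 3, 'insn_d': 3}
print(avg_bits)  # 1.75
```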

New Design Environment (target of FlexDSP Project at TUT)

[Figure: design flow from functionality (C/C++) to a TTA processor. Blocks: frontend, operation analysis, design space exploration, SFU generation, FU models (C, HDL), cost functions (area, power, speed), resource constraints, a parametric compiler producing parallel object code with code compression, and a parametric processor generator producing HDL code.]

Conclusions

- Design methodologies that allow processor customization will improve efficiency in certain application areas, e.g., multimedia and telecom
- TTA is a promising candidate for an architecture template for customized processors
  - in particular, support for custom function units allows powerful tailoring
- Results of the MOVE project at DUT have already proven the concept
  - a parameterized compiler allows tool-assisted design space exploration
- More research is still needed on
  - hardware implementations
  - enhanced compiler strategies