30
EE382N: Embedded Sys Dsgn/Modeling Lecture 10 © 2015 A. Gerstlauer 1 EE382N: Embedded System Design and Modeling Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin [email protected] Lecture 10 – System-Level Synthesis EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 2 Lecture 10: Outline Synthesis System-level synthesis process Design Automation Automatic model refinement & model generation System-on-Chip Environment (SCE) Tool flow Setup and tutorial

EE382N: Embedded System Design and Modelingusers.ece.utexas.edu/~gerstl/ee382n_f15/notes/lecture10.pdf · EE382N: Embedded System Design and Modeling ... •uCOS-II •DSP • Custom,

  • Upload
    voliem

  • View
    240

  • Download
    2

Embed Size (px)

Citation preview

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 1

EE382N:Embedded System Design and Modeling

Andreas GerstlauerElectrical and Computer Engineering

University of Texas at [email protected]

Lecture 10 – System-Level Synthesis

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 2

Lecture 10: Outline

• Synthesis

• System-level synthesis process

• Design Automation

• Automatic model refinement & model generation

• System-on-Chip Environment (SCE)

• Tool flow

• Setup and tutorial

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 2

Synthesis

• Gajski’s Y-Chart

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 3

Behavior StructureSystem Synthesis

Processor

Logic

• Gajski’s Y-Chart

Synthesis

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 4

Behavior

Structure

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 3

• Gajski’s Y-Chart• Platform-based design (the other Y Chart)

Synthesis

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 5

Behavior

Structure

ArchitectureConstraints

Behavior /Function

• X-Chart

Synthesis

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 6

Behavior

Structure

ArchitectureConstraints

Behavior /Function

Quality

Specification

Implementation

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 4

• X-Chart• System-Level Synthesis

Synthesis

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 7

Synthesis

Specification Model

RefinementDecisionMaking

Hardware/Software Synthesis

ArchitectureBehavior

Platforms, IP Databases

Implementation Model

QualityStructure Performance Estimates

Transaction-Level Model

(TLM)

Model of Computation

(MoC)

Source: A. Gerstlauer, C. Haubelt, A. Pimentel, et al., “Electronic System-Level Synthesis Methodologies,“ TCAD, 2009.

Hardware vs. Software

• Double-roof model

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 8

system

component

logic

task

instructionarchitecture

RTL

gate

Hardware

Arch

ISA

Software

Implementation

Specification

Source: A. Gerstlauer, C. Haubelt, A. Pimentel, et al., “Electronic System-Level Synthesis Methodologies,“ TCAD, 2009.

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 5

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 9

System Design Evolution

Platform HW Dev. SW Dev. App. Dev. PrototypeBoard+ BSP

Board

HW Dev.

Platform Platform Modeling

VirtualPlatform Board

+ BSP

SW Dev.

App. Dev.

VP

Prototype

Pre

sen

tP

ast

Fu

ture Sys.

Gen.TLM

ASIC/FPGATools

Board+ BSP+ App

SW Gen.

HW Gen.Prototype

Platform

C/MoC

ApplicationDeveloper

Source: S. Abdi, Concordia Univ.

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 10

Design Automation

• Synthesis = Decision making + model refinement

Successive, stepwise model refinement Layers of implementation detail

Refinement

Model n

DB

Model n+1

Specification model

Implementation model

Optim. algorithm

GUI

Design decisions

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 6

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 11

Specification model

Computation model

Communication model

Implementation model

Validation GUI

Test

Simulate

Compile

Test

Simulate

Test

Simulate

Simulate

Test

Design Automation (1): Modeling

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 12

Specification model

Computation model

Design decisions

Communication model

Comp. refinement

Comm. refinement

PE Allocation

Beh/Var Partitioning

Scheduling

Refinement GUI

Net Allocation

Channel partitioning

Spec. optimization

Prot. insertion

Design decisions

Design decisionsProc. refinement

Implementation model

Capture

RTL/SWHW alloc/sched/bind

SW Compilation

IF generation

Browsing

Alg. selection

Comp. / IP

Protocols

Validation GUI

Test

Verify

Simulate

Compile

Test

Simulate

Test

Verify

Simulate

Simulate

Verify

Test

Design Automation (2): Refinement

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 7

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 13

Specification model

Computation modelEvaluation

Profiling

Profiling data

Design decisions

Communication model

Comp. refinement

Comm. refinement

PE Allocation

Beh/Var Partitioning

Scheduling

Refinement GUI

Net Allocation

Channel partitioning

Evaluation results

Spec. optimization

Prot. insertion

Design decisions

Evaluation

Evaluation results

Design decisionsProc. refinement

Implementation model

Capture

RTL/SWHW alloc/sched/bind

SW Compilation

IF generation

Browsing

Alg. selection

Comp. / IP

Protocols

Validation GUI

Test

Verify

Simulate

Compile

Test

Simulate

Profile

Test

Verify

Simulate

Simulate

Verify

Test

Design Automation (3): Exploration

Estimate

Estimate

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 14

Specification model

Computation modelEvaluation

Profiling

Profiling data

Design decisions

Communication model

Comp. synthesis

Comp. refinement

Comm. synthesis

Comm. refinement

PE Allocation

Beh/Var Partitioning

Scheduling

Refinement GUI

Net Allocation

Channel partitioning

Evaluation results

Spec. optimization

Prot. insertion

Design decisions

Evaluation

HW/SW synthesis

Evaluation results

Design decisionsProc. refinement

Implementation model

Capture

RTL/SWHW alloc/sched/bind

SW Compilation

IF generation

Browsing

Alg. selection

Comp. / IP

Protocols

Validation GUI

Test

Verify

Simulate

Compile

Test

Simulate

Profile

Test

Verify

Simulate

Simulate

Verify

Test

Design Automation (4): Synthesis

Estimate

Estimate

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 8

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 15

System-On-Chip Environment (SCE)

ArchnArchnTLMn

Impln

Spec

ImplnImpln

ISS

CP

U

Mem

IPHW

Bri

dg

e

Arb

iter CPU Bus DSP Bus

B5

DS

P

HALRTOS

B1

ISS

HALRTOS

B2,B3

B4

Synthesize target HW/SW

Compile onto MPSoC platform

Mem

IPHW

Bri

dg

e

CPU Bus DSP Bus

B3v1v2

B5B4

DSP

C4C2C1

OS + Drv

CPU

OS + Drv

Coren Coren

Coren Coren

Core1 Coren

B2B1

C3

Specification

System Design

SWDB

Systemmodels

CPUn.bin

Implementation Model

CE/BusModels

TLMnTLMnTLMi

Hardware Synthesis

Software Synthesis

RTLDB

RTLnRTLnRTLnISSnISSnISSn CPUn.binCPUn.bin

HWn.vHWn.vHWn.v

Design Decisions

Architecture Exploration

Scheduling Exploration

Network Exploration

Communication Synthesis

PE/OSModels

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 16

Abstract Programming Model

• Hierarchical process graph• Sequential processes

– ANSI C code

• Parallel-serial composition– Dependencies, Fork-join

• Abstract inter-process communication• Communication channels

– Message-passing, queues, etc.

• Shared variables

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 9

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 17

System Transaction-Level Model (TLM)

Generate MPSoC TLM simulation model

Fast and accurate for exploration and verification

• Compile onto multi-core/multi-processor platform

• Implement computation on processors/cores and busses

• Generate code for communication over bus network

Mem

IPHW

Bri

dg

e

CPU Bus DSP Bus

B3v1v2

B5B4

DSP

C4C2C1

OS + Drv

CPU

OS + Drv

Coren Coren

Coren Coren

Core1 Coren

B2B1

C3

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 18

System Implementation

• Synthesize hardware and software for each processor• High-level/behavioral RTL and interface synthesis

– Allocation, scheduling, binding

• Software synthesis and RTOS targeting– Code generation, firmware synthesis, cross-compile & link

CP

U

IPHW

Bri

dg

e

Arb

iter

DS

P

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 10

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 19

Cellphone Example: Hardware Platform

• 2 Subsystems• ARM7TDMI• Motorola DSP 56600k

• 4 Busses• AMBA AHB• DSP bus• IP & memory busses

• 2 Accelerator HW/IP blocks• DCT IP• Custom

codebook HW• 10 I/O HW blocks

DCT(IP)

TX

CPU(ARM7)

M1(SRAM)

M1Ctrl

I/O4(HW)

CoPro(HW)

DSP(DSP56k)

MBUS

BUS1 (AMBA AHB) BUS2 (DSP)

S

SS

S

M

S

M2/S

M1 M

Arb

ite

r1

Arb

IP BridgeM

S

DCTBus

S

I/O3(HW)

S

I/O2(HW)

S

I/O1(HW)

S

S

DMA(2753A)

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 20

Cellphone Example: Software Platform

• Operating systems• ARM7

• uCOS-II• DSP

• Custom, interrupt-driven multi-tasking kernel

DCT

TX

ARM7

M1Ctrl

I/O4

HW

DSP56k

MBUS

BUS1 (AMBA AHB) BUS2 (DSP)

Arb

ite

r1

IP Bridge

DCTBusI/O3I/O2I/O1

DMA

M1

uCOS-II Custom RTOS

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 11

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 21

Cellphone Example: Application

• Computation• ARM7

• MP3 Decoding• Jpeg Encoding

• DSP• GSM Transcoding

DCT

TX

ARM7

M1Ctrl

I/O4

HW

DSP56k

MBUS

BUS1 (AMBA AHB) BUS2 (DSP)

Arb

ite

r1

IP Bridge

DCTBusI/O3I/O2I/O1

DMA

M1

Enc DecJpeg

Codebk

SI BO BI SO

DCT

MP3

uCOS-II Custom RTOS

Application

Test code

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 22

Cellphone Example: Application

DCT

TX

ARM7

M1Ctrl

I/O4

HW

DSP56k

MBUS

BUS1 (AMBA AHB) BUS2 (DSP)

Arb

ite

r1

IP Bridge

DCTBusI/O3I/O2I/O1

DMA

M1

Enc DecJpeg

Codebk

SI BO BI SO

DCT

v1

C2

C1

C3

C4

C5

C6

C7

C8

C0

MP3

• Computation• ARM7

• MP3 Decoding• Jpeg Encoding

• DSP• GSM Transcoding

• Communication• Shared memory

– Camera frames• Message-passing

– MP3 and speech streaming

uCOS-II Custom RTOS

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 12

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 23

ComputationDesign

Design Methodology

Specification model

Algor.IP

Proto.IP

Computation model

Communication refinement

Communication model

Comp.IP

Estimation

ValidationAnalysis

Compilation Simulation model

Estimation

ValidationAnalysis

Compilation Simulation model

Estimation

ValidationAnalysis

Compilation Simulation model

Implementation model

Softwarecompilation

Interfacesynthesis

Hardwaresynthesis

Estimation

ValidationAnalysis

Compilation Simulation model

RTOSIP

RTLIP

Computation refinement

Capture

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 24

Computation Design (1)

• Architecture exploration• Allocation of Processing Elements (PE)

– Type and number of processors

– Type and number of custom hardware blocks

– Type and number of system memories

• Mapping to PEs– Map each behavior to a PE

– Map each complex channel to a PE

– Map each variable to a PE

Architecture model

Concurrent PEs

Abstract channels andmemory interfaces

IP1M1

B2. . . . . .. . . . . .

c2

c1

Mem

v1v3

PE2

B3

PE1

B1

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 13

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 25

Cellpone Example: Architecture Model

RcvData

Co-process

SpchOut

DCT

stripe[]

Decoder

JPEG Coder

SerOut

SpchIn

SerIn

DSP

Mem

DMA

HW

DCT_IP

BI

BO

SO

SICtrl

ARM7

DC

TA

dap

ter

Vocoder

MP3

Partitioning, synchronization, message-passing, RPC

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 26

Computation Design (2)

• Scheduling exploration

• Static scheduling of behaviors into sequential tasks– Group (flatten) behaviors into tasks

– Determine fixed execution order of behaviors in each task

• Dynamic scheduling of concurrent tasks by RTOS– Choose scheduling policy, i.e. round-robin or priority-based

– For each set of tasks, determine task priorities

Scheduled model

Abstract OS model insoftware PEs

IP1M1

B2. . . . . .. . . . . .

c2

c1

Mem

v1v3

PE2_OS

B3

OS Model

PE1

B1

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 14

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 27

Cellphone Example: Scheduled Model

RcvData

Co-process

SpchOut

DCT

stripe[]

Decoder

JPEG Coder

OSModel

SerOut

SpchIn

SerIn

DSPARM_OS

Mem

DMA

HW

DCT_IP

BI

BO

SO

SICtrl

ARM7

DSP_OS

DC

TA

dap

ter

Vocoder

MP3

OSModel

Scheduling, task refinement, OS model insertion

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 28

CommunicationDesign

Design Methodology

Specification model

Algor.IP

Proto.IP

Computation model

Communication refinement

Communication model

Comp.IP

Estimation

ValidationAnalysis

Compilation Simulation model

Estimation

ValidationAnalysis

Compilation Simulation model

Estimation

ValidationAnalysis

Compilation Simulation model

Implementation model

Softwarecompilation

Interfacesynthesis

Hardwaresynthesis

Estimation

ValidationAnalysis

Compilation Simulation model

RTOSIP

RTLIP

Computation refinement

Capture

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 15

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 29

Communication Design (1)

• Network exploration

• Allocation of system network– Type (protocols) and number of system busses

– Type and number of CEs (bridges and transducers, if applicable)

– System connectivity

• Routing of channels over busses– Map each communication channel to a system bus

(or an ordered list of busses, if applicable)

Network model

PEs + CEsMiddleware stacks

Point-to-point linksUntyped packet transfers

Untyped memory interfaces

M1Ctrl

TX

IP_TXMAC

intprotocol

IP1M1_LK

B2

PE2_OS

B3

OS Model

PE1

B1

L1 L2

Mem

char[512]

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 30

Cellphone Example: Network Model

ARM_OS

ARM7

DMA

DMA_HW

DCT

DCT_IP

M

S

DSP_OS

DSP

OSModel

HW

SI

SI_HW

BI

BI_HW

SO

SO_HW

BO

BO_HW

T

MSlinkBri

l

DC

TA

da

pte

r

link

DM

A

linkBri

linkBI

linkHW

linkSI

linkBO

linkSO

Mem

OSModel

Data conversion, channel merging

CE insertion, packeting, routing

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 16

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 31

Communication Design (2)

Maste

rPro

to Sla

veP

roto

mac m

ac

Mem

Pro

toco

l

IPP

roto

col

• Communication synthesis

• Assignment of bus parameters– Address mapping for each channel and memory interface

– Synchronization mechanism for each channel(dedicated or shared interrupts, polling)

– Transfer mode for each channel/interface (regular, burst, DMA)

Transaction-level model (TLM)

PEs + CEs + BussesProtocol stacks

Abstract bus channelsBus transactions + interrupts

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 32

Maste

rPro

to Sla

veP

roto

mac m

ac

Mem

Pro

toco

l

IPP

roto

col

Ma

sterP

roto S

lave

Pro

to

ma

c ma

c

Communication Design (2)

• Communication synthesis

• Assignment of bus parameters– Address mapping for each channel and memory interface

– Synchronization mechanism for each channel(dedicated or shared interrupts, polling)

– Transfer mode for each channel/interface (regular, burst, DMA)

Transaction-level model (TLM)

PEs + CEs + BussesProtocol stacks

Abstract bus channelsBus transactions + interrupts

Pin-accurate model (PAM)

Physical bus structureBit-accurate pins and wires

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 17

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 33

Cellphone Example: TLM

Synchronization, addressing, media acces

Arbitration, data slicing, interrupt handling

me

mm

ac

ARM_OS

ARM_HAL

AD

DR ARM7

DMAm

ac

me

m

DMA_HW

AD

DR

Mem

mem

Mem_HW

AD

DR

DCT

DCT_IP

cfMaster

cfSlave

ah

bP

roto

co

l

Bri

dg

e

DSP_OS

DS

P_

HA

L

DSP

AD

DR

OSModel

ma

c

intA intB intC intD

HW

ma

c

AD

DR

HW_HW

SI

ma

c

AD

DR

SI_HW

BI

mac

AD

DR

BI_HW

SO

ma

c

AD

DR

SO_HW

BO

ma

c

AD

DR

BO_HW

ma

c

AD

DR

T

dsp

Master

ds

pS

lave

dspProtocol

ma

c

AD

DR

pollPOLL_ADDR

pollPOLL_ADDR

pollPOLL_ADDR

poll POLL_ADDR

OSModel

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 34

Cellphone Example: Pin-Accurate Model

Implementation synthesis in backend tools

Interface and high-level synthesis on hardware side

Firmware, RTOS and C synthesis on the software side

ARM_BF

ISR

l

PIC

ARM_OS

ARM_HALARM_HW

AD

DR ARM7

DMA

DMA_BF

AD

DR

AD

DR

Mem

Mem_BF

AD

DR

DCT

DCT_IP

Arbiter

T_BF

DS

P_

BF

PIC

DSP_OS

DS

P_H

AL DS

P_H

W

DSP

l

AD

DR

OSModel

ISR

HW

AD

DR

HW_BF

SI

SI_BF

BI

ADDR,POLL_ADDR

BI_BF

SO

SO_BF

BO

BO_BF

Bridge

AD

DR

AD

DR

ADDR,POLL_ADDR

ADDR,POLL_ADDR

ADDR,POLL_ADDR

OSModel

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 18

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 35

Cellphone Example: Single-Processor Results

• MP3 decoder on ARM

• 55 MP3 frames

• JPEG encoder on ARM

• 30 116x96 pictures

• GSM vocoder on DSP

• 163 speech frames encoded/decoded

TLM speed

• 2000 Mcycles/s peak

• 300-600 Mcycl/s sustained

TLM accuracy

• <3% average frame timing error 0

5

10

15

20

25

Spec. N-TLM P-TLM BFM ISS/RTL

Ave

rag

e E

rro

r [%

]

MP3

JPEG

GSM

0.01

0.1

1

10

100

1000

10000

100000

Spec. N-TLM P-TLM BFM ISS/RTL

Sim

ula

tio

n T

ime

[s]

MP3

JPEG

GSM

Spec. Net. TLM PAM Impl.

Spec. Net. TLM PAM Impl.

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 36

Cellphone Example: Multi-Processor Results

• Experimental setup

• 1.5 second MP3

• 640x480 picture

• 1.5 speech GSM

3s / 300M ARM cycles / 180M DSP cycles

0

5

10

15

20

25

30

35

Appl. Task FW TLM BFM ISS

Ave

rag

e E

rro

r [%

]

1

10

100

1000

10000

100000

Sim

ula

tio

n T

ime [

s]Avg. Error

Sim. Time

0

5

10

15

20

25

30

35

Appl. Task FW TLM BFM ISS

Ave

rag

e E

rro

r [%

]

1

10

100

1000

10000

100000

Sim

ula

tio

n T

ime [

s]Avg. Error

Sim. Time

• TLM simulation speed

• 300 Mcycles/s

• TLM accuracy

• <3% error

Prototyping and exploration with 100% fidelity

Spec Arch Net TLM PAM ISSSpec. Sched. Net. TLM PAM Impl.

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 19

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 37

SCE Exploration Results

• Suite of industrial size examples

ExampleSystem Architecture(masters → slaves)

Model size (LOC) Generation timeSpec Arch Net PAM

JPEG A1 CF→HW 1806 2732 2780 4642 1.01 s

Vocoder

A1 DSP→HW

7385

9594 9775 10679 4.77s

A2 DSP→HW1,HW2 9632 9913 10989 5.21s

A3 DSP→HW1,HW2,HW3 9659 9949 11041 5.76s

Mp3float

A1 CF→HW1

6900

28190 28204 29807 5.86s

A2CF→HW1,HW2,HW3HW1↔HW3HW2↔HW3

28275 28633 31172 6.18s

A3CF →HW1, HW2, HW3, HW4HW1↔HW3↔HW5HW2↔HW4↔HW5

28736 30202 32795 17.69s

Mp3fix

A1 ARM→2 I/O

13363

17131 17270 21593 3.85s

A2 ARM→2 I/O, LDCT, RDCT 18300 18564 23228 7.01s

A3ARM→2 I/O, LDCT, RDCTLDCT↔I/ORDCT↔I/O

18748 19079 24471 7.40s

Baseband A1DSP→HW, 4 I/O, TCF, DMA→Mem, BR, T, DMABR→DCT_IP

11481 17020 17685 21711 8.99s

Cellphone A1ARM→4 I/O, 2 DCT, TLDCT, RDCT→I/ODSP→HW, 4 I/O, T

16441 21936 22570 30072 9.49s

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 38

Lecture 10: Outline

Synthesis

System-level synthesis process

Design Automation

Automatic model refinement & model generation

• System-on-Chip Environment (SCE)

Tool flow

• Setup and tutorial

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 20

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 39

System-on-Chip Environment (SCE)

• Server and accounts

• ECE LRC Linux servers– Labs (ENS 507) or remote access (ssh)

• SpecC software (© by CECS, UCI)– /usr/local/packages/sce-20100908

– module load sce

• SCE tool set

• GUI– sce, sced, scchart

• Scripting– sce_allocate / sce_map / sce_schedule / …

• Documentation ($SPECC/doc)– SCE Manual (online via Help->Manual)

– SCE Tutorial (PDF or html)– SCE Specification Reference Manual (SpecRM.pdf)

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 40

SCE Tool Flow

• SCE Components:• Graphical frontend

(sce, scchart)

• Editor (sced)

• Compiler and simulator (scc)

• Profiling and analysis (scprof)

• Architecture refinement (scar)

• RTOS refinement (scos)

• Network refinement (scnr)

• Communication refinement (sccr)

• RTL refinement (scrtl)

• Software refinement (sc2c)

• Scripting interface (scsh)

• Tools and utilities ...

Architecture model

Specification model

Architecture Exploration

PE Allocation

Beh/Var/Ch Partitioning

Scheduled model

Scheduling Exploration

Static Scheduling

OS Task Scheduling

Network model

Network Exploration

Bus Network Allocation

Channel Mapping

Communication model

Communication Synth.

Bus Addressing

Bus Synchronization

PEDatabase

CEDatabase

BusDatabase

OSDatabase

GU

I / S

crip

ting

TLM

RTL Synthesis

Datapath Allocation

Scheduling/Binding

SW Synthesis

Code Generation

Compile and Link

RTLDB

SWDB

Implementation model

VerilogVerilog

VerilogBinary

BinaryBinary

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 21

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 41

SCE Main Window

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 42

SCE Source Editor

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 22

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 43

SCE Hierarchy Displays

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 44

SCE Compiler and Simulator

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 23

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 45

SCE Profiling and Analysis

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 46

Design Space Exploration

Explore and trim Gradually prune design space

Exploration Space

Design Time

Profilingstage

Estimationstage

Evaluationstage

...

...

One-timeretargeting Implt

dependentsimulation/analysis

Implt independentsimulation

Profiling

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 24

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 47

Evaluation Flow

Specmodel

Instrumentation

Simulation

Speccharacteristics

FeedbackValidationRefinement

Pro

filin

g

Refinedmodel

Simulation/analysis

SystemQuality Metrics

Eva

luat

ion

Backannotation

Refinement Retargeting

ComponentEstimates

Designdecision

Est

imat

ion

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 48

Profiling

• Input specification MoC

• Hierarchy

• Computation & communication

• Multi-dimensional analysis

• Multi-entities– Behavior, channel, port, variable

• Multi-metrics– Operation, traffic, storage

– Static, dynamic

• Multi-levels– Application, transaction, bus-

functional

vcB1 B2

B

Profiling

Profiled App.

Simulation

Instr. Appl

Static Analysis

Counters

Instrumentation

Application

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 25

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 49

Profiling

• Instrumentation-based profiling

• Bb: The execution counts of basic block b

– Enumerate execution paths

• Cb,i,d: No. of computed characteristics for item type iand data type d in the block b

• Data type i: float, int, ..

• Item type d: metric-dependent

Specification metrics

Ri,d = bCb,i,d Bb

R = idRi,d

int b,c;

if( a = 0){

b++;

}

else{

b++;

c++;

}

R++,int= i [ Bi * Ci,++,int ]

= 1 * 1 + 3 * 2

= 7

B1 = 1

B3 = 3

C1,++,int = 1

C3,++,int = 2

Source: L. Cai, A. Gerstlauer, D. Gajski, “Retargetable Profiling for Rapid, Early System-Level Design Space Exploration,“ DAC, 2004.

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 50

Retargeting

• Target machine model• Wi,d : weights of

components which the entity mapped to

– Manual– Simulation– Complex cost function/

algorithm

Implementation estimates E = id(Ri,d * Wi,d) Time complexity: O(n)

vcB1 B2

B

PE1 PE2

Mem

R(B1)++,int= 7

W(PE1)++,int= 1

E(B1,PE1)++,int= 7 x 1 = 7

Source: L. Cai, A. Gerstlauer, D. Gajski, “Retargetable Profiling for Rapid, Early System-Level Design Space Exploration,“ DAC, 2004.

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 26

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 51

Computational complexity of top-level Vocoder behaviors:

LP_Analysis Open_Loop Closed_Loop Codebook Update 377.0 MOp 337.1MOp 478.7 MOp 646.4 MOp 43.6 MOp

Codebook operation mix:(x, int) (+, int) (-, int) (/,int) (others,int) 46.2% 33.5% 9.1% 7.1% 4.1%

HW acceleration

Floating –point not required

Dedicated hardware multipliers

Vocoder Profiling

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 52

Mapping of 8 top-level encoder behaviors onto ColdFire + DSP + HW 85:04h for 6561 alternatives (1.7s simulation + 3s refinement each) 100% fidelity

SW (20.0, 30.73 ms)

HW (144.1, 12.24 ms)

10 ms

15 ms

20 ms

25 ms

30 ms

35 ms

10 30 50 70 90 110 130 150 170

Tran

sco

din

g d

elay

Cost

Vocoder Design Space Exploration

Timing constraint

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 27

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 53

GSM Vocoder Tutorial

• Enhanced Full Rate (EFR) standard (GSM 06.10, ETSI)• Lossy voice encoding/decoding for mobile telephony

– Speech synthesis model

– Input: speech samples @ 104 kbit/s

– Frames of 4 x 40 = 160 samples (4 x 5ms = 20ms of speech)

– Output: encoded bit stream @ 12.2 kbit/s (244 bits / frame)

• Timing constraint– 20ms per frame (total of 3.26s for sample speech file)

SpecC specification model

• 73 leaf, 29 hierarchical behaviors (9 par, 10 seq, 10 fsm)

• 9139 lines of SpecC code (~13000 lines of original C)

Short-termsynthesis filter

+Delay/ Adaptive codebook

10th-order LP filter

Speech

Fixed codebook

Long-termpitch filter

Residual pulses

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 54

Vocoder Specification Model

Filter memory

update

Closed-loop

pitch search

Algebraic (fixed)

pitch search

Linear prediction

(LP) analysis

Open loop

codebook search����

����

����

����

��

���

���

�����

�����

��

����

��

��������

��������

��

decode_12k2

Post_Filter

Bits2prm_12k2

Decode

LP parameters

4 subframes

bits

speech[160]

A(z)

synth[40]

synth[40]

prm[57]

prm[13]

decoder

��

��

��

��

��

����

����

���

���

����

����

����

�� ��

����

����

����

��

��

��

��

��������

��������

��������

��������

��������

��������

mem

ory

2x per frame

A(z)

2 subframes

prm2bits_12k2

pre_process

sample

prm[57]

bits

speech[160]

coder

vocoderspeech_in bits_in

speech_outbits_out

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 28

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 55

Vocoder Computation Model

res

Motorola DSP56600

Custom hardware

data

����������������

speech_in

Codebook

HW

prm_in���

���

���

��� Bits2PrmSpeech_In

cdbk_res

cdbk_data

���

���

��������

���

���

��������

��������

speech_out

��������

Speech_Out���

���

��������

���

���

��������

Prm2Bits���

���

���

���

prm_out

Post_Filter

Decode_12k2

D_lsp

Pre_process

LP_analysis

Open_loop

Closed_loop

Start_codebook

Wait_codebook

Update

res

data

speech

prm_in

prm

speechprm

prm_in

synth_out

speech_in

bits_out

DSP

bits_in

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 56

Vocoder Communication Model

nWR

Data[23:0]

intC

Addr[15:0]

MCS

nRD

Bus InterfaceBus Driver

Bus InterfaceBus Interface Bus Interface Bus Interface

HWCodebook

Speech_In Bits2Prm Prm2Bits Speech_Out

DSP

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 29

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 57

Vocoder Implementation Model

Cycle-accurate co-simulation

• DSP instruction-set simulator (ISS)• 70,500 lines of assembly code (running @ 60MHz)

• RTL SpecC for hardware• 45,000 lines of VHDL RTL code (running @ 100MHz)

nWR

Data[23:0]

intC

Addr[15:0]

MCS

nRD

CodebookFSMD

MotorolaDSP56600

ISSOBJ

DSP HW

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 58

Vocoder Results

• Productivity gain: 12 months vs. 1 hour = 2000x

Comm Impl

AutomatedUser / Refine

ManualModified

lines

Spec Arch

Arch Comm

3,275

914

30 min / < 2 min5~6 month

5 min / < 0.5 min

3~4 month

6,146

15 min / < 1 min

1~2 month

Refinement Effort

Total 50 min / < 4 min9~12 month10,355

1

10

100

1000

10000

100000

Spec Arch Comm Impl

No

rmal

ized

Tim

e

Simulation Speed

02000400060008000

1000012000140001600018000

Spec Arch Comm Impl

Nu

mb

er o

f L

ines

Code Size

EE382N: Embedded Sys Dsgn/Modeling Lecture 10

© 2015 A. Gerstlauer 30

EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 59

Lecture 10: Summary

• System-level synthesis

• X-chart

• Design Automation

• Synthesis = decision making + refinement

• Automated refinement flow

• Flow of models at increasing level of detail

• System-on-Chip Environment (SCE)

• Automatic model generation