Upload
voliem
View
240
Download
2
Embed Size (px)
Citation preview
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 1
EE382N:Embedded System Design and Modeling
Andreas GerstlauerElectrical and Computer Engineering
University of Texas at [email protected]
Lecture 10 – System-Level Synthesis
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 2
Lecture 10: Outline
• Synthesis
• System-level synthesis process
• Design Automation
• Automatic model refinement & model generation
• System-on-Chip Environment (SCE)
• Tool flow
• Setup and tutorial
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 2
Synthesis
• Gajski’s Y-Chart
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 3
Behavior StructureSystem Synthesis
Processor
Logic
• Gajski’s Y-Chart
Synthesis
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 4
Behavior
Structure
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 3
• Gajski’s Y-Chart• Platform-based design (the other Y Chart)
Synthesis
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 5
Behavior
Structure
ArchitectureConstraints
Behavior /Function
• X-Chart
Synthesis
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 6
Behavior
Structure
ArchitectureConstraints
Behavior /Function
Quality
Specification
Implementation
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 4
• X-Chart• System-Level Synthesis
Synthesis
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 7
Synthesis
Specification Model
RefinementDecisionMaking
Hardware/Software Synthesis
ArchitectureBehavior
Platforms, IP Databases
Implementation Model
QualityStructure Performance Estimates
Transaction-Level Model
(TLM)
Model of Computation
(MoC)
Source: A. Gerstlauer, C. Haubelt, A. Pimentel, et al., “Electronic System-Level Synthesis Methodologies,“ TCAD, 2009.
Hardware vs. Software
• Double-roof model
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 8
system
component
logic
task
instructionarchitecture
RTL
gate
Hardware
Arch
ISA
Software
Implementation
Specification
Source: A. Gerstlauer, C. Haubelt, A. Pimentel, et al., “Electronic System-Level Synthesis Methodologies,“ TCAD, 2009.
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 5
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 9
System Design Evolution
Platform HW Dev. SW Dev. App. Dev. PrototypeBoard+ BSP
Board
HW Dev.
Platform Platform Modeling
VirtualPlatform Board
+ BSP
SW Dev.
App. Dev.
VP
Prototype
Pre
sen
tP
ast
Fu
ture Sys.
Gen.TLM
ASIC/FPGATools
Board+ BSP+ App
SW Gen.
HW Gen.Prototype
Platform
C/MoC
ApplicationDeveloper
Source: S. Abdi, Concordia Univ.
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 10
Design Automation
• Synthesis = Decision making + model refinement
Successive, stepwise model refinement Layers of implementation detail
Refinement
Model n
DB
Model n+1
Specification model
Implementation model
Optim. algorithm
GUI
Design decisions
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 6
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 11
Specification model
Computation model
Communication model
Implementation model
Validation GUI
Test
Simulate
Compile
Test
Simulate
Test
Simulate
Simulate
Test
Design Automation (1): Modeling
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 12
Specification model
Computation model
Design decisions
Communication model
Comp. refinement
Comm. refinement
PE Allocation
Beh/Var Partitioning
Scheduling
Refinement GUI
Net Allocation
Channel partitioning
Spec. optimization
Prot. insertion
Design decisions
Design decisionsProc. refinement
Implementation model
Capture
RTL/SWHW alloc/sched/bind
SW Compilation
IF generation
Browsing
Alg. selection
Comp. / IP
Protocols
Validation GUI
Test
Verify
Simulate
Compile
Test
Simulate
Test
Verify
Simulate
Simulate
Verify
Test
Design Automation (2): Refinement
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 7
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 13
Specification model
Computation modelEvaluation
Profiling
Profiling data
Design decisions
Communication model
Comp. refinement
Comm. refinement
PE Allocation
Beh/Var Partitioning
Scheduling
Refinement GUI
Net Allocation
Channel partitioning
Evaluation results
Spec. optimization
Prot. insertion
Design decisions
Evaluation
Evaluation results
Design decisionsProc. refinement
Implementation model
Capture
RTL/SWHW alloc/sched/bind
SW Compilation
IF generation
Browsing
Alg. selection
Comp. / IP
Protocols
Validation GUI
Test
Verify
Simulate
Compile
Test
Simulate
Profile
Test
Verify
Simulate
Simulate
Verify
Test
Design Automation (3): Exploration
Estimate
Estimate
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 14
Specification model
Computation modelEvaluation
Profiling
Profiling data
Design decisions
Communication model
Comp. synthesis
Comp. refinement
Comm. synthesis
Comm. refinement
PE Allocation
Beh/Var Partitioning
Scheduling
Refinement GUI
Net Allocation
Channel partitioning
Evaluation results
Spec. optimization
Prot. insertion
Design decisions
Evaluation
HW/SW synthesis
Evaluation results
Design decisionsProc. refinement
Implementation model
Capture
RTL/SWHW alloc/sched/bind
SW Compilation
IF generation
Browsing
Alg. selection
Comp. / IP
Protocols
Validation GUI
Test
Verify
Simulate
Compile
Test
Simulate
Profile
Test
Verify
Simulate
Simulate
Verify
Test
Design Automation (4): Synthesis
Estimate
Estimate
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 8
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 15
System-On-Chip Environment (SCE)
ArchnArchnTLMn
Impln
Spec
ImplnImpln
ISS
CP
U
Mem
IPHW
Bri
dg
e
Arb
iter CPU Bus DSP Bus
B5
DS
P
HALRTOS
B1
ISS
HALRTOS
B2,B3
B4
Synthesize target HW/SW
Compile onto MPSoC platform
Mem
IPHW
Bri
dg
e
CPU Bus DSP Bus
B3v1v2
B5B4
DSP
C4C2C1
OS + Drv
CPU
OS + Drv
Coren Coren
Coren Coren
Core1 Coren
B2B1
C3
Specification
System Design
SWDB
Systemmodels
CPUn.bin
Implementation Model
CE/BusModels
TLMnTLMnTLMi
Hardware Synthesis
Software Synthesis
RTLDB
RTLnRTLnRTLnISSnISSnISSn CPUn.binCPUn.bin
HWn.vHWn.vHWn.v
Design Decisions
Architecture Exploration
Scheduling Exploration
Network Exploration
Communication Synthesis
PE/OSModels
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 16
Abstract Programming Model
• Hierarchical process graph• Sequential processes
– ANSI C code
• Parallel-serial composition– Dependencies, Fork-join
• Abstract inter-process communication• Communication channels
– Message-passing, queues, etc.
• Shared variables
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 9
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 17
System Transaction-Level Model (TLM)
Generate MPSoC TLM simulation model
Fast and accurate for exploration and verification
• Compile onto multi-core/multi-processor platform
• Implement computation on processors/cores and busses
• Generate code for communication over bus network
Mem
IPHW
Bri
dg
e
CPU Bus DSP Bus
B3v1v2
B5B4
DSP
C4C2C1
OS + Drv
CPU
OS + Drv
Coren Coren
Coren Coren
Core1 Coren
B2B1
C3
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 18
System Implementation
• Synthesize hardware and software for each processor• High-level/behavioral RTL and interface synthesis
– Allocation, scheduling, binding
• Software synthesis and RTOS targeting– Code generation, firmware synthesis, cross-compile & link
CP
U
IPHW
Bri
dg
e
Arb
iter
DS
P
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 10
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 19
Cellphone Example: Hardware Platform
• 2 Subsystems• ARM7TDMI• Motorola DSP 56600k
• 4 Busses• AMBA AHB• DSP bus• IP & memory busses
• 2 Accelerator HW/IP blocks• DCT IP• Custom
codebook HW• 10 I/O HW blocks
DCT(IP)
TX
CPU(ARM7)
M1(SRAM)
M1Ctrl
I/O4(HW)
CoPro(HW)
DSP(DSP56k)
MBUS
BUS1 (AMBA AHB) BUS2 (DSP)
S
SS
S
M
S
M2/S
M1 M
Arb
ite
r1
Arb
IP BridgeM
S
DCTBus
S
I/O3(HW)
S
I/O2(HW)
S
I/O1(HW)
S
S
DMA(2753A)
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 20
Cellphone Example: Software Platform
• Operating systems• ARM7
• uCOS-II• DSP
• Custom, interrupt-driven multi-tasking kernel
DCT
TX
ARM7
M1Ctrl
I/O4
HW
DSP56k
MBUS
BUS1 (AMBA AHB) BUS2 (DSP)
Arb
ite
r1
IP Bridge
DCTBusI/O3I/O2I/O1
DMA
M1
uCOS-II Custom RTOS
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 11
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 21
Cellphone Example: Application
• Computation• ARM7
• MP3 Decoding• Jpeg Encoding
• DSP• GSM Transcoding
DCT
TX
ARM7
M1Ctrl
I/O4
HW
DSP56k
MBUS
BUS1 (AMBA AHB) BUS2 (DSP)
Arb
ite
r1
IP Bridge
DCTBusI/O3I/O2I/O1
DMA
M1
Enc DecJpeg
Codebk
SI BO BI SO
DCT
MP3
uCOS-II Custom RTOS
Application
Test code
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 22
Cellphone Example: Application
DCT
TX
ARM7
M1Ctrl
I/O4
HW
DSP56k
MBUS
BUS1 (AMBA AHB) BUS2 (DSP)
Arb
ite
r1
IP Bridge
DCTBusI/O3I/O2I/O1
DMA
M1
Enc DecJpeg
Codebk
SI BO BI SO
DCT
v1
C2
C1
C3
C4
C5
C6
C7
C8
C0
MP3
• Computation• ARM7
• MP3 Decoding• Jpeg Encoding
• DSP• GSM Transcoding
• Communication• Shared memory
– Camera frames• Message-passing
– MP3 and speech streaming
uCOS-II Custom RTOS
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 12
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 23
ComputationDesign
Design Methodology
Specification model
Algor.IP
Proto.IP
Computation model
Communication refinement
Communication model
Comp.IP
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Implementation model
Softwarecompilation
Interfacesynthesis
Hardwaresynthesis
Estimation
ValidationAnalysis
Compilation Simulation model
RTOSIP
RTLIP
Computation refinement
Capture
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 24
Computation Design (1)
• Architecture exploration• Allocation of Processing Elements (PE)
– Type and number of processors
– Type and number of custom hardware blocks
– Type and number of system memories
• Mapping to PEs– Map each behavior to a PE
– Map each complex channel to a PE
– Map each variable to a PE
Architecture model
Concurrent PEs
Abstract channels andmemory interfaces
IP1M1
B2. . . . . .. . . . . .
c2
c1
Mem
v1v3
PE2
B3
PE1
B1
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 13
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 25
Cellpone Example: Architecture Model
RcvData
Co-process
SpchOut
DCT
stripe[]
Decoder
JPEG Coder
SerOut
SpchIn
SerIn
DSP
Mem
DMA
HW
DCT_IP
BI
BO
SO
SICtrl
ARM7
DC
TA
dap
ter
Vocoder
MP3
Partitioning, synchronization, message-passing, RPC
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 26
Computation Design (2)
• Scheduling exploration
• Static scheduling of behaviors into sequential tasks– Group (flatten) behaviors into tasks
– Determine fixed execution order of behaviors in each task
• Dynamic scheduling of concurrent tasks by RTOS– Choose scheduling policy, i.e. round-robin or priority-based
– For each set of tasks, determine task priorities
Scheduled model
Abstract OS model insoftware PEs
IP1M1
B2. . . . . .. . . . . .
c2
c1
Mem
v1v3
PE2_OS
B3
OS Model
PE1
B1
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 14
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 27
Cellphone Example: Scheduled Model
RcvData
Co-process
SpchOut
DCT
stripe[]
Decoder
JPEG Coder
OSModel
SerOut
SpchIn
SerIn
DSPARM_OS
Mem
DMA
HW
DCT_IP
BI
BO
SO
SICtrl
ARM7
DSP_OS
DC
TA
dap
ter
Vocoder
MP3
OSModel
Scheduling, task refinement, OS model insertion
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 28
CommunicationDesign
Design Methodology
Specification model
Algor.IP
Proto.IP
Computation model
Communication refinement
Communication model
Comp.IP
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Implementation model
Softwarecompilation
Interfacesynthesis
Hardwaresynthesis
Estimation
ValidationAnalysis
Compilation Simulation model
RTOSIP
RTLIP
Computation refinement
Capture
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 15
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 29
Communication Design (1)
• Network exploration
• Allocation of system network– Type (protocols) and number of system busses
– Type and number of CEs (bridges and transducers, if applicable)
– System connectivity
• Routing of channels over busses– Map each communication channel to a system bus
(or an ordered list of busses, if applicable)
Network model
PEs + CEsMiddleware stacks
Point-to-point linksUntyped packet transfers
Untyped memory interfaces
M1Ctrl
TX
IP_TXMAC
intprotocol
IP1M1_LK
B2
PE2_OS
B3
OS Model
PE1
B1
L1 L2
Mem
char[512]
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 30
Cellphone Example: Network Model
ARM_OS
ARM7
DMA
DMA_HW
DCT
DCT_IP
M
S
DSP_OS
DSP
OSModel
HW
SI
SI_HW
BI
BI_HW
SO
SO_HW
BO
BO_HW
T
MSlinkBri
l
DC
TA
da
pte
r
link
DM
A
linkBri
linkBI
linkHW
linkSI
linkBO
linkSO
Mem
OSModel
Data conversion, channel merging
CE insertion, packeting, routing
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 16
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 31
Communication Design (2)
Maste
rPro
to Sla
veP
roto
mac m
ac
Mem
Pro
toco
l
IPP
roto
col
• Communication synthesis
• Assignment of bus parameters– Address mapping for each channel and memory interface
– Synchronization mechanism for each channel(dedicated or shared interrupts, polling)
– Transfer mode for each channel/interface (regular, burst, DMA)
Transaction-level model (TLM)
PEs + CEs + BussesProtocol stacks
Abstract bus channelsBus transactions + interrupts
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 32
Maste
rPro
to Sla
veP
roto
mac m
ac
Mem
Pro
toco
l
IPP
roto
col
Ma
sterP
roto S
lave
Pro
to
ma
c ma
c
Communication Design (2)
• Communication synthesis
• Assignment of bus parameters– Address mapping for each channel and memory interface
– Synchronization mechanism for each channel(dedicated or shared interrupts, polling)
– Transfer mode for each channel/interface (regular, burst, DMA)
Transaction-level model (TLM)
PEs + CEs + BussesProtocol stacks
Abstract bus channelsBus transactions + interrupts
Pin-accurate model (PAM)
Physical bus structureBit-accurate pins and wires
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 17
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 33
Cellphone Example: TLM
Synchronization, addressing, media acces
Arbitration, data slicing, interrupt handling
me
mm
ac
ARM_OS
ARM_HAL
AD
DR ARM7
DMAm
ac
me
m
DMA_HW
AD
DR
Mem
mem
Mem_HW
AD
DR
DCT
DCT_IP
cfMaster
cfSlave
ah
bP
roto
co
l
Bri
dg
e
DSP_OS
DS
P_
HA
L
DSP
AD
DR
OSModel
ma
c
intA intB intC intD
HW
ma
c
AD
DR
HW_HW
SI
ma
c
AD
DR
SI_HW
BI
mac
AD
DR
BI_HW
SO
ma
c
AD
DR
SO_HW
BO
ma
c
AD
DR
BO_HW
ma
c
AD
DR
T
dsp
Master
ds
pS
lave
dspProtocol
ma
c
AD
DR
pollPOLL_ADDR
pollPOLL_ADDR
pollPOLL_ADDR
poll POLL_ADDR
OSModel
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 34
Cellphone Example: Pin-Accurate Model
Implementation synthesis in backend tools
Interface and high-level synthesis on hardware side
Firmware, RTOS and C synthesis on the software side
ARM_BF
ISR
l
PIC
ARM_OS
ARM_HALARM_HW
AD
DR ARM7
DMA
DMA_BF
AD
DR
AD
DR
Mem
Mem_BF
AD
DR
DCT
DCT_IP
Arbiter
T_BF
DS
P_
BF
PIC
DSP_OS
DS
P_H
AL DS
P_H
W
DSP
l
AD
DR
OSModel
ISR
HW
AD
DR
HW_BF
SI
SI_BF
BI
ADDR,POLL_ADDR
BI_BF
SO
SO_BF
BO
BO_BF
Bridge
AD
DR
AD
DR
ADDR,POLL_ADDR
ADDR,POLL_ADDR
ADDR,POLL_ADDR
OSModel
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 18
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 35
Cellphone Example: Single-Processor Results
• MP3 decoder on ARM
• 55 MP3 frames
• JPEG encoder on ARM
• 30 116x96 pictures
• GSM vocoder on DSP
• 163 speech frames encoded/decoded
TLM speed
• 2000 Mcycles/s peak
• 300-600 Mcycl/s sustained
TLM accuracy
• <3% average frame timing error 0
5
10
15
20
25
Spec. N-TLM P-TLM BFM ISS/RTL
Ave
rag
e E
rro
r [%
]
MP3
JPEG
GSM
0.01
0.1
1
10
100
1000
10000
100000
Spec. N-TLM P-TLM BFM ISS/RTL
Sim
ula
tio
n T
ime
[s]
MP3
JPEG
GSM
Spec. Net. TLM PAM Impl.
Spec. Net. TLM PAM Impl.
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 36
Cellphone Example: Multi-Processor Results
• Experimental setup
• 1.5 second MP3
• 640x480 picture
• 1.5 speech GSM
3s / 300M ARM cycles / 180M DSP cycles
0
5
10
15
20
25
30
35
Appl. Task FW TLM BFM ISS
Ave
rag
e E
rro
r [%
]
1
10
100
1000
10000
100000
Sim
ula
tio
n T
ime [
s]Avg. Error
Sim. Time
0
5
10
15
20
25
30
35
Appl. Task FW TLM BFM ISS
Ave
rag
e E
rro
r [%
]
1
10
100
1000
10000
100000
Sim
ula
tio
n T
ime [
s]Avg. Error
Sim. Time
• TLM simulation speed
• 300 Mcycles/s
• TLM accuracy
• <3% error
Prototyping and exploration with 100% fidelity
Spec Arch Net TLM PAM ISSSpec. Sched. Net. TLM PAM Impl.
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 19
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 37
SCE Exploration Results
• Suite of industrial size examples
ExampleSystem Architecture(masters → slaves)
Model size (LOC) Generation timeSpec Arch Net PAM
JPEG A1 CF→HW 1806 2732 2780 4642 1.01 s
Vocoder
A1 DSP→HW
7385
9594 9775 10679 4.77s
A2 DSP→HW1,HW2 9632 9913 10989 5.21s
A3 DSP→HW1,HW2,HW3 9659 9949 11041 5.76s
Mp3float
A1 CF→HW1
6900
28190 28204 29807 5.86s
A2CF→HW1,HW2,HW3HW1↔HW3HW2↔HW3
28275 28633 31172 6.18s
A3CF →HW1, HW2, HW3, HW4HW1↔HW3↔HW5HW2↔HW4↔HW5
28736 30202 32795 17.69s
Mp3fix
A1 ARM→2 I/O
13363
17131 17270 21593 3.85s
A2 ARM→2 I/O, LDCT, RDCT 18300 18564 23228 7.01s
A3ARM→2 I/O, LDCT, RDCTLDCT↔I/ORDCT↔I/O
18748 19079 24471 7.40s
Baseband A1DSP→HW, 4 I/O, TCF, DMA→Mem, BR, T, DMABR→DCT_IP
11481 17020 17685 21711 8.99s
Cellphone A1ARM→4 I/O, 2 DCT, TLDCT, RDCT→I/ODSP→HW, 4 I/O, T
16441 21936 22570 30072 9.49s
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 38
Lecture 10: Outline
Synthesis
System-level synthesis process
Design Automation
Automatic model refinement & model generation
• System-on-Chip Environment (SCE)
Tool flow
• Setup and tutorial
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 20
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 39
System-on-Chip Environment (SCE)
• Server and accounts
• ECE LRC Linux servers– Labs (ENS 507) or remote access (ssh)
• SpecC software (© by CECS, UCI)– /usr/local/packages/sce-20100908
– module load sce
• SCE tool set
• GUI– sce, sced, scchart
• Scripting– sce_allocate / sce_map / sce_schedule / …
• Documentation ($SPECC/doc)– SCE Manual (online via Help->Manual)
– SCE Tutorial (PDF or html)– SCE Specification Reference Manual (SpecRM.pdf)
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 40
SCE Tool Flow
• SCE Components:• Graphical frontend
(sce, scchart)
• Editor (sced)
• Compiler and simulator (scc)
• Profiling and analysis (scprof)
• Architecture refinement (scar)
• RTOS refinement (scos)
• Network refinement (scnr)
• Communication refinement (sccr)
• RTL refinement (scrtl)
• Software refinement (sc2c)
• Scripting interface (scsh)
• Tools and utilities ...
Architecture model
Specification model
Architecture Exploration
PE Allocation
Beh/Var/Ch Partitioning
Scheduled model
Scheduling Exploration
Static Scheduling
OS Task Scheduling
Network model
Network Exploration
Bus Network Allocation
Channel Mapping
Communication model
Communication Synth.
Bus Addressing
Bus Synchronization
PEDatabase
CEDatabase
BusDatabase
OSDatabase
GU
I / S
crip
ting
TLM
RTL Synthesis
Datapath Allocation
Scheduling/Binding
SW Synthesis
Code Generation
Compile and Link
RTLDB
SWDB
Implementation model
VerilogVerilog
VerilogBinary
BinaryBinary
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 21
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 41
SCE Main Window
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 42
SCE Source Editor
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 22
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 43
SCE Hierarchy Displays
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 44
SCE Compiler and Simulator
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 23
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 45
SCE Profiling and Analysis
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 46
Design Space Exploration
Explore and trim Gradually prune design space
Exploration Space
Design Time
Profilingstage
Estimationstage
Evaluationstage
...
...
One-timeretargeting Implt
dependentsimulation/analysis
Implt independentsimulation
Profiling
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 24
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 47
Evaluation Flow
Specmodel
Instrumentation
Simulation
Speccharacteristics
FeedbackValidationRefinement
Pro
filin
g
Refinedmodel
Simulation/analysis
SystemQuality Metrics
Eva
luat
ion
Backannotation
Refinement Retargeting
ComponentEstimates
Designdecision
Est
imat
ion
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 48
Profiling
• Input specification MoC
• Hierarchy
• Computation & communication
• Multi-dimensional analysis
• Multi-entities– Behavior, channel, port, variable
• Multi-metrics– Operation, traffic, storage
– Static, dynamic
• Multi-levels– Application, transaction, bus-
functional
vcB1 B2
B
Profiling
Profiled App.
Simulation
Instr. Appl
Static Analysis
Counters
Instrumentation
Application
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 25
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 49
Profiling
• Instrumentation-based profiling
• Bb: The execution counts of basic block b
– Enumerate execution paths
• Cb,i,d: No. of computed characteristics for item type iand data type d in the block b
• Data type i: float, int, ..
• Item type d: metric-dependent
Specification metrics
Ri,d = bCb,i,d Bb
R = idRi,d
int b,c;
if( a = 0){
b++;
}
else{
b++;
c++;
}
R++,int= i [ Bi * Ci,++,int ]
= 1 * 1 + 3 * 2
= 7
B1 = 1
B3 = 3
C1,++,int = 1
C3,++,int = 2
Source: L. Cai, A. Gerstlauer, D. Gajski, “Retargetable Profiling for Rapid, Early System-Level Design Space Exploration,“ DAC, 2004.
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 50
Retargeting
• Target machine model• Wi,d : weights of
components which the entity mapped to
– Manual– Simulation– Complex cost function/
algorithm
Implementation estimates E = id(Ri,d * Wi,d) Time complexity: O(n)
vcB1 B2
B
PE1 PE2
Mem
R(B1)++,int= 7
W(PE1)++,int= 1
E(B1,PE1)++,int= 7 x 1 = 7
Source: L. Cai, A. Gerstlauer, D. Gajski, “Retargetable Profiling for Rapid, Early System-Level Design Space Exploration,“ DAC, 2004.
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 26
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 51
Computational complexity of top-level Vocoder behaviors:
LP_Analysis Open_Loop Closed_Loop Codebook Update 377.0 MOp 337.1MOp 478.7 MOp 646.4 MOp 43.6 MOp
Codebook operation mix:(x, int) (+, int) (-, int) (/,int) (others,int) 46.2% 33.5% 9.1% 7.1% 4.1%
HW acceleration
Floating –point not required
Dedicated hardware multipliers
Vocoder Profiling
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 52
Mapping of 8 top-level encoder behaviors onto ColdFire + DSP + HW 85:04h for 6561 alternatives (1.7s simulation + 3s refinement each) 100% fidelity
SW (20.0, 30.73 ms)
HW (144.1, 12.24 ms)
10 ms
15 ms
20 ms
25 ms
30 ms
35 ms
10 30 50 70 90 110 130 150 170
Tran
sco
din
g d
elay
Cost
Vocoder Design Space Exploration
Timing constraint
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 27
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 53
GSM Vocoder Tutorial
• Enhanced Full Rate (EFR) standard (GSM 06.10, ETSI)• Lossy voice encoding/decoding for mobile telephony
– Speech synthesis model
– Input: speech samples @ 104 kbit/s
– Frames of 4 x 40 = 160 samples (4 x 5ms = 20ms of speech)
– Output: encoded bit stream @ 12.2 kbit/s (244 bits / frame)
• Timing constraint– 20ms per frame (total of 3.26s for sample speech file)
SpecC specification model
• 73 leaf, 29 hierarchical behaviors (9 par, 10 seq, 10 fsm)
• 9139 lines of SpecC code (~13000 lines of original C)
Short-termsynthesis filter
+Delay/ Adaptive codebook
10th-order LP filter
Speech
Fixed codebook
Long-termpitch filter
Residual pulses
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 54
Vocoder Specification Model
Filter memory
update
Closed-loop
pitch search
Algebraic (fixed)
pitch search
Linear prediction
(LP) analysis
Open loop
codebook search����
����
����
����
��
���
���
�����
�����
��
����
��
��������
��������
��
decode_12k2
Post_Filter
Bits2prm_12k2
Decode
LP parameters
4 subframes
bits
speech[160]
A(z)
synth[40]
synth[40]
prm[57]
prm[13]
decoder
��
��
��
��
��
����
����
���
���
����
����
����
�� ��
����
����
����
��
��
��
��
��������
��������
��������
��������
��������
��������
mem
ory
2x per frame
A(z)
2 subframes
prm2bits_12k2
pre_process
sample
prm[57]
bits
speech[160]
coder
vocoderspeech_in bits_in
speech_outbits_out
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 28
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 55
Vocoder Computation Model
res
Motorola DSP56600
Custom hardware
data
����������������
speech_in
Codebook
HW
prm_in���
���
���
��� Bits2PrmSpeech_In
cdbk_res
cdbk_data
���
���
��������
���
���
��������
��������
speech_out
��������
Speech_Out���
���
��������
���
���
��������
Prm2Bits���
���
���
���
prm_out
Post_Filter
Decode_12k2
D_lsp
Pre_process
LP_analysis
Open_loop
Closed_loop
Start_codebook
Wait_codebook
Update
res
data
speech
prm_in
prm
speechprm
prm_in
synth_out
speech_in
bits_out
DSP
bits_in
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 56
Vocoder Communication Model
nWR
Data[23:0]
intC
Addr[15:0]
MCS
nRD
Bus InterfaceBus Driver
Bus InterfaceBus Interface Bus Interface Bus Interface
HWCodebook
Speech_In Bits2Prm Prm2Bits Speech_Out
DSP
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 29
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 57
Vocoder Implementation Model
Cycle-accurate co-simulation
• DSP instruction-set simulator (ISS)• 70,500 lines of assembly code (running @ 60MHz)
• RTL SpecC for hardware• 45,000 lines of VHDL RTL code (running @ 100MHz)
nWR
Data[23:0]
intC
Addr[15:0]
MCS
nRD
CodebookFSMD
MotorolaDSP56600
ISSOBJ
DSP HW
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 58
Vocoder Results
• Productivity gain: 12 months vs. 1 hour = 2000x
Comm Impl
AutomatedUser / Refine
ManualModified
lines
Spec Arch
Arch Comm
3,275
914
30 min / < 2 min5~6 month
5 min / < 0.5 min
3~4 month
6,146
15 min / < 1 min
1~2 month
Refinement Effort
Total 50 min / < 4 min9~12 month10,355
1
10
100
1000
10000
100000
Spec Arch Comm Impl
No
rmal
ized
Tim
e
Simulation Speed
02000400060008000
1000012000140001600018000
Spec Arch Comm Impl
Nu
mb
er o
f L
ines
Code Size
EE382N: Embedded Sys Dsgn/Modeling Lecture 10
© 2015 A. Gerstlauer 30
EE382N: Embedded Sys Dsgn/Modeling, Lecture 10 © 2015 A. Gerstlauer 59
Lecture 10: Summary
• System-level synthesis
• X-chart
• Design Automation
• Synthesis = decision making + refinement
• Automated refinement flow
• Flow of models at increasing level of detail
• System-on-Chip Environment (SCE)
• Automatic model generation