
EE382V: Embedded System Design and Modeling

Lecture 10 – Computation Modeling & Refinement

Andreas Gerstlauer
Electrical and Computer Engineering
University of Texas at Austin
[email protected]

© 2014 A. Gerstlauer

Lecture 10: Outline

• Processor layers
    • Application
    • Task/OS
    • Firmware
    • Hardware
• Processor synthesis
    • Software synthesis
    • Hardware synthesis


System-On-Chip Environment (SCE)

[Figure: SCE design flow. The specification (Spec) is refined by system design (specify-explore-refine) under design decisions into architecture models and TLMs, using the PE/CE/bus model database. Hardware synthesis (RTL DB) and software synthesis (SW DB) then produce the implementation model: RTL and ISS models, HW Verilog files (HWn.v) and CPU binaries (CPUn.bin). Highlighted: computation modeling and refinement.]

General Processor Micro-Architecture

• Basic computation component is a processor (PE)
    • Programmable, general-purpose software processor (CPU)
    • Programmable special-purpose processor (e.g. DSPs)
    • Application-specific instruction set processor (ASIP)
    • Custom hardware processor
Functionality and timing (and power and …)

[Figure: PE consisting of a controller and a datapath, linked by control signals and status lines, with a bus interface and a clock (CLK, ∆t).]


Computation Modeling (1)

• Structural RTL models
Sub-cycle accurate

[Figure: structural RTL models. Hardware processor (HW): controller with state, next state logic and output logic; datapath with register file, memory, FU1 and bus interface; CLK. Software processor (CPU): controller with PC, IR, fetch and decode; datapath with register file, ALU, load/store unit and memory (data & program); CLK.]

Computation Modeling (2)

• Behavioral RTL models (FSMD)
• Instruction-set simulation (ISS) models
    • Purely functional or micro-architectural
Cycle or timing accurate

[Figure: FSMD model of the hardware (HW, HW_CLK) and instruction set simulation (ISS) model of the CPU (CPU_CLK), in which the ISS executes the binary containing application, RTOS and HAL.]


Computation Modeling (3)

• Host-compiled models
    • Source-level application model
    • Back-annotate timing and other metrics
    • Abstract OS and processor models
    • Transaction-level model (TLM) backplane
    • C-based discrete-event simulation kernel [SpecC, SystemC]
Fast and accurate full-system simulation

Source: A. Gerstlauer. "Host-Compiled Simulation of Multi-Core Platforms," RSP10.

Host-Compiled Computation Layers

• Application
    • Process execution (C code)
    • Execution timing
• OS & processor
    • Operating system
        – Real-time multi-tasking (RTOS model)
        – Bus drivers (C code)
    • Hardware abstraction layer (HAL)
        – Interrupt handlers
        – Media accesses
    • Processor hardware
        – Bus interfaces (I/O state machines)
        – Interrupt suspension and timing

[Figure: CPU model with processes P1 and P2 on top of the OS, drivers (Drv), ISR and HAL, connected to the bus and to interrupts.]

    Process B1() {
        …
        waitfor(15000);
        …
        waitfor(25000);
        …
    };


Application Layer

• High-level, abstract programming model
    • Hierarchical process graph
        – ANSI C leaf processes
        – Parallel-serial composition
    • Abstract, typed inter-process communication
        – Channels
        – Shared variables
Timed simulation of application functionality (SLDL)
    • Back-annotate timing
        – Estimation or measurement (trace, ISS)
        – Function or basic block level granularity
    • Execute natively on simulation host
        – Discrete event simulator
        – Fast, native compiled simulation

[Figure: process graph with B1, B2, B3 and channels C1, C2 mapped onto the CPU, annotated with logical time.]

    ...
    void f() {
        waitfor(5);
        ...
    }
    ...
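As a rough illustration of how back-annotated waitfor() delays advance logical time while the code itself runs natively, the following Python sketch implements a tiny discrete-event kernel. It is not SpecC or SystemC. The generator-based processes, the `simulate` function, and the process `b2` with its delays are assumptions made for this example; only the 15000 and 25000 delays of B1 appear in the slides.

```python
import heapq

def simulate(processes):
    """Run generator-based processes; each `yield d` plays the role of waitfor(d)."""
    now = 0
    trace = []
    # Event queue entries: (wake-up time, sequence number, name, generator).
    # The unique sequence number keeps tuple comparison away from the generator.
    queue = [(0, i, name, proc()) for i, (name, proc) in enumerate(processes.items())]
    heapq.heapify(queue)
    seq = len(queue)
    while queue:
        now, _, name, gen = heapq.heappop(queue)
        try:
            delay = next(gen)  # execute natively up to the next waitfor()
        except StopIteration:
            trace.append((now, name, "done"))
            continue
        heapq.heappush(queue, (now + delay, seq, name, gen))
        seq += 1
    return now, trace

def b1():
    yield 15000  # waitfor(15000): back-annotated delay of the first code block
    yield 25000  # waitfor(25000)

def b2():
    # Hypothetical second process with made-up delays
    yield 5
    yield 100

end_time, trace = simulate({"B1": b1, "B2": b2})
```

Both processes advance concurrently in logical time, so the simulation ends when the slower one (B1) finishes, independent of how long the host needs to execute the code.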

Retargetable Back-Annotation

• Back-annotation flow
    • Intermediate representation (IR)
        – Frontend optimizations [gcc]
        – IR to C conversion
    • Target binary matching
        – Cross-compiler backend [gcc]
        – Control-flow graph matching
    • Timing and power estimation
        – Micro-architecture description language (uADL) or RTL
        – Cycle-accurate timing
        – Reference power model [McPAT]
    • Back-annotation into IR
        – Basic block level

[Figure: back-annotation flow. C source code passes through the frontend optimisations (gcc) to the intermediate representation (IR). IR-to-C conversion yields compile-able intermediate code, and the backend yields the binary. Graph matching of IR and binary produces a mapping table; basic block timing and energy characterization (uADL ISS, McPAT) produces an augmented mapping table; the timing and energy back annotator produces the host-compiled (HC) model.]

C source code:

    a=b=c=0;
    if(a<=0) { a=1; c=2; }
    ……
    printf(…);

Compile-able intermediate code:

    bb_2: a = 1; b = 0; c = 2; goto bb_7;
    bb_3: …..
    bb_7: printf(…);

Host-compiled (HC) model:

    bb_2: a = 1; b = 0; c = 2;
          incrDelay(15); incrEnergy(2); bb = BB_2;
          goto bb_7;
    bb_3: …..
          incrDelay(delay[bb][BB_3]); incrEnergy(energy[bb][BB_3]); bb = BB_3;
    …..

Source: S. Chakravarty, Z. Zhao, A. Gerstlauer. "Automated, Retargetable Back-Annotation for Host-Compiled Performance and Power Modeling," CODES+ISSS'13.


Binary-to-Source/IR Mapping

• Compiler optimizations
    • Frontend
        – Control flow optimizations
    • Backend
        – Instruction scheduling/percolation
Mismatches
        – Capture frontend by annotating at IR, not source
        – Establish binary-IR mapping for back-annotation
Graph matching heuristic
    • Synchronized, recursive depth-first traversal
        – Compatibility: loop and branch nesting levels
        – Cost: sum of unmatched nodes in subgraphs rooted at node
        – Return least-cost mapping between all successors (incl. skips)
    • Resolve ambiguities using debug information

Timing/Energy Characterization

• Basic block characterization
    • Execution depends on state
        – Pipeline stalls in case of hazards
        – Pipeline overlaps in multi-issue
    • Pairwise characterization
        – Over all immediate predecessors
        – Across function hierarchy
    • Timing & energy
        – First-to-last instruction fetch time
        – Resource utilization statistics
• Back-annotation into IR
    • Path-dependent metrics
        – Capture static branch prediction

[Figure: basic blocks BB1, BB2 and BB3 with two execution flows (exec flow 1, exec flow 2) that carry different system states (SS = A, SS = B). SS = system state (registers, memory, pipeline).]

Annotated IR:

    bb_2:
        a = 1; b = 0; c = 2;
        wait(15); energy(2);
        goto bb_7;
    bb_3:
        …..
        if (prev_bb == 3) {
            wait(25); energy(5);
        } else if (prev_bb == 1) {
            wait(30); energy(6);
        }
        …..
    bb_7: printf(…);
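The pairwise, path-dependent annotation can be pictured as a table lookup keyed by the previously executed block. The Python sketch below is only an illustration of that idea: the `Annotator` class, the table layout with a `None` predecessor as fallback, and the entry for bb_1 are assumptions. The numbers for bb_2 and bb_3 are the ones shown in the annotated IR.

```python
# Hypothetical table: (predecessor, block) -> (cycles, energy).
# A predecessor of None is the fallback when no pairwise entry exists.
PAIRWISE = {
    (None, "bb_1"): (10, 1),     # made-up entry for the example path
    (None, "bb_2"): (15, 2),     # wait(15); energy(2);
    ("bb_3", "bb_3"): (25, 5),   # if (prev_bb == 3) wait(25); energy(5);
    ("bb_1", "bb_3"): (30, 6),   # else if (prev_bb == 1) wait(30); energy(6);
}

class Annotator:
    """Accumulates delay and energy along the executed basic-block path."""

    def __init__(self, table):
        self.table = table
        self.prev_bb = None
        self.delay = 0
        self.energy = 0

    def enter(self, bb):
        key = (self.prev_bb, bb)
        if key not in self.table:
            key = (None, bb)          # no pairwise data: use the fallback entry
        cycles, energy = self.table[key]
        self.delay += cycles
        self.energy += energy
        self.prev_bb = bb

annotator = Annotator(PAIRWISE)
for block in ["bb_1", "bb_3", "bb_3", "bb_2"]:
    annotator.enter(block)
```

The second visit to bb_3 is charged less than the first because its predecessor differs, which is the effect the pairwise characterization is meant to capture.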


Source-Level Simulation: Speed

• Automatic timing and energy back-annotation
    • Telecom & security applications [MiBench]
        – SHA, ADPCM, CRC32 & custom Eratosthenes' Sieve
        – Small and large data sets, 10 to 700 million instr.
    • One-time back-annotation
        – 3min. to 3s BA runtime
Back-annotated source vs. traditional ISS
2000 MIPS vs. 0.8 MIPS
Close to native source-level speeds

[Figure: simulation speed in MIPS (0 to 5000) of host-compiled, IR and source models for SHA, ADPCM and CRC32 (small and large data sets) and Sieve.]

[Figure: runtime (0 s to 800 s) of HC+BA vs. ISS+McPAT for the same benchmarks.]

Source-Level Simulation: Accuracy

• Source-level power and performance simulation
    • Single- (z4-like) and dual-issue (z6-like) e200 PowerPC
        – No cache, static branch prediction
    • Compare against cycle-accurate reference [ISS+McPAT]
>99% average timing and energy accuracy @ 2000 MIPS
Integrate back-annotation of other metrics
Performance, energy, reliability, power, thermal (PERPT)

[Figure: timing accuracy. Error [%] on a logarithmic scale (0.000001 to 10) for z4 and z6 across SHA, ADPCM and CRC32 (small and large data sets) and Sieve.]

[Figure: energy accuracy. Error [%] on the same scale for z4 and z6 across the same benchmarks.]


OS Modeling

• High-level RTOS abstraction
    • Specification is fast but inaccurate
        – Native execution, truly concurrent model
    • Traditional ISS-based validation infeasible
        – Accurate but slow (esp. in multi-processor context), requires full binary
Model of operating system (task interleaving in time)
High accuracy but small overhead at early stages
Focus on key effects, abstract unnecessary implementation details
Model all concepts: multi-tasking, scheduling, preemption, interrupts, IPC

[Figure: specification, system-level and implementation models. At the system level, the application tasks T1 and T2 and their channels run on an RTOS model on top of the SLDL.]

Source: A. Gerstlauer, H. Yu, D. Gajski. "RTOS Modeling for System-Level Design," DATE03.

Operating System Layer

• Scheduling
    • Group processes into tasks
        – Static scheduling
    • Schedule tasks
        – Dynamic scheduling, multitasking
        – Preemption, interrupt handling
        – Task communication (IPC)
Scheduling refinement
    • Flatten hierarchy
    • Reorder behaviors
OS refinement
    • Insert OS model
    • Task refinement
    • IPC refinement

[Figure: application processes P1 and P2 on top of the OS layer, which runs on the SLDL.]


Abstract RTOS Model

• Emulate the sequential execution of concurrent tasks
    • Task scheduler
        – Maintain task queues, determine task(s) to run & perform context switch
    • Timing model
        – Simulate back-annotated task delays, call scheduler to allow for preemptions

RTOS Model Implementation

• RTOS model
    • OS, task, event management
        – Descriptors & queues
    • Context switching
        – Block all but active task on SLDL level
    • Scheduling
        – Select and dispatch task based on algorithm
    • Preemption
        – Allow rescheduling at simulation time increases
    • Event handling
        – Remove task temporarily from OS while waiting for SLDL event
RTOS model library
    • RTOS models for different scheduling strategies
        – Round robin, priority based
    • Parametrizable
        – Task parameters (priorities)

    channel OS implements OSAPI {
        Task current = 0;
        os_queue rdyq;

        /* select the next task and wake it up */
        void dispatch(void) {
            current = schedule();
            notify(current.event);
        }
        /* give up the CPU and block until dispatched again */
        void yield() {
            Task task = current;
            dispatch();
            wait(task.event);
        }
        /* model execution delay, then allow rescheduling */
        void time_wait(time t) {
            waitfor(t);
            yield();
        }
        /* remove the task from the OS before an SLDL wait */
        Task pre_wait(void) {
            Task t = rdyq.get(current);
            dispatch();
            return t;
        }
        /* re-insert the task after the SLDL wait */
        void post_wait(Task t) {
            rdyq.put(t);
            wait(t.event);
        }
    };
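To make the interplay of time_wait() and the scheduler concrete, here is a minimal Python sketch of a priority-based RTOS model in which rescheduling only happens when a task's delay annotation ends. It is a simplification under stated assumptions, not the SpecC channel from the slide: `run_rtos`, the task dictionaries and the two example tasks are hypothetical, and events, IPC and interrupts are left out.

```python
def run_rtos(tasks):
    """Serialize tasks on one CPU; each `yield d` in a task body models time_wait(d)."""
    now = 0
    pending = sorted(tasks, key=lambda t: t["release"])  # not yet released
    ready = []                                           # ready queue
    gens = {}
    log = []
    while pending or ready:
        # Release every task whose release time has been reached
        while pending and pending[0]["release"] <= now:
            task = pending.pop(0)
            gens[task["name"]] = task["body"]()
            ready.append(task)
        if not ready:
            now = pending[0]["release"]  # CPU idle until the next release
            continue
        # schedule(): lowest number = highest priority
        task = min(ready, key=lambda t: t["prio"])
        try:
            delay = next(gens[task["name"]])  # run up to the next time_wait()
        except StopIteration:
            ready.remove(task)
            log.append((now, task["name"], "end"))
            continue
        log.append((now, task["name"], "run"))
        now += delay  # waitfor(delay); the loop then re-enters the scheduler

    return log

def low_body():
    yield 10  # two back-annotated blocks of 10 time units each
    yield 10

def high_body():
    yield 3

log = run_rtos([
    {"name": "low", "prio": 2, "release": 0, "body": low_body},
    {"name": "high", "prio": 1, "release": 5, "body": high_body},
])
```

In this run the high-priority task becomes ready at time 5 but is only dispatched at time 10, when the low-priority task's first delay ends. That delay is the kind of inaccuracy caused by the timing granularity of the back-annotation.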


RTOS Model Interface

• Canonical, target-independent API

    interface OSAPI {
        /* OS management */
        void init();
        void start(int sched_alg);
        void interrupt_return();

        /* Task management */
        Task task_create(char *name, int type, sim_time period);
        void task_terminate();
        void task_sleep();
        void task_activate(Task t);
        void task_endcycle();
        void task_kill(Task t);
        Task par_start();
        void par_end(Task t);

        /* Event handling */
        Task pre_wait();
        void post_wait(Task t);

        /* Delay modeling */
        void time_wait(sim_time nsec);
    };

Task Refinement

• Convert processes into tasks
    • Task initialization
        – Register task with OS model
    • Task activation
        – Wait for task start trigger from OS
    • Replace delay model
        – Trigger rescheduling in OS
Preemption points
    • Communication and synchronization
        – Wrap around SLDL event handling

Unscheduled process:

    process task_B2(OSAPI os) {
        void main(void) {
            ...
            /* model execution delay */
            waitfor(BLOCK1_DELAY);
            ...
            send();
            /* model execution delay */
            waitfor(BLOCK2_DELAY);
            ...
        }
        void send() {
            wait(ack);
        }
    };

Refined task:

    process task_B2(OSAPI os) {
        Task h;
        void task_B2(void) {
            h = os.task_create("B2", APERIODIC, 0);
        }
        void main(void) {
            os.task_activate(h);
            ...
            os.time_wait(BLOCK1_DELAY);
            ...
            send();
            os.time_wait(BLOCK2_DELAY);
            ...
            os.task_terminate(h);
        }
        void send() {
            Task t;
            t = os.pre_wait();
            wait(ack);
            os.post_wait(t);
        }
    };


Simulated Dynamic Behavior

[Figure: logical-time traces (t0 to t8) of P1, P2, P3, channel C1 (c1.send(), c1.recv()), the bus (bus.recv()), S1 and the ISR. Unscheduled: the processes advance through waitfor() delays. Scheduled: tasks P2 and P3 advance through time_wait() calls under control of the OS model.]

Inaccuracy due to timing granularity

OS Modeling Results

• Configurable, generic and flexible OS model
    • Configurable scheduling strategies and parameters
        – Round-robin or priority-based scheduling
Scheduling exploration
        – Artificial periodic task sets, uniformly distributed periods & utilizations
        – Back-annotation at 1 µs, 10 µs, 100 µs, or 1000 µs granularity
        – Dual-core MIPS Malta reference platform w/ Linux 2.6 SMP kernel [OVP]

    Granularity    Avg. speed per core    Avg. err.
    1 µs           140 MIPS               0.4 %
    10 µs          1500 MIPS              0.4 %
    100 µs         9000 MIPS              1.0 %
    1000 µs        29000 MIPS             8.0 %


Speed and Accuracy Tradeoffs

• Errors in discrete preemption models
    • Potentially large preemption errors
        – Not bounded by simulation granularity

[Figure: timeline of tasks Thigh and Tlow with run and idle phases and markers rl, rh, fh and fl, showing the resulting preemption errors.]

Automatic Timing Granularity Adjustment (ATGA)
    • Observe system state to predict preemption points
    • Dynamically and optimally control timing model
    • Transparently integrated into OS model
Eliminate preemption errors

Source: P. Razaghi, A. Gerstlauer. "Predictive OS Modeling for Host-Compiled Simulation of Periodic Real-Time Task Sets," Emb. Sys. Letters '12.
P. Razaghi, A. Gerstlauer. "Automatic Timing Granularity Adjustment for Host-Compiled Software Simulation," ASPDAC'12.
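The basic idea of predicting preemption points can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the ATGA algorithm from the cited papers: it assumes the release times of higher-priority tasks are known in advance (as for periodic task sets), and the function names `conventional_start` and `predictive_steps` are made up for this example.

```python
def conventional_start(release, granularity):
    """Earliest dispatch of a task released at `release` when the running task
    only re-enters the scheduler at multiples of the annotation granularity."""
    steps = -(-release // granularity)  # ceiling division
    return steps * granularity

def predictive_steps(now, delay, releases):
    """Split one time_wait(delay) issued at `now` at every predicted release
    that falls strictly inside the waiting interval."""
    end = now + delay
    points = sorted({r for r in releases if now < r < end})
    steps = []
    for point in points + [end]:
        steps.append(point - now)  # advance only up to the next preemption point
        now = point
    return steps

# A high-priority task is released at t = 1250 while a low-priority task
# is annotated at a granularity of 1000 time units.
error = conventional_start(1250, 1000) - 1250
steps = predictive_steps(1000, 1000, [1250, 5000])
```

With the conventional model the release at 1250 is only seen at 2000. The predictive variant breaks the same wait into 250 + 750 so that the scheduler runs exactly at the release time, while waits without any release in their interval stay coarse-grained. The sketch covers a single delayed dispatch only and does not show how such errors can accumulate.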

ATGA Model Execution Example

[Figure: timeline (t0 to t7) of tasks TL, TM, TH and TIntr moving through ready, idle, wait and sleep states, with markers rTH,1 to rTH,3 and fTH,1, fTH,2. The OS mode switches between predictive and fall-back.]


ATGA Results

• ATGA OS model
    • Artificial periodic task sets
Vs. conventional model at varying granularity
• Reference platform [OVP]
    • MIPS-Malta
    • Linux 2.6
Optimally navigate speed vs. accuracy tradeoff
As fast as coarse grain (100 µs)
As accurate as fine grain (1 µs) simulation

Operating System Layer

OS model
    • On top of standard SLDL
    • Wrap around SLDL primitives, replace event handling
        – Block all but active task
        – Select and dispatch tasks
    • Target-independent, canonical API
        – Task management
        – Channel communication
        – Timing and all events

[Figure: application tasks P2 and P3 on top of the OS model, which runs on the SLDL.]


Hardware Abstraction Layer (HAL)

• External communication
    • Software drivers
        – Presentation, session, network communication layers
        – Synchronization (interrupts)
    • Hardware/software boundary
        – Low-level HW access
        – Bus drivers and interrupt handlers
        – Canonical HW/SW interface
    • External interface
        – Bus transactions (TLM)
        – Interrupt trigger

App.:

    sample.send(v1);

Driver:

    void send(…) {
        intr.receive();
        bus.masterWrite(0xA000, &tmp, len);
    }

Hardware Layer (1)

• Processor TLM
    • HW interrupt handling
        – Interrupt logic
            » Suspend user code
        – Interrupt scheduling
            » Priority, nesting
    • Peripherals
        – Interrupt controller
        – Timers
    • TLM bus model
        – Bus transactions

[Figure: processor TLM showing the HAL and the hardware layer.]


Hardware Layer (2)

• Cache modeling
    • Pure behavioral modeling
        – Tag state
        – Hits/misses
        – Replacement policy
    • Integrated into back-annotation
        – Called with accessed address trace
        – Update cache state
        – Return delay penalties
Implemented as SpecC channel
        – < 200 lines of code

[Figure: processor model layers (App, OS, HAL, HW) with tasks P2 and P3, P1 and channels C1 and C2 on the OS model, user interrupt handlers UsrInt1 and UsrInt2, hardware interrupt logic HWInt with IntA to IntD (INTA to INTD), the bus TLM, and a cache model that receives addresses and returns delays.]

Source: A. Pedram, D. Craven, T. Amimeur, A. Gerstlauer. "Modeling Cache Effects at the Transaction Level," IESS 2009.
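A behavioral cache model of this kind only has to track tags, not data. The Python sketch below shows one possible shape under stated assumptions: a set-associative cache with LRU replacement and a fixed miss penalty. The class name `CacheModel`, its parameters and the penalty of 20 cycles are hypothetical; the model in the cited paper is a SpecC channel whose details are not given in the slides.

```python
from collections import OrderedDict

class CacheModel:
    """Tag-only, set-associative cache with LRU replacement (no data values)."""

    def __init__(self, size=1024, line_size=32, ways=2, miss_penalty=20):
        self.line_size = line_size
        self.ways = ways
        self.num_sets = size // (line_size * ways)
        self.miss_penalty = miss_penalty
        # One ordered dict of tags per set; insertion order tracks recency
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Update the tag state for one address and return its delay penalty."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            tags.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return 0
        if len(tags) >= self.ways:
            tags.popitem(last=False)   # evict the least recently used tag
        tags[tag] = True
        self.misses += 1
        return self.miss_penalty

    def update(self, addresses):
        """Process an address trace and return the accumulated miss penalty."""
        return sum(self.access(a) for a in addresses)

cache = CacheModel(size=128, line_size=32, ways=2, miss_penalty=20)
penalty = cache.update([0, 4, 64, 128, 0])
```

The returned penalty would be added to the static delay of the basic block that issued the accesses. In the small example, addresses 0, 64 and 128 all map to the same set, so the final access to 0 misses again after its line has been evicted.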

Cache-Aware Back-Annotation

• Host-compiled functional model
• Hybrid timing model
    • Static + dynamic
        – Runtime cache model

[Figure: back-annotation flow from frontend optimizations over intermediate code and the retargetable backend to binary code. Block-level characterization uses the micro-architecture description, address layout and memory accesses. The back-annotated code runs on the OS API, driver API and cache model of the TLM.]

Back-annotated code without cache model:

    void f(void) {
    BB1: ...
        os.wait(BB1_DELAY);
        if (c) goto BB3;
    BB2: a[i][j] += sum;
        ...
        os.wait(BB2_DELAY);
    BB3: ...
        os.wait(BB3_DELAY);
        drv.write(res);
    }

    void main(void) {
        os.task_create(&f, "Task 1", PRIO0);
    }

Back-annotated code with cache model:

    void f(void) {
    BB1: ...
        os.wait(BB1_DELAY);
        if (c) goto BB3;
    BB2: a[i][j] += sum;
        /* record the accessed address */
        __alist[__idx] = A_BASE + 4*(i*A_WID+j);
        ...
        /* update cache state, add miss penalty to block delay */
        miss = cache.upd(__alist, __idx);
        os.wait(BB2_DELAY + miss);
    BB3: ...
        os.wait(BB3_DELAY);
        drv.write(res);
    }

    void main(void) {
        os.task_create(&f, "Task 1", PRIO0);
    }


Hardware Layer (3)

• Bus-functional model (BFM)
    • Pin-accurate processor model
        – Timing-accurate bus and interrupt protocols
    • Bus model
        – Pin- and cycle-accurate
        – Driving and sampling of bus wires

Processor Models

• Processor layers
    • Application
        – Native, host-compiled C
        – Back-annotation
    • OS
        – OS model
        – Middleware, drivers
    • HAL
        – Firmware
    • Processor hardware
        – Bus interfaces
        – Interrupts
        – Cache

[Figure: processor models built from OS, HAL, HW-TLM and HW-BFM layers, shown next to a BFM-wrapped ISS.]

Source: G. Schirner, A. Gerstlauer, R. Doemer. "Fast and Accurate Processor Models for Efficient MPSoC Design," TODAES, 2009.


Processor Model Example

• Voice encoding and decoding
    • Motorola DSP 56600
        – Encoding & decoding tasks
        – Custom OS
    • 4 custom I/O blocks
    • 1 custom HW co-processor
        – Codebook search
• Processor models
    • Perfect timing
        – Back-annotated from ISS
    • Priority-based OS model
        – EDF: Decoder > Encoder
    • HW interrupt scheduling
        – 4 non-preempted priority levels
• Reference
    • Motorola proprietary ISS

[Figure: DSP running the encoder and decoder tasks, connected through DSP port A and interrupts INTA to INTD to the custom HW codebook search and to four custom HW blocks for encoder input, encoder output, decoder input and decoder output.]

Processor Model Results

• Execute on Sun Fire V240 (1.5 GHz)

• 163 speech frames

• Speed vs. accuracy

OS model (Appl Task)

Interrupts (FW TLM)

1800x speed w/ 3% error (vs. cycle-accurate ISS)


Multi-Core Models

• Multi-core OS model
    • SMP scheduler model
        – Global or partitioned queue
    • Configurable parameters
        – Number of cores
        – FIFO, round-robin, priority-based scheduling policies
        – Priorities, affinity, time slice (for round-robin)
• Multi-core processor model
    • Multi-core interrupt handling chain models
        – Interrupt handlers & tasks
        – Configurable generic interrupt controller (GIC) model
    • TLM bus interfaces

Source: P. Razaghi, A. Gerstlauer. "Host-Compiled Multi-Core System Simulation for Early Real-Time Performance Evaluation," ACM TECS '14.

Multi-Core OS Model

• Global or partitioned SMP scheduling
• Replicated or shared Ready, Idle, Sleep & Wait queues
• Processor suspension and interrupt handling
• Interrupt handlers as highest-priority OS-internal tasks

[Figure: application tasks T1 to T3 and channel CH on the OS layer (multi-core scheduler, dispatch, global ready queue, interrupt tasks as bottom halves), the HAL (I/O driver, interrupt handlers/ISRs) and the TLM (I/O and interrupt interfaces), all on top of the SLDL simulation kernel.]


Multi-Core Processor Model

• Emulate the processor hardware/software interface
    • OS & hardware abstraction layers
        – I/O drivers, interrupt suspension
    • Hardware layer
        – TLM bus interface, interrupt routing logic & interrupt controller models

[Figure: multi-core processor model with drivers (Drv), adapter, MAC and TLM blocks.]

Dual-Core Processor Model Example

[Figure: execution traces of core 0 and core 1 over time.]

• Errors in preemption model due to discrete timing
Integrate multi-core ATGA approach


Multi-Core ATGA Model

• Enhanced fallback mode check
    • Only fall back when ext. event triggers interrupt task with higher priority than current task
        – Potential task switch
        – Allow for delayed interrupts otherwise
• Model inter-core interrupt notifications
    • Adjust predicted times or switch to fallback
Accurate interrupt response times while maintaining speed
But: high-priority interrupt-driven tasks degrade performance

[Figure: average error [%] vs. simulation time [sec.] on logarithmic axes for ATGA (Intr.H, Intr.M, Intr.L, no Intr.) and the conventional model (Intr.H, Intr.M, Intr.L, no Intr.). Points of the conventional model are labeled with granularities of 1 µs, 100 µs and 10 ms.]

Multi-Core Cache Model

• Application model
    • Per core memory access list
        – Address, mode, time stamp
• Cache interface
    • Hardware layer of processor model
• Generic cache model
    • Emulate cache state
        – Only tags, no values
        – Return hit & miss info
    • Parameterizable
        – Cache size, line size, associativity, replacement & write-back policy

Source: P. Razaghi, A. Gerstlauer. "Multi-Core Cache Modeling for Host-Compiled Performance Simulation," ESLSyn '13.


Multi-Core Cache Simulation

• Directly committing accesses in simulation order
Globally out-of-order in discrete timing model

Multi-Core Cache Simulation

• Delayed reordering of aggregated requests
Multi-Core Out-of-Order Cache (MOOC) model
100% accurate results @ coarse-grain speeds

[Figure: per-core access sequences with safe-to-commit points.]
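The reordering idea can be sketched as follows: each core hands over its time-stamped accesses in coarse-grained chunks, and the shared cache only commits accesses that no other core can still precede. This Python sketch is an assumption-laden illustration rather than the MOOC implementation. The class `ReorderingFrontEnd`, the rule that an access is safe once its time stamp is below the smallest local core time, and the example addresses are all made up here.

```python
import heapq

class ReorderingFrontEnd:
    """Buffers per-core access lists and commits them in global time-stamp order."""

    def __init__(self, num_cores):
        self.local_time = [0] * num_cores  # logical time each core has reached
        self.pending = []                  # min-heap of (time stamp, core, address)
        self.committed = []                # order in which the cache sees accesses

    def submit(self, core, accesses, core_time):
        """Core `core` delivers aggregated (time stamp, address) pairs and
        reports that it has advanced to `core_time`."""
        for stamp, addr in accesses:
            heapq.heappush(self.pending, (stamp, core, addr))
        self.local_time[core] = core_time
        self._commit()

    def _commit(self):
        # Assumption: a core never issues accesses older than its local time,
        # so everything strictly before the slowest core is safe to commit.
        safe = min(self.local_time)
        while self.pending and self.pending[0][0] < safe:
            self.committed.append(heapq.heappop(self.pending))

frontend = ReorderingFrontEnd(num_cores=2)
# Core 0 runs ahead to t = 40 and reports two accesses
frontend.submit(0, [(10, 0x100), (30, 0x140)], core_time=40)
committed_after_core0 = list(frontend.committed)
# Core 1 then reports an access at t = 20 and has reached t = 25
frontend.submit(1, [(20, 0x100)], core_time=25)
```

Committing directly in simulation order would have presented the accesses to the cache as 10, 30, 20. With delayed commit, the access at t = 30 stays buffered until core 1 has passed that time, so the cache state is updated in time-stamp order.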


MPCSoC Platform Simulation

• Cellphone baseband MPCSoC

• Design space exploration: mapping & scheduling

Full-system simulation in close to real time

• 1400 MIPS at > 99% timing accuracy

MPCSoC Exploration Results

[Figure: MP3 and JPEG results for three configurations: single-core, dual-core with core-attached interrupt, and dual-core with task-attached interrupt. Each chart shows average frame delay in ms (HCSim.TLM, HCSim.TLM.no_Intr) and average frame error in % on a logarithmic scale (HCSim.TLM.error, HCSim.TLM.no_Intr.error).]


Lecture 10: Outline

• Processor layers
    • Application
    • Task/OS
    • Firmware
    • Hardware
• Processor synthesis
    • Software synthesis
    • Hardware synthesis

Software Synthesis

Automatically generate target binaries from TLM
    • Generate code for application (tasks and IPC)
    • Synthesize firmware (drivers, interrupt handlers)
    • OS wrappers and HAL implementations from DB
    • Compile and link against target RTOS and libraries

[Figure: ISS-based processor model running the application, RTOS, HAL and drivers, connected to a MAC.]

Source: G. Schirner, A. Gerstlauer, R. Doemer. "Automatic Generation of Hardware dependent Software for MPSoCs from Abstract System Specifications," ASPDAC08.


Processor Implementation Models

• Software C model
  • Generated application C code
    – Flat standard ANSI C code
  • Firmware and hardware models
    – RTOS model, HAL model
    – Low-level & hardware interrupt handling
    – External bus communication protocol/TLM

• Software ISS model
  • Reintegrated processor ISS
    – Bus-functional ISS wrapper
  • Running generated binary
    – Application, RTOS, drivers, HAL

[Figure labels: Bus Functional Model, Hardware Shell, Core ISS, ISS API (lib), nIRQ, nFIQ, Bus Protocol, CPU_1.bin (SW Application, Drivers, RTOS, RAL, Int., HAL)]


Lecture 10: Outline

✓ Processor layers

✓ Application

✓ Task/OS

✓ Firmware

✓ Hardware

• Processor synthesis

✓ Software synthesis

• Hardware synthesis


Hardware Synthesis

• C-to-RTL high-level synthesis (HLS)

• Allocation, scheduling, binding

Input C code:

    ……
    y = 3*x;
    i = 0;
    do {
      d += y * i;
      i++;
    } while (i < 10);
    h = d + d;
    ……

Scheduling → HW_FSMD (behavioral RTL), one operation per state:

    s1: y=3*x
    s2: i=0
    s3: t=y*i
    s4: d+=t
    s5: i++
    s6: h=2*d

Binding, netlisting → HW_RTL (structural RTL):

[Figure labels: Controller (states s1 to s6, ctrl=10…10), Datapath (Register File (RF), FU, buses b1, b2, b3, Bus interface), CLK, HW BFM]

Source: D. Shin, A. Gerstlauer, R. Doemer, D. Gajski. “An Interactive Design Environment for C-based High-level Synthesis of RTL Processors," TVLSI, 2008.
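The scheduling step on this slide can be sketched in plain C as a state machine that executes one operation per state. This is only an illustration, not SCE output: the names `hls_state_t`, `loop_reference` and `loop_fsmd` are hypothetical, `x` and the initial `d` are passed in because the slide elides their setup, and folding the loop test into s5 is an assumption (the slide does not show where the comparison is scheduled).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state encoding for the six states s1..s6 on the slide. */
typedef enum { S1, S2, S3, S4, S5, S6, S_DONE } hls_state_t;

/* Untimed reference: the C fragment as written on the slide. */
int32_t loop_reference(int32_t x, int32_t d)
{
    int32_t y, i, h;
    y = 3 * x;
    i = 0;
    do {
        d += y * i;
        i++;
    } while (i < 10);
    h = d + d;
    return h;
}

/* Scheduled version: one operation per state, one state per clock cycle.
 * If cycles is non-NULL it receives the number of states executed. */
int32_t loop_fsmd(int32_t x, int32_t d, unsigned *cycles)
{
    hls_state_t state = S1;
    int32_t y = 0, i = 0, t = 0, h = 0;
    unsigned n = 0;

    while (state != S_DONE) {
        n++;
        switch (state) {
        case S1: y = 3 * x; state = S2; break;
        case S2: i = 0;     state = S3; break;
        case S3: t = y * i; state = S4; break;
        case S4: d += t;    state = S5; break;
        /* Assumption: the loop test is evaluated together with i++. */
        case S5: i++;       state = (i < 10) ? S3 : S6; break;
        case S6: h = 2 * d; state = S_DONE; break;
        default:            state = S_DONE; break;
        }
    }
    if (cycles != NULL) {
        *cycles = n;
    }
    return h;
}
```

Under these assumptions the schedule takes 2 + 3*10 + 1 = 33 states for the ten loop iterations, and both functions return the same `h`. A real allocation could chain or parallelize operations and shorten this schedule.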


SCE Interactive RTL Synthesis

RTL Allocation

RTL Scheduling & Binding


Modeling of Hardware in SoC Design

• RTL Modeling

• State modeling: Accellera RTL Semantics Standard
  – Style 1: unmapped
      a = b * c;
  – Style 2: storage mapped
      R1 = R1 * RF2[4];
  – Style 3: function mapped
      R1 = ALU1(MULT, R1, RF2[4]);
  – Style 4: connection mapped
      Bus1 = R1;
      Bus2 = RF2[4];
      Bus3 = ALU1(MULT, Bus1, Bus2);
  – Style 5: exposed control
      ALU_CTRL = 011001b;
      RF2_CTRL = 010b;
      …

http://www.eda.org/alc-cwg/cwg-open.pdf

Source: R. Doemer
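As a rough illustration of how styles 1 to 4 describe the same multiplication at increasing levels of mapping, the statements above can be mimicked in plain C. Everything beyond the slide's own identifiers is an assumption: the `datapath_t` struct, the 8-entry size of `RF2`, the `alu_op_t` opcodes, the body of `ALU1`, and the final write-back from `Bus3` to `R1` in style 4 (the slide stops at `Bus3`). Style 5 is left out because the slide does not define the control-word encoding.

```c
#include <stdint.h>

/* Hypothetical opcode set and functional unit for this sketch. */
typedef enum { ALU_ADD, ALU_MULT } alu_op_t;

static uint32_t ALU1(alu_op_t op, uint32_t lhs, uint32_t rhs)
{
    return (op == ALU_MULT) ? lhs * rhs : lhs + rhs;
}

/* Hypothetical storage: one register and an 8-entry register file. */
typedef struct {
    uint32_t R1;
    uint32_t RF2[8];
} datapath_t;

/* Style 1, unmapped: plain variables and operators. */
uint32_t style1_unmapped(uint32_t b, uint32_t c)
{
    uint32_t a = b * c;
    return a;
}

/* Style 2, storage mapped: variables are bound to registers. */
void style2_storage_mapped(datapath_t *dp)
{
    dp->R1 = dp->R1 * dp->RF2[4];
}

/* Style 3, function mapped: the operator is bound to a functional unit. */
void style3_function_mapped(datapath_t *dp)
{
    dp->R1 = ALU1(ALU_MULT, dp->R1, dp->RF2[4]);
}

/* Style 4, connection mapped: every transfer goes over a named bus. */
void style4_connection_mapped(datapath_t *dp)
{
    uint32_t Bus1, Bus2, Bus3;
    Bus1 = dp->R1;
    Bus2 = dp->RF2[4];
    Bus3 = ALU1(ALU_MULT, Bus1, Bus2);
    dp->R1 = Bus3; /* assumed write-back, not shown on the slide */
}
```

With `R1 = 6` and `RF2[4] = 7`, styles 2 to 4 all leave 42 in `R1`, matching `style1_unmapped(6, 7)`; only the amount of structural detail in the description changes.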


SpecC RTL Modeling

RTL modeling example:

    behavior FSMD_Example(
        signal in  bool      CLK,      // system clock
        signal in  bool      RST,      // system reset
        signal in  bit[31:0] Inport,   // input ports
        signal in  bit[1]    Start,
        signal out bit[31:0] Outport,  // output ports
        signal out bit[1]    Done)
    {
      void main(void)
      {
        fsmd(CLK)                      // clock + sensitivity
        {
          bit[32] a, b, c, d, e;       // local variables

          { Outport = 0;               // default
            Done = 0b;                 // assignments
          }

          if (RST) { goto S0;          // reset actions
          }

          S0 : { if (Start) goto S1;
                 else goto S0;
               }

          S1 : { a = b + c;            // state actions
                 d = Inport * e;       // (register transfers)
                 Outport = a;
                 goto S2;
               }

          ...
        }
      }
    };

Source: R. Doemer
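For readers without a SpecC simulator, the cycle-by-cycle behavior of this FSMD can be approximated in plain C. This is a loose sketch under several assumptions, not SpecC semantics in full: one call to the hypothetical `fsmd_clock` stands for one clock edge, local variables update immediately within a state, a reset cycle skips the state actions, and `ST_S2` simply holds because the slide elides everything after S1. The type and field names are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state names; S2 and later states are elided on the slide. */
typedef enum { ST_S0, ST_S1, ST_S2 } fsmd_state_t;

typedef struct {
    fsmd_state_t state;
    uint32_t a, b, c, d, e;   /* local variables of the fsmd */
    uint32_t Outport;         /* output ports */
    bool Done;
} fsmd_t;

/* Advance the model by one clock edge. */
void fsmd_clock(fsmd_t *m, bool RST, bool Start, uint32_t Inport)
{
    /* Default assignments apply in every cycle unless a state overrides them. */
    m->Outport = 0;
    m->Done = false;

    /* Reset actions take priority over the state actions (assumption). */
    if (RST) {
        m->state = ST_S0;
        return;
    }

    switch (m->state) {
    case ST_S0:
        m->state = Start ? ST_S1 : ST_S0;
        break;
    case ST_S1:
        m->a = m->b + m->c;      /* state actions (register transfers) */
        m->d = Inport * m->e;
        m->Outport = m->a;
        m->state = ST_S2;
        break;
    case ST_S2:
        /* Elided on the slide; this sketch just stays here. */
        break;
    }
}
```

Driving the model with reset, then `Start`, then an `Inport` value walks it through S0, S1 and into S2, with `Outport` carrying `b + c` only in the cycle that executes S1.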


SpecC RTL Modeling

Mapped RTL example, Accellera style 1 (unmapped). Same FSMD_Example behavior as on the first SpecC RTL Modeling slide; the highlighted lines are:

    bit[32] a, b, c, d, e;        // unmapped variables

    S1 : { a = b + c;             // Accellera style 1
           d = Inport * e;        // (unmapped)
           Outport = a;
           goto S2;
         }

Source: R. Doemer


SpecC RTL Modeling

Mapped RTL example, Accellera style 2 (storage mapped). Same FSMD_Example behavior as on the first SpecC RTL Modeling slide; the local variables a to e are replaced by register file entries:

    buffered[CLK] bit[32] RF[5];      // register file, RF[0]..RF[4]

    S1 : { RF[0] = RF[1] + RF[2];     // Accellera style 2
           RF[3] = Inport * RF[4];    // (storage mapped)
           Outport = RF[0];
           goto S2;
         }

Source: R. Doemer


SpecC RTL Modeling

Mapped RTL example, Accellera style 3 (function mapped). Same FSMD_Example behavior as on the first SpecC RTL Modeling slide; operators are replaced by calls to the functional units ADD0 and MUL0:

    buffered[CLK] bit[32] RF[5];          // register file, RF[0]..RF[4]

    S1 : { RF[0] = ADD0(RF[1], RF[2]);    // Accellera style 3
           RF[3] = MUL0(Inport, RF[4]);   // (function mapped)
           Outport = RF[0];
           goto S2;
         }

Source: R. Doemer


SpecC RTL Modeling

Mapped RTL example, Accellera style 4 (connection mapped). Same FSMD_Example behavior as on the first SpecC RTL Modeling slide; every register transfer is routed over explicit buses:

    buffered[CLK] bit[32] RF[5];          // register file, RF[0]..RF[4]
    bit[32] BUS0, BUS1, BUS2;             // busses

    S1 : { BUS0 = RF[1];                  // Accellera style 4
           BUS1 = RF[2];                  // (connection mapped)
           BUS2 = ADD0(BUS0, BUS1);
           RF[0] = BUS2;
           ...
           goto S2;
         }

Source: R. Doemer


SpecC RTL Modeling

Mapped RTL example, Accellera style 5 (exposed control). Same FSMD_Example behavior as on the first SpecC RTL Modeling slide; the state only drives the control wires of the datapath components:

    signal bit[5:0] RF_CTRL;              // control wires
    signal bit[1:0] ADD0_CTRL, MUL0_CTRL;

    S1 : { RF_CTRL = 011000b;             // Accellera style 5
           ADD0_CTRL = 01b;               // (exposed control)
           MUL0_CTRL = 11b;
           ...
           goto S2;
         }

Source: R. Doemer


Lecture 10: Summary

• Host-compiled computation modeling
  • Model of software running in execution environment
    – Timed application, OS, bus drivers, interrupt handlers
    – Processor hardware model, suspension, bus interfaces
  • Virtual platform prototype
    – Embedded software development and validation
  • Viable complement to ISS-based validation

• Backend processor synthesis
  • Software synthesis
    – Code generation, RTOS targeting, cross-compilation & linking
    – Fully automatic final target binary generation
  • Hardware synthesis
    – High-level/behavioral synthesis: allocation, scheduling, binding
    – Interactive C-to-RTL synthesis flow