Mapping Flow for Car-Entertainment Applications on Embedded Multiprocessor Systems
Marco Bekooij and Maarten Wiggers
2
Outline
Multi-stream car-entertainment system
Objectives & Approach
Bounding interference with budget schedulers
– Unsafe worst-case execution time estimates of tasks
Expressivity analysis model
– Data dependent execution rates of tasks
Mapping flow
– Limitations
FPGA demonstrator
Summary
3
Car entertainment application domain
4
Multi-stream car-entertainment system
5
Application model
[Figure: a use-case with two jobs. An SRT video job: input data stream, tasks, output stream to display. An FRT audio job: input data stream, tasks, output stream to speakers.]
Jobs are composed of tasks
Simultaneously running jobs together form use-cases
Jobs often have real-time requirements
– Firm (FRT) if deadline misses are highly undesirable (steep quality degradation)
– Soft (SRT) if occasional deadline misses are tolerable
6
Car entertainment use-case
[Figure: use-case with two jobs. FRT digital radio job A with tasks ADC, PDC, CFE, VIT, CBE, SRC, APP and DAC, and labels f1 and f2. FRT source decoding job B with tasks BR, MP3, SRC, APP and DAC, and label f3.]
Observations:
– Reactive system because stream from transmitter cannot be slowed down
– Firm real-time jobs because deadline misses are highly undesirable but not catastrophic
– Both streams are equally important
7
Our objectives:
► Enable independent development of jobs
► Maintain robustness despite increased resource sharing
► Reduce design and verification effort
8
Strategy: Divide and conquer
[Figure: use-case with real-time digital radio job A (tasks ADC, PDC, CFE, VIT, CBE, SRC, APP, DAC; labels f1, f2) and real-time source decoding job B (tasks BR, MP3, SRC, APP, DAC; label f3)]
Bound interference from other jobs (guaranteed by measures in hardware)
– Each job can be designed and characterized independently of other jobs
– Erroneous behavior of a job has a bounded effect on the behavior of other jobs
Objectives 1 & 2
9
Strategy: Abstraction with guarantees (conservative arrival time data)
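Fast (>100 Mcc/s) simulation of communicating processes with time (CP-T)
[Figure: process graph with nodes v0, v1, v2]
Instead of:
Low-level simulation (e.g. PV-T)
[Figure: platform with DSP, µP, cache ($), mem, Arb, I/O, CActrl, external SDRAM and network interfaces (NI) attached to a network]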
Objective 3
10
Strategy: Synthesize settings (prevent iterative design space exploration)
Dataflow synthesis
Inputs:
– (cyclic) task graph of a job
– WCRTs of the tasks
– multiprocessor instance
– throughput and latency constraint
Outputs:
– scheduler settings and communication buffer capacities
Objective 3
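A minimal sketch of one synthesis step, assuming a single-rate producer and consumer that communicate over a FIFO with back-pressure. This is a simplification for illustration, not the algorithm used in the tool flow; the function name and parameters are hypothetical. If the buffer is modeled as a cycle holding d tokens, the cycle mean is (WCRT_producer + WCRT_consumer) / d, so the smallest capacity that sustains one token per period is the ceiling of (WCRT_producer + WCRT_consumer) / period, provided each WCRT fits within the period.

```python
def min_buffer_capacity(wcrt_producer, wcrt_consumer, period):
    """Smallest FIFO capacity (in tokens) that sustains one token per `period`.

    Assumptions (not from the slides): single-rate tasks, integer time
    units, back-pressure, and no overlapping firings of the same task.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if max(wcrt_producer, wcrt_consumer) > period:
        raise ValueError("a task's WCRT exceeds the period; no capacity suffices")
    # The cycle producer -> consumer -> producer carries `d` tokens (filled
    # plus free buffer places), so its cycle mean is (wcrt_p + wcrt_c) / d.
    total = wcrt_producer + wcrt_consumer
    return max(1, -(-total // period))  # integer ceiling division


print(min_buffer_capacity(3, 4, 5))  # prints 2
```

With a capacity of 1 the cycle mean would be 7 time units, which violates the period of 5; a capacity of 2 brings it down to 3.5.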
11
Multiprocessor system under development
• Instructions and data can be stored in external SDRAM
  – lower cost per bit than on-chip SRAM (the Digital Radio Mondiale channel decoder requires 940 kByte of storage)
• Caches hide SDRAM memory access latency
  – large difference between WCET and average execution time
[Figure: Æthereal network-on-chip (NoC) connecting processors (Proc), cache ($), network interfaces (NI), arbiter (Arb) and memory controller (memctrl) with external SDRAM]
[M. Bekooij et al., Bits & Chips 2007]
12
Execution times of tasks
Why use optimistically estimated WCETs?
– Distance ∆ can be large due to caches and external SDRAM
– A tight conservatively estimated WCET requires knowledge of the input data
Our approach
– Guarantee no deadline misses for a set of input stimuli
– Fall-back mechanism to cope with deadline misses
– Worst-case temporal behavior of other jobs should not be affected
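[Figure: distribution of execution times (x-axis: execution time; y-axis: distribution of times), marking the observed execution times, all execution times, the WCET, the optimistic estimate, the conservative estimate and the distance ∆]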
13
Response time
Execution Time (ET)
– from start to finish of a task execution
Response Time (RT)
– from the moment the task is enabled, via its start, until it has finished
14
Scheduler/arbiter characteristics (e.g. memory port arbiter, task scheduler)
Budget scheduler (e.g. TDM): WCRT is independent of
– the WCETs of tasks of other jobs (unsafe WCET!)
– data arrival/traffic characterization
[Figure: budget schedulers as a subset of all schedulers]
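To illustrate why a budget scheduler isolates jobs, here is a minimal sketch of a conservative response-time bound for TDM. It assumes one contiguous slice of `budget` time units per `period` and preemption at slice boundaries; the function name and parameters are hypothetical. The bound uses only the task's own WCET and its TDM settings, so an unsafe WCET estimate for another job's task cannot change it.

```python
def tdm_wcrt(wcet, budget, period):
    """Conservative worst-case response time of a task under TDM.

    All arguments use the same integer time unit (e.g. clock cycles).
    Assumes one contiguous slice of `budget` units in every `period`.
    """
    if not 0 < budget <= period:
        raise ValueError("need 0 < budget <= period")
    if wcet < 0:
        raise ValueError("wcet must be non-negative")
    slices_needed = -(-wcet // budget)  # ceil(wcet / budget)
    # Worst case: before each slice it needs, the task first waits for the
    # (period - budget) units that belong to other tasks.
    return wcet + slices_needed * (period - budget)


print(tdm_wcrt(250, 100, 400))  # prints 1150
```

A task with a WCET of 250 and a slice of 100 in a period of 400 needs three slices, so it waits at most 3 × 300 time units on top of its own 250.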
15
Variable consumption rate
[Figure: task chain BR, MP3, SRC, DAC; the DAC runs at 44.1 kHz; rates labelled on the edges: 3, n = [1..100], 576, 480, 441, 1]
Digital-to-analog converter (DAC) determines the throughput constraint
MP3 decoder task consumes a variable amount of data
Block-reader (BR) task must “know” the consumption speed of the MP3 task
– Implies a cyclic dependency that affects the temporal behavior!
– Block-reader task does not execute periodically
[M. Wiggers et al., DATE 2008]
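A small worked example of how the DAC's throughput constraint propagates upstream. The assignment of rates to edges is an assumption read from the figure: the DAC consumes 1 sample per firing at 44.1 kHz, the SRC consumes 480 and produces 441 samples per firing, and the MP3 decoder produces 576 samples per firing. The helper name is hypothetical.

```python
from fractions import Fraction

# Assumed token rates per firing (a reading of the figure, not confirmed).
DAC_RATE_HZ = 44_100       # DAC firings per second, 1 sample each
SRC_OUT, SRC_IN = 441, 480  # samples produced / consumed per SRC firing
MP3_OUT = 576               # samples produced per MP3 firing


def upstream_rate(consumer_rate, consumed_per_firing, produced_per_firing):
    """Firings per second the producer needs so the consumer never starves."""
    return Fraction(consumer_rate) * consumed_per_firing / produced_per_firing


src_rate = upstream_rate(DAC_RATE_HZ, 1, SRC_OUT)    # SRC firings per second
mp3_rate = upstream_rate(src_rate, SRC_IN, MP3_OUT)  # MP3 firings per second

print(src_rate, mp3_rate)  # prints 100 250/3
```

Under these assumptions the SRC and MP3 tasks have fixed average rates, while the block reader's rate depends on the variable amount n that the MP3 task consumes per firing, which is why it cannot run periodically.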
16
Data dependent execution rate
[Figure: task graph with tasks BR, VLD, DQ, IDCT, MC and DAC; rates labelled on the edges include 2048, 1, m, n and 1[n]]
Task graph of an H.263 video decoder
• The number of executions of the DQ and IDCT tasks is input-data dependent
[M. Wiggers et al., RTAS 2008]
17
Expressivity of the model
[Figure: deadlock-free applications as a subset of the set of all applications]
Models for data-dependent behavior should be provably deadlock free
18
Tool flow Heracles/Minos (online/offline)
Inputs:
– task graph + WCETs (CSDF)
– multiprocessor instance
– throughput/latency constraint
Mapping steps:
– cluster/static order scheduling
– processor/buffer assignment
– buffer capacities
– data routing in NoC
– scheduler settings
Output: system configuration
Split into an offline part (Minos) and an online part (Heracles)
Unsolved issues: phase coupling, maximization of online mapping options?
19
FPGA demonstrator
[Figure: Æthereal NoC connecting VLIW-1, VLIW-2 and VLIW-4 tiles with local memories (MEM), audio I/O, a frame buffer and the host PC through DTL ports (DTLi, DTLt, DTLs)]
• Only one network in system
  – for control, data, debug
• 2x2 mesh router network
  – guaranteed throughput
• SiHive VLIW core
  – ANSI C compiler
• PC used as control processor
  – High-level re-configuration API: setup & tear-down of connections, starting & stopping tasks
Differentiators:
• Predictable = an upper bound on the arrival times of data can be computed at design time
• Budget scheduler for every shared resource
20
Summary
Car entertainment application characteristics
– Multi job
– Tasks with data-dependent execution rates
– Large memory footprint of digital radio applications
Architecture characteristics
– Multiprocessor
– Scratch-pad memories as well as caches together with an external SDRAM
Approach
– Bound interference from other tasks by making use of budget schedulers
Tool flow
– Split in an offline and an online part
Challenges
– Phase coupling and maximization of online mapping options
– Expressive enough analysis model to determine throughput
– … and many more
21
Questions?
22
References
B. Akesson, K. Goossens, and M. Ringhofer. Predator: A Predictable SDRAM Memory Controller. In Proc. Int’l Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), 2007.
M. Bekooij, A. Moonen, and J. van Meerbergen. Predictable and Composable Multiprocessor System Design: A Constructive Approach. In Proc. Bits&Chips Symposium on Embedded Systems and Software, Eindhoven, The Netherlands, October 2007.
M. Bekooij, M. Wiggers, and J. van Meerbergen. Efficient Buffer Capacity and Scheduler Setting Computation for Soft Real-Time Stream Processing Applications. In Proc. Int’l Workshop on Software and Compilers for Embedded Systems (SCOPES), April 2007.
A. Kumar, A. Hansson, J. Huisken, and H. Corporaal. An FPGA Design Flow for Reconfigurable Network-Based Multi-Processor Systems on Chip. In Proc. Design, Automation and Test in Europe Conference and Exhibition (DATE), April 2007.
A. Moonen et al. A Multi-Core Architecture for In-Car Digital Entertainment. In Proc. Int’l Conference on Global Signal Processing (GSPx), October 2005.
O. Moreira and M. Bekooij. Self-Timed Scheduling Analysis for Real-Time Applications. In EURASIP Journal on Advances in Signal Processing, 2007.
O. Moreira and M. Bekooij. Scheduling Multiple Independent Hard Real-Time Jobs on a Heterogeneous Multiprocessor. In Proc. Int’l Conference on Embedded Software (EMSOFT), September 2007.
S. Sriram and S. S. Bhattacharyya. Embedded Multiprocessors: Scheduling and Synchronization. Marcel Dekker Inc., 2000.
D. Stiliadis and A. Varma. Latency-Rate Servers: A General Model for Analysis of Traffic Scheduling Algorithms. In IEEE/ACM Transactions on Networking, 6(5):611–624, October 1998.
S. Stuijk, T. Basten, M.C.W. Geilen, and H. Corporaal. Multiprocessor Resource Allocation for Throughput-Constrained Synchronous Dataflow Graphs. In Proc. Design Automation Conference (DAC), June 2007.
M. Wiggers, M. Bekooij, and G. Smit. Modelling Run-Time Arbitration by Latency-Rate Servers in Data Flow Graphs. In Proc. Int’l Workshop on Software and Compilers for Embedded Systems (SCOPES), April 2007.
M. Wiggers, M. Bekooij, P. Jansen, and G. Smit. Efficient Computation of Buffer Capacities for Cyclo-Static Real-Time Systems with Back-Pressure. In Proc. IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), April 2007.
M. Wiggers, M. Bekooij, and G. Smit. Computation of Buffer Capacities for Throughput Constrained and Data Dependent Inter-Task Communication. In Proc. Design, Automation and Test in Europe Conference and Exhibition (DATE), April 2008.
M. Wiggers, M. Bekooij, and G. Smit. Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication. In Proc. IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), April 2008.