Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
NC STATE UNIVERSITY
Center for Embedded Systems Research (CESR)Electrical & Computer Engineering
North Carolina State University
Ali El-Haj-Mahmoud and Eric Rotenberg
Safely Exploiting Multithreaded Processors to Tolerate Memory Latency
in Real-Time Systems
CASES 2004 2El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Embedded Processor Trends
• More demanding applications and user expectations• Higher frequency
– ARM-11 (0.13µ): 500 MHz– ARM-11 (0.10µ): 1 GHz
• Processor-memory speed gap
CASES 2004 3El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Memory Wall
1000
10
10000
100000
100
Perf
orm
ance
20051980
processor
memory
Years
How to capitalize on higher frequency?
CASES 2004 4El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Multithreading
• Switch-on-event coarse-grain multithreading– Multiple register contexts for fast switching– Switch to alternate task when current task accesses
memory– Overlap memory accesses with computation
• But what about multithreading in hard-real-time?– Require analyzability– Cannot rely on dynamic schemes
CASES 2004 5El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Exploiting Multithreading in Hard-Real-Time Systems
• Safety– Guarantee all tasks meet deadlines (worst-case)– Statically bound overlap under all scenarios
• Tractability– Confirm/disconfirm schedulability mathematically
using closed-form tests– Consider each task individually
CASES 2004 6El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Classic Real-Time Scheduling
25.082===
A
AA period
WCETU
• Utilization-based schedulability test• No need to construct the schedule a priori
Example: Earliest Deadline First (EDF)
time
Task A
Task B
EDF
B1 B2 B3 B4
A1 A2
periodA
periodB
release A1 deadline A1release A2
release B1 deadline B1release B2
B1 A1 B2 B3 B4A2
EDF schedulability test
1≤= ∑i i
i
periodWCETU
75.043===
B
BB period
WCETU
CASES 2004 7El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Performance vs. Tractability
• Performance: Memory overlap • Tractability: Closed-form schedulability test
• Only basic parameters of tasks are known – WCET = C + M (from conventional WCET analysis)– Period = deadline
A
B
MM M
CASES 2004 8El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
EDF (Classic Real-Time)Low Performance but Tractable
1>+=B
B
A
A
periodWCET
periodWCETU
• Memory overlap: No
• Closed-form test: Yes
A
B
MM M
M M M
DEADLINE MISSED
CASES 2004 9El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
MM M
M M M
EDF (Classic Real-Time)Low Performance but Tractable
1>+=B
B
A
A
periodWCET
periodWCETU
• Memory overlap: No
• Closed-form test: Yes
DEADLINE MISSED
CASES 2004 10El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
MM M
M M M
EDF (Classic Real-Time)Low Performance but Tractable
1>+=B
B
A
A
periodWCET
periodWCETU
• Memory overlap: No
• Closed-form test: Yes
DEADLINE MISSED
CASES 2004 11El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
MM M
M M M
EDF (Classic Real-Time)Low Performance but Tractable
1>+=B
B
A
A
periodWCET
periodWCETU
• Memory overlap: No
• Closed-form test: Yes
DEADLINE MISSED
CASES 2004 12El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M M
M M
M
M
Dynamic Switch on Memory AccessPossibly High Performance but Intractable
• Memory overlap: Possible, unfair for low priority• Closed-form test: No
– Must examine memory positioning!
CASES 2004 13El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M M
M M
M
M
Dynamic Switch on Memory AccessPossibly High Performance but Intractable
• Memory overlap: Possible, unfair for low priority• Closed-form test: No
– Must examine memory positioning!
CASES 2004 14El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M M
M M
M
M
Dynamic Switch on Memory AccessPossibly High Performance but Intractable
• Memory overlap: Possible, unfair for low priority• Closed-form test: No
– Must examine memory positioning!
CASES 2004 15El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Dynamic Switch on Memory AccessPossibly High Performance but Intractable
• Memory overlap: Possible, unfair for low priority• Closed-form test: No
– Must examine memory positioning!
A
B
M M
M M
M
M
CASES 2004 16El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M M
M M
M
M
Dynamic Switch on Memory AccessPossibly High Performance but Intractable
• Memory overlap: Possible, unfair for low priority• Closed-form test: No
– Must examine memory positioning!
DEADLINE MISSED
CASES 2004 17El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M M
M M
M
M
Dynamic Switch on Memory AccessPossibly High Performance but Intractable
• Memory overlap: Possible, unfair for low priority• Closed-form test: No
– Must examine memory positioning!
DEADLINE MISSED
CASES 2004 18El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M
MM
M
M
M
Deterministic SwitchingHigh Performance and Tractable
• Memory overlap: Yes (fair and bounded)• Closed-form test: Yes
CASES 2004 19El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M
MM
M
M
M
Deterministic SwitchingHigh Performance and Tractable
• Memory overlap: Yes (fair and bounded)• Closed-form test: Yes
CASES 2004 20El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Deterministic SwitchingHigh Performance and Tractable
• Memory overlap: Yes (fair and bounded)• Closed-form test: Yes
A
B
M
MM
M
M
M
CASES 2004 21El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Deterministic SwitchingHigh Performance and Tractable
• Memory overlap: Yes (fair and bounded)• Closed-form test: Yes
A
B
M
MM
M
M
M
CASES 2004 22El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
A
B
M
MM
M
M
M
Deterministic SwitchingHigh Performance and Tractable
• Memory overlap: Yes (fair and bounded)• Closed-form test: Yes
CASES 2004 23El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Tractability through Deterministic Switching
• Fully decouple independent tasks by forcing periodic switches– Every task gets a chance to initiate/overlap
memory accesses– No scheduling dependences– No specificity regarding memory positioning
CASES 2004 24El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Weighted-Round-Robin (WRR)
Pipeline
Virt. Proc. 1Virt. Proc. 2Virt. Proc. 3Virt. Proc. 4
round = memory latency
T1 T2
T3
T4time
T1T2
T3 T1 T2 T4T3 T1 T2 T3T4
T3
T1T2
T4T3
T1T2
forced pre-emptiondilate WCET
T4T4 T4memory transfer operation
Round i Round i+1 Round i+2
duty cycle
CASES 2004 25El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Analytical Framework for WRR
d: duty cycle 0 < d ≤ 1 P: period = deadlineWCET: Worst-Case Execution Time
WCET = C + MC: aggregate computation timeM: aggregate memory time
WCET’: dilated Worst-Case Execution TimeWCET’ = (C/d) + M
CASES 2004 26El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Schedulability Test1. Dilated task meets deadline on its virtual processor
CASES 2004 27El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Schedulability Test
PWCET ≤'
1. Dilated task meets deadline on its virtual processor
PMdC
≤+
CASES 2004 28El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Schedulability Test
PWCET ≤'
1. Dilated task meets deadline on its virtual processor
PMdC
≤+
MPCd−
≥
CASES 2004 29El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Schedulability Test
( )( )PM
PCMP
Cd−
=−
=1
PWCET ≤'
1. Dilated task meets deadline on its virtual processor
PMdC
≤+
MPCd−
≥
CASES 2004 30El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Schedulability Test
2. Sum of all duty cycles less than or equal to 1
( )( )PM
PCMP
Cd−
=−
=1
PWCET ≤'
1. Dilated task meets deadline on its virtual processor
PMdC
≤+
MPCd−
≥
CASES 2004 31El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Schedulability Test
2. Sum of all duty cycles less than or equal to 1
( )( )PM
PCMP
Cd−
=−
=1
PWCET ≤'
1. Dilated task meets deadline on its virtual processor
PMdC
≤+
MPCd−
≥
∑=
≤n
iid
11
CASES 2004 32El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Schedulability Test
( )( ) 1
11≤
−∑=
n
i ii
ii
PMPC
2. Sum of all duty cycles less than or equal to 1
( )( )PM
PCMP
Cd−
=−
=1
PWCET ≤'
1. Dilated task meets deadline on its virtual processor
PMdC
≤+
MPCd−
≥
∑=
≤n
iid
11
CASES 2004 33El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Generalized Analytical Framework
12
22
1
11
≤⎟⎟⎠
⎞⎜⎜⎝
⎛+
++⎟⎟⎠
⎞⎜⎜⎝
⎛+
+⎟⎟⎠
⎞⎜⎜⎝
⎛+
t
tvp
t
vpvp
P
MdC
P
MdC
P
MdC
L
( )
( )∑
∑
=
=
−= t
jjj
t
jjj
vp
PM
PCd
1
1
1
Multiple tasks per VP:
CASES 2004 34El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Modeling Bus and Memory System
• Addressed in detail in paper• Analysis accounts for:
– Worst-case task serialization on memory bus– DRAM bank conflicts– Multiple VPs sharing single DRAM bank
CASES 2004 35El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
0
0.2
0.4
0.6
0.8
1
1.2
1GHz 2GHz 1GHz 2GHz 1GHz 2GHz
TASK-SETS8 tasks (2 tasks/VP for WRR)
wor
st-c
ase
utili
zatio
nEDF WRR EDF (no memory)
LOW MED HIGH
Results
CASES 2004 36El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
0
0.2
0.4
0.6
0.8
1
1.2
1GHz 2GHz 1GHz 2GHz 1GHz 2GHz
TASK-SETS8 tasks (2 tasks/VP for WRR)
wor
st-c
ase
utili
zatio
nEDF WRR EDF (no memory)
LOW MED HIGH
Results
memcomponent
CASES 2004 37El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
0
0.2
0.4
0.6
0.8
1
1.2
1GHz 2GHz 1GHz 2GHz 1GHz 2GHz
TASK-SETS8 tasks (2 tasks/VP for WRR)
wor
st-c
ase
utili
zatio
nEDF WRR EDF (no memory)
LOW MED HIGH
Results
memcomponent
CASES 2004 38El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
0
0.2
0.4
0.6
0.8
1
1.2
1GHz 2GHz 1GHz 2GHz 1GHz 2GHz
TASK-SETS8 tasks (2 tasks/VP for WRR)
wor
st-c
ase
utili
zatio
nEDF WRR EDF (no memory)
LOW MED HIGH
Results
50%
28%
memcomponent
CASES 2004 39El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Novel
YesYesOur real-time multithreading framework
NoYesClassic multithreading
YesNoClassic real-time
Formalism(safety & tractability)
Memory overlap
CASES 2004 40El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Useful
• Fully capitalize on high-frequency embedded microprocessors
• Exceed schedulability limit of conventional real-time theory for uniprocessors by analytically bounding WCET overlap
CASES 2004 41El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Deployable• Software-only solution
– Use Ubicom IP3023 8-thread embedded microprocessor
– Analytical framework + scheduling policy
16
Local Register Files (8 Banks)
GPR: 16 x 32 Bits Addr: 8 x 32 Bits
Accum: 1 x 48 Bits Source-3: 1 x 32 Bits Interrupt Mask Reg. Control/Status Reg.
Global RegisterThread Control Interrupt Status Debug Control
32-Bit CPU Core
Program Memory256K SRAM
Data Memory 64K SRAM
SDRAM Memory
Controller
Parallel I/O
Serializer Deserializer
(serdes)
Serializer Deserializer
(serdes)
SPI Debug
CPU PLL
PeripheralPLL
Watchdog Timer
Reset & Brown-Out
Random # Generator
Data Memory (0-8MB)
External Flash
Memory (0-4MB)
8
10Base-T USB GPSI
UARTs
MII GPIO
CASES 2004 42El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Summary
Safely expose multithreading to hard-real-time schedulability analysisBound computation / memory overlap
Offline closed-form schedulability testSafeTractable
Scale “Memory Wall” in embedded systemsExpose full benefits of high-frequency embedded processors
CASES 2004 43El-Haj-Mahmoud © 2004
NC STATE UNIVERSITY
Questions?