Procedure Hopping: a Low Overhead Solution to Mitigate Variability in Shared-L1 Processor Clusters
Procedure-level Vulnerability (PLV)
Sources of Variability
* PLV exposes fast dynamic voltage variation and its effects to the compiler for use in runtime migration.
* At compile time, we quantify the effect of the full range of operating conditions on the dynamic voltage variation of every procedure.
Abbas Rahimi†, Luca Benini‡ and Rajesh Gupta† †UC San Diego and ‡Università di Bologna
10% VCC, 160˚∆C Temperature, 40% VTH
[Figure: PLV characterization flow, design-time part. Blocks shown: open-source Leon3 (VHDL), timing constraints, Design Compiler, IC Compiler, Verilog netlist, parasitics, ModelSim (Vsim), switching activity, PrimeTime PX. Outputs shown: power of ProcX at (Vi, Tj), dynamic voltage droop/rise at (Vi, Tj), and object code with PLV-characterized metadata.]
For ProcX@Caller :
Read current (V,T) sensors of Corei
Read characterized metadata for ProcX
If PLVX > PLV_threshold
Invoke Procedure Hopping (ProcX@Callee)
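
The run-time check above can be sketched as a lookup into the compile-time metadata. This is a minimal illustration under stated assumptions, not the authors' implementation: the metadata layout, the PLV numbers, the threshold value, and the helper names `nearest_corner` and `should_hop` are all hypothetical. PLV is treated here as an opaque number read from the metadata.

```python
# Characterized (V, T) corners named on the poster: volts, degrees Celsius.
CORNERS = [(0.81, -40), (0.81, 125), (0.90, 25), (0.99, 125)]

# Hypothetical PLV metadata per procedure and corner (placeholder numbers,
# not measurements from the poster).
PLV_METADATA = {
    "ProcX": {(0.81, -40): 0.0, (0.81, 125): 0.0, (0.90, 25): 0.2, (0.99, 125): 0.9},
}
PLV_THRESHOLD = 0.5  # assumed threshold


def nearest_corner(v, t):
    """Map a (V, T) sensor reading to the closest characterized corner."""
    # Normalize by the characterized ranges: 0.18 V and 165 degrees Celsius.
    return min(
        CORNERS,
        key=lambda c: ((c[0] - v) / 0.18) ** 2 + ((c[1] - t) / 165.0) ** 2,
    )


def should_hop(proc, v, t):
    """Return True when the PLV of `proc` at the current corner exceeds the threshold."""
    corner = nearest_corner(v, t)
    return PLV_METADATA[proc][corner] > PLV_THRESHOLD
```

With these placeholder values, `should_hop("ProcX", 0.98, 120)` selects the (0.99V, 125°C) corner and requests a hop, while a reading of (0.81V, 100°C) does not.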
[Figure: PLV characterization flow, compile-time and run-time parts. Compile-time blocks: source code, VA-Proc generation (ProcX, ProcX@Caller, ProcX@Callee), VA-procedures' source code, BCC compiler, executables, metadata generation. Characterization inputs: TSMC 45nm libraries at the corners (0.81V, -40°C), (0.81V, 125°C), (0.90V, 25°C), and (0.99V, 125°C); SDF; PrimeRail. Run-time block: Leon-3 Core i with an operating-condition (V,T) monitor.]
PLV characterization flow: Design time/Compile time/ Runtime
NSF Expedition in Computing, Variability-Aware Software for Efficient Computing with Nanoscale Devices http://variability.org
[Figure: clock period shown as actual circuit delay plus guardband. Guardband contributors labeled: across-wafer frequency, VCC droop, temperature, other uncertainty.]
Variation-tolerant Shared-L1 Cluster
Variation-aware VDD-hopping to mitigate process variation
[Figure: 16-core shared-L1 cluster, Core0 ... Core15. Each core is drawn with a VA-VDD-hopping unit, a CPM, and a PSS, and is supplied from low, typical, and high VDD rails. Level shifters and logarithmic interconnects connect the cores to the shared I$ banks (I$ B0 ... I$ Bi-1) and TCDM banks (TCDM B0 ... TCDM Bj-1). Other labels: SHM, DFS (f, f+180°).]
Procedure hopping to mitigate dynamic voltage variation
[Figure: the same 16-core shared-L1 cluster (Core0 ... Core15 with VA-VDD-hopping, CPM, and PSS; shared I$ and TCDM banks behind level shifters and logarithmic interconnects), shown here for procedure hopping.]
Each core increases voltage if its delay is high
Each procedure hops from one core to another if it causes voltage variation
VDD-hopping
At VDD = 0.81V, three cores (f4, f8, f9) cannot meet the target frequency of 830 MHz.
With VA-VDD-hopping, all cores of the same cluster meet the target frequency of 830 MHz.
VA-VDD-hopping tunes each core's voltage based on its delay as reported by the CPMs.
Intra-cluster Procedure Hopping
(Voltage, Temp.)   (0.99V, 125°C)   (0.90V, 25°C)   (0.81V, 125°C)   (0.81V, -40°C)
Power density      0.66 μW/μm²      0.21 μW/μm²     0.18 μW/μm²      0.16 μW/μm²
Max IR-drop        44 mV            < 35 mV         < 31 mV          < 31 mV

Inter-corner voltage droop of the FIR procedure: FIR does not face any voltage emergency (droop < 4%) at the corners with voltages of 0.81V to 0.90V, due to their lower power densities.
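
The 4% criterion can be checked directly from the table: each IR-drop is divided by the supply voltage of its corner. The snippet below is only a worked arithmetic check. It treats the "< 35 mV" and "< 31 mV" entries as upper bounds, and the function name `droop_percent` is introduced here for illustration.

```python
EMERGENCY_PCT = 4.0  # a droop above 4% of VDD counts as a voltage emergency

# (VDD in volts, temperature in degrees C) -> max IR-drop in mV for FIR.
# The 35 mV and 31 mV entries are upper bounds ("<" in the table).
FIR_MAX_IR_DROP_MV = {
    (0.99, 125): 44,
    (0.90, 25): 35,
    (0.81, 125): 31,
    (0.81, -40): 31,
}


def droop_percent(vdd_v, drop_mv):
    """IR-drop expressed as a percentage of the supply voltage."""
    return 100.0 * (drop_mv / 1000.0) / vdd_v


# Corners where the (upper-bound) droop exceeds the emergency threshold.
emergencies = [
    corner
    for corner, drop in FIR_MAX_IR_DROP_MV.items()
    if droop_percent(corner[0], drop) > EMERGENCY_PCT
]
```

Only the (0.99V, 125°C) corner exceeds the threshold: 44 mV is about 4.4% of 0.99V, while 35 mV of 0.90V and 31 mV of 0.81V both stay under 4%.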
Core frequencies (MHz) per supply setting:

Core   VDD = 0.81V   VDD = 0.99V   VA-VDD-hopping = (0.81V, 0.99V)
f0     862           1408          862
f1     909           1389          909
f2     870           1408          870
f3     847           1370          847
f4     826           1370          1370
f5     855           1408          855
f6     877           1408          877
f7     893           1408          893
f8     820           1370          1370
f9     826           1370          1370
f10    909           1389          909
f11    847           1370          847
f12    901           1408          901
f13    917           1408          917
f14    847           1389          847
f15    901           1389          901
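
The per-core voltage choice implied by these numbers can be sketched as follows. This is an assumption-laden illustration: on the poster the decision is driven by delay reported by the CPMs, whereas this sketch uses the tabulated frequencies as a stand-in, and the function name `select_vdd` is hypothetical.

```python
TARGET_MHZ = 830

# Per-core maximum frequencies (MHz) from the poster, cores f0..f15.
F_AT_081V = [862, 909, 870, 847, 826, 855, 877, 893,
             820, 826, 909, 847, 901, 917, 847, 901]
F_AT_099V = [1408, 1389, 1408, 1370, 1370, 1408, 1408, 1408,
             1370, 1370, 1389, 1370, 1408, 1408, 1389, 1389]


def select_vdd(f_low, f_high, target):
    """Pick the lowest supply at which each core meets the target frequency."""
    choice = []
    for low, high in zip(f_low, f_high):
        if low >= target:
            choice.append((0.81, low))
        elif high >= target:
            choice.append((0.99, high))
        else:
            raise ValueError("core cannot meet the target at either voltage")
    return choice


settings = select_vdd(F_AT_081V, F_AT_099V, TARGET_MHZ)
raised = [i for i, (vdd, _) in enumerate(settings) if vdd == 0.99]
```

Here `raised` comes out as cores 4, 8, and 9, and the resulting frequencies match the VA-VDD-hopping column of the poster's data.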
[Plot: max voltage droop (%), axis 0 to 8, at the corners (0.81V, -40°C), (0.81V, 125°C), (0.90V, 25°C), (0.99V, 125°C).]
[Plot: peak power (W), axis 0.0 to 4.0, at the corners (0.81V, -40°C), (0.81V, 125°C), (0.90V, 25°C), (0.99V, 125°C).]
3.5× inter-corner peak power variation, and 1.28× intra-corner peak power variation
* At (0.99V, 125°C), all procedures except tblook face a voltage droop/rise > 4% of VDD.
* At (0.90V, 25°C), only four procedures (IFFT, IDCT, matrix, ttsprk) face voltage emergencies.
* All procedures running on cores at 0.81V have a maximum voltage droop/rise < 4% of VDD.
* Low-cost runtime procedure hopping migrates procedures within the processor cluster, using the compile-time characterization of PLV (captured as metadata).
* This is made possible by the shared-L1 I$ and TCDM, which eliminate the penalty of filling private storage.
…
ProcX@Callee:
  if (calculate_PLV ≤ PLV_threshold)
    set_statusX_PHIT = running
    load_context&param_from_SSPX
    set_all_param&pointers
    call ProcX
    store_context_to_SSPX
    set_statusX_PHIT = done
    send_broadcast_ack
  else
    resume_normal_execution
…
Broadcast_req_ISR:
  ProcX@Callee = search_in_PHIT
  call ProcX@Callee
…
call ProcX           // conventional compile
call ProcX@Caller    // VA-compile
…
ProcX@Caller:
  if (calculate_PLV ≤ PLV_threshold)
    call ProcX
  else
    create_shared_stack_layout
    set_PHIT_for_ProcX
    send_broadcast_req
    set_timer
    wait_on_ack_or_timer
…
Broadcast_ack_ISR:
  if (statusX_PHIT == done)
    load_context&return_from_SSPX
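
The caller/callee handshake can be mimicked in a few lines of sequential Python. This is a rough sketch under stated assumptions: interrupts and broadcasts are replaced by plain function calls, the PHIT and shared stack are modeled as dictionaries, each core's PLV is a fixed placeholder number, and the fallback to local execution when no core acknowledges (standing in for timer expiry) is an assumption, since the poster does not say what happens in that case.

```python
PLV_THRESHOLD = 0.5  # assumed threshold


class Core:
    def __init__(self, name, plv):
        self.name = name
        self.plv = plv  # placeholder for calculate_PLV on this core


def proc_x(a, b):
    """Placeholder body for ProcX."""
    return a + b


def proc_x_callee(core, phit, ssp):
    """Callee side: run ProcX from the shared stack if this core is safe."""
    if core.plv <= PLV_THRESHOLD:
        phit["ProcX"] = "running"
        ssp["result"] = proc_x(*ssp["params"])  # load params, call, store result
        phit["ProcX"] = "done"
        return True   # stands in for send_broadcast_ack
    return False      # resume normal execution


def proc_x_caller(caller, others, *args):
    """Caller side: run locally if safe, otherwise offer ProcX to other cores."""
    if caller.plv <= PLV_THRESHOLD:
        return proc_x(*args), caller.name
    ssp = {"params": args}        # shared stack layout
    phit = {"ProcX": "pending"}   # PHIT entry for ProcX
    for core in others:           # stands in for the broadcast request
        if proc_x_callee(core, phit, ssp) and phit["ProcX"] == "done":
            return ssp["result"], core.name
    # Assumed fallback when no acknowledgement arrives before the timer expires.
    return proc_x(*args), caller.name
```

For example, a caller with PLV 0.9 and candidate cores with PLV 0.8 and 0.1 ends up executing ProcX on the second candidate, and the result is read back from the shared stack.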
[Figure: procedure hopping between caller Core i and callee Core k. Each core is drawn with an operating-condition monitor and an interrupt controller. Labels in the shared L1 I$: ProcX, ProcX@Caller, ProcX@Callee. Labels in the TCDM: shared and local heap, stacks, shared stack, PHIT.]