http://variability.org http://mesl.ucsd.edu
Procedure Hopping: a Low Overhead Solution to Mitigate Variability in Shared-L1 Processor Clusters
Abbas Rahimi‡, Luca Benini†, and Rajesh Gupta‡
‡CSE, UC San Diego
†DEIS, Università di Bologna
International Symposium on Low-Power Electronics and Design
http://micrel.deis.unibo.it
Procedure Hopping to Mitigate Variability
Main Point
Sources of Device Variation
[Figure: the clock period covers the actual circuit delay plus a guardband for across-wafer frequency variation, VCC droop, temperature, and other uncertainty.]
Variations of 10% in VCC, ~160˚C in temperature, and 40% in VTH are more challenging in a many-core platform!
Outline
• Sources of Variations
• Variation-tolerant Shared-L1 Processor Cluster
  1. Process Variation → Variation-aware VDD-hopping
  2. Dynamic Voltage Variation → Procedure hopping
• Methodology for PLV
  – Design time characterization
  – Compile time PLV metadata generation
  – Runtime preventive compensation
• Experimental Results
Shared-L1 Processor Clusters *
Each cluster consists of:
• 16 LEON-3 cores
• An intra-cluster shared L1 I$
• An on-chip multi-banked tightly coupled data memory (TCDM)
• Two single-cycle logarithmic interconnections for both instruction and data sides
• A hardware synchronization handler module (SHM) to coordinate and synchronize cores for accessing shared data on TCDM
• VDD-hopping per core
[Figure: Shared-L1 TCDM cluster template. 4x8 cluster: 4 PEs and an 8-bank TCDM.]
* D. Melpignano, L. Benini, et al., “Platform 2012, a many-core computing accelerator for embedded SoCs: performance evaluation of visual analytics applications”, DAC’12
VDD–hopping to Compensate Process Variation
Per-core frequency (MHz) at each supply setting:

Core   VDD = 0.81V   VDD = 0.99V   VA-VDD-hopping (0.81V, 0.99V)
f0     862           1408          862
f1     909           1389          909
f2     870           1408          870
f3     847           1370          847
f4     826           1370          1370
f5     855           1408          855
f6     877           1408          877
f7     893           1408          893
f8     820           1370          1370
f9     826           1370          1370
f10    909           1389          909
f11    847           1370          847
f12    901           1408          901
f13    917           1408          917
f14    847           1389          847
f15    901           1389          901

At VDD = 0.81V, three cores (f4, f8, f9) cannot meet the target frequency of 830MHz.
With VA-VDD-hopping, all cores of the same cluster meet the target frequency of 830MHz.
VA-VDD-hopping can accordingly tune the cores' voltage based on their delay reported by CPMs.
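The voltage selection described here can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the frequencies are the per-core numbers from the slide, while the function name `select_vdd` and the policy (stay at 0.81V unless the core misses the 830MHz target, otherwise move to 0.99V) are assumptions that happen to reproduce the VA-VDD-hopping column.

```python
TARGET_MHZ = 830  # target frequency stated on the slide

# Per-core frequency (MHz) at each supply voltage, copied from the slide.
FREQ_AT_0V81 = [862, 909, 870, 847, 826, 855, 877, 893,
                820, 826, 909, 847, 901, 917, 847, 901]
FREQ_AT_0V99 = [1408, 1389, 1408, 1370, 1370, 1408, 1408, 1408,
                1370, 1370, 1389, 1370, 1408, 1408, 1389, 1389]


def select_vdd(freq_low, freq_high, target=TARGET_MHZ):
    """Return one (vdd, freq_mhz) pair per core.

    Assumed policy: keep the low voltage whenever it meets the target,
    otherwise hop to the high voltage.
    """
    plan = []
    for core, (f_low, f_high) in enumerate(zip(freq_low, freq_high)):
        if f_low >= target:
            plan.append((0.81, f_low))
        elif f_high >= target:
            plan.append((0.99, f_high))
        else:
            raise ValueError(f"core {core} cannot meet {target} MHz")
    return plan


plan = select_vdd(FREQ_AT_0V81, FREQ_AT_0V99)
boosted = [core for core, (vdd, _) in enumerate(plan) if vdd == 0.99]
print("cores raised to 0.99V:", boosted)
```

Only the cores that miss the target pay the power cost of the higher supply, which is why the cluster ends up with mixed voltage islands.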
[Figure: cluster architecture. Core0 to Core15, each with VA-VDD-hopping across Low, Typical, and High VDD rails, a CPM, and a PSS; level shifters between the cores and the two logarithmic interconnects; shared I$ banks (I$ B0 ... I$ Bi-1) and TCDM banks (TCDM B0 ... TCDM Bj-1); SHM; DFS providing f and f+180°.]
• Every core has its own voltage domain.
• All cores work with the same frequency.
• VDD-hopping tunes the voltage of each core based on its CPM.
Each core increases its voltage if its delay is high.
The process variation is compensated, but the cluster will have various voltage/temperature islands!
Fast Dynamic IR-drop within Cluster
[Figure: the cluster under VA-VDD-hopping; cores f4, f8, and f9 run at 1370 MHz, the other cores between 847 and 917 MHz.]
• The IR-drop of executing FIR on cores at various operating corners:

(Vol., Temp.)    0.99V, 125C    0.90V, 25C     0.81V, 125C    0.81V, -40C
Power density    0.66 μW/μm2    0.21 μW/μm2    0.18 μW/μm2    0.16 μW/μm2
Max IR-drop      44 mV          < 35 mV        < 31 mV        < 31 mV

• FIR does not face any voltage emergency (IR-drop < 4%) at the corners with voltages of 0.81V-0.9V due to their lower power densities.
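As a worked check of the IR-drop table, the sketch below converts each maximum IR-drop from millivolts to a percentage of the supply voltage. It assumes that a voltage emergency means an IR-drop of at least 4% of VDD, which is how the slide's "IR-drop < 4%" remark reads; the function names are illustrative. The "<" entries are treated as upper bounds, so the check is conservative for those corners.

```python
EMERGENCY_PCT = 4.0  # assumed emergency threshold, read from "IR-drop < 4%"

# (VDD in volts, temperature in C) -> max IR-drop in mV, from the table.
# Entries listed as "< N mV" on the slide are stored as their upper bound N.
MAX_IR_DROP_MV = {
    (0.99, 125): 44,
    (0.90, 25): 35,
    (0.81, 125): 31,
    (0.81, -40): 31,
}


def ir_drop_percent(vdd, drop_mv):
    """IR-drop as a percentage of the nominal supply voltage."""
    return 100.0 * (drop_mv / 1000.0) / vdd


def has_emergency(vdd, drop_mv, threshold=EMERGENCY_PCT):
    return ir_drop_percent(vdd, drop_mv) >= threshold


emergencies = [corner for corner, mv in MAX_IR_DROP_MV.items()
               if has_emergency(corner[0], mv)]
for (vdd, temp), mv in MAX_IR_DROP_MV.items():
    print(f"({vdd:.2f}V, {temp}C): {ir_drop_percent(vdd, mv):.2f}%")
```

Under this reading only the (0.99V, 125C) corner crosses the threshold, which matches the slide's statement about the 0.81V-0.9V corners.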
Procedure hopping to Compensate Voltage Variation
[Figure: cluster architecture. Core0 to Core15, each with VA-VDD-hopping across Low, Typical, and High VDD rails, a CPM, and a PSS; level shifters between the cores and the two logarithmic interconnects; shared I$ banks (I$ B0 ... I$ Bi-1) and TCDM banks (TCDM B0 ... TCDM Bj-1); SHM; DFS providing f and f+180°.]
Procedure hopping facilitates fast and proactive migration of procedures within a cluster to prevent voltage variation thanks to shared I$ and TCDM resources.
Each procedure hops from one core to another if it causes voltage variation.
Procedure-level Vulnerability (PLV)
• The notion of PLV to fast dynamic voltage variation is defined.
• The design time stage analyzes the dynamic voltage droops/rises for every ProcX under full operating conditions, generating PLVX metadata.
[Figure: int ProcX (…) { … } runs on Corei at (Vi,Tj) while IR-drops are observed, producing a PLV table:]

(V,T)    PLVX
V1,T1    0.75
V2,T2    0.35
V3,T3    0.01
…        …
Characterization of PLV to IR-drop: Compile time + Runtime
[Figure: characterization flow in three stages.]
Design time: the open-source Leon3 (VHDL, timing constraints) goes through Design Compiler and IC Compiler (Verilog netlists, parasitics, SDF), ModelSim Vsim (switching activity), PrimeTime PX (power @(Vi,Tj)), and PrimeRail (dynamic voltage droop/rise @(Vi,Tj)), using TSMC 45nm LIBs at (0.81V,-40˚C), (0.81V,125˚C), (0.90V,25˚C), and (0.99V,125˚C).
Compile time: VA-Proc generation turns the source code of ProcX into the VA-procedures' source code (ProcX, ProcX@Caller, ProcX@Callee); the BCC compiler produces the executables (object code) together with the PLV characterized metadata.
Runtime, on Leon-3 Corei with a (V,T) operating condition monitor:
  For ProcX@Caller:
    Read current (V,T) sensors of Corei
    Read characterized metadata for ProcX
    If PLVX > PLV_threshold
      Invoke Procedure Hopping (ProcX@Callee)
• At compile time, PLVX metadata of ProcX is attached to the procedure.
• During runtime, the discretized (V,T) points to the corresponding characterized PLV metadata to assess the vulnerability of ProcX at the current (V,T).
• If PLVX > PLV_threshold, ProcX is hopped from the caller core to a favorable callee core.
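The runtime step can be sketched as a table lookup followed by a threshold test. Everything concrete below is an assumption made for illustration: the PLV values reuse the example numbers 0.75, 0.35, and 0.01 from the PLV table but attach them to specific corners, the threshold of 0.5 is hypothetical, and nearest-corner matching is just one plausible way to discretize a sensed (V,T). The comparison is the strict "PLVX > PLV_threshold" shown in the flow diagram.

```python
PLV_THRESHOLD = 0.5  # hypothetical; the slides do not give a value

# Hypothetical PLV metadata for one procedure, keyed by characterized corner
# (VDD in volts, temperature in C). The pairing of values to corners is assumed.
PLV_PROCX = {
    (0.99, 125): 0.75,
    (0.90, 25): 0.35,
    (0.81, 125): 0.01,
}


def discretize(v, t, corners):
    """Map a sensed (V, T) reading to the closest characterized corner."""
    def distance(corner):
        cv, ct = corner
        # Normalize by assumed full-scale ranges (0.18 V, 165 C)
        # so that voltage and temperature contribute comparably.
        return ((v - cv) / 0.18) ** 2 + ((t - ct) / 165.0) ** 2
    return min(corners, key=distance)


def should_hop(v, t, plv_table, threshold=PLV_THRESHOLD):
    """True if the procedure should be hopped away from this core."""
    corner = discretize(v, t, plv_table)
    return plv_table[corner] > threshold


print(should_hop(0.98, 110, PLV_PROCX))  # reading near the (0.99V, 125C) corner
print(should_hop(0.82, 100, PLV_PROCX))  # reading near the (0.81V, 125C) corner
```

Because the metadata is computed ahead of time, the runtime cost is a sensor read, a lookup, and one comparison, which keeps the check preventive rather than reactive.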
Max Voltage Variation Across Corners and Procedures
Max voltage droop (%):

(Vol., Temp.)  a2tim  FIR    IFFT   bitmnp cacheb IDCT   matrix pntrch PWM    sspeed tblook ttsprk
0.99V, 125°C   5.39   4.46   6.34   5.03   4.62   6.26   5.89   5.36   5.23   5.05   3.84   5.41
0.90V, 25°C    3.65   2.98   4.63   3.47   3.11   4.41   4.09   3.63   3.65   3.48   2.44   4.99
0.81V, 125°C   3.45   2.8    3.7    3.43   2.92   3.77   3.63   3.39   3.27   3.33   2.29   3.63
0.81V, -40°C   3.34   2.72   3.66   3.34   2.84   3.7    3.53   3.26   3.24   3.24   2.22   3.53

[Figure: max voltage rise (%) per procedure at (0.81V, -40°C), (0.81V, 125°C), (0.90V, 25°C), and (0.99V, 125°C).]
• Most procedures running on cores at 0.99V have voltage emergencies.
• At 0.9V, only four procedures (IFFT, IDCT, matrix, ttsprk) face voltage emergencies.
• No voltage emergency occurs at 0.81V.
• Procedure hopping avoids the voltage emergency for all procedures by hopping them from a high-voltage core to a low-voltage core.
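The bullets above can be checked directly against the droop table. The sketch below lists, per corner, the procedures whose maximum droop reaches 4%. The 4% emergency threshold is an assumption carried over from the "IR-drop < 4%" remark on the IR-drop slide; the dictionary and function names are illustrative.

```python
PROCS = ["a2tim", "FIR", "IFFT", "bitmnp", "cacheb", "IDCT",
         "matrix", "pntrch", "PWM", "sspeed", "tblook", "ttsprk"]

# Max voltage droop (%) per procedure, one row per corner, from the table.
DROOP_PCT = {
    "0.99V,125C": [5.39, 4.46, 6.34, 5.03, 4.62, 6.26, 5.89, 5.36, 5.23, 5.05, 3.84, 5.41],
    "0.90V,25C":  [3.65, 2.98, 4.63, 3.47, 3.11, 4.41, 4.09, 3.63, 3.65, 3.48, 2.44, 4.99],
    "0.81V,125C": [3.45, 2.80, 3.70, 3.43, 2.92, 3.77, 3.63, 3.39, 3.27, 3.33, 2.29, 3.63],
    "0.81V,-40C": [3.34, 2.72, 3.66, 3.34, 2.84, 3.70, 3.53, 3.26, 3.24, 3.24, 2.22, 3.53],
}


def procedures_with_emergency(threshold_pct=4.0):
    """Map each corner to the procedures whose droop reaches the threshold."""
    return {
        corner: [proc for proc, droop in zip(PROCS, row) if droop >= threshold_pct]
        for corner, row in DROOP_PCT.items()
    }


for corner, procs in procedures_with_emergency().items():
    print(corner, procs)
```

With this threshold, every procedure except tblook is flagged at 0.99V, exactly four are flagged at 0.9V, and none at 0.81V, so a 0.81V core is always a safe hopping target for these procedures.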
Cost of Procedure Hopping
• The total roundtrip overhead of hopping a procedure from the caller core and returning the results from the callee core is less than 800 cycles.
• This overhead is less than 1% of the total cycles needed to execute any of the characterized procedures in the EEMBC benchmark.
• During the procedure hopping no voltage emergency can occur even at (0.99V,125˚C), neither in the caller nor the callee core.

                 Caller hopping   Caller not hopping   Callee service   Callee no service
Latency          218 cycles       88 cycles            575 cycles       342 cycles
Voltage droop    1.3%             0.6%                 2.9%             1.8%
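The overhead claim can be made concrete with a little arithmetic. The sketch assumes that the roundtrip cost is the caller's hopping latency plus the callee's service latency; that sum is consistent with the "less than 800 cycles" figure, but the slides do not state the decomposition explicitly. The 100,000-cycle procedure length is a hypothetical example, not a measured value.

```python
CALLER_HOPPING_CYCLES = 218  # from the latency table
CALLEE_SERVICE_CYCLES = 575  # from the latency table


def roundtrip_cycles():
    """Assumed roundtrip cost: caller hopping plus callee service."""
    return CALLER_HOPPING_CYCLES + CALLEE_SERVICE_CYCLES


def overhead_percent(procedure_cycles):
    """Hopping overhead relative to the procedure's own execution time."""
    return 100.0 * roundtrip_cycles() / procedure_cycles


def break_even_cycles(max_percent=1):
    """Procedure length at which the overhead equals max_percent."""
    return roundtrip_cycles() * 100 / max_percent


print(roundtrip_cycles())
print(overhead_percent(100_000))  # hypothetical 100,000-cycle procedure
print(break_even_cycles(1))
```

Under these assumptions, any procedure longer than the break-even length keeps the hopping overhead below 1%.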
Conclusion
• The notion of procedure-level vulnerability to fast dynamic voltage variation is defined.
• Based on PLV metadata, a fully-software low-cost procedure hopping technique is proposed, which guarantees voltage-emergency-free migration of all procedures, fast and proactively enough, within a shared-L1 processor cluster.
• Full post-P&R results in 45nm TSMC technology confirm that procedure hopping avoids voltage emergencies across a variability-affected cluster, while imposing only an amortized cost of less than 1% latency for any of the characterized embedded procedures.
Thank you!
http://variability.org
Acknowledgment
• NSF Variability Expedition
• ERC Multitherman Project
HW/SW Collaborative Architecture to Support Intra-cluster Procedure Hopping
• The code is easily accessible via the shared-L1 I$.
• The data and parameters are passed through the shared stack in TCDM.
• A procedure hopping information table (PHIT) keeps the status for a migrated procedure.

Callee Corek:
  …
  ProcX@Callee:
    if (calculate_PLV ≤ PLV_threshold)
      set_statusX_PHIT = running
      load_context&param_from_SSPX
      set_all_param&pointers
      call ProcX
      store_context_to_SSPX
      set_statusX_PHIT = done
      send_broadcast_ack
    else
      resume_normal_execution
  …
  Broadcast_req_ISR:
    ProcX@Callee = search_in_PHIT
    call ProcX@Callee

Caller Corei:
  …
  call ProcX           //conventional compile
  call ProcX@Caller    //VA-compile
  …
  ProcX@Caller:
    if (calculate_PLV ≤ PLV_threshold)
      call ProcX
    else
      create_shared_stack_layout
      set_PHIT_for_ProcX
      send_broadcast_req
      set_timer
      wait_on_ack_or_timer
  …
  Broadcast_ack_ISR:
    if (statusX_PHIT == done)
      load_context&return_from_SSPX

[Figure: Caller Corei and Callee Corek, each with an operating condition monitor and an interrupt controller, share the L1 I$ (holding ProcX, ProcX@Caller, and ProcX@Callee) and the TCDM (holding the shared heap, the local stacks, the shared stack, and the PHIT).]
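The caller/callee handshake can be modelled with a toy, single-threaded simulation. This is only a sketch of the control flow in the pseudocode: the class `Cluster`, the per-core PLV numbers, the threshold of 0.5, the "requested" and "timeout" status values, and the fallback to local execution when no callee answers are all assumptions made for illustration. Interrupts, the timer, and the real shared stack layout are not modelled.

```python
PLV_THRESHOLD = 0.5  # hypothetical threshold, not given in the slides


class Cluster:
    """Toy model of one shared-L1 cluster (all names are illustrative)."""

    def __init__(self, plv_by_core):
        self.plv_by_core = plv_by_core  # core id -> PLV of the procedure at that core's (V,T)
        self.phit = {}                  # procedure name -> migration status
        self.shared_stack = {}          # procedure name -> arguments and result

    def call_va(self, caller, proc, *args):
        """Models ProcX@Caller: run locally if safe, otherwise hop."""
        if self.plv_by_core[caller] <= PLV_THRESHOLD:
            return caller, proc(*args)
        name = proc.__name__
        self.shared_stack[name] = {"args": args}  # create_shared_stack_layout
        self.phit[name] = "requested"             # set_PHIT_for_ProcX
        for callee in self.plv_by_core:           # stands in for send_broadcast_req
            if callee != caller and self._serve(callee, proc):
                return callee, self.shared_stack[name]["result"]
        # Assumption: if no callee acknowledges, fall back to local execution.
        self.phit[name] = "timeout"
        return caller, proc(*args)

    def _serve(self, callee, proc):
        """Models ProcX@Callee: serve the request only if this core is safe."""
        if self.plv_by_core[callee] > PLV_THRESHOLD:
            return False                          # resume_normal_execution
        name = proc.__name__
        self.phit[name] = "running"
        frame = self.shared_stack[name]           # parameters come from the shared stack
        frame["result"] = proc(*frame["args"])
        self.phit[name] = "done"                  # followed by send_broadcast_ack
        return True


def fir(x):
    """Hypothetical stand-in for a characterized procedure."""
    return 2 * x


cluster = Cluster({0: 0.75, 1: 0.35, 2: 0.01})
print(cluster.call_va(0, fir, 21))
```

The caller never copies code to the callee: only the arguments and the result move through shared state, which mirrors the role of the shared I$ and the shared stack in TCDM.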
Intra-procedure Peak Power Variation
• A maximum of 1.28× intra-corner peak power variation occurs between the IFFT and tblook procedures at (0.81V,125C).
• The maximum inter-corner peak power variation is 3.5×, for FIR.
• A maximum of 4.1× peak power variation occurs across corners and procedures, between a2time at (0.81V,-40C) and IFFT at (0.99V,125C).
[Figure: peak power (W) per procedure at (0.81V, -40°C), (0.81V, 125°C), (0.90V, 25°C), and (0.99V, 125°C).]