Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors
Guihai Yan1, Xiaoyao Liang2, Yinhe Han1, and Xiaowei Li1
1. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS)
2. NVIDIA Corporation
Jun. 23, 2010
Key Laboratory of Computer System and Architecture, ICT CAS
Outline
Introduction to PVT variations
Analyzing the “complementary effect”
• Time domain
• Frequency domain
Implementation challenges & solutions
Experimental results
Introduction to variations
Variation sources
• Process variation
  – Random dopant fluctuation
  – Sub-wavelength lithography
• Voltage variation
  – Parasitic power delivery networks
  – Application variability
  – Inductive noise, IR-drop
• Temperature variation
  – Imbalanced activity
  – Hotspots
We focus on the primary manifestation
• Performance variation
Process variation
• Sub-wavelength lithography
  – “What you get is not what you want”
  – Systematic
• Random dopant fluctuations
  – Vth variation
  – Random
[Figure: Sub-wavelength lithography. Lithography wavelength (365nm, 248nm, 193nm, then 13nm EUV) versus technology generation (180nm, 130nm, 90nm, 65nm, 45nm, 32nm) from 1980 to 2020, on a 10nm to 1µm scale, showing the growing gap between wavelength and feature size]
[Borkar, DAC’09]
[Aitken, ATS’07]
Maximum frequency can differ by 20%!
[Teodorescu, ISCA’08]
P variation is time-independent: the “DC component”.
Temperature variation
• Application-specific
• Slow-varying
  – Milliseconds
  – Typical thermal time constant: 2 ms [Donald, ISCA’06]
[Figure: Measured Pentium M processor temperatures]
T variation is slow-varying: the “low-frequency component”.
Voltage variation
• Fast-changing
• Inductive noise
  – a.k.a. the L(di/dt) problem
• IR-drop
• Hierarchical PDN
Why is it hard to keep a constant voltage level?
V variation is fast-changing: the “high-frequency component”.
Example
• Power budget: 100 W
• Working voltage: 1 V
• Current: 100 A
• To keep the voltage fluctuation within ±5%, R_PDN < 0.5 mOhm
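The bound follows from Ohm's law: 100 W at 1 V draws 100 A, and ±5% of 1 V is 50 mV, so the resistance must stay below 50 mV / 100 A = 0.5 mOhm. A minimal Python sketch of that arithmetic is shown here; `max_pdn_resistance` is a hypothetical helper, and it assumes a purely resistive (IR-drop) model that ignores the inductive L(di/dt) noise, which would only tighten the budget further.

```python
def max_pdn_resistance(power_w: float, vdd_v: float, tolerance: float) -> float:
    """Largest PDN resistance (ohms) that keeps the IR drop within
    tolerance * vdd_v, assuming a purely resistive model."""
    current_a = power_w / vdd_v         # 100 W / 1 V = 100 A
    allowed_drop_v = tolerance * vdd_v  # 5% of 1 V = 50 mV
    return allowed_drop_v / current_a   # 50 mV / 100 A = 0.5 mOhm


if __name__ == "__main__":
    r_ohm = max_pdn_resistance(power_w=100.0, vdd_v=1.0, tolerance=0.05)
    print(f"R_PDN < {r_ohm * 1e3:.2f} mOhm")  # R_PDN < 0.50 mOhm
```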
[Figure: (a) Power delivery path for Intel Xeon 5500 series processors: VRM, motherboard, socket and package, cavity caps, and on-chip power grid, each stage modeled with its resistance, inductance, and decoupling capacitance; (b) On-chip core-level power grid model: Vcc/Vss grids, Vcc/Vss bumps, and decap; (c) Inter-core power grid model: Core1 to Core4 connected through bumps (Rb, Lb) and resistances Rcc, with Cdecap]
Resultant impact of PVT variations
[Figure: Fast vs. slow cores, mild vs. violent applications, and low vs. high temperature combine into timing (delay) variation]
Prior solutions
• Strive to compensate P, V, and T variation individually
  – Mitigate P variation: ReCycle [ISCA’06], Body Bias [Micro’07], ReVIVaL [ISCA’08] et al.
  – Stabilize V variation: Pipeline damping [ISCA’03], DeCoR [HPCA’08] et al.
  – Balance T variation: HotSpot [ISCA’03], DVFS + Activity Migration [ISCA’03, HPCA’01, TODAES’07] et al.
• Other timing-oriented solutions: Razor [JSSC’06], EVAL [Micro’08], Tribeca [Micro’09] et al.
Our perspective
• Focus on the essential timing issue: delay variation
• Process, voltage, and temperature variations are not necessarily aggregated; in some cases they cancel each other out. Hence, “complementary”.
• Design goal: minimize delay variation
[Figure: Process, voltage, and temperature variation jointly determine delay]
Some terms
• Timing emergency (TE)
• Emergency level (EL)
  – The “density” of TEs
  – Defined as EL = number of TEs per 100 million cycles
• Violent vs. mild
  – Voltage: large fluctuation = violent; small fluctuation = mild
  – Temperature: “hot” = violent; “cool” = mild
  – Process: slow corner = violent; fast corner = mild
[Figure: Delay over time for mild and violent voltage traces; a timing emergency occurs when the delay crosses the threshold]
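The EL definition can be sketched as a normalized count. The slide does not say whether a TE is counted per cycle or per threshold excursion, so this sketch assumes that every cycle whose delay exceeds the threshold is one TE; `count_timing_emergencies` and `emergency_level` are hypothetical helpers.

```python
from typing import Sequence

CYCLES_PER_EL_UNIT = 100_000_000  # EL = number of TEs per 100 million cycles


def count_timing_emergencies(delays: Sequence[float], threshold: float) -> int:
    """Count cycles whose delay exceeds the timing threshold (assumed TE rule)."""
    return sum(1 for d in delays if d > threshold)


def emergency_level(delays: Sequence[float], threshold: float) -> float:
    """Normalize the TE count to TEs per 100 million cycles."""
    if not delays:
        raise ValueError("delay trace is empty")
    tes = count_timing_emergencies(delays, threshold)
    return tes * CYCLES_PER_EL_UNIT / len(delays)


if __name__ == "__main__":
    # Toy 1,000-cycle trace, delays normalized to the clock period.
    trace = [0.9] * 995 + [1.05] * 5
    print(emergency_level(trace, threshold=1.0))  # 500000.0
```

Five TEs in 1,000 cycles scale to 500,000 TEs per 100 million cycles; a real trace would of course span far more cycles.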
How do PVT variations complement each other?
Observation in the time domain
• Core1 (T mild, V mild): large margin, low EL
• Core2 (T violent, V violent): little margin, high EL
What if we exchange the threads on Core1 and Core2?
[Figure: Delay over time against the threshold. Before the exchange, Core1 (T mild, V mild) has excessive headroom and Core2 (T violent, V violent) suffers emergencies; after the exchange, each core combines a mild and a violent component (T mild, V violent and T violent, V mild)]
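The exchange can be illustrated with a toy model, assuming delay = P + T + V, where P and T belong to the core (T changes slowly, so it stays with the core right after a migration) and V follows the thread, in line with the deck's “migrate threads = graft V component” view. All numbers are hypothetical and normalized so that the threshold is 1.0; the V component is modeled as a simple sinusoid.

```python
import math

THRESHOLD = 1.0   # normalized delay at which a timing emergency occurs
N_CYCLES = 10_000

# Hypothetical, illustrative values (normalized delay), not measured data.
P = {"core1": 0.80, "core2": 0.80}   # both cores at the same process corner
T = {"core1": 0.02, "core2": 0.12}   # core2 is hot: violent T component
V_AMPLITUDE = {"mild_thread": 0.03, "violent_thread": 0.10}


def v_component(thread: str, cycle: int) -> float:
    """Fast, thread-driven voltage-induced delay, modeled as a sinusoid."""
    return V_AMPLITUDE[thread] * math.sin(2 * math.pi * cycle / 50)


def emergencies(core: str, thread: str) -> int:
    """Count cycles in which P + T + V exceeds the threshold."""
    return sum(
        1
        for c in range(N_CYCLES)
        if P[core] + T[core] + v_component(thread, c) > THRESHOLD
    )


def total_emergencies(mapping: dict) -> int:
    """Sum emergencies over a core -> thread assignment."""
    return sum(emergencies(core, thread) for core, thread in mapping.items())


before = {"core1": "mild_thread", "core2": "violent_thread"}
after = {"core1": "violent_thread", "core2": "mild_thread"}  # threads exchanged

if __name__ == "__main__":
    print("before exchange:", total_emergencies(before))
    print("after exchange: ", total_emergencies(after))
```

In this toy setup the exchange removes every emergency, because no core combines a violent T with a violent V any more: the hot core's headroom is now consumed only by the mild thread's small ripple.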
Frequency domain analysis
• Y(f) = FFT(D(t))
  – Sample interval: 5 ns; span of analysis: 1 ms
• DC component: “P”; low-frequency component: “T”; high-frequency component: “V”
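One way to make this decomposition concrete is to take the FFT of a sampled delay trace and sum the spectral magnitude in each band. The Python/NumPy sketch below uses the 5 ns sample interval and 1 ms span from this slide and the band edges listed later in the deck (P: 0-100 Hz, T: 100 Hz-1 MHz, V: 1 MHz-250 MHz). Using summed magnitude as the “strength” of a component is an assumption, and `band_strengths` is a hypothetical helper. Note that 5 ns sampling resolves frequencies only up to 100 MHz (Nyquist), and the 1 ms span gives 1 kHz resolution, so the P band contains only the DC bin.

```python
import numpy as np

SAMPLE_INTERVAL_S = 5e-9  # 5 ns sample interval
SPAN_S = 1e-3             # 1 ms span of analysis

# Band edges from the deck: P: 0-100 Hz, T: 100 Hz-1 MHz, V: 1 MHz-250 MHz.
BANDS_HZ = {"P": (0.0, 100.0), "T": (100.0, 1e6), "V": (1e6, 250e6)}


def band_strengths(delay: np.ndarray, dt: float) -> dict:
    """Sum the normalized FFT magnitude of D(t) inside each PVT band.

    With 5 ns sampling the highest resolvable frequency is 100 MHz, so the
    V band is effectively 1 MHz-100 MHz here.
    """
    spectrum = np.abs(np.fft.rfft(delay)) / len(delay)
    freqs = np.fft.rfftfreq(len(delay), d=dt)
    strengths = {}
    for name, (lo, hi) in BANDS_HZ.items():
        in_band = (freqs >= lo) & (freqs < hi)
        strengths[name] = float(spectrum[in_band].sum())
    return strengths


if __name__ == "__main__":
    n = int(round(SPAN_S / SAMPLE_INTERVAL_S))  # 200,000 samples
    t = np.arange(n) * SAMPLE_INTERVAL_S
    # Synthetic delay: constant offset (P), 10 kHz drift (T), 20 MHz ripple (V).
    delay = (0.80
             + 0.05 * np.sin(2 * np.pi * 10e3 * t)
             + 0.02 * np.sin(2 * np.pi * 20e6 * t))
    print(band_strengths(delay, SAMPLE_INTERVAL_S))
```

The P value equals the DC offset (about 0.8), while each sinusoid appears at half its amplitude (about 0.025 and 0.01) because `rfft` keeps only the positive-frequency half of the spectrum.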
The strength of each component of PVT variations
• Migrating threads = “grafting” the V component
[Figure: Delay traces against the threshold (D_TH) and frequency spectrum deviations (P, T, V components) for Core1 and Core2, before and after thread migration (TM)]
Frequency domain analysis (cont.)
Relative frequency spectrum deviations on a 2 GHz quad-core processor
• P: 0-100 Hz; T: 100 Hz-1 MHz; V: 1 MHz-250 MHz
Potential
• Core3 and Core4 are mild
Strategy
• Exchange the threads on Core1 and Core4, and on Core2 and Core3
How to exploit such a “complementary effect”?
Straightforward approach
• Measure each component separately: product test for the P component, temperature sensors for the T component, voltage sensors for the V component, plus aging sensors, “xyz” sensors
• Pros
  – Conceptually simple
• Cons
  – Slow: voltage and temperature sensors are slow
  – Not comprehensive: e.g., what about aging?
Our approach: delay sensor-based scheme
• A delay sensor captures both the V component and the (P+T) component
• Pros
  – Fast
  – Comprehensive (measures timing directly)
• Cons
  – Needs a little trick
Implementation (cont.)
What we already know
• Delay variation, from delay sensors
What we need to know
• The strength of the PT and V components
How to bridge the gap? Three challenges:
• Inferring the PVT components from delay values
• On-the-fly thread migration decision-making
• On-the-fly variation prediction
Top view of architecture
[Figure: A multi-core chip organized in clusters (Cluster1, Cluster2, ...) connected by an interconnect network, with caches ($) and I/O interfaces; within a cluster, Core1 to Core4 have delay sensors and EL synthesizers, a TM agent manages the cluster, and an inter-cluster TM agent coordinates across clusters]
TEA-TM: Timing Emergency Aware + Thread Migration
Inferring PVT components from delay values
• Use the mean delay to infer the PT component (< 1 MHz)
• This simplification greatly facilitates a cost-efficient implementation of TEA-TM
• Then, what about the V component?
[Figure: Mean delay versus PT component]
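Averaging is a crude low-pass filter, which is why the mean delay can stand in for the sub-1 MHz PT component. A minimal NumPy sketch, assuming non-overlapping 1 µs windows (1 / 1 µs = 1 MHz, the PT/V boundary) over samples taken every 5 ns; the window choice and the helper `pt_estimate` are assumptions, not the deck's hardware design.

```python
import numpy as np

SAMPLE_INTERVAL_S = 5e-9  # delay-sensor sample interval (assumed, as on the FFT slide)
WINDOW_S = 1e-6           # averaging window; 1 / 1 us = 1 MHz, the PT/V boundary


def pt_estimate(delay: np.ndarray, dt: float = SAMPLE_INTERVAL_S,
                window_s: float = WINDOW_S) -> np.ndarray:
    """Estimate the PT component as the mean delay of consecutive windows.

    A boxcar average over W seconds has spectral nulls at multiples of 1/W
    and attenuates (though does not fully remove) other content above 1/W.
    """
    w = int(round(window_s / dt))        # 200 samples for 1 us at 5 ns
    n_windows = len(delay) // w          # drop any incomplete tail window
    if n_windows == 0:
        raise ValueError("trace shorter than one averaging window")
    trimmed = delay[: n_windows * w]
    return trimmed.reshape(n_windows, w).mean(axis=1)


if __name__ == "__main__":
    t = np.arange(2_000) * SAMPLE_INTERVAL_S            # 10 us of samples
    delay = 0.90 + 0.05 * np.sin(2 * np.pi * 20e6 * t)  # PT offset + 20 MHz V ripple
    print(pt_estimate(delay))  # ten window means, each close to 0.90
```

Because each 1 µs window holds exactly 20 periods of the 20 MHz ripple, the ripple averages out and only the PT offset remains; a ripple at a non-multiple of 1 MHz would leave a small residue.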
On-the-fly TEA-TM decision making
Urgent First Policy (UFP)
• Does NOT directly rely on an accurate V component
Basic idea:
• Migrate the threads running on the highest-EL core to the core with the smallest PT component. Always right, but may not be optimal!
• EL = PT “+” V
[Figure: Emergency level and PT component of Core1 and Core2, with thread migration (TM)]
Refer to our paper for the more sophisticated “DUFP” heuristic.
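One UFP decision can be sketched as a greedy step over per-core EL and PT estimates. This is an illustrative Python sketch, not the paper's implementation: the helper `urgent_first_step` is hypothetical, and it assumes the thread already on the target core moves the other way (a swap), so each core keeps exactly one thread.

```python
from typing import Dict


def urgent_first_step(el: Dict[str, float], pt: Dict[str, float],
                      mapping: Dict[str, str]) -> Dict[str, str]:
    """Move the thread on the highest-EL core to the core with the smallest
    PT component, swapping the two threads (assumption).

    el, pt:  per-core emergency level and PT component (e.g., mean delay).
    mapping: current core -> thread assignment.
    """
    urgent_core = max(el, key=el.get)
    target_core = min(pt, key=pt.get)
    new_mapping = dict(mapping)
    if urgent_core != target_core:
        new_mapping[urgent_core] = mapping[target_core]
        new_mapping[target_core] = mapping[urgent_core]
    return new_mapping


if __name__ == "__main__":
    el = {"core1": 120.0, "core2": 5.0, "core3": 3.0, "core4": 40.0}
    pt = {"core1": 0.92, "core2": 0.88, "core3": 0.81, "core4": 0.85}
    mapping = {"core1": "A", "core2": "B", "core3": "C", "core4": "D"}
    print(urgent_first_step(el, pt, mapping))
    # {'core1': 'C', 'core2': 'B', 'core3': 'A', 'core4': 'D'}
```

The decision uses only EL and the PT estimate, so no accurate V measurement is needed; being greedy, it fixes the most urgent core first but may not find the best overall assignment.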
On-the-fly variation prediction
Objective: reduce the emergency level in the future
• Predicted quantities: emergency level and PT component
• Linear prediction mechanism
[Figure: EL prediction result]
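The slides do not specify the linear predictor, so the following Python sketch swaps in one common choice: fit a least-squares line to the last k observations and extrapolate one interval ahead. The window size k = 4 and the helper `predict_next` are hypothetical; the same function could be applied to either the EL or the PT history of each core.

```python
from typing import Sequence


def predict_next(history: Sequence[float], k: int = 4) -> float:
    """Predict the next sample by fitting a least-squares line to the last
    k samples (at x = 0..n-1) and extrapolating to x = n."""
    recent = list(history)[-k:]
    n = len(recent)
    if n == 0:
        raise ValueError("history is empty")
    if n == 1:
        return recent[0]   # no trend information yet: hold the last value
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(recent))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)


if __name__ == "__main__":
    el_history = {"core1": [10.0, 20.0, 30.0, 40.0], "core2": [5.0, 5.0, 5.0]}
    print({core: predict_next(h) for core, h in el_history.items()})
    # {'core1': 50.0, 'core2': 5.0}
```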
Experiments
Methodology
• Trace-based evaluation
Modeled processor
• Quad-core, superscalar, 2 GHz
PDN
• Similar to the Intel Xeon 5500 quad-core microprocessor
• 130 W (peak 150 W)
Workload
[Figure: Trace generation flow: applications, Wattch, HotSpot, and Hspice, with floorplan info and PDN model info as inputs, producing power, current, thermal, and voltage traces]
Metrics
• Relative throughput loss
• Relative fairness
Impact of TM interval on average EL reduction
• No migration overhead accounted for
• 1 ms at 2 GHz: migration overhead is negligible
• 0.3 ms at 2 GHz: migration overhead < 15%
When the migration penalty is taken into account:
[Figure: Performance overhead and EL reduction, and overall throughput, versus TM interval; annotations mark the minimal TM interval, the large migration penalty at short intervals, and the large emergency rate at long intervals]
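Converting these intervals into cycles makes the trade-off concrete: at 2 GHz, 1 ms is 2,000,000 cycles and 0.3 ms is 600,000 cycles, so any fixed migration cost is amortized over fewer cycles as the interval shrinks. This Python sketch assumes one migration per interval at a fixed cost; the 20,000-cycle cost is purely hypothetical, and real migration cost (state transfer, cache warm-up) varies by workload.

```python
CLOCK_HZ = 2e9  # modeled processor frequency (2 GHz)


def interval_cycles(interval_s: float, clock_hz: float = CLOCK_HZ) -> int:
    """Clock cycles in one thread-migration (TM) interval."""
    return round(interval_s * clock_hz)


def overhead_fraction(migration_cost_cycles: float, interval_s: float,
                      clock_hz: float = CLOCK_HZ) -> float:
    """Share of each TM interval lost to migration, assuming one migration
    per interval at a fixed, hypothetical cost in cycles."""
    return migration_cost_cycles / interval_cycles(interval_s, clock_hz)


if __name__ == "__main__":
    HYPOTHETICAL_COST = 20_000  # cycles per migration; illustrative only
    for interval in (1e-3, 0.3e-3, 0.2e-3):
        print(f"{interval * 1e3:.1f} ms -> {interval_cycles(interval):,} cycles, "
              f"overhead {overhead_fraction(HYPOTHETICAL_COST, interval):.1%}")
```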
Reduction in relative throughput loss
• TM interval: 0.2 ms; accuracy: 90%
• Developing more sophisticated heuristics
Fairness Improvement
80% fairness improvement
Conclusion
• Analyzed the complementary effect in both the time and frequency domains
• Presented a delay sensor-based scheme (TEA-TM) to exploit the complementary effect
  – Simple, cost-efficient
• The experimental results show
  – Improved throughput
  – Improved fairness