Designing Processors for Timing Speculation from the Ground Up
Brian Greskamp, Lu Wan, Ulya Karpuzcu, Jeffrey Cook,Josep Torrellas, Deming Chen, and Craig Zilles
University of Illinois
http://iacoma.cs.uiuc.edu/
BlueShifτ
Brian Greskamp
Clock Frequency Crash
• Clock frequency is critical to per-thread performance
• ... but frequency is now scaling slowly
• Need to squeeze more speed out of the process
• Design for better than worst case delay
• Use Timing Speculation (TS) architectures to tolerate occasional timing errors
❖ How to design processors to take advantage of TS?
Improving Performance with Timing Speculation (TS)
• Timing Speculation (TS)
• Increase f at constant V
• Provide error detection and correction mechanism
[Figure: error rate (PE) vs. frequency and performance vs. frequency. Beyond the rated frequency some paths overshoot; TS yields a performance gain over the rated point.]
Contribution: BlueShift Design Methodology
• Existing design methods produce steep PE vs f curves
• BlueShift
• Speed up frequently overshooting paths
• Increase f at same error rate (PE)
[Figure: error rate (PE) vs. frequency and performance vs. frequency for "TS only" and "TS with BlueShift", each marked with the rated frequency. BlueShift adds a performance gain on top of the TS gain, at some P cost.]
Taxonomy of TS Archs
• Checking granularity
• Stage-level: Detect and correct errors at each pipe stage
• At-retirement: Check each instruction just before it retires
• Checker persistence
• Always-on: TS always active (with P and area overhead)
• On-demand: TS can be disabled to save power at lower f
• Functional correctness
• Relaxed: Can prune functionality from some components
• Correct: All components must be functionally complete
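Example Architectures
Razor [Ernst03]
• Checker persistence: always-on
• Checking granularity: stage-level
• Functional correctness: correct
[Figure: Razor pipeline stage. A logic cone feeds a main latch and a shadow latch on Clk; comparison logic (XOR) on the two latches produces the Error signal.]
Paceline [Greskamp07]
• Checker persistence: on-demand
• Checking granularity: at-retirement
• Functional correctness: correct
[Figure: Paceline on a two-core CMP. Either Core 0 and Core 1 both run at the rated clock, or a leader core on a speculative clock passes hints to a checker core on the rated clock, with state comparison between them.]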
BlueShift
• Goal under BlueShift: Speed up frequently-used critical paths
• Many circuit paths are rarely used [OptTandem07]
• Let infrequently-used critical paths remain slow
• TS architecture will correct these faults
• BlueShift: Profile-driven design process
• Identify frequently-used critical paths
• Modify design to speed them up (area, power)
[OptTandem07]: Francisco Mesa-Martínez et al., MICRO2007
BlueShift Steps
1. Profile gate-level design
2. Identify frequently overshooting paths
3. Speed up frequently overshooting paths
Repeat until fast enough
[Figure: error rate (PE) vs. frequency, marking the target PE and the target f.]
Design goal: PE below a specified threshold at target f
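The repeat-until-fast-enough loop can be sketched as follows. This is a minimal illustration, not the authors' tooling: `blueshift_loop`, `profile`, `speed_up`, and `max_iters` are hypothetical names, the design object is opaque, and PE is estimated here by the sum of the per-path overshoot frequencies d(p), which is only an upper bound on the true error rate.

```python
from typing import Callable, Dict, TypeVar

Design = TypeVar("Design")          # opaque handle for a netlist/implementation (hypothetical)
PathProfile = Dict[str, float]      # path name -> d(p), fraction of cycles the path overshoots


def blueshift_loop(
    design: Design,
    profile: Callable[[Design], PathProfile],
    speed_up: Callable[[Design, PathProfile], Design],
    target_pe: float,
    max_iters: int = 20,
) -> Design:
    """Iterate profile -> identify -> speed up until the PE estimate meets the target."""
    for _ in range(max_iters):
        # Step 1: profile the gate-level design at the target frequency.
        d = profile(design)
        # PE estimate: sum of d(p), capped at 1 (an upper bound, assumed here).
        pe_estimate = min(1.0, sum(d.values()))
        if pe_estimate <= target_pe:
            return design
        # Step 2: keep only the paths that actually overshoot.
        overshooting = {p: v for p, v in d.items() if v > 0.0}
        # Step 3: let the (hypothetical) optimizer speed those paths up.
        design = speed_up(design, overshooting)
    raise RuntimeError("PE target not met within max_iters iterations")
```

In practice `profile` would wrap gate-level simulation of the training benchmarks and `speed_up` would wrap a physical design step, so each iteration is expensive; the iteration cap is there only to keep the sketch from looping forever.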
Two Possible Approaches
• Targeted Acceleration: Accelerate frequently overshooting paths without increasing worst case delay
• Good for on-demand architectures (Paceline)
• Delay Trading: Sacrifice worst case delay to speed up frequently overshooting paths
• Good for always-on architectures (Razor)
[Figure: two error rate (PE) vs. frequency plots, one for Targeted Acceleration and one for Delay Trading.]
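The pairing of approach and architecture follows from checker persistence: an on-demand design must still meet its rated frequency with TS off, so its worst case delay cannot grow. A small sketch of that rule, using the taxonomy from the earlier slides; the class and function names are illustrative assumptions, not part of BlueShift.

```python
from dataclasses import dataclass
from enum import Enum


class Persistence(Enum):
    ALWAYS_ON = "always-on"
    ON_DEMAND = "on-demand"


class Granularity(Enum):
    STAGE_LEVEL = "stage-level"
    AT_RETIREMENT = "at-retirement"


class Approach(Enum):
    TARGETED_ACCELERATION = "targeted acceleration"
    DELAY_TRADING = "delay trading"


@dataclass(frozen=True)
class TSArch:
    """One point in the TS taxonomy (hypothetical record type)."""
    name: str
    persistence: Persistence
    granularity: Granularity
    functionally_correct: bool


def suggested_approach(arch: TSArch) -> Approach:
    # On-demand TS can be switched off, so worst case delay must not increase.
    if arch.persistence is Persistence.ON_DEMAND:
        return Approach.TARGETED_ACCELERATION
    # Always-on TS corrects every error, so worst case delay can be traded away.
    return Approach.DELAY_TRADING


RAZOR = TSArch("Razor", Persistence.ALWAYS_ON, Granularity.STAGE_LEVEL, True)
PACELINE = TSArch("Paceline", Persistence.ON_DEMAND, Granularity.AT_RETIREMENT, True)
```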
BlueShift Design Flow
[Flowchart: Initial Netlist → Physical Impl → Gate Sim (on the selected training benchmarks Bench 0, Bench 1, ..., Bench n) → Path Profile → Compute PE. If PE > target: Speed up overshooting paths → Design Changes → Physical Design → back to Gate Sim. If PE < target: Final Design.]
1: Profile the Design
• Perform gate-level simulation of design
• Record net transition times for each cycle
• Trace dynamic overshooting paths
[Figure: example gate-level circuit with nets labeled a through f and X, Y, Z, used to illustrate tracing a dynamic overshooting path.]
2: Identify Frequently Overshooting Paths
• Frequency of overshooting d(p): Fraction of cycles on which path p exceeds target clock period
• Error rate PE is upper-bounded by the sum of d(p) over all paths: $P_E \le \sum_{p} d(p)$
• Generate a sorted list of d(p) for each path, averaged across all benchmarks
• Select top k fraction of paths
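A minimal sketch of this step, assuming the profiler emits, for every simulated cycle, the set of paths that exceeded the target clock period. The trace format and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set

Trace = Sequence[Set[str]]  # one set of overshooting path names per simulated cycle


def overshoot_freq(trace: Trace) -> Dict[str, float]:
    """d(p) for one benchmark: fraction of cycles on which path p overshoots."""
    counts = Counter(p for cycle in trace for p in cycle)
    return {p: c / len(trace) for p, c in counts.items()}


def average_over_benchmarks(traces: Sequence[Trace]) -> Dict[str, float]:
    """Average d(p) across benchmarks; a path absent from a benchmark counts as 0."""
    per_bench = [overshoot_freq(t) for t in traces]
    paths = set().union(*per_bench)
    return {p: sum(d.get(p, 0.0) for d in per_bench) / len(per_bench) for p in paths}


def pe_upper_bound(d: Dict[str, float]) -> float:
    """Upper bound on PE: sum of d(p) over all paths, capped at 1."""
    return min(1.0, sum(d.values()))


def top_k_fraction(d: Dict[str, float], k: float) -> List[str]:
    """Paths sorted by decreasing d(p) (ties broken by name), keeping the top k fraction."""
    ranked = sorted(d, key=lambda p: (-d[p], p))
    return ranked[: math.ceil(k * len(ranked))]
```

The sum is only an upper bound because several paths can overshoot in the same cycle, and that cycle still counts as a single error.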
Example: On-demand Selective Biasing (OSB)
• Apply body bias to gates on frequently overshooting paths
• Gates with body bias are faster but consume more leakage power
[Figure: original design (standard gates only) next to the OSB design (standard gates plus FBB gates connected to a bias line), each with its error rate (PE) vs. frequency plot. In the design flow, the "Speed up High d(p) Paths" step outputs a list of FBB gates.]
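One way to picture the list of FBB gates: take the union of the gates that lie on the selected high d(p) paths, and apply the bias only in TS mode. The sketch below assumes a path-to-gates map is available from the netlist; `OSBDesign`, `fbb_gate_list`, and the gate names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


def fbb_gate_list(selected_paths: Iterable[str],
                  gates_on_path: Dict[str, List[str]]) -> List[str]:
    """Union of gates on the selected paths; a shared gate is listed once."""
    gates = set()
    for path in selected_paths:
        gates.update(gates_on_path[path])
    return sorted(gates)


@dataclass(frozen=True)
class OSBDesign:
    """Hypothetical view of an OSB design: all gates plus the FBB subset."""
    all_gates: List[str]
    fbb_gates: List[str]

    def biased_gates(self, ts_mode: bool) -> List[str]:
        # Bias is on only in TS mode; in normal mode every gate runs unbiased,
        # which avoids the extra leakage power at rated frequency.
        return list(self.fbb_gates) if ts_mode else []

    def fbb_fraction(self) -> float:
        """Share of gates that pay the leakage cost when bias is on."""
        return len(self.fbb_gates) / len(self.all_gates)
```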
OSB Operation
Normal Mode: Bias Off
[Figure: standard and FBB gates with the bias line off. Core 0 and Core 1 of the CMP both run at the rated clock, and the design operates at rated f on the error rate (PE) vs. frequency curve.]
OSB Operation
TS Mode: Bias On
[Figure: standard and FBB gates with the bias line on. The leader core runs at the speculative clock and sends hints to the checker core at the rated clock, with state comparison between them. The error rate (PE) vs. frequency plots mark rated f, and TS f where the curve meets the target PE.]
Example: Path Constraint Tuning (PCT)
• Generate timing constraints for commercial CAD tools
• Relax timing constraints for paths with low frequency of overshooting d(p)
• Tighten timing constraints on paths with high frequency of overshooting d(p)
• Intuitively: Generate slack on low d(p) paths and move it to high d(p) paths
[Figure: in the design flow, the "Speed up High d(p) Paths" step outputs path constraints; error rate (PE) vs. frequency plot.]
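A rough sketch of the relax/tighten rule, assuming each path currently has a maximum-delay constraint and a profiled d(p). The threshold and the scaling factors are invented for illustration; the slides do not say how the actual constraints are computed or in what format they reach the CAD tool.

```python
from typing import Dict


def tune_constraints(
    d: Dict[str, float],
    max_delay: Dict[str, float],
    hot_threshold: float,
    tighten: float = 0.9,   # assumed factor applied to high d(p) paths
    relax: float = 1.1,     # assumed factor applied to low d(p) paths
) -> Dict[str, float]:
    """Return new per-path maximum delays for the next physical design pass."""
    tuned = {}
    for path, limit in max_delay.items():
        # Paths never seen overshooting are treated as d(p) = 0.
        is_hot = d.get(path, 0.0) >= hot_threshold
        # Tighten frequently overshooting paths; give slack back elsewhere.
        tuned[path] = limit * (tighten if is_hot else relax)
    return tuned
```

Relaxing the low d(p) paths is what frees area and power for the tool to spend on the tightened ones, which is why this fits an always-on architecture that can tolerate a longer worst case delay.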
How PCT Speeds Up Dynamic Overshooting Paths
Assume: Need to speed up path A->Z because it has high d(p). All other paths are infrequently used and can be slow.
[Figure: the original circuit and four variants in which path A->Z is sped up: Refactor, Resize, Re-place, and Allocate Low Vt. Each variant is annotated with numeric labels along its paths.]
BlueShift Evaluation
Module          Stage  # cells  Description
sparc_exu       EXE    21896    Integer FUs, control, bypass
lsu_stb_ctl     MEM    694      Store buffer control
lsu_qctl1       MEM    2304     Load / Store queue control
lsu_dctl        MEM    3434     L1 D-cache control
sparc_ifu_dec   F/D    612      Instruction decoder
sparc_ifu_fdp   F/D    6894     Fetch datapath, PC maintenance
sparc_ifu_fcl   F/D    1921     L1 I-cache and PC control
• Sampled modules from OpenSPARC T1 (Sun Niagara)
• Implemented in commercial ASIC process
• Ran BlueShift Profiling on SPECint2006
• Evaluated with SPECint2000
On-demand Selective Biasing: Error Rate
[Figure: errors per cycle (PE, log scale from 1e−07 to 1e−01) vs. frequency (normalized to rated, 1.0 to 1.5). The curves reach the target PE at f = 1.12 and f = 1.27.]
14% Frequency increase after BlueShift
Path Constraint Tuning: Error Rate
[Figure: errors per cycle (PE, log scale from 1e−07 to 1e−01) vs. frequency (normalized to rated, 1.0 to 1.5). The curves reach the target PE at f = 1.28 and f = 1.43.]
Up to 12% frequency increase after BlueShift
Microarchitecture Evaluation
• Two configurations
• Paceline + On-demand Selective Biasing (OSB)
• Razor + Path Constraint Tuning (PCT)
• Model out-of-order microarchitecture
• 4-wide, 152-entry ROB
• Simulate with SESC
• Use circuit model to estimate total PE and power consumption of BlueShifted modules in pipeline
• Use Vdd scaling to speed up SRAM modules
Speedup
[Figure: bar chart of SPECint 2000 performance (axis 0 to 1.4) for Non-TS, Paceline, and Razor, comparing traditional design with BlueShift. BlueShift OSB adds 8% on Paceline; BlueShift PCT adds 6% on Razor.]
BlueShift improves performance over traditional design
Power
[Figure: bar chart of SPECint 2000 per-core power (axis 0 to 1.8) for Non-TS, Paceline, and Razor, comparing traditional design with BlueShift. BlueShift OSB adds 12% on Paceline; BlueShift PCT adds 23% on Razor.]
BlueShift has significant power cost over traditional designs
Summary
• Presented profile-driven BlueShift flow
• Proposed two specific optimizations aimed at different TS architectures
• Demonstrated frequency improvements
• 14% (8% speedup) from OSB
• 12% (6% speedup) from PCT
BlueShift: A novel approach to designing processors for TS
Improving Performance with Timing Speculation
• Premise: Increase f at constant V
• Some paths will overshoot the clock cycle, causing errors
• Timing Speculation: Detect and correct timing errors
• BlueShift: Decrease the error rate
[Figure: three groups of plots (rated, TS, and TS with BlueShift), each showing number of dynamic paths vs. delay, error rate (PE) vs. frequency with the target PE, and performance vs. frequency with the performance gain.]
OSB Performance Detail
[Figure 8(a): per-benchmark speedup (% over Unpaired) of Paceline Base and Paceline+OSB on bzip2, crafty, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and the mean.]

Figure 8: Performance (a) and power consumption (b) of different Paceline-based processor configurations. BS and NonBS refer to BlueShiftable and non-BlueShiftable modules, respectively.

Figure 9: Performance (a) and power consumption (b) of different Razor-based processor configurations.

Figure 8(b) shows the power consumed by the processor and L1 caches in Paceline Base, Paceline+OSB, and two instances of Unpaired. The power is broken down into power consumed by the checker core (which is never BlueShifted), non-BlueShiftable modules in the leader, BlueShiftable modules in the leader, and extra Paceline structures (checkpointing, VQ, and BQ). On average, the power consumed by Paceline+OSB is 12% higher than that of Paceline Base. Consequently, BlueShift with OSB does not add much to the power consumption while delivering a significant performance gain. Note that the checker power is generally lower when the cores run in paired mode. This is because the checker core saves energy by skipping the execution of many wrong-path instructions.

Razor+PCT Performance and Power

We now compare two Razor-based architectures. Razor Base uses the Razor Base module implementation, while Razor+PCT uses the Razor+PCT one obtained by applying BlueShift with PCT. As before, Razor+PCT runs at the frequency given by the PE curves of the BlueShiftable components; then, we apply traditional voltage scaling to the non-BlueShiftable components so that they can catch up.

Figure 9(a) shows the speedup of the Razor Base and Razor+PCT architectures over the Unpaired one used as a baseline in Figure 8(a). Since these Razor-based architectures target high performance, they deliver higher speedups. We see that, on average, Razor+PCT's performance is 6% higher than that of Razor Base. This is the impact of BlueShift with PCT in this design, which is not negligible considering that Razor Base was already designed for high performance. We also see that gap and, to a lesser extent, twolf do not perform as well as the other applications under Razor+PCT. This is the result of the unfavorable PE curve for these applications in Figure [illegible](d).

Figure 9(b) shows the power consumed by the two processor configurations. The power is broken down into the contributions of the non-BlueShiftable and the BlueShiftable modules. On average, Razor+PCT consumes 23% more power than Razor Base. This is because it runs at a higher frequency, uses a higher supply voltage for the non-BlueShiftable modules, and needs more shadow latches and hold-time buffers.

Given Razor+PCT's delivered speedup and power cost, we see that BlueShift with PCT is not compelling from an $E \times D^2$ perspective. Instead, we see it as a technique to further speed up a high-performance design (at a power cost) when conventional techniques such as voltage scaling or body biasing do not provide further performance. Specifically, for logic (i.e., BlueShiftable) modules, BlueShift with PCT provides an orthogonal means of improving performance when further voltage scaling or body biasing becomes unfeasible. In this case however, for the pipeline as a whole, non-BlueShiftable stages remain a bottleneck that must be addressed using some other technique.

Computational Overhead

Although most modules of Table [illegible] were fully optimized with BlueShift in one day on our 100-core cluster, the optimization of sparc_exu took about one week. Such long turnaround times dur-
OSB Power Detail
[Figure 8(b): per-benchmark power (W) of 2xUnpaired, Paceline Base, and Paceline+OSB on bzip2, crafty, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and the mean, broken down into Checker, Leader NonBS, Leader BS, and Extra.]
PCT Performance Detail
[Figure 9(a): per-benchmark speedup (% over Unpaired) of Razor Base and Razor+PCT on bzip2, crafty, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and the mean.]
PCT Power Detail
[Figure 9(b): per-benchmark power (W) of Razor Base and Razor+PCT on bzip2, crafty, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and the mean, broken down into NonBS and BS modules.]
Taxonomy of TS Archs
[Figure: TS architectures arranged by checking granularity (per-stage or at-retirement): Razor [Ernst03], Paceline [Greskamp07], CLS [Liu00], Optimistic Tandem [Martinez07].]
Taxonomy of TS Archs
• Checker persistence: Is TS always-on, or only on demand?
• If on-demand, must be careful not to decrease rated f
• Checking granularity: Check per-stage or at retirement?
• If per-stage, can tolerate higher PE
• Is functional correctness relaxed for speculative units?
• Functional correctness requires new techniques
Taxonomy of TS Archs
• Checker persistence: Is TS always-on, or only on demand?
• Is functional correctness relaxed for speculative units?
Example Architecture      Persistence  Correctness
Razor [Ernst03]           always-on    correct
OptTandem [Martinez07]    always-on    relaxed
Paceline [Greskamp07]     on-demand    correct