Upload
brendan-french
View
214
Download
0
Tags:
Embed Size (px)
Citation preview
ReSlice: Selective Re-execution of
Long-retired Misspeculated Instructions
Using Forward SlicingSmruti R. Sarangi, Wei Liu, Josep Torrellas, Yuanyuan Zhou
University of Illinois at Urbana-Champaignhttp://iacoma.cs.uiuc.edu
Presented at CARG, March 8th, 2006By Jason Zebchuk
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 2
Data Value Speculation
Predict the value and proceed speculatively
When correct value is known, if mispediction, squash and re-execute
Initial proposals
• L1 data value
• Store/Load Data dependences
Aggressive novel proposals Retire speculative instructions
• Values of L2 misses
• Thread independence in Thread-Level Speculation (TLS)
• Speculative synchronization
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 3
Long-latency Speculation
Misprediction recovery is very wasteful Most discarded instructions are still useful
Checkpoint
Prediction
Resolution
210 Instructions6.6 Instructions
Speculatively retired instructions
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 4
Contributions (I)
ReSlice: Architecture to buffer forward slice and re-execute it on misprediction
Checkpoint
Prediction
Resolution
Buffered Forward Slice
If succeedsIf fails
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 5
Contributions (II)
A sufficient condition for correct slice re-execution and merge
• Guarantee to correctly repair the program state
Application to Thread-Level Speculation (TLS) on SpecInt
• Prediction: task live-ins
• Removed 61% task squashes
• Speedup: up to 33% over TLS (avg 12%)
• E×D2 reduction: avg 20%
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 6
Outline
Motivation and Contributions
Idea of ReSlice
Architecture Design
Experimental Results
Conclusions
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 7
Idea of ReSlice
Initial execution of the task
• Predict value of “risky” load and continue
• e.g L2 missing loads, exposed loads in TLS …
Buffer in HW the forward slice of the load
If a misprediction happens Re-execute the slice with the correct value
If succeed: merge the register and memory state and continue
• If fail: roll back and re-execute (conventional recovery)
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 8
Why is this Challenging?
…#1: LD R1 mem_A …#2: ST R2 (R1) …#3: LD R5 mem[0x20] …
“Risky” Load
R1 0x10
ST R2 mem[0x10]
R1 0x20
ST R2 mem[0x20]
LD R5 mem[0x20]
Problem: Instruction #3 is not buffered!
Buffered Slice Correct ExecutionInitial Execution
Initial execution is speculative Buffered slices may not be correct
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 9
Solution: Run-time Address Check
…#1: LD R1 mem_A …#2: ST R2 (R1) …#3: LD R5 mem[0x20] …
“Inhibiting” Store
R1 0x10
ST R2 mem[0x10]
R1 0x20
ST R2 mem[0x20]
Buffered Slice Slice Re-ExecutionInitial Execution
Different store addresses;
And mem[0x20] was loaded in initial execution
“Risky” Load
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 10
Problems
Buffered Slice
Missing Insts Extra Insts Data Corruption
Slice Re-execution
Initial Speculative Execution
(Example just shown)
“Dangling” Load “Inhibiting” Load
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 11
A Sufficient Condition
Guarantee correct
• Slice re-execution
• State merge with program
Two sub-conditions
• Branch takes same direction in both initial execution and slice re-execution
• No presence of any previous three problems
Easy for HW to check at run-time
Details please see our paper
• (Briefly: compare addresses accessed in old and new run, also look for references to same address in first run)
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 12
How does ReSlice Work?
TimelineCheckpoint
Prediction
Resolution
Slice Collection
Compare the correct and predicted value
Correct prediction
Slice Re-Execution
Misprediction
IncorrectState Merging
Correct
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 13
Architecture Design
CPU
Live Ins
Buffered Instr.
Re-Execution Unit (REU)
Slice DescriptorSlice
DescriptorSlice
Descriptors
Slice Info
Slice Info
Slice Buffer
CacheRegister State Memory
State
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 14
Other Architectural Requirements
Tag physical registers with “SliceTag”
Tag Cache for storing addresses accessed by slice instructions
Undo Log records first store to each memory location in each Slice
High-coverage Predictor to predict which instructions to use as seeds.
• Uses Dependence and Value Predictor used by TLS
Uses TLS information to identify Speculatively Read/Written locations in cache
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 15
Step 1: Slice Collection
• Fill up the Slice Buffer when a prediction is made• Track both register and memory dependences
• Save live-in operands and slice instructions
• Slices are buffered when instructions are retired
CPU
Live Ins
Buffered Instr.
Re-Execution Unit (REU)
Slice Descriptor
Slice Info
Slice Info
Slice Buffer
CacheRegister State Memory
State
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 16
An Example
Slice Desc.
Live Ins Buffered Insts
ST R3, mem[R4]R4 = 0x10
ADD R3, R1, R2
R4 = 0x10
ST R3, mem[R4]
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 17
Step 2: Slice Re-Execution
• REU takes over after a misprediction• In-order execution
• Sufficient condition is checked during slice re-execution
• If success, merge the register and memory state; otherwise, squash the task and restart
CPU
Live Ins
Buffered Instr.
Re-Execution Unit (REU)
Slice Descriptor
Slice Info
Slice Info
Slice Buffer
CacheRegister State Memory
State
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 18
Step 3: State Merging
• Copy live registers back to CPU register file
• Merge memory state (details in paper)
CPU
Live Ins
Buffered Instr.
Re-Execution Unit (REU)
Slice Descriptor
Slice Info
Slice Info
Slice Buffer
CacheRegister State Memory
State
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 19
Multiple Overlapping Slices
One slice may change live-ins of the other slice
Detect overlapping slices during slice collection
Combine the instructions of the overlapping slices and re-execute them in program order
misprediction misprediction
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 20
Outline
Motivation and Contributions
Idea of ReSlice
Architecture Design
Experimental Results
Conclusions
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 21
Methodology
Simulated CMP architecture
• 5 GHz
• Private 16k L1 per core; 1 MB Shared L2
SpecInt 2000 applications
TLS+ReSlice
Four 3-issue cores with TLS and ReSlice
Baseline: TLS
Four 3-issue cores with TLS
Serial
One 3-issue core
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 22
Slice Characterization (Averages)
10.4 instructions
4.5 Reg live-ins
1.0 Mem live-ins
2.2 Reg live-outs
1.9 Mem live-outs
1.6 slices per task
15% tasks with overlapping slices
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 23
Number of Task Squashes
0
0.2
0.4
0.6
0.8
1
bzip2 crafty gap gzip mcf parser twolf vortex vpr A.Mean
# o
f sq
ua
she
s (%
)
TLS TLS+ReSlice
61% squashes are avoided because of ReSlice
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 24
Performance
0
0.2
0.4
0.6
0.8
1
1.2
1.4
bzip2 crafty gap gzip mcf parser twolf vortex vpr G.Mean
Sp
ee
du
p
Serial TLS TLS+ReSlice
Speedup TLS+ReSlice
• Over TLS: up to 33% (avg 12%)
• Over Serial: avg 45%
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 25
Energy × Delay2
0
0.2
0.4
0.6
0.8
1
1.2
bzip2 crafty gap gzip mcf parser twolf vortex vpr G.Mean
Nor
mal
ized
E*D
*D
TLS TLS+ReSlice
E×D2 reduction: 20% over TLS
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing Wei Liu, University of Illinois 26
Conclusions
Generic architecture for forward slice re-execution
Sufficient condition for correct re-execution and merge
Improve TLS on SpecInt
• Speedups: up to 33% over TLS (avg 12%)
• E×D2 reduction: avg 20% over TLS
Recovering wasted work is a promising approach
• Boost performance
• Energy efficiency