Page 1
Trace Caches
Michele Co
CS 451
Page 2
Motivation
High-performance superscalar processors
– High instruction throughput, exploiting ILP
– Wider dispatch and issue paths
– Execution units designed for high parallelism
• Many functional units
• Large issue buffers
• Many physical registers
Fetch bandwidth becomes the performance bottleneck
Page 3
Fetch Performance Limiters
Cache hit rate
Branch prediction accuracy
Branch throughput
– Need to predict more than one branch per cycle
Non-contiguous instruction alignment
Fetch unit latency
Page 4
Problems with Traditional Instruction Cache
Contain instructions in compiled order
– Works well for sequential code with little branching, or code with large basic blocks
– Fetch bandwidth suffers when frequent taken branches make the dynamic instruction stream non-contiguous
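To make the bottleneck concrete, here is a toy simulation (not from the slides; fetch width and data layout are illustrative assumptions) of a conventional fetch unit that can only deliver instructions up to the first predicted-taken branch each cycle, whereas a trace cache could deliver the whole dynamic sequence:

```python
# Toy model: a compiled-order i-cache fetch stops at each taken branch,
# so frequent taken branches waste fetch bandwidth. FETCH_WIDTH is an
# assumed machine width, not a figure from the slides.

FETCH_WIDTH = 8

def icache_fetch(dynamic_stream):
    """Count instructions delivered per cycle when fetch ends at each
    taken branch. dynamic_stream: list of (insn, taken_branch) pairs."""
    cycle_counts = []
    i = 0
    while i < len(dynamic_stream):
        n = 0
        while i < len(dynamic_stream) and n < FETCH_WIDTH:
            _, taken = dynamic_stream[i]
            n += 1
            i += 1
            if taken:          # a taken branch ends this fetch block
                break
        cycle_counts.append(n)
    return cycle_counts
```

With a taken branch every four instructions, each cycle delivers only 4 of a possible 8 instructions; a purely sequential stream fills the full fetch width.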
Page 5
Suggested Solutions
Branch address cache with multiple branch target address prediction (Yeh, Marr, Patt, 1993)
– Provides quick access to multiple target addresses
– Disadvantages
• Complex alignment network
• Additional latency
Page 6
Suggested Solutions (cont’d)
Collapsing buffer (Conte, Mills, Menezes, Patel, 1995)
– Multiple accesses to the BTB
– Allows fetching non-adjacent cache lines
– Disadvantages
• Bank conflicts
• Poor scalability for interblock branches
• Significant logic added before and after the instruction cache
Fill unit (Melvin, Shebanow, Patt, 1988)
– Caches RISC-like instructions derived from the CISC instruction stream
Page 7
Problems with Prior Approaches
Need to generate pointers for all non-contiguous instruction blocks BEFORE fetching can begin
– Extra stages, additional latency
– Complex alignment network necessary
Multiple simultaneous accesses to the instruction cache
– Multiporting is expensive
Sequencing
– Additional stages, additional latency
Page 8
Potential Solution – Trace Cache
Rotenberg, Bennett, Smith (1996)
Advantages
– Caches dynamic instruction sequences
– Fetches past multiple branches
– No additional fetch unit latency
Disadvantages
– Redundant instruction storage
• Between trace cache and instruction cache
• Within the trace cache
Page 9
Trace Cache Details
Trace
– Sequence of instructions, potentially containing branches and their targets
– Terminates on branches with an indeterminate number of targets
• Returns, indirect jumps, traps
Trace identifier
– Start address + branch outcomes
Trace cache line
– Valid bit
– Tag
– Branch flags
– Branch mask
– Trace fall-through address
– Trace target address
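The line format above can be sketched in code. This is a minimal, hypothetical model, assuming a direct-mapped trace cache indexed by the trace start address; field names follow the slide's line format, while sizes and the indexing scheme are illustrative, not from the original design:

```python
# Hypothetical trace cache lookup: a hit requires that BOTH the start
# address and the predicted branch outcomes match the stored trace,
# i.e. the full trace identifier (start address + branch outcomes).

class TraceLine:
    def __init__(self, tag, insns, branch_flags, branch_mask,
                 fall_through, target):
        self.valid = True
        self.tag = tag                    # trace starting address
        self.insns = insns                # instructions in the trace
        self.branch_flags = branch_flags  # taken/not-taken per embedded branch
        self.branch_mask = branch_mask    # number of branches in the trace
        self.fall_through = fall_through  # next PC if final branch not taken
        self.target = target              # next PC if final branch taken

class TraceCache:
    def __init__(self, n_sets=64):       # size is an assumption
        self.n_sets = n_sets
        self.lines = [None] * n_sets

    def lookup(self, start_pc, predicted_outcomes):
        line = self.lines[(start_pc // 4) % self.n_sets]
        if (line and line.valid and line.tag == start_pc
                and predicted_outcomes[:line.branch_mask] == line.branch_flags):
            return line.insns
        return None  # miss: fall back to the instruction cache
```

On a hit the whole multi-branch trace is supplied in one access; on a miss the conventional instruction cache serves the fetch while a fill unit builds a new trace.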
Page 10
Page 11
Next Trace Prediction (NTP)
History register
Correlating table
– Complex history indexing
Secondary table
– Indexed by the most recently committed trace ID
Index-generating function
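As a rough illustration of the index-generating function, the sketch below folds a history register of recent trace IDs into a correlating-table index. The XOR-fold hash, table size, and history depth are all assumed for illustration; the actual scheme is more elaborate:

```python
# Assumed parameters for a next-trace-prediction correlating table;
# these are illustrative, not the values from the slides.
TABLE_BITS = 14                 # 2^14-entry correlating table
HISTORY_DEPTH = 4               # recent trace IDs kept in the history register
MASK = (1 << TABLE_BITS) - 1

def ntp_index(history):
    """history: list of trace IDs, oldest first. Older IDs are shifted
    right more, so the most recent traces dominate the index."""
    idx = 0
    for age, trace_id in enumerate(reversed(history[-HISTORY_DEPTH:])):
        idx ^= (trace_id >> (2 * age)) & MASK
    return idx
```

When this primary prediction misses, the slide's secondary table, indexed simply by the most recently committed trace ID, would supply a fallback prediction.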
Page 12
NTP Index Generation
Page 13
Return History Stack
Page 14
Trace Cache vs. Existing Techniques
Page 15
Trace Cache Optimizations
Performance
– Partial matching [Friendly, Patel, Patt (1997)]
– Inactive issue [Friendly, Patel, Patt (1997)]
– Trace preconstruction [Jacobson, Smith (2000)]
Power
– Sequential access trace cache [Hu et al. (2002)]
– Dynamic direction prediction based trace cache [Hu et al. (2003)]
– Micro-operation cache [Solomon et al. (2003)]
Page 16
Trace Processors
Trace processor architecture
– Processing elements (PEs)
• Trace-sized instruction buffer
• Multiple dedicated functional units
• Local register file
• Copy of global register file
– Uses hierarchy to distribute execution resources
Addresses superscalar processor issues
– Complexity
• Simplified multiple branch prediction (next trace prediction)
• Elimination of local dependence checking (local register file)
• Decentralized instruction issue and result bypass logic
– Architectural limitations
• Reduced bandwidth pressure on the global register file (local register files)
Page 17
Trace Processor
Page 18
Trace Cache Variations
Block-based trace cache (BBTC)
– Black, Rychlik, Shen (1999)
– Needs less storage capacity
Page 19
Trace Table: BBTC Trace Prediction
Page 20
Block Cache
Page 21
Rename Table
Page 22
BBTC Optimization
Completion-time multiple branch prediction (Rakvic et al., 2000)
– Improvement over trace table predictions
Page 23
Tree-based Multiple Branch Prediction
Page 24
Tree-PHT
Page 25
Tree-PHT Update
Page 26
Trace Cache Variations (cont’d)
Software trace cache
– Ramirez, Larriba-Pey, Navarro, Torrellas (1999)
– Profile-directed code reordering to maximize sequentiality
• Convert taken branches to not-taken
• Move unused basic blocks out of the execution path
• Inline frequent basic blocks
• Map the most popular traces to a reserved area of the i-cache
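The reordering step above can be sketched with a simple greedy chaining pass: starting from the hottest block, repeatedly append its most frequent successor, so hot paths become fall-through and their taken branches become not-taken. This is a minimal sketch under assumed data structures, not the paper's actual algorithm:

```python
# Greedy profile-directed basic-block chaining (illustrative sketch).
# blocks: iterable of block ids; edge_counts: {(src, dst): exec count}.

def layout_hot_traces(blocks, edge_counts):
    """Return a block ordering that places each block's hottest
    successor immediately after it when possible."""
    succs, heat = {}, {b: 0 for b in blocks}
    for (src, dst), cnt in edge_counts.items():
        succs.setdefault(src, []).append((cnt, dst))
        heat[src] = heat.get(src, 0) + cnt

    placed, order = set(), []
    # seed each chain with the hottest not-yet-placed block
    for seed in sorted(blocks, key=lambda b: -heat.get(b, 0)):
        b = seed
        while b is not None and b not in placed:
            placed.add(b)
            order.append(b)
            # follow the most frequently executed successor edge
            nxt = max(succs.get(b, []), default=None)
            b = nxt[1] if nxt and nxt[1] not in placed else None
    return order
```

For a diamond CFG where A almost always branches to B and then falls into D, this lays out A, B, D contiguously and pushes the cold block C after the hot path, exactly the "move unused basic blocks out of the execution path" effect.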