Page 1
Trace Caches
Michele Co
CS 451
Page 2
Motivation
High-performance superscalar processors
– High instruction throughput, exploiting ILP
– Wider dispatch and issue paths
– Execution units designed for high parallelism
• Many functional units
• Large issue buffers
• Many physical registers
Fetch bandwidth becomes the performance bottleneck
Page 3
Fetch Performance Limiters
Cache hit rate
Branch prediction accuracy
Branch throughput
– Need to predict more than one branch per cycle
Non-contiguous instruction alignment
Fetch unit latency
Page 4
Problems with Traditional Instruction Cache
Contain instructions in compiled order
– Works well for sequential code with little branching, or code with large basic blocks
– Fetch bandwidth suffers when frequent taken branches make the dynamic instruction stream non-contiguous
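To make the bottleneck concrete, here is a toy simulation (not from the slides; fetch width and data layout are illustrative assumptions) of a conventional fetch unit that can only deliver instructions up to the first predicted-taken branch each cycle, whereas a trace cache could deliver the whole dynamic sequence:

```python
# Toy model: a compiled-order i-cache fetch stops at each taken branch,
# so frequent taken branches waste fetch bandwidth. FETCH_WIDTH is an
# assumed machine width, not a figure from the slides.

FETCH_WIDTH = 8

def icache_fetch(dynamic_stream):
    """Count instructions delivered per cycle when fetch ends at each
    taken branch. dynamic_stream: list of (insn, taken_branch) pairs."""
    cycle_counts = []
    i = 0
    while i < len(dynamic_stream):
        n = 0
        while i < len(dynamic_stream) and n < FETCH_WIDTH:
            _, taken = dynamic_stream[i]
            n += 1
            i += 1
            if taken:          # a taken branch ends this fetch block
                break
        cycle_counts.append(n)
    return cycle_counts
```

With a taken branch every four instructions, each cycle delivers only 4 of a possible 8 instructions; a purely sequential stream fills the full fetch width.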
Page 5
Suggested Solutions
Branch address cache with multiple branch target address prediction (Yeh, Marr, Patt, 1993)
– Provides quick access to multiple target addresses
– Disadvantages
• Complex alignment network
• Additional latency
Page 6
Suggested Solutions (cont’d)
Collapsing buffer (Conte, Mills, Menezes, Patel, 1995)
– Multiple accesses to the BTB
– Allows fetching non-adjacent cache lines
– Disadvantages
• Bank conflicts
• Poor scalability for interblock branches
• Significant logic added before and after the instruction cache
Fill unit (Melvin, Shebanow, Patt, 1988)
– Caches RISC-like instructions derived from the CISC instruction stream
Page 7
Problems with Prior Approaches
Need to generate pointers for all non-contiguous instruction blocks BEFORE fetching can begin
– Extra stages, additional latency
– Complex alignment network necessary
Multiple simultaneous accesses to the instruction cache
– Multiporting is expensive
Sequencing
– Additional stages, additional latency
Page 8
Potential Solution – Trace Cache
Rotenberg, Bennett, Smith (1996)
Advantages
– Caches dynamic instruction sequences
– Fetches past multiple branches
– No additional fetch unit latency
Disadvantages
– Redundant instruction storage
• Between trace cache and instruction cache
• Within the trace cache
Page 9
Trace Cache Details
Trace
– Sequence of instructions, potentially containing branches and their targets
– Terminates on branches with an indeterminate number of targets
• Returns, indirect jumps, traps
Trace identifier
– Start address + branch outcomes
Trace cache line
– Valid bit
– Tag
– Branch flags
– Branch mask
– Trace fall-through address
– Trace target address
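The line format above can be sketched in code. This is a minimal, hypothetical model, assuming a direct-mapped trace cache indexed by the trace start address; field names follow the slide's line format, while sizes and the indexing scheme are illustrative, not from the original design:

```python
# Hypothetical trace cache lookup: a hit requires that BOTH the start
# address and the predicted branch outcomes match the stored trace,
# i.e. the full trace identifier (start address + branch outcomes).

class TraceLine:
    def __init__(self, tag, insns, branch_flags, branch_mask,
                 fall_through, target):
        self.valid = True
        self.tag = tag                    # trace starting address
        self.insns = insns                # instructions in the trace
        self.branch_flags = branch_flags  # taken/not-taken per embedded branch
        self.branch_mask = branch_mask    # number of branches in the trace
        self.fall_through = fall_through  # next PC if final branch not taken
        self.target = target              # next PC if final branch taken

class TraceCache:
    def __init__(self, n_sets=64):       # size is an assumption
        self.n_sets = n_sets
        self.lines = [None] * n_sets

    def lookup(self, start_pc, predicted_outcomes):
        line = self.lines[(start_pc // 4) % self.n_sets]
        if (line and line.valid and line.tag == start_pc
                and predicted_outcomes[:line.branch_mask] == line.branch_flags):
            return line.insns
        return None  # miss: fall back to the instruction cache
```

On a hit the whole multi-branch trace is supplied in one access; on a miss the conventional instruction cache serves the fetch while a fill unit builds a new trace.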
Page 10
Page 11
Next Trace Prediction (NTP)
History register
Correlating table
– Complex history indexing
Secondary table
– Indexed by the most recently committed trace ID
Index-generating function
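As a rough illustration of the index-generating function, the sketch below folds a history register of recent trace IDs into a correlating-table index. The XOR-fold hash, table size, and history depth are all assumed for illustration; the actual scheme is more elaborate:

```python
# Assumed parameters for a next-trace-prediction correlating table;
# these are illustrative, not the values from the slides.
TABLE_BITS = 14                 # 2^14-entry correlating table
HISTORY_DEPTH = 4               # recent trace IDs kept in the history register
MASK = (1 << TABLE_BITS) - 1

def ntp_index(history):
    """history: list of trace IDs, oldest first. Older IDs are shifted
    right more, so the most recent traces dominate the index."""
    idx = 0
    for age, trace_id in enumerate(reversed(history[-HISTORY_DEPTH:])):
        idx ^= (trace_id >> (2 * age)) & MASK
    return idx
```

When this primary prediction misses, the slide's secondary table, indexed simply by the most recently committed trace ID, would supply a fallback prediction.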
Page 12
NTP Index Generation
Page 13
Return History Stack
Page 14
Trace Cache vs. Existing Techniques
Page 15
Trace Cache Optimizations
Performance
– Partial matching [Friendly, Patel, Patt (1997)]
– Inactive issue [Friendly, Patel, Patt (1997)]
– Trace preconstruction [Jacobson, Smith (2000)]
Power
– Sequential access trace cache [Hu et al. (2002)]
– Dynamic direction prediction based trace cache [Hu et al. (2003)]
– Micro-operation cache [Solomon et al. (2003)]
Page 16
Trace Processors
Trace processor architecture
– Processing elements (PEs)
• Trace-sized instruction buffer
• Multiple dedicated functional units
• Local register file
• Copy of global register file
– Uses hierarchy to distribute execution resources
Addresses superscalar processor issues
– Complexity
• Simplified multiple branch prediction (next trace prediction)
• Elimination of local dependence checking (local register file)
• Decentralized instruction issue and result bypass logic
– Architectural limitations
• Reduced bandwidth pressure on the global register file (local register files)
Page 17
Trace Processor
Page 18
Trace Cache Variations
Block-based trace cache (BBTC)
– Black, Rychlik, Shen (1999)
– Needs less storage capacity
Page 19
Trace Table: BBTC Trace Prediction
Page 20
Block Cache
Page 21
Rename Table
Page 22
BBTC Optimization
Completion-time multiple branch prediction (Rakvic et al., 2000)
– Improvement over trace table predictions
Page 23
Tree-based Multiple Branch Prediction
Page 24
Tree-PHT
Page 25
Tree-PHT Update
Page 26
Trace Cache Variations (cont’d)
Software trace cache
– Ramirez, Larriba-Pey, Navarro, Torrellas (1999)
– Profile-directed code reordering to maximize sequentiality
• Convert taken branches to not-taken
• Move unused basic blocks out of the execution path
• Inline frequent basic blocks
• Map the most popular traces to a reserved area of the i-cache
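The reordering step above can be sketched with a simple greedy chaining pass: starting from the hottest block, repeatedly append its most frequent successor, so hot paths become fall-through and their taken branches become not-taken. This is a minimal sketch under assumed data structures, not the paper's actual algorithm:

```python
# Greedy profile-directed basic-block chaining (illustrative sketch).
# blocks: iterable of block ids; edge_counts: {(src, dst): exec count}.

def layout_hot_traces(blocks, edge_counts):
    """Return a block ordering that places each block's hottest
    successor immediately after it when possible."""
    succs, heat = {}, {b: 0 for b in blocks}
    for (src, dst), cnt in edge_counts.items():
        succs.setdefault(src, []).append((cnt, dst))
        heat[src] = heat.get(src, 0) + cnt

    placed, order = set(), []
    # seed each chain with the hottest not-yet-placed block
    for seed in sorted(blocks, key=lambda b: -heat.get(b, 0)):
        b = seed
        while b is not None and b not in placed:
            placed.add(b)
            order.append(b)
            # follow the most frequently executed successor edge
            nxt = max(succs.get(b, []), default=None)
            b = nxt[1] if nxt and nxt[1] not in placed else None
    return order
```

For a diamond CFG where A almost always branches to B and then falls into D, this lays out A, B, D contiguously and pushes the cold block C after the hot path, exactly the "move unused basic blocks out of the execution path" effect.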