Korea Univ
B-Fetch: Branch Prediction Directed Prefetching for In-Order Processors
Department of Computer and Radio Communications Engineering, 2015020802
최병준
Computer Engineering and Systems Group (CESG), Department of Electrical and Computer Engineering, Texas A&M University
-> Reena Panda, Paul V. Gratz
Department of Computer Science, The University of Texas at San Antonio
-> Daniel A. Jiménez
1. Introduction
• Modern computer architecture is beset by two opposing, conflicting trends. While technology scaling and deep pipelining have led to high processor frequencies, memory access speed has not scaled accordingly.
• Meanwhile, power and energy considerations have revived interest in in-order processors.
• In this paper we present a novel data cache prefetch scheme that leverages both execution path speculation and effective address speculation to efficiently improve the performance of in-order processors.
• Much prior work focuses on reducing the impact of the memory wall on processor performance.
• In this paper we propose a lightweight prefetcher, 'B-Fetch', a combined control-flow and effective address speculating prefetching scheme.
• B-Fetch leverages the high prediction accuracy of current-generation branch predictors, combined with novel effective address speculation.
2. Background
• To be effective at masking such high latencies, a prefetcher must anticipate misses and issue prefetches significantly ahead of actual execution.
• This requires accurate prediction of:
1. the likely memory instructions to be executed
2. the likely effective addresses of these instructions
• The program execution path is determined by the direction taken by its component control instructions.
• Memory access behavior can therefore be linked to prior control-flow behavior.
3. Proposed Design
• Pipeline Overview
1. Branch Lookahead
2. Register-Table Lookup
3. Prefetch Issue
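The three stages above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' hardware design: the names `predicted_next`, `register_table`, `regs`, `MAX_DEPTH`, and `line_size`, as well as the table layouts, are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical tables, keyed by branch PC (contents are illustrative only).
# predicted_next: branch PC -> PC of the next branch on the predicted path.
# register_table: branch PC -> (register, offset) pairs for the loads that
#                 are expected to follow that branch.
MAX_DEPTH = 3  # assumed lookahead limit


def lookahead(start_pc: int, predicted_next: Dict[int, int]) -> List[int]:
    """Stage 1: walk the predicted path, collecting upcoming branch PCs."""
    path, pc = [], start_pc
    while pc in predicted_next and len(path) < MAX_DEPTH:
        pc = predicted_next[pc]
        path.append(pc)
    return path


def candidate_addresses(path: List[int],
                        register_table: Dict[int, List[Tuple[str, int]]],
                        regs: Dict[str, int]) -> List[int]:
    """Stage 2: speculate effective addresses as register value + offset."""
    addrs = []
    for branch_pc in path:
        for reg, offset in register_table.get(branch_pc, []):
            addrs.append(regs[reg] + offset)
    return addrs


def issue_prefetches(addrs: List[int], line_size: int = 64) -> List[int]:
    """Stage 3: issue one prefetch per distinct cache line, in order."""
    issued = []
    for addr in addrs:
        line = addr - addr % line_size  # align down to the cache line
        if line not in issued:
            issued.append(line)
    return issued


def bfetch_step(start_pc, predicted_next, register_table, regs):
    """Run the three stages once for the branch at start_pc."""
    path = lookahead(start_pc, predicted_next)
    return issue_prefetches(candidate_addresses(path, register_table, regs))
```

Calling `bfetch_step` with a current branch PC and the current register values returns the cache-line addresses that this simplified model would prefetch.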
• System Components: Path Confidence Estimator, Branch Trace Cache, Branch-Register Table, Prefetch Filtering
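As a rough illustration of two of these components, the sketch below models path confidence and prefetch filtering in Python. The slides do not describe the mechanisms, so everything here is an assumption: path confidence is modelled as the running product of per-branch confidences, `CONFIDENCE_THRESHOLD` is an arbitrary cutoff, and the filter is a plain set of recently requested cache lines.

```python
from typing import Iterable, List, Set

CONFIDENCE_THRESHOLD = 0.75  # assumed cutoff, not taken from the paper


def confident_depth(branch_confidences: Iterable[float],
                    threshold: float = CONFIDENCE_THRESHOLD) -> int:
    """Return how many branches ahead the path stays above the threshold.

    Path confidence is modelled as the running product of the per-branch
    confidences (an assumption made for illustration).
    """
    path_confidence, depth = 1.0, 0
    for conf in branch_confidences:
        path_confidence *= conf
        if path_confidence < threshold:
            break
        depth += 1
    return depth


def filter_prefetches(lines: Iterable[int],
                      recently_issued: Set[int]) -> List[int]:
    """Drop prefetches for cache lines that were already requested."""
    kept = []
    for line in lines:
        if line not in recently_issued:
            recently_issued.add(line)  # remember the line for later calls
            kept.append(line)
    return kept
```

In this model, lookahead would stop once the accumulated confidence drops below the threshold, and the filter suppresses redundant requests before they reach the cache.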
• Hardware Cost
The table shows that B-Fetch requires ∼33% of the table state required by SMS.
4. Evaluation
• Methodology
We evaluate our prefetcher in a simulation environment based on the M5 simulator. The simulator is used to model a 1-wide, 5-stage in-order pipeline.
The test workload consists of the 18 SPEC CPU2006 benchmarks that our simulation infrastructure supports, compiled for the Alpha ISA.
• Prefetcher Performance
Figure 5 contains the IPC for the simple Stride, SMS, and B-Fetch prefetchers, normalized against the baseline.
B-Fetch provides performance benefits across a range of applications, both integer and floating point.
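For readers unfamiliar with normalized IPC, the short Python sketch below shows one way such numbers could be computed. It is an illustration only: the function names are hypothetical, and the use of a geometric mean to summarize per-benchmark speedups is an assumption, since the slides say only "mean speedup".

```python
import math
from typing import Dict, Iterable


def normalized_ipc(ipc: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Divide each benchmark's IPC by its no-prefetch baseline IPC."""
    return {name: ipc[name] / baseline[name] for name in ipc}


def geometric_mean(values: Iterable[float]) -> float:
    """Summarize per-benchmark speedups (assumed averaging method)."""
    values = list(values)
    return math.prod(values) ** (1.0 / len(values))
```

A normalized IPC of 1.0 means no change over the baseline, so a value of 1.39 corresponds to a 39% speedup.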
The results show that the B-Fetch prefetcher provides a mean speedup of 39% (62%) across all (prefetch-sensitive) benchmarks.
Compared to a stride prefetcher, B-Fetch improves performance by over 25% (39.4%).
Compared against SMS, B-Fetch improves performance by 2.2% (3.4%), at the cost of ∼1/3 the storage overhead.
5. Conclusions
• B-Fetch not only predicts effective addresses that display regular access patterns, but can also take advantage of the dynamic values of the registers at runtime to predict irregular and isolated data accesses.
• Although the focus of this paper has been improving in-order processor performance, the B-Fetch scheme should perform comparably on superscalar processors; we plan to explore this in future work.