Increasing Cache Efficiency by Eliminating Noise
Prateek Pujara & Aneesh Aggarwal
{prateek, aneesh}@binghamton.edu
State University of New York, Binghamton
http://caps.cs.binghamton.edu
INTRODUCTION
Caches are essential for bridging the processor-memory performance gap, so they should be utilized as efficiently as possible: fetch only the useful data.
Cache Utilization: the percentage of useful words out of the total words fetched into the cache.
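As a minimal sketch of this metric (the function name and data layout below are ours, not from the talk), utilization can be computed from per-block bit vectors recording which fetched words were referenced before eviction:

```python
def cache_utilization(blocks):
    """Percentage of fetched words that were actually referenced.

    `blocks` is a list of per-block bit vectors, one bit per word:
    1 = the word was referenced before the block was evicted.
    """
    fetched = sum(len(bits) for bits in blocks)
    used = sum(sum(bits) for bits in blocks)
    return 100.0 * used / fetched if fetched else 0.0

# Example: two 8-word blocks; only 5 of the 16 fetched words are used.
print(cache_utilization([[1, 1, 0, 0, 0, 0, 0, 0],
                         [1, 0, 1, 1, 0, 0, 0, 0]]))  # 31.25
```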
Utilization vs Block Size
Larger cache blocks
• Increase the bandwidth requirement
• Reduce utilization
Smaller cache blocks
• Reduce the bandwidth requirement
• Increase utilization
[Figure: Percent cache utilization of a 16KB, 4-way set-associative cache with 32-byte blocks, per benchmark (Bzip2, Gcc, Mcf, Parser, Vortex, Vpr, Ammp, Applu, Apsi, Art, Equake, Mgrid, Swim, Wupwise) with Int, FP, and overall averages.]
Benefits of Utilization Improvement
Lower energy consumption
• By not wasting energy on useless words.
Improved performance
• By better utilizing the available cache space.
Reduced memory traffic
• By not fetching useless words.
Our Goal
Improve utilization:
• Predict the to-be-referenced words.
• Avoid cache pollution by fetching only the predicted words.
Our Contributions
Illustrate the high predictability of cache noise.
Propose efficient cache noise predictors.
Show the potential benefits of cache-noise-prediction-based fetching in terms of
• Cache utilization
• Cache power consumption
• Bandwidth requirement
Illustrate the benefits of cache noise prediction for prefetching.
Investigate cache noise prediction as an alternative to sub-blocking.
Cache Noise Prediction
Programs repeat their patterns of memory references, so cache noise can be predicted from the history of words accessed in cache blocks.
Cache Noise Predictors
1) Phase Context Predictor (PCP): records the word-usage history of the most recently evicted cache block.
2) Memory Context Predictor (MCP): assumes that data at contiguous memory locations will be accessed in the same fashion.
3) Code Context Predictor (CCP): assumes that instructions in a particular portion of the code will access data in the same fashion.
Cache Noise Predictors
For code context predictors:
• Use the higher-order bits of the PC as the context.
• Store the context along with the cache block.
• Add two bit vectors to each cache block: one identifying the valid words present, one storing the access pattern.
Code Context Predictor (CCP): A Worked Example
Say the PC of an instruction is 1001100100; its code context is the high-order bits, 100110. Each predictor entry holds a code context, the last word-usage history, and a valid bit. Initial table (4-word blocks):

  Context      Last Word Usage History   Valid-Bit
  X (100110)   1 1 0 0                   1
  Y (101001)   1 0 0 1                   1
  Z (xxxxxx)   x x x x                   0

Miss due to PC 1001100100: its context 100110 matches entry X, so only the 1st and 2nd words are brought in. The evicted cache block was brought in by a PC with context 101110 and used only its 1st word, so entry Z is filled:

  Context      Last Word Usage History   Valid-Bit
  X (100110)   1 1 0 0                   1
  Y (101001)   1 0 0 1                   1
  Z (101110)   1 0 0 0                   1

Miss due to PC 1011101100: its context 101110 matches entry Z, so only the 1st word is brought in. The evicted block was brought in by a PC with context 101001 and used its 2nd and 4th words, so entry Y's history is updated:

  Context      Last Word Usage History   Valid-Bit
  X (100110)   1 1 0 0                   1
  Y (101001)   0 1 0 1                   1
  Z (101110)   1 0 0 0                   1
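The walk-through above can be simulated with a small context-indexed table. This is a sketch, not the authors' implementation; the class and method names, and the full 10-bit PC used for the second lookup, are our own assumptions:

```python
class CCP:
    """Code Context Predictor: maps a code context (high-order PC
    bits) to the word-usage history of the last block that context
    brought into the cache."""

    def __init__(self, context_bits=6):
        self.context_bits = context_bits
        self.table = {}  # code context -> last word-usage history

    def context(self, pc):
        # Code context = the high-order bits of the binary-string PC.
        return pc[:self.context_bits]

    def predict(self, pc):
        # On a miss: return the predicted usage pattern, or None
        # (fetch the whole block) when this context has no entry.
        return self.table.get(self.context(pc))

    def update(self, pc, usage):
        # On eviction: record which words the evicted block used,
        # indexed by the context of the PC that brought it in.
        self.table[self.context(pc)] = usage


ccp = CCP()
ccp.table = {"100110": "1100", "101001": "1001"}

# Miss by PC 1001100100 -> context 100110 -> fetch words 1 and 2 only.
assert ccp.predict("1001100100") == "1100"
# Evicted block was brought in by a PC with context 101110 and used
# only its 1st word (the low PC bits here are a made-up example).
ccp.update("1011100000", "1000")
assert ccp.predict("1011101100") == "1000"
```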
Predictability of CCP
Predictability = correct predictions / total misses. The no-prediction rate is almost 0%.
For comparison: PCP - 56%, MCP - 67%.
[Figure: CCP predictability (percentage) per benchmark with Int, FP, and overall averages, for 30-, 28-, and 26-bit contexts: CCP(30bits), CCP(28bits), CCP(26bits).]
Improving the Predictability
Miss Initiator Based History (MIBH): keep word-usage histories based on the offset of the word that initiated the miss.
ORing Previous Two Histories (OPTH): predict the bitwise OR of the past two histories.
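A sketch of how the two refinements could combine (the class and method names are ours): MIBH indexes the table by (context, miss-initiator word offset) instead of by context alone, and OPTH predicts the bitwise OR of the last two recorded histories, so a word used in either recent use of the context is still fetched:

```python
class RefinedCCP:
    """CCP with MIBH indexing and OPTH prediction (illustrative)."""

    def __init__(self):
        # MIBH: key on (code context, miss-initiator word offset),
        # keeping the last two usage histories for each key.
        self.table = {}  # (context, miwo) -> (last, one_before)

    def update(self, context, miwo, usage):
        prev, _ = self.table.get((context, miwo), (0, 0))
        self.table[(context, miwo)] = (usage, prev)

    def predict(self, context, miwo):
        if (context, miwo) not in self.table:
            return None
        last, before = self.table[(context, miwo)]
        # OPTH: OR the past two histories -- fetch any word that was
        # referenced in either of them.
        return last | before


p = RefinedCCP()
p.update("100110", 0, 0b1100)
p.update("100110", 0, 0b1001)
assert p.predict("100110", 0) == 0b1101
```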
Predictability of CCP
With both MIBH and OPTH, the predictability of PCP and MCP was about 68% and 75%, respectively.
[Figure: CCP predictability (percentage) with MIBH, per benchmark with Int, FP, and overall averages: CCP(30bits) - MIBH, CCP(28bits) - MIBH.]
CCP Implementation
[Diagram: the predictor table. Each entry holds a context, a miss initiator word offset (MIWO), a words-usage history, and a valid bit. The table has a read/write port; on a lookup, the broadcast tag is compared (==) against the stored contexts.]
MIWO -- Miss Initiator Word Offset
Experimental Setup
Noise prediction applied to the L1 data cache.
• L1 Dcache: 16KB, 4-way set associative, 32-byte blocks.
• Unified L2 cache: 512KB, 8-way set associative, 64-byte blocks.
• L1 Icache: 16KB, direct mapped.
• ROB: 256 instructions. LSB: 64 entries. Issue queue: 96 Int / 64 FP.
Prediction Accuracies with 32/4, 16/8 & 16/4 CCP
[Figure: correct prediction, misprediction, and no-prediction percentages for the Int, FP, and overall averages, with 32/4, 16/8, and 16/4 CCP configurations.]
RESULTS
[Figure: BASE vs CCP: bandwidth, utilization, IPC, and miss rate.]
[Figure: percentage dynamic energy savings per benchmark (Bzip2, Gcc, Mcf, Parser, Vortex, Vpr, Ammp, Applu, Apsi, Art, Equake, Mgrid, Swim, Wupwise) with Int, FP, and overall averages.]
Prefetching
Processors employ prefetching to improve the cache miss rate:
• Fetch the next cache block on a miss to exploit spatial locality.
The prefetched cache block is predicted to have the same usage pattern as the currently fetched block.
Prefetching
Two policies for prefetched blocks:
• Update: the prefetched cache block updates the predictor table when evicted. It is stored without any context information; whenever it is accessed for the first time, the context and offset information are recorded.
• No Update: the prefetched block does not update the predictor table when evicted.
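The spatial-locality assumption, i.e. prefetching the next block with the same predicted pattern as the demand-missed block, can be sketched as follows (the helper name and parameters are ours):

```python
def blocks_to_fetch(miss_addr, pattern, block_size=32):
    """On a demand miss, fetch the predicted words of the missed
    block, and prefetch the next sequential block with the same
    predicted word-usage pattern."""
    block = miss_addr // block_size
    return [(block, pattern),       # demand-fetched block
            (block + 1, pattern)]   # prefetched block, same pattern

# A miss at address 0x40 with predicted pattern 1100 fetches block 2
# and prefetches block 3, both with pattern 1100.
assert blocks_to_fetch(0x40, "1100") == [(2, "1100"), (3, "1100")]
```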
Prediction Accuracy with Prefetching
Energy consumption reduced by about 22%.
Utilization increased by about 70%.
Miss rate increased by only about 2%.
[Figure: correct prediction, misprediction, and no-prediction percentages for the Int, FP, and overall averages, under No Prefetching, No Update, and Update.]
Sub-blocking
Sub-blocking is used to
• Reduce cache noise
• Reduce the bandwidth requirement
Limitation of sub-blocking:
• Increased miss rate
Can we use cache noise prediction as an alternative to sub-blocking?
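The contrast can be sketched as follows (both helpers are hypothetical): sub-blocking can only fetch a fixed, aligned sub-block around the missed word, while noise prediction can fetch an arbitrary predicted subset of the block:

```python
def subblock_fetch(miss_word, words_per_block=8, subblocks=2):
    """Sub-blocking: fetch only the fixed-size, aligned sub-block
    that contains the missed word (here, half the block)."""
    size = words_per_block // subblocks
    start = (miss_word // size) * size
    return set(range(start, start + size))

def predicted_fetch(pattern):
    """Noise prediction: fetch exactly the predicted words,
    wherever they fall within the block."""
    return {i for i, bit in enumerate(pattern) if bit == "1"}

# A miss on word 1, where the block's real usage is words 1 and 6:
assert subblock_fetch(1) == {0, 1, 2, 3}       # rigid half-block
assert predicted_fetch("01000010") == {1, 6}   # arbitrary subset
```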
Cache Noise Prediction vs Sub-blocking
[Figure: sub-blocking vs CCP: miss rate, utilization, energy savings, and bandwidth.]
Conclusion
Cache noise is highly predictable.
Proposed cache noise predictors.
• CCP achieves a 75% prediction rate, with 97% of predictions correct, using a small 16-entry table.
Prediction has no impact on IPC and minimal impact (0.1%) on the miss rate.
Very effective with prefetching.
Compared to sub-blocking, cache-noise-prediction-based fetching improves
• Miss rate by 97% and utilization by 10%.