1
A Few Subtle Insights About UCP
Moinuddin K. Qureshi
Work on UCP done while at:
2
First Things First
I thank Xing and Rajeev for:
1. Validating that UCP (based on misses) works
2. Re-validating that UCP (based on IPC) is slightly better than one based on misses: ~1%-3%
As mentioned, this is not the 1st (or 2nd, or 3rd, or 4th …) paper to provide this insight
3
Critique 1: UCP(MPKI) ≠ UCP
Consider two apps, A and B, with identical miss rate curves
[Figure: miss-rate curve shared by A and B. x-axis: number of ways in a 4-way cache (0 to 4); y-axis: MPKI]
UCP(MPKI) gives 2 ways to both A and B
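A and B both access the cache once per 100 instructions. Cache hit: 1 cycle; memory: 100 cycles
A has 99 integer ops (1 cycle each): CPI_A = (99 + 1 + MissRatePerc)/100
B has 99 FP ops (10 cycles each): CPI_B = (990 + 1 + MissRatePerc)/100
(MissRatePerc is the miss rate in percent: with one access per 100 instructions and a 100-cycle memory latency, misses add MissRatePerc cycles per 100 instructions)
UCP(MPKC) gives 4 ways to A: IPC_best, WS_best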
UCP(MICRO’06) optimizes perf more than UCP(MPKI)
[Figure: IPC of A and B versus number of ways in a 4-way cache (0 to 4). y-axis: IPC, with ticks at 0.5 and 1.0; curves labeled A and B]
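The example on this slide can be checked numerically. The sketch below is a minimal illustration, not the slide's actual data: the miss curve `MPKI` is a hypothetical convex curve (the plotted values are not recoverable here), and all function names are made up for illustration. Only the CPI formulas and the one-access-per-100-instructions setup come from the slide. It brute-forces every split of the 4 ways rather than modeling any particular UCP hardware.

```python
# ASSUMPTION: hypothetical miss curve shared by A and B (MPKI for 0..4 ways).
# These values are illustrative only; they are not read off the slide's figure.
MPKI = [10.0, 6.0, 3.0, 2.0, 1.5]
WAYS = 4


def miss_rate_perc(ways):
    # 1 access per 100 instructions = 10 accesses per kilo-instruction,
    # so miss rate in percent = MPKI / 10 * 100.
    return MPKI[ways] * 10.0


def ipc_a(ways):
    # CPI_A = (99 + 1 + MissRatePerc) / 100, from the slide.
    return 100.0 / (99 + 1 + miss_rate_perc(ways))


def ipc_b(ways):
    # CPI_B = (990 + 1 + MissRatePerc) / 100, from the slide.
    return 100.0 / (990 + 1 + miss_rate_perc(ways))


def total_misses(a):
    # A gets `a` ways, B gets the remaining ways.
    return MPKI[a] + MPKI[WAYS - a]


def throughput(a):
    return ipc_a(a) + ipc_b(WAYS - a)


def weighted_speedup(a):
    # Each IPC is normalized to the IPC with the whole cache.
    return ipc_a(a) / ipc_a(WAYS) + ipc_b(WAYS - a) / ipc_b(WAYS)


splits = range(WAYS + 1)
best_by_misses = min(splits, key=total_misses)
best_by_ipc = max(splits, key=throughput)
best_by_ws = max(splits, key=weighted_speedup)

print("ways to A, minimizing misses:       ", best_by_misses)
print("ways to A, maximizing IPC sum:      ", best_by_ipc)
print("ways to A, maximizing weighted spd.:", best_by_ws)
```

Under these assumed numbers the miss-minimizing split is 2 ways each, while both performance metrics prefer giving all 4 ways to A, because B's IPC is dominated by its FP ops and barely moves with cache size.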
4
Critique 2: Dynamic can beat Static Optima
5
Critique 3: Not all Misses are Created Equal
[Figure: plot with axes CPI and MPKI]
Problem with Linear CPI Model of Xing
6
UCP: The last 4.5 years …
Things I would have liked to see in literature:
1. Non-Integer Way Partition
2. Utility Based Cache Insertion
3. Prefetch Aware Cache Partition
7
Extension 1: Probabilistic Way Partition
Common criticism of way partitioning: we can only allocate an integer number of ways
A simple way to avoid this is Probabilistic Way Partition.
Say you want to allocate 3.5 ways to application A
Then on a cache miss, consult a random number generator: if Randval > 50% of Randmax, then A gets 4 ways, else 3 ways
On average, A will end up getting 3.5 ways in the cache
Can go finer, say we want to allocate 4.125 ways to B
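A minimal software sketch of the idea, assuming a uniform random source: on each miss the quota is the integer part of the target, plus one extra way with probability equal to the fractional part. The function name `ways_for_this_miss` and the use of Python's `random.Random` are illustrative assumptions, not the hardware mechanism; for the 3.5-way case this is equivalent in probability to the "Randval > 50% of Randmax" test.

```python
import math
import random


def ways_for_this_miss(target_ways, rng):
    """Return the integer way quota to enforce on this miss (hypothetical helper)."""
    base = math.floor(target_ways)
    frac = target_ways - base  # e.g. 0.5 for 3.5 ways, 0.125 for 4.125 ways
    # With probability `frac`, grant one extra way on this miss.
    return base + (1 if rng.random() < frac else 0)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples_a = [ways_for_this_miss(3.5, rng) for _ in range(100_000)]
samples_b = [ways_for_this_miss(4.125, rng) for _ in range(100_000)]

avg_a = sum(samples_a) / len(samples_a)
avg_b = sum(samples_b) / len(samples_b)
print("average quota for A:", avg_a)
print("average quota for B:", avg_b)
```

Each individual decision is still an integer (3 or 4 for A, 4 or 5 for B); only the long-run average approaches the fractional target.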
8
Extension 2: Utility Based Cache Insertion
One can achieve the effect of partitioning by intelligent insertion
In a 16-way cache, a given application A can insert at 16 locations
If N applications share the cache, the decision space is 16^N
An efficient hardware scheme that obtains the best decision in this decision space will outperform both UCP and TADIP
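To get a feel for the size of this decision space, the sketch below computes 16^N for a few values of N. For comparison it also counts integer way partitions, under the assumption (mine, not the slide's) that a partition is any split of the 16 ways into N non-negative integer shares. Function names are illustrative.

```python
from math import comb

WAYS = 16


def insertion_decisions(n_apps):
    # Each application independently picks one of 16 insertion positions.
    return WAYS ** n_apps


def way_partitions(n_apps):
    # ASSUMPTION: shares are non-negative integers summing to WAYS
    # (stars-and-bars count); a real scheme may require at least 1 way each.
    return comb(WAYS + n_apps - 1, n_apps - 1)


for n in (2, 4, 8):
    print(n, "apps:", insertion_decisions(n), "insertion choices,",
          way_partitions(n), "way partitions")
```

The insertion space grows much faster with N than the partition space, which is why the slide asks for an efficient scheme rather than exhaustive search.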
9
Extension 3: Prefetch Aware Partitioning
How does one do partitioning under prefetching?
For applications whose dataset is prefetchable, we may not want to give cache space (even if they have high utility)
In fact, sometimes it's a win-win to give more cache to irregular apps, as it leaves more bandwidth available for prefetching
What is the right way to extend UCP to prefetches?
10
Summary
UCP: Partitioning based on misses works (simple)
Several works have shown that UCP based on IPC works slightly better
There are several extensions of UCP still unexplored. Let me know if you are interested in exploring them
questions/comments: [email protected]