Summary - Adaptive Insertion Policies for High Performance Caching. Qureshi, et al

EECE527 - Paper Summary

Jose Pinilla

Cache Replacement Policies

● Victim Selection Policy
  ○ LRU

● Insertion Policy
  ○ MRU
  ○ LRU

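LRU (Baseline)

LRU replacement (commonly used): the victim is the line in the LRU position, and the incoming line is inserted in the MRU position.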

Belady’s OPT

Optimal page replacement algorithm (changes the victim selection policy): evict the line whose next use is furthest in the future.
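A minimal sketch of OPT victim selection, assuming the whole reference stream is known in advance (which is why OPT is only an offline bound). The function name `simulate_opt` and the example stream are illustrative assumptions, not taken from the slides.

```python
def simulate_opt(refs, capacity):
    """Return the number of hits for Belady's OPT on a known reference stream."""
    cache = set()
    hits = 0
    for i, line in enumerate(refs):
        if line in cache:
            hits += 1
            continue
        if len(cache) == capacity:
            future = refs[i + 1:]

            def next_use(resident):
                # Distance to the next reference, or infinity if never reused.
                return future.index(resident) if resident in future else float("inf")

            # Victim: the resident line whose next use is furthest away.
            cache.remove(max(cache, key=next_use))
        cache.add(line)
    return hits


# Hypothetical reference stream, chosen only for illustration.
refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2]
print(simulate_opt(refs, capacity=3))
```

This sketch always allocates the incoming line; a variant that may bypass the cache is not modelled here.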

LIP (LRU Insertion Policy)

Victim selection stays LRU, but the incoming line is inserted in the LRU position instead of the MRU position. It moves to MRU only when it is hit.
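As a rough illustration of LIP, the sketch below models one cache set as a recency stack. The function name `simulate_lip` and the 4-line cyclic stream are assumptions made for the example.

```python
def simulate_lip(refs, capacity):
    """Return the number of hits for LRU victim selection with LRU insertion."""
    stack = []  # index 0 = MRU position, last index = LRU position
    hits = 0
    for line in refs:
        if line in stack:
            hits += 1
            stack.remove(line)
            stack.insert(0, line)   # a hit promotes the line to MRU
        else:
            if len(stack) == capacity:
                stack.pop()         # victim selection is unchanged: evict LRU
            stack.append(line)      # LIP: the incoming line goes to LRU
    return hits


# A cyclic stream of 4 lines through a 3-line cache, repeated 10 times.
print(simulate_lip([0, 1, 2, 3] * 10, capacity=3))
```

After the first pass, two of the four lines stay resident and hit on every pass, while the other two take turns in the LRU slot.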

[Figure: step-by-step cache contents for an example reference stream under the policies above]

Cyclic Reference Model

for j = 1 to N
    instructions read (a1 ... aT)

for j = 1 to N
    instructions read (b1 ... bT)

Let there be an access pattern in which (a1 ... aT)^N is followed by (b1 ... bT)^N

Cache size K (K < T)

N >> T, N >> K/ϵ
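The model above can be turned into a small experiment. The sketch below builds the two-phase stream and runs it through an LRU cache; the names `cyclic_pattern` and `lru_hits` and the values of T, N and K are assumptions chosen for illustration.

```python
from collections import OrderedDict


def cyclic_pattern(T, N):
    """Build (a1 ... aT)^N followed by (b1 ... bT)^N."""
    a = [f"a{i}" for i in range(1, T + 1)]
    b = [f"b{i}" for i in range(1, T + 1)]
    return a * N + b * N


def lru_hits(refs, K):
    """Count hits for a K-line cache with LRU replacement and MRU insertion."""
    cache = OrderedDict()  # ordered from LRU (first) to MRU (last)
    hits = 0
    for line in refs:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # hit: promote to MRU
        else:
            if len(cache) == K:
                cache.popitem(last=False)  # evict the LRU line
            cache[line] = True             # insert at MRU
    return hits


refs = cyclic_pattern(T=6, N=50)
print(lru_hits(refs, K=4))  # K < T: every line is evicted before its reuse
print(lru_hits(refs, K=6))  # K = T: only the first pass of each phase misses
```

With K < T, LRU evicts each line just before it is referenced again, so the cache thrashes regardless of N.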

Access Pattern: LRU

[Figure sequence: Step 1, Step 2, Step X and Step X > T*N. The streams a1 ... aT and b1 ... bT (T lines each, T*N accesses per stream) pass through a cache of K lines under LRU.]

Access Pattern: LIP

[Figure sequence: Step 1, Step 2 and Step X > T*N. The same streams and cache under LIP, with K-1 lines retained in the cache.]

Bimodal Insertion (BIP)

Control the percentage of incoming lines placed as MRU

ϵ = Bimodal throttle parameter
ϵ = 1 => LRU
ϵ = 0 => LIP
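The insertion decision can be sketched in a few lines. The helper name `bip_insert`, the list-based recency stack and the use of a seeded `random.Random` are assumptions for illustration; real hardware would use a cheaper source of randomness.

```python
import random


def bip_insert(stack, line, capacity, epsilon, rng):
    """Insert a missing line; stack[0] is MRU and stack[-1] is LRU."""
    if len(stack) == capacity:
        stack.pop()              # the victim is still the LRU line
    if rng.random() < epsilon:
        stack.insert(0, line)    # with probability epsilon: MRU insertion
    else:
        stack.append(line)       # otherwise: LRU insertion, as in LIP
    return stack


rng = random.Random(0)
print(bip_insert(["x", "y", "z"], "n", 3, 1.0, rng))  # epsilon = 1 behaves like LRU
print(bip_insert(["x", "y", "z"], "n", 3, 0.0, rng))  # epsilon = 0 behaves like LIP
```

The two calls show the end points of the throttle: the new line lands at the MRU end for ϵ = 1 and at the LRU end for ϵ = 0.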

Access Pattern: BIP

[Figure sequence: the same streams and cache under BIP.]

Hit Rate

Cache size K (K < T)

ϵ = Bimodal throttle parameter (ϵ = 1 => LRU, ϵ = 0 => LIP)

N >> T, N >> K/ϵ
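To make the hit-rate comparison concrete, the sketch below measures hits separately in the a-phase and the b-phase of the cyclic model. The function name `phase_hits`, the values T = 6, K = 4, N = 200 and the choice ϵ = 1/32 for BIP are assumptions for this example, not numbers from the slides.

```python
import random


def phase_hits(T, K, N, epsilon, seed=0):
    """Hits in the a-phase and the b-phase of (a1..aT)^N (b1..bT)^N."""
    rng = random.Random(seed)
    stack = []  # index 0 = MRU position, last index = LRU position
    hits = {"a": 0, "b": 0}
    for phase in ("a", "b"):
        for _ in range(N):
            for i in range(1, T + 1):
                line = (phase, i)
                if line in stack:
                    hits[phase] += 1
                    stack.remove(line)
                    stack.insert(0, line)   # hit: promote to MRU
                    continue
                if len(stack) == K:
                    stack.pop()             # evict the LRU line
                if rng.random() < epsilon:
                    stack.insert(0, line)   # MRU insertion
                else:
                    stack.append(line)      # LRU insertion
    return hits["a"], hits["b"]


T, K, N = 6, 4, 200
for name, eps in (("LRU", 1.0), ("LIP", 0.0), ("BIP", 1 / 32)):
    a_hits, b_hits = phase_hits(T, K, N, eps)
    print(f"{name}: a-phase {a_hits / (T * N):.2f}, b-phase {b_hits / (T * N):.2f}")
```

In this sketch LRU hits in neither phase, LIP keeps K-1 lines of the a stream but never lets go of them once the b stream starts, and BIP's occasional MRU insertions let it adapt to the new working set.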

Benchmarks

mcf, art, health

250M instructions obtained with SimPoint

Results 1

So they proved that it works...

...but don't overdo it (ϵ)...

...actually, let's choose LRU at run time sometimes.

DIP: Select Mechanism

DIP - Global / DSS vs. DIP - Set Dueling

ATD: Auxiliary Tag Directory

MTD: Main Tag Directory

Dedicated sets:

● Selection policy: static or dynamic (+2 bits)

● Size

Run-time adaptation: PSEL values

PSEL >= 512 then BIP

PSEL < 512 then LRU
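A minimal sketch of the policy selector, assuming a 10-bit saturating counter (hence the threshold of 512), that misses in LRU-dedicated sets increment PSEL while misses in BIP-dedicated sets decrement it, and that the competing policies are LRU and BIP as in the summarized paper. The class name, method names and the midpoint starting value are hypothetical.

```python
PSEL_BITS = 10
PSEL_MAX = (1 << PSEL_BITS) - 1         # 1023
PSEL_THRESHOLD = 1 << (PSEL_BITS - 1)   # 512, i.e. the most significant bit


class PolicySelector:
    def __init__(self):
        self.psel = PSEL_THRESHOLD      # assumption: start at the midpoint

    def record_miss(self, dedicated_to):
        """Update PSEL on a miss in a dedicated set ('lru' or 'bip')."""
        if dedicated_to == "lru":
            self.psel = min(self.psel + 1, PSEL_MAX)  # LRU is doing worse
        elif dedicated_to == "bip":
            self.psel = max(self.psel - 1, 0)         # BIP is doing worse

    def follower_policy(self):
        """Policy applied to all sets that are not dedicated."""
        return "bip" if self.psel >= PSEL_THRESHOLD else "lru"


selector = PolicySelector()
for _ in range(10):
    selector.record_miss("bip")
print(selector.psel, selector.follower_policy())
for _ in range(2000):
    selector.record_miss("lru")
print(selector.psel, selector.follower_policy())
```

Because the counter saturates, a long run of misses under one policy cannot push PSEL so far that the selector takes arbitrarily long to switch back.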

Hardware advantages

● LIP, BIP and DIP are similar to current LRU approximations

● DIP does not require extra bits in the tag-store entry

● No major logic overhead means the cache access time is unaffected

Related Work

R: Random, N: Random from the less recent half, F: Frequently

● Bypass

● Early Eviction

● Dynamic Exclusion

Remarks

Retain some fraction of the working set

Dynamically adapt to workloads and patterns

Low overhead (Set dueling)

Questions?

What would be the behaviour if DIP used ATDs dedicated to LRU and LIP?

● Compare Amean

Dynamic ϵ

● Can ϵ be extracted from PSEL?

References

“Cache Replacement with Dynamic Exclusion”. Scott McFarling.

“Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching”. Qureshi et al.

“Using SimPoint for Accurate and Efficient Simulation”. Perelman et al.

“Adaptive Caching for High-Performance Memory Systems”. PhD Dissertation. Qureshi.

McFarling: Conflict Between Loops

for i = 1 to 10
    for j = 1 to 10
        instruction a
    for j = 1 to 10
        instruction b

(Subscript m = miss, h = hit; exponents are repeat counts.)

*(a^10 b^10)^10 = 0%

(a_m a_h^9 b_m b_h^9)^10 = 10%

* ignoring loop

Source: “Cache Replacement with Dynamic Exclusion”. Scott McFarling

McFarling: Conflict Between Loop Levels

for i = 1 to 10
    for j = 1 to 10
        instruction a
    instruction b

Direct-mapped: (a_m a_h^9 b_m)^10 = 18%

Optimal: a_m a_h^9 b_m (a_h^10 b_m)^9 = 10%

Source: “Cache Replacement with Dynamic Exclusion”. Scott McFarling

McFarling: Conflict within Loops

for i = 1 to 10
    instruction a
    instruction b

Direct-mapped: (a_m b_m)^10 = 100%

Optimal: a_m b_m (a_h b_m)^9 = 55%

Source: “Cache Replacement with Dynamic Exclusion”. Scott McFarling
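The miss rates quoted in the three McFarling examples can be checked with a short sketch. It assumes instructions a and b map to the same single cache line, and it models the "optimal" cases by never caching b (an exclusion-style choice). The function name `miss_rate` and its `cacheable` parameter are hypothetical.

```python
def miss_rate(refs, cacheable=("a", "b")):
    """Miss rate for one cache line shared by conflicting instructions."""
    resident = None
    misses = 0
    for ref in refs:
        if ref != resident:
            misses += 1
            if ref in cacheable:
                resident = ref   # allocate on miss only if allowed
    return misses / len(refs)


between_loops = (["a"] * 10 + ["b"] * 10) * 10   # (a^10 b^10)^10
between_levels = (["a"] * 10 + ["b"]) * 10       # (a^10 b)^10
within_loop = ["a", "b"] * 10                    # (a b)^10

# Direct-mapped: always replace the resident instruction on a miss.
print(round(100 * miss_rate(between_loops)))     # 10
print(round(100 * miss_rate(between_levels)))    # 18
print(round(100 * miss_rate(within_loop)))       # 100

# Keeping a resident and never caching b.
print(round(100 * miss_rate(between_levels, cacheable=("a",))))  # 10
print(round(100 * miss_rate(within_loop, cacheable=("a",))))     # 55
```

Excluding b helps only when a is reused more often than b between conflicts; for the first pattern, plain direct-mapped replacement already misses just once per run of ten.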
