Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Basilio B. Fraguela and Ramón Doallo (Depto. de Electrónica e Sistemas, Universidade da Coruña); Emilio L. Zapata (Depto. de Arquitectura de Computadores, Universidad de Málaga)


Page 1: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Basilio B. Fraguela, Ramón Doallo

Depto. de Electrónica e Sistemas, Universidade da Coruña

Emilio L. Zapata

Depto. de Arquitectura de Computadores, Universidad de Málaga

Page 2: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Introduction

• Increasing gap between processor and memory speeds: a bottleneck for system performance

• Approaches to studying cache behavior:

– Trace-driven simulation: slow, not flexible.

– Built-in hardware counters: not flexible, not portable.

– Modeling: quick and flexible, but with limited precision. Many models require a trace to extract some input parameters.

• We present a systematic modeling strategy that allows a fast analysis with good levels of accuracy.

• It supports set-associative caches with LRU replacement.

Page 3: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Nature of Misses

• Kinds of misses

– Intrinsic/compulsory: first reference to a line

  • There will be one for each different line accessed

– Interference: a non-first reference to a line misses

  • Each attempt to reuse a line results in a miss with a given miss probability

  • This probability depends on the impact on the cache of the memory regions accessed since the last access to the line

  • The portion of code executed between the last access to the line and the new access is called the reuse distance

  • A line may have several reuse distances

  • Each reuse distance has a miss probability, estimated from the memory regions accessed during it
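The two kinds of misses can be told apart by replaying a trace, which is the slow approach the model is meant to avoid but is useful as a reference. The following is a minimal sketch, not part of the original slides: the function name `classify_misses`, the trace format (a list of cache-line numbers) and the modulo mapping of lines to sets are all assumptions made for illustration.

```python
from collections import OrderedDict

def classify_misses(line_trace, num_sets, ways):
    """Replay a trace of cache-line numbers on a set-associative LRU cache.

    Returns the tuple (compulsory, interference, hits).
    """
    sets = [OrderedDict() for _ in range(num_sets)]  # one LRU list per set
    seen = set()  # lines referenced at least once so far
    compulsory = interference = hits = 0
    for line in line_trace:
        lru = sets[line % num_sets]  # assumption: modulo set mapping
        if line in lru:
            hits += 1
            lru.move_to_end(line)  # mark as most recently used
            continue
        if line in seen:
            interference += 1  # non-first reference that misses
        else:
            compulsory += 1  # first reference to this line
            seen.add(line)
        lru[line] = True
        if len(lru) > ways:
            lru.popitem(last=False)  # evict the least recently used line
    return compulsory, interference, hits
```

For example, with the trace `[0, 2, 0]` and two sets, lines 0 and 2 share a set: a direct-mapped cache (`ways=1`) sees the reuse of line 0 as an interference miss, while a 2-way cache turns it into a hit.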

Page 4: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Miss Estimation

• The misses generated by a reference in a loop may be estimated by a formula that contains

– The number of different lines it accesses

– The number of line reuses it gives rise to for each possible reuse distance

– The miss probability for each of the reuse distances

• Fourth factor (external): miss probability in the first access to each line by the reference

• The formula is different for each nesting level (loop) enclosing the reference
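One way to read the four factors together is as a weighted sum. The form below is an illustrative summary rather than a formula taken from the slides, and the symbols $N_{\text{lines}}$, $N_{\text{reuse}}(d)$ and $P_{\text{miss}}(d)$ are labels introduced here for the three listed factors, with $p$ standing for the external first-access miss probability:

```latex
\mathrm{Misses}_R(p) \;\approx\; N_{\text{lines}} \cdot p \;+\; \sum_{d} N_{\text{reuse}}(d) \cdot P_{\text{miss}}(d)
```

The sum runs over the distinct reuse distances $d$ of the reference, so a reference whose reuses all happen at a single distance contributes just one term to it.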

Page 5: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Example

• Assume 10 elements of A per cache line

DO J=1,5
  DO I=1,100
    A(I) = I + J
  END DO
END DO

$$\mathrm{Misses}_{AI}(p) = 10 \cdot p + 90 \cdot 0$$

$$\mathrm{Misses}_{AJ}(p) = 1 \cdot \mathrm{Misses}_{AI}(p) + 4 \cdot \mathrm{Misses}_{AI}(P(A(1{:}100)))$$

• Inner loop: 10 different lines, 90 (sure) reuses

• Outer loop: 1 first-time iteration, 4 reuses
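As a hedged worked evaluation of the two formulas, assume a cold cache on entry ($p = 1$). The value of $P(A(1{:}100))$ depends on the cache and is not given on the slide, so only its two extremes are shown:

```latex
\begin{align*}
\mathrm{Misses}_{AI}(1) &= 10 \cdot 1 + 90 \cdot 0 = 10 \\
P(A(1{:}100)) = 0:\quad \mathrm{Misses}_{AJ}(1) &= 1 \cdot 10 + 4 \cdot (10 \cdot 0 + 90 \cdot 0) = 10 \\
P(A(1{:}100)) = 1:\quad \mathrm{Misses}_{AJ}(1) &= 1 \cdot 10 + 4 \cdot (10 \cdot 1 + 90 \cdot 0) = 50
\end{align*}
```

Under these assumptions the 500 accesses to A produce between 10 and 50 misses, depending on whether A(1:100) survives in the cache from one J iteration to the next.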

Page 6: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Initial Modeling Scope

DO I0=1, N0, L0
  DO I1=1, N1, L1
    ...
    DO IZ=1, NZ, LZ
      A(fA1(IA1), fA2(IA2), ..., fAdA(IAdA))
      ...
      B(fB1(IB1), fB2(IB2), ..., fBdB(IBdB))
      ...
    END DO
    ...
    C(fC1(IC1), fC2(IC2), ..., fCdC(ICdC))
    ...
  END DO
END DO

Each index is an affine function of a loop variable:

$$f_{Ax}(I_{Ax}) = \alpha_{Ax} I_{Ax} + \delta_{Ax}$$

for each of the $d_A$ dimensions $x$ of array A.

Page 7: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Probabilistic Miss Equation

• Simplest PME form

$$F_{Ri}(p) = L_{Ri}\, F_{R(i+1)}(p) + (N_i - L_{Ri})\, F_{R(i+1)}(P_R(It(i,1)))$$

where

$$L_{Ri} = 1 + \left\lfloor \frac{N_i - 1}{\max\{L_s / S_{Ri},\, 1\}} \right\rfloor$$

is the number of different lines accessed by R during the execution of the loop in nesting level i, $L_s$ is the line size, and $S_{Ri}$ is the stride of the reference in the loop.
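The simplest PME form is a recursion over nesting levels, and a small sketch may help to see how it unfolds. The code below is an illustration under stated assumptions, not the authors' implementation: the names `lines_accessed` and `pme` are hypothetical, the base case (past the innermost loop the reference performs one access that misses with probability `p`) is assumed, a stride of 0 is assumed to touch a single line, and the reuse probabilities $P_R(It(i,1))$ are passed in as plain numbers because their computation from area vectors is a separate step.

```python
import math

def lines_accessed(n_iter, line_size, stride):
    """L_Ri: distinct lines touched in n_iter iterations (sizes in elements)."""
    if stride == 0:
        return 1  # assumption: the loop does not move the reference
    return 1 + math.floor((n_iter - 1) / max(line_size / stride, 1))

def pme(level, p, n_iters, n_lines, reuse_prob):
    """Evaluate F_Ri(p) for the simplest PME form.

    n_iters[i] and n_lines[i] hold N_i and L_Ri for nesting level i, and
    reuse_prob[i] stands in for P_R(It(i,1)).
    """
    if level == len(n_iters):
        return p  # assumed base case: a single access with miss probability p
    first = pme(level + 1, p, n_iters, n_lines, reuse_prob)
    reuse = pme(level + 1, reuse_prob[level], n_iters, n_lines, reuse_prob)
    return n_lines[level] * first + (n_iters[level] - n_lines[level]) * reuse
```

For the A(I) example with 10 elements per line, `n_iters = [5, 100]` and `n_lines = [lines_accessed(5, 10, 0), lines_accessed(100, 10, 1)]` gives `[1, 10]`. With a cold cache (`p = 1.0`) and a hypothetical outer-loop reuse probability of 0.5, `pme(0, 1.0, n_iters, n_lines, [0.5, 0.0])` evaluates to 10 + 4 · 5 = 30 misses.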

Page 8: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

PMEs: Not so fast!

• The previous PME is only valid for references that carry no reuse with other references

• Our model accurately takes into account the potential reuse between references in translation when building the PMEs

• Reuse among references that are not in translation is not currently modeled

Page 9: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Reuse Among Different Nests

• Same strategy for loops in levels up to j

• The number of misses in loops jk, k>0 is a constant

– the initial miss probability is estimated in a conservative way, considering the whole execution of the preceding loop

• In loop j0 this probability is an external parameter, except for reuse iterations

• This only works well if several conditions hold

DO Ij-1=1, Nj-1, Sj-1
  DO Ij0=1, Nj0, Sj0
    ...
    A(fA01(IA01), ..., fA0dA(IA0dA))
    ...
  END DO
  ...
  DO Ij1=1, Nj1, Sj1
    ...
    A(fA11(IA11), ..., fA1dA(IA1dA))
    ...
  END DO
  ...
  DO Ijn=1, Njn, Sjn
    ...
    A(fAn1(IAn1), ..., fAndA(IAndA))
    ...
  END DO
END DO

Page 10: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Miss Probability: Basics (I)

• Miss probability depends on the impact on the cache of the memory regions accessed since the last access to the line to reuse

“In a K-way set with LRU replacement policy, a given line is replaced when K or more different lines mapped to its same cache set have been referenced since its last access”
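The LRU rule quoted on this slide can be applied directly to a single reuse, without simulating the whole cache. The sketch below is illustrative only: the function name `survives_in_cache` and the modulo placement of lines in sets are assumptions, and lines are identified by integer line numbers.

```python
def survives_in_cache(line, intervening_lines, num_sets, ways):
    """Apply the K-way LRU rule to one reuse of `line`.

    intervening_lines holds the line numbers referenced since the last
    access to `line`. Returns True if the reuse is a hit.
    """
    target_set = line % num_sets  # assumption: modulo placement of lines
    # Only distinct lines mapped to the same set compete with `line`.
    competitors = {other for other in intervening_lines
                   if other != line and other % num_sets == target_set}
    return len(competitors) < ways
```

With 4 sets and 2 ways, a reuse of line 0 after references to lines 4, 8 and 4 misses, because two different lines (4 and 8) fell in its set. After references to lines 4, 4 and 5 it hits, since only line 4 competes.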

Page 11: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Miss Probability: Basics (II)

• Miss probability = probability that K or more lines have been mapped to the set of the line to reuse during the reuse distance

• Notice that the ratio of cache sets that have X lines is also the probability that a given cache set has X lines

• We need a way to represent the distribution of the number of lines assigned to each set during the reuse distance

Page 12: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Area Vectors

• Associated with data structure V we have the area vector

$$S_V = (S_{V0}, S_{V1}, \ldots, S_{VK})$$

where $S_{V0}$ is the portion of sets that have received K or more lines from V, and $S_{Vi}$, $0 < i \le K$, is the portion of sets that have received $K - i$ lines.

• The area vector for each data structure is calculated separately as a function of its access pattern
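To make the indexing of an area vector concrete, the sketch below builds one from a known count of lines per cache set. This is an illustration of the definition only: the function name `area_vector` and the per-set count list used as input are assumptions, whereas the model itself derives area vectors analytically from access patterns.

```python
from collections import Counter

def area_vector(lines_per_set, ways):
    """Build S_V = (S_V0, ..., S_VK) from the lines received by each set."""
    num_sets = len(lines_per_set)
    # Position i counts sets with K - i lines; position 0 collects every
    # set that received K or more lines.
    tally = Counter(max(ways - n, 0) for n in lines_per_set)
    return [tally[i] / num_sets for i in range(ways + 1)]
```

For a 2-way cache with four sets that received 0, 1, 2 and 3 lines, `area_vector([0, 1, 2, 3], 2)` gives `[0.5, 0.25, 0.25]`: half of the sets are full, a quarter hold one line, and a quarter are empty.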

Page 13: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Area Vector Example

Page 14: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Miss Probability Computation

Page 15: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Interference Area Vectors Calculation

• The references are analyzed in each nesting level i to count the number of points accessed in each dimension d ($N_{Rid}$) and the distance between each two of them ($L_{Rid}$)

• The region accessed may be described as the tuple

$$\mathcal{R}_{Ri} = ((N_{Ri1}, L_{Ri1}), (N_{Ri2}, L_{Ri2}), \ldots, (N_{Rid_A}, L_{Rid_A}))$$

• In general this region has the shape of either a sequential access or an access to groups of consecutive elements separated by a constant stride.

• These two access patterns (and others) have been modeled for the calculation of their corresponding interference area vectors
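For a one-dimensional region, the pair (N, L) is enough to enumerate the lines touched and see how they spread over the cache sets. The sketch below does this by brute force and is not the analytical method of the model: the function name `lines_per_set`, the `base` offset, the measurement of distances in array elements and the modulo set mapping are all assumptions made for illustration.

```python
def lines_per_set(n_points, stride, line_size, num_sets, base=0):
    """Count the distinct cache lines a 1-D region (N, L) places in each set.

    n_points elements are accessed, `stride` elements apart, starting at
    element offset `base`; line_size is given in elements.
    """
    lines = {(base + k * stride) // line_size for k in range(n_points)}
    counts = [0] * num_sets
    for line in lines:
        counts[line % num_sets] += 1  # assumption: modulo set mapping
    return counts
```

A sequential access to 100 elements with 10 elements per line spreads evenly over 4 sets (`[3, 3, 2, 2]`), while 5 points with a stride of 40 elements all land in the same set (`[5, 0, 0, 0]`), which is the kind of difference the interference area vectors are meant to capture.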

Page 16: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Region Examples

Page 17: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

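Area vectors union

• As independent probabilities: given two area vectors $S_U = (S_{U0}, S_{U1}, \ldots, S_{UK})$ and $S_V = (S_{V0}, S_{V1}, \ldots, S_{VK})$, the area vector resulting from their addition, $S_U \cup S_V$, is defined as:

$$(S_U \cup S_V)_0 = \sum_{j=0}^{K} \left( S_{Uj} \sum_{i=0}^{K-j} S_{Vi} \right)$$

$$(S_U \cup S_V)_i = \sum_{j=i}^{K} S_{Uj}\, S_{V(K+i-j)}, \quad 0 < i \le K$$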
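The union of two area vectors treats the number of lines each structure contributes to a set as independent, so a set with K - j lines from U and K - m lines from V ends up with 2K - j - m lines. A minimal sketch of that combination follows; the function name `union` is hypothetical, and the vectors are plain Python lists of length K + 1 indexed as in the area vector definition.

```python
def union(su, sv):
    """Combine two area vectors of length K + 1, assuming independence."""
    k = len(su) - 1
    result = [0.0] * (k + 1)
    # Position 0: sets that end up with K or more lines in total.
    result[0] = sum(su[j] * sum(sv[: k - j + 1]) for j in range(k + 1))
    # Position i > 0: sets that end up with exactly K - i lines.
    for i in range(1, k + 1):
        result[i] = sum(su[j] * sv[k + i - j] for j in range(i, k + 1))
    return result
```

In a direct-mapped cache (K = 1), two structures that each fill half of the sets give `union([0.5, 0.5], [0.5, 0.5]) == [0.75, 0.25]`: a set stays empty only if neither structure touched it. Read together with the definition of miss probability as the probability that K or more lines reached a set, position 0 of the combined vector is the natural candidate for the miss probability of a reuse.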

Page 18: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Consideration of the Relative Positions

• For each pair of data structures A and B, their overlapping coefficient Sol(A,B) is calculated: portion of cache sets that may contain lines belonging to both structures.

• Before adding the interference area vector generated by one of them in order to calculate the miss probability in the accesses to the other one, it is scaled using this factor

• If both references are sequential and are in translation (their indices only differ in added constants), a simple algorithm with total precision is applied

Page 19: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Memory Performance Analysis Tool: MEPAT

• The model was integrated in Polaris

– FORTRAN codes with references with affine indexes can be analyzed

– Predicts the behavior of caches with an arbitrary size, line size and associativity

– Complemented with the Delphi CPU model

– Optimization module: optimal tile size selection

Page 20: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

MEPAT Structure

Page 21: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Validation with SPECfp95 and Perfect Benchmarks

Page 22: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Prediction vs Measurement in SPECfp95

Page 23: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Prediction vs Measurement in Perf. Bench.

Page 24: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Typical Miss Ratio Errors

Page 25: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Modeling Times (O200, R10000 180MHz)

Additional 0.2 to 2.5 seconds for syntactical analysis, etc.

Page 26: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Prediction vs HW Counters for Blocked Matrix Product (I)

Page 27: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Prediction vs HW Counters for Blocked Matrix Product (II)

Page 28: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Optimal Tile Size Search in the SPECfp95 and Perf. Bench. Codes

• Uses memory model + Delphi CPU model

• Environment: Origin 200 with R10000

– Processor parameters known for Delphi

– Good compiler: MIPSpro 7.3.1.1m

• Objective: generate code faster than that of the production compiler by replacing the tile sizes it has chosen with those proposed by the model

Page 29: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Problems in the Experiment

• In these codes only the last row and/or column is reused, rather than the whole tile

• Base addresses of the data structures are not available

• The data sets of several of the codes fit in the second level cache (1 MB)

• The execution time of some of the loops modified is too small to be meaningful, so the whole application was measured

Page 30: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

MEPAT vs MIPSpro

Page 31: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Related Work

• Other models:

– Ghosh et al.: CME (Cache Miss Equations), linear Diophantine equations

– Vera and Xue: statistical sampling of CMEs

– Chatterjee et al.: Presburger formulae

– Harper et al.: cache footprints

• Other prediction tools:

– Delphi

– SPLAT

Page 32: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Conclusions

• General strategy for the modeling of the memory hierarchy behavior

• Good precision: average miss ratio prediction error about 0.1%

• Very fast: milliseconds for SPECfp95 codes

• Complemented with CPU model to predict real execution times

• Competitive with a good production compiler

Page 33: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Current / Future Work

• Trying/optimizing on different platforms

– Successful Pentium IV experiments

• Extension to model codes with conditionals

• Further extension to model indirections

– SIGMETRICS’98, Europar’98

Page 34: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Application to tile size selection: Pentium 4 @ 2GHz

Page 35: Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

Application to tile size selection: Itanium 2 @ 1.5GHz