
Page 1: Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
Electrical and Computer Engineering, North Carolina State University
HPCA-2005

Page 2: Cache Sharing in CMP

[Figure: CMP cache hierarchy. Processor Core 1 and Processor Core 2 (and possibly more cores) each have a private L1 cache and share a unified L2 cache.]

Page 3: Impact of Cache Space Contention

[Figure: mcf's L2 cache misses (normalized, 0% to 400%) when running alone vs. co-scheduled as mcf+art, mcf+swim, mcf+mst, and mcf+gzip.]

Cache sharing impact is application-specific (what) and coschedule-specific (when)
Significant: up to 4X cache misses, 65% IPC reduction

Need a model to understand cache sharing impact

[Figure: mcf's normalized IPC (0% to 100%) when running alone vs. co-scheduled as mcf+art, mcf+swim, mcf+mst, and mcf+gzip.]

Page 4: Related Work

Uniprocessor miss estimation:
Cascaval et al., LCPC 1999; Chatterjee et al., PLDI 2001; Fraguela et al., PACT 1999; Ghosh et al., TOPLAS 1999; J. Lee et al., HPCA 2001; Vera and Xue, HPCA 2002; Wassermann et al., SC 1997

Context switch impact on time-shared processors:
Agarwal, ACM Trans. on Computer Systems, 1989; Suh et al., ICS 2001

No model for cache sharing impact:
A relatively new phenomenon (SMT, CMP) with many possible access interleaving scenarios

Page 5: Contributions

Inter-thread cache contention models:
Two heuristic models (refer to the paper) and one analytical model
Input: a circular sequence profile for each thread
Output: predicted number of cache misses per thread in a co-schedule

Validation:
Against a detailed CMP simulator; 3.9% average error for the analytical model

Insight:
Temporal reuse patterns determine the impact of cache sharing

Page 6: Outline

Model Assumptions
Definitions
Inductive Probability Model
Validation
Case Study
Conclusions

Page 7: Outline

Page 8: Assumptions

One circular sequence profile per thread:
An average profile yields high prediction accuracy; a phase-specific profile may improve accuracy further

LRU replacement algorithm:
Other replacement policies are usually LRU approximations

Threads do not share data:
Mostly true for serial applications; for parallel applications, threads are likely to be impacted uniformly

Page 9: Outline

Page 10: Definitions

seqX(dX,nX) = a sequence of nX accesses to dX distinct addresses by a thread X to the same cache set
cseqX(dX,nX) (circular sequence) = a sequence in which the first and the last accesses are to the same address

Example: the access sequence A B C D A E E B to one cache set forms seq(5,8); within it, A B C D A is cseq(4,5), E E is cseq(1,2), and B C D A E E B is cseq(5,7).
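To make the profiling concrete, here is a minimal sketch in Python (not code from the paper; the function name and trace format are illustrative assumptions) that counts circular sequences in a per-set address trace:

```python
from collections import defaultdict

def circular_sequence_profile(trace_per_set):
    """Count circular sequences cseq(d, n) in per-set access traces.

    trace_per_set: dict mapping a cache-set index to the ordered list of block
    addresses accessed in that set.  Returns F, where F[(d, n)] is the number
    of circular sequences with d distinct addresses and n total accesses.
    """
    F = defaultdict(int)
    for addresses in trace_per_set.values():
        last_seen = {}                      # address -> index of its most recent access
        for i, addr in enumerate(addresses):
            if addr in last_seen:
                window = addresses[last_seen[addr]:i + 1]   # the circular sequence
                F[(len(set(window)), len(window))] += 1     # (distinct, total)
            last_seen[addr] = i
    return F

# The slide's example: one cache set accessed as A B C D A E E B
print(dict(circular_sequence_profile({0: list("ABCDAEEB")})))
# -> {(4, 5): 1, (1, 2): 1, (5, 7): 1}, i.e. cseq(4,5), cseq(1,2), cseq(5,7)
```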

Page 11: Circular Sequence Properties

Thread X runs alone in the system:
Given a circular sequence cseqX(dX,nX), its last access is a cache miss iff dX > Assoc

Thread X shares the cache with thread Y:
If a sequence of intervening accesses seqY(dY,nY) occurs during cseqX(dX,nX)'s lifetime, the last access of thread X is a miss iff dX + dY > Assoc

Page 12: Example

Assume a 4-way associative cache.
X's circular sequence: A B A, i.e. cseqX(2,3)
Y's intervening access sequence during cseqX's lifetime: U V V W

No cache sharing: the last access to A is a cache hit (dX = 2 <= Assoc)
With cache sharing: is the last access to A a hit or a miss?

Page 13: Example (continued)

Assume a 4-way associative cache, with X's circular sequence A B A (cseqX(2,3)) and Y's intervening accesses U V V W.

Interleaving A U B V V A W: only seqY(2,3) = U V V intervenes in cseqX's lifetime, so dX + dY = 2 + 2 <= Assoc and the last access to A is a cache hit.
Interleaving A U B V V W A: seqY(3,4) = U V V W intervenes in cseqX's lifetime, so dX + dY = 2 + 3 > Assoc and the last access to A is a cache miss.
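As a sanity check, the two interleavings can be classified with the circular-sequence property from Page 11; the following is an assumed Python helper, not code from the paper:

```python
def last_access_misses(dX, dY, assoc):
    """Sharing property from Page 11: the last access of cseqX(dX, nX) is a
    miss iff dX + dY > assoc, where seqY(dY, nY) intervenes in its lifetime."""
    return dX + dY > assoc

ASSOC = 4  # 4-way associative cache
# A U B V V A W: only U V V (seqY(2,3)) intervenes before the last A
print(last_access_misses(dX=2, dY=2, assoc=ASSOC))   # False (cache hit)
# A U B V V W A: U V V W (seqY(3,4)) intervenes before the last A
print(last_access_misses(dX=2, dY=3, assoc=ASSOC))   # True  (cache miss)
```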

Page 14: Outline

Page 15: Inductive Probability Model

For each cseqX(dX,nX) of thread X, compute Pmiss(cseqX): the probability that its last access is a miss.

Steps:
1. Compute E(nY): the expected number of intervening accesses from thread Y during cseqX's lifetime
2. For each possible dY, compute P(seq(dY, E(nY))): the probability of occurrence of seq(dY, E(nY))
3. If dX + dY > Assoc, add P(seq(dY, E(nY))) to Pmiss(cseqX)

Misses = old_misses + Σ Pmiss(cseqX) × F(cseqX)
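A rough sketch of this outer loop in Python, treating E(nY) and P(seq(dY, E(nY))) as supplied callbacks (their derivations are detailed in the paper); all names here are illustrative assumptions, not the authors' code:

```python
def predict_extra_misses(profile_X, assoc, expected_nY, prob_seqY):
    """Sketch of the model's outer loop.

    profile_X   : dict {(dX, nX): count}, thread X's circular sequence profile F
    assoc       : cache associativity
    expected_nY : callable (dX, nX) -> E(nY), expected number of thread Y accesses
                  intervening during cseqX(dX, nX)'s lifetime
    prob_seqY   : callable (dY, nY) -> P(seq(dY, nY)), probability that Y's
                  intervening accesses form seq(dY, nY)
    Returns the predicted number of additional misses for thread X, i.e. the
    sum over cseqX of Pmiss(cseqX) * F(cseqX), to be added to old_misses.
    """
    extra_misses = 0.0
    for (dX, nX), count in profile_X.items():
        if dX > assoc:
            continue                      # last access already misses when X runs alone
        nY = int(round(expected_nY(dX, nX)))   # E(nY), rounded for this sketch
        # Pmiss(cseqX): sum P(seq(dY, E(nY))) over all dY with dX + dY > Assoc
        p_miss = sum(prob_seqY(dY, nY) for dY in range(assoc - dX + 1, nY + 1))
        extra_misses += p_miss * count    # weight by F(cseqX)
    return extra_misses
```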

Page 16: Computing P(seq(dY, E(nY)))

Basic idea: P(seq(d,n)) = A × P(seq(d-1,n-1)) + B × P(seq(d,n-1)), where A and B are transition probabilities.
seq(d,n) is reached either from seq(d-1,n-1) by one more access to a distinct address, or from seq(d,n-1) by one more access to a non-distinct address.
Detailed steps are in the paper.
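One possible memoized realization of this recurrence (a Python sketch; the transition probability p_distinct, the base case, and the guard conditions are assumptions here, whereas the paper derives the transition probabilities from the interfering thread's own circular sequence profile):

```python
from functools import lru_cache

def make_prob_seq(p_distinct):
    """Memoized sketch of the P(seq(d, n)) recurrence.

    p_distinct(d, n): assumed input giving the probability that, after a
    partial sequence seq(d, n), the next interfering access touches a new
    distinct address.
    """
    @lru_cache(maxsize=None)
    def prob_seq(d, n):
        if d < 1 or n < d:
            return 0.0                    # impossible sequence shape
        if d == 1 and n == 1:
            return 1.0                    # base case: a single access
        p = 0.0
        if d > 1:
            # reached from seq(d-1, n-1) by one access to a distinct address
            p += p_distinct(d - 1, n - 1) * prob_seq(d - 1, n - 1)
        if n - 1 >= d:
            # reached from seq(d, n-1) by one access to a non-distinct address
            p += (1.0 - p_distinct(d, n - 1)) * prob_seq(d, n - 1)
        return p

    return prob_seq

# Illustrative use with a constant transition probability of 0.3:
# prob_seq = make_prob_seq(lambda d, n: 0.3)
# prob_seq(3, 5)  # probability that 5 intervening accesses touch 3 distinct addresses
```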

Page 17: Outline

Page 18: Validation

SESC simulator: detailed CMP + memory hierarchy
14 co-schedules of benchmarks (SPEC2K and Olden); a co-schedule is terminated when one application completes

CMP cores:    2 cores, each 4-issue dynamic, 3.2 GHz
Base memory:  L1 I/D (private): each WB, 32 KB, 4-way, 64 B line
              L2 unified (shared): WB, 512 KB, 8-way, 64 B line, LRU replacement

Page 19: Validation

Co-schedule     Benchmark   Actual Miss Increase   Prediction Error
gzip + applu    gzip        243%                   -25%
                applu       11%                    2%
gzip + apsi     gzip        180%                   -9%
                apsi        0%                     0%
mcf + art       mcf         296%                   7%
                art         0%                     0%
mcf + gzip      mcf         18%                    7%
                gzip        102%                   22%
mcf + swim      mcf         59%                    -7%
                swim        0%                     0%

Error = (PM - AM) / AM, where PM is the predicted and AM the actual number of misses.

Larger errors occur when the miss increase is very large; overall, the model is accurate.

Page 20: Other Observations

Based on how vulnerable applications are to the cache sharing impact:
Highly vulnerable: mcf, gzip
Not vulnerable: art, apsi, swim
Somewhat / sometimes vulnerable: applu, equake, perlbmk, mst

Prediction error:
Very small, except for highly vulnerable apps: 3.9% (average), 25% (maximum)
Also small across different cache associativities and sizes

Page 21: Outline

Page 22: Case Study

Profile approximated by a geometric progression:
F(cseq(1,*)) = Z, F(cseq(2,*)) = Zr, F(cseq(3,*)) = Zr², …, i.e. F(cseq(d,*)) = Z·r^(d-1)
Z = amplitude; 0 < r < 1 = common ratio; a larger r means a larger working set
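A small sketch of this synthetic profile in Python (the function name and the Z, r, and max_d values are illustrative assumptions, not from the paper):

```python
def geometric_profile(Z, r, max_d):
    """Case-study profile: F(cseq(d, *)) = Z * r**(d - 1) for d = 1 .. max_d.

    Z is the amplitude and r (0 < r < 1) the common ratio; a larger r means the
    thread touches more distinct addresses between reuses, i.e. a larger
    working set.
    """
    assert 0 < r < 1
    return {d: Z * r ** (d - 1) for d in range(1, max_d + 1)}

small_ws = geometric_profile(Z=1000, r=0.5, max_d=16)   # small working set (r = 0.5)
large_ws = geometric_profile(Z=1000, r=0.9, max_d=16)   # large working set (r = 0.9)
```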

Question: what is the impact of the interfering thread on the base thread? Fix the base thread and vary the interfering thread's:
Miss frequency = # misses / time
Reuse frequency = # hits / time

Page 23: Base Thread: r = 0.5 (Small Working Set)

The base thread is not vulnerable to the interfering thread's miss frequency, but is vulnerable to its reuse frequency.

[Figure: the base thread's L2 cache misses as the interfering thread's miss frequency or reuse frequency is multiplied by a factor of 1 to 4.]

Page 24: Base Thread: r = 0.9 (Large Working Set)

The base thread is vulnerable to both the interfering thread's miss frequency and its reuse frequency.

[Figure: the base thread's L2 cache misses as the interfering thread's miss frequency or reuse frequency is multiplied by a factor of 1 to 4.]

Page 25: Outline

Page 26: Conclusions

New inter-thread cache contention models
Simple to use: input is a circular sequence profile per thread; output is the number of misses per thread in co-schedules
Accurate: 3.9% average error
Useful: temporal reuse patterns explain the cache sharing impact

Future work: predict and avoid problematic co-schedules; release the tool at http://www.cesr.ncsu.edu/solihin