ISPASS 2011. Characterizing Multi-threaded Applications based on Shared-Resource Contention. Tanima Dey, Wei Wang, Jack W. Davidson, Mary L. Soffa. Department of Computer Science, University of Virginia.

Characterizing Multi-threaded Applications based on Shared-Resource Contention


Page 1: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

ISPASS 2011

Characterizing Multi-threaded Applications based on Shared-Resource Contention

Tanima Dey, Wei Wang, Jack W. Davidson, Mary L. Soffa

Department of Computer Science, University of Virginia

Page 2: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Motivation

  • The number of cores doubles every 18 months
  • Expected: performance scales with the number of cores
  • One of the bottlenecks is shared-resource contention
  • For multi-threaded workloads, contention is unavoidable
  • To reduce contention, it is necessary to understand where and how the contention is created

Page 3: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Shared Resource Contention in Chip-Multiprocessors

[Figure: Intel Quad Core Q9550. Cores C0 to C3, each with an L1 cache; two L2 caches; a Front-Side Bus connecting to Memory. Legend: Application 1 thread, Application 2 thread.]

Page 4: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Scenario 1: Multi-threaded applications with co-runner

[Figure: cores C0 to C3, each with an L1 cache; two L2 caches; Memory. Legend: Application 1 thread, Application 2 thread.]

Page 5: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Scenario 2: Multi-threaded applications without co-runner

[Figure: cores C0 to C3, each with an L1 cache; two L2 caches; Memory. Legend: Application thread.]

Page 6: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Shared-Resource Contention

  • Intra-application contention: contention among threads from the same application (no co-runners)
  • Inter-application contention: contention among threads from co-running applications

Page 7: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Contributions

  • A general methodology to evaluate a multi-threaded application's performance under
      • Intra-application contention
      • Inter-application contention
      • Contention in the memory-hierarchy shared resources
  • Characterizing applications facilitates better understanding of an application's resource sensitivity
  • Thorough performance analyses and characterization of the multi-threaded PARSEC benchmarks

Page 8: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Outline

  • Motivation
  • Contributions
  • Methodology
  • Measuring intra-application contention
  • Measuring inter-application contention
  • Related Work
  • Summary

Page 9: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Methodology

  • Designed to measure both intra- and inter-application contention for a targeted shared resource
      • L1-cache, L2-cache
      • Front-Side Bus (FSB)
  • Each application is run in two configurations
      • Baseline: threads do not share the targeted resource
      • Contention: threads share the targeted resource
  • Multiple instances of the targeted resource
  • Determine contention by comparing performance (gathered from hardware performance counter values)
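As a rough illustration of the baseline and contention configurations, the sketch below picks core placements for two threads when the targeted resource is the L2-cache. The topology map, the `placements` function, and the core numbering are assumptions made for this example and do not come from the talk; real core-to-cache numbering differs between machines and should be checked before pinning threads.

```python
# Hypothetical topology: which cores sit under each shared L2-cache.
# The names and numbering are assumed for illustration only.
L2_TOPOLOGY = {"L2_a": [0, 1], "L2_b": [2, 3]}


def placements(topology, n_threads=2):
    """Return (baseline, contention) core lists for n_threads threads.

    baseline:   each thread runs under a different instance of the resource
    contention: all threads run under the same instance of the resource
    """
    groups = list(topology.values())
    if len(groups) < n_threads:
        raise ValueError("baseline needs one resource instance per thread")
    # One core from each of the first n_threads resource instances.
    baseline = [group[0] for group in groups[:n_threads]]
    # First resource instance that has enough cores to host every thread.
    shared = next((g for g in groups if len(g) >= n_threads), None)
    if shared is None:
        raise ValueError("no resource instance is shared by enough cores")
    contention = shared[:n_threads]
    return baseline, contention


baseline, contention = placements(L2_TOPOLOGY)
print("baseline cores:", baseline)      # threads do not share an L2-cache
print("contention cores:", contention)  # threads share one L2-cache
```

On Linux, placements like these could be applied with `os.sched_setaffinity`, and the two runs then compared using hardware performance counter values.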

Page 10: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Outline

  • Motivation
  • Contributions
  • Methodology
  • Measuring intra-application contention (see paper)
  • Measuring inter-application contention
  • Related Work
  • Summary

Page 11: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Measuring inter-application contention: L1-cache

[Figure: two panels, Baseline Configuration and Contention Configuration. Each panel shows cores C0 to C3 with L1 caches, two L2 caches, and Memory. Legend: Application 1 thread, Application 2 thread.]

Page 12: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Measuring inter-application contention: L2-cache

[Figure: two panels, Baseline Configuration and Contention Configuration. Each panel shows cores C0 to C3 with L1 caches, two L2 caches, and Memory. Legend: Application 1 thread, Application 2 thread.]

Page 13: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

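Measuring inter-application contention: FSB

[Figure: Baseline Configuration. Two groups of four cores (C0, C2, C4, C6 and C1, C3, C5, C7), each core with an L1 cache and each group with two L2 caches, connected to Memory. Legend: Application 1 thread, Application 2 thread.]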

Page 14: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

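Measuring inter-application contention: FSB

[Figure: Contention Configuration. Two groups of four cores (C0, C2, C4, C6 and C1, C3, C5, C7), each core with an L1 cache and each group with two L2 caches, connected to Memory. Legend: Application 1 thread, Application 2 thread.]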

Page 15: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

PARSEC Benchmarks

Application Domain    Benchmark(s)
Financial Analysis    Blackscholes (BS), Swaptions (SW)
Computer Vision       Bodytrack (BT)
Engineering           Canneal (CN)
Enterprise Storage    Dedup (DD)
Animation             Facesim (FA), Fluidanimate (FL)
Similarity Search     Ferret (FE)
Rendering             Raytrace (RT)
Data Mining           Streamcluster (SC)
Media Processing      Vips (VP), X264 (X2)

Page 16: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Experimental platform

Platform 1: Yorkfield

  • Intel Quad Core Q9550
  • 32 KB L1-D and L1-I cache
  • 6 MB L2-cache
  • 2 GB memory
  • Common FSB

[Figure: cores C0 to C3, each with an L1 cache and an L1 hardware prefetcher (HW-PF); two L2 caches, each with an L2 HW-PF and an FSB interface; the FSB connects to the Memory Controller Hub (Northbridge), which connects to Memory over a link labeled MB.]

Page 17: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Experimental platform

Platform 2: Harpertown

[Figure: cores C0 to C7, each with an L1 cache and an L1 hardware prefetcher (HW-PF); four L2 caches, each with an L2 HW-PF and an FSB interface; FSB connections to the Memory Controller Hub (Northbridge), which connects to Memory over a link labeled MB.]

Page 18: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Performance Analysis: Inter-application contention

For the i-th co-runner:

$$\text{PercentPerformanceDifference}_i = \frac{(\text{PerformanceBase}_i - \text{PerformanceContend}_i) \times 100}{\text{PerformanceBase}_i}$$

Absolute performance difference sum:

$$\text{APDS} = \sum_i \left| \text{PercentPerformanceDifference}_i \right|$$
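The two formulas above can be sketched in a few lines of Python. The function names and the sample measurements are made up for illustration and are not results from the talk. The slides do not say which performance metric is used, so the meaning of a positive or negative difference depends on the metric chosen.

```python
def percent_performance_difference(base, contend):
    """Percent difference of the contention run relative to the baseline run."""
    if base == 0:
        raise ValueError("baseline performance must be non-zero")
    return (base - contend) * 100.0 / base


def apds(pairs):
    """Absolute performance difference sum over (base, contend) pairs,
    one pair per co-runner."""
    return sum(abs(percent_performance_difference(b, c)) for b, c in pairs)


# Made-up (baseline, contention) measurements for three co-runners.
runs = [(100.0, 95.0), (100.0, 104.0), (200.0, 200.0)]

for i, (base, contend) in enumerate(runs):
    print(f"co-runner {i}: {percent_performance_difference(base, contend):+.1f}%")
print("APDS:", apds(runs))
```

Taking the absolute value before summing keeps positive and negative differences from cancelling, so APDS reflects how strongly an application reacts to its co-runners overall.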

Page 19: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application contention: L1-cache, for Streamcluster

[Figure: "Inter-application L1-cache Contention". X-axis: co-running benchmarks (Blackscholes, Bodytrack, Canneal, Dedup, Facesim, Ferret, Fluidanimate, Raytrace, Swaptions, Vips, X264). Y-axis: Performance Difference (%), from -8 to 8.]

Page 20: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application L1-cache contention: Streamcluster

[Figure: "Inter-application L1-cache Contention". X-axis: co-running benchmarks (Blackscholes, Bodytrack, Canneal, Dedup, Facesim, Ferret, Fluidanimate, Raytrace, Streamcluster, Swaptions, Vips, X264). Y-axis: Performance Difference (%), from -8 to 8.]

Page 21: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application contention: L1-cache

Page 22: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application contention: L2-cache

Page 23: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application contention: FSB

Page 24: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Characterization

Benchmark        L1-cache    L2-cache        FSB
Blackscholes     none        none            none
Bodytrack        inter       inter           intra
Canneal          intra       inter           intra
Dedup            inter       intra, inter    intra, inter
Facesim          inter       inter           intra
Ferret           intra       intra, inter    intra
Fluidanimate     inter       inter           intra
Raytrace         none        none            intra
Streamcluster    inter       inter           intra
Swaptions        none        none            none
Vips             intra       inter           inter
X264             inter       intra, inter    intra
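One way to put the characterization table above to use, for example when assembling workloads, is to encode it as a lookup structure. The sketch below is an illustrative assumption rather than anything from the talk: the names `RESOURCES`, `TABLE`, `affected_by`, and `unaffected` are hypothetical, while the table entries are copied from the slide.

```python
RESOURCES = ("L1-cache", "L2-cache", "FSB")

# Characterization table from the slide: one entry per resource, in the
# order given by RESOURCES.
TABLE = {
    "Blackscholes":  ("none", "none", "none"),
    "Bodytrack":     ("inter", "inter", "intra"),
    "Canneal":       ("intra", "inter", "intra"),
    "Dedup":         ("inter", "intra, inter", "intra, inter"),
    "Facesim":       ("inter", "inter", "intra"),
    "Ferret":        ("intra", "intra, inter", "intra"),
    "Fluidanimate":  ("inter", "inter", "intra"),
    "Raytrace":      ("none", "none", "intra"),
    "Streamcluster": ("inter", "inter", "intra"),
    "Swaptions":     ("none", "none", "none"),
    "Vips":          ("intra", "inter", "inter"),
    "X264":          ("inter", "intra, inter", "intra"),
}


def affected_by(resource, kind):
    """Benchmarks listed with the given contention kind ("intra" or "inter")
    for the given resource."""
    col = RESOURCES.index(resource)
    return sorted(name for name, row in TABLE.items()
                  if kind in row[col].split(", "))


def unaffected():
    """Benchmarks listed as "none" for every resource."""
    return sorted(name for name, row in TABLE.items()
                  if all(cell == "none" for cell in row))


print(affected_by("FSB", "inter"))  # benchmarks with inter-application FSB contention
print(unaffected())                 # benchmarks with no listed contention
```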

Page 25: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Summary

  • The methodology generalizes contention analysis of multi-threaded applications
  • New approach to characterize applications
  • Useful for performance analysis of existing and future architectures or benchmarks
  • Helpful for creating new workloads of diverse properties
  • Provides insights for designing improved contention-aware scheduling methods

Page 26: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Related Work

  • Cache contention
      • Knauerhase et al., IEEE Micro 2008
      • Zhuravlev et al., ASPLOS 2010
      • Xie et al., CMP-MSI 2008
      • Mars et al., HiPEAC 2011
  • Characterizing parallel workloads
      • Jin et al., NASA Technical Report 2009
  • PARSEC benchmark suite
      • Bienia et al., PACT 2008
      • Bhadauria et al., IISWC 2009

Page 27: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Thank you!
