ISPASS 2011. Characterizing Multi-threaded Applications based on Shared-Resource Contention. Tanima Dey, Wei Wang, Jack W. Davidson, Mary L. Soffa. Department of Computer Science, University of Virginia.


Page 1: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

ISPASS 2011

Characterizing Multi-threaded Applications based on Shared-Resource Contention

Tanima Dey, Wei Wang, Jack W. Davidson, Mary L. Soffa

Department of Computer Science, University of Virginia

Page 2: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Motivation

  • The number of cores doubles every 18 months
  • Expected: performance scales with the number of cores
  • One of the bottlenecks is shared-resource contention
  • For multi-threaded workloads, contention is unavoidable
  • To reduce contention, it is necessary to understand where and how the contention is created

Page 3: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Shared Resource Contention in Chip-Multiprocessors

Intel Quad Core Q9550

[Figure: four cores (C0 to C3), each with its own L1 cache; two L2 caches; memory reached over the Front-Side Bus. Legend: Application 1 thread, Application 2 thread.]

Page 4: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Scenario 1: Multi-threaded applications with co-runner

[Figure: cores C0 to C3 with per-core L1 caches, two L2 caches, and memory. Legend: Application 1 thread, Application 2 thread.]

Page 5: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Scenario 2: Multi-threaded applications without co-runner

[Figure: cores C0 to C3 with per-core L1 caches, two L2 caches, and memory. Legend: Application thread.]

Page 6: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Shared-Resource Contention

  • Intra-application contention: contention among threads from the same application (no co-runners)
  • Inter-application contention: contention among threads from the co-running application

Page 7: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Contributions

  • A general methodology to evaluate a multi-threaded application's performance
      • Intra-application contention
      • Inter-application contention
      • Contention in the memory-hierarchy shared resources
  • Characterizing applications facilitates better understanding of the application's resource sensitivity
  • Thorough performance analyses and characterization of multi-threaded PARSEC benchmarks

Page 8: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Outline

  • Motivation
  • Contributions
  • Methodology
  • Measuring intra-application contention
  • Measuring inter-application contention
  • Related Work
  • Summary

Page 9: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Methodology

  • Designed to measure both intra- and inter-application contention for a targeted shared resource
      • L1-cache, L2-cache
      • Front Side Bus (FSB)
  • Each application is run in two configurations
      • Baseline: threads do not share the targeted resource
      • Contention: threads share the targeted resource
  • Multiple instances of the targeted resource
  • Determine contention by comparing performance between the two configurations (gathering hardware performance counters' values)
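The baseline and contention configurations amount to two different choices of cores for the same threads. The Python sketch below illustrates that idea for a two-thread run targeting the L2 cache. It is only an illustration: the `L2_DOMAINS` map and the `plan_cores` helper are hypothetical names, and the core grouping shown is an assumption, not the measured layout of either platform in the slides.

```python
# Hypothetical topology: which cores sit behind each shared L2 cache.
# This grouping is an assumption for illustration, not a measured layout.
L2_DOMAINS = {"L2_a": [0, 1], "L2_b": [2, 3]}


def plan_cores(domains, n_threads, share):
    """Pick cores for n_threads threads of one application.

    share=True  -> contention configuration: fill one sharing domain
                   before moving to the next, so threads share the resource.
    share=False -> baseline configuration: one thread per sharing domain,
                   so no two threads share the targeted resource.
    """
    groups = [sorted(cores) for _, cores in sorted(domains.items())]
    if share:
        flat = [core for group in groups for core in group]
        if n_threads > len(flat):
            raise ValueError("not enough cores")
        return flat[:n_threads]
    if n_threads > len(groups):
        raise ValueError("baseline needs one sharing domain per thread")
    return [group[0] for group in groups[:n_threads]]


baseline_cores = plan_cores(L2_DOMAINS, 2, share=False)
contention_cores = plan_cores(L2_DOMAINS, 2, share=True)
```

On Linux, a core list like this could then be applied to a process with `os.sched_setaffinity`; the real sharing domains would have to be read from the machine rather than hard-coded.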

Page 10: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Outline

  • Motivation
  • Contributions
  • Methodology
  • Measuring intra-application contention (see paper)
  • Measuring inter-application contention
  • Related Work
  • Summary

Page 11: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Measuring inter-application contention: L1-cache

[Figure: two copies of the quad-core diagram (cores C0 to C3, per-core L1 caches, two L2 caches, memory), one labeled Baseline Configuration and one labeled Contention Configuration. Legend: Application 1 thread, Application 2 thread.]

Page 12: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Measuring inter-application contention: L2-cache

[Figure: two copies of the quad-core diagram (cores C0 to C3, per-core L1 caches, two L2 caches, memory), one labeled Baseline Configuration and one labeled Contention Configuration. Legend: Application 1 thread, Application 2 thread.]

Page 13: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

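Measuring inter-application contention: FSB

Baseline Configuration

[Figure: eight-core diagram with shared memory. One group holds cores C0, C2, C4, C6 and the other holds C1, C3, C5, C7; each group has four L1 caches and two L2 caches. Legend: Application 1 thread, Application 2 thread.]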

Page 14: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

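Measuring inter-application contention: FSB

Contention Configuration

[Figure: eight-core diagram with shared memory. One group holds cores C0, C2, C4, C6 and the other holds C1, C3, C5, C7; each group has four L1 caches and two L2 caches. Legend: Application 1 thread, Application 2 thread.]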

Page 15: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

PARSEC Benchmarks

Application Domain    Benchmark(s)
Financial Analysis    Blackscholes (BS), Swaptions (SW)
Computer Vision       Bodytrack (BT)
Engineering           Canneal (CN)
Enterprise Storage    Dedup (DD)
Animation             Facesim (FA), Fluidanimate (FL)
Similarity Search     Ferret (FE)
Rendering             Raytrace (RT)
Data Mining           Streamcluster (SC)
Media Processing      Vips (VP), X264 (X2)

Page 16: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Experimental platform

Platform 1: Yorkfield

  • Intel Quad Core Q9550
  • 32 KB L1-D and L1-I cache
  • 6 MB L2-cache
  • 2 GB memory
  • Common FSB

[Figure: cores C0 to C3, each with an L1 cache and an L1 hardware prefetcher (HW-PF); two L2 caches, each with an L2 HW-PF and an FSB interface; the FSB connects to the Memory Controller Hub (Northbridge), which connects over MB to memory.]

Page 17: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Experimental platform

Platform 2: Harpertown

[Figure: eight cores in two groups (C0, C2, C4, C6 and C1, C3, C5, C7). Each core has an L1 cache and an L1 hardware prefetcher (HW-PF); each group has two L2 caches, each with an L2 HW-PF and an FSB interface. Each group has its own FSB to the Memory Controller Hub (Northbridge), which connects over MB to memory.]

Page 18: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Performance Analysis: Inter-application contention

For the i-th co-runner:

\[ \mathrm{PercentPerformanceDifference}_i = \frac{(\mathrm{PerformanceBase}_i - \mathrm{PerformanceContend}_i) \times 100}{\mathrm{PerformanceBase}_i} \]

Absolute performance difference sum (APDS):

\[ \mathrm{APDS} = \sum_i \left| \mathrm{PercentPerformanceDifference}_i \right| \]
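The two formulas on this slide can be worked through in a few lines of Python. This is a minimal sketch, not the authors' tooling: the function names are illustrative, the sample numbers are made up, and "performance" stands for whatever metric is collected in the baseline and contention runs.

```python
def percent_performance_difference(base, contend):
    """(base - contend) * 100 / base for a single co-runner."""
    if base == 0:
        raise ValueError("baseline performance must be non-zero")
    return (base - contend) * 100.0 / base


def apds(pairs):
    """Absolute performance difference sum over all co-runners.

    pairs is an iterable of (base, contend) measurements, one per co-runner.
    """
    return sum(abs(percent_performance_difference(b, c)) for b, c in pairs)


# Made-up (base, contend) values for three hypothetical co-runners.
samples = [(100.0, 95.0), (200.0, 210.0), (50.0, 50.0)]
per_co_runner = [percent_performance_difference(b, c) for b, c in samples]
total = apds(samples)
```

Taking the absolute value before summing keeps positive and negative differences from cancelling each other, so the sum reflects how strongly an application reacts to its co-runners in either direction.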

Page 19: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application contention: L1-cache, for Streamcluster

[Figure: bar chart "Inter-application L1-cache Contention". X axis: co-running benchmarks (Blackscholes, Bodytrack, Canneal, Dedup, Facesim, Ferret, Fluidanimate, Raytrace, Swaptions, Vips, X264). Y axis: Performance Difference (%), from -8 to 8.]

Page 20: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application L1-cache contention: Streamcluster

[Figure: bar chart "Inter-application L1-cache Contention". X axis: co-running benchmarks (Blackscholes, Bodytrack, Canneal, Dedup, Facesim, Ferret, Fluidanimate, Raytrace, Streamcluster, Swaptions, Vips, X264). Y axis: Performance Difference (%), from -8 to 8.]

Page 21: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application contention: L1-cache

[Figure: results chart; no text survives.]

Page 22: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application contention: L2-cache

[Figure: results chart; no text survives.]

Page 23: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Inter-application contention: FSB

[Figure: results chart; no text survives.]

Page 24: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Characterization

Benchmark        L1-cache    L2-cache        FSB
Blackscholes     none        none            none
Bodytrack        inter       inter           intra
Canneal          intra       inter           intra
Dedup            inter       intra, inter    intra, inter
Facesim          inter       inter           intra
Ferret           intra       intra, inter    intra
Fluidanimate     inter       inter           intra
Raytrace         none        none            intra
Streamcluster    inter       inter           intra
Swaptions        none        none            none
Vips             intra       inter           inter
X264             inter       intra, inter    intra
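One way to put the characterization to use is to encode it as data and query it, for example when picking benchmarks for a workload mix. The Python sketch below transcribes the table from this slide. The `sensitive_to` helper and its name are illustrative assumptions, not something the slides define.

```python
RESOURCES = ("L1", "L2", "FSB")

# Contention types per benchmark, in (L1-cache, L2-cache, FSB) order.
# An empty string stands for "none" in the table.
TABLE = {
    "Blackscholes": ("", "", ""),
    "Bodytrack": ("inter", "inter", "intra"),
    "Canneal": ("intra", "inter", "intra"),
    "Dedup": ("inter", "intra,inter", "intra,inter"),
    "Facesim": ("inter", "inter", "intra"),
    "Ferret": ("intra", "intra,inter", "intra"),
    "Fluidanimate": ("inter", "inter", "intra"),
    "Raytrace": ("", "", "intra"),
    "Streamcluster": ("inter", "inter", "intra"),
    "Swaptions": ("", "", ""),
    "Vips": ("intra", "inter", "inter"),
    "X264": ("inter", "intra,inter", "intra"),
}


def sensitive_to(kind, resource):
    """Benchmarks showing the given contention kind ("intra" or "inter")
    on the given resource ("L1", "L2", or "FSB"), sorted by name."""
    col = RESOURCES.index(resource)
    return sorted(
        name for name, row in TABLE.items() if kind in row[col].split(",")
    )


fsb_inter = sensitive_to("inter", "FSB")
l1_intra = sensitive_to("intra", "L1")
```

Here `fsb_inter` lists the benchmarks the table marks as affected by inter-application FSB contention, and `l1_intra` lists those marked for intra-application L1-cache contention.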

Page 25: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Summary

  • The methodology generalizes contention analysis of multi-threaded applications
  • New approach to characterize applications
  • Useful for performance analysis of existing and future architectures or benchmarks
  • Helpful for creating new workloads of diverse properties
  • Provides insights for designing improved contention-aware scheduling methods

Page 26: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Related Work

  • Cache contention
      • Knauerhase et al., IEEE Micro 2008
      • Zhuravlev et al., ASPLOS 2010
      • Xie et al., CMP-MSI 2008
      • Mars et al., HiPEAC 2011
  • Characterizing parallel workload
      • Jin et al., NASA Technical Report 2009
  • PARSEC benchmark suite
      • Bienia et al., PACT 2008
      • Bhadauria et al., IISWC 2009

Page 27: Characterizing  Multi-threaded Applications based on Shared-Resource Contention

Thank you!
