21
Demand-Driven Software Race Detection using Hardware Performance Counters Performance Counters Joseph L. Greathouse , Zhiqiang Ma , Matthew I. Frank Ramesh Peri , Todd Austin University of Michigan Intel Corporation CSCADS Aug 2, 2011

Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Demand-Driven Software Race

Detection using Hardware

Performance CountersPerformance Counters

Joseph L. Greathouse†, Zhiqiang Ma‡, Matthew I. Frank‡

Ramesh Peri‡, Todd Austin†

†University of Michigan ‡Intel Corporation

CSCADS

Aug 2, 2011

Page 2: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

In spite of proposed hardware solutions

Concurrency Bugs Still Matter

Bulk Memory

Commits

2

Commits

TRANSACTIONAL

MEMORYAMD

ASF

?

Sun

Rock

?

Page 3: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Concurrency Bugs Matter NOW

if(ptr==NULL)

if(ptr==NULL)

TIME

Thread 2mylen=large

Thread 1mylen=small

len2=thread_local->mylen;

ptr=malloc(len2);Nov. 2010 OpenSSL Security Flaw

3

memcpy(ptr, data2, len2)

ptrLEAKED

TIME

len1=thread_local->mylen;

ptr=malloc(len1);

memcpy(ptr, data1, len1)

Nov. 2010 OpenSSL Security Flaw

Page 4: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

This Talk in One Sentence

Speed up software race detection

with existing hardware support.with existing hardware support.

4

Page 5: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Software Data Race Detection

� Add checks around every memory access

� Find inter-thread sharing events

� Synchronization between write-shared

accesses?

� No? Data race.

5

Page 6: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Thread 2mylen=large

Thread 1mylen=small

if(ptr==NULL)

len1=thread_local->mylen;

ptr=malloc(len1);

memcpy(ptr, data1, len1)

Example of Data Race Detection

ptr write-shared?Interleaved

Synchronization?

TIME

if(ptr==NULL)

len2=thread_local->mylen;len2=thread_local->mylen;

ptr=malloc(len2);ptr=malloc(len2);

memcpy(ptr, data2, len2)memcpy(ptr, data2, len2)

6

TIME

Page 7: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

SW Race Detection is Slow

150

200

250

300

Race Detector Slowdown (x) Phoenix PARSEC

7

0

50

100

his

tog

ram

km

ea

ns

line

ar_

reg

rC

ma

trix

_m

ulC

pca

str

ing_

ma

tch

wo

rd_

co

un

t

Ge

oM

ea

n

bla

cksch

ole

s

bo

dytr

ack

face

sim

ferr

et

fre

qm

ine

raytr

ace

sw

ap

tio

ns

flu

ida

nim

ate

vip

s

x2

64

ca

nn

ea

l

de

du

p

str

ea

mclu

sC

Ge

oM

ea

n

Race Detector Slowdown (x

Page 8: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Goal of this Work

Accelerate Software Data Race Detection

Technique #1: Making it Fast

Demand-Driven Data Race Detection

Technique #2: Keeping it Real

Find sharing events with existing HW

8

Page 9: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Inter-thread Sharing is What’s Important

“Data races ... are failures in programs that access and

update shared data in critical sections” – Netzer & Miller, 1992

if(ptr==NULL)

len1=thread_local->mylen;

ptr=malloc(len1);

memcpy(ptr, data1, len1)

Thread-local data

NO SHARING

Shared data

NO INTER-THREAD

TIME

9

if(ptr==NULL)

memcpy(ptr, data1, len1)

len2=thread_local->mylen;len2=thread_local->mylen;

ptr=malloc(len2);ptr=malloc(len2);

memcpy(ptr, data2, len2)memcpy(ptr, data2, len2)

NO INTER-THREAD

SHARING EVENTS

TIME

Page 10: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Very Little Inter-Thread Sharing

Phoenix PARSEC

40

50

60

70

80

90

100

Sharing Events

1.5

2

2.5

3

Sharing Events

10

0

10

20

30

40

% Write-Sharing Events

0

0.5

1

% Write-Sharing Events

Page 11: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Technique 1: Demand-Driven Analysis

Multi-threadedApplication

SoftwareRace Detector

11

Local

Access

Inter-thread

sharing

Inter-thread Sharing Monitor

Page 12: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Inter-thread Sharing Monitor

� Check each memory op. for write-sharing

� Signal software race detector on sharing

� Possible to do in software

+ Can be built now with instrumentation

– Slow. May take as long as race detection

12

Page 13: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

� Follow read/write sets of threads

Fast user-level faults

Sharing Monitor

Thread 1

WRITE Y

Ideal Hardware Sharing Detector

Thread 2

READ Y

Thread 1

WRITE Y

T1

R: ∅

W: ∅

T2

R: ∅

W: ∅

Thread 2

READ Y

T1

R: ∅

W: {Y}

W->R

Sharing

� Fast user-level faults

13

Multi-threadedApplication

SoftwareRace Detector

Inter-thread Sharing Monitor

Page 14: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Limitations of Existing Hardware

� Fast faults

� Solution: Enable detector for long periods of time

Multi-threadedApplication

SoftwareRace Detector

NO

� Read/write sets

� Solution:

14

Inter-thread Sharing Monitor

NO

SHARING

Page 15: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Technique 2: Hardware Sharing Detector

� Hardware Performance Counters

� Interrupt on cache coherency events

� Intel’s HITM event: W→R Data Sharing

S

M

S

IHITM

� Limitations of this method:

� SMT sharing can’t be counted

� Cache eviction

� Others in paper

15

M IHITM

Page 16: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Demand-Driven Analysis on Real HW

Execute

Instruction

Disable HITM AnalysisNO

NO

16

SW Race

Detection

Enable

Analysis

AnalysisInterrupt?

Sharing

Recently?

Enabled?

NOYES

YES

YES

Page 17: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Experimental Evaluation

� Modified Intel Inspector XE Race Detector

� Linux on 4-core Core i7, no Hyper-Threading

� Performance Tests:

� Phoenix Suite� Phoenix Suite

� PARSEC

� Accuracy Tests:

� Phoenix Suite

� PARSEC

� Pre-release version of RADBench

17

Simulation

Page 18: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Performance Difference

150

200

250

300

Race Detector Slowdown (x) Phoenix PARSEC

18

0

50

100

his

tog

ram

km

ea

ns

line

ar_

reg

rC

ma

trix

_m

ulC

pca

str

ing_

ma

tch

wo

rd_

co

un

t

Ge

oM

ea

n

bla

cksch

ole

s

bo

dytr

ack

face

sim

ferr

et

fre

qm

ine

raytr

ace

sw

ap

tio

ns

flu

ida

nim

ate

vip

s

x2

64

ca

nn

ea

l

de

du

p

str

ea

mclu

sC

Ge

oM

ea

n

Race Detector Slowdown (x

Page 19: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Performance Increases

8

10

12

14

16

18

20

driven Analysis

Speedup (x)

Phoenix PARSEC

51x

19

0

2

4

6

8

his

tog

ram

km

ea

ns

line

ar_

reg

rC

ma

trix

_m

ulC

pca

str

ing_

ma

tch

wo

rd_

co

un

t

Ge

oM

ea

n

bla

cksch

ole

s

bo

dytr

ack

face

sim

ferr

et

fre

qm

ine

raytr

ace

sw

ap

tio

ns

flu

ida

nim

ate

vip

s

x2

64

ca

nn

ea

l

de

du

p

str

ea

mclu

sC

Ge

oM

ea

n

Demand-driven Analysis

Speedup (x

Page 20: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Demand-Driven Analysis Accuracy

8

10

12

14

16

18

20

driven Analysis

Speedup (x)

1/1 2/4 3/3 4/4 3/3 4/4 4/42/4 4/4 4/42/4

Accuracy vs.

Continuous Analysis:

20

0

2

4

6

8

his

tog

ram

km

ea

ns

line

ar_

reg

rC

ma

trix

_m

ulC

pca

str

ing_

ma

tch

wo

rd_

co

un

t

Ge

oM

ea

n

bla

cksch

ole

s

bo

dytr

ack

face

sim

ferr

et

fre

qm

ine

raytr

ace

sw

ap

tio

ns

flu

ida

nim

ate

vip

s

x2

64

ca

nn

ea

l

de

du

p

str

ea

mclu

sC

Ge

oM

ea

n

Demand-driven Analysis

Speedup (x)

Continuous Analysis:

97%

Page 21: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters

Future Directions

� Better Performance

� Fast user-level faults

� Application specific hardware

� More Accuracy

� Better performance counters� Better performance counters

� Inform SW on cache evictions/misses

� Smooth transition to ideal hardware

� Combine sampling & demand-driven analysis

21