A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording (ASPLOS’06)

Min Xu, Rastislav Bodik, Mark D. Hill

Shimin Chen
LBA Reading Group Presentation


Page 2: Why Do You Need a Recorder?

Sequential program: the crash reproduces under the debugger.

    % gcc sim.c
    % a.out
    Segmentation fault
    %

    % gdb a.out
    gdb> run
    Program received SIGSEGV.
    In get() at hash.c:45
    45 a = bucket->d;

Parallel program: the crash may not reproduce under the debugger.

    % gcc para-sim.c
    % a.out
    Segmentation fault
    %

    % gdb a.out
    gdb> run
    Program exited normally.
    gdb>

With a recorder: the race is recorded, and replaying the log reproduces the crash.

    % gcc para-sim.c
    % a.out
    Segmentation fault
    Race recorded in “log”
    %

    % gdb a.out log
    gdb> run
    Program received SIGSEGV.
    In get() at para-hash.c:67
    67 a = bucket->d;

Page 3: Ideally …

• Long recording: small log
• Low runtime overhead
• Low cost
• Applicability:
  • Programs: data races
  • Systems: non-SC

Page 4: Flight Data Recorder (ISCA’03)

Full-system Record-Replay
• Recording memory races:
  • Assumes Sequential Consistency (SC)
  • Record order of instruction interleaving
  • Target cache-coherent multiprocessor server
  • Piggyback on coherence protocol: little extra H/W
• Recording system states: SafetyNet
• Recording I/Os

Results:
• Non-trivial recording interval: 1 second
• Negligible runtime overhead: less than 2%
• Can be “Always On”

Page 5: RTR

• Better memory race log compression
  • 1 byte per kilo-instruction
• Dealing with Total Store Ordering

In this talk, I will try to describe a full picture combining FDR and RTR.

Page 6: Outline

• Introduction
• Recording System State
• Recording Input/Output
• Recording Memory Races
• Dealing with TSO
• Summary

Page 7: Recording System State (based on SafetyNet)

• Purpose: reconstruct the initial state (registers, TLB, main memory) at the beginning of the replay interval
• Policy: FDR’s 1-second replay interval
  • Take a logical checkpoint every 1/3 second
  • Reserve memory space to store logs for 4 checkpoints
• Logical checkpoint:
  • Quiesce entire system to take a physical checkpoint
  • Registers and TLB states (4248 bytes/processor on SPARC V9)
  • Log old value of a cache line upon first update
    • Add an “already-updated” bit per cache line
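The "log old value upon first update" rule can be illustrated with a minimal sketch. It is a toy model, not SafetyNet's actual design: the class name `CheckpointedMemory` and its methods are hypothetical, a "cache line" is a single integer, and only one checkpoint's log is kept (the slide reserves space for four).

```python
class CheckpointedMemory:
    """Toy memory with an undo log; names and layout are illustrative assumptions."""

    def __init__(self, num_lines):
        self.lines = [0] * num_lines
        # One "already-updated" bit per line, cleared at every checkpoint.
        self.already_updated = [False] * num_lines
        self.undo_log = []  # (line index, old value) pairs

    def store(self, index, value):
        # Only the first update after a checkpoint logs the old value.
        if not self.already_updated[index]:
            self.undo_log.append((index, self.lines[index]))
            self.already_updated[index] = True
        self.lines[index] = value

    def take_checkpoint(self):
        # Start a new interval: empty log, all bits cleared.
        self.undo_log = []
        self.already_updated = [False] * len(self.lines)

    def roll_back(self):
        # Undo in reverse order to restore the state at the last checkpoint.
        for index, old_value in reversed(self.undo_log):
            self.lines[index] = old_value
        self.take_checkpoint()


mem = CheckpointedMemory(4)
mem.store(1, 10)
mem.take_checkpoint()
mem.store(1, 20)
mem.store(1, 30)  # second update of line 1: no new log entry
mem.store(2, 5)
logged = list(mem.undo_log)
mem.roll_back()
```

Here `logged` holds one entry per modified line rather than one per store, which is what keeps the checkpoint log small.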

Page 8

(Figure from the FDR paper)


Page 10: Recording I/O

• I/O loads
• Interrupts: instruction count + interrupt number
• DMA store values


Page 12: Log All Dependence

     Thread I    Thread J
1    ld A        add
2    st B        st C
3    st C        ld B
4    ld D        st A
5    sub         st C
6    ld B        st D

Dependence Log (16 bytes per entry; i→j means instruction j of the logging thread waits for instruction i of the other thread during replay):
Log J: 2→3, 1→4, 3→5, 4→6
Log I: 2→3
Log Size: 5*16=80 bytes (10 integers)

But too many dependences
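A minimal sketch of logging every cross-thread dependence for the two-thread example on this slide. The interleaving in `SCHEDULE` is an assumption (the slide's arrows did not survive, so one order consistent with the logged dependences was picked), and `record_all_dependences` is a hypothetical helper, not the recorder's hardware algorithm.

```python
PROGRAM = {
    "I": ["ld A", "st B", "st C", "ld D", "sub", "ld B"],
    "J": ["add", "st C", "ld B", "st A", "st C", "st D"],
}
# Assumed global order in which the two threads commit instructions.
SCHEDULE = ["J", "J", "I", "I", "I", "I", "J", "J", "J", "J", "I", "I"]


def record_all_dependences(program, schedule):
    next_ic = {thread: 0 for thread in program}
    last_store = {}  # address -> (thread, instruction count)
    last_loads = {}  # address -> {thread: instruction count} since last store
    logs = {thread: [] for thread in program}
    for thread in schedule:
        next_ic[thread] += 1
        ic = next_ic[thread]
        parts = program[thread][ic - 1].split()
        if parts[0] not in ("ld", "st"):
            continue  # non-memory instruction
        op, addr = parts
        sources = []
        if addr in last_store:
            sources.append(last_store[addr])  # RAW or WAW
        if op == "st":
            sources.extend(last_loads.get(addr, {}).items())  # WAR
        for src_thread, src_ic in sources:
            if src_thread != thread:
                logs[thread].append((src_ic, ic))
        if op == "st":
            last_store[addr] = (thread, ic)
            last_loads[addr] = {}
        else:
            last_loads.setdefault(addr, {})[thread] = ic
    return logs


logs = record_all_dependences(PROGRAM, SCHEDULE)
log_size_bytes = 16 * sum(len(entries) for entries in logs.values())
print(logs, log_size_bytes)
```

Under the assumed schedule this reproduces the five entries and the 80-byte total shown on the slide.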

Page 13: Netzer’s Transitive Reduction (TR), approximated by FDR

(Figure: the same Thread I / Thread J example. Dependence 1→4 is dropped because 2→3 together with program order already implies it.)

TR Reduced Log:
Log J: 2→3, 3→5, 4→6
Log I: 2→3
Log Size: 64 bytes (8 integers)

How to further reduce log size?
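The reduction can be sketched for a single sender/receiver pair as follows. This is a simplified illustration, assuming the receiver remembers only the largest sender instruction count it has already logged; `transitive_reduce` is a hypothetical name, and with more than two threads a vector of such counters would be needed.

```python
def transitive_reduce(full_log):
    """Drop dependences implied by an earlier logged one.

    full_log: (sender_ic, receiver_ic) pairs for one sender/receiver pair,
    in receiver program order.
    """
    reduced = []
    largest_sender_ic_seen = 0
    for sender_ic, receiver_ic in full_log:
        # The receiver already waits for a later sender instruction,
        # so program order on the sender covers this one.
        if sender_ic <= largest_sender_ic_seen:
            continue
        reduced.append((sender_ic, receiver_ic))
        largest_sender_ic_seen = sender_ic
    return reduced


log_j = transitive_reduce([(2, 3), (1, 4), (3, 5), (4, 6)])
log_i = transitive_reduce([(2, 3)])
reduced_size_bytes = 16 * (len(log_j) + len(log_i))
```

Only 1→4 is removed here, which matches the drop from 80 to 64 bytes.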

Page 14: RTR

Actively creating artificial dependencies
• Stricter
• Vectorized

Page 15: The Intuition of the RTR Algorithm

(Figure: the dependences after reduction, from I to J and from J to I, grouped into vectors. The vectors “regulate” replay.)

Page 16: Stricter Dependences to Aid Vectorization

(Figure: the same Thread I / Thread J example. One stricter dependence, 4→5, replaces 3→5 and 4→6, which are reduced.)

New Reduced Log:
Log J: 2→3, 4→5
Log I: 2→3
Log Size: 48 bytes (6 integers)

Fewer dependencies to log
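Why one stricter dependence can stand in for two, and why it must not be too strict, can be checked with a small sketch. Both helpers are hypothetical and cover only the two-thread case; the deadlock condition is a plain program-order argument, not the paper's formulation.

```python
def implies(strict, dep):
    """True if enforcing `strict` also enforces `dep` (same sender/receiver).

    A dependence (s, r) makes receiver instruction r wait for sender
    instruction s. It covers (s2, r2) when s2 is no later than s on the
    sender and r2 is no earlier than r on the receiver.
    """
    s, r = strict
    s2, r2 = dep
    return s2 <= s and r <= r2


def creates_cycle(i_to_j, j_to_i):
    """True if an I->J and a J->I dependence would deadlock replay."""
    c, d = i_to_j  # J's instruction d waits for I's instruction c
    a, b = j_to_i  # I's instruction b waits for J's instruction a
    return c >= b and a >= d


covers_3_5 = implies((4, 5), (3, 5))
covers_4_6 = implies((4, 5), (4, 6))
covers_2_3 = implies((4, 5), (2, 3))
safe = not creates_cycle((4, 5), (2, 3))
too_strict = creates_cycle((6, 1), (2, 3))
```

The artificial dependence 4→5 covers both 3→5 and 4→6 without conflicting with Log I's 2→3, while a made-up 6→1 would be overly strict and deadlock against it.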

Page 17: Compress Vectorized Dependencies

(Figure: the same Thread I / Thread J example, with the remaining dependences drawn as vector dependences.)

Vectorized Log:
Log J: x=3,5, ∆=1
Log I: x=3, ∆=1
Log Size: 40 bytes (5 integers)

TR → RTR: fewer deps + fewer bytes/dep
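One way to read the vectorized log is sketched below. The encoding is an assumption made for illustration: "x=3,5, ∆=1" is taken to mean "for each receiver instruction x in the list, wait for sender instruction x−∆", stored as the x values plus a single ∆ at 8 bytes per integer. The actual record format is not given on the slide.

```python
BYTES_PER_INTEGER = 8  # 16-byte entries hold two integers on the earlier slides


def vectorize(deps):
    """Pack (sender_ic, receiver_ic) pairs that share one offset into (xs, delta)."""
    deltas = {receiver_ic - sender_ic for sender_ic, receiver_ic in deps}
    if len(deltas) != 1:
        raise ValueError("dependences do not share a single delta")
    xs = [receiver_ic for _, receiver_ic in deps]
    return xs, deltas.pop()


def expand(xs, delta):
    """Recover the individual dependences for replay."""
    return [(x - delta, x) for x in xs]


def size_bytes(xs):
    # All x values plus one shared delta.
    return BYTES_PER_INTEGER * (len(xs) + 1)


xs_j, delta_j = vectorize([(2, 3), (4, 5)])
xs_i, delta_i = vectorize([(2, 3)])
total_bytes = size_bytes(xs_j) + size_bytes(xs_i)
```

Under this reading the two logs take 3 + 2 = 5 integers, consistent with the 40 bytes on the slide. The stricter dependence from the previous slide is what makes both Log J entries share ∆=1.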


Page 19: H/W Considerations

• (IC) Instruction count per core: easy
• (VIC[p]) Record previously seen senders’ largest time stamps for transitive reduction
• (CTS[b]) Time stamp per cache block:
  • i.e. record IC upon load/store commits
  • At commit time:
    • Figure out memory address: how difficult?
    • Write CTS: decoupled timestamp memory

Page 20: H/W Considerations Cont’d

• Piggyback on cache coherence messages
  • FDR: CTS[b]
  • RTR: CTS[b] & sender’s IC
• Logic to perform algorithm at the receiver side
  • FDR: integer comparison, update VIC[sender], generate log record
  • RTR: in addition, max/min, integer subtraction
• Augment directory structure
  • Record last owner for evicted blocks
• Cache must respond to inquiries about evicted blocks: reply with CTS[SET/LRU]
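The FDR receiver-side logic (integer comparison, update VIC[sender], generate log record) can be modeled in software roughly as below. This is a sketch under assumptions: the `Core` class and its method names are hypothetical, a coherence reply is reduced to the sender's CTS for the block, and evictions, the directory, and RTR's extra max/min and subtraction steps are left out.

```python
class Core:
    """Toy model of the per-core recording state (IC, CTS[b], VIC[p])."""

    def __init__(self, name, peer_names):
        self.name = name
        self.ic = 0                            # instruction count
        self.cts = {}                          # block -> IC of last committed access
        self.vic = {p: 0 for p in peer_names}  # largest time stamp seen per sender
        self.log = []                          # (sender, sender IC, receiver IC)

    def commit(self, block=None):
        self.ic += 1
        if block is not None:
            self.cts[block] = self.ic          # time-stamp the block at commit

    def coherence_reply(self, block):
        # Time stamp piggybacked on the coherence message (0 if never touched).
        return self.cts.get(block, 0)

    def receive(self, sender_name, sender_cts):
        # Log only if not already implied by an earlier, larger time stamp.
        if sender_cts > self.vic[sender_name]:
            self.vic[sender_name] = sender_cts
            self.log.append((sender_name, sender_cts, self.ic + 1))


core_i = Core("I", ["J"])
core_j = Core("J", ["I"])
core_i.commit("A")  # I1: ld A
core_i.commit("B")  # I2: st B
core_j.commit()     # J1: add
core_j.commit("C")  # J2: st C
core_j.receive("I", core_i.coherence_reply("B"))  # J3: ld B needs I's copy
core_j.commit("B")
core_j.receive("I", core_i.coherence_reply("A"))  # J4: st A invalidates I's copy
core_j.commit("A")
```

The second message carries time stamp 1, which is not larger than VIC[I]=2, so J4 generates no log record.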


Page 22: Total Store Ordering

• FIFO write buffer
  • A store commits by placing its value into the write buffer
  • A store is ordered when it exits the write buffer and updates the memory
  • Stores are ordered in commit order (FIFO)
• A load can obtain values from the write buffer or from the memory system
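The write-buffer behavior described above can be modeled with a small sketch. `TsoCore` is a hypothetical toy class, memory is a shared dictionary, and the scenario is the classic store-buffering test rather than an example taken from the slides.

```python
from collections import deque


class TsoCore:
    """Toy core with a FIFO write buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = deque()

    def store(self, addr, value):
        # The store commits by entering the write buffer.
        self.write_buffer.append((addr, value))

    def load(self, addr):
        # Forward from the youngest matching buffered store, else read memory.
        for buffered_addr, value in reversed(self.write_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain_one(self):
        # The oldest store becomes ordered when it updates memory.
        addr, value = self.write_buffer.popleft()
        self.memory[addr] = value


memory = {}
p0, p1 = TsoCore(memory), TsoCore(memory)
p0.store("x", 1)
p1.store("y", 1)
own = p0.load("x")  # sees its own buffered store
r0 = p0.load("y")   # p1's store is still buffered
r1 = p1.load("x")   # p0's store is still buffered
p0.drain_one()
p1.drain_one()
```

Both `r0` and `r1` read 0, an outcome that no sequentially consistent interleaving of the four accesses allows, which is why an SC-based recorder needs extra handling for TSO.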

Page 23: Problems with TSO

(Figure: two code examples; the /* XXX */ annotations give the memory order.)

The two examples create cycles that will result in replay deadlocks

Page 24: Solution

• Identify problematic load instructions
  • Monitor invalidation in [t1, t2]
  • t1: the load (or the previous store that feeds the load) is ordered at memory
  • t2: all preceding instructions are ordered
• Log load values and replay these load instructions by values
• HW: similar to the misspeculation detection circuitry in SC systems (e.g. MIPS R10000)
• Insufficient for supporting Processor Consistency and other more relaxed models
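The detection rule can be sketched as a filter over loads. Several details are assumptions made for illustration: the `Load` record and `loads_to_log` helper are hypothetical, time is an abstract integer, the window [t1, t2] is treated as inclusive, and only invalidations of the load's own block count.

```python
from dataclasses import dataclass


@dataclass
class Load:
    ic: int      # instruction count of the load
    block: str   # cache block it read
    value: int   # value it obtained
    t1: int      # time the load (or the store feeding it) is ordered at memory
    t2: int      # time all preceding instructions are ordered


def loads_to_log(loads, invalidations):
    """Return (ic, value) for loads whose block was invalidated in [t1, t2]."""
    logged = []
    for ld in loads:
        hit = any(
            block == ld.block and ld.t1 <= time <= ld.t2
            for time, block in invalidations
        )
        if hit:
            logged.append((ld.ic, ld.value))
    return logged


loads = [
    Load(ic=7, block="x", value=0, t1=10, t2=15),
    Load(ic=9, block="y", value=3, t1=12, t2=13),
]
invalidations = [(14, "x"), (20, "y")]  # (time, block)
value_log = loads_to_log(loads, invalidations)
```

Only the load at instruction 7 is logged by value; during replay it would be fed the recorded value instead of waiting on a dependence.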

Page 25: Conclusion

RTR: 1 byte/kilo-instruction
• Based on Netzer’s transitive reduction
• Create stricter dependencies
• Vectorize dependencies to compress log
• Avoid overly strict dependencies, hence no deadlock