45
1 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center QuickTime™ and a decompressor are needed to see this picture.

David F. Bacon T.J. Watson Research Center

  • Upload
    mele

  • View
    33

  • Download
    0

Embed Size (px)

DESCRIPTION

Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation. David F. Bacon T.J. Watson Research Center. Part 2: Trace (aka Mark). Clearing Monitor Table clearing JNI Weak Global clearing Debugger Reference clearing JVMTI Table clearing - PowerPoint PPT Presentation

Citation preview

Page 1: David F. Bacon T.J. Watson Research Center

1

Parallel and ConcurrentReal-time Garbage Collection

Part III:Tracing, Snapshot, and Defragmentation

David F. Bacon

T.J. Watson Research Center

QuickTime™ and a decompressor

are needed to see this picture.

Page 2: David F. Bacon T.J. Watson Research Center

2

Part 2: Trace (aka Mark)• Initiation

– Setup• turn double barrier on

• Root Scan– Active Finalizer scan– Class scan– Thread scan**

• switch to single barrier, color to black

– Debugger, JNI, Class Loader scan

• Trace – Trace*– Trace Terminate***

• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)

• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)

• Re-materialization 2– Finalizable Processing

• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing

• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading

• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping

* Parallel** Callback*** Single actor symmetric

• Flip – Move Available Lists to Full List* (contention)

• turn write barrier off

– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list

• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)

Page 3: David F. Bacon T.J. Watson Research Center

3

Let’s Assume a Stack Snapshot

Stack

rr

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

Page 4: David F. Bacon T.J. Watson Research Center

4

Yuasa Algorithm Review:2(a): Copy Over-written Pointers

Stack

pp

rr

TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

Page 5: David F. Bacon T.J. Watson Research Center

5

Yuasa Algorithm Review: 2(b): Trace

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

WWa

b

ZZa

b

YYa

b

XXa

b

* Color is per-object mark bit

Page 6: David F. Bacon T.J. Watson Research Center

6

Yuasa Algorithm Review: 2(c): Allocate “Black”

Stack

pp TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

ss VVa

b

WWa

b

ZZa

b

YYa

b

XXa

b

Page 7: David F. Bacon T.J. Watson Research Center

7

Non-monotonicity in Tracing

1 GB

Page 8: David F. Bacon T.J. Watson Research Center

8

Which Design Pattern is This?

pool of work

Shared Monotonic Work Pool

Per-Thread State Update

Page 9: David F. Bacon T.J. Watson Research Center

9

Trace is Non-Monotonic…and requires thread-local data

GCWorkerThreads

GCMasterThread

ApplicationThreads

pool ofnon-monotonic

work

Page 10: David F. Bacon T.J. Watson Research Center

10

Basic Solution

• Check if there are more work packets

– If some found, trace is not done yet

– If none found, “probably done”• Pause all threads• Re-scan for non-empty buffers• Resume all threads• If none, done• Otherwise, try again later

Page 11: David F. Bacon T.J. Watson Research Center

11

Ragged Barriers:How to Stop without Stopping

Epoch 1

A B C- Local Epoch- Global Min- Global Max

Epoch 4

Epoch 3

Epoch 2

Page 12: David F. Bacon T.J. Watson Research Center

12

“Trace” Phase

Page 13: David F. Bacon T.J. Watson Research Center

13

The Thread’s Full Monty

1616 6464 256256

Thread 1 inVM

color

1616 6464 256256

Thread 2

epoch 37

inVM

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

dblb

dblb

epoch 43

BarrierOnBarrierOntrue

BarrierOnBarrierOntrue

color

PhaseInfo

phase workers

PhaseInfo

phase workers

trace 3

color

color

sweep

sweep

Epoch

agreed newest

Epoch

agreed newest4335

Page 14: David F. Bacon T.J. Watson Research Center

14

Work Packet Data Structures

wbuffillwbuffill

wbuftracewbuftracetotracetotrace freefree

wbuf.epoch < Epoch.agreed

WBufEpochWBufEpoch

39

WBufCountWBufCount

3

39

37

37

Page 15: David F. Bacon T.J. Watson Research Center

15

trace() { thread->epoch = Epoch.newest; bool canTerminate = true;

if (WBufCount > 0) getWriteBuffers(); canTerminate = false;

while (b = wbuf-trace.pop()) if (! moreTime()) return; int traceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); while (b = totrace.pop()) if (! moreTime()) return; int TraceCount = traceBufferContents(b); canTerminate &= (traceCount == 0);

if (canTerminate) traceTerminate();}

Page 16: David F. Bacon T.J. Watson Research Center

16

Getting Write Buffer RootsgetWriteBuffers() { thread->epoch = fetchAndAdd(Epoch.newest, 1); WBufEpoch = thread->epoch; // mutators will dump wbufs

LOCK(wbuf-fill); LOCK(wbuf-trace);

for each (wbuf in wbuf-fill) if (wbuf.epoch < Epoch.agreed) remove wbuf from wbuf-fill; add wbuf to wbuf-trace;

UNLOCK(wbuf-trace); UNLOCK(wbuf-fill);}

Page 17: David F. Bacon T.J. Watson Research Center

17

Write Barrier

writeBarrier(Object object, Field field, Object new) { if (BarrierOn) Object old = object[field]; if (old != null && ! old.marked) outOfLineBarrier(old); if (thread->dblb) // double barrier outOfLineBarrier(new);}

Page 18: David F. Bacon T.J. Watson Research Center

18

Write Barrier Slow PathoutOfLineBarrier(Object obj) { if (obj == null || obj.marked) return;

obj.marked = true;

bool epochOK = thread->wbuf->epoch == WBufEpoch; bool haveRoom = thread->wbuf->data < thread->wbuf->end;

if (! (epochOK && enoughSpace)) thread->wbuf = flushWBufAndAllocNew(thread->wbuf); // Updates WBufEpoch, Epoch.newest

*thread->wbuf->data++ = obj;}

Page 19: David F. Bacon T.J. Watson Research Center

19

“Trace Terminate” Phase

Page 20: David F. Bacon T.J. Watson Research Center

20

Trace Termination

wbuffillwbuffill

wbuftracewbuftracetotracetotrace freefree

WBufEpochWBufEpoch

39

WBufCountWBufCount

0

Page 21: David F. Bacon T.J. Watson Research Center

21

Asynchronous Agreement

BarrierOnBarrierOntrue

BarrierOnBarrierOntrue

PhaseInfo

phase workers

PhaseInfo

phase workerstrace 1

Epoch

agreed newest

Epoch

agreed newest4237

PhaseInfo

phase workers

PhaseInfo

phase workersterminate 1

desiredEpoch = Epooch.newest;…WAIT FOR Epoch.agreed == desiredEpoch if (WBufCount == 0) DONE else RESUME TRACING

Page 22: David F. Bacon T.J. Watson Research Center

22

Ragged Barrierbool raggedBarrier(desiredEpoch, urgent) { if (Epoch.agreed >= desiredEpoch) return true;

LOCK(threadlist); int latest = MAXINT; for each (Thread thread in threadlist) latest = min(latest, thread.epoch);

Epoch.agreed = latest; UNLOCK(threadlist);

if (epoch.agreed >= desiredEpoch) return true; else doCallbacks(RAGGED_BARRIER, true, urgent); return false;} * Non-locking implementation?

Page 23: David F. Bacon T.J. Watson Research Center

23

Part 1: Scan Roots• Initiation

– Setup• turn double barrier on

• Root Scan– Active Finalizer scan– Class scan– Thread scan**

• switch to single barrier, color to black

– Debugger, JNI, Class Loader scan

• Trace – Trace*– Trace Terminate***

• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)

• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)

• Re-materialization 2– Finalizable Processing

• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing

• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading

• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping

* Parallel** Callback*** Single actor symmetric

• Flip – Move Available Lists to Full List* (contention)

• turn write barrier off

– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list

• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)

Page 24: David F. Bacon T.J. Watson Research Center

24

Fuzzy Snapshot

• Finally, we assume no magic

• Initiate Collection

Page 25: David F. Bacon T.J. Watson Research Center

25

“Initiate Collection” Phase

Page 26: David F. Bacon T.J. Watson Research Center

26

Initiate: Color Black, Double Barrier

1616 6464 256256

Thread 1 inVM

1616 6464 256256

Thread 2

epoch 37

inVM

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

epoch 43

BarrierOnBarrierOntrue

BarrierOnBarrierOntrue

PhaseInfo

phase workers

PhaseInfo

phase workers

initiate 3

color

color

sweep

sweep

Epoch

agreed newest

Epoch

agreed newest4237

color

color

dblb dblb

dblb dblb

Page 27: David F. Bacon T.J. Watson Research Center

27

What is a Double Barrier?Store both Old and New Pointers

Stack

pp

rr

TTa

b

XXa

b UUa

b

ZZa

b

WWa

b

YYa

b

Page 28: David F. Bacon T.J. Watson Research Center

28

Why Double Barrier?

T1Stack

ppXXa

b

ZZa

b

WWa

b

T2Stack

nn

“Snapshot” = { V, W, X, Z }

VVa

b T3Stack

kk

mm

jj

T2: m.b = n (writes X.b = W)T3: j.b = k (writes X.b = V)T1: q = p.b (reads X.b: V, W, or Z??)

qq

Page 29: David F. Bacon T.J. Watson Research Center

29

Yuasa (Single) Barrier with 2 Writers

T1Stack

ppXXa

b

ZZa

b

WWa

b

T2Stack

nn

VVa

b T3Stack

kk

mm

jj

T2: m.b = n (X.b = W)

qq

T2 T3

T3: j.b = k (X.b = V)

Page 30: David F. Bacon T.J. Watson Research Center

30

Yuasa Barrier Lost Update

T1Stack

ppXXa

b

ZZa

b

WWa

b

T2Stack

nn

VVa

b T3Stack

kk

mm

jj

T2: m.b = n (X.b = W)

qq

T2 T3

T3: j.b = k (X.b = V)

Page 31: David F. Bacon T.J. Watson Research Center

31

Hosed!

T1Stack

ppXXa

b

ZZa

b

WWa

b

T2Stack

nn

VVa

b T3Stack

kk

mm

jj

qq

T2 T3T1

T2: m.b = n (X.b = W)T3: j.b = k (X.b = V)T1: q = p.b (q <- W)T2: n = null

T1: Scan Stack

T2: Scan Stack

Page 32: David F. Bacon T.J. Watson Research Center

32

“Thread Stack Scan” Phase

Page 33: David F. Bacon T.J. Watson Research Center

33

Scan Stacks (double barrier off)

1616 6464 256256

Thread 1 inVM

1616 6464 256256

Thread 2

epoch 37

inVM

writebuf

43

pthread pthread_tpthread_t

writebuf

41

pthread pthread_tpthread_t

next

next

threadlist

epoch 43

BarrierOnBarrierOntrue

BarrierOnBarrierOntrue

PhaseInfo

phase workers

PhaseInfo

phase workers

initiate 3

color

color

sweep

sweep

Epoch

agreed newest

Epoch

agreed newest4237

color

color

dblb dblb

dblb dblb

Scan Stack 1

Scan Stack 2

Page 34: David F. Bacon T.J. Watson Research Center

34

All Done!

QuickTime™ and a decompressor

are needed to see this picture.

Page 35: David F. Bacon T.J. Watson Research Center

35

Boosting: Ensuring Progress

GCWorkerThreads

GCMasterThread

ApplicationThreads

(may do GC work)

Boost to master priority

Revert(?)

Page 36: David F. Bacon T.J. Watson Research Center

36

Part 4: Defragmentation• Initiation

– Setup• turn double barrier on

• Root Scan– Active Finalizer scan– Class scan– Thread scan**

• switch to single barrier, color to black

– Debugger, JNI, Class Loader scan

• Trace – Trace*– Trace Terminate***

• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)

• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)

• Re-materialization 2– Finalizable Processing

• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing

• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading

• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping

* Parallel** Callback*** Single actor symmetric

• Flip – Move Available Lists to Full List* (contention)

• turn write barrier off

– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list

• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)

• Defragmentation

Page 37: David F. Bacon T.J. Watson Research Center

37

freefree?

free16

free16

free64

free64

free256free256

alloc’dalloc’d sweepsweep

2562561616 6464

Page 38: David F. Bacon T.J. Watson Research Center

38

Two-way Communication

GCWorkerThreads

GCMasterThread

ApplicationThreads

(may do GC work)

objects have moved

pointers have changed

Page 39: David F. Bacon T.J. Watson Research Center

39

Defragmentation

T T a

b

U U a

b

Z Z a

b

X X a

b

T’ T’ a

b

C C a

b

B B a

b

D D a

b

E E a

b

Page 40: David F. Bacon T.J. Watson Research Center

40

Staccato Algorithm

FREE

T T a

b

NORMAL

T T a

b

COPYING

T T a

b

MOVED

T T a

b

T’ T’ a

b

T’ T’ a

b

allocatenop

defragmentcas

access (abort)cas

commitcas

accessload

accessload

reapstore

createstore

Page 41: David F. Bacon T.J. Watson Research Center

41

Scheduling

Page 42: David F. Bacon T.J. Watson Research Center

42

Guaranteeing Real Time

• Guaranteeing usability without realtime:– Must know maximum live memory

• If fragmentation & metadata overhead bounded

• We also require:– Maximum allocation rate (MB/s)

• How does the user figure this out???– Very simple programming style– Empirical measurement– (Research) Static analysis

Page 43: David F. Bacon T.J. Watson Research Center

43

Conclusions

• Systems are made of concurrent components

• Basic building blocks:– Locks– Try-locks– Compare-and-Swap– Non-locking stacks, lists, …– Monotonic phases– Logical clocks and asynchronous agreement

• Encapsulate so others won’t suffer!

Page 44: David F. Bacon T.J. Watson Research Center

44

http://www.research.ibm.com/metronome

https://sourceforge.net/projects/tuningforkvp

Page 45: David F. Bacon T.J. Watson Research Center

45

Legend

GC Phases

APP

TRACE

SNAPSHOT

FLIP

SWEEP

APP

APP

APP

APP

APP

APP

APP

APP APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

APP

TERMINATE

APP

APP

APP

APP

APP

APP

APP

APP

…………

…………