Upload
mele
View
33
Download
0
Embed Size (px)
DESCRIPTION
Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation. David F. Bacon T.J. Watson Research Center. Part 2: Trace (aka Mark). Clearing Monitor Table clearing JNI Weak Global clearing Debugger Reference clearing JVMTI Table clearing - PowerPoint PPT Presentation
Citation preview
1
Parallel and ConcurrentReal-time Garbage Collection
Part III:Tracing, Snapshot, and Defragmentation
David F. Bacon
T.J. Watson Research Center
QuickTime™ and a decompressor
are needed to see this picture.
2
Part 2: Trace (aka Mark)• Initiation
– Setup• turn double barrier on
• Root Scan– Active Finalizer scan– Class scan– Thread scan**
• switch to single barrier, color to black
– Debugger, JNI, Class Loader scan
• Trace – Trace*– Trace Terminate***
• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)
• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)
• Re-materialization 2– Finalizable Processing
• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing
• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading
• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping
* Parallel** Callback*** Single actor symmetric
• Flip – Move Available Lists to Full List* (contention)
• turn write barrier off
– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list
• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)
3
Let’s Assume a Stack Snapshot
Stack
rr
pp TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
4
Yuasa Algorithm Review:2(a): Copy Over-written Pointers
Stack
pp
rr
TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
5
Yuasa Algorithm Review: 2(b): Trace
Stack
pp TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
WWa
b
ZZa
b
YYa
b
XXa
b
* Color is per-object mark bit
6
Yuasa Algorithm Review: 2(c): Allocate “Black”
Stack
pp TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
ss VVa
b
WWa
b
ZZa
b
YYa
b
XXa
b
7
Non-monotonicity in Tracing
1 GB
8
Which Design Pattern is This?
pool of work
Shared Monotonic Work Pool
Per-Thread State Update
9
Trace is Non-Monotonic…and requires thread-local data
GCWorkerThreads
GCMasterThread
ApplicationThreads
pool ofnon-monotonic
work
10
Basic Solution
• Check if there are more work packets
– If some found, trace is not done yet
– If none found, “probably done”• Pause all threads• Re-scan for non-empty buffers• Resume all threads• If none, done• Otherwise, try again later
11
Ragged Barriers:How to Stop without Stopping
Epoch 1
A B C- Local Epoch- Global Min- Global Max
Epoch 4
Epoch 3
Epoch 2
12
“Trace” Phase
13
The Thread’s Full Monty
1616 6464 256256
Thread 1 inVM
color
1616 6464 256256
Thread 2
epoch 37
inVM
writebuf
43
pthread pthread_tpthread_t
writebuf
41
pthread pthread_tpthread_t
next
next
threadlist
dblb
dblb
epoch 43
BarrierOnBarrierOntrue
BarrierOnBarrierOntrue
color
PhaseInfo
phase workers
PhaseInfo
phase workers
trace 3
color
color
sweep
sweep
Epoch
agreed newest
Epoch
agreed newest4335
14
Work Packet Data Structures
wbuffillwbuffill
wbuftracewbuftracetotracetotrace freefree
wbuf.epoch < Epoch.agreed
WBufEpochWBufEpoch
39
WBufCountWBufCount
3
39
37
37
15
trace() { thread->epoch = Epoch.newest; bool canTerminate = true;
if (WBufCount > 0) getWriteBuffers(); canTerminate = false;
while (b = wbuf-trace.pop()) if (! moreTime()) return; int traceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); while (b = totrace.pop()) if (! moreTime()) return; int TraceCount = traceBufferContents(b); canTerminate &= (traceCount == 0);
if (canTerminate) traceTerminate();}
16
Getting Write Buffer RootsgetWriteBuffers() { thread->epoch = fetchAndAdd(Epoch.newest, 1); WBufEpoch = thread->epoch; // mutators will dump wbufs
LOCK(wbuf-fill); LOCK(wbuf-trace);
for each (wbuf in wbuf-fill) if (wbuf.epoch < Epoch.agreed) remove wbuf from wbuf-fill; add wbuf to wbuf-trace;
UNLOCK(wbuf-trace); UNLOCK(wbuf-fill);}
17
Write Barrier
writeBarrier(Object object, Field field, Object new) { if (BarrierOn) Object old = object[field]; if (old != null && ! old.marked) outOfLineBarrier(old); if (thread->dblb) // double barrier outOfLineBarrier(new);}
18
Write Barrier Slow PathoutOfLineBarrier(Object obj) { if (obj == null || obj.marked) return;
obj.marked = true;
bool epochOK = thread->wbuf->epoch == WBufEpoch; bool haveRoom = thread->wbuf->data < thread->wbuf->end;
if (! (epochOK && enoughSpace)) thread->wbuf = flushWBufAndAllocNew(thread->wbuf); // Updates WBufEpoch, Epoch.newest
*thread->wbuf->data++ = obj;}
19
“Trace Terminate” Phase
20
Trace Termination
wbuffillwbuffill
wbuftracewbuftracetotracetotrace freefree
WBufEpochWBufEpoch
39
WBufCountWBufCount
0
21
Asynchronous Agreement
BarrierOnBarrierOntrue
BarrierOnBarrierOntrue
PhaseInfo
phase workers
PhaseInfo
phase workerstrace 1
Epoch
agreed newest
Epoch
agreed newest4237
PhaseInfo
phase workers
PhaseInfo
phase workersterminate 1
desiredEpoch = Epooch.newest;…WAIT FOR Epoch.agreed == desiredEpoch if (WBufCount == 0) DONE else RESUME TRACING
22
Ragged Barrierbool raggedBarrier(desiredEpoch, urgent) { if (Epoch.agreed >= desiredEpoch) return true;
LOCK(threadlist); int latest = MAXINT; for each (Thread thread in threadlist) latest = min(latest, thread.epoch);
Epoch.agreed = latest; UNLOCK(threadlist);
if (epoch.agreed >= desiredEpoch) return true; else doCallbacks(RAGGED_BARRIER, true, urgent); return false;} * Non-locking implementation?
23
Part 1: Scan Roots• Initiation
– Setup• turn double barrier on
• Root Scan– Active Finalizer scan– Class scan– Thread scan**
• switch to single barrier, color to black
– Debugger, JNI, Class Loader scan
• Trace – Trace*– Trace Terminate***
• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)
• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)
• Re-materialization 2– Finalizable Processing
• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing
• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading
• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping
* Parallel** Callback*** Single actor symmetric
• Flip – Move Available Lists to Full List* (contention)
• turn write barrier off
– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list
• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)
24
Fuzzy Snapshot
• Finally, we assume no magic
• Initiate Collection
25
“Initiate Collection” Phase
26
Initiate: Color Black, Double Barrier
1616 6464 256256
Thread 1 inVM
1616 6464 256256
Thread 2
epoch 37
inVM
writebuf
43
pthread pthread_tpthread_t
writebuf
41
pthread pthread_tpthread_t
next
next
threadlist
epoch 43
BarrierOnBarrierOntrue
BarrierOnBarrierOntrue
PhaseInfo
phase workers
PhaseInfo
phase workers
initiate 3
color
color
sweep
sweep
Epoch
agreed newest
Epoch
agreed newest4237
color
color
dblb dblb
dblb dblb
27
What is a Double Barrier?Store both Old and New Pointers
Stack
pp
rr
TTa
b
XXa
b UUa
b
ZZa
b
WWa
b
YYa
b
28
Why Double Barrier?
T1Stack
ppXXa
b
ZZa
b
WWa
b
T2Stack
nn
“Snapshot” = { V, W, X, Z }
VVa
b T3Stack
kk
mm
jj
T2: m.b = n (writes X.b = W)T3: j.b = k (writes X.b = V)T1: q = p.b (reads X.b: V, W, or Z??)
29
Yuasa (Single) Barrier with 2 Writers
T1Stack
ppXXa
b
ZZa
b
WWa
b
T2Stack
nn
VVa
b T3Stack
kk
mm
jj
T2: m.b = n (X.b = W)
T2 T3
T3: j.b = k (X.b = V)
30
Yuasa Barrier Lost Update
T1Stack
ppXXa
b
ZZa
b
WWa
b
T2Stack
nn
VVa
b T3Stack
kk
mm
jj
T2: m.b = n (X.b = W)
T2 T3
T3: j.b = k (X.b = V)
31
Hosed!
T1Stack
ppXXa
b
ZZa
b
WWa
b
T2Stack
nn
VVa
b T3Stack
kk
mm
jj
T2 T3T1
T2: m.b = n (X.b = W)T3: j.b = k (X.b = V)T1: q = p.b (q <- W)T2: n = null
T1: Scan Stack
T2: Scan Stack
32
“Thread Stack Scan” Phase
33
Scan Stacks (double barrier off)
1616 6464 256256
Thread 1 inVM
1616 6464 256256
Thread 2
epoch 37
inVM
writebuf
43
pthread pthread_tpthread_t
writebuf
41
pthread pthread_tpthread_t
next
next
threadlist
epoch 43
BarrierOnBarrierOntrue
BarrierOnBarrierOntrue
PhaseInfo
phase workers
PhaseInfo
phase workers
initiate 3
color
color
sweep
sweep
Epoch
agreed newest
Epoch
agreed newest4237
color
color
dblb dblb
dblb dblb
Scan Stack 1
Scan Stack 2
34
All Done!
QuickTime™ and a decompressor
are needed to see this picture.
35
Boosting: Ensuring Progress
GCWorkerThreads
GCMasterThread
ApplicationThreads
(may do GC work)
Boost to master priority
Revert(?)
36
Part 4: Defragmentation• Initiation
– Setup• turn double barrier on
• Root Scan– Active Finalizer scan– Class scan– Thread scan**
• switch to single barrier, color to black
– Debugger, JNI, Class Loader scan
• Trace – Trace*– Trace Terminate***
• Re-materialization 1– Weak/Soft/Phantom Reference List Transfer– Weak Reference clearing** (snapshot)
• Re-Trace 1– Trace Master – (Trace*)– (Trace Terminate***)
• Re-materialization 2– Finalizable Processing
• Clearing– Monitor Table clearing– JNI Weak Global clearing– Debugger Reference clearing– JVMTI Table clearing– Phantom Reference clearing
• Re-Trace 2– Trace Master – (Trace*)– (Trace Terminate***)– Class Unloading
• Completion– Finalizer Wakeup– Class Unloading Flush– Clearable Compaction**– Book-keeping
* Parallel** Callback*** Single actor symmetric
• Flip – Move Available Lists to Full List* (contention)
• turn write barrier off
– Flush Per-thread Allocation Pages**• switch allocation color to white• switch to temp full list
• Sweeping– Sweep*– Switch to regular Full List**– Move Temp Full List to regular Full List* (contention)
• Defragmentation
37
freefree?
free16
free16
free64
free64
free256free256
alloc’dalloc’d sweepsweep
2562561616 6464
38
Two-way Communication
GCWorkerThreads
GCMasterThread
ApplicationThreads
(may do GC work)
objects have moved
pointers have changed
39
Defragmentation
T T a
b
U U a
b
Z Z a
b
X X a
b
T’ T’ a
b
C C a
b
B B a
b
D D a
b
E E a
b
40
Staccato Algorithm
FREE
T T a
b
NORMAL
T T a
b
COPYING
T T a
b
MOVED
T T a
b
T’ T’ a
b
T’ T’ a
b
allocatenop
defragmentcas
access (abort)cas
commitcas
accessload
accessload
reapstore
createstore
41
Scheduling
42
Guaranteeing Real Time
• Guaranteeing usability without realtime:– Must know maximum live memory
• If fragmentation & metadata overhead bounded
• We also require:– Maximum allocation rate (MB/s)
• How does the user figure this out???– Very simple programming style– Empirical measurement– (Research) Static analysis
43
Conclusions
• Systems are made of concurrent components
• Basic building blocks:– Locks– Try-locks– Compare-and-Swap– Non-locking stacks, lists, …– Monotonic phases– Logical clocks and asynchronous agreement
• Encapsulate so others won’t suffer!
44
http://www.research.ibm.com/metronome
https://sourceforge.net/projects/tuningforkvp
45
Legend
GC Phases
APP
TRACE
SNAPSHOT
FLIP
SWEEP
APP
APP
APP
APP
APP
APP
APP
APP APP
APP
APP
APP
APP
APP
APP
APP
APP
APP
APP
APP
APP
APP
APP
TERMINATE
APP
APP
APP
APP
APP
APP
APP
APP
…………
…………