Efficient Concurrent Mark-Sweep Cycle Collection Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman (Pending submission) Presented by Jose Joao CS395T - Mar 23, 2009


Page 1: Efficient Concurrent Mark-Sweep Cycle Collection

Efficient Concurrent Mark-Sweep Cycle Collection

Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman

(Pending submission)

Presented by Jose Joao
CS395T - Mar 23, 2009

Page 2: Efficient Concurrent Mark-Sweep Cycle Collection

Outline

• Motivation
  – Backup tracing
  – Trial deletion
• Mark-Sweep Cycle Detection (MSCD)
• Results
  – What worked and what didn’t
• Discussion

Page 3: Efficient Concurrent Mark-Sweep Cycle Collection

Motivation

• Reference counting can directly (i.e. locally) identify garbage
  – Low pause times
  – Reasonable throughput (deferred, coalescing, ulterior)
  – But it cannot reclaim circular garbage
• Existing general solutions are expensive:
  – Trace the whole heap (backup tracing)
  – Temporarily delete an object and see if the cycle collapses (trial deletion)

Page 4: Efficient Concurrent Mark-Sweep Cycle Collection

Trial deletion

• A partial mark-sweep (no roots required): finds objects that are alive only because they are reachable from themselves
• Three phases:
  – Assume the candidate object is dead, then mark and decrement its children recursively
  – Trace again from the candidate object, marking and incrementing wherever some RC is not zero, i.e. the object is externally reachable
  – Sweep objects with a zero count
• Bacon and Rajan: process candidates en masse, avoid acyclic objects, concurrent algorithm
• Usually less efficient than concurrent tracing
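The three phases can be illustrated with a small synchronous sketch in the style of Bacon and Rajan's mark-gray / scan / collect-white passes. The representation is an assumption made for illustration: `rc` maps object names to reference counts, `children` maps them to outgoing pointers, and the function works on a copy of the counts rather than a real heap.

```python
def trial_deletion(rc, children, candidate):
    """Return the set of objects found to be cyclic garbage from `candidate`."""
    rc = dict(rc)   # work on a copy; a real collector updates headers in place
    color = {}      # unvisited objects have no entry

    def mark_gray(obj):
        # Phase 1: assume obj is dead and remove its contribution to its children.
        if color.get(obj) != "gray":
            color[obj] = "gray"
            for child in children[obj]:
                rc[child] -= 1
                mark_gray(child)

    def scan_black(obj):
        # Undo the trial decrements below an externally reachable object.
        color[obj] = "black"
        for child in children[obj]:
            rc[child] += 1
            if color.get(child) != "black":
                scan_black(child)

    def scan(obj):
        # Phase 2: a non-zero count after trial deletion means an external reference.
        if color.get(obj) == "gray":
            if rc[obj] > 0:
                scan_black(obj)
            else:
                color[obj] = "white"
                for child in children[obj]:
                    scan(child)

    mark_gray(candidate)
    scan(candidate)
    # Phase 3: objects left white have a zero count and would be swept.
    return {obj for obj, c in color.items() if c == "white"}
```

For a two-object cycle with no external reference, both objects come back as garbage; adding one external reference to either object makes the second phase restore all counts and nothing is collected.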

Page 5: Efficient Concurrent Mark-Sweep Cycle Collection

Backup tracing

• Trace all live objects and sweep the entire heap

• Shortcomings:
  – Increases pause times
  – Concurrency for low pause times requires synchronization, e.g. write barrier
  – Visits all objects, although some cannot be part of a cycle

Page 6: Efficient Concurrent Mark-Sweep Cycle Collection

MSCD: base algorithm

1. Add roots to mark queue
2. Mark until the mark queue is empty
   1. Pop from queue and process (mark, scan and add children to queue)
   2. Enqueue objects subject to races (fixup set)
3. Sweep
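The base algorithm can be sketched as follows. This is a minimal single-threaded illustration, not the collector described in the paper: the heap is assumed to be a dictionary from object name to child names, and `fixup` is a hypothetical list standing in for the set of objects subject to races.

```python
from collections import deque

def mark_sweep(heap, roots, fixup=None):
    """Return the set of unmarked objects, i.e. what the sweep would reclaim."""
    fixup = [] if fixup is None else fixup
    marked = set()
    queue = deque(roots)                 # step 1: add roots to the mark queue
    while queue or fixup:                # step 2: mark until nothing is left
        if not queue:
            queue.extend(fixup)          # step 2.2: enqueue the fixup set
            fixup.clear()
        obj = queue.popleft()            # step 2.1: pop and process
        if obj not in marked:
            marked.add(obj)
            queue.extend(heap[obj])      # scan: add children to the queue
    return set(heap) - marked            # step 3: sweep everything unmarked
```

With an empty fixup set this is an ordinary stop-the-world mark-sweep; passing a non-empty `fixup` shows how objects missed because of a race are still marked before the sweep.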

Page 7: Efficient Concurrent Mark-Sweep Cycle Collection

MSCD: concurrency

• Builds on top of coalescing RC with a snapshot-at-the-beginning write barrier. The barrier:
  – Uses an atomic state update so that each object is processed only once
  – 1) Records all pre-mutation pointers for deferred RC decrements
  – 2) Records the object as mutated
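A coalescing write barrier of this kind might look like the sketch below. All names (`Obj`, `logged`, `dec_buffer`, `mod_buffer`) are hypothetical, and a per-object lock stands in for the atomic header update that a real barrier would use.

```python
import threading

class Obj:
    def __init__(self, name, fields=()):
        self.name = name
        self.fields = list(fields)
        self.logged = False             # hypothetical "already logged" state
        self.lock = threading.Lock()    # stand-in for an atomic state update

dec_buffer = []   # pre-mutation referents, to be decremented later
mod_buffer = []   # mutated objects, to be rescanned later

def write_barrier(src, index, new_target):
    with src.lock:
        if not src.logged:              # only the first write since the last
            src.logged = True           # collection takes the slow path
            # 1) record all pre-mutation pointers for deferred decrements
            dec_buffer.extend(f for f in src.fields if f is not None)
            # 2) record the object as mutated
            mod_buffer.append(src)
        src.fields[index] = new_target
```

Because the snapshot is taken only on the first write, later writes to the same object are coalesced: the collector sees the pointers the object held at the start of the epoch, which is the snapshot-at-the-beginning property.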

Page 8: Efficient Concurrent Mark-Sweep Cycle Collection

MSCD: concurrency

• Black: marked and scanned
• Grey: marked, not yet scanned
• White: not yet visited

[Figure: three example races. In each one, C is never visited and is incorrectly collected. Reference-count traces shown: RC(C): 1 → 2 → 1; RC(E): 2 → 1.]

Necessary conditions for a race:
• Create a pointer from a black object to a white object C
• Destroy the last path from a grey object to that white object C
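The race can be reproduced in a toy model. The sketch assumes a three-object heap in which A is already black, B is grey and holds the only pointer to the white object C; the names A and B and the `fixup` parameter are illustrative only.

```python
def finish_marking(heap, black, grey, fixup=()):
    """Continue a partly finished mark phase and return the marked set."""
    marked = set(black) | set(grey)
    work = list(grey)                    # black objects are never rescanned
    for obj in fixup:                    # fixup objects are treated as grey
        if obj not in marked:
            marked.add(obj)
            work.append(obj)
    while work:
        obj = work.pop()
        for child in heap[obj]:
            if child not in marked:
                marked.add(child)
                work.append(child)
    return marked

heap = {"A": [], "B": ["C"], "C": []}
# Mutator runs while the collector is paused mid-trace:
heap["A"].append("C")    # new pointer from a black object to white C
heap["B"].remove("C")    # last path from a grey object to C destroyed

lost = finish_marking(heap, black={"A"}, grey=["B"])
fixed = finish_marking(heap, black={"A"}, grey=["B"], fixup=["C"])
```

Here `lost` does not contain C even though A still points to it, which is the incorrect collection shown in the figure. C's count went 1 → 2 → 1, so placing it in the fixup set (`fixed`) is enough to keep it alive.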

Page 9: Efficient Concurrent Mark-Sweep Cycle Collection

MSCD: concurrency

Key insight: how to reduce the size of the fixup set?

Use the set of objects whose RC is decremented to a non-zero value
  – These decrements are a necessary condition for cyclic garbage
  – These decrements are uncommon
  – Easy to identify while processing the decrement buffer (after increments)
  – Robust to coalescing of reference counts
  – These are the purple objects, or candidates for trial deletion (Bacon & Rajan)
  – It’s enough to compute this set at tracing time
  – Trade-offs?
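Identifying these objects while draining the decrement buffer can be sketched as below. The data layout is assumed for illustration (`rc` and `children` are dictionaries keyed by object name), and increments are taken to have been applied already.

```python
def process_decrements(rc, children, dec_buffer):
    """Apply deferred decrements; return (purple, freed) object sets."""
    purple, freed = set(), set()
    work = list(dec_buffer)
    while work:
        obj = work.pop()
        rc[obj] -= 1
        if rc[obj] == 0:
            # Plain RC garbage: free it and decrement what it points to.
            freed.add(obj)
            purple.discard(obj)
            work.extend(children[obj])
        else:
            # Decremented to a non-zero count: possible root of a dead cycle.
            purple.add(obj)
    return purple, freed
```

For example, if `c` and `d` form a cycle and the only external pointer to `c` is dropped, `c` ends with a count of 1 and lands in the purple set, while an acyclic object whose count reaches zero is freed directly.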

Page 10: Efficient Concurrent Mark-Sweep Cycle Collection

MSCD: marking

• Statically determine acyclic classes:
  – No pointer fields, or
  – Can point only to acyclic classes
• Set green bit in header of acyclic objects at allocation time
• Ignore green objects for the fixup set (step 2.2 of base algorithm?)
  – Why only step 2.2? How about step 2.1?
  – The sweep phase also has to consider green objects as marked
• How about green objects pointed to only by non-green objects in a cycle?
• Trade-offs?
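The static acyclicity test can be written as a small fixpoint computation. This is a simplified sketch: `field_types` is a hypothetical map from each class to the classes its pointer fields may reference, and subclassing (a field of type T holding a subclass of T) is ignored, which a real implementation would have to handle.

```python
def acyclic_classes(field_types):
    """Return the set of classes that can never take part in a cycle."""
    green = set()
    changed = True
    while changed:
        changed = False
        for cls, targets in field_types.items():
            # A class with no pointer fields has an empty target set, so
            # all() is True; otherwise every target must already be green.
            if cls not in green and all(t in green for t in targets):
                green.add(cls)
                changed = True
    return green
```

A self-referential class such as a linked-list node never becomes green, and neither does anything that can point to it; classes missing from the map are conservatively treated as cyclic.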

Page 11: Efficient Concurrent Mark-Sweep Cycle Collection

MSCD: sweeping

• Sweep only potentially cyclic objects and their children
• Start with all purple objects
• Trade-offs?
  – Much cheaper than scanning the heap
  – Requires keeping the set of all purple objects identified since the last cycle detection, not only during tracing
    • Space overhead
    • Time overhead of filtering RC-collected objects out of the purple set
    • Overhead increases with time between cycle detections!
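A sweep that starts from the purple set instead of walking the whole heap could look like this sketch. It assumes the mark phase has already produced `marked`, uses a dictionary heap as a stand-in for real object layout, and leaves out the reference-count adjustments for live objects referenced by the reclaimed ones.

```python
def sweep_from_purple(heap, marked, purple):
    """Return unmarked objects reachable from unmarked purple objects."""
    garbage = set()
    work = [obj for obj in purple if obj not in marked]
    while work:
        obj = work.pop()
        if obj in garbage or obj in marked:
            continue                     # live objects stop the traversal
        garbage.add(obj)
        work.extend(heap[obj])           # follow children of dead objects
    return garbage
```

Only objects reachable from a dead purple object are touched, which is why the cost depends on the size of the purple set rather than on the size of the heap.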

Page 12: Efficient Concurrent Mark-Sweep Cycle Collection

MSCD: implementation

• Interaction with the reference counter
  – Establish roots atomically
  – Add complete fixup set to mark queue
  – RC must not free objects pointed to by MSCD (mark queue and fixup queue): free buffer
• Invocation heuristics
  – When RC is unable to free enough memory (?)
  – Heap fullness threshold
  – Size of the purple set
  – Can do trial deletion or backup tracing instead of MSCD

Page 13: Efficient Concurrent Mark-Sweep Cycle Collection

MSCD: possible timing

[Figure: timeline interleaving Mutator and RC phases with the MSCD phases: Roots, marking, Fixup and New (grey) objects, Final marking, Sweeping.]

Page 14: Efficient Concurrent Mark-Sweep Cycle Collection

Methodology and Results

• Jikes RVM 2.3.4+CVS, MMTk
• DaCapo beta050224, SPECjvm98 and pseudojbb
• Stop-the-world (i.e. limit) throughput:
  – Trial deletion is about 70% worse than Backup MS, while MSCD is about 20% better than Backup MS.
  – MSCD visits only 12% fewer nodes:
    • green objects on the fringe still have to be visited,
    • green objects are short lived (many allocated, fewer on the heap at a given time)
  – MSCD has about 7% cheaper cost per visited node:
    • green objects not scanned,
    • sweep optimization

Page 15: Efficient Concurrent Mark-Sweep Cycle Collection

More Results

• Concurrent throughput:
  – Bug in base and MSCD running on SMT (why not CMP?)
  – Time-slicing (i.e. single-context uniprocessor): no benefit from concurrency optimization → fixup is too small
• Overall performance (stop-the-world CD triggered by insufficient reclamation by RC):
  – MSCD with mark opt. is better than MSCD with both mark and sweep opt. due to overhead of maintaining the purple set
  – Overhead of grey bit and green bit
  – Heuristics to trigger CD matter, especially on tight heaps
  – Generations (e.g. ulterior RC) could reduce cycle detection load

Page 16: Efficient Concurrent Mark-Sweep Cycle Collection

Discussion

• Main ideas: reduce the cost of backup MS by:
  – stopping mark at the green-object frontier,
  – starting sweep from purple objects,
  – reusing the concurrency mechanism from coalescing RC
• Figure 6 shows about 50% of the total time is GC+CD (!)
• Baseline is non-generational deferred/coalescing RC.
• Why not test concurrency on CMP in addition to/instead of SMT?
• Synchronization is still required in the write barrier, although they claim the guard can be removed (?)

Page 17: Efficient Concurrent Mark-Sweep Cycle Collection

Open questions

• Invocation heuristics (trade-offs?)
  – When running out of heap
  – At some heap occupancy threshold
  – Some form of estimating that there is enough cyclic garbage to trigger CD?
  – Hints from programmer/compiler?

• Can we do better with CMPs?

Page 18: Efficient Concurrent Mark-Sweep Cycle Collection

Questions for the authors

• Old version of Jikes RVM. Why? Does it matter?

• For xalan and compress, green% + cycle% > 100%

• Table 2 and Figure 5 don’t agree