CGCExplorer: A Semi-Automated Search Procedure for Provably Correct Concurrent Collectors

Martin Vechev Eran Yahav David Bacon

University of Cambridge IBM T.J. Watson Research Center

Noam Rinetzky

Tel Aviv University

Synthesizing Concurrent Algorithms

Designing practical and efficient concurrent algorithms is hard
• Trading off simplicity for performance
• Fine-grained coordination

Result: sub-optimal, buggy algorithms

Need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications

Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance. -- D. Knuth

Synthesizing Concurrent Collectors

Concurrent garbage collectors
• Widely used
• Must be correct, but also fast and scalable
• Many algorithms, not many formal proofs

A challenge problem for verification and synthesis
• Concurrency
• Heap with no a priori bound

Focus on a specific family of collection algorithms
• A generalization of Dijkstra’s algorithm
• Concurrent, tracing, non-moving
• Single mutator, single collector (non-parallel)

Contributions

Unifying framework – collection algorithms as common skeleton with parametric functions

[Figure: common skeleton with threads Mutator and Collector; parametric functions Mutator Step, Trace Step, and Expose]

• Specified various sets of blocks in 10 cycles
• Explored 1,600,000 collection algorithms
• Found 6 correct algorithms
• Hundreds of variations

Overview

Generation
• High-level design: find algorithm outline; find building blocks
• Low-level search: explore algorithm space

Verification
• High-level design: find a sufficient local invariant; find a sufficient abstraction
• Low-level search: verify local invariant

Algorithm Space - Counting Algorithms

• Track collector’s progress (wavefront)
• Count pointer installations from behind wavefront
  • Increment on install, decrement on delete
  • Up to a predetermined counting threshold
• Expose objects with count > 0 when finished tracing

[Figure: heap with root, object header (count 1), scanned field, and collector wavefront]

Mutator step:
• Update source field to target object
• Check wavefront
• If source field is behind wavefront:
  • Update new target object count
  • Update old target object count

Trace step:
• Read field value
• Update wavefront (collector progress)
• Mark target object

Expose:
• Select objects with count > 0
• Produce new roots

Counting Algorithms: High Level View

[Figure: Mutator and Collector threads; procedures Mutator Step, Trace Step, Expose]

Mutator Step (source, field, new)
{
  M1: old = source.field
  M2: w = source.field.WF
  M3: if (w) new.MC++
  M4: if (w) log = log U {new}
  M5: if (w) old.MC--
  M6: source.field = new
}

Trace Step (source, field)
{
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}
  return V
}
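The three procedures can be sketched in Python as follows. This is a minimal single-threaded sketch that assumes each procedure runs as one atomic step. The `Obj` class, the per-field dictionary `wf`, the null checks, and the loop that drains the log in `expose` are assumptions made for illustration; they stand in for the slide's `WF`, `MC`, and `log`. The counting threshold is left out.

```python
class Obj:
    """Heap object: fields, per-field wavefront flags (WF), count (MC), mark bit."""
    def __init__(self, name, fields=()):
        self.name = name
        self.fields = {f: None for f in fields}
        self.wf = {f: False for f in fields}   # WF: has the collector scanned this field?
        self.mc = 0                            # MC: installs from behind the wavefront
        self.marked = False


def mutator_step(source, field, new, log):
    """M1-M6: store `new` into source.field, counting if the field was already scanned."""
    old = source.fields[field]          # M1
    w = source.wf[field]                # M2
    if w:
        new.mc += 1                     # M3
        log.add(new)                    # M4
        if old is not None:             # null check is an assumption of this sketch
            old.mc -= 1                 # M5
    source.fields[field] = new          # M6


def trace_step(source, field):
    """C1-C3: read one field, advance the wavefront past it, mark the target."""
    dst = source.fields[field]          # C1
    source.wf[field] = True             # C2
    if dst is not None:
        dst.marked = True               # C3
    return dst


def expose(log):
    """E1-E4: drain the log; objects with a positive count are marked and returned."""
    v = set()
    while log:
        o = log.pop()                   # E1
        mc = o.mc                       # E2
        if mc > 0:
            o.marked = True             # E3
            v.add(o)                    # E4
    return v


# Usage: the collector scans root.f, then the mutator overwrites it.
root = Obj("root", fields=("f",))
a, b = Obj("a"), Obj("b")
root.fields["f"] = a
log = set()
trace_step(root, "f")              # marks a and moves the wavefront past root.f
mutator_step(root, "f", b, log)    # install behind the wavefront: b is counted and logged
new_roots = expose(log)            # b has count > 0, so it is marked and exposed
```

In this run the collector never traces `b`, yet `b` is retained because the mutator counted and logged the installation.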

Coarse-Grained to Fine-Grained Synchronization

Mutator Step (source, field, new)
atomic {
  M1: old = source.field
  M2: w = source.field.WF
  M3: if (w) new.MC++
  M4: if (w) log = log U {new}
  M5: if (w) old.MC--
  M6: source.field = new
}

Trace Step (source, field)
atomic {
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}

Set Expose (log)
atomic {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}
  return V
}

What now? Can we remove atomics?

Result is incorrect, may lose objects!
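One way to see why removing atomics is delicate is to count the schedules that have to be considered. The sketch below enumerates the interleavings of one mutator step with one trace step. It assumes that an atomic section executes as a single indivisible unit; the function and variable names are hypothetical. It only counts schedules and does not check safety.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each sequence's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Coarse-grained: each procedure is one atomic unit.
atomic_mutator = [("M1", "M2", "M3", "M4", "M5", "M6")]
atomic_collector = [("C1", "C2", "C3")]

# Fine-grained: every block is its own unit.
fine_mutator = [(m,) for m in ("M1", "M2", "M3", "M4", "M5", "M6")]
fine_collector = [(c,) for c in ("C1", "C2", "C3")]

coarse = list(interleavings(atomic_mutator, atomic_collector))
fine = list(interleavings(fine_mutator, fine_collector))
print(len(coarse), len(fine))   # 2 84
```

With both procedures atomic there are only two schedules (mutator first or collector first). With every block on its own, a single mutator step and a single trace step already admit 84, and each one is a chance for the collector to miss an object.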

Coarse-Grained to Fine-Grained Synchronization

Mutator Step (source, field, new)
{
  M1: old = source.field
  M2: w = source.field.WF
  M5: if (w) old.MC--
  M3: if (w) new.MC++
  M4: if (w) log = log U {new}
  M6: source.field = new
}

Trace Step (source, field)
{
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}
  return V
}

What now? Can we remove atomics?

“When in doubt, use brute force.” -- Ken Thompson

System Input – Building Blocks

Mutator Building Blocks
  M1: old = source.field
  M2: w = source.field.WF
  M3: if (w) new.MC++
  M4: if (w) log = log U {new}
  M5: if (w) old.MC--
  M6: source.field = new

Tracing Step Building Blocks
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true

Expose Building Blocks
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}

Input Constraints
• Mutator blocks: [M3, M4]
• Tracing blocks: [C1, C3]
• Expose blocks: [E1, E2, E3, E4]
• Dataflow, e.g. M2 < M3
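The generation step can be pictured as enumerating orderings of the building blocks and discarding those that violate the input constraints. The sketch below does this for the mutator blocks only. It assumes that `[M3, M4]` means M3 and M4 must stay adjacent in that order, and it uses a hypothetical set of dataflow constraints (each use of `old` or `w` follows its definition); only `M2 < M3` is taken from the slide. Atomicity choices are not modeled, so the count is not the number of variations the system explores.

```python
from itertools import permutations

BLOCKS = ["M1", "M2", "M3", "M4", "M5", "M6"]

# Dataflow constraints (before, after). M2 < M3 is from the slide; the others
# are assumed from the def/use of `old` (M1 -> M5) and `w` (M2 -> M4, M5).
DATAFLOW = [("M2", "M3"), ("M2", "M4"), ("M2", "M5"), ("M1", "M5")]

# Assumed reading of [M3, M4]: the two blocks stay adjacent, in this order.
ADJACENT = [("M3", "M4")]


def is_valid(order):
    """Check one candidate ordering against the dataflow and adjacency constraints."""
    pos = {b: i for i, b in enumerate(order)}
    ordered = all(pos[a] < pos[b] for a, b in DATAFLOW)
    adjacent = all(pos[b] == pos[a] + 1 for a, b in ADJACENT)
    return ordered and adjacent


candidates = [order for order in permutations(BLOCKS) if is_valid(order)]
print(len(candidates))   # 25 of the 720 permutations survive
```

Each surviving ordering would then be handed to the verifier; the constraints exist to prune candidates before that expensive step.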

System Output – (Verified) Algorithms

Mutator Step (source, field, new)
{
  M1: old = source.field

  M6: source.field = new
  M2: w = source.field.WF

  M3: if (w) new.MC++
  M4: if (w) log = log U {new}

  M5: if (w) old.MC--
}

Trace Step (source, field)
{
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}
}

Explored 306 variations in around 2 minutes

Least atomic (verified) algorithm with given blocks

But What Now?

How do we get further improvement?
• Need more insights
• Need new building blocks

Example: start and end of collector reading a field

[Figure: exploration dimensions: coordination meta-data, atomicity, ordering]

Continuing the Search…

We derived a non-atomic algorithm (at the granularity of blocks)
• Non-atomic write barrier, collector step, and expose
• System explored over 1,600,000 algorithms (took ~34 hours)

All experiments took ~41 machine hours and ~3 human hours

CGC: Challenge for Automatic Verification

Unbounded heap and sequence of mutations

Checking a global invariant is hard
• State space too big even for partial checking
• 3 nodes can quickly consume several GB in the SPIN model checker

Solution
• Manually boil down to a local invariant
• Automatically prove local invariant
• Use abstraction: unbounded number of concrete nodes conservatively represented by small, bounded number of abstract nodes

What Do We Prove?

Want to prove collector safety
• Retaining all live objects

Local invariant, for every object:
• If an object is referenced from a scanned field at time of expose, it is either marked, or its count > 0

Show for any arbitrary object, under any arbitrary sequence of mutations

[Figure: heap diagram; labels: root, scanned field, hidden, 2]
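The local invariant can be phrased as a predicate over a heap snapshot taken at the time of expose. The sketch below is a hypothetical concrete check for illustration, not the abstraction-based proof; the snapshot layout (`scanned_refs`, `marked`, `count`) and the function names are assumptions.

```python
def violates_local_invariant(obj, scanned_refs, marked, count):
    """True if `obj` is referenced from a scanned field but is neither marked nor counted."""
    referenced = obj in scanned_refs
    return referenced and not (obj in marked or count.get(obj, 0) > 0)


def check_snapshot(objects, scanned_refs, marked, count):
    """Return the objects that break the invariant (an empty list means it holds)."""
    return [o for o in objects
            if violates_local_invariant(o, scanned_refs, marked, count)]


# Snapshot at expose time: targets of scanned fields, mark bits, and counts.
objects = ["a", "b", "c"]
scanned_refs = {"a", "b"}      # a and b are pointed to from scanned fields
marked = {"a"}                 # the collector marked a
count = {"b": 1}               # b was installed behind the wavefront and counted

ok = check_snapshot(objects, scanned_refs, marked, count)       # []
bad = check_snapshot(objects, scanned_refs, marked, {"b": 0})   # ["b"]
```

In the second call `b` is reachable through a scanned field but has no mark and no count, which is exactly the situation in which a collector would lose a live object.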

Abstraction Intuition

• Select tracked representative object
• Track reference count only for the selected object
• Only up to a fixed number of pointers matter: up to counting threshold
  • Track these precisely
  • Forget the rest

[Figure: heap diagram; labels: root, scanned field, object header, wavefront, hidden, 2]
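The idea that only pointers up to the counting threshold matter can be illustrated with a saturating counter. The sketch below shows a standard bounded-counter abstraction and is not claimed to be the abstraction used in the verification; `THRESHOLD`, `MANY`, and the function names are hypothetical.

```python
THRESHOLD = 2          # hypothetical counting threshold
MANY = "many"          # abstract value standing for every count above THRESHOLD


def abstract(n):
    """Map a concrete count to its abstract value."""
    return n if n <= THRESHOLD else MANY


def inc(a):
    """Abstract increment: the set of possible abstract results."""
    if a == MANY or a == THRESHOLD:
        return {MANY}
    return {a + 1}


def dec(a):
    """Abstract decrement: from MANY the result may stay MANY or drop to THRESHOLD."""
    if a == MANY:
        return {MANY, THRESHOLD}
    return {a - 1}


def sound_for(n):
    """Check that the abstract operations cover the concrete ones for count n."""
    return (abstract(n + 1) in inc(abstract(n))
            and abstract(n - 1) in dec(abstract(n)))


all_sound = all(sound_for(n) for n in range(10))
```

Counts up to the threshold are tracked exactly. Above it, the exact value is forgotten and a decrement conservatively returns both possibilities, so the abstract state space stays bounded however many pointers exist.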

Recap

Generation
• High-level design: find algorithm outline; find building blocks
• Low-level search: explore algorithm space

Verification
• High-level design: find a sufficient local invariant; find a sufficient abstraction
• Low-level search: verify local invariant
• Find proof outline; find proof building blocks

What’s next?

Concurrent collector synthesis
• Get real algorithms
• Mapping to real machine instructions
• Yet another level of search

Synthesis of other concurrent algorithms
• In the pipeline: concurrent set algorithms

Local abstractions for concurrent programs

Invited Questions

1) Are your algorithms practical?
2) What are the limitations of this approach? Would it work for my problem?
3) How do you prove that your algorithms terminate?
4) Can you show another algorithm?
5) How do you reduce the number of calls to the model-checker?
6) You didn’t mention any related work
7) Can you give more details on experimental results?

ANSWERS FOLLOW

Where Do Building Blocks Come From?

• Read/write of heap location
• Collector coordination meta-data, e.g., collector progress, state flags

Progress Coordination Metadata

[Figure: object layouts (header with count and marked; fields fld_1, fld_2) annotated with per-field progress flags start_i and end_i; variants labeled 6 bits, 5 bits, 1 bit, 0 bits]

Refined Input – Finer Building Blocks

Mutator Building Blocks
  M1: old = source.field
  M2s: ws = source.field.WFs
  M2e: we = source.field.WFe
  M3s: if (ws) new.MC++
  M4s: if (ws) log = log U {new}
  M5e: if (we) old.MC--
  M6: source.field = new

Collector Building Blocks
  C1: dst = source.field
  C3: mark dst
  C2s: source.field.WFs = true
  C2e: source.field.WFe = true

Expose Building Blocks
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}

Input Constraints
• Mutator: [M3s, M4s]
• Tracing: [C1, C3], C2s < [C1, C3] < C2e
• Expose: [E1, E2, E3, E4]
• Dataflow: e.g. M2s < M3s

System Output

Mutator Step (source, field, new)
{
  M1: old = source.field
  M2e: we = source.field.WFe
  M6: source.field = new
  M2s: ws = source.field.WFs
  M3s: if (ws) new.MC++
  M4s: if (ws) log = log U {new}
  M5e: if (we) old.MC--
}

Trace Step (source, field)
{
  C2s: source.field.WFs = true

  C1: dst = source.field
  C3: mark dst

  C2e: source.field.WFe = true
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}
}

• Constraints = Insights, e.g.:
  M2e < M6 < M2s and C2s < [C1, C3] < C2e
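The ordering constraints can be checked mechanically against the output algorithm. The sketch below is a minimal illustration; `precedes_all` is a hypothetical helper, and `[C1, C3]` is treated as the pair of blocks C1 and C3.

```python
MUTATOR = ["M1", "M2e", "M6", "M2s", "M3s", "M4s", "M5e"]
TRACE = ["C2s", "C1", "C3", "C2e"]


def precedes_all(order, *groups):
    """True if every block in each group comes before every block in the next group."""
    pos = {b: i for i, b in enumerate(order)}
    for earlier, later in zip(groups, groups[1:]):
        if max(pos[b] for b in earlier) >= min(pos[b] for b in later):
            return False
    return True


# M2e < M6 < M2s
mutator_ok = precedes_all(MUTATOR, ["M2e"], ["M6"], ["M2s"])
# C2s < [C1, C3] < C2e
trace_ok = precedes_all(TRACE, ["C2s"], ["C1", "C3"], ["C2e"])
# Dataflow example from the input constraints: M2s < M3s
dataflow_ok = precedes_all(MUTATOR, ["M2s"], ["M3s"])
# A reordering that reads WFs before the write and WFe after it fails the check.
swapped_ok = precedes_all(["M1", "M2s", "M6", "M2e"], ["M2e"], ["M6"], ["M2s"])
```

The first three checks hold for the output above, and the swapped ordering is rejected. Read this way, the mutator samples the end flag before its write and the start flag after it, while the collector sets the start flag before reading the field and the end flag after marking.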

(Some) Related Work

• Superoptimizer: A Look at the Smallest Program, Massalin, ASPLOS’87
  Finite state, limited length of instruction sequences
• Programming by Sketching, Solar-Lezama et al., PLDI’05
  Finite state
• Sketching with Stencils, Solar-Lezama et al., PLDI’07
• Automatic Discovery of Mutual Exclusion Algorithms, Bar David and Taubenfeld, PODC’03
  Finite state
• Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms, PLDI’06
• CheckFence, Sebastian Burckhardt, Rajeev Alur and Milo M. K. Martin, PLDI’07

Algorithm Exploration

[Figure: exploration space for Mutator Step, Trace Step, and Expose; directions: less atomic, more atomic, different orders]

Limitations

Need algorithm designer insights
• Designer needs to understand results of each phase

Abstraction is tailor-made
• Designing an abstraction for the next collector?

Pushing the limits of current model-checkers
• Multiple mutators? Unbounded number of mutators?
• Better partial-order reduction may help

Are Your Algorithms Practical?

Are your algorithms correct?
Honest answer: not yet

• So far focused on correctness more than on performance
• However, counting algorithms are of practical interest

The moral is that for the design of multiprocessor installations we cannot rely on the traditional approach of the optimistic engineer, who, when the design looks reasonable, puts it together to see if it works. -- Edsger W. Dijkstra

Experimental Results

Run     Total      Checked   Correct   Time (min)
1       306        45        1         2
2       2744       162       2         34
3       12         7         2         1
4       592        146       14        56
5       32         26        1         1
6       3024       550       80        212
7       Timed out
8       6144       127       10        39
9       1624320    1833      6         2072
10      364032     288       0         39
TOTAL   2001206    3184      116       2456

+ About 180 minutes of human working with the system

(3.8 GHz Xeon processor and 8 GB memory running version 4 of RedHat Linux.)
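The totals in the table can be re-derived from the per-run rows. The sketch below sums the nine completed runs (run 7 timed out and is left out) and converts the time to hours; the tuple layout is just a convenience for this check.

```python
# (run, total, checked, correct, minutes); run 7 timed out and is omitted.
RUNS = [
    (1, 306, 45, 1, 2),
    (2, 2744, 162, 2, 34),
    (3, 12, 7, 2, 1),
    (4, 592, 146, 14, 56),
    (5, 32, 26, 1, 1),
    (6, 3024, 550, 80, 212),
    (8, 6144, 127, 10, 39),
    (9, 1624320, 1833, 6, 2072),
    (10, 364032, 288, 0, 39),
]

total = sum(r[1] for r in RUNS)
checked = sum(r[2] for r in RUNS)
correct = sum(r[3] for r in RUNS)
minutes = sum(r[4] for r in RUNS)

print(total, checked, correct, minutes)   # 2001206 3184 116 2456
print(round(minutes / 60, 1))             # 40.9 machine hours
```

The sums match the TOTAL row, and 2456 minutes is the "~41 machine hours" quoted earlier. Run 9 alone accounts for 2072 minutes, the "~34 hours" spent on the 1,600,000-algorithm search.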

Why Does it Work?

Ingredients
• Relentless optimism
• Limited setting

Limited setting
• Single collector, single mutator
• Counting threshold is known
• Algorithm skeleton is fixed
• Algorithm uses a barrier before moving to the sweep phase
• … (see paper)

Algorithm Space - Counting Algorithms

• Concurrent: single mutator, single collector (not parallel)
• Tracing: computes transitive reachability from roots
• Non-moving: collector does not relocate objects

How Do You Prove Termination?

Manually

DEMONS START HERE IF NOT EARLIER

Synthesizing Concurrent Algorithms

Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance. -- D. Knuth

it seems unavoidable that multiprocessor installations will be built… it seems equally unavoidable that many of them will be put together by aforementioned optimistic engineer. I shudder at the thought of all the new bugs: they will only delight the Devil. Am I too pessimistic? Nobody knows the trouble I have seen... -- Edsger W. Dijkstra