Paraglide
Martin Vechev, Eran Yahav


Synthesizing System-Level Software

Requirements
· Correctness
· Scalability
· Response time


Challenges
· Crossing abstraction levels
· Hardware complexity
· Time to market


Highly Concurrent Algorithms
· Parallel pattern matching
· Anomaly detection
· Voxel trees, polyhedrons, …
· Scene graph traversal, physics simulation, collision detection
· Cartesian tree (fast fits), lock-free queue
· Garbage collection, …

Goal

Generate efficient, provably correct components of concurrent systems from higher-level specs
· Verification/checking integrated into the design process
· Automatic exploration of implementation details

Synthesize critical components
· System-level code
· Explore tradeoffs

“Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance.” – D. Knuth

Current Approach: Manual Construction

[Figure: a SYSTEM SPEC (high(er)-level description), the ENVIRONMENT (memory model, thread model, concurrency primitives, CPU primitives), the REQUIREMENTS (throughput, memory consumption, pause time, …) and a BAG OF TRICKS (optimistic concurrency, adding metadata, adding space, …) are turned into an implementation by hand (??).]

Manual construction
• Hard to verify/test
• Often buggy
• Did the programmer choose well?
• One-time deal

Our Vision

[Figure: the same SYSTEM SPEC (high(er)-level description) leads, with machine assistance, to several alternative implementations.]

Machine assistance
• Auto checking/verification
• Auto exploration of implementation details
• Repeatable

Example: Concurrent Set Algorithm
· Systematically derived with machine assistance
· Correctness: automatically verified
· Performance: only uses CAS
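"Only uses CAS" means every update is published by a single compare-and-set on a pointer and retried if that CAS fails. The derived algorithm itself is not reproduced here; as a minimal sketch of the style only, the code below assumes an insert-only sorted linked list (no Remove) and emulates CAS with a per-cell lock, since Python has no hardware CAS. `AtomicRef`, `Node` and `InsertOnlyCasSet` are hypothetical names.

```python
import threading


class AtomicRef:
    """Reference cell with compare-and-set.

    Assumption: a per-cell lock emulates the atomicity of one hardware CAS.
    """

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        # Succeeds only if the cell still holds `expected` (identity check).
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, nxt):
        self.key = key
        self.next = AtomicRef(nxt)


class InsertOnlyCasSet:
    """Sorted linked-list set whose only synchronization is CAS (insert-only)."""

    def __init__(self):
        tail = Node(float("inf"), None)        # sentinel larger than any key
        self.head = Node(float("-inf"), tail)  # sentinel smaller than any key

    def _find(self, key):
        # Return (pred, curr) with pred.key < key <= curr.key.
        pred = self.head
        curr = pred.next.get()
        while curr.key < key:
            pred, curr = curr, curr.next.get()
        return pred, curr

    def add(self, key):
        while True:
            pred, curr = self._find(key)
            if curr.key == key:
                return False                   # already present
            node = Node(key, curr)
            # Link in only if pred still points at curr; otherwise retry.
            if pred.next.compare_and_set(curr, node):
                return True

    def contains(self, key):
        _, curr = self._find(key)
        return curr.key == key

    def keys(self):
        out, curr = [], self.head.next.get()
        while curr.next.get() is not None:     # stop at the tail sentinel
            out.append(curr.key)
            curr = curr.next.get()
        return out
```

Supporting Remove safely needs extra machinery, such as Harris-style marked next pointers, which this sketch omits.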

Why Should You Care?

Correctness
· Checking/verification integrated into the design process

Performance
· Systematic exploration beats humans at crossing levels of abstraction, leveraging non-intuitive memory models, etc.
· Systematic exploration produces many candidates with varying tradeoffs

Adaptability
· Shorter development cycle for adapting a system to a new environment


Why Is There Hope?

Designer effort
· Provide insights that are also required in manual construction

Correctness
· Checking helps eliminate a large number of incorrect candidates
· Designer can focus on the remaining candidates

Performance
· …

Adaptability
· …

Why Is There Hope? (II)

Transformational derivation
· Concurrent garbage collection algorithms [PLDI’06]

Combinatorial exploration
· Concurrent GC algorithms [PLDI’07]
· Concurrent set algorithms [PLDI’08]

Automatic verification
· Comparison under Abstraction for Verifying Linearizability [Amit, CAV’07]
· Shape Analysis for Concurrent Programs [TAU]
· …

Risk Summary

Designer effort
· Return on designer “investment”
· Is the result competitive with a manually crafted system?
· Is the tool working at the right level of abstraction?

Verification
· Scalability

Outline

Technical details
· Commonalities between concurrent algorithms
· Adapting to a changing environment
· Preliminary experience: our combinatorial approach

Plan
· Succeed early

Many open questions
· Common representation
· “More efficient”
· …

Example: “The Origin of GCs”

[Figure: family tree of concurrent garbage collection algorithms, with labels ALGORITHMS, PROOFS and FAMILY. Nodes: Steele(C) ’75, Dijkstra(C) ’78, Ben-Ari Base ’84, Ben-Ari Extended ’84, Pixley ’88, Yuasa ’90, Boehm ’91, Doligez(C) ’93, Doligez ’94, Azatchi ’03, Barabash ’03, Domani ’03. Legend: Incorrect, Correct, (C) Corrected.]

Example: Concurrent Set Algorithms

[Figure: Massalin ’91, Valois ’95, Greenwald ’99, Harris ’01, Michael ’02, Ruppert ’04, Heller ’05]

Adapting to a Changing Environment

[Figure: an algorithm depends on its environment: synch primitives, memory model, thread model, memory manager, scheduler, …]

Families of algorithms sharing a common skeleton with parametric functions

[Figure: mutator and collector skeletons with the parametric functions Trace Step, Mutator Step and Expose.]
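To illustrate what a shared skeleton with parametric functions can look like, here is a minimal sequential sketch of a marking loop whose trace and expose behaviour are passed in as functions. The names `mark_phase`, `trace_step` and `expose`, the dictionary heap model and the mutator log are assumptions made for this sketch, not the authors' skeleton, and it ignores concurrency entirely.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

# Sketch-only heap model: object name -> {field name: target name or None}.
Heap = Dict[str, Dict[str, Optional[str]]]


def mark_phase(
    roots: Iterable[str],
    heap: Heap,
    trace_step: Callable[[str, str], Optional[str]],
    expose: Callable[[], Iterable[str]],
) -> Set[str]:
    """Fixed marking skeleton; family members differ only in trace_step and expose."""
    marked: Set[str] = set(roots)
    work: List[str] = list(marked)
    while True:
        while work:
            src = work.pop()
            for field in heap[src]:
                dst = trace_step(src, field)   # parametric: how a field is read
                if dst is not None and dst not in marked:
                    marked.add(dst)
                    work.append(dst)
        # Worklist empty: ask the parametric expose step for more objects.
        extra = [o for o in expose() if o not in marked]
        if not extra:
            return marked
        marked.update(extra)
        work.extend(extra)


# Example instantiation (hypothetical): plain field reads, plus a mutator log
# that expose drains once.
heap: Heap = {"a": {"f": "b"}, "b": {"g": None}, "c": {"h": "d"}, "d": {}}
log = ["c"]


def plain_trace(src: str, field: str) -> Optional[str]:
    return heap[src][field]


def drain_log() -> List[str]:
    drained = list(log)
    log.clear()
    return drained


print(sorted(mark_phase(["a"], heap, plain_trace, drain_log)))  # ['a', 'b', 'c', 'd']
```

Swapping in a different `trace_step` or `expose` yields another member of the family while the loop itself stays fixed.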

Machine-Assisted Design Process: Overview

Generation
· High-level design: find the algorithm outline; find building blocks
· Low-level search: explore the algorithm space

Verification
· High-level design: find a sufficient local invariant; find a sufficient abstraction
· Low-level search: verify the local invariant

Coarse-Grained to Fine-Grained Synchronization

Mutator Step (source, field, new)
atomic { M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC--
  M6: source.field = new }

Trace Step (source, field)
atomic { C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst }

Set Expose (log)
atomic { E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
  return V }

What now? Can we remove atomics?

Removing them: the result is incorrect and may lose objects!
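Read as code, each step above is one atomic section. A minimal single-process Python transcription of this coarse-grained starting point follows. The `Obj` class, the dictionaries used for fields and WF flags, the global `BIG_LOCK`, reading `w → stmt` as "if w then stmt", and draining the whole log in one Set Expose call are all assumptions of this sketch.

```python
import threading


class Obj:
    """Heap object (hypothetical representation for this sketch)."""

    def __init__(self, name):
        self.name = name
        self.fields = {}      # field name -> Obj or None
        self.wf = {}          # field name -> WF flag, set once the collector traced it
        self.mc = 0           # MC counter read by Set Expose
        self.marked = False


# One global lock models "each step is a single atomic section".
BIG_LOCK = threading.Lock()


def mutator_step(source, field, new, log):
    with BIG_LOCK:
        old = source.fields.get(field)        # M1
        w = source.wf.get(field, False)       # M2
        if w:
            new.mc += 1                       # M3
            log.add(new)                      # M4
            if old is not None:               # added guard: the field may be empty
                old.mc -= 1                   # M5
        source.fields[field] = new            # M6


def trace_step(source, field):
    with BIG_LOCK:
        dst = source.fields.get(field)        # C1
        source.wf[field] = True               # C2
        if dst is not None:
            dst.marked = True                 # C3


def set_expose(log):
    with BIG_LOCK:
        v = set()
        while log:                            # assumption: drain the whole log per call
            o = log.pop()                     # E1
            mc = o.mc                         # E2
            if mc > 0:
                o.marked = True               # E3
                v.add(o)                      # E4
        return v
```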


Variant with M5 moved before M3 (Trace Step and Set Expose unchanged):

Mutator Step (source, field, new)
{ M1: old = source.field
  M2: w = source.field.WF
  M5: w → old.MC--
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M6: source.field = new }

“When in doubt, use brute force.” – Ken Thompson
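Brute force here means exhaustively enumerating candidates and, once atomic sections are removed, the interleavings they permit. As a generic illustration (not the paper's GC model), the sketch below splits an increment such as `new.MC++` into a separate read and write, which is an assumption, and enumerates every interleaving of two threads to expose the lost update that an atomic section would prevent. `interleavings` and `run` are hypothetical helpers.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)
        ia, ib, merged = iter(a), iter(b), []
        for i in range(n):
            merged.append(next(ia) if i in slots else next(ib))
        yield merged


def run(schedule):
    """Execute (thread, op) steps on a shared counter; an increment = read then write."""
    shared = 0
    local = {}
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared
        else:  # "write"
            shared = local[thread] + 1
    return shared


t1 = [("T1", "read"), ("T1", "write")]
t2 = [("T2", "read"), ("T2", "write")]
outcomes = [run(s) for s in interleavings(t1, t2)]
print(len(outcomes), outcomes.count(1))  # 6 4: six interleavings, four lose an increment
```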


System Input – Building Blocks

Mutator building blocks
M1: old = source.field
M2: w = source.field.WF
M3: w → new.MC++
M4: w → log = log ∪ {new}
M5: w → old.MC--
M6: source.field = new

Tracing step building blocks
C1: dst = source.field
C3: mark dst
C2: source.field.WF = true

Expose building blocks
E1: o = remove element from log
E2: mc = o.MC
E3: (mc > 0) → mark o
E4: (mc > 0) → V = V ∪ {o}

Input constraints
• Mutator blocks: [M3, M4]
• Tracing blocks: [C1, C3]
• Expose blocks: [E1, E2, E3, E4]
• Dataflow, e.g. M2 < M3
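The brute-force part of the search can be pictured as enumerating every ordering of the blocks and discarding those that violate the dataflow constraints. A minimal sketch, assuming a hypothetical constraint set derived from which block defines a value that another block uses (the slide gives only M2 < M3 as an example), and leaving out the verification step that the real system applies to each candidate:

```python
from itertools import permutations

BLOCKS = ["M1", "M2", "M3", "M4", "M5", "M6"]

# Hypothetical dataflow constraints (before, after): a block that uses a value
# must come after the block that defines it.
DATAFLOW = [
    ("M1", "M5"),  # M5 uses old, defined by M1
    ("M2", "M3"),  # M3..M5 are guarded by w, defined by M2
    ("M2", "M4"),
    ("M2", "M5"),
]


def respects(order, constraints):
    pos = {block: i for i, block in enumerate(order)}
    return all(pos[before] < pos[after] for before, after in constraints)


def candidate_orders(blocks, constraints):
    """Brute force: every permutation that satisfies the dataflow constraints."""
    return [order for order in permutations(blocks) if respects(order, constraints)]


orders = candidate_orders(BLOCKS, DATAFLOW)
print(len(orders))  # 108 under these toy constraints
```

Dataflow alone does not guarantee correctness, so each surviving order still has to go through verification.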

System Output – (Verified) Algorithms

Mutator Step (source, field, new)
{ M1: old = source.field
  M6: source.field = new
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC-- }

Set Expose (log)
{ E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o} }

Trace Step (source, field)
{ C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true }

Explored 306 variations in around 2 minutes.

Least atomic (verified) algorithm with the given blocks.
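“Least atomic” suggests a search over how blocks are grouped into atomic sections. As a toy sketch only: for a fixed block order, enumerate every split into consecutive atomic sections, keep the splits a verifier accepts, and return the finest ones. Measuring atomicity by the number of sections is an assumption, `toy_verify` is a hypothetical stand-in for the real verifier, and `splittings` and `least_atomic` are illustrative names.

```python
from itertools import product
from typing import Callable, Iterator, List, Sequence

Sections = List[List[str]]


def splittings(blocks: Sequence[str]) -> Iterator[Sections]:
    """Every way to cut a non-empty ordered block sequence into consecutive atomic sections."""
    for cuts in product([False, True], repeat=len(blocks) - 1):
        sections, current = [], [blocks[0]]
        for block, cut in zip(blocks[1:], cuts):
            if cut:                      # start a new atomic section here
                sections.append(current)
                current = [block]
            else:                        # extend the current atomic section
                current.append(block)
        sections.append(current)
        yield sections


def least_atomic(blocks: Sequence[str], verify: Callable[[Sections], bool]) -> List[Sections]:
    """Keep verified candidates and return those with the most (finest) sections."""
    verified = [s for s in splittings(blocks) if verify(s)]
    if not verified:
        return []
    finest = max(len(s) for s in verified)
    return [s for s in verified if len(s) == finest]


# Hypothetical stand-in for the verifier: accept only if M3 and M4 share a section.
def toy_verify(sections: Sections) -> bool:
    return any("M3" in sec and "M4" in sec for sec in sections)


print(least_atomic(["M1", "M2", "M3", "M4"], toy_verify))  # [[['M1'], ['M2'], ['M3', 'M4']]]
```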

But What Now?

How do we get further improvement?
· Need more insights
· Need new building blocks
  · Example: start and end of collector reading a field

[Figure labels: Coordination, Meta-data, Atomicity, Ordering]

Continuing the Search…

We derived a non-atomic algorithm (at the granularity of blocks)
· Non-atomic write barrier, collector step and expose
· System explored over 1,600,000 algorithms (took ~34 hours)

All experiments took ~41 machine hours and ~3 human hours.

Plan

· Identify application domain (highly concurrent)
· Case studies
  · Concurrent garbage collection algorithms
  · Concurrent set algorithms
  · Concurrent memory allocator (used in Metronome)
  · …
· Dynamic tool for testing systems (ParaDyn)
· Abstraction-guided synthesis
· Automatic verification using local abstractions
· Representation
· Choosing the right starting point
· …

Succeed Early

Choose “the right” domain
· Correctness is critical
· High performance
· Highly dynamic (concurrent changes)
· Custom architecture (?)
· Irregular structures (?)
· Workloads unknown at compile time
· Examples: VM components, drivers for embedded devices, …

Longer-term Questions

Representation
· Appropriate for transformation?
· Makes concurrency apparent?

Choosing the Right Starting Point?
· A “higher-level specification”?
· A sequential program?
· Start with something else?

Add(S, x): $S' = S \cup \{x\}$
Remove(S, x): $S' = S \setminus \{x\}$
Contains(S, x): $x \in S$
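The set specification above can be written directly as an executable reference model. A minimal sketch, assuming immutable `frozenset` values so that each operation returns the new state S' (the function names are illustrative):

```python
def add(s: frozenset, x) -> frozenset:
    """Add(S, x): S' = S ∪ {x}."""
    return s | {x}


def remove(s: frozenset, x) -> frozenset:
    r"""Remove(S, x): S' = S \ {x}."""
    return s - {x}


def contains(s: frozenset, x) -> bool:
    """Contains(S, x): x ∈ S."""
    return x in s


s = frozenset()
s = add(add(s, 3), 5)
s = remove(s, 3)
print(sorted(s), contains(s, 5), contains(s, 3))  # [5] True False
```

A sequential model like this is what a concurrent set's histories are compared against when checking linearizability.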

What Is “More Efficient”?

Multiple dimensions
· Scalability
· Response time
· …

Theoretical models exist
· Disjoint-access parallelism
· …

It is not clear whether existing theoretical models capture reality.

Abstraction-Guided Synthesis

Guarantee correctness
· Synthesize only programs that can be proved with your abstraction

Summary

Machine-assisted design and implementation of correct, efficient, highly concurrent algorithms

Designer provides insights; system explores implementation details

Business impact
· Change the way concurrent systems are built
· (More) reliable high-performance systems
· Shorter time to market

Scientific impact
· Realistic semi-automated synthesis of concurrent systems

Why us?

Our team has expertise in concurrency and verification of concurrent systems

We have preliminary experience with synthesizing concurrent algorithms in the domain of concurrent garbage collectors

We have ongoing collaborations with world experts on verification of concurrent programs, and with researchers working on parallel computing

THE END

Parallelization

Higher-level
· Underlying structure does not change during computation
· System can be broken into independent parts

Synthesizing Concurrent Systems

Designing practical and efficient concurrent systems is hard
· Trading off simplicity for performance
· Fine-grained coordination

Result: sub-optimal, buggy algorithms

Need a more structured approach to synthesize correct and optimal implementations from coarse-grained specifications

“Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance.” – D. Knuth