44
Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov [email protected] Sumit Gulwani [email protected] And the rest of the PROSE team! [email protected] July 18, 2016 SYNT-16, Toronto, Canada 1

Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov [email protected] Sumit

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Program Synthesis in the Industrial World:

Inductive, Incremental, Interactive

Alex Polozov

[email protected]

Sumit Gulwani

[email protected]

And the rest of the PROSE team!

[email protected]

July 18, 2016 SYNT-16, Toronto, Canada 1

Page 2: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

PROgram Synthesis using Examples

July 18, 2016 SYNT-16, Toronto, Canada 2

Vu Le Daniel Perelman

Danny Simmons

Adam SmithMohammad

RazaAbhishek

Udupa

Allen Cypher Sumit Gulwani

Ranvijay Kumar

Alex Polozov

R&D team, MSR → industrial Microsoft We are hiring! Interns or full-time

Page 3: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

This talk

Lessons

Challenges

Solutions

July 18, 2016 SYNT-16, Toronto, Canada 3

Page 4: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Outline

Programming by Examples (PBE) & PROSE:Quick Background

Mass-Market Deployment↪ Goals

↪ Challenges

↪ Solutions

Discussion

July 18, 2016 SYNT-16, Toronto, Canada 4

Page 5: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

PBE & PROSEA 3-slide Background

July 18, 2016 SYNT-16, Toronto, Canada 5

Page 6: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Motivation

99% of spreadsheet users

do not know programming

Data scientists spend 80%

time wrangling raw data

July 18, 2016 SYNT-16, Toronto, Canada 6

Page 7: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

PROSE Timeline

July 18, 2016 SYNT-16, Toronto, Canada 7

2010-2012

[POPL 11]

FlashFill(text transformations)

2012-2014

[PLDI 14]

FlashExtract(text extraction)

2012-2015

[PLDI 15]

FlashRelate(table transformations)

2014-2015

[OOPSLA 15]

FlashMeta(PBE framework)

PROSESDK

2015-present

Page 8: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

PBE Architecture

July 18, 2016 SYNT-16, Toronto, Canada 8

Program Synthesizer Debugging

Example-basedintent spec 𝜑

Ranking function ℎ DSL ℒ

Rankedprogram set ෩𝑁

Intendedprogram 𝑃 ∈ ℒ

Refined intent

Translator

Test inputs Ԧ𝜎 Intended program inPython/C#/C++/…

Page 9: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Mass-Market Deployment Goals & Challenges

July 18, 2016 SYNT-16, Toronto, Canada 9

Page 10: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

July 18, 2016 SYNT-16, Toronto, Canada 10

User Experience

Inductive(intent is easily specified)

Interactive(facilitates the debugging cycle)

Scalable(snappy UI = responds in < 1 s)

Agile(quick software development)

Ambiguity resolution Incremental synthesis

Predictive synthesis Engineering practices

Page 11: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit
Page 12: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Engineering practices

• Production-quality library code

• Prototyping still exists, but it’s not the final form

• Unit tests & TDD

• Integration tests: real-life scenarios

• Close to 8K for all DSLs in total

• Most are mined from public sources (e.g. help forums)

• In preparation: benchmark suite release for the community

July 18, 2016 SYNT-16, Toronto, Canada 13

Page 13: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Performance-minded engineering

• Parallelization of learning matters

• E.g.: multi-user log file processing in Azure Log Analytics

• Performance of program execution matters

• E.g.: “Big Data” on an end-user’s machine

• Smallest ≠ fastest!

• (1) Synthesize many correct programs, then (2) optimize for the fast ones

July 18, 2016 SYNT-16, Toronto, Canada 14

Robustness-based ranking

Performance-based ranking

Page 14: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Development

• DSL design: ≈ 10 months → ≈ 2 weeks

• This is not a bottleneck!*

• Ranking: bulk of the effort

• Designing a score for an operator 𝐹 is 2-3x longer than designing 𝐹 (incl. synthesis!)

• E.g.: rock-paper-scissors among string processing operators

July 18, 2016 SYNT-16, Toronto, Canada 15

* Once you learn the skill…

Should I process the string“25-06-11”with regexes? Treat itas a numeric computation? A date?

Page 15: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit
Page 16: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

From:

all lines ending with “Number ∘ Dot”

“Space ∘ Number ∘ Dot”

starting with “Word ∘ Space ∘ CamelCase”

Extract:

the first “Number” before a “Dot”

the last “Number” before a “Dot”

the last “Number” before a “Dot ∘ LineBreak”

the last “Number”

text between the last “Space” and the last “Dot”

the first “Comma ∘ Space” and the last “Dot ∘ LineBreak”

…and up to 1020 more candidates

18July 18, 2016 SYNT-16, Toronto, Canada

Page 17: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Anecdotes

• FlashFill was not accepted to Excel until it solved the most common scenarios from 1 example

• Some users still don’t know you can give 2!

July 18, 2016 SYNT-16, Toronto, Canada 19

Adam Smith Adam

Alice Williams Alic

Page 18: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Ambiguity resolution

Option 1: machine-learned robustness-based ranking [CAV 15]

• Idioms/patterns from test data can influence search & ranking

• E.g.: bucketing

July 18, 2016 SYNT-16, Toronto, Canada 20

100 76-100

51 51-75

86

Page 19: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

x ⇒ Concat(Round(x, Down, 25), Const(“-”), Round(x, Up, 25))

Ambiguity resolution

Option 1: machine-learned robustness-based ranking [CAV 15]

• Idioms/patterns from test data can influence search & ranking

• E.g.: bucketing

July 18, 2016 SYNT-16, Toronto, Canada 21

100 76-100

51 51-75

86

Page 20: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

x ⇒ Concat(Round(x, Down, 25), Const(“-”), Round(x, Up, 25))

Ambiguity resolution

Option 1: machine-learned robustness-based ranking [CAV 15]

• Idioms/patterns from test data can influence search & ranking

• E.g.: bucketing

July 18, 2016 SYNT-16, Toronto, Canada 22

100 76-100

51 51-75

86

Page 21: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

x ⇒ Concat(Round(x, Down, 25), Const(“-”), Round(x, Up, 25))

Ambiguity resolution

Option 1: machine-learned robustness-based ranking [CAV 15]

• Idioms/patterns from test data can influence search & ranking

• E.g.: bucketing

Option 2: interactive clarification

July 18, 2016 SYNT-16, Toronto, Canada 23

100 76-100

51 51-75

86

Page 22: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit
Page 23: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

PBE Architecture

July 18, 2016 SYNT-16, Toronto, Canada 26

Program Synthesizer Debugging

Example-basedintent spec 𝜑

Ranking function ℎ DSL ℒ

Rankedprogram set ෩𝑁

Intendedprogram 𝑃 ∈ ℒ

Refined intent

Translator

Test inputs Ԧ𝜎 Intended program inPython/C#/C++/…

Page 24: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

PBE Architecture

July 18, 2016 SYNT-16, Toronto, Canada 27

Program Synthesizer User

Example-basedintent spec 𝜑

Ranking function ℎ DSL ℒ

Rankedprogram set ෩𝑁

Intendedprogram 𝑃 ∈ ℒ

Refined intent

Translator

Test inputs Ԧ𝜎 Intended program inPython/C#/C++/…Hypothesizer

Questions

Page 25: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

PBE Architecture

July 18, 2016 SYNT-16, Toronto, Canada 28

Program Synthesizer User

Example-basedintent spec 𝜑

Ranking function ℎ DSL ℒ

Rankedprogram set ෩𝑁

Intendedprogram 𝑃 ∈ ℒ

Refined intent

Translator

Test inputs Ԧ𝜎 Intended program inPython/C#/C++/…Hypothesizer

Questions

Page 26: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Hypothesizer

Reduces the cognitive load on the user

Reduces the number of iterations by choosing the most effective disambiguating questions

Increases the user’s confidence in the system (“proactive = smart”)

July 18, 2016 SYNT-16, Toronto, Canada 29

Given a program set ෩𝑵, find program constraints (“hypotheses”) 𝝋that best disambiguate among programs in ෩𝑵,

and present them to the user as multiple-choice questions.

Page 27: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Example

July 18, 2016 SYNT-16, Toronto, Canada 30

Missing page numbers, 1993 1993

64-67, 1995 64

… … …

Which output is correct here?a. 64b. 67c. 1995

෩𝑁

1995

64 67

Page 28: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Example

July 18, 2016 SYNT-16, Toronto, Canada 31

Missing page numbers, 1993 1993

64-67, 1995 64

… … …

Which output is correct here?a. 64b. 67c. 1995

෩𝑁

1995

64 67

𝜑𝑖+1: 𝑃 𝜎2 = "1995"

Page 29: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Example

July 18, 2016 SYNT-16, Toronto, Canada 32

Missing page numbers, 1993 1993

64-67, 1995 1995

… … …

Which output is correct here?a. 64b. 67c. 1995

𝜑𝑖+1: 𝑃 𝜎2 = "1995"

Page 30: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Example – alternative

July 18, 2016 SYNT-16, Toronto, Canada 33

Missing page numbers, 1993 1993

64-67, 1995 64

… … …

෩𝑁

1995

64 67

64-67

Is this part of the input relevant?a. Yesb. Noc. Maybe

Page 31: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Picking the right question

“Distinguishability” = effectiveness for disambiguation

1. An input is distinguishing if many top-ranked candidate programs disagree on the intended output on it.

• Any response will partition the program set well

2. A question is distinguishing if the alternative candidate programs corresponding to all potential responses have high ranks.

• Any response will lead to a good alternative program

Preliminary results: good questions yield just 4-6 iterations until convergence

July 18, 2016 SYNT-16, Toronto, Canada 34

Page 32: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit
Page 33: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Big Data

July 18, 2016 SYNT-16, Toronto, Canada 37

Page 34: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Big Data + Program Synthesis

July 18, 2016 SYNT-16, Toronto, Canada 38

Page 35: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Problem definition

• ℒ is an industrial DSL (e.g., FlashFill)

• ෩𝑁𝑖 ≈ 1020

• Time limit: ≈ 1 sec

July 18, 2016 SYNT-16, Toronto, Canada 39

Given a program set ෩𝑵𝒊 ⊂ ℒ that satisfies the currently accumulated spec 𝝋𝒊, and a new constraint 𝝍𝒊+𝟏, learn a subset ෩𝑵𝒊+𝟏 ⊂ ෩𝑵𝒊 of

programs that satisfy the new spec 𝝋𝒊+𝟏 = 𝝋𝒊 ∧ 𝝍𝒊+𝟏

Page 36: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Background: Version Space Algebra

July 18, 2016 SYNT-16, Toronto, Canada 40

int positionIn[string s] := AbsPos(s, k)| RegPos(s, std.Pair(r, r), k);

Page 37: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Background: Version Space Algebra

July 18, 2016 SYNT-16, Toronto, Canada 41

int positionIn[string s] := AbsPos(s, k)| RegPos(s, std.Pair(r, r), k);

Sharing #1:cross-product

representation

Page 38: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Background: Version Space Algebra

July 18, 2016 SYNT-16, Toronto, Canada 42

int positionIn[string s] := AbsPos(s, k)| RegPos(s, std.Pair(r, r), k);

Sharing #1:cross-product

representationSharing #2:

equal sets are shared as the same DAG node

Sharing #2:equal sets are shared

as the same DAG node

Page 39: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Background: Version Space Algebra

July 18, 2016 SYNT-16, Toronto, Canada 43

int positionIn[string s] := AbsPos(s, k)| RegPos(s, std.Pair(r, r), k);

Sharing #1:cross-product

representationSharing #2:

equal sets are shared as the same DAG node

Sharing #2:equal sets are shared

as the same DAG node

For, e.g., FlashFill, volume ෩𝑁 ≈ 105 when ෩𝑁 ≈ 1020.

Runtime of VSA operations ∝ VSA volume.

Page 40: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

VSAs and CFGs are two

isomorphic representations for a language

July 18, 2016 SYNT-16, Toronto, Canada 44

Filter ෩𝑁,𝜓 ≡ Learn ℒ ෩𝑁 ,𝜓

Page 41: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Incremental Inductive Synthesis

1. Implicitly represent ෩𝑁𝑖 (already a VSA!) as an isomorphic CFG ℒ ෩𝑁𝑖 .

2. Analyze the descriptive power of 𝜓𝑖+1:

• Definitive (e.g., examples, set membership, subsequence constraints):

Apply regular top-down deductive synthesis on ℒ ෩𝑁𝑖

• Locally refining (e.g., datatypes, input relevance):

Re-run backpropagation only on relevant parts of ℒ ෩𝑁𝑖

• Globally refining (e.g., negative examples):

Filter ෩𝑁𝑖 at the top level

July 18, 2016 SYNT-16, Toronto, Canada 45

Page 42: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

PBE Architecture

July 18, 2016 SYNT-16, Toronto, Canada 46

Program Synthesizer User

Example-basedintent spec 𝜑

Ranking function ℎ DSL ℒ

Rankedprogram set ෩𝑁

Intendedprogram 𝑃 ∈ ℒ

Refined intent

Translator

Test inputs Ԧ𝜎 Intended program inPython/C#/C++/…Hypothesizer

QuestionsRefined DSL ℒ′

Page 43: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Preliminary results

• Big improvement when VSA fragmentation is limited

• Not the final results; work in progress has orders-of-magnitude improvements

July 18, 2016 SYNT-16, Toronto, Canada 47

FlashFill FlashExtract

Page 44: Program Synthesis in the Industrial World: Inductive ...Program Synthesis in the Industrial World: Inductive, Incremental, Interactive Alex Polozov polozov@cs.washington.edu Sumit

Lessons & Conclusions

• Robust engineering is the key to PBE deployment

• Ranking ≫ learning (in industrial PBE)

• Interaction models should be first-class citizens in synthesis frameworks

• Great theoretical results: e.g., OGIS [Jha & Seshia 2015]

• Also need: HCI evaluations, comparison of query types, worst-case TD optimization

• Proactive ambiguity analysis of current candidate programs

• Incrementality: treat the previous program set as the new search space

• Requires full program set computation (possibly in the background)

July 18, 2016 SYNT-16, Toronto, Canada 48

https://microsoft.github.io/[email protected]

Thank you!