Learning Programs from Noisy Data
Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause
ETH Zurich
Why learn programs from examples?

Input/output examples: learn a function p such that
  p(input_1) = output_1
  ...
  p(input_n) = output_n

It is often easier to provide examples than a specification (e.g. in FlashFill),
but the user may make a mistake in the examples.

Actual goal: produce the p that the user really wanted and tried to specify.

Key problem of synthesis: it overfits and is not robust to noise.
Learning Programs from Data: Defining Dimensions

                          Number of   Handling errors   Learned program
                          examples    in the dataset    complexity
  Program synthesis (PL)  tens        no                interesting programs
  Deep learning (ML)      millions    yes               simple but unexplainable functions
  This paper              millions    yes               interesting programs

This paper bridges the gap between ML and PL and advances both areas: it expands
the capabilities of existing synthesizers and reaches new state-of-the-art
precision for programming tasks.
In this paper

● A general framework that
  ○ handles errors in the training dataset
  ○ learns statistical models on data
  ○ handles synthesis with millions of examples
● Instantiated with two synthesizers
  ○ which generalize existing works
Contributions (overview figure): three parts: handling noise, new probabilistic
models, and handling large datasets. Input/output examples (possibly containing
incorrect ones) feed a loop between a representative dataset sampler and a
program generator, which (1) synthesizes a program p and (2) uses a probabilistic
model parametrized with p.
Contributions, part 1: Handling noise
Synthesis with noise: usage model

The synthesizer receives input/output examples and a Domain-Specific Language,
and returns p such that
  p(input_1) = output_1
  ...
  p(input_n) = output_n

If one example is incorrect (e.g. contains a typo), the returned p has
p(input_k) ≠ output_k for that example. This is a new kind of feedback from the
synthesizer; based on it we can either
● tell the user to remove the suspicious example, or
● ask for more examples.
Handling noise: problem statement

D: dataset of input/output examples (some incorrect)
P: space of possible programs in the DSL

A naive formulation, p_best = argmin_{p ∈ P} errors(D, p), is broken: a program
of the form "if input_1 return output_1; if input_2 return output_2; ..."
hardcodes the input/outputs, makes no errors, and may be the only solution to
the minimization problem. Such a program is far too long, and synthesis must
penalize such answers.

Our problem formulation:

  p_best = argmin_{p ∈ P} errors(D, p) + λ·r(p)

where errors(D, p) is the error rate, r is a regularizer that penalizes long
programs, and λ is the regularization constant.
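To make the objective concrete, here is a minimal sketch (not the paper's
implementation) of scoring one candidate program; `run` and `instructions` are
hypothetical helpers for executing a candidate and measuring its length.

```python
def errors(dataset, p, run):
    """Error rate of candidate p on dataset D: fraction of examples it gets wrong."""
    wrong = sum(1 for x, y in dataset if run(p, x) != y)
    return wrong / len(dataset)

def score(dataset, p, run, instructions, lam=0.6):
    """The regularized objective errors(D, p) + lambda * r(p); lower is better."""
    return errors(dataset, p, run) + lam * instructions(p)

def best_program(candidates, dataset, run, instructions, lam=0.6):
    """p_best = argmin over P of the regularized objective (brute force over an
    explicit candidate set, purely for illustration)."""
    return min(candidates, key=lambda p: score(dataset, p, run, instructions, lam))
```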
Noisy synthesis using SMT

  p_best = argmin_{p ∈ P} errors(D, p) + λ·r(p)

Here r is the number of instructions and the objective is the total solution
cost. The encoding Ψ, the formula given to the SMT solver:

  err_i  = if p(input_i) = output_i then 0 else 1   (one per example)
  errors = err_1 + err_2 + err_3
  p ∈ P_r (programs with r instructions)

Ask a number of SMT queries in increasing order of solution cost. E.g. for
λ = 0.6 the costs errors + λ·r are:

            number of errors
       r     0     1     2     3
       1    0.6   1.6   2.6   3.6
       2    1.2   2.2   3.2   4.2
       3    1.8   2.8   3.8   4.8

In ascending cost order the queries return UNSAT (0.6), UNSAT (1.2), UNSAT (1.6),
UNSAT (1.8), and finally SAT (2.2): the best program has two instructions and
makes one error.
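A minimal sketch of this query schedule, assuming a hypothetical
`exists_program(r, max_errors)` that builds the formula Ψ (e.g. with Z3), asks
for a program with r instructions making at most `max_errors` errors, and
returns it or None on UNSAT.

```python
def synthesize_with_noise(exists_program, max_r=3, max_err=3, lam=0.6):
    """Issue SMT queries in increasing order of total cost errors + lam * r.
    Because costs are visited in ascending order, the first SAT answer is an
    optimal solution of the regularized objective (within the bounds)."""
    schedule = sorted((e + lam * r, r, e)
                      for r in range(1, max_r + 1)
                      for e in range(0, max_err + 1))
    for cost, r, e in schedule:
        p = exists_program(r, e)     # SMT query over Psi for P_r, <= e errors
        if p is not None:            # SAT: e.g. cost 2.2 (r = 2, one error)
            return p, cost
    return None, float("inf")        # all queries UNSAT within the bounds
```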
Noisy synthesizer: example

Take an actual synthesizer and show that we can make it handle noise.
Implementation: BitSyn

Synthesizes short, loop-free BitStream programs using Z3, similar to
Jha et al. [ICSE'10] and Gulwani et al. [PLDI'11].

Example program:
  function check_if_power_of_2(int32 x) {
    var o = add(x, 1)
    return bitwise_and(x, o)
  }

Question: how well does our synthesizer discover noise? (in programs from prior
work)

[Plot: accuracy as a function of λ; λ is picked empirically in the best region.]
So far… handling noise

● Problem statement and regularization
● Synthesis procedure using SMT
● Presented one synthesizer

Handling noise enables us to solve new classes of problems beyond normal
synthesis.
Contributions, part 2: Handling large datasets
Fundamental problem

With a large number of examples:

  p_best = argmin_{p ∈ P} cost(D, p)

where D contains millions of input/output examples. Computing cost(D, p) is
O(|D|), which makes synthesis practically intractable.

Key idea: iterative synthesis on a fraction of the examples.
Our solution: two components

● Program generator: a synthesizer for a small number of examples;
  given a dataset d, it finds the best program p_best = argmin_{p ∈ P} cost(d, p)
● Dataset sampler: picks a dataset d ⊆ D

We introduce a representative dataset sampler that generalizes a user providing
input/output examples.
In a loop

Start with a small random sample d ⊆ D, then iteratively generate programs and
samples (a sketch follows below):

  1. The program generator synthesizes p_1 from d.
  2. The representative dataset sampler picks a new d given p_1.
  3. The program generator synthesizes p_2 (now tracking p_1, p_2), and so on,
     until the loop converges and returns p_best.

The algorithm generalizes synthesis-by-examples techniques.
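A minimal sketch of the loop, assuming hypothetical callables:
`generate_program(d)` is the program generator (e.g. the noisy SMT synthesizer
above) and `sample(D, programs, k)` is the representative dataset sampler
defined on the next slide.

```python
import random

def iterative_synthesis(D, generate_program, sample, k=50, rounds=10):
    """Alternate synthesis on a small sample with re-sampling (illustrative
    sketch of the slide's loop, not the paper's exact algorithm)."""
    d = random.sample(D, k)            # small random initial sample d
    programs = []
    for _ in range(rounds):
        p = generate_program(d)        # synthesize on the small dataset d
        if p in programs:              # the sample no longer changes the answer
            break
        programs.append(p)
        d = sample(D, programs, k)     # keep d representative for p1..pn
    return programs[-1] if programs else None
```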
Representative dataset sampler

Idea: pick a small dataset d on which the already generated programs p_1,...,p_n
behave as they do on the full dataset:

  d = argmin_{d ⊆ D} max_{i ∈ 1..n} | cost(d, p_i) - cost(D, p_i) |

[Figure: the costs of p_1 and p_2 on the small dataset d match their costs on
the full dataset D.]

Theorem: this sampler shrinks the candidate program search space.
In the evaluation: significant speedup of synthesis.
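A brute-force sketch of the sampler's objective. Exact minimization over all
subsets is exponential, so this randomized search over candidate size-k subsets
is only illustrative; the paper gives a principled procedure with guarantees.

```python
import random

def representative_sample(D, programs, cost, k=50, trials=200):
    """Approximate d = argmin_{d subset of D} max_i |cost(d, p_i) - cost(D, p_i)|
    by evaluating random size-k subsets and keeping the best one."""
    full_costs = [cost(D, p) for p in programs]
    best_d, best_gap = None, float("inf")
    for _ in range(trials):
        d = random.sample(D, k)
        gap = max(abs(cost(d, p) - fc)
                  for p, fc in zip(programs, full_costs))
        if gap < best_gap:
            best_d, best_gap = d, gap
    return best_d
```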
So far… handling large datasets

● Iterative combination of synthesis and sampling (representative dataset
  sampler + program generator)
● New way to perform approximate empirical risk minimization
● Guarantees (in the paper)
Contributions, part 3: New probabilistic models
Statistical programming tools

A new breed of tools: learn from large existing codebases (e.g. "Big Code") to
make predictions about programs.

1. Train a machine learning model
2. Make predictions with the model

Problem: the model is hard-coded and has low precision.
Problem of existing systems

Existing machine learning models essentially remember a mapping from a context
in the training data to a prediction (with probabilities):

● Hindle et al. [ICSE'12]: learn the mapping "+ name ." → slice; the model
  predicts slice whenever it sees "+ name .". This model comes from NLP.
  Result: very low precision.
● Raychev et al. [PLDI'14]: learn the mapping charAt → slice; the model
  predicts slice whenever it sees charAt. Relies on static analysis.
  Result: low precision for JavaScript.

Precision is the problem: these systems rarely predict the next statement.

Core problem: existing machine learning models are limited and not expressive
enough.
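For contrast, a minimal sketch of such a hard-coded model: the context function
is frozen to "the last two tokens" in the spirit of the n-gram models above, and
prediction is just a lookup of the most frequent continuation.

```python
from collections import Counter, defaultdict

class LastTwoTokensModel:
    """Hard-coded context: always the last two tokens, regardless of the program."""

    def __init__(self):
        self.table = defaultdict(Counter)

    def train(self, token_sequences):
        for toks in token_sequences:
            for i in range(2, len(toks)):
                self.table[(toks[i - 2], toks[i - 1])][toks[i]] += 1

    def predict(self, prefix):
        counts = self.table[(prefix[-2], prefix[-1])]
        return counts.most_common(1)[0][0] if counts else None

# After training on code containing "... + name . slice ...", the model
# predicts "slice" whenever the last two tokens are ("name", ".").
```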
Key idea: second-order learning

Learn a program that parametrizes a probabilistic model that makes predictions:

1. Synthesize a program describing a model (i.e. learn the mapping)
2. Train the model
3. Make predictions with this model

Prior models are described by simple hard-coded programs; our approach learns a
better program.
Training and evaluation

Training example (input → output): compute the context with program p, then
learn a mapping from that context to the completion, e.g. toUpperCase → slice.

Evaluation example: compute the context with the same program p and predict the
completion, e.g. slice. ✔
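A sketch of this setup, assuming `p` is any callable that computes the
conditioning context from a code prefix (the interface is hypothetical); the
model itself is an ordinary counting table keyed by p's output.

```python
from collections import Counter, defaultdict

def train_model(p, training_examples):
    """training_examples: (prefix, completion) pairs. The learned mapping is
    context -> completion counts, where the context is computed by program p
    (e.g. p(prefix) == ('toUpperCase',) on the slide's example)."""
    table = defaultdict(Counter)
    for prefix, completion in training_examples:
        table[p(prefix)][completion] += 1
    return table

def predict(p, table, prefix):
    """Compute the context with the same program p, then predict the completion."""
    counts = table[p(prefix)]
    return counts.most_common(1)[0][0] if counts else None

def cost(p, table, eval_examples):
    """cost(D, p): number of mispredictions of the model parametrized with p,
    which plays the role of errors(D, p) in the earlier objective."""
    return sum(1 for prefix, c in eval_examples
               if predict(p, table, prefix) != c)
```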
Observation

Synthesis of the probabilistic model can be done with the same optimization
problem as before:

  p_best = argmin_{p ∈ P} errors(D, p) + λ·r(p)

Here D is the evaluation data (input/output examples), errors(D, p) plays the
role of cost(D, p) for the model parametrized with p, r is a regularizer that
penalizes long programs, and λ is the regularization constant.
So far…

● Handling noise
● Synthesizing a model
● Representative dataset sampler

The techniques are generally applicable to program synthesis.
Next: an application for "Big Code" called DeepSyn.
DeepSyn: Training

Trained on 100,000 JavaScript files from GitHub. The program synthesizer
(representative dataset sampler + program generator) runs on the dataset D;
the model is then trained on the full data with the best program p.
DeepSyn: Evaluation

50,000 evaluation files (not used in training or synthesis); API completion task.

  Conditioning program p                                        Accuracy
  Last two tokens, Hindle et al. [ICSE'12]                        22.2%
  Last two APIs per object, Raychev et al. [PLDI'14]              30.4%
  Program synthesis with noise (this work)                        46.3%
  Program synthesis with noise + dataset sampler (this work)      50.4%

We can explain the best program: it looks at the APIs preceding the completion
position and at the tokens prior to these APIs.
Q&A

Contributions:
● Extending synthesizers to handle noise
● Scalability: handling large datasets via the representative dataset sampler
  and program generator loop
● Second-order learning: synthesis of probabilistic models parametrized with a
  synthesized program p

Bridges the gap between ML and PL; advances both areas.
What did we synthesize?

The best conditioning program found for DeepSyn:

  Left PrevActor WriteAction WriteValue PrevActor WriteAction
  PrevLeaf WriteValue PrevLeaf WriteValue

Each instruction either moves over the AST (Left, PrevActor, PrevLeaf) or
appends the current node's action or value to the context (WriteAction,
WriteValue).
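A toy interpreter for reading programs of this kind, under an assumed AST node
interface (`left`, `prev_actor`, `prev_leaf`, `value`, `action` are hypothetical
attribute names, not the paper's implementation): move instructions reposition a
cursor, write instructions accumulate the context the model conditions on.

```python
# Toy interpreter for a conditioning program (illustrative sketch only).
MOVES = {
    "Left":      lambda n: n.left,        # sibling to the left
    "PrevActor": lambda n: n.prev_actor,  # previous node with the same role
    "PrevLeaf":  lambda n: n.prev_leaf,   # previous leaf in the tree
}
WRITES = {
    "WriteValue":  lambda n: n.value,
    "WriteAction": lambda n: n.action,
}

def run_conditioning_program(program, node):
    """Walk the AST from `node`, returning the accumulated context."""
    context = []
    for instr in program.split():
        if instr in MOVES:
            node = MOVES[instr](node)
            if node is None:           # walked off the tree: stop early
                break
        else:
            context.append(WRITES[instr](node))
    return tuple(context)

# The synthesized program from the slide:
# run_conditioning_program(
#     "Left PrevActor WriteAction WriteValue PrevActor WriteAction "
#     "PrevLeaf WriteValue PrevLeaf WriteValue", completion_node)
```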