
Unsupervised program synthesis

Page 1: Unsupervised program synthesis
Page 2: Unsupervised program synthesis

Motivation – 1: Drawing Visual Concepts

Dataset - Human Drawn Shape

Synthesize

Program: .. Goto(r1,0); draw(shape1); Goto(r3,10); draw(shape3); assert (contains 1 0); ..

Execute

Goto(r1,0)

Routine

(given as primitives)

argument

Machine Output: Program(I1)

Ii – Input representation for the input.


Page 3: Unsupervised program synthesis

Motivation – 2: Learning Morphological Rules

style, styled
hatch, hatched
articulate, articulated
pay, paid
lay, laid
need, needed

Program

if [ property1.value1 == True ] (stem + d)
elseif [ property3.value2 > 5 ] (stem + ed)
elseif [ property4.value5 == "y" ] (stem + id)

Synthesize Execute

style, styled
hatch, hatched
articulate, articulated
pay, paid
lay, laid
need, needed
run, ran (noise)

<stem, word in past tense>
Program(snatch) = snatched

Page 4: Unsupervised program synthesis

Can we quantify the length of the Program description?

Can we quantify the length of the dataset when it is encoded in terms of the properties required by the program?

Page 5: Unsupervised program synthesis

• Can we cast the problem as an optimization problem, so that we can find an optimal solution, i.e. the program and data encoding with the minimum description length?

Optimization Problem

• The task is to compress the data while still representing it in terms of interpretable entities, i.e. logical dimensionality reduction.

Logical Dimensionality Reduction

Introduction

Page 6: Unsupervised program synthesis

Problem Framing

Description-length priors over programs, $P_f(\cdot)$ (e.g., linguistic rules)

Priors over the inputs $I$ to $f$, $P_I(\cdot)$ (e.g., stems)

$N$ observations, $\{x_i\}_{i=1}^{N}$ (e.g., words)

Noise model: $P_{x|z}(\cdot \mid \cdot)$, where $z_i$ is defined as $f(I_i)$
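
Taken together, these four ingredients suggest a single description-length objective. The following is a sketch, assuming the usual reading of a negative log-probability as a code length; the slide itself does not spell the formula out:

$$
\min_{f,\ \{I_i\}_{i=1}^{N}} \; \underbrace{-\log P_f(f)}_{\text{program length}} \;+\; \sum_{i=1}^{N} \Big( \underbrace{-\log P_I(I_i)}_{\text{input length}} \; \underbrace{-\,\log P_{x|z}\big(x_i \mid f(I_i)\big)}_{\text{noise cost}} \Big)
$$

The first term penalizes long programs, the second penalizes long input encodings, and the third charges for every observation $x_i$ that the program output $z_i = f(I_i)$ fails to reproduce exactly.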

Page 7: Unsupervised program synthesis

Plate Diagram

Page 8: Unsupervised program synthesis

Solution

• Manually provide a rough outline of the program to be induced.

• This outline is called a sketch.

• The sketch is specified as a probabilistic context-free grammar (PCFG).

• Sketches are automatically translated into Satisfiability Modulo Theories (SMT) problems.

• SMT problems are intractable in general, but are often solved efficiently in practice (e.g. in formal verification).

Page 9: Unsupervised program synthesis

Solution

Page 10: Unsupervised program synthesis

A context-free grammar (CFG) is a 4-tuple $G = (N, \Sigma, R, S)$ where:
• $N$ is the set of non-terminals
• $\Sigma$ is the set of terminals
• $R$ is a finite set of rules of the form $X \to Y_1 Y_2 \dots Y_n$, where $X \in N$, $n \ge 0$, and $Y_i \in (N \cup \Sigma)$ for $i = 1, \dots, n$
• $S \in N$ is the start symbol

CFG

Page 11: Unsupervised program synthesis

A PCFG is a CFG with a probability on each production rule, i.e. $G = (N, \Sigma, R, S, q)$, where $q$ gives the probabilities of the productions.

PCFG
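
To make the link between a PCFG and description length concrete, here is a minimal sketch. The grammar `RULES`, the helper names, and the probabilities are all hypothetical and chosen only for illustration; the assumption is that a derivation costs the sum of `-log2 q` over the productions it uses.

```python
import math

# Hypothetical toy PCFG: each non-terminal maps to (probability, right-hand side) pairs.
# Any symbol that is not a key of RULES is treated as a terminal.
RULES = {
    "PROGRAM": [(0.5, ["COMMAND"]), (0.5, ["COMMAND", "PROGRAM"])],
    "COMMAND": [(0.5, ["goto"]), (0.5, ["draw"])],
}


def check_normalized(rules):
    """For every non-terminal, the probabilities q of its productions must sum to 1."""
    return all(abs(sum(p for p, _ in prods) - 1.0) < 1e-9 for prods in rules.values())


def description_length(choices, rules, start="PROGRAM"):
    """Cost in bits of a leftmost derivation given as a list of production indices.

    Returns (bits, list of terminals produced).
    """
    stack = [start]
    remaining = list(choices)
    bits = 0.0
    output = []
    while stack:
        symbol = stack.pop(0)
        if symbol not in rules:  # terminal: emit it
            output.append(symbol)
            continue
        prob, rhs = rules[symbol][remaining.pop(0)]
        bits += -math.log2(prob)  # each production adds -log2 q bits
        stack = list(rhs) + stack
    return bits, output


bits, program = description_length([1, 0, 0, 1], RULES)
```

Under these assumptions, longer derivations and less probable productions both make a program more expensive, which is what lets the grammar act as a prior over programs.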

Page 12: Unsupervised program synthesis

PCFG - Sketch

Define the program primitives.

Constrain the program space with a PCFG.

Page 13: Unsupervised program synthesis

Sketch AND/OR Graph

An OR node corresponds to a choice.

An AND node corresponds to descendants.

Each program is a path through the AND/OR graph.

Recursion allows paths of any length.

Currently the authors bound the length by an arbitrary constant.

c_ij is a Boolean value (1 or 0), depending on which production is being derived.

All edges on the selected path have value 1; all others have value 0.
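
The Boolean encoding can be illustrated with a deliberately simplified sketch. It assumes a flat sequence of OR nodes, one per program position, each selecting one production with a one-hot row of `c`. The names `PRODUCTIONS`, `MAX_LEN`, `decode`, and the `stop` production are hypothetical and are not taken from the slides.

```python
from itertools import product

# Hypothetical productions available at every OR node; "stop" ends the program.
PRODUCTIONS = ["goto", "draw", "stop"]
MAX_LEN = 2  # arbitrary bound on the path length


def is_valid(c):
    """c[i][j] is 1 iff production j is selected at OR node i; exactly one per node."""
    return all(sum(row) == 1 for row in c)


def decode(c):
    """Follow the edges whose value is 1 and read off the program."""
    program = []
    for row in c:
        command = PRODUCTIONS[row.index(1)]
        if command == "stop":
            break
        program.append(command)
    return program


def all_programs():
    """Enumerate every program reachable within MAX_LEN OR nodes."""
    k = len(PRODUCTIONS)
    one_hot = [tuple(int(a == b) for a in range(k)) for b in range(k)]
    return sorted({tuple(decode(c)) for c in product(one_hot, repeat=MAX_LEN)})
```

In a solver-based encoding the rows of `c` would be unknowns for the solver to fill in; here they are enumerated exhaustively, which is feasible only because the bound is tiny.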


Page 14: Unsupervised program synthesis

Constraints

The SMT solver can verify the correctness of the path over the inputs when the path is represented as a set of constraints.

Page 15: Unsupervised program synthesis

Denotations

Mathematical objects that describe the meaning of entities in a language.

Every node in the selected path has a denotation.

In a denotation, each non-terminal is an expression (or a routine) that takes an input I and whose output range is known.

The path gives the sequence of routines with the appropriate values for their arguments, which are obtained from the input.

[Expression] (Input) = Output

The output depends on the input.
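
As a rough illustration of [Expression] (Input) = Output, the sketch below maps each expression of a tiny language to a Python function from input to output. The tuple-based expression syntax, the `denote` function, and the `PAST_TENSE` rule are hypothetical, invented for this example.

```python
# Hypothetical expression language, written as nested tuples:
#   ("stem",)                      -> the input itself
#   ("append", expr, suffix)       -> value of expr followed by a literal suffix
#   ("if_ends_with", ch, e1, e2)   -> e1 if the input ends with ch, otherwise e2


def denote(expr):
    """Return the denotation of expr: a function from input to output."""
    tag = expr[0]
    if tag == "stem":
        return lambda stem: stem
    if tag == "append":
        inner = denote(expr[1])
        suffix = expr[2]
        return lambda stem: inner(stem) + suffix
    if tag == "if_ends_with":
        ch = expr[1]
        then_fn = denote(expr[2])
        else_fn = denote(expr[3])
        return lambda stem: then_fn(stem) if stem.endswith(ch) else else_fn(stem)
    raise ValueError(f"unknown expression: {tag!r}")


# A candidate past-tense rule: add "d" after a final "e", otherwise add "ed".
PAST_TENSE = (
    "if_ends_with", "e",
    ("append", ("stem",), "d"),
    ("append", ("stem",), "ed"),
)

past = denote(PAST_TENSE)
```

Because the denotation is built compositionally, every sub-expression has its own meaning, which is the property that allows a path through the grammar to be turned into constraints node by node.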

Page 16: Unsupervised program synthesis

The optimization algorithm iterates, finding a sequence of solutions. At each step, along with the constraints of the program, a new constraint is added: the solution must be shorter than the current length of the program.

Page 17: Unsupervised program synthesis

Initialize N inputs (unknown)

Find denotations and constraints for all paths and feed them to an SMT solver

Iteratively add the current minimum length as a constraint to find satisfiable solutions of lesser length

Optimization loop
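
The loop can be sketched as follows. This is not the authors' implementation: a brute-force search over a fixed candidate list stands in for the SMT solver, and the candidate programs and their lengths in bits are invented for illustration.

```python
import math


def find_satisfying(candidates, observations, bound):
    """Stand-in for an SMT query: return any candidate that reproduces the
    observations and is strictly shorter than bound, or None if none exists."""
    for length, name, fn in candidates:
        if length < bound and all(fn(x) == y for x, y in observations):
            return length, name, fn
    return None


def minimize_length(candidates, observations):
    """Tighten the length bound until the query becomes unsatisfiable."""
    best, bound = None, math.inf
    while True:
        solution = find_satisfying(candidates, observations, bound)
        if solution is None:
            return best  # nothing shorter exists, so best is optimal
        best, bound = solution, solution[0]


# Hypothetical candidates: (length in bits, name, program).
CANDIDATES = [
    (9.0, "lookup table", {"style": "styled", "hatch": "hatched", "need": "needed"}.get),
    (5.0, "add d after e, else ed", lambda s: s + ("d" if s.endswith("e") else "ed")),
    (2.0, "always add ed", lambda s: s + "ed"),
]
OBSERVATIONS = [("style", "styled"), ("hatch", "hatched"), ("need", "needed")]

best = minimize_length(CANDIDATES, OBSERVATIONS)
```

Each successful query lowers the bound, so the loop terminates with the shortest consistent candidate. The shortest candidate overall is rejected here because it does not reproduce the observations.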

Page 18: Unsupervised program synthesis

Trees rooted at each non-terminal

Descendants in the trees rooted at each non-terminal

Encoding of the Input w.r.t. the program

Encoding the programs for SMT

Calculate length of the program

Denotation of the program

Form the constraints

Page 19: Unsupervised program synthesis

Program inputs:
• shapes, coordinates, distances, angles, scales

Program output:
• Image parse

Constraints on program space:
• Control a turtle, but:
• Restricted to alternately moving and drawing
• No arithmetic on real variables
• No rotation of shapes

Experiments: Visual Concepts
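
A restricted turtle program of this kind might look like the sketch below. The command tuples, the `inputs` dictionary, and the reading of the two `goto` arguments as x and y coordinates are assumptions made for illustration; the slides do not define the semantics of `Goto`.

```python
def alternates(program):
    """Check the restriction that the program alternates move, draw, move, draw, ..."""
    expected = "goto"
    for command in program:
        if command[0] != expected:
            return False
        expected = "draw" if expected == "goto" else "goto"
    return True


def execute(program, inputs):
    """Run the program on its inputs and return the image parse:
    a list of (shape, x, y) placements."""
    def resolve(arg):
        # Named arguments are looked up in the inputs; literals pass through.
        return inputs.get(arg, arg)

    position = (0, 0)
    parse = []
    for command in program:
        if command[0] == "goto":
            position = (resolve(command[1]), resolve(command[2]))
        elif command[0] == "draw":
            parse.append((resolve(command[1]), *position))
        else:
            raise ValueError(f"unknown command: {command[0]!r}")
    return parse


# Hypothetical program and inputs, loosely modelled on the motivating example.
PROGRAM = [("goto", "r1", 0), ("draw", "shape1"), ("goto", "r3", 10), ("draw", "shape3")]
INPUTS = {"r1": 1.0, "r3": 3.0, "shape1": "circle", "shape3": "square"}

parse = execute(PROGRAM, INPUTS)
```

The program stays fixed across examples of a concept while the inputs vary, so different drawings of the same concept share one program and differ only in their input encoding.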

Page 20: Unsupervised program synthesis

• Comparing human performance on the SVRT with classification accuracy for machine learning approaches.

• Human accuracy is the fraction of humans that learned the concept: 0% is chance level.

• Machine accuracy is the fraction of correctly classified held out examples: 50% is chance level.

• Area of circles is proportional to the number of observations at that point.

• Dashed line is average accuracy.

• Program synthesis: this work, trained on 6 examples.

• ConvNet: a variant of LeNet5 trained on 2000 examples.

• Parse (Image) features: discriminative learners on features of the parse (pixels), trained on 6 (10000) examples.

• Humans are given an average of 6.27 examples and solve an average of 19.85 problems.

Experiments: Results

Page 21: Unsupervised program synthesis

Experiments: Morphology Learning

Program inputs:
• The underlying stems

Program output:
• Tuple of all inflections for a stem

Constraints on program space:
• Has form: tuple of expressions, one for each tense
• Attend only to stem ending
• Consider only suffixes

Page 22: Unsupervised program synthesis

Experiments: Results

Page 23: Unsupervised program synthesis

Thanks