
Program Synthesis using Deduction-Guided Reinforcement Learning

Yanju Chen, Chenglong Wang, Osbert Bastani, Isil Dillig and Yu Feng

Program Synthesis

Find a program P that satisfies all the specifications φ.

specifications φ → synthesizer → program P

table transformation, string transformation, data visualization, HPC

Program Synthesis

Deductive Reasoning

Statistical Reasoning

DISCONNECTION

NEO (Feng et al. 2018)

BLAZE (Wang et al. 2018)

DEEPCODER (Balog et al. 2017)

AutoPandas (Bavishi et al. 2019)

MORPHEUS (Feng et al. 2017)

Feedback from deduction cannot be seamlessly used by the statistical model.


Can we bridge the statistical and deductive approaches in program synthesis in a seamless way?

There is a fundamental disconnection between purely statistical and purely deductive approaches to program synthesis, so...


CONCORD utilizes fine-grained feedback from deductive reasoning to update statistical reasoning on the fly.

Our Approach: CONCORD (An Overview)


Loop: Take Action → Deduce → Update Policy

Inputs: initial policy π₀, specification φ, DSL
Output: program P

A Running Example (Koala Habitat Selectivity Study*)

• (goal) aggregate two koala factor tracking lists:
• (input1) factor 1 - GPS last seen locations (reversed order): [152,140,102,58,35]
• (input2) factor 2 - tree conditions: [78,55,89,40,33]

• (output) model computes cumulative factors: [43,75,120,122,143]


koalaFactor::list->list->list


*Synthetic data inspired by Davies, N., Gramotnev, G., Seabrook, L. et al. Movement patterns of an arboreal marsupial at the edge of its range: a case study of the koala. Mov Ecol 1, 8 (2013).

Sample Domain-Specific Language (A DSL for List Processing)

• koalaFactor::list->list->list
• koalaFactor([152,140,102,58,35],[78,55,89,40,33]) = [43,75,120,122,143]

• solution program:
  v1 <- reverse input1
  v2 <- sumUpTo input2
  v3 <- sub v2, v1

• which is: sub(sumUpTo(input2), reverse(input1))


S -> N | L
N -> 0 | ... | 10 | input1 | input2
L -> input1 | input2 | take(L,N) | drop(L,N) | sort(L)
   | reverse(L) | add(L,L) | sub(L,L) | sumUpTo(L)

sample DSL

solution AST: sub at the root, with sumUpTo(input2) and reverse(input1) as its children
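To make the DSL concrete, here is a minimal Python sketch of the list operators (our own assumed semantics, not the paper's implementation: sumUpTo is read as prefix sums, add/sub as element-wise operations); it checks that the solution program reproduces the expected output for the running example.

```python
from itertools import accumulate

# Assumed semantics for the sample DSL's list operators (a sketch, not the paper's code).
def take(l, n):    return l[:n]
def drop(l, n):    return l[n:]
def sort(l):       return sorted(l)
def reverse(l):    return list(reversed(l))
def add(a, b):     return [x + y for x, y in zip(a, b)]   # element-wise sum
def sub(a, b):     return [x - y for x, y in zip(a, b)]   # element-wise difference
def sumUpTo(l):    return list(accumulate(l))             # prefix (cumulative) sums

input1 = [152, 140, 102, 58, 35]
input2 = [78, 55, 89, 40, 33]

# The solution program: sub(sumUpTo(input2), reverse(input1))
result = sub(sumUpTo(input2), reverse(input1))
assert result == [43, 75, 120, 122, 143]
print(result)
```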

Deductive Approach (Solving koalaFactor in a Variant of the CEGIS* Loop, Step 1)


Decide Deduce

#1: add(sort(input1), take(input2,N))
#2: add(sort(input1), drop(input2,N))
#3: add(sort(input1), sumUpTo(input2))
...
#20: sub(sumUpTo(input2), reverse(input1))
#21: add(sumUpTo(input1), take(input2,N))
...

fixed candidate list

KB, DSL

input1 = [152,140,102, 58, 35]
input2 = [ 78, 55, 89, 40, 33]
output = [ 43, 75,120,122,143]

*CEGIS: Counter-Example-Guided Inductive Synthesis

Deductive Approach (Solving koalaFactor in a Variant of the CEGIS Loop, Step 2)


Deductive Approach (Solving koalaFactor in a Variant of the CEGIS Loop, Step 3)


Deductive Approach (Solving koalaFactor in a Variant of the CEGIS Loop, Step 4)


Pros
• rich and accurate deduction feedback
• efficient pruning of the search space

Cons
• incorporating deductive feedback into an existing statistical model is difficult

*CEGIS: Counter-Example-Guided Inductive Synthesis
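To make the Decide/Deduce loop concrete, here is a toy Python sketch (hypothetical helper names decide_deduce and toy_feasible, not Neo's engine): candidates are walked in a fixed, statically ranked order, a cheap feasibility check stands in for deduction, and pruned candidates never influence the ranking of the ones that follow.

```python
# Toy Decide/Deduce loop over a fixed candidate list (a sketch; Neo's real engine
# reasons logically over partial programs with an SMT-backed knowledge base).
def sumUpTo(l):                      # prefix sums, as assumed earlier
    out, s = [], 0
    for x in l:
        s += x
        out.append(s)
    return out

def decide_deduce(candidates, inputs, expected, feasible):
    """Enumerate candidates in a fixed order; Deduce prunes, Decide checks."""
    for rank, prog in enumerate(candidates, start=1):
        if not feasible(prog, inputs, expected):   # Deduce: prune infeasible candidates,
            continue                               # but the ranking itself never changes
        if prog(*inputs) == expected:              # full check against the example
            return rank, prog
    return None

# Stand-in feasibility check: a cheap necessary condition (output length must match).
def toy_feasible(prog, inputs, expected):
    try:
        return len(prog(*inputs)) == len(expected)
    except Exception:
        return False

input1, input2 = [152, 140, 102, 58, 35], [78, 55, 89, 40, 33]
candidates = [
    lambda a, b: [x + y for x, y in zip(sorted(a), b[:3])],       # add(sort(input1), take(input2, N)), N = 3
    lambda a, b: [x + y for x, y in zip(sorted(a), sumUpTo(b))],  # add(sort(input1), sumUpTo(input2))
    lambda a, b: [x - y for x, y in zip(sumUpTo(b), a[::-1])],    # sub(sumUpTo(input2), reverse(input1))
]
print(decide_deduce(candidates, (input1, input2), [43, 75, 120, 122, 143], toy_feasible))
```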

Statistical Approach (Solving koalaFactor using Data-Driven Machine Learning, Step 1)


Infer Check

#1: add(sort(input1), take(input2,N))
#2: add(sort(input1), drop(input2,N))
#3: add(sort(input1), sumUpTo(input2))
...
#20: sub(sumUpTo(input2), reverse(input1))
#21: add(sumUpTo(input1), take(input2,N))
...

candidate list

policy π₀, specification φ, DSL


Statistical Approach (Solving koalaFactor using Data-Driven Machine Learning, Step 2)

Infer Check

reward (if in the reinforcement learning setting)


Statistical Approach (Solving koalaFactor using Data-Driven Machine Learning, Step 3)


Statistical Approach (Solving koalaFactor using Data-Driven Machine Learning, Step 4)

Pros
• data-driven candidate list
• can update the policy seamlessly

Cons
• feedback from the checker is considerably less informative than feedback from deduction (see the sketch below)
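A minimal sketch of the statistical feedback loop (plain numpy, toy setup of our own): the policy samples candidate programs, the checker returns only a binary pass/fail reward, and a REINFORCE-style update shifts probability toward programs that passed. When a program fails, the only signal is "reward 0", so nothing is learned from that attempt.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the "policy" is a softmax over a small fixed set of candidate programs.
programs = ["add(sort(input1), take(input2, N))",
            "add(sort(input1), sumUpTo(input2))",
            "sub(sumUpTo(input2), reverse(input1))"]
correct = 2                      # index of the program satisfying the example

theta = np.zeros(len(programs))  # policy parameters (logits)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(200):
    probs = softmax(theta)
    a = rng.choice(len(programs), p=probs)
    reward = 1.0 if a == correct else 0.0      # checker feedback is only pass/fail
    grad_logp = -probs                         # REINFORCE: grad log pi(a) = one_hot(a) - probs
    grad_logp[a] += 1.0
    theta += 0.5 * reward * grad_logp          # no learning signal at all when reward == 0

print(softmax(theta))   # probability mass shifts toward the correct program
```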

Our Approach: CONCORD (An Overview)


Loop: Take Action → Deduce → Update Policy

Inputs: initial policy π₀, specification φ, DSL
Output: program P

CONCORD: Formalization (Program Synthesis as a Markov Decision Process)


states: from the initial state S, through partial ASTs such as sub(L, L) and sub(sumUpTo(L), L), to the final state, the complete AST sub(sumUpTo(input2), reverse(input1)), with reward +1

action probabilities (example): sub 0.5, sumUpTo 0.3, input2 0.4, reverse 0.2, input1 0.3, ...

legend: action probability, working node, instantiated node, uninstantiated node (?), state, reward

Objective: maximize rewards
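A small Python sketch of this MDP view, under assumed simplifications of our own (leftmost-hole expansion, no type constraints, hypothetical helper names): a state is a partial program with holes for nonterminals, an action replaces the leftmost hole with a DSL production, and the reward is +1 only when the completed program satisfies the specification.

```python
# Sketch of program synthesis as an MDP (simplified; not the paper's formal definitions).
GRAMMAR = {
    "L": ["input1", "input2", "take(L,N)", "drop(L,N)", "sort(L)",
          "reverse(L)", "add(L,L)", "sub(L,L)", "sumUpTo(L)"],
    "N": [str(i) for i in range(11)],           # integer constants only, for brevity
}

def tokenize(rhs):
    """Split a production like 'take(L,N)' into tokens, keeping holes separate."""
    out, buf = [], ""
    for ch in rhs:
        if ch in "(),":
            if buf:
                out.append(buf)
                buf = ""
            out.append(ch)
        else:
            buf += ch
    if buf:
        out.append(buf)
    return out

def initial_state():
    return ["L"]                                # start symbol: a list-valued program

def actions(state):
    """Productions applicable to the leftmost uninstantiated node (hole)."""
    for i, tok in enumerate(state):
        if tok in GRAMMAR:
            return [(i, rhs) for rhs in GRAMMAR[tok]]
    return []                                   # no holes left: terminal state

def step(state, action, check):
    """Apply a production to a hole; reward is granted only at the final state."""
    i, rhs = action
    new = state[:i] + tokenize(rhs) + state[i + 1:]
    if actions(new):                            # still a partial program
        return new, 0.0, False
    prog = "".join(new)
    return new, (1.0 if check(prog) else 0.0), True

# Drive the MDP by hand to the known solution (Concord's policy would pick these actions).
check = lambda p: p == "sub(sumUpTo(input2),reverse(input1))"   # stand-in spec check
s = initial_state()
for prod in ["sub(L,L)", "sumUpTo(L)", "input2", "reverse(L)", "input1"]:
    hole = next(j for j, t in enumerate(s) if t in GRAMMAR)
    s, reward, done = step(s, (hole, prod), check)
print("".join(s), reward, done)                 # final state, reward +1, done
```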

CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 1)


Take Action Deduce

#1: add(sort(input1), take(L,N))
#2: add(sort(input1), drop(L,N))
#3: add(sort(input1), sumUpTo(L))
...
#20: sub(sumUpTo(input2), reverse(L))
#21: add(sumUpTo(input1), take(L,N))
...

candidate list

Update Policy

policy π₀, specification φ, DSL

CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 2)


Take Action Deduce


Update Policy

add(sort(L), take(L,N))
add(sort(L), drop(L,N))
add(sumUpTo(L), take(L,N))
...

sampled infeasible programs


CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 3)


Take Action Deduce

#1: sub(sumUpTo(input2), reverse(L))
#2: add(sort(input1), sumUpTo(L))
...
#24: add(sumUpTo(input1), take(L,N))
#25: add(sort(input1), take(L,N))
#26: add(sort(input1), drop(L,N))
...

updated candidate list

Update Policy


CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 4)


Take Action Deduce

#1: sub(sumUpTo(input2), reverse(input1))
#2: add(sort(input1), sumUpTo(input2))
...
#24: add(sumUpTo(input1), take(input2,N))
#25: add(sort(input1), take(input2,N))
#26: add(sort(input1), drop(input2,N))
...

updated candidate list

Update Policy

What's the difference?
• feedback from deduction flows seamlessly into the policy update (sketched below)
• it not only prunes the search space, but also promotes good candidates
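The sketch below illustrates what that seamless flow could look like (names, grouping by pattern, and the reward scheme are our illustrative assumptions, not Concord's exact formulation): when deduction rejects a sampled program and generalizes the rejection to a pattern, every sampled program matching that pattern is pushed into the same policy-gradient update with a penalty, so whole families of bad candidates are demoted at once while the surviving candidates rise in the ranking.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy policy over whole programs, each tagged with the top-level pattern that
# deduction can rule out (illustrative only).
programs = [
    ("add(sort(input1), take(input2,N))",    "add(sort(L), _)"),
    ("add(sort(input1), drop(input2,N))",    "add(sort(L), _)"),
    ("add(sort(input1), sumUpTo(input2))",   "add(sort(L), _)"),
    ("add(sumUpTo(input1), take(input2,N))", "add(sumUpTo(L), _)"),
    ("sub(sumUpTo(input2), reverse(input1))", "sub(sumUpTo(L), _)"),
]
correct = 4
theta = np.zeros(len(programs))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def update(indices, reward, lr=0.5):
    """One policy-gradient style step applied to every program in `indices`."""
    global theta
    probs = softmax(theta)
    for a in indices:
        grad = -probs.copy()
        grad[a] += 1.0
        theta = theta + lr * reward * grad

for _ in range(100):
    probs = softmax(theta)
    a = rng.choice(len(programs), p=probs)
    if a == correct:
        update([a], +1.0)                      # solution found: promote it
        break
    # Deduction rejects the sampled program and returns its infeasible pattern;
    # all sampled programs matching that pattern are penalized in the same update.
    bad_pattern = programs[a][1]
    bad = [i for i, (_, pat) in enumerate(programs) if pat == bad_pattern]
    update(bad, -1.0)

print(softmax(theta))  # probability mass shifts away from whole infeasible families
```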

CONCORD: Synthesis Algorithm

Deduction Engine

Infeasible Sampler

add(sort(input1), take(L,N))

partial program

feasible empty set ∅

add(sort(L), take(L,N))
add(sort(L), drop(L,N))
add(sumUpTo(L), take(L,N))
...

sampled infeasible programs

Take Action → Deduce → Update Policy

Deduce

CONCORD: Synthesis Algorithm


?(?(L), take(L,N))
?(?(L), drop(L,N))
...

logical constraints

Deduce
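A sketch of how such a wildcard infeasibility constraint could be turned into concrete sample programs (the pattern syntax, operator grouping, and helper name sample_infeasible are illustrative assumptions, not Neo's engine): the outer ? is filled with a binary operator and the inner ? with a unary one, and the sampler draws a few instantiations of the pattern.

```python
import itertools
import random

# Illustrative instantiation of a wildcard infeasibility pattern (a sketch only).
BINARY_OPS = ["add", "sub"]              # operators taking two lists
UNARY_OPS  = ["sort", "reverse", "sumUpTo"]

def sample_infeasible(pattern="?(?(L), take(L,N))", k=3, seed=0):
    """Fill the outer '?' with a binary op and the inner '?' with a unary op."""
    rng = random.Random(seed)
    combos = list(itertools.product(BINARY_OPS, UNARY_OPS))
    rng.shuffle(combos)
    # Every instantiation of the pattern inherits its infeasibility (per the constraint).
    return [pattern.replace("?", outer, 1).replace("?", inner, 1)
            for outer, inner in combos[:k]]

print(sample_infeasible())   # prints k instantiations, e.g. 'add(sort(L), take(L,N))'
```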

CONCORD: Synthesis Algorithm (Deduction-Guided Reinforcement Learning)



Take Action

Update Policy

weighted future rewards

importance weighting

Off-Policy Sampling

program distribution of policy

program distribution of Sampler
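The off-policy correction can be sketched as a standard importance-weighted policy gradient (a generic sketch of the technique, not Concord's exact estimator): programs are drawn from the sampler's distribution q, but the update targets the policy π, so each gradient term is reweighted by π(a)/q(a).

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = np.zeros(4)                          # policy logits over 4 toy programs
rewards = np.array([0.0, 0.0, 1.0, -1.0])    # +1 correct, -1 deduced infeasible (toy values)
q = np.array([0.4, 0.3, 0.2, 0.1])           # sampler's (off-policy) program distribution

for _ in range(500):
    pi = softmax(theta)
    a = rng.choice(4, p=q)                   # sample from the sampler, not from the policy
    w = pi[a] / q[a]                         # importance weight corrects the mismatch
    grad = -pi.copy()                        # grad of log pi(a)
    grad[a] += 1.0
    theta += 0.1 * w * rewards[a] * grad

print(softmax(theta))                        # policy concentrates on the rewarded program
```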

Evaluations (Research Questions and Experiment Settings)

• Research Questions:
  • How does Concord compare against existing synthesis tools?
  • How effective is the off-policy RL algorithm compared to standard policy gradient?

• Experiment Settings:
  • Deduction engine: NEO's (Feng et al. 2018) deduction engine
  • Policy: Gated Recurrent Unit (GRU)
  • Benchmarks: DEEPCODER benchmarks used in NEO
    • 100 challenging list-processing problems
  • Comparison between:
    • NEO (Feng et al. 2018)
    • DEEPCODER (Balog et al. 2017)


The architecture of the policy network used

Evaluations (Experiment Results and Analysis)


tool        solved   time
CONCORD     82%      36s
NEO         71%      99s
DEEPCODER   32%      205s

tool                    solved   speedup over NEO
CONCORD                 82%      8.71x
CONCORD (Standard PG)   65%      2.88x

• Concord tightly couples statistical and deductive reasoning based on reinforcement learning.
• The off-policy reinforcement learning technique is effective.

Take-Away Message

Thank you! Questions?

