Syntax-Guided Synthesis Rajeev Alur Joint work with R.Bodik, G.Juniwal, M.Martin, M.Raghothaman, S.Seshia, R.Singh, A.Solar-Lezama, E.Torlak, A.Udupa 1

Syntax-Guided Synthesis

Rajeev Alur

Joint work with R.Bodik, G.Juniwal, M.Martin, M.Raghothaman, S.Seshia, R.Singh, A.Solar-Lezama, E.Torlak, A.Udupa

1

Program Verification

Does a program P meet its specification ,j where j is written as a logical formula?

Motivation: Correctness of systems, finding bugs

Program verification is hard!

Formalizing a structured program into logical formulas

Using tools (SMT solvers) to verify whether the formalized program meets its specification.

SMT-LIB – common standards and library of benchmarks of SMT solvers.

2

Program Synthesis

Automatically synthesize a program P that satisfies a given specification j

Can potentially have greater impact than program verification

Program synthesis is hard!

Let’s provide a syntactic template for the program – Syntax-Guided Synthesis (SyGuS)

Works on special cases already exist (e.g. Sketch 2008)

Let’s build a common standard and benchmarks for SyGuS solvers (SYNTH-LIB)

3

Talk Outline

Background: SMT Solvers

Formalization of SyGuS

Solution Strategies

Conclusions + SyGuS Competition

4

What is SMT?

Satisfiability Modulo Theories

+

Magnus Madsen

Recall SAT

The Boolean SATisfiability Problem:

• A=TRUE, =FALSE, =FALSE

literal or negated literal

Magnus Madsen

Recall SAT

• SAT is NP-complete (solveable in exponential time)

• Many SAT solvers exist – DPLL (1962) – Chaff (2001)– MiniSAT (2004)

• Some do remarkably well.Magnus Madsen

What is an SMT instance?

A logical formula built using– negation, conjunction and disjuction

• e.g. • e.g.

– theory specific operators• e.g. , • e.g.

theory of integers

theory of bitwise

operators

Magnus Madsen

Q: Why not encode every

formula in SAT?A: Theory

solvers have very efficient

algorithmsGraph Problems:

• Shortest-Path• Minimum Spanning Tree

Optimization:• Max-Flow• Linear Programming

(just to name a few)Magnus Madsen

Q: But then, Why not get rid

of the SAT solver?

A: SAT solvers are being

studied for a long time

Magnus Madsen

SAT Theory

Formula

YES

𝑥≥3∧ (𝑥≤0∨ 𝑦 ≥0 )

𝑎∧ (𝑏∨𝑐 )

𝑎∧𝑏

NO

add clause:

𝑎∧𝑐

𝑥≥3∧𝑥≤0𝑥≥3∧ 𝑦 ≥0

YES

SMT Solver

Magnus Madsen

Theories

Theory of:– Difference Arithemetic– Linear Arithmetic– Arrays– Bit Vectors– Algebraic Datatypes– Uninterpreted Functions

Magnus Madsen

C. Barrett & S. A. Seshia ICCAD 2009 Tutorial 13

Equivalence Checking of Program Fragments

int fun1(int y) { int x, z; z = y; y = x; x = z;

return x*x;}

int fun2(int y) { return y*y;} What if we use SAT to check equivalence?

SMT formula Satisfiable iff programs non-equivalent

( z = y y1 = x x1 = z ret1 = x1*x1) ( ret2 = y*y ) ( ret1 ret2 )




return x*x;}

int fun2(int y) { return y*y;}

SMT formula Satisfiable iff programs non-equivalent

( z = y y1 = x x1 = z ret1 = x1*x1) ( ret2 = y*y ) ( ret1 ret2 )

Using SAT to check equivalence (w/ Minisat) 32 bits for y: Did not finish in over 5 hours 16 bits for y: 37 sec. 8 bits for y: 0.5 sec.




return x*x;}

int fun2(int y) { return y*y;}

SMT formula ’

( z = y y1 = x x1 = z ret1 = sq(x1) ) ( ret2 = sq(y) ) ( ret1 ret2 )

Using EUF solver: 0.01 sec

Verification Synthesis

16

Program Verification: Does P meet spec j ?

SMT: Is j satisfiable ?

SMT-LIB/SMT-COMP Standard API Solver competition

Program Synthesis: Find P that meets spec j

Syntax-Guided Synthesis

Plan for SyGuS-comp

Talk Outline

Formalization of SyGuS

Solution Strategies


17

Syntax-Guided Synthesis (SyGuS) Problem

Fix a background theory T: fixes types and operations

Function to be synthesized: name f along with its type

Inputs to SyGuS problem:Specification j (semantic constraint)

Typed formula using symbols in T + symbol f

Set E of expressions given by a context-free grammarSet of candidate expressions that use symbols in T

(syntactic constraint)

Computational problem: Output e in E such that j[f/e] is valid (in theory T)

18

SyGuS Example

Theory QF-LIATypes: Integers and BooleansLogical connectives, Conditionals, and Linear arithmeticQuantifier-free formulas

Function to be synthesized f (int x, int y) : int

Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y)

Candidate Implementations: Linear expressionsLinExp := x | y | Const | LinExp + LinExp | LinExp - LinExp

No solution exists

19

SyGuS Example

Theory QF-LIA

Function to be synthesized: f (int x, int y) : int


Candidate Implementations: Conditional expressions with comparisons

Term := x | y | Const | If-Then-Else (Cond, Term, Term)Cond := Term <= Term | Cond & Cond | ~ Cond | (Cond)

Possible solution:If-Then-Else (x ≤ y, y, x)

…. Solving SyGus is hard!20

Talk Outline

Solution Strategies


21

Solving SyGuS as Active Learning:

22

Learning Algorithm

Verification Oracle

Initial examples I

Fail Success

CandidateExpression

Counterexample

Concept class: Set E of expressions

Examples: Concrete input values

Counter-Example Guided Inductive Synthesis

CEGIS Example


Set E: All expressions built from x,y,0,1, Comparison, +, If-Then-Else

23

LearningAlgorithm

Verification Oracle

Examples = { }

Candidatef(x,y) = x

Example(x=0, y=1)

CEGIS Example



24

LearningAlgorithm

Verification Oracle

Examples = {(x=0, y=1) } Candidate

f(x,y) = y

Example(x=1, y=0)

CEGIS Example



25

LearningAlgorithm

Verification Oracle

Examples = {(x=0, y=1) (x=1, y=0) (x=0, y=0) (x=1, y=1)} Candidate

ITE (x ≤ y, y,x)

Success

SyGuS Solutions

CEGIS approach (Solar-Lezama, Seshia et al)

Related work: Similar strategies for solving quantified formulas and invariant generation

Coming up: Learning strategies based on:Enumerative (search with pruning): Udupa et al (PLDI’13)Symbolic (solving constraints): Gulwani et al (PLDI’11)Stochastic (probabilistic walk): Schkufza et al (ASPLOS’13)

26

Enumerative Learning

Find an expression consistent with a given set of concrete examples

Enumerate expressions in increasing size, and evaluate each expression on all concrete inputs to check consistency

Key optimization for efficient pruning of search space (examples):Expressions e1 and e2 are equivalent if e1(a,b)=e2(a,b) on all concrete values (x=a,y=b) in Examples Only one representative among equivalent subexpressions needs to be considered for building larger expressions

Fast and robust for learning expressions with ~ 15 nodes

27

Symbolic Learning

Use a constraint solver for both the synthesis and verification steps

28

Each production in the grammar is thought of as a component.Input and Output ports of every component are typed.

A well-typed loop-free program comprising these component corresponds to an expression DAG from the grammar.

ITE

Term

Term

Term

Cond>=

Term Term

Cond

+

Term Term

Term

x

Term

y

Term

0

Term

1

Term

Symbolic Learning

29

xn1

xn2

yn3

yn4

0n5

1n6

+n7

+n8

>=n9

ITEn10

Synthesis Constraints:Shape is a DAG, Types are consistentSpec j[f/e] is satisfied on every concrete input values in Examples

Use an SMT solver (Z3) to find a satisfying solution.

If synthesis fails, try increasing the number of occurrences of components in the library in an outer loop

Start with a library consisting of some number of occurrences of each component.

Symbolic Learning - example

Iteration 1:

30

Learned counter-example: <x= -1, y=0>

x


Iteration 2:

31

Learned counter-example: <x= 0, y=-1>

x

ITE

≥ y

xx


Iteration 3:

32

Learned counter-example: -

x

ITE

≥ y

xy

Stochastic Learning

Idea: Use the Metropolis-Hastings Method to find desired expression e by probabilistic walk on graph where nodes are expressions and edges capture single-edits.

…..in simple words:

Let En be the expressions of size n (n picked randomly).

For every expression e in En set Score(e) between 0 and 1 (“Extent to which e meets the spec φ”)

Score(e) = exp( - 0.5 Wrong(e)), where Wrong(e) = No of examples in I for which ~ j [f/e]

Score(e) is large when Wrong(e) is small. Expressions e with Wrong(e) = 0 more likely to be chosen in the limit than any other expression

33

Initial candidate expression e sampled uniformly from En

When Score(e) = 1, return e

Pick node v in parse tree of e uniformly at random. Replace subtree rooted at e with subtree of same size, sampled uniformly

Stochastic Learning

34

+

z

e

+

yx

+

z

e’

-

1z

With probability min{ 1, Score(e’)/Score(e) }, replace e with e’ Repeat until finding e’ such that Score(e’) = 1. Outer loop responsible for updating expression size n

Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y) Set E: All expressions built from x,y,0,1, Comparison, +, If-Then-

Else

Stochastic Learning - example

35

Suppose n = 6 ; there are 768 expressions of size 6

e = ITE(x ≤ 0,y,x) is picked with probability 1/768

The condition x ≤ 0 is mutated to y ≤ 0 with probability 1/6 X 1/48

Suppose the set of concrete examples is: {(-1,-4),(-1, 3),(-1, 2),(1,1),(1,2)} Then Score(e)=exp(-0.5X2), and Score(e’)=exp(-0.5X4)

As e’ is replaced with e with probability exp(-0.5X2)

Note that for e’’ = ITE(x<= y,y,x) we have Score(e’’)=1.

Benchmarks and Implementation

Prototype implementation of Enumerative/Symbolic/Stochastic CEGIS

Benchmarks:Bit-manipulation programs from Hacker’s delightInteger arithmetic: Find max, search in sorted arrayChallenge problems such as computing Morton’s number

Multiple variants of each benchmark by varying grammar

Results are not conclusive as implementations are unoptimized, but offers first opportunity to compare solution strategies

36

Evaluation: Integer Benchmarks

37

array_search_2.sl

array_search_3.sl

array_search_4.sl

array_search_5.sl

max2.sl max3.sl0.01

0.1

1

10

100

1000

Relative Performance of Integer Benchmarks

Enumerative Stochastic (median) Symbolic

appro

xim

ate

tim

e in s

ec.

Evaluation 3: Hacker’s Delight Benchmarks

38

hd-0

1-d0

-pro

g.sl

hd-0

1-d5

-pro

g.sl

hd-0

2-d0

-pro

g.sl

hd-0

3-d0

-pro

g.sl

hd-0

3-d1

-pro

g.sl

hd-0

3-d5

-pro

g.sl

hd-0

5-d1

-pro

g.sl

hd-0

6-d0

-pro

g.sl

hd-0

7-d1

-pro

g.sl

hd-0

9-d1

-pro

g.sl

hd-1

0-d1

-pro

g.sl

hd-1

1-d0

-pro

g.sl

hd-1

1-d1

-pro

g.sl

hd-1

1-d5

-pro

g.sl

hd-1

3-d0

-pro

g.sl

hd-1

3-d5

-pro

g.sl

hd-1

4-d0

-pro

g.sl

hd-1

4-d1

-pro

g.sl

hd-1

4-d5

-pro

g.sl

hd-1

5-d0

-pro

g.sl

hd-1

5-d1

-pro

g.sl

hd-1

5-d5

-pro

g.sl

hd-1

7-d0

-pro

g.sl

hd-1

7-d1

-pro

g.sl

hd-1

7-d5

-pro

g.sl

hd-1

8-d1

-pro

g.sl

hd-1

8-d5

-pro

g.sl

hd-1

9-d1

-pro

g.sl

hd-2

0-d0

-pro

g.sl

hd-2

0-d5

-pro

g.sl

0.01

0.1

1

10

100

1000

Relative Performance on a Sample of Hacker's Delight Benchmarks

Enumerative Stochastic (median) Symbolic

appro

xim

ate

tim

e in s

ec.

Evaluation Summary

Enumerative CEGIS has best performance, and solves many benchmarks within seconds

Potential problem: Synthesis of complex constants

Symbolic CEGIS is unable to find answers on most benchmarksCaveat: Sketch succeeds on many of these

Choice of grammar has impact on synthesis timeWhen E is set of all possible expressions, solvers struggle

None of the solvers succeed on some benchmarksMorton constants, Search in integer arrays of size > 4

Bottomline: Improving solvers is a great opportunity for research !

39

Plan for SyGuS-Comp

Proposed competition of SyGuS solvers at FLoC, July 2014

Organizers: Alur, Fisman (Penn) and Singh, Solar-Lezama (MIT)

Website: excape.cis.upenn.edu/Synth-Comp.html

Mailing list: [email protected]

Call for participation:Join discussion to finalize synth-lib format and competition formatContribute benchmarksBuild a SyGuS solver

40

mailto:[email protected]

Documents

Syntax-Guided Synthesis Rajeev Alur Joint work with R.Bodik, G.Juniwal, M.Martin, M.Raghothaman, S.Seshia, R.Singh, A.Solar-Lezama, E.Torlak, A.Udupa 1