Integrating Induction, Deduction and Structure for Synthesis

Sanjit A. Seshia
Associate Professor, EECS Department, UC Berkeley

NSF ExCAPE Summer School, June 12, 2013

Students & Postdocs: S. Jha, W. Li, A. Donze, L. Dworkin, B. Brady, D. Holcomb, J. Kotker, D. Sadigh
Collaborators: A. Tiwari, S. Gulwani, J. Deshmukh, X. Jin



– 2 –

Connections in this Talk

Software Synthesis
Machine Learning (Theory)
Formal Verification

– 3 –

Two Messages

1. Synthesis Everywhere
– Many problems can be solved effectively when viewed as synthesis

2. Effective Approach: Induction + Deduction + Structure
– Induction: Learning from examples
– Deduction: Logical inference and constraint solving
– Structure: Hypothesis on syntactic form of artifact to be synthesized

– 4 –

Many Problems map to Synthesis

Formal Verification
– Synthesis of “verification artifacts”
– E.g., inductive invariants, abstraction functions

Formal Specification
– E.g., synthesis of temporal logic formulas

Reverse Engineering
– E.g., malware analysis

System Modeling
– E.g., timing models for embedded platforms

– 5 –

Formal Verification as Synthesis

Inductive Invariants

Abstraction Functions

– 6 –

Example Verification Problem

Transition System
– Init I: x = 1 ∧ y = 1
– Transition Relation δ: x’ = x+y ∧ y’ = y+x

Property: Ψ = G (y ≥ 1)

Attempted Proof by Induction:
y ≥ 1 ∧ x’ = x+y ∧ y’ = y+x ⇒ y’ ≥ 1
Fails.

Need to Strengthen Invariant: Find φ s.t.
x = 1 ∧ y = 1 ⇒ φ ∧ y ≥ 1
φ ∧ y ≥ 1 ∧ x’ = x+y ∧ y’ = y+x ⇒ φ’ ∧ y’ ≥ 1

– 7 –

Example Verification Problem

Transition System
– Init I: x = 1 ∧ y = 1
– Transition Relation δ: x’ = x+y ∧ y’ = y+x

Property: Ψ = G (y ≥ 1)

Attempted Proof by Induction:
y ≥ 1 ∧ x’ = x+y ∧ y’ = y+x ⇒ y’ ≥ 1
Fails.

Need to Strengthen Invariant: Find φ s.t. the inductive step holds. With φ = (x ≥ 1):
x ≥ 1 ∧ y ≥ 1 ∧ x’ = x+y ∧ y’ = y+x ⇒ x’ ≥ 1 ∧ y’ ≥ 1

Safety Verification → Invariant Synthesis
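The failed and the repaired induction steps above can be replayed on a small scale. The following is a minimal sketch, not a proof: it searches a bounded box of integer states (the bound and the helper names are assumptions made for illustration) for a state that satisfies a candidate invariant while its successor does not.

```python
from itertools import product

def step(x, y):
    # Transition relation from the slide: x' = x + y, y' = y + x
    return x + y, y + x

def find_induction_counterexample(inv, bound=5):
    """Search the box [-bound, bound]^2 for a state that satisfies inv
    but whose successor violates it. Returns the state, or None."""
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if inv(x, y) and not inv(*step(x, y)):
            return (x, y)
    return None

def weak(x, y):
    # The property on its own: y >= 1
    return y >= 1

def strong(x, y):
    # The strengthened invariant: x >= 1 and y >= 1
    return x >= 1 and y >= 1

print(find_induction_counterexample(weak))    # a state such as x = -5, y = 1
print(find_induction_counterexample(strong))  # None within the searched box
```

A bounded search can only refute inductiveness. Establishing it for all integers needs a deductive engine such as an SMT solver.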

– 8 –

One Reduction from Verification to Synthesis

NOTATION
Transition system M = (I, δ)
Safety property Ψ = G(ψ)

VERIFICATION PROBLEM
Does M satisfy Ψ?

SYNTHESIS PROBLEM
Synthesize φ s.t.
I ⇒ φ ∧ ψ
φ ∧ ψ ∧ δ ⇒ φ’ ∧ ψ’

– 9 –

Two Reductions from Verification to Synthesis

NOTATION
Transition system M = (I, δ), S = set of states
Safety property Ψ = G(ψ)

VERIFICATION PROBLEM
Does M satisfy Ψ?

SYNTHESIS PROBLEM #1
Synthesize φ s.t.
I ⇒ φ ∧ ψ
φ ∧ ψ ∧ δ ⇒ φ’ ∧ ψ’

SYNTHESIS PROBLEM #2
Synthesize α : S → Ŝ where α(M) = (Î, δ̂) s.t.
α(M) satisfies Ψ iff M satisfies Ψ

– 10 –

Common Approach for both: “Inductive” Synthesis

Synthesis of:

Inductive Invariants
– Choose templates for invariants
– Infer likely invariants from tests (examples)
– Check if any are true inductive invariants, possibly iterate

Abstraction Functions
– Choose an abstract domain
– Use Counter-Example Guided Abstraction Refinement (CEGAR)
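The first two steps of the inductive-invariant recipe above (choose templates, infer likely invariants from tests) can be sketched in a few lines. This is an illustrative sketch only: the template "var >= const", the constant range, and the toy system x' = x + y, y' = y + x started at (1, 1) are assumptions, not details taken from the talk.

```python
def run_tests(n_steps=6):
    """Collect reachable states by simulating from the initial state (1, 1)."""
    x, y = 1, 1
    states = [(x, y)]
    for _ in range(n_steps):
        x, y = x + y, y + x
        states.append((x, y))
    return states

def candidates():
    """Instantiate the template '<var> >= <const>' over a small set of holes."""
    for var in ("x", "y"):
        for const in (-1, 0, 1, 2):
            yield (var, const)

def holds(cand, state):
    var, const = cand
    value = state[0] if var == "x" else state[1]
    return value >= const

states = run_tests()
# Keep every instantiation that no test state falsifies.
likely = [c for c in candidates() if all(holds(c, s) for s in states)]
print(likely)
```

The surviving candidates are only likely invariants. Each one still has to pass a real inductiveness check, and the loop is repeated if none does.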

– 11 –

Counterexample-Guided Abstraction Refinement is Inductive Synthesis

[Flow diagram: an Abstract Domain and the System + Property yield an Initial Abstraction Function. Generate Abstraction produces an Abstract Model + Property, which is passed to Invoke Model Checker. Valid: Done. Counterexample: Check Counterexample: Spurious? NO: Done. YES (spurious counterexample): Refine Abstraction Function, which either yields a New Abstraction Function or reports Fail. The two halves of the loop are labeled SYNTHESIS and VERIFICATION.]

[Anubhav Gupta, ’06]

– 12 –

CEGAR = Inductive Synthesis

[Diagram: INITIALIZE with a Structure Hypothesis and Initial Examples. SYNTHESIZE sends a Candidate Artifact to VERIFY, which returns a Counterexample. The loop exits with Verification Succeeds or Synthesis Fails.]
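The SYNTHESIZE / VERIFY loop in the diagram can be written out as a toy program. This is a hedged sketch under strong assumptions: the structure hypothesis is f(x) = a*x + b with small non-negative a and b, the synthesizer is plain enumeration, and the verifier is an exhaustive check over a finite domain. All names and the target specification are hypothetical.

```python
from itertools import product

DOMAIN = range(-8, 9)                            # finite domain checked by the verifier
CANDIDATES = list(product(range(5), repeat=2))   # structure hypothesis: f(x) = a*x + b

def spec(x):
    # Hypothetical specification the synthesized function must match.
    return 3 * x + 2

def synthesize(examples):
    """Return the first candidate consistent with all examples, or None."""
    for a, b in CANDIDATES:
        if all(a * x + b == out for x, out in examples):
            return (a, b)
    return None

def verify(cand):
    """Return a counterexample input, or None if the candidate is correct."""
    a, b = cand
    for x in DOMAIN:
        if a * x + b != spec(x):
            return x
    return None

def cegis():
    examples = []
    while True:
        cand = synthesize(examples)
        if cand is None:
            return None, examples                # synthesis fails
        cex = verify(cand)
        if cex is None:
            return cand, examples                # verification succeeds
        examples.append((cex, spec(cex)))        # learn from the counterexample

print(cegis())
```

Each counterexample eliminates at least the current candidate, so the loop terminates on a finite candidate set.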

– 13 –

CEGAR = Inductive Synthesis = Learning from (Counter)Examples

[Diagram: INITIALIZE with a “Concept Class” and Initial Examples. The LEARNING ALGORITHM sends a Candidate Concept to the VERIFICATION ORACLE, which returns a Counterexample. The loop exits with Learning Succeeds or Learning Fails.]

– 14 –

Active Learning: Key Elements

[Diagram: an ACTIVE LEARNING ALGORITHM made up of a Search Strategy and a Selection Strategy, operating on Examples]

1. Search Strategy: How to search the space of candidate concepts?

2. Example Selection: Which examples to learn from?

– 15 –

Counterexample-Guidance: A Successful Paradigm for Synthesis and Learning

Active Learning from Queries and Counterexamples [Angluin ’87a, ’87b]

Counterexample-Guided Inductive Synthesis (CEGIS) [Solar-Lezama et al., ’06]
– See upcoming lectures

Both rely heavily on a Verification Oracle

Other effective active learning methods?
– Especially when the Verification Oracle is expensive

– 16 –

Induction + Deduction + Structure

Structure Hypotheses (on artifacts to be synthesized): SKETCH, TEMPLATE, etc.

Inductive Procedure: Active Learning, selects examples to learn from

Deductive Procedure: “Lightweight”, solves a lower-complexity problem or a special case of the original decision problem

S. A. Seshia, “Sciduction: Combining Induction, Deduction, and Structure for Verification and Synthesis”, DAC 2012

– 17 –

Reverse Engineering Malware

Obfuscated code:
Input: y
Output: modified value of y

{ a=1; b=0; z=1; c=0;
  while(1) {
    if (a == 0) {
      if (b == 0) { y=z+y; a=~a; b=~b; c=~c; if (~c) break; }
      else { z=z+y; a=~a; b=~b; c=~c; if (~c) break; }
    }
    else if (b == 0) { z=y << 2; a=~a; }
    else { z=y << 3; a=~a; b=~b; }
  }
}
FROM CONFICKER WORM

What it does:

y = y * 45

We solve this using program synthesis.

Paper: S. Jha et al., “Oracle-Guided Component-Based Program Synthesis”, ICSE 2010.
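The claim that the obfuscated loop above computes y * 45 can be checked by simulation. The sketch below is a transcription into Python under two assumptions that the slide does not state: a, b and c are treated as 1-bit flags (so `~` flips 0 and 1, and `if (~c)` fires when c is 0), and y is an unbounded integer, so bit-width overflow is ignored. The two branches with identical flag updates are merged.

```python
def obfuscated(y):
    """Simulate the Conficker snippet with a, b, c modeled as 1-bit flags."""
    a, b, z, c = 1, 0, 1, 0
    while True:
        if a == 0:
            if b == 0:
                y = z + y
            else:
                z = z + y
            # Both inner branches flip all three flags, then test ~c.
            a, b, c = 1 - a, 1 - b, 1 - c
            if c == 0:          # models `if (~c) break;` for a 1-bit c
                break
        elif b == 0:
            z = y << 2
            a = 1 - a
        else:
            z = y << 3
            a, b = 1 - a, 1 - b
    return y

print(all(obfuscated(y) == 45 * y for y in range(-50, 51)))
```

The loop runs four iterations: z = 4y, then y = 5y, then z = 40y, then y = 45y, at which point c returns to 0 and the loop exits.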

– 18 –

Reverse Engineering Malware

Obfuscated code from the Conficker worm. Input: y. Output: modified value of y.
What it does: y = y * 45
We solve this using program synthesis.

Loop-free compositions of “components” (+, -, <<, >>, *, if-then-else, …)
+
Learning from Distinguishing I/O Examples
+
SMT Solving (bit-vector arithmetic)

Paper: S. Jha et al., “Oracle-Guided Component-Based Program Synthesis”, ICSE 2010.

– 19 –

Synthesizing Switching Logic for Hybrid Systems

• SAFETY: Room temperature x must lie between 20 and 22 °C.
• OPTIMALITY: Minimize switching between modes to save energy.

[Figure: multi-mode system with four unknown switching guards, each marked “?”]

Papers: S. Jha et al., ICCPS 2010 and EMSOFT 2011.

– 20 –

Synthesizing Switching Logic for Hybrid Systems

• SAFETY: Room temperature x must lie between 20 and 22 °C.
• OPTIMALITY: Minimize switching between modes to save energy.

Guards are Hyperboxes
+
Hyperbox Learning from +/- Examples (safe/unsafe switching states)
+
Numerical Simulation (constraint solving)

Papers: S. Jha et al., ICCPS 2010 and EMSOFT 2011.
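To make "hyperbox learning from +/- examples" concrete, here is a minimal sketch of one simple learner: take the tightest axis-aligned box around the positive (safe) switching states and reject it if it contains a negative (unsafe) one. This is an illustration of the idea only, not the algorithm of the cited papers, and the temperature values are made up.

```python
def learn_hyperbox(positives, negatives):
    """Return per-dimension (low, high) bounds of the tightest box around
    the positive points, or None if that box contains a negative point."""
    dims = range(len(positives[0]))
    lo = [min(p[d] for p in positives) for d in dims]
    hi = [max(p[d] for p in positives) for d in dims]

    def inside(point):
        return all(lo[d] <= point[d] <= hi[d] for d in dims)

    if any(inside(n) for n in negatives):
        return None   # no box separates the examples under this hypothesis
    return list(zip(lo, hi))

# Hypothetical one-dimensional data: temperatures at which switching was
# labeled safe (+) or unsafe (-), e.g. by numerical simulation.
safe = [(20.4,), (21.0,), (21.6,)]
unsafe = [(19.8,), (22.3,)]
print(learn_hyperbox(safe, unsafe))
```

In the setting of the slide, the labels would come from simulating the system from each candidate switching state.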

– 21 –

Reactive Synthesis from LTL

[Diagram: System Requirements ψs and Environment Assumptions ψe form the specification ψe → ψs given to the Synthesis Tool. Realizable: the tool outputs an FSM. Unrealizable: often due to missing environment assumptions!]

Paper: W. Li et al., “Mining Assumptions for Synthesis”, MEMOCODE 2011.

– 22 –

Reactive Synthesis from LTL

Unrealizable specifications are often due to missing environment assumptions!

Env Assumptions are Restricted GR(1)
+
Version Space Learning (from Counterstrategies)
+
(Finite-state) Model Checking

Paper: W. Li et al., “Mining Assumptions for Synthesis”, MEMOCODE 2011.

– 23 –

Outline

Program Synthesis from Input-Output Examples
Synthesizing Environment Assumptions for Reactive Synthesis from LTL
Conclusions and Future Directions


– 25 –

Class of Programs: “Loop-Free”

Programs implementing functions: I → O

P(I):
O1 = f1(V1)
O2 = f2(V2)
…
On = fn(Vn)

where f1, f2, …, fn are functions from a given component library.

Functions could be if-then-else definitions; hence the above represents any loop-free code.
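The program form P(I) above can be represented directly as a list of component applications over earlier values. The sketch below is one possible encoding, with a hypothetical three-component library (`add`, `shl2`, `shl3` are names invented here); it expresses multiplication by 45 as a four-line loop-free program.

```python
COMPONENTS = {
    "add":  lambda a, b: a + b,
    "shl2": lambda a: a << 2,
    "shl3": lambda a: a << 3,
}

# Each line is (component name, indices of earlier values used as arguments).
# Value 0 is the program input; line k defines value k + 1.
TIMES_45 = [
    ("shl2", (0,)),    # O1 = y << 2        (4y)
    ("add",  (1, 0)),  # O2 = O1 + y        (5y)
    ("shl3", (2,)),    # O3 = O2 << 3       (40y)
    ("add",  (3, 2)),  # O4 = O3 + O2       (45y)
]

def run(program, x):
    """Evaluate a loop-free program line by line and return the last value."""
    values = [x]
    for name, args in program:
        values.append(COMPONENTS[name](*(values[i] for i in args)))
    return values[-1]

print(run(TIMES_45, 2))  # 90
```

Synthesis in this setting amounts to choosing, for each line, which earlier values feed which component.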

– 26 –

Problem

Space of all possible programs (each dot represents a semantically unique program)

[Figure labels: I/O Oracle, Program Space, Specification Oracle]

Which I/O examples identify the correct program?

– 27 –

Program Learning as Set Cover

Space of all possible programs (each dot represents a semantically unique program)

[Goldman & Kearns, 1995]

– 29 –

Program Learning as Set Cover

Example (i1, o1): the set of programs ruled out by that example

– 30 –

Program Learning as Set Cover

Each example rules out a set of incorrect programs:
(i1, o1) - E1
(i2, o2) - E2
…
(in, on) - En

Theorem [Goldman & Kearns, ’95]: The smallest set of I/O examples needed to learn the correct program is the minimum-size subset of {E1, E2, …, En} that covers all the incorrect programs.
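The set-cover view can be tried on a tiny program space by brute force. In the sketch below, the five programs, the three inputs, and the function names are all hypothetical. For each input i it computes the set E_i of incorrect programs that disagree with the target on i, then searches for the smallest set of inputs whose E_i together cover every incorrect program.

```python
from itertools import combinations

PROGRAMS = {
    "x+x": lambda x: x + x,   # taken as the correct program
    "x*x": lambda x: x * x,
    "x+2": lambda x: x + 2,
    "2":   lambda x: 2,
    "x":   lambda x: x,
}

def smallest_teaching_set(programs, target, inputs):
    """Brute-force minimum set cover over the example sets E_i."""
    wrong = {name for name in programs if name != target}
    ruled_out = {
        i: {n for n in wrong if programs[n](i) != programs[target](i)}
        for i in inputs
    }
    for k in range(len(inputs) + 1):
        for subset in combinations(inputs, k):
            covered = set().union(*(ruled_out[i] for i in subset))
            if covered == wrong:
                return [(i, programs[target](i)) for i in subset]
    return None   # some incorrect program agrees with the target everywhere

print(smallest_teaching_set(PROGRAMS, "x+x", [0, 1, 2]))
```

Here no single input suffices: only input 1 rules out "x*x", and it does not rule out the constant program, so two examples are needed. The exhaustive enumeration is exactly what does not scale to real program spaces.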

– 31 –

Program Learning as Set Cover

Practical challenge: can’t enumerate all inputs and find the set Ei for each

– 32 –

Program Learning as Set Cover

ONLINE set-cover: in each step,
• choose some (ij, oj) pair
• the set Ej of incorrect programs it eliminates is disclosed

– 33 –

Program Learning as Set Cover

Our heuristic: |Ej| ≥ 1, i.e. at least one incorrect program is identified in each step

Efficient model counting could yield a better algorithm

– 34 –

Our Approach: Learning based on Distinguishing Inputs

Space of all possible programs (each dot represents a semantically unique program)

[Jha et al., ICSE’10]

– 35 –

Our Approach

Space of all possible programs
Example I/O set E := {(i1,o1)}

– 36 –

Our Approach

Example I/O set E := {(i1,o1)}
[Figure: two programs P1 and P2 in the program space, each labeled SMT]

– 37 –

Our Approach

Example I/O set E := {(i1,o1)}
[Figure: P1 and P2 with an input i2, labeled SMT]

– 38 –

Our Approach

Example I/O set E := {(i1,o1)}
[Figure: i2 is sent to the Input/output Oracle, which returns (i2, o2)]

– 39 –

Our Approach

Example I/O set E := E ∪ {(i2,o2)}

– 40 –

Our Approach

Example I/O set E := E ∪ {(ij,oj)}

– 41 –

Our Approach

Example I/O set E := E ∪ {(ik,ok)}

– 42 –

Our Approach

Example I/O set E := E ∪ {(in,on)}

Semantically Unique Program

Correct Program?

– 43 –

Soundness

Conditional on Validity of Structure Hypothesis

Library of components is sufficient?
– YES: Correct design
– NO: I/O pairs show infeasibility?
   – YES: Infeasibility reported
   – NO: Incorrect design

Can invoke a Verification Oracle at the end and iterate

– 44 –

Other Important Details

Novel encoding of the space of possible programs as an SMT formula

Obtain a feasible program for given set of input/output pairs using SMT solving

Obtain second feasible program and a distinguishing input using SMT solving

[Jha et al., ICSE ’10, PLDI ’11]
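The loop described on the preceding slides (find a program consistent with the examples, find a second one plus an input on which they differ, query the oracle, repeat) can be sketched with plain enumeration standing in for the SMT queries. The program space, input range, and oracle below are hypothetical stand-ins, not the encoding from the papers.

```python
INPUTS = range(8)
SPACE = {
    "x+x":  lambda x: x + x,
    "x*x":  lambda x: x * x,
    "x+2":  lambda x: x + 2,
    "x<<2": lambda x: x << 2,
    "x":    lambda x: x,
}

def oracle(x):
    # Stands in for running the program being reverse-engineered.
    return x + x

def consistent(examples):
    """Programs that agree with every I/O example seen so far."""
    return [n for n, f in SPACE.items() if all(f(i) == o for i, o in examples)]

def distinguishing_input(p, q):
    """An input on which programs p and q differ, or None."""
    for i in INPUTS:
        if SPACE[p](i) != SPACE[q](i):
            return i
    return None

def learn(first_input=0):
    examples = [(first_input, oracle(first_input))]
    while True:
        alive = consistent(examples)
        if not alive:
            return None, examples          # infeasible: library insufficient
        p = alive[0]
        for q in alive[1:]:
            i = distinguishing_input(p, q)
            if i is not None:
                examples.append((i, oracle(i)))   # ask the I/O oracle
                break
        else:
            return p, examples             # no rival differs from p

print(learn())
```

Every distinguishing input removes at least one of the two rival programs, which matches the |Ej| ≥ 1 heuristic.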

– 45 –

Result Highlights

Malware Deobfuscation
– Conficker worm
– MyDoom
– Survey paper on obfuscations by Collberg et al.*

Synthesized over 35 bit-manipulation programs from Hacker’s Delight (the “Bible of bit-manipulation”).

Program length: 3 to 15
Number of input/output examples: 2 to 13
Total runtime: < 1 second to 5 minutes

*C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, Dept. Comp. Sci., The Univ. of Auckland, July 1997.

– 46 –

Outline

Program Synthesis from Input-Output Examples
Synthesizing Environment Assumptions for Reactive Synthesis from LTL
Conclusions and Future Directions

– 47 –

Synthesis from LTL

[Diagram: System Requirements ψs and Environment Assumptions ψe form the specification ψe → ψs given to the Synthesis Tool. Realizable: the tool outputs an FSM. Unrealizable: often due to incomplete environment assumptions!]

– 48 –

Satisfiability and Realizability

An LTL formula is satisfiable if there exists an infinite word (i.e., a sequence of inputs and outputs) that satisfies it.

An LTL specification is realizable if there exists a finite-state transducer M (e.g., a Moore machine) which, for any input sequence, generates only computations that satisfy it.

– 49 –

Problem Description

Goal: Generate additional assumptions to enable synthesis

Context: Original specification is satisfiable but unrealizable

Assume:
- Given a few interesting user scenarios (satisfying traces)
- Specifications are in the GR(1) class

Challenge:
- Space of possible additional assumptions is huge
- Want assumptions that can be understood and analyzed by a human user

– 50 –

Example

Inputs: request r and cancel c

Outputs: grant g

System specification ψs:
- G (r → X F g)
- G ((c ∨ g) → X ¬g)

Environment assumption ψe:
- True

No user scenarios.

Not realizable because the environment can force c to be high all the time.
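The unrealizability argument above can be sanity-checked by brute force over small ultimately periodic system behaviors. The sketch below fixes the inputs to r = 1 and c = 1 forever and hand-specializes the two system guarantees to that case: "every request is eventually granted" and "a cancel or grant forbids a grant in the next step". The lasso encoding and length bound are assumptions for illustration.

```python
from itertools import product

def g_at(prefix, loop, i):
    """Value of grant g at position i of the word prefix . loop^omega."""
    if i < len(prefix):
        return prefix[i]
    return loop[(i - len(prefix)) % len(loop)]

def satisfies_spec(prefix, loop):
    """Check both guarantees with r = 1 and c = 1 at every step."""
    horizon = len(prefix) + len(loop)   # later positions repeat the loop
    # Every request is eventually granted: since r is always 1,
    # g must occur in the repeating part.
    responds = any(loop)
    # Cancel or grant now means no grant next: since c is always 1,
    # g must be 0 at every position >= 1.
    quiet = all(g_at(prefix, loop, i) == 0 for i in range(1, horizon + 1))
    return responds and quiet

def system_can_win(max_len=3):
    """Try every grant sequence with prefix and loop up to max_len steps."""
    for p_len in range(max_len + 1):
        for l_len in range(1, max_len + 1):
            for bits in product((0, 1), repeat=p_len + l_len):
                if satisfies_spec(bits[:p_len], bits[p_len:]):
                    return True
    return False

print(system_can_win())  # False
```

The two conditions contradict each other for any loop, so the bound only limits the search, not the conclusion.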

– 51 –

Our Contribution

Counterstrategy-guided synthesis of environment assumptions

Demonstrated to generate useful/intuitive environment assumptions for digital circuits and robotic controllers

[Wenchao Li et al., MEMOCODE 2011]

– 52 –

Approach for Synthesizing Environment Assumptions

Structure Hypothesis: Environment Assumptions are Restricted GR(1) properties
+
Deductive Engine: (Finite-state) Model Checking
+
Inductive Inference: Version Space Learning from Counterstrategies

– 53 –

GR(1) Synthesis

Formulas in the form: ψe → ψs

For l ∈ {e, s}:
o Input and output partitions I and O.
o ψ_i^l: initial state formulas.
o ψ_t^l: transition formulas, in the form of G B, where B is a Boolean combination of variables in I ∪ O and expressions X u, with u ∈ I if l = e and u ∈ I ∪ O if l = s.
o ψ_f^l: fairness formulas, in the form of G F B, where B is a Boolean formula over I ∪ O.

o Synthesis as a turn-based two-player game between the system and the environment
o Realizable if the system has a winning strategy, otherwise the environment wins; the strategy is representable as a finite-state transducer

[Piterman, Pnueli, Saar]

– 54 –

Counter-strategy and counter-trace

A counter-strategy is a strategy for the environment to force violation of the specification.

A counter-trace is a fixed input sequence such that the specification is violated regardless of the outputs generated by the system.

– 55 –

Example

System ψs:
- G (r → X F g)
- G ((c ∨ g) → X ¬g)

A counter-trace:
r: 1 1 (1)
c: 1 1 (1)

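– 56 –

CounterStrategy-Guided Environment Assumption Synthesis

[Flow diagram: Start with the Formal Specification and run the Synthesis Tool. Realizable: Done. Unrealizable: Compute Counterstrategy, then Mine Assumptions using Specification Templates and User Scenarios, Add the mined assumptions to the specification, and run the Synthesis Tool again.]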

– 57 –

Assumption Synthesis Algorithm

[Flow diagram: Specification Templates (e.g. G F ?) feed Generate Candidate φ (e.g. G F p). The User Scenarios and the Counter-strategy feed Eliminate or Accept Candidate. Eliminate φ: generate the next candidate. Accept φ: Done.]

(check consistency with existing assumptions)

– 58 –

Assumption Synthesis Algorithm

[Flow diagram: Specification Templates (e.g. G F ?) feed Generate Candidate φ (e.g. G F p). With the User Scenarios and the Counter-strategy E, Invoke Model Checker: Does E satisfy ¬φ? YES: Accept φ, Done. NO: Eliminate φ and generate the next candidate.]

VERSION SPACES:
• Retain candidates consistent with examples
• Traverse weakest to strongest

(check consistency with existing assumptions)

– 59 –

Assumption Templates follow GR(1)

1: G F ?b, where b ∈ I
2: G (?b1 → X ?b2), where b1 ∈ I ∪ O and b2 ∈ I
3: G (?b1 → ?b2), where b1, b2 ∈ I

The specification remains in GR(1) after the addition of new assumptions.

Occam’s Razor: Simple propositional structure ⇒ generalizable

Selecting candidate assumptions: Weakest to Strongest, with heuristics
G F b weaker than G X b weaker than G b

– 60 –

Version Space Learning

[Lattice diagram of candidate assumptions, ordered from true (WEAKEST, most general) to false (STRONGEST, most specific). Nodes shown: G F p1, G F p2, …, G F pn; G X p1, G X p2, …, G X pn; G (p1 → X p2), …; G p1, G p2, …, G pn; G (p1 ∧ p2), …, G (pn-1 ∧ pn).]

Originally due to [Mitchell, 1978]

– 61 –

Theoretical Results

Theorem 1 [Completeness]: If there exist environment assumptions under our structure hypothesis that make the spec realizable, then the procedure finds them (terminates successfully). This is a “conditional completeness” guarantee.

Theorem 2 [Soundness]: The procedure never adds inconsistent environment assumptions.

– 62 –

Example

System ψs:
- G (r → X F g)
- G ((c ∨ g) → X ¬g)

A counter-trace:
r: 1 1 (1)
c: 1 1 (1)

Test assumption candidates by checking the negation of each against the counter-trace:
o G (F c): eliminated
o G (F ¬c): accepted

Resulting specification:

System ψs:
- G (r → X F g)
- G ((c ∨ g) → X ¬g)

Environment ψe:
- G (F ¬c)
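The candidate test in this example can be sketched for the simplest template, G F ?b. On an ultimately periodic trace, G F b holds exactly when b holds somewhere in the repeating part, so a candidate is kept only if the counter-trace violates it. The trace encoding and function names below are assumptions for illustration; the procedure in the paper uses a model checker on the full counter-strategy and also checks user scenarios and consistency.

```python
# Repeating part of the counter-trace: r = 1 and c = 1 forever.
COUNTER_TRACE_LOOP = [{"r": 1, "c": 1}]

def literal_holds(lit, step):
    """lit is (variable name, polarity); polarity False means negated."""
    name, positive = lit
    return bool(step[name]) == positive

def gf_holds(lit, loop):
    """G F lit holds on prefix . loop^omega iff lit holds somewhere in loop."""
    return any(literal_holds(lit, step) for step in loop)

def mine_gf_assumptions(inputs, loop):
    """Keep each 'G F literal' candidate that the counter-trace violates."""
    accepted = []
    for name in inputs:
        for positive in (True, False):
            lit = (name, positive)
            if not gf_holds(lit, loop):   # the trace satisfies the negation
                accepted.append(lit)
    return accepted

print(mine_gf_assumptions(["r", "c"], COUNTER_TRACE_LOOP))
```

On this trace both G F ¬r and G F ¬c survive. Ruling out one counter-trace does not by itself make the specification realizable, so the synthesis tool is run again after each added assumption.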

– 63 –

Experimental Results Summary

Experiment Setup:
- Remove assumptions from an originally realizable specification
- Use a few (often a single) satisfying traces of the original specification as representative user scenarios
- Mine additional assumptions until the specification is realizable

Use Cadence SMV to generate the satisfying traces and model check the counter-strategies

Use RATSY [Bloem et al. 2010] to check realizability of the specifications and compute the counter-strategies and counter-traces in case of unrealizability

Result Summary:
- Case studies in existing literature: AMBA AHB, IBM Gen Buffer, robotic vehicle controller, etc.
- Recovered the missing assumption in most cases
- AMBA AHB Example:

– 64 –

Experimental Results Summary

Mined assumption: G (F HLOCK[0] = 0)

Original assumption: G (HLOCK[0] = 1 → HBUSREQ[0] = 1)

HLOCK[0]: Master 0 requests locked access to the bus
HBUSREQ[0]: Master 0 requests access to the bus

– 65 –

Outline

Switching Logic Synthesis for Hybrid Systems
Reflections on the Approach
Synthesizing Environment Assumptions for Reactive Synthesis from LTL
Conclusions and Future Directions

– 66 –

Induction + Deduction + Structure

Structure Hypothesis encodes human insight about the form of the artifact to be synthesized

Synthesis procedure combines inductive inference with deductive reasoning

Many instances of this approach exist already

Key Points of Difference between instances:
1. Search Strategy: How to search the space of candidate concepts?
2. Example Selection: Which examples to learn from?
3. Deductive Oracles

– 67 –

Comparison of Three Approaches

Approach | Search Strategy for Candidate Programs | Example Selection Strategy
“Traditional” Counterexample-Guided Synthesis | Determined by Constraint Encoding & Solver Heuristics | Counterexample guided
Program Synthesis from I/O Examples | Determined by Constraint Encoding & Solver Heuristics | Distinguishing-input guided
Env Assumption Generation for LTL Synthesis | Version Spaces | Counter-strategy (counterexample) guided

– 68 –

Ongoing and Future Work

Many Other Applications
– Switching logic synthesis for safety and optimality [Jha et al., ’10, ’11]
– Fixed-point code from floating-point [Jha & S, ’11, ’13]
– Synthesizing Abstractions for Hardware Verification [Brady et al., ’11]
– Synthesizing Requirements for Industrial Control Models [Jin et al., ’13]
– …

Need better understanding of Inductive Synthesis
– Explore range of learning models
– Role of structure hypothesis
– Control over deductive oracles (e.g., via problem encoding / search strategy in SAT/SMT)