Nikolaj Bjørner Senior Researcher Microsoft Research Redmond Modern Satisfiability Modulo Theories Solvers in Program Analysis


Page 1: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Nikolaj Bjørner
Senior Researcher
Microsoft Research Redmond

Modern Satisfiability Modulo Theories Solvers in Program Analysis

Page 2: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Lectures

Wednesday 10:45–12:15 An Introduction to Z3 with Applications

Thursday August 30th 15:45–17:15 Introduction to SAT and SMT

Friday 10:30–10:45 Theories and Solving Algorithms

Friday 15:45–17:15 Advanced & Summary: Quantifiers, Arrays, Fixed-points

Page 3: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Plan

I. Introduction to Z3 – An Efficient SMT Solver
   I. Architecture
   II. Interaction – Input formats
   III. Theories

II. Introduction to Applications
   I. Basic examples of SMT solving
   II. A selection of applications using Z3/SMT

Page 4: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Takeaways:

• Many program analysis, verification and test tools solve problems that can be reduced to logical formulas and transformations between logical formulas at their core.

• Modern SMT solvers are often a good fit.
  – They handle domains found in programs directly.

• The selected examples are intended to show instances where sub-tasks are reduced to SMT/Z3.

Page 5: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Z3 – AN EFFICIENT SMT SOLVER

http://research.microsoft.com/projects/z3

Page 6: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

What is Z3?

Theories:
- Bit-vectors
- Linear arithmetic
- Nonlinear arithmetic
- Free (uninterpreted) functions
- Arrays
- Floating points
- Recursive datatypes

Quantifiers:
- E-matching
- Superposition

APIs and input formats:
- OCaml
- .NET
- C
- Native
- SMT-LIB
- Simplify
- F# quote
- Python

Other features:
- Model generation: finite/parametric
- Proof objects
- Tactics
- Assumption tracking

By Leonardo de Moura, Nikolaj Bjørner, Christoph Wintersteiger

Page 7: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Z3 is Cutting Edge

http://smtcomp.org

Z3 is Free Software

for research

Page 8: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Z3: Little Engines of Proof

- Congruence Closure
- SAT Solver
- Quant. Instances
- Simplex
- Superposition
- Arrays
- User Theories
- MBQI

Freely available from http://research.microsoft.com/projects/z3

Page 9: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

INPUT FORMATS

Page 10: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Input Formats

• Text:
  – SMT-LIB2 - main exchange format for SMT solvers
  – Log - low-level log for replay
  – Datalog - a Datalog format for the fixed-point engine
  – DIMACS - a format for propositional SAT

• Programmatic:
  – C - API functions exposed for C
  – OCaml - OCaml wrapper around C API
  – .NET - .NET wrapper around C API
  – Python - Try it on http://rise4fun.com/z3py
  – Scala - by Philippe Suter

Page 11: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

A Primer on SMT-LIB2

Online interactive tutorial

http://rise4fun.com/z3/tutorial

Page 12: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

THEORIES

Page 14: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theories

Uninterpreted functions
Arithmetic (linear)

Page 15: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theories

Uninterpreted functions
Arithmetic (linear)
Bit-vectors

Page 16: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theories

Uninterpreted functions
Arithmetic (linear)
Bit-vectors
Algebraic data-types

Page 17: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theories

Uninterpreted functions
Arithmetic (linear)
Bit-vectors
Algebraic data-types
Arrays

Page 18: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theories

Uninterpreted functions
Arithmetic (linear)
Bit-vectors
Algebraic data-types
Arrays
Polynomial Arithmetic

Page 19: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theories

Uninterpreted functions
Arithmetic (linear)
Bit-vectors
Algebraic data-types
Arrays
Polynomial Arithmetic

Page 20: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

USER-INTERACTION AND GUIDANCE

Page 21: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Logical Formula

Sat/Model

Interaction

Page 22: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

Logical Formula

Unsat/Proof

Page 23: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

Simplify

Logical Formula

Page 24: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

Implied Equalities

- x and y are equal
- z + y and x + z are equal

Logical Formula

Page 25: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

Quantifier Elimination

Logical Formula

Page 26: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

Logical Formula

Unsat. Core

Page 27: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

APPLICATIONS OF Z3

Page 28: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

A Decision Engine for Software

Some Microsoft engines:
- SDV: The Static Driver Verifier
- PREfix: The Static Analysis Engine for C/C++.
- Pex: Program EXploration for .NET.
- SAGE: Scalable Automated Guided Execution
- Spec#: C# + contracts
- VCC: Verifying C Compiler for the Viridian Hyper-Visor
- HAVOC: Heap-Aware Verification of C-code.
- SpecExplorer: Model-based testing of protocol specs.
- Yogi: Dynamic symbolic execution + abstraction.
- FORMULA: Model-based Design
- F7: Refinement types for security protocols
- M3: Model Program Modeling
- VS3: Abstract interpretation and Synthesis

They all use the SMT solver Z3.

Hyper-V

Page 29: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Applications

Test case generation

Verifying Compilers

Predicate Abstraction

Invariant Generation

Type Checking

Model Based Testing

Page 30: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theorem Provers/Satisfiability Checkers

A formula F is valid iff ¬F is unsatisfiable.

F → Theorem Prover/Satisfiability Checker →
- Satisfiable: Model
- Unsatisfiable: Proof

Page 31: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theorem Provers/Satisfiability Checkers

A formula F is valid iff ¬F is unsatisfiable.

F → Theorem Prover/Satisfiability Checker →
- Satisfiable: Model
- Unsatisfiable: Proof
- Timeout / Memout

Page 32: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Verification/Analysis Tool: “Template”

Problem → Verification/Analysis Tool → Logical Formula → Theorem Prover/Satisfiability Checker →
- Unsatisfiable
- Satisfiable (Counter-example)

Page 33: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT EXAMPLE
Basic

Page 34: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Satisfiability Modulo Theories (SMT)

Is a formula satisfiable modulo theory T?

SMT solvers have specialized algorithms for T.

Page 35: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Satisfiability Modulo Theories (SMT)

$x + 2 = y \Rightarrow f(\mathrm{select}(\mathrm{store}(a, x, 3), y - 2)) = f(y - x + 1)$

Theories involved: Arithmetic, Array Theory, Uninterpreted Functions
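To see why the implication on this slide holds, note that once x + 2 = y is assumed, both arguments of the uninterpreted function f reduce to 3, so f must agree on them. The sketch below checks this on sample values only; it is not a proof and not how an SMT solver works internally. The dictionary-based `store`/`select` helpers and the sample array contents are assumptions made for illustration.

```python
def store(a, i, v):
    """Array store: a copy of `a` that maps index i to v."""
    b = dict(a)
    b[i] = v
    return b


def select(a, i):
    """Array select; unset cells read as 0 (an arbitrary default for this sketch)."""
    return a.get(i, 0)


def f_arguments(a, x):
    """Return both arguments passed to f once the hypothesis x + 2 = y is applied."""
    y = x + 2
    left = select(store(a, x, 3), y - 2)   # reads back the value just stored at x
    right = y - x + 1                      # arithmetic: (x + 2) - x + 1
    return left, right


def holds_on_samples(bound=20):
    """Check that both arguments equal 3 for every sampled x."""
    a = {i: 7 * i for i in range(-bound, bound + 1)}  # arbitrary array contents
    return all(left == right == 3
               for left, right in (f_arguments(a, x)
                                   for x in range(-bound, bound + 1)))
```

Since the two arguments coincide for every x, any interpretation of f gives equal results, which is the congruence reasoning combined with array and arithmetic reasoning.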

Page 36: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT EXAMPLE
Job Shop Scheduling

Page 37: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Job Shop Scheduling

Machines

Jobs

P = NP?    Laundry    $\zeta(s) = 0 \Rightarrow s = \tfrac{1}{2} + ir$

Tasks

Page 38: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Job Shop Scheduling

Constraints:
Precedence: between two tasks of the same job
Resource: Machines execute at most one job at a time

$[\mathit{start}_{2,2} .. \mathit{end}_{2,2}] \cap [\mathit{start}_{4,2} .. \mathit{end}_{4,2}] = \emptyset$

Page 39: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Job Shop Scheduling

Constraints and encoding:
Precedence:
- start time of job 2 on mach 3
- duration of job 2 on mach 3
Resource:

$[\mathit{start}_{2,2} .. \mathit{end}_{2,2}] \cap [\mathit{start}_{4,2} .. \mathit{end}_{4,2}] = \emptyset$

Not convex

Page 40: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Job Shop Scheduling

Page 41: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Job Shop Scheduling

case split

case split

Efficient solvers:
- Floyd-Warshall algorithm
- Ford-Fulkerson algorithm

$z - z = 5 - 2 - 3 - 2 = -2 < 0$
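The case-split idea can be sketched as follows. Each precedence, deadline, and chosen resource ordering is a difference constraint x - y <= c, which becomes a graph edge from y to x with weight c; a negative cycle means the chosen ordering is infeasible. The sketch uses Floyd-Warshall, one of the algorithms named on the slide. The tiny instance (tasks A, B, C, their durations, and the deadline of 5) is hypothetical and not taken from the slides.

```python
from itertools import product


def has_negative_cycle(variables, constraints):
    """constraints: list of (x, y, c) meaning x - y <= c."""
    inf = float("inf")
    dist = {(u, v): (0 if u == v else inf) for u in variables for v in variables}
    for x, y, c in constraints:
        # x - y <= c becomes an edge y -> x with weight c
        dist[(y, x)] = min(dist[(y, x)], c)
    # Floyd-Warshall: product() varies its first component slowest, so k is the outer loop
    for k, i, j in product(variables, repeat=3):
        if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
            dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
    return any(dist[(v, v)] < 0 for v in variables)


# Hypothetical instance: job 1 runs A (2 time units) then B (3 units);
# job 2 runs C (2 units) on the same machine as B; everything must end by time 5.
# z is the time origin.
VARIABLES = ["z", "A", "B", "C"]
BASE = [
    ("z", "A", 0), ("z", "B", 0), ("z", "C", 0),  # start times are >= 0
    ("A", "B", -2),                               # precedence: A + 2 <= B
    ("A", "z", 3), ("B", "z", 2), ("C", "z", 3),  # deadline: start + duration <= 5
]
B_BEFORE_C = [("B", "C", -3)]                     # resource case 1: B + 3 <= C
C_BEFORE_B = [("C", "B", -2)]                     # resource case 2: C + 2 <= B

FEASIBLE_CASES = [name for name, case in (("B before C", B_BEFORE_C),
                                          ("C before B", C_BEFORE_B))
                  if not has_negative_cycle(VARIABLES, BASE + case)]
```

In the first case the cycle z, C, B, A, z has total weight 3 - 3 - 2 + 0 = -2, the same kind of contradiction z - z < 0 shown on the slide, so only the second ordering survives.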

Page 42: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

EXAMPLE EXERCISES
Follow-on to lectures by Hoare and Petersen

Page 43: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

True or false?

SMT-LIB2 formulation:
(define-fun Max ((x Int) (y Int)) Int
  (ite (> x y) x y))
(define-fun ominus ((x Int) (y Int)) Int
  (Max 0 (- x y)))
(declare-const x Int)
(declare-const a Int)
(assert (not (<= (+ (ominus x a) a) x)))
(check-sat)
(get-model)

Basic Arithmetic
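The SMT-LIB2 script asserts the negation of (x ominus a) + a <= x, so a satisfying model is a counterexample to the property. A brute-force search over a small range gives the same answer without a solver. This is only a sketch: the search bound is an arbitrary assumption, and an SMT solver reasons over all integers rather than enumerating.

```python
def ominus(x, a):
    """Truncated subtraction, mirroring (Max 0 (- x y)) in the SMT-LIB2 script."""
    return max(0, x - a)


def counterexamples(bound):
    """All (x, a) in [-bound, bound]^2 that violate (x ominus a) + a <= x."""
    values = range(-bound, bound + 1)
    return [(x, a) for x in values for a in values
            if not (ominus(x, a) + a <= x)]
```

For instance, x = 0 and a = 1 gives ominus(0, 1) + 1 = 1, which is not <= 0, so the property is false; the violations are exactly the pairs with x < a.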

Page 46: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Follow on Exercises

Prove the other properties for WP

Prove Axioms for Hoare and Milner calculi

Page 47: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATION
Test case generation

Page 48: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Test generation tools

SAGE

Internal. For Security Fuzzing

Runs on x86 instructions

External. For Developers

Runs on .NET code

Try it on: http://pex4fun.com

Finding security bugs before the hackers

black hat

Page 49: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATION
Static Analysis

Integrating Z3 with PREfix static analyzer. Yannick Moy, David Sielaff, B

Page 50: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

What is wrong here?

Package: java.util.Arrays
Function: binary_search

int binary_search(int[] arr, int low, int high, int key) {
  while (low <= high) {
    // Find middle value
    int mid = (low + high) / 2;
    int val = arr[mid];
    if (val == key) return mid;
    if (val < key) low = mid+1;
    else high = mid-1;
  }
  return -1;
}

(INT_MAX+1)/2 + (INT_MAX+1)/2 = INT_MIN

Book: Kernighan and Ritchie
Function: itoa (integer to ascii)

void itoa(int n, char* s) {
  if (n < 0) {
    *s++ = '-';
    n = -n;
  }
  // Add digits to s
  ….

-INT_MIN = INT_MIN
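Both bugs come from 32-bit two's-complement wraparound, which can be replayed in Python by wrapping results explicitly. The helper names below are hypothetical, and `midpoint_safe` is a commonly used remedy rather than something shown on the slide.

```python
INT_MAX = 2**31 - 1
INT_MIN = -2**31


def to_int32(value):
    """Wrap an unbounded Python int to a signed 32-bit two's-complement value."""
    return (value + 2**31) % 2**32 - 2**31


def div_trunc(a, b):
    """Integer division that truncates toward zero, as in C and Java."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def midpoint_naive(low, high):
    """mid = (low + high) / 2 with a 32-bit sum, as in binary_search."""
    return div_trunc(to_int32(low + high), 2)


def midpoint_safe(low, high):
    """Common remedy: the difference high - low cannot overflow when 0 <= low <= high."""
    return low + div_trunc(high - low, 2)


def negate(n):
    """n = -n on a 32-bit int, as in itoa."""
    return to_int32(-n)
```

With low = high = (INT_MAX+1)/2 the naive midpoint becomes negative, so `arr[mid]` indexes out of bounds, and negating INT_MIN leaves it unchanged, so `n` stays negative in itoa.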

Page 51: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

The PREfix Static Analysis Engine

C/C++ functions

int init_name(char **outname, uint n)
{
  if (n == 0) return 0;
  else if (n > UINT16_MAX) exit(1);
  else if ((*outname = malloc(n)) == NULL) {
    return 0xC0000095; // NT_STATUS_NO_MEM;
  }
  return 0;
}

int get_name(char* dst, uint size)
{
  char* name;
  int status = 0;
  status = init_name(&name, size);
  if (status != 0) {
    goto error;
  }
  strcpy(dst, name);
error:
  return status;
}

Page 52: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

The PREfix Static Analysis Engine

C/C++ functions: init_name and get_name, unchanged from the previous slide.

models

model for function init_name
  outcome init_name_0:
    guards: n == 0
    results: result == 0
  outcome init_name_1:
    guards: n > 0; n <= 65535
    results: result == 0xC0000095
  outcome init_name_2:
    guards: n > 0; n <= 65535
    constraints: valid(outname)
    results: result == 0; init(*outname)

Page 53: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

The PREfix Static Analysis Engine

C/C++ functions: init_name and get_name, unchanged from the previous slides.

models

model for function init_name
  outcome init_name_0:
    guards: n == 0
    results: result == 0
  outcome init_name_1:
    guards: n > 0; n <= 65535
    results: result == 0xC0000095
  outcome init_name_2:
    guards: n > 0; n <= 65535
    constraints: valid(outname)
    results: result == 0; init(*outname)

paths

path for function get_name
  guards: size == 0
  constraints:
  facts: init(dst); init(size); status == 0

warnings

pre-condition for function strcpy
  init(dst) and valid(name)
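The reasoning on this slide can be mimicked in a few lines: walk each outcome of the init_name model, keep those whose guard matches the caller's `size`, and check the strcpy pre-condition on the paths that reach it. This is a heavily simplified, hypothetical encoding for illustration only; the dictionary layout and the `inits_outname` flag are assumptions and do not reflect PREfix's actual data structures or algorithm.

```python
# Hypothetical encoding of the three init_name outcomes listed on the slide.
OUTCOMES = [
    {"name": "init_name_0", "guard": lambda n: n == 0,
     "result": 0, "inits_outname": False},
    {"name": "init_name_1", "guard": lambda n: 0 < n <= 65535,
     "result": 0xC0000095, "inits_outname": False},
    {"name": "init_name_2", "guard": lambda n: 0 < n <= 65535,
     "result": 0, "inits_outname": True},
]


def get_name_warnings(size):
    """Report paths of get_name on which strcpy reads an uninitialized `name`."""
    warnings = []
    for outcome in OUTCOMES:
        if not outcome["guard"](size):
            continue              # this outcome is infeasible for the given size
        if outcome["result"] != 0:
            continue              # get_name jumps to error and never calls strcpy
        if not outcome["inits_outname"]:
            warnings.append(
                f"{outcome['name']}: strcpy called with uninitialized name "
                f"(size == {size})")
    return warnings
```

For `size == 0` the only feasible outcome returns 0 without initializing `*outname`, so the strcpy pre-condition fails on that path, which matches the warning on the slide.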

Page 54: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

6/26/2009 54

Overflow on unsigned addition

iElement = m_nSize;
if( iElement >= m_nMaxSize ) {
  bool bSuccess = GrowBuffer( iElement+1 );   // iElement + 1 == 0
  …
}
::new( m_pData+iElement ) E( element );       // Write in unallocated memory
m_nSize++;

m_nSize == m_nMaxSize == UINT_MAX

Code was written for address space < 4GB

Page 55: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Using an overflown value as allocation size

ULONG AllocationSize;
while (CurrentBuffer != NULL) {
  if (NumberOfBuffers > MAX_ULONG / sizeof(MYBUFFER)) {   // Overflow check
    return NULL;
  }
  NumberOfBuffers++;                                      // Increment and exit from loop
  CurrentBuffer = CurrentBuffer->NextBuffer;
}
AllocationSize = sizeof(MYBUFFER)*NumberOfBuffers;        // Possible overflow
UserBuffersHead = malloc(AllocationSize);
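The off-by-one in this check can be replayed with explicit 32-bit wrapping. The check passes when NumberOfBuffers equals MAX_ULONG / sizeof(MYBUFFER), the counter is then incremented, and the multiplication wraps. The buffer size of 16 bytes used in the usage note is a hypothetical value; the slide does not give sizeof(MYBUFFER).

```python
MAX_ULONG = 2**32 - 1


def allocation_size(number_of_buffers, buffer_size, remaining_nodes=1):
    """Replay the loop for `remaining_nodes` more list nodes, then compute the size.

    Returns None if the overflow check fires, else the wrapped 32-bit size.
    """
    for _ in range(remaining_nodes):
        if number_of_buffers > MAX_ULONG // buffer_size:
            return None                                    # the slide's overflow check
        number_of_buffers = (number_of_buffers + 1) & MAX_ULONG
    return (buffer_size * number_of_buffers) & MAX_ULONG   # 32-bit multiplication
```

With a 16-byte buffer, entering the last iteration with MAX_ULONG // 16 buffers passes the check, and the final product is 2**32, which wraps to 0 bytes.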

Page 56: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

LONG l_sub(LONG l_var1, LONG l_var2)
{
  LONG l_diff = l_var1 - l_var2; // perform subtraction
  // check for overflow
  if ( (l_var1>0) && (l_var2<0) && (l_diff<0) ) l_diff=0x7FFFFFFF
  …

Possible overflow

Forget corner case INT_MIN

Overflow on signed subtraction

Page 57: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Overflow on unsigned addition

for (uint16 uID = 0; uID < uDevCount && SUCCEEDED(hr); uID++) {
  …
  if (SUCCEEDED(hr)) {
    uID = uDevCount; // Terminates the loop

Possible overflow: uID == UINT_MAX

Loop does not terminate

Page 58: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Using an overflown value as allocation size

DWORD dwAlloc;
dwAlloc = MyList->nElements * sizeof(MY_INFO);   // Can overflow
if(dwAlloc < MyList->nElements)                  // Not a proper test
  … // return
MyList->pInfo = malloc(dwAlloc);                 // Allocate less than needed
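The test `dwAlloc < nElements` misses overflows whose wrapped product is still at least nElements. The sketch below replays the test with 32-bit wrapping and contrasts it with a division-based check. The element size of 16 bytes is hypothetical, and `division_check_passes` is a standard alternative rather than something taken from the slide.

```python
DWORD_MAX = 2**32 - 1


def slide_check_passes(n_elements, element_size):
    """True when the slide's test does NOT trigger the early return."""
    dw_alloc = (n_elements * element_size) & DWORD_MAX   # 32-bit multiplication
    return not (dw_alloc < n_elements)


def really_overflows(n_elements, element_size):
    """Ground truth, computed with unbounded Python integers."""
    return n_elements * element_size > DWORD_MAX


def division_check_passes(n_elements, element_size):
    """Alternative check that compares before multiplying."""
    return n_elements <= DWORD_MAX // element_size
```

For example, with 0x18000000 elements of 16 bytes the true size is 0x180000000, the wrapped size is 0x80000000, and the slide's test lets the undersized allocation through.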

Page 59: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Bit-precise analysis

Modular arithmetic, bit-wise operations, vector segments

Vector segments:
  Concatenation:  1 0 1 0 1 1 ∘ 0 1 1 0 0 1 = 1 0 1 0 1 1 0 1 1 0 0 1
  Extraction:     1 0 1 0 1 1 [4:2] = 0 1 0

Bit-wise and:     1 0 1 0 1 1 & 0 1 1 0 0 1 = 0 0 1 0 0 1

Addition:         1 0 1 0 1 1 + 0 1 1 0 0 1 = 0 0 0 1 0 0
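The four bit-vector operations on this slide can be reproduced with Python integers and explicit masking. This is a sketch with hypothetical helper names; the width of 6 bits matches the operands on the slide.

```python
WIDTH = 6
MASK = (1 << WIDTH) - 1


def concat(high_part, low_part, low_width=WIDTH):
    """Concatenation: high_part occupies the bits above low_part."""
    return (high_part << low_width) | low_part


def extract(value, high, low):
    """Extraction value[high:low], both bit positions inclusive."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def bvand(a, b):
    """Bit-wise and."""
    return a & b


def bvadd(a, b):
    """Addition modulo 2**WIDTH; the carry out of the top bit is dropped."""
    return (a + b) & MASK


A = 0b101011
B = 0b011001
```

The addition example shows the modular behaviour: 43 + 25 = 68, and 68 modulo 64 is 4, that is 000100.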

Page 60: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Bit-precise analysis

Multiplication: [out3, out2, out1, out0] = [a3, a2, a1, a0] * [b3, b2, b1, b0]

Method: Encode bit-vectors bit-by-bit into Boolean Satisfiability

[Figure: multiplier circuit. Partial products a0b0, a0b1, a0b2, a0b3, a1b0, a1b1, a1b2, a2b0, a2b1, a3b0 feed three half adders (HA) and three full adders (FA), producing out0, out1, out2, out3.]

O(n^2) clauses

SAT solving time increases exponentially. Similar for BDDs.

Brute-force enumeration + evaluation faster for 20 bits.
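The bit-by-bit encoding can be illustrated by building a multiplier from Boolean gates only and comparing it with integer multiplication modulo 2^n. This sketch uses a plain shift-and-add layout with full adders throughout, which is an assumption made for simplicity; it is not the exact half-adder/full-adder arrangement drawn on the slide, and it evaluates the gates on concrete bits rather than emitting clauses for a SAT solver.

```python
from itertools import product


def half_adder(x, y):
    """Return (sum, carry) for two bits."""
    return x ^ y, x & y


def full_adder(x, y, carry_in):
    """Return (sum, carry) for two bits and a carry-in."""
    s1, c1 = half_adder(x, y)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def add_bits(xs, ys):
    """Ripple-carry addition of two equal-length little-endian bit lists, modulo 2**n."""
    out, carry = [], 0
    for x, y in zip(xs, ys):
        s, carry = full_adder(x, y, carry)
        out.append(s)
    return out


def mul_bits(a, b):
    """Shift-and-add multiplication on little-endian bit lists, truncated to n bits."""
    n = len(a)
    acc = [0] * n
    for j in range(n):
        # Row j holds the partial products a[i] & b[j], shifted left by j positions.
        row = [0] * j + [a[i] & b[j] for i in range(n - j)]
        acc = add_bits(acc, row)
    return acc


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def multiplier_is_correct(n=4):
    """Exhaustively compare the gate-level multiplier with (x * y) mod 2**n."""
    return all(from_bits(mul_bits(to_bits(x, n), to_bits(y, n))) == (x * y) % 2**n
               for x, y in product(range(2**n), repeat=2))
```

The number of AND gates for partial products grows quadratically with the bit-width, which is where the O(n^2) clause count on the slide comes from.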

Page 61: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATION
Verified Software

Page 62: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Microsoft Verifying C Compiler

Page 63: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Hypervisor Verification (2007 – 2010)

Partners:
• European Microsoft Innovation Center
• Microsoft Research
• Microsoft’s Windows Division
• Universität des Saarlandes

co-funded by the German Ministry of Education and Research

http://www.verisoftxt.de

Page 64: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Hypervisor: Scale & Speed

• VCs (verification conditions) are several megabytes in size
• Thousands of non-ground formulas
• Developers are willing to wait at most 5 min per VC

Are you willing to wait more than 5 min for your compiler?

Page 65: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Verification Attempt Time vs. Satisfaction and Productivity

By Michal Moskal (VCC Designer and Software Verification Expert), Language quiz: “loose” or “lose” ?

Page 66: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Why did my proof attempt fail?

1. My annotations are not strong enough!
   Weak loop invariants and/or contracts.

2. My theorem prover is not strong (or fast) enough.
   Send “angry” email to Nikolaj and Leo.

Page 67: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

VCC Performance Trends Nov 08 – Mar 09

[Chart: log-scale axis with ticks at 1, 10, 100, 1000. Annotated events:]
- Attempt to improve Boogie/Z3 interaction
- Modification in invariant checking
- Switch to Boogie2
- Switch to Z3 v2
- Z3 v2 update

Page 68: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

The Importance of Speed

Page 69: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Building Verve

Safe to the Last Instruction / Jean Yang & Chris Hawblitzel, PLDI 2010

[Figure: Verve build pipeline. Legend: source file, compilation tool, verification tool, verified.]
Components shown: Kernel.cs, C# compiler, Kernel.obj (x86), TAL checker, Nucleus.bpl (x86), Boogie/Z3, Translator/Assembler, Linker/ISO generator, Verve.iso

9 person-months

Page 70: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATION
Formal Language Theory and Security

Page 71: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Why string analysis? (motivating scenario)

Tomcat v. < 6.0.18

req = http://www.x.com/%c0%ae%c0%ae/%c0%ae%c0%ae/private/

Windows 2000 vulnerability: http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php

Apache Tomcat vulnerability: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938

1) security check: req must not contain "../"

2) dir = utf8decode("%c0%ae%c0%ae/%c0%ae%c0%ae/private/") = "../../private/"

access granted to "../../private/"

Analysis question: Does utf8decode reject overlong utf8-encodings such as "%C0%AE" for '.'?
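The scenario can be reproduced in a few lines. `naive_utf8_decode` below is a hypothetical lenient decoder written for this sketch, not Tomcat's code: it combines the payload bits of a two-byte sequence without checking that the encoding is the shortest one. Python's built-in strict decoder is shown for contrast.

```python
from urllib.parse import unquote_to_bytes


def naive_utf8_decode(data: bytes) -> str:
    """Hypothetical lenient decoder: 1- and 2-byte sequences, no overlong check."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # Combine 5 payload bits of the lead byte with 6 bits of the continuation.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("unsupported byte sequence")
    return "".join(out)


def strict_decode_rejects(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


REQUEST_PATH = "%c0%ae%c0%ae/%c0%ae%c0%ae/private/"
RAW = unquote_to_bytes(REQUEST_PATH)
```

The request passes the textual "../" check, yet the lenient decoder maps each %c0%ae pair to '.', yielding "../../private/"; a strict decoder rejects the input because 0xC0 can only start an overlong sequence.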

Page 72: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Relativized Formal Language Theory

Classical Word Acceptors (NFA, DFA)
  → Symbolic Word Acceptors: Classical Word Acceptors modulo Th()
  Application: regex matching

Classical Word Transducers (e.g. decoding automata, rational transductions) and Classical I/O Automata (e.g. Mealy machine)
  → Symbolic Word Transducers: Classical Word Transducers modulo Th()
  Application: string transformation

Page 73: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Rex & Bek – Symbolic RegEx & Transducers

Margus Veanes

Page 74: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Symbolic Finite Transducer (SFT)

• Classical transducer modulo a rich label theory
• Core Idea: represent labels with guarded transformation functions
  – Separation of concerns: finite graph / theory of labels

Concrete transitions (1920 transitions from p to q):
  '\x80' / "\xC2\x80"
  …
  '\x7FF' / "\xDF\xBF"

Symbolic transition (a single transition from p to q):
  $\lambda x.\; 80_{16} \le x \le \mathrm{7FF}_{16} \;/\; [\mathrm{C0}_{16} \mid x_{10,6},\; 80_{16} \mid x_{5,0}]$
  The part before "/" is the guard; the part after it uses bitvector operations.
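The symbolic transition can be written as a guard plus an output function and then expanded back into its concrete transitions. In this sketch x_{10,6} is read as bits 10 down to 6 of x and x_{5,0} as bits 5 down to 0, with "|" as bit-wise or; the function names are hypothetical.

```python
def guard(x):
    """Guard of the symbolic transition: 0x80 <= x <= 0x7FF."""
    return 0x80 <= x <= 0x7FF


def output(x):
    """Output label: [0xC0 | x[10:6], 0x80 | x[5:0]] as two bytes."""
    return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])


def concrete_transitions():
    """Expand the single symbolic transition into its concrete input/output pairs."""
    return {x: output(x) for x in range(0x800) if guard(x)}
```

Expanding the guard yields 0x7FF - 0x80 + 1 = 1920 concrete transitions, and each output agrees with the standard two-byte UTF-8 encoding of the code point.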

Page 75: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATION
Software Model Checking

Page 76: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Static Driver Verifier

SLAM – part of SDV (Windows 7, 8)
Z3 is used for:
- Predicate abstraction
- Counter-example refinement

Yogi – next generation SDV
Z3 is used for:
- Refining transition abstractions

Corral – uses reachability modulo theories
SLAyer – checks memory safety using separation logic

Page 77: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Z3 & Static Driver Verifier: SLAM

• All-SAT
  – Better (more precise) Predicate Abstraction

• Unsatisfiable cores
  – Why is the abstract path not feasible?
  – Fast Predicate Abstraction

Page 78: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Summary

Introduction to Z3: An Efficient SMT Solver

Introduction to Applications:

– Exemplary applications

– Tools built around Z3

Tomorrow. More technical:
– Introduction to SAT solving.
– Introduction to SMT solving.