Relational String Verification Using Multi-track Automata Fang Yu, Tevfik Bultan, and Oscar Ibarra Department of Computer Science University of California, Santa Barbara


Page 1: Relational String Verification Using Multi-track Automata

Relational String Verification Using Multi-track Automata

Fang Yu, Tevfik Bultan, and Oscar Ibarra

Department of Computer Science, University of California, Santa Barbara

Page 2: Relational String Verification Using Multi-track Automata

Web software

• Web software is becoming increasingly dominant
• Web applications are used extensively in many areas:
  – Commerce: online banking, online shopping, …
  – Entertainment: online music & videos, …
  – Interaction: social networks
• We will rely on web applications more in the future:
  – Health records
    • Google Health, Microsoft HealthVault
  – Controlling and monitoring of national infrastructures:
    • Google Powermeter
• Web software is also rapidly replacing desktop applications
  – Cloud computing + software-as-service
    • Google Docs, Google …

Page 3: Relational String Verification Using Multi-track Automata

One Major Road Block

• Web applications are not secure!

• Web applications are notorious for security vulnerabilities
  – Their global accessibility makes them a target for many malicious users

• As web applications are becoming increasingly dominant and as their use in safety critical areas is increasing
  – Their security is becoming a critical issue

Page 4: Relational String Verification Using Multi-track Automata

Web applications are not secure

• There are many well-known security vulnerabilities that exist in many web applications. Here are some examples:
  – Malicious file execution: where a malicious user causes the server to execute malicious code
  – SQL injection: where a malicious user executes SQL commands on the back-end database by providing specially formatted input
  – Cross site scripting (XSS): allows the attacker to execute a malicious script in a user’s browser

• These vulnerabilities are typically due to
  – errors in user input validation or
  – lack of user input validation

Page 5: Relational String Verification Using Multi-track Automata

Web Application Vulnerabilities

Page 6: Relational String Verification Using Multi-track Automata

Web Application Vulnerabilities

• The top two vulnerabilities of the Open Web Application Security Project (OWASP)’s top ten list in 2007
  – Cross Site Scripting (XSS)
  – Injection Flaws (such as SQL Injection)

• The top two vulnerabilities of OWASP’s top ten list in 2010
  – Injection Flaws (such as SQL Injection)
  – Cross Site Scripting (XSS)

Page 7: Relational String Verification Using Multi-track Automata

Why are web applications error prone?

• Extensive string manipulation:
  – Web applications use extensive string manipulation
    • To construct HTML pages, to construct database queries in SQL, etc.
  – The user input comes in string form and must be validated and sanitized before it can be used
    • This requires the use of complex string manipulation functions such as string-replace
  – String manipulation is error prone

Page 8: Relational String Verification Using Multi-track Automata

String Related Vulnerabilities

String related web application vulnerabilities occur when:
• A sensitive function is passed a malicious string input from the user
• This input contains an attack
• User input is not properly sanitized before it reaches the sensitive function

String analysis: Discover these vulnerabilities automatically

Page 9: Relational String Verification Using Multi-track Automata

XSS Vulnerability

A PHP Example:

  1: <?php
  2: $www = $_GET["www"];
  3: $l_otherinfo = "URL";
  4: echo "<td>" . $l_otherinfo . ": " . $www . "</td>";
  5: ?>

• The echo statement in line 4 is a sensitive function
• It contains a Cross Site Scripting (XSS) vulnerability

<script ...

Page 10: Relational String Verification Using Multi-track Automata

String Analysis

• String analysis determines all possible values that a string expression can take during any program execution
• Using string analysis we can identify all possible input values of the sensitive functions
  – Then we can check if inputs of sensitive functions can contain attack strings
• How can we characterize attack strings?
  – Use regular expressions to specify the attack patterns
  – Attack pattern for XSS: Σ∗<scriptΣ∗
• If string analysis determines that the intersection of the attack pattern and possible inputs of the sensitive function is empty
  – then we can conclude that the program is secure
• If the intersection is not empty, then we conclude that the program might be vulnerable
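The emptiness check described on this slide can be sketched with a product construction over two DFAs. This is a minimal illustration, not Stranger's implementation: the `DFA` class, the two-symbol toy alphabet (`<` plus `a` standing in for every other character), and the simplified attack pattern Σ∗<Σ∗ are all assumptions made for the example.

```python
from collections import deque

ALPHABET = {"<", "a"}  # toy alphabet (assumption): '<' plus one stand-in symbol


class DFA:
    def __init__(self, delta, start, accepting):
        self.delta = delta          # total map: (state, symbol) -> state
        self.start = start
        self.accepting = accepting  # set of accepting states


def intersection_witness(a, b):
    """Return a shortest string in L(a) ∩ L(b), or None if the intersection is empty."""
    start = (a.start, b.start)
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (p, q), word = queue.popleft()
        if p in a.accepting and q in b.accepting:
            return word
        for sym in sorted(ALPHABET):
            nxt = (a.delta[(p, sym)], b.delta[(q, sym)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + sym))
    return None


# Attack pattern Σ*<Σ*: state 1 is reached once a '<' has been read.
attack = DFA({(0, "<"): 1, (0, "a"): 0, (1, "<"): 1, (1, "a"): 1}, 0, {1})
# Unsanitized input: any string at all.
any_input = DFA({(0, "<"): 0, (0, "a"): 0}, 0, {0})
# Hypothetical sanitized value: strings without '<' (state 1 is a rejecting sink).
sanitized = DFA({(0, "a"): 0, (0, "<"): 1, (1, "<"): 1, (1, "a"): 1}, 0, {0})

print(intersection_witness(any_input, attack))   # a witness string: might be vulnerable
print(intersection_witness(sanitized, attack))   # None: secure w.r.t. this pattern
```

A non-empty result is only evidence of a possible vulnerability, because the analyzed language is an over-approximation of the real values.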

Page 11: Relational String Verification Using Multi-track Automata

String Systems

stmt ::= id := sexp;
       | id := call id (sexp);
       | if exp then goto l;    (where l is a stmt label)
       | goto L;                (where L is a set of stmt labels)
       | input id;
       | output exp;
       | assert exp;

exp  ::= bexp | exp and exp | not exp

bexp ::= atom = sexp

sexp ::= sexp . atom | atom | suffix(id) | prefix(id)

atom ::= id | c                 (where c is a string constant)

Page 12: Relational String Verification Using Multi-track Automata

Basic String System Categorization

We use the following categorization
• N/D: nondeterministic or deterministic
• U/B/K: unary, binary or arbitrary alphabet
• The set of variables
• The types of statements
• The types of branch conditions

Example: NB(X1, X2) Xi := Xi.c; X1 = X2

Nondeterministic, binary alphabet, variables X1, X2, statements of the form Xi := Xi.c, branch conditions of the form X1 = X2

Define the reachability problem for the string systems as:

Given a string system and a configuration (an instruction label and values for the variables), is that configuration reachable?

Page 13: Relational String Verification Using Multi-track Automata

Decidability Results

Reachability problem for:

• NB(X1, X2) Xi := Xi.c; X1 = X2 is undecidable
  – Reduction from Post Correspondence Problem

• DU(X1, X2, X3) Xi := Xi.c; X1 = X3, X2 = X3 is undecidable
  – Can simulate 2-counter machines

• NK(X1, . . . , Xk) Xi := d.Xi.c; c = Xi, c = prefix(Xi), c = suffix(Xi) is decidable
  – Reduction to emptiness check for multi-tape automaton

• DK(X1, . . . , Xk) Xi := Xi . a, Xi := a . Xi; X1 = X2, c = Xi, c = prefix(Xi), c = suffix(Xi) is decidable
  – Can bound the execution steps if there is no infinite loop

Page 14: Relational String Verification Using Multi-track Automata

Automata-based String Analysis

• Finite State Automata can be used to characterize sets of string values

• We use automata based string analysis
  – Associate each string expression in the program with an automaton
  – The automaton accepts an over approximation of all possible values that the string expression can take during program execution

• Using this automata representation we symbolically execute the program, only paying attention to string manipulation operations

Page 15: Relational String Verification Using Multi-track Automata

String Analysis Stages

• Convert PHP programs to dependency graphs
• Use symbolic reachability analysis to compute an over-approximation of reachable configurations
• Forward analysis
  – Assume that the user input can be any string
  – Propagate this information on the dependency graph
  – When a sensitive function is reached, intersect with attack pattern
• Result
  – If the intersection is not empty, there might be a vulnerability
  – If the intersection is empty the program is not vulnerable (wrt attack pattern)

[Figure: analysis pipeline with components Front End and Reachability Analysis; inputs: PHP program, attack patterns; output: vulnerability report]

Page 16: Relational String Verification Using Multi-track Automata

Dependency Graphs

Given a PHP program, first construct the dependency graph:

  1: <?php
  2: $www = $_GET["www"];
  3: $l_otherinfo = "URL";
  4: echo $l_otherinfo . ": " . $www;
  5: ?>

[Figure: dependency graph. Nodes (label, line number): echo, 4; str_concat, 4; str_concat, 4; $www, 2; $_GET[www], 2; ": ", 4; $l_otherinfo, 3; "URL", 3]

Page 17: Relational String Verification Using Multi-track Automata

Symbolic Reachability Analysis

• Using the dependency graph we conduct symbolic reachability analysis

• Automata-based forward fixpoint computation that identifies the possible string values of each node
  – Each node in the dependency graph is associated with a DFA
    • DFA accepts an over-approximation of the string values that the string expression represented by that node can take at runtime
    • The DFAs for the input nodes accept Σ∗
  – Intersecting the DFA for the sink nodes with the DFA for the attack pattern identifies the vulnerabilities

Page 18: Relational String Verification Using Multi-track Automata

Forward Analysis

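[Figure: forward analysis on the dependency graph, showing the language computed at each node (label, line number):
  $_GET[www], 2: Forward = Σ*
  $www, 2: Σ*
  "URL", 3: URL
  $l_otherinfo, 3: URL
  ": ", 4: :
  str_concat, 4 (inner): URL:
  str_concat, 4 (outer): URL: Σ*
  echo, 4: URL: Σ*]

Attack Pattern = Σ*<Σ*

L(URL: Σ*) ∩ L(Σ*<Σ*) = L(URL: Σ*<Σ*) ≠ Ø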
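The forward propagation on this small example can be sketched as follows. This is an illustrative sketch under stated assumptions, not Stranger's code: it uses a tiny hand-rolled NFA (Stranger uses DFAs from the MONA package), the helper names `const`, `any_string`, `concat` and `may_contain` are hypothetical, and the attack pattern is the single-character pattern Σ*<Σ* from the figure.

```python
ANY = object()  # edge label standing for "any single character" (Σ)


class NFA:
    """Tiny NFA: edges[state] is a list of (label, target); label None is an epsilon move."""

    def __init__(self):
        self.edges = {}
        self.start = self.new_state()
        self.accepting = set()

    def new_state(self):
        state = len(self.edges)
        self.edges[state] = []
        return state


def const(text):
    """Automaton accepting exactly the string constant `text`."""
    nfa = NFA()
    cur = nfa.start
    for ch in text:
        nxt = nfa.new_state()
        nfa.edges[cur].append((ch, nxt))
        cur = nxt
    nfa.accepting = {cur}
    return nfa


def any_string():
    """Automaton accepting Σ*, used for unconstrained user input."""
    nfa = NFA()
    nfa.edges[nfa.start].append((ANY, nfa.start))
    nfa.accepting = {nfa.start}
    return nfa


def concat(a, b):
    """Automaton for L(a) . L(b): copy both operands and link them with epsilon moves."""
    out = NFA()
    off_a, off_b = 1, 1 + len(a.edges)
    for src, off in ((a, off_a), (b, off_b)):
        for state, edges in src.edges.items():
            out.edges[state + off] = [(label, target + off) for label, target in edges]
    out.edges[out.start].append((None, a.start + off_a))
    for final in a.accepting:
        out.edges[final + off_a].append((None, b.start + off_b))
    out.accepting = {final + off_b for final in b.accepting}
    return out


def may_contain(nfa, ch):
    """Is L(nfa) ∩ Σ* ch Σ* non-empty? Product of the NFA with a 'seen ch' flag."""
    start = (nfa.start, False)
    stack, seen = [start], {start}
    while stack:
        state, hit = stack.pop()
        if hit and state in nfa.accepting:
            return True
        for label, target in nfa.edges[state]:
            if label is None:
                successors = [(target, hit)]
            elif label is ANY:
                successors = [(target, hit), (target, True)]  # ANY may be ch itself
            else:
                successors = [(target, hit or label == ch)]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return False


# Evaluate the dependency graph bottom-up (input node first, sink last).
www = any_string()                       # $www = $_GET["www"]
l_otherinfo = const("URL")               # $l_otherinfo = "URL"
echo_arg = concat(concat(l_otherinfo, const(": ")), www)
print(may_contain(echo_arg, "<"))        # True: the echo sink might be vulnerable
```

Because this example has no loops, one bottom-up pass suffices; programs with loops need the fixpoint computation and widening discussed later in the deck.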

Page 19: Relational String Verification Using Multi-track Automata

Relational String Analysis

• Earlier work on string analysis uses multiple single-track DFAs during symbolic reachability analysis
  – One DFA per variable per program location

• Our approach: Use one multi-track DFA per program location
  – Each track represents the values of one string variable

• Using multi-track DFAs:
  – Identifies the relations among string variables
  – Improves the precision of the path-sensitive analysis
  – Can be used to prove properties that depend on relations among string variables, e.g., $file = $usr.txt

Page 20: Relational String Verification Using Multi-track Automata

Multi-track Automata

• Let X (the first track), Y (the second track), be two string variables
• λ is the padding symbol
• A multi-track automaton that encodes the word equation:

X = Y.txt

[Figure: two-track automaton with a self-loop labeled (a,a), (b,b) … followed by three transitions labeled (t,λ), (x,λ), (t,λ)]
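A membership test for the two-track automaton of X = Y.txt can be sketched as below. This is a hedged illustration: the function names are hypothetical, each two-track symbol is written as (X symbol, Y symbol) following the track order stated above, and λ is assumed not to occur in real strings.

```python
PAD = "λ"  # padding symbol; assumed not to occur in real strings


def align(x, y):
    """Right-pad the shorter string with PAD and zip both into one two-track word."""
    n = max(len(x), len(y))
    return list(zip(x.ljust(n, PAD), y.ljust(n, PAD)))


def accepts_x_eq_y_txt(x, y):
    """Run the four-state two-track DFA for X = Y.txt on the aligned pair (x, y)."""
    suffix = "txt"
    state = 0  # state k: k symbols of the suffix have been read on the X track
    for xs, ys in align(x, y):
        if state == 0 and xs == ys and xs != PAD:
            state = 0                  # self-loop: both tracks read the same symbol
        elif state < len(suffix) and ys == PAD and xs == suffix[state]:
            state += 1                 # X reads the next suffix symbol, Y reads padding
        else:
            return False               # implicit rejecting sink state
    return state == len(suffix)


print(accepts_x_eq_y_txt("abtxt", "ab"))   # True
print(accepts_x_eq_y_txt("abtxt", "abt"))  # False
```

The point of the single multi-track automaton is that it accepts exactly the related pairs, something two independent single-track DFAs for X and Y cannot express.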

Page 21: Relational String Verification Using Multi-track Automata

Alignment

• To conduct relational string analysis, we need to compute the "intersection" of multi-track automata
  – Aligned multi-track automata are closed under intersection

• In an aligned multi-track automaton, λs are right justified in all tracks, e.g., abλλ instead of aλbλ

• However, there exist unaligned multi-track automata that are not equivalent to any aligned multi-track automaton
  – We propose an alignment algorithm that constructs aligned automata which over- or under-approximate unaligned ones
    • Over approximation: Generates an aligned multi-track automaton that accepts a superset of the language recognized by the unaligned multi-track automaton
    • Under approximation: Generates an aligned multi-track automaton that accepts a subset of the language recognized by the unaligned multi-track automaton
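The notion of alignment can be illustrated on single track words. Note the assumption: the alignment algorithm on the slide operates on automata, whereas this sketch only shows what "right justified λs" means for one word; the function names are hypothetical.

```python
PAD = "λ"  # padding symbol


def is_aligned(track):
    """A track word is aligned when every PAD appears after all real symbols."""
    return PAD not in track.rstrip(PAD)


def align_track(track):
    """Move all PAD symbols to the right, keeping the real symbols in order."""
    letters = track.replace(PAD, "")
    return letters + PAD * (len(track) - len(letters))


print(is_aligned("abλλ"))    # True
print(is_aligned("aλbλ"))    # False
print(align_track("aλbλ"))   # abλλ
```

Both words encode the string "ab"; alignment picks one canonical encoding, which is what makes track-wise intersection meaningful.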

Page 22: Relational String Verification Using Multi-track Automata

Symbolic Reachability Analysis

• Transitions and configurations of a string system can be represented using word equations

• Word equations can be represented/approximated using aligned multi-track automata which are closed under intersection, union, complement and projection

• Operations required for reachability analysis (such as equivalence checking) can be computed on DFAs

Page 23: Relational String Verification Using Multi-track Automata

Word Equations

• Word equations: Equality of two expressions that consist of concatenation of a set of variables and constants
  – Example: X = Y . txt

• Word equations and their combinations (using Boolean connectives) can be expressed using only equations of the form X = Y . c, X = c . Y, c = X . Y, X = Y . Z, Boolean connectives and existential quantification

• Our goal:
  – Construct multi-track automata from basic word equations
    • The automata should accept tuples of strings that satisfy the equation
  – Boolean connectives can be handled using intersection, union and complement
  – Existential quantification can be handled using projection

Page 24: Relational String Verification Using Multi-track Automata

Word Equations to Automata

• Basic equations X = Y . c, X = c . Y, c = X . Y and their Boolean combinations can be represented precisely using multi-track automata

• The size of the aligned multi-track automaton for X = c . Y is exponential in the length of c

• The nonlinear equation X = Y . Z cannot be represented precisely using an aligned multi-track automaton

Page 25: Relational String Verification Using Multi-track Automata

Word Equations to Automata

• When we cannot represent an equation precisely, we can generate an over- or under-approximation of it
  – Over-approximation: The automaton accepts all string tuples that satisfy the equation and possibly more
  – Under-approximation: The automaton accepts only string tuples that satisfy the equation but possibly not all of them

• We implement a function CONSTRUCT(equation, sign)
  – It takes a word equation and a sign and creates a multi-track automaton that over- or under-approximates the equation, depending on the input sign

Page 26: Relational String Verification Using Multi-track Automata

Post condition computation

• During symbolic reachability analysis we compute the post-conditions of statements using the function CONSTRUCT

Given a multi-track automaton M and an assignment statement X := sexp,
Post(M, X := sexp) denotes the post-condition of X := sexp with respect to M:

Post(M, X := sexp) = (∃X, M ∩ CONSTRUCT(X’ = sexp, +))[X/X’]

• We implement a symbolic forward reachability computation using the post-condition operations
  – It is a least fixpoint computation
  – We use widening to achieve convergence
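The three steps of the post-condition (intersect with X’ = sexp, project away the old X, rename X’ to X) can be sketched on an explicit, finite set of configurations. This is only an illustration under an assumption: finite sets of valuations stand in for the multi-track automaton M, and the `post` function and its encoding of configurations are hypothetical.

```python
def post(configs, target, sexp):
    """Post-condition of `target := sexp` over an explicit set of configurations.

    configs: set of frozensets of (variable, value) pairs (stand-in for M).
    sexp:    function from a valuation dict to the new string value.
    """
    result = set()
    for cfg in configs:
        env = dict(cfg)
        primed = sexp(env)        # M ∩ (X' = sexp): X' is fixed by the current valuation
        env.pop(target, None)     # ∃X: forget the old value of X
        env[target] = primed      # [X/X']: rename X' back to X
        result.add(frozenset(env.items()))
    return result


M = {
    frozenset({("X", "a"), ("Y", "b")}),
    frozenset({("X", "c"), ("Y", "b")}),
}
after = post(M, "X", lambda env: env["Y"] + "txt")   # X := Y . txt
print(after)
```

The two input configurations differ only in the old value of X, so after projection they collapse into the single configuration with X = "btxt" and Y = "b", and the relation X = Y.txt holds in the result.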

Page 27: Relational String Verification Using Multi-track Automata

Widening

• String verification problem is undecidable

• The forward fixpoint computation is not guaranteed to converge in the presence of loops and recursion

• We compute a sound approximation
  – During fixpoint we compute an over approximation of the least fixpoint that corresponds to the reachable states

• We use an automata based widening operation to over-approximate the fixpoint
  – Widening operation over-approximates the union operations and accelerates the convergence of the fixpoint computation
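How widening forces a diverging fixpoint computation to converge can be sketched on a much simpler abstract domain. Loudly stated assumption: this uses standard interval widening on string lengths, not the automata-based widening operator of the slides, and the loop `X := X . "b"` is a hypothetical example.

```python
import math


def join(a, b):
    """Least upper bound of two length intervals (lo, hi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old, new):
    """Interval widening: a bound that moved jumps to its extreme (0 or infinity)."""
    lo = old[0] if new[0] >= old[0] else 0
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)


def loop_body(interval):
    """Abstract effect of X := X . "b": the length grows by exactly one."""
    return (interval[0] + 1, interval[1] + 1)


def reachable_lengths(init, max_iters=100):
    """Over-approximate the lengths of X at the loop head; returns (interval, iterations)."""
    current = init
    for iteration in range(1, max_iters + 1):
        nxt = widen(current, join(current, loop_body(current)))
        if nxt == current:
            return current, iteration
        current = nxt
    raise RuntimeError("no convergence within max_iters")


print(reachable_lengths((1, 1)))   # ((1, inf), 2)
```

Without `widen`, the iterates (1, 1), (1, 2), (1, 3), … never stabilize; with it, the upper bound jumps to infinity and the computation stops after two iterations, at the cost of precision.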

Page 28: Relational String Verification Using Multi-track Automata

Summarization

• We developed techniques for handling function calls using summarization

• We generate a transducer that is the summary of a function
  – It represents a relation between the arguments of the function and the value it returns
  – We generate a multi-track automaton for the function summary
  – We generate the function summary also using forward fixpoint computation and widening

• We use the function summaries during reachability analysis to handle function calls

Page 29: Relational String Verification Using Multi-track Automata

Symbolic Automata Representation

• We used the MONA DFA Package for automata manipulation
  – [Klarlund and Møller, 2001]

• Compact Representation:
  – The transition relation of the DFA is represented as a multi-terminal BDD (MBDD)

• Exploits the MBDD structure in the implementation of DFA operations
  – Union, Intersection, and Emptiness Checking
  – Projection and Minimization

• Cannot Handle Nondeterminism:
  – We extended the alphabet with dummy bits to encode nondeterminism

Page 30: Relational String Verification Using Multi-track Automata

Symbolic Automata Representation

[Figure: explicit DFA representation compared with symbolic DFA representation]

Page 31: Relational String Verification Using Multi-track Automata

Stranger: A String Analysis Tool

– Uses Pixy [Jovanovic et al., 2006] as a PHP front end
– Uses MONA [Klarlund and Møller, 2001] automata package for automata manipulation

[Figure: Stranger architecture. Components: Pixy Front End (Parser, Dependency Analyzer); Symbolic String Analysis (String Analyzer); Automata Based String Manipulation Library; MONA Automata Package. Edge labels: CFG, Dependency Graphs, String/Automata Operations, DFAs. Inputs: PHP program, attack patterns. Outputs: Stranger Automata, String Analysis Report (Vulnerability Signatures)]

Stranger is available at: www.cs.ucsb.edu/~vlab/stranger

Page 32: Relational String Verification Using Multi-track Automata

Experiments

• XSS (Cross-Site Scripting) benchmarks (contain vulnerability)
  – We check whether the input to a sensitive function can contain the string <script
  – S1: MyEasyMarket-4.1, trans.php (218)
  – S2: PBLguestbook-1.32, pblguestbook.php (1210)
  – S3: Aphpkb-0.71, saa.php (87)
  – S4: BloggIT 1.0, admin.php (23)

• MFE (Malicious File Execution) benchmarks (do not contain vulnerability)
  – We check whether the retrieved files and the external inputs are consistent with the security policy
  – M1: PBLguestbook-1.32, pblguestbook.php (536)
  – M2, M3: MyEasyMarket-4.1, prod.php (94, 189)
  – M4, M5: php-fusion-6.01, db_backup.php (111), forums_prune.php (28)

Page 33: Relational String Verification Using Multi-track Automata

Experiments

      DFA size         Time     Mem      MDFA size        Time     Mem
      (states, BDD)    (sec)    (KB)     (states, BDD)    (sec)    (KB)
S1    17(148)          0.012    444      65(1629)         0.345    1231
S2    42(376)          0.02     626      49(1205)         0.065    4232
S3    27(226)          0.035    838      47(2714)         0.161    2684
S4    79(633)          0.067    1696     79(1900)         0.229    2826
M1    56(801)          0.03     621      50(3551)         0.061    1294
M2    22(495)          0.017    555      21(604)          0.044    996
M3    5(113)           0.01     417      3(276)           0.019    465
M4    1201(25949)      0.251    9495     181(9893)        0.791    19322
M5    211(3195)        0.057    1676     62(2423)         0.103    1756

Page 34: Relational String Verification Using Multi-track Automata

Case Study

Schoolmate 1.5.4
• Number of PHP files: 63
• Lines of code: 8181

Forward Analysis results

  Time          Memory    Number of XSS sensitive sinks    Number of XSS Vulnerabilities
  22 minutes    281 MB    898                              153

After manual inspection we found the following:

  Actual Vulnerabilities    False Positives
  105                       48

Page 35: Relational String Verification Using Multi-track Automata

Case Study – False Positives

Why false positives?
– Path insensitivity: 39
  • Path to vulnerable program point is not feasible
– Un-modeled built in PHP functions: 6
– Unfound user written functions: 3
  • PHP programs have more than one execution entry point

– We can remove all these false positives by extending our analysis to a path sensitive analysis and modeling more PHP functions

Page 36: Relational String Verification Using Multi-track Automata

Case Study - Sanitization

• We patched all actual vulnerabilities by adding sanitization routines
• We ran Stranger a second time
  – Stranger proved that our patches are correct with respect to the attack pattern we are using

Page 37: Relational String Verification Using Multi-track Automata

Related Work: String Analysis

• String analysis based on context free grammars: [Christensen et al., SAS’03] [Minamide, WWW’05]
• String analysis based on symbolic/concolic execution: [Bjorner et al., TACAS’09]
• Bounded string analysis: [Kiezun et al., ISSTA’09]
• Automata based string analysis: [Xiang et al., COMPSAC’07] [Shannon et al., MUTATION’07]
• Application of string analysis to web applications: [Wassermann and Su, PLDI’07, ICSE’08] [Halfond and Orso, ASE’05, ICSE’06]

Page 38: Relational String Verification Using Multi-track Automata

Related Work

• Size Analysis
  – Size analysis: [Hughes et al., POPL’96] [Chin et al., ICSE’05] [Yu et al., FSE’07] [Yang et al., CAV’08]
  – Composite analysis: [Bultan et al., TOSEM’00] [Xu et al., ISSTA’08] [Gulwani et al., POPL’08] [Halbwachs et al., PLDI’08]

• Vulnerability Signature Generation
  – Test input/Attack generation: [Wassermann et al., ISSTA’08] [Kiezun et al., ICSE’09]
  – Vulnerability signature generation: [Brumley et al., S&P’06] [Brumley et al., CSF’07] [Costa et al., SOSP’07]

Page 39: Relational String Verification Using Multi-track Automata

Our Other String Analysis Publications

• Yu et al. Stranger: An Automata-based String Analysis Tool for PHP [TACAS’10]

• Yu et al. Generating Vulnerability Signatures for String Manipulating Programs Using Automata-based Forward and Backward Symbolic Analyses [ASE’09]

• Yu et al. Symbolic String Verification: Combining String Analysis and Size Analysis [TACAS’09]

• Yu et al. Symbolic String Verification: An Automata-based Approach [SPIN’08]

Page 40: Relational String Verification Using Multi-track Automata

Current and Future Work

• Vulnerability signature generation
  – A characterization of all the inputs that might exploit a vulnerability

• Automated sanitization generation
  – Automatically fixing a vulnerability by modifying the input in a minimal way

• Client side string analysis
  – JavaScript

Page 41: Relational String Verification Using Multi-track Automata

THE END