Weighted Pushdown Systems and their Application to Interprocedural Dataflow Analysis Thomas Reps...

Preview:

Citation preview

Weighted Pushdown Systemsand their Application to

Interprocedural Dataflow Analysis

Thomas RepsUniversity of Wisconsin

GrammaTech, Inc.

Joint work with S. Schwoon, S. Jha, and D. Melski

Weighted Pushdown Systemsand their Application to

Interprocedural Dataflow AnalysisInterprocedural Dataflow AnalysisApplication

Weighted Pushdown Systems

Dataflow Analysis

Pushdown Systems

Intraprocedural Analysis

enter n

V0

MOP(n) = pfp(V0) pPathsTo[n]

pfp = fk fk-1 … f2 f1

f1 f2 fk-1 fk

x = 3

p(x,y)

return from p

printf(y)

start main

exit main

if . . .

b = a

printf(b)

(

)

p(a,b)

return from p

exit p

start p(a,b)

]

(

Context-Sensitive Interprocedural Analysis

start n

V0 ret

( )

f1 f2 fk-1 fk

f3

f4

f5

fk-2

fk-3

callq

enterq exitq

MOVP(n) = pfp(V0) pMatchedPathsTo[n]

void p() { if (...) { x = x + 1; p(); // p_calls_p1 x = x - 1; } if (...) { x = x - 1; p(); // p_calls_p2 x = x + 1; } return;}

An Expanded Set of Queries

int x;

void main() { x = 5; p(); //main_calls_p return;}

5

<x, enterp p_calls_p2 p_calls_p1 main_calls_p>

An Expanded Set of Queries

main_calls_p

enterp

p_calls_p1

p_calls_p2

x = 5

x = 5

x = x + 1

x = x - 1

x = 5

x = x + 1

x = x - 1p_calls_p1

main_calls_p

enterp

p_calls_p1

p_calls_p2

x = 5

+1

-1p_calls_p1

+1-1

p_calls_p1+1

-1

p_cal

ls_p

2+1-1

An Expanded Set of Queries

int x;

void main() { x = 5; p(); //main_calls_p return;}

void p() { if (...) { x = x + 1; p(); // p_calls_p1 x = x - 1; } if (...) { x = x - 1; p(); // p_calls_p2 x = x + 1; } return;}

5

<x, enterp (p_calls_p2 p_calls_p1)* main_calls_p>

An Expanded Set of Queries

int x;

void main() { x = 5; p(); //main_calls_p return;}

void p() { if (...) { x = x + 1; p(); // p_calls_p1 x = x - 1; } if (...) { x = x - 1; p(); // p_calls_p2 x = x + 1; } return;}

5 4 =

<x, enterp (p_calls_p2 + p_calls_p1)* main_calls_p>

5 4

An Expanded Set of Queries

int x;

void main() { x = 5; p(); //main_calls_p return;}

void p() { if (...) { x = x + 1; p(); // p_calls_p1 x = x - 1; } if (...) { x = x - 1; p(); // p_calls_p2 x = x + 1; } return;}

5 4 =

<x, enterp Σ*>

5 4

MOVP = any stack configuration

An Expanded Set of Queries

L1 = <x, enterp p_calls_p2 p_calls_p1 main_calls_p>

L2 = <x, enterp (p_calls_p2 p_calls_p1)* main_calls_p>

L3 = <x, enterp (p_calls_p2 + p_calls_p1)* main_calls_p>

L4 = <x, enterp Σ*>

MOVP’(L) = pfp(V0) c L, p MatchedPathsTo[c]

MOVP(n) = pfp(V0) pMatchedPathsTo[n]

MOVP’(L3) = MOVP’(L4) = MOVP(enterp)

So What? Who Cares? [Yawn]

• Virtual inline expansion– Value for x in configurations with an even # of calls to p: MOVP’(<x, n (p_calls_p p_calls_p)* main_calls_p>)– Value for x in configurations with an odd # of calls to p : MOVP’(<x,n p_calls_p (p_calls_p p_calls_p)*

main_calls_p>)

• Stack-constrained queries– at breakpoint at n, fetch stack from debugger (say S)– stack-constrained slicing:

“What are the program elements that could have affected the values used at n, given that we have reached n with stack S?”

So What? Who Cares? [Yawn]

• Software model checking: Check properties by model checking a CFG encoded as a PDS– SLAM [Ball & Rajamani]

– MOPS [Chen & Wagner]

– “Meta-Compliation” [Engler et al.]

• PDS WPDS– GrammaTech software model checker implemented

on top of the WPDS++ library

So What? Who Cares? [Yawn]

• Convenient framework for implementing interprocedural dataflow analyses– Create weighted PDS from interprocedural CFG

– Either exhaustive or demand-driven dataflow analysis

– Used to solve subproblem needed for recovering the organization of stack-frames in x86 executables [Balakrishnan & Reps CC 04]

d eq:

Unrolled Program = Transition System

b g

a

c h

j

f i

p:

d eq:

a

b d

f

c e

p: a

b d

f

c e

p: a

b d

f

c e

p: a

b d

f

c e

p: a

b d

f

c e

p: a

b d

f

c e

p: a

b d

f

c e

p:

Unrolled Program = ∞Transition System

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

Pushdown System (PDS)

States: { σ1, σ2, σ3, σ4 }

Stack symbols: { A, B, C, D }

Transition rules: <σ1, A> <σ2, > <σ1, A> <σ2, B>

<σ1, A> <σ2, B C>

Pushdown System (PDS)

States: { σ1, σ2, σ3, σ4 }

Stack symbols: { A, B, C, D }

Transition rules: <σ1, A> <σ2, > <σ1, A> <σ2, B>

<σ1, A> <σ2, B C>

If the state is σ1 and thetop of the stack is A, then pop A transition to state σ2

Pushdown System (PDS)

States: { σ1, σ2, σ3, σ4 }

Stack symbols: { A, B, C, D }

Transition rules: <σ1, A> <σ2, > <σ1, A> <σ2, B>

<σ1, A> <σ2, B C>

If the state is σ1 and thetop of the stack is A, then pop A transition to state σ2

push B

Pushdown System (PDS)

States: { σ1, σ2, σ3, σ4 }

Stack symbols: { A, B, C, D }

Transition rules: <σ1, A> <σ2, > <σ1, A> <σ2, B>

<σ1, A> <σ2, B C>

If the state is σ1 and thetop of the stack is A, then pop A transition to state σ2

push C; then push B

Rules Define a Transition Relation

<σ,A> <σ’,ε>

<σ,A> <σ’,B>

<σ,A> <σ’,B C>

σ A

σ’

B

σ’σ A

B

C

σ’σ A

Pushdown System (PDS)

• PDS = Pushdown automaton without an input tape

• Mechanism for defining a class of infinite-state transition systems

<σ, A> <σ, A A>

<σ,A>

<σ,AA>

<σ,AAA>

׃

<σ,AAAA>

Supergraph as a PDS

d e

b g

a

c h

j

f i

p:

q:<σ, a> <σ, b>

Supergraph as a PDS

d e

b g

a

c h

j

f i

p:

q:<σ, b> <σ, c>

Supergraph as a PDS

d e

b g

a

c h

j

f i

p:

q:<σ, c> <σ, d f>

save return siteon stack

Supergraph as a PDS

d e

b g

a

c h

j

f i

p:

q:<σ, d> <σ, e>

Supergraph as a PDS

d e

b g

a

c h

j

f i

p:

q:<σ, e> <σ, ε>

uncovers mostrecent call site

Supergraph as a PDS

d e

b g

a

c h

j

f i

p:

q:<σ, f> <σ, g>

Supergraph as a PDS

d e

b g

a

c h

j

f i

p:

q:<σ, g> <σ, h>

Supergraph as a PDS

d e

b g

a

c h

j

f i

p:

q:<σ, h> <σ, d i>

save return siteon stack

a

b d

f

c e

p:

a

b d

f

c e

p:

a

b d

f

c e

p:

a

b d

f

c e

p:a

b d

f

c e

p:a

b d

f

c e

p:a

b d

f

c e

p:

Unrolled Program = ∞ Transition System

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

<σ, f e c>

<σ, b c c>

PDS Terminology

Configuration <σ, f e c>

c c’ (transition relation) c’ follows from c by a transition rule c predecessor of c’ c’ successor of cc0 c1 . . . cn (a run)

c * c’ reflexive transitive closure of

σ,fec

a

b d

f

c e

p:

a

b d

f

c e

p:

a

b d

f

c e

p:

a

b d

f

c e

p:a

b d

f

c e

p:a

b d

f

c e

p:a

b d

f

c e

p:

A Run

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

<σ,a> <σ,b> <σ,ac> <σ,bc> <σ,acc> <σ,fcc> <σ,cc> <σ,dc> <σ,aec> <σ,fec>

a

b d

f

c e

p:

a

b d

f

c e

p:

a

b d

f

c e

p:

a

b d

f

c e

p:a

b d

f

c e

p:a

b d

f

c e

p:a

b d

f

c e

p:

A Run

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

a

b d

f

c e

p:

a

b d

f

c e

p:

a

b d

f

c e

p:

a

b d

f

c e

p:a

b d

f

c e

p:a

b d

f

c e

p:a

b d

f

c e

p:

A Run

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

Representing Distributive Functions[POPL 95]

Identity Function

Constant Function

a b c

a b c

f({a,b}) = {a,b}

f = λV.V

f({a,b}) = {b}

f = λV.{b}

“Gen/Kill” Function

Non-“Gen/Kill” Function a b c

a b c

Representing Distributive Functions[POPL 95]

f({a,b}) = {a,c}

f({a,b}) = {a,b}

f = λV.(V {b}) {c}

f = λV. if aV then V {b} else V {b}

x = 3

p(x,y)

return from p

printf(y)

start main

exit main

start p(a,b)

if . . .

b = a

p(a,b)

return from p

printf(b)

exit p

x y a b

, [start] , [x = 3] , [start] x, [x = 3], [start] y, [x = 3]

, [start] , [x = 3] , [start] x, [x = 3], [start] y, [x = 3], [x = 3] , [p(x,y)] y, [x = 3] y, [p(x,y)], [x = 3] , [p(x,y)] y, [x = 3] y, [p(x,y)]

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

M

pre*(M)

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

M

post*(M)

• The set of configurations pre*(S) can be infinite

• Example– <σ,A> <σ>– pre* ( {<σ,A>}) = { σ Ai | i ≥ 1 }

• Solution in the PDS literature: Represent a set of configurations with an automaton

Representation Issue

<σ,A>

<σ,AA>

<σ, >

<σ,AAA>

...

From M to Pre*(M)

<σ,A> <σ1,A1 . . . Am>

σ

A

A1 . . . Amσ1

σ A

σ1

A1

Am

...

Observation

• For IFDS problems (Reps, Horwitz, & Sagiv [POPL 95]), PDS literature provides solution to MOVP’ problem– Bouajjani, Esparza, & Maler [Concur 97]– Esparza et al. [CAV 00]

• But . . . some problems are not IFDS– linear constants [Sagiv, Reps, & Horwitz 96]

– affine relations [Müller-Olm & Seidl 03]

Interprocedural Dataflow Analysis

Application

Weighted Pushdown Systems

Dataflow Analysis

Pushdown Systems

Weighted Pushdown System (WPDS)

States: { σ1, σ2, σ3, σ4 }

Stack symbols: { A, B, C, D }

Transition rules: <σ1, A> <σ2, > <σ1, A> <σ2, B>

<σ1, A> <σ2, B C>

w1

w2

w3

Idempotent Semiring (D, , , 0, 1)

[= Meet Semilattice (D, , ..., , ...)]

a 0 = aa b = b aa (b c) = (a b) ca a = aa 1 = aa (b c) = (a b) ca (b c) = (a b) (a c)(a b) c = (a c) (b c)a 0 = 0 a = 0

a b iff a b = a = = R

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

Mroot

post*(Mroot)

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

n

post*(Mroot)

n

W

W

׃ ׃ ׃ ׃׃ ׃ ׃ ׃

n

post*(Mroot)

n

0

From M to Pre*(M)

σ

<σ,A> <σ1,A1 . . . Am>w

V (w X)A

A1 . . . Amσ1

X

wσ1

A1

Am

......

..

.X

σk

V

σ A

σk w X

void p() { if (...) { x = x + 1; p(); // p_calls_p1 x = x - 1; } if (...) { x = x - 1; p(); // p_calls_p2 x = x + 1; } return;}

An Expanded Set of Queries

int x;

void main() { x = 5; p(); //main_calls_p return;}

Demo

An Application• Analysis of x86 code

– no use of debugging information– Subgoal: discover affine relations on registers

int main(){int i,j, a[10];j=0;for(i=0;i<10;++i){

a[i]=i;}return a[2];

}

Difficulties with Object Code

; ebx corresponds to variable isub esp, 44 mov [esp+40],0 ; j = 0 xor ebx, ebx ; i = 0 lea ecx, [esp] loc_9:

mov [ecx], ebx ; a[i]=iinc ebx ; i++add ecx, 4 cmp ebx, 10 ; i<10?jl short loc_9 ;

mov eax, [esp+8] ;return a[2]

add esp, 44

retn

Identifying the layout of stack-frames

?2

1

1

2

Rest of stack0ffffh

int j (4 bytes)

int a[10]

(40 bytes)………

0h

ebp

esp + 40

; ebx corresponds to variable isub esp, 44 mov [esp+40],0 ; j = 0 xor ebx, ebx ; i = 0 lea ecx, [esp] loc_9:

mov [ecx], ebx ; a[i]=iinc ebx ; i++add ecx, 4 cmp ebx, 10 ; i<10?jl short loc_9 ;

mov eax, [esp+8] ;return a[2]

add esp, 44

retn

esp

Problems with Widening

esp + 4

esp + 36esp + 8

ecx = esp + 4*ebx + 4

Rest of stack0ffffh

int j (4 bytes)

int a[10]

(40 bytes)………

0h

ebp

esp + 40

; ebx corresponds to variable isub esp, 44 mov [esp+40],0 ; j = 0 xor ebx, ebx ; i = 0 lea ecx, [esp] loc_9:

mov [ecx], ebx ; a[i]=iinc ebx ; i++add ecx, 4 cmp ebx, 10 ; i<10?jl short loc_9 ;

mov eax, [esp+8] ;return a[2]

add esp, 44

retn

Problems with Widening

ecx [esp, esp + 36] ?

Widening: ecx [esp, ]

Limited widening: ecx [esp, esp + 36]

esp

Affine-Relation Analysis

• Affine-relation– r1, r2, …, rn: variables, a0,a1, …, an: integer

constants– a0 +i=1..n(airi) = 0

• Interprocedural affine-relation analysis [Müller-Olm & Seidl 04]– Flow-sensitive and context-sensitive

problem– Algorithm: Solve a constraint system

• Our application:– r1, r2, …, r8: x86 registers– Constraint system WPDS

Performance

• Running time: linear in program size• Constant of proportionality: k8

– Only 8 registers operations on 9 x 9matrices

Program nInsts nProcs Time (s)

print 75539 697 .77

finger 96123 893 7.13

winhlp32 157634 6491 17.32

regsvr32 225857 9625 37.15

cmd 230481 2317 52.38

notepad 239408 2911 41.85

Contributions

• Algorithms for the “generalized pushdown reachability problem”:

MOVP’(L) = pfp(V0) c L, p MatchedPathsTo[c]

– Running time: O(|Q|2 x |PDS| x H)– [Sound solutions for non-distributive dataflow

problems]– Differential propagation algorithms, too

• Construction of witness trees (optional)

• Program analysis– T. Reps, S. Schwoon, and S. Jha, Weighted

pushdown systems and their application to interprocedural dataflow analysis, SAS 03

• Authorization problems– S. Jha and T. Reps, Analysis of SPKI/SDSI

certificates using model checking, CSFW 02– S. Schwoon, S. Jha, T. Reps, and S. Stubblebine,

On generalized authorization problems, CSFW 03– S. Jha and T. Reps, Model checking SPKI/SDSI. To

appear in J. Comp. Security

Second Topic

Authorization Problems• Traditionally, authorization restrictions are

specified using access control lists (ACLs)– Associate permissions with objects– E.g., AFS permissions for directory D:

reps rlidwkajha rlidwkreps:students rl

• SPKI/SDSI– Local name spaces

reps studentreps student spouse

– Delegation

SPKI/SDSI

Principals (Public Keys) KBob, KAlice Individuals KCS CS Department KOwner[R] Owner of resource R

Local Names KCS faculty KBob myStudents

Extended Names KBob myStudents Spouse

Name Certs

Bob is a CS faculty member KCS faculty KBob

Alice is a student of Bob’s KBob myStudents KAlice

Alice’s friends . . . KAlice myFriends KJoe

KAlice myFriends KMary enemies KAlice myFriends KMary enemies spouse

Auth Certs

A CS faculty member can use host H KOwner[H] KCS faculty

Bob allows access to his students KBob KBob myStudents

Can delegate

Cannot delegate

Alice allows access to her friends KAlice KAlice myFriends

Certificate ChainKOwner[H]

KBob

KCS faculty

KAlice

KBob myStudents

KOwner[H] KCS faculty

KCS faculty KBob

KBob KBob myStudents

KBob myStudents KAlice

KAlice KAlice myFriends

Does not apply!

A Certificate Chain is a Run

<KBob, >

<KOwner[H],>

<KCS, faculty >

<KAlice, >

<KBob, myStudents >

<KOwner[H], > <KCS,faculty >

<KBob, > <KBob, myStudents >

<KBob, myStudents> <KAlice, >

<KCS, faculty> <KBob, >

Pre*(S)

Basic Authorization Query:<KOwner[H],> Pre*({<KAlice,□>,

<KAlice,■>})?

S = {<KAlice, >, <KAlice, >}<KOwner[H],>

{<KAlice, >,<KAlice, >}

KCSKOwner[H] KAliceKBob

{ , }

What Does the Automaton Represent?

• A set of configurations:<K, a1 … am > is in the set if there is a path

• Initial automaton represents {<KAlice, >,<KAlice, >}

KK . . .a1 a2

am

{ , }

KCSKOwner[H] KAlice

KBob

From M to Pre*(M)

<σ,A> <σ1,A1 . . . Am>

A1 . . . Amσ1

σ A

σ1

A1

Am

...

σ

A

Pre*({<KAlice, >, <KAlice, >})

KCSKOwner[H] KAliceKBob

{ , }

faculty myStudents

<KCS,faculty > <KBob, >

<KBob, myStudents > <KAlice, >

Pre*({<KAlice, >, <KAlice, >})

KCSKOwner[H] KAliceKBob

{ , }

faculty myStudents

<KBob, > <KBob, myStudents ■>

<KOwner[H], > <KCS, faculty >

Pre*({<KAlice, >, <KAlice, >})

KCSKOwner[H] KAliceKBob

{ , }

faculty myStudents

<KOwner[H], > Pre*({<KAlice, □>, <KAlice, ■>})

Other Certificate-Set-Analysis Problems

• Shared access?– Given two resources R1 and R2, what principals can

access both R1 and R2?

• Expiration vulnerability?– What resources will principal K be prevented from

accessing if certificate set C ’ expires?

• Many more . . .

• Main message– Several certificate-set-analysis problems can be

solved by model checking a PDS

Weighted Pushdown System (WPDS)

States: { σ1, σ2, σ3, σ4 }

Stack symbols: { A, B, C, D }

Transition rules: <σ1, A> <σ2, > <σ1, A> <σ2, B>

<σ1, A> <σ2, B C>

w1

w2

w3

<KInsurer, □> <KH, patient ■> <KH, patient> <KAIDS, patient> <KH, patient> <KIM, patient> <KAIDS, patient> <KAlice, > <KIM, patient> <KAlice, >

Privacy using a Weighted PDS

S

I

ISISI

Privacy using a Weighted PDS

<KInsurer, □>

<KH, patient ■>

I

<KIM, patient ■>

I

<KAlice, ■>

I

<KH, patient ■>

I

<KAIDS, patient ■>

S

SS I = I

S I

I S S = S I I I = I

What to Take Away . . .

• Observation: Can perform interprocedural dataflow analysis using WPDSs– supports a broader set of dataflow-analysis

queries than past work (30 years worth . . .)

• WPDSs have another application– Certificate-set-analysis problems

• Libraries for WPDSs– WPDS Library: C [Schwoon, Reps, & Jha]– WPDS++ (soon): C++ [Kidd & Reps]

Related Work• Pushdown systems

– Bouajjani, Esparza, & Maler [Concur 97]– Esparza et al. [CAV 00]– Bouajjani, Esparza, & Touili [POPL 03]

• Dataflow analysis– Sharir & Pnueli 81– IDE framework: Sagiv, Reps, & Horwitz [TCS 96]

• Weighted-hypergraph problems– Knuth [IPL 77]– Grammar flow analysis: Möncke & Wilhelm [WAGA

91]– Ramalingam thesis [LNCS #1089]– Ramalingam & Reps [J. Alg 96]

Recommended