Creating Formally Verified Components for Layered ... · (2012) – Intention to produce a verified LLVM compiler • Magnus Myreen’s (Cambridge) “decompilation into logic”

© 2013 Rockwell Collins All rights reserved.

Creating Formally Verified Components for Layered Assurance with an LLVM-to-ACL2 Translator Jennifer Davis, David Hardin, Jedidiah McClurg

December 2013


Introduction

• Research objectives:

– Reduce need to trust compiler optimizations by reasoning about post-optimization intermediate representations

– Reason about small fragments of code written in assembly language

• We wish to create a library of formally verified software component models for layered assurance.

• In our current work, the components are in LLVM intermediate form.

• We wish to translate them to a theorem proving language, such as ACL2.

• We built an LLVM-to-ACL2 translator

2


Motivating Work

• Jianzhou Zhao (U Penn) et al. produced several different formalizations of operational semantics for LLVM in Coq. (2012)

– Intention to produce a verified LLVM compiler

• Magnus Myreen’s (Cambridge) “decompilation into logic” work (2009)

– Imperative machine code (PPC, x86, ARM) -> HOL4

– Extracts functional behavior of imperative code

– Assures decompilation process is sound

• Andrew Appel (Princeton) observed that “SSA is functional programming” (1998)

– This inspired us to build a translator from LLVM to ACL2

3


LLVM

• LLVM is the intermediate form for many common compilers, including clang.

• LLVM code generation targets exist for a wide variety of machines

• LLVM is a register-based intermediate in Static Single Assignment (SSA) form (each variable is assigned exactly once).

• An LLVM program consists of a list of entities.

4

– There are eight types including:

• function declarations

• function definitions

• Our software component models are created from code that has been compiled into the LLVM intermediate form


• A Computational Logic for Applicative Common Lisp (ACL2)

• Highly automated theorem proving system

• Functional language with admission criteria

• Executable subset of language

• Rich set of legal identifiers (@foo, %.06, @bar_%.0)

• Side-effect free subset of Lisp, so it inherits Lisp peculiarities

– Function definition: (defun funname (parm1 parm2 parm3) (<body>))

– Function invocation: (funname x y z)

– let binds variables to values within a function body

– Multiway conditionals use the cond form

– Lists are the fundamental data structure

– ACL2 supports integers and rationals

– Lisp predicate names are traditionally given a suffix of “p”

ACL2

5


LLVM-to-ACL2 Translation Toolchain

6

theorem prover


Example—C Source

long sumarr(unsigned int n, long sum, long *array) {

unsigned int j = 0;

for (j = 0; j < n; j++) {

sum += array[j];

}

return sum;

}

• We can produce LLVM from C source as follows:

clang –O4 –S –emit-llvm sumarr.c

7


Example—LLVM

define i64 @sumarr(i32 %n, i64 %sum, i64* nocapture %array) nounwind

uwtable readonly {

%1 = icmp eq i32 %n, 0

br i1 %1, label %._crit_edge, label %.lr.ph

.lr.ph: ; preds = %.lr.ph, %0

%indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]

%.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]

%2 = getelementptr inbounds i64* %array, i64 %indvars.iv

%3 = load i64* %2, align 8, !tbaa !0

%4 = add nsw i64 %3, %.06

%indvars.iv.next = add i64 %indvars.iv, 1

%lftr.wideiv = trunc i64 %indvars.iv.next to i32

%exitcond = icmp eq i32 %lftr.wideiv, %n

br i1 %exitcond, label %._crit_edge, label %.lr.ph

._crit_edge: ; preds = %.lr.ph, %0

%.0.lcssa = phi i64 [ %sum, %0 ], [ %4, %.lr.ph ]

ret i64 %.0.lcssa

}

8

C Source:

long sumarr(unsigned int n, long sum, long *array) {

unsigned int j = 0;

for (j = 0; j < n; j++) {

sum += array[j];}

return sum;}

j

sum


Translation Snippet

• Each block within an LLVM function contains a list of instructions in SSA form with type information.

• Hence we can readily convert a list of instructions into an appropriate let construct.


%3 = load i64* %2, align 8, !tbaa !0

%4 = add nsw i64 %3, %.06

(let ((%2 (getelementptr %array %indvars.iv 8)))

(let ((%3 (load-i64l %2 st)))

(let ((%4 (ifix (+ %3 %.06))))

...)))

9


Main Translator Algorithm

10

Translator

Get Function Names

Remove Aliases

Promote Blocks to Functions

Translate to ACL2

LLVM AST ACL2 Code


Remove Aliases

• Aliases allow new names (e.g., @foo1 and @foo2) to be used

for globals and functions

@bar = global i32 123

@foo1 = alias i32* @bar

@foo2 = alias i32* @bar

• We eliminate these

11



12

Translator

Get Function Names

Remove Aliases


Translate to ACL2

LLVM AST ACL2 Code


• As we have seen, LLVM functions often contain inner blocks and branch instructions

• Each of these blocks is pulled out as a new function.

• For each block, the phi instructions denote variables that

become parameters for that new function.

%.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]

• The phi instructions also tell us the parameter values that must

be used at the new function’s call site(s)


13


Dealing with Order of Declarations

• ACL2 requires functions and constants to be defined before they are used

• We do a topological sort on each of the call/dependency graphs

14



15

Translator

Get Function Names

Remove Aliases


Translate to ACL2

LLVM AST ACL2 Code


• Function declaration ACL2 function stub

• Function definition with instruction list ACL2 defun construct with a nested let-bound expression

• Memory ACL2 single-threaded object (stobj) for efficient

execution

• Floating-point number corresponding rational number

1.234

(+ 1 (/2 10) (/3 100) (/4 1000))

Translate to ACL2

16


Example—ACL2 .lr.ph: ; preds = %.lr.ph, %0

%indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]

%.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]


%3 = load i64* %2, align 8, !tbaa !0

%4 = add nsw i64 %3, %.06

%indvars.iv.next = add i64 %indvars.iv, 1

%lftr.wideiv = trunc i64 %indvars.iv.next to i32

%exitcond = icmp eq i32 %lftr.wideiv, %n

br i1 %exitcond, label %._crit_edge, label %.lr.ph

(defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st)

(declare (xargs :stobjs st

:guard (and (integerp %.06) (natp %indvars.iv) (natp %n)

(natp %array))))


(let ((%3 (load-i64l %2 st)))

(let ((%4 (ifix (+ %3 %.06))))

(let ((%indvars.iv.next (nfix (+ %indvars.iv 1))))

(let ((%exitcond (if (= %indvars.iv.next %n) 1 0)))

(if (= %exitcond 1) %4

(@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st))))))))

17


(defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st)

(declare (xargs :measure (nfix (- (nfix %n) (nfix %indvars.iv)))

:stobjs st

:guard (and (integerp %.06) (natp %indvars.iv) (natp %n)

(natp %array) (< %indvars.iv %n))))

(if (not (and (mbt (integerp %.06)) (mbt (natp %indvars.iv)) (mbt (natp %n))

(mbt (natp %array)) (mbt (< %indvars.iv %n)))) %.06


(let ((%3 (load-i64l %2 st)))

(let ((%4 (ifix (+ %3 %.06))))

(let ((%indvars.iv.next (nfix (+ %indvars.iv 1))))

(let ((%exitcond (if (= %indvars.iv.next %n) 1 0)))

(if (= %exitcond 1) %4

(@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st)))))))))

Example—ACL2

18


Tail Recursion

• Note that the translated function for the LLVM loop becomes a tail-recursive function (uses an accumulator) in ACL2.

• Tail-recursive functions are nice for execution, since an arbitrary number of recursive tail calls can be made without exhausting the stack.

• However, tail-recursive functions are not convenient for reasoning because they pollute the induction scheme.

• We can generate non-tail-recursive functions operating over simple lists from tail-recursive, stobj-based functions. This technique is called Hardin’s Bridge*.

19

*in memory of Scott Hardin, a civil engineer who designed several physical bridges,

and a man who valued rigor. He was the father of one of the authors.


20

Hardin's Bridge

for(k=0; k< *SZ*; k++) {

res = op(d[k], res);

}

Form: Imperative and

operating over an array

defiteration

(x-tail k res st)

(x-iter j res st)

Form: Tail recursive with

mutable state

Form: Non-tail-recursive

with mutable state

(defun x (res d)

…

(if (endp d) res

(op (car d)

(x res (cdr d)))) Form: Non-tail-recursive

and operating over a list

1 2

1 2

3

3

defiteration


Applying Hardin’s Bridge

We use the bridge technique to prove that @sumarr_%.lr.ph is

equal to the following non-tail-recursive function:

(defun sumlist64 (res lst)

(declare (xargs :measure (len lst)))

(cond ((not (true-listp lst)) (ifix res))

((endp lst) (ifix res))

(t (+ (ifix (load-i64ll (take 8 lst)))

(sumlist64 res (nthcdr 8 lst))))))

Proving properties about @sumarr_%.lr.ph can then be accomplished by proving them instead about sumlist64, a non-

tail-recursive function better suited for theorem proving

21


Current State of the Translator

• We use the def::ung macro to automatically define the

domain of recursive functions.

– This allows recursive functions to be admitted in ACL2 without manually adding measures.

• We added support for modular arithmetic (e.g., fixed width addition).

• We have rerun the sumarr example with the current translator.

– No editing of the translated code was needed.

– Recursive function admitted automatically via def::ung

– Fixed width addition is preserved in the non-tail-recursive spec.

22


Limitations of the Translator

• Exceptions

• Indirect call instructions

23


Future Work

• Stack analysis and data structure analysis

• LLVM DataLayout directives (global endianness, alignment/padding specification)

• LLVM intrinsic functions (there are a large number of these).

• Variable-length argument lists

• Attempt to eliminate cycles in the call graph by code rewrites when possible (rather than just blindly emitting mutual-recursion).

24


Conclusion

• Built an LLVM-to-ACL2 translator

– Produced an executable ACL2 specification

• Tail recursion

• Efficient execution with in-place updates via ACL2’s stobj mechanism

– Demonstrated that the translation produces working ACL2 code for a recursive example program

• Presented technique for reasoning about tail-recursive ACL2 functions that execute in-place

– Utilizes formally proven Hardin’s bridge to non-tail-recursive versions operating on lists

• Tested examples with global variables, pointers, and string constants.

25

Documents

Creating Formally Verified Components for Layered ... · (2012) – Intention to produce a verified LLVM compiler • Magnus Myreen’s (Cambridge) “decompilation into logic”