Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
© 2013 Rockwell Collins All rights reserved.
Creating Formally Verified Components for Layered Assurance with an LLVM-to-ACL2 Translator Jennifer Davis, David Hardin, Jedidiah McClurg
December 2013
© 2013 Rockwell Collins All rights reserved.
Introduction
• Research objectives:
– Reduce need to trust compiler optimizations by reasoning about post-optimization intermediate representations
– Reason about small fragments of code written in assembly language
• We wish to create a library of formally verified software component models for layered assurance.
• In our current work, the components are in LLVM intermediate form.
• We wish to translate them to a theorem proving language, such as ACL2.
• We built an LLVM-to-ACL2 translator
2
© 2013 Rockwell Collins All rights reserved.
Motivating Work
• Jianzhou Zhao (U Penn) et al. produced several different formalizations of operational semantics for LLVM in Coq. (2012)
– Intention to produce a verified LLVM compiler
• Magnus Myreen’s (Cambridge) “decompilation into logic” work (2009)
– Imperative machine code (PPC, x86, ARM) -> HOL4
– Extracts functional behavior of imperative code
– Assures decompilation process is sound
• Andrew Appel (Princeton) observed that “SSA is functional programming” (1998)
– This inspired us to build a translator from LLVM to ACL2
3
© 2013 Rockwell Collins All rights reserved.
LLVM
• LLVM is the intermediate form for many common compilers, including clang.
• LLVM code generation targets exist for a wide variety of machines
• LLVM is a register-based intermediate in Static Single Assignment (SSA) form (each variable is assigned exactly once).
• An LLVM program consists of a list of entities.
4
– There are eight types including:
• function declarations
• function definitions
• Our software component models are created from code that has been compiled into the LLVM intermediate form
© 2013 Rockwell Collins All rights reserved.
• A Computational Logic for Applicative Common Lisp (ACL2)
• Highly automated theorem proving system
• Functional language with admission criteria
• Executable subset of language
• Rich set of legal identifiers (@foo, %.06, @bar_%.0)
• Side-effect free subset of Lisp, so it inherits Lisp peculiarities
– Function definition: (defun funname (parm1 parm2 parm3) (<body>))
– Function invocation: (funname x y z)
– let binds variables to values within a function body
– Multiway conditionals use the cond form
– Lists are the fundamental data structure
– ACL2 supports integers and rationals
– Lisp predicate names are traditionally given a suffix of “p”
ACL2
5
© 2013 Rockwell Collins All rights reserved.
LLVM-to-ACL2 Translation Toolchain
6
theorem prover
© 2013 Rockwell Collins All rights reserved.
Example—C Source
long sumarr(unsigned int n, long sum, long *array) {
unsigned int j = 0;
for (j = 0; j < n; j++) {
sum += array[j];
}
return sum;
}
• We can produce LLVM from C source as follows:
clang –O4 –S –emit-llvm sumarr.c
7
© 2013 Rockwell Collins All rights reserved.
Example—LLVM
define i64 @sumarr(i32 %n, i64 %sum, i64* nocapture %array) nounwind
uwtable readonly {
%1 = icmp eq i32 %n, 0
br i1 %1, label %._crit_edge, label %.lr.ph
.lr.ph: ; preds = %.lr.ph, %0
%indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
%.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]
%2 = getelementptr inbounds i64* %array, i64 %indvars.iv
%3 = load i64* %2, align 8, !tbaa !0
%4 = add nsw i64 %3, %.06
%indvars.iv.next = add i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, %n
br i1 %exitcond, label %._crit_edge, label %.lr.ph
._crit_edge: ; preds = %.lr.ph, %0
%.0.lcssa = phi i64 [ %sum, %0 ], [ %4, %.lr.ph ]
ret i64 %.0.lcssa
}
8
C Source:
long sumarr(unsigned int n, long sum, long *array) {
unsigned int j = 0;
for (j = 0; j < n; j++) {
sum += array[j];}
return sum;}
j
sum
© 2013 Rockwell Collins All rights reserved.
Translation Snippet
• Each block within an LLVM function contains a list of instructions in SSA form with type information.
• Hence we can readily convert a list of instructions into an appropriate let construct.
%2 = getelementptr inbounds i64* %array, i64 %indvars.iv
%3 = load i64* %2, align 8, !tbaa !0
%4 = add nsw i64 %3, %.06
(let ((%2 (getelementptr %array %indvars.iv 8)))
(let ((%3 (load-i64l %2 st)))
(let ((%4 (ifix (+ %3 %.06))))
...)))
9
© 2013 Rockwell Collins All rights reserved.
Main Translator Algorithm
10
Translator
Get Function Names
Remove Aliases
Promote Blocks to Functions
Translate to ACL2
LLVM AST ACL2 Code
© 2013 Rockwell Collins All rights reserved.
Remove Aliases
• Aliases allow new names (e.g., @foo1 and @foo2) to be used
for globals and functions
@bar = global i32 123
@foo1 = alias i32* @bar
@foo2 = alias i32* @bar
• We eliminate these
11
© 2013 Rockwell Collins All rights reserved.
Main Translator Algorithm
12
Translator
Get Function Names
Remove Aliases
Promote Blocks to Functions
Translate to ACL2
LLVM AST ACL2 Code
© 2013 Rockwell Collins All rights reserved.
• As we have seen, LLVM functions often contain inner blocks and branch instructions
• Each of these blocks is pulled out as a new function.
• For each block, the phi instructions denote variables that
become parameters for that new function.
%.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]
• The phi instructions also tell us the parameter values that must
be used at the new function’s call site(s)
Promote Blocks to Functions
13
© 2013 Rockwell Collins All rights reserved.
Dealing with Order of Declarations
• ACL2 requires functions and constants to be defined before they are used
• We do a topological sort on each of the call/dependency graphs
14
© 2013 Rockwell Collins All rights reserved.
Main Translator Algorithm
15
Translator
Get Function Names
Remove Aliases
Promote Blocks to Functions
Translate to ACL2
LLVM AST ACL2 Code
© 2013 Rockwell Collins All rights reserved.
• Function declaration ACL2 function stub
• Function definition with instruction list ACL2 defun construct with a nested let-bound expression
• Memory ACL2 single-threaded object (stobj) for efficient
execution
• Floating-point number corresponding rational number
1.234
(+ 1 (/2 10) (/3 100) (/4 1000))
Translate to ACL2
16
© 2013 Rockwell Collins All rights reserved.
Example—ACL2 .lr.ph: ; preds = %.lr.ph, %0
%indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
%.06 = phi i64 [ %4, %.lr.ph ], [ %sum, %0 ]
%2 = getelementptr inbounds i64* %array, i64 %indvars.iv
%3 = load i64* %2, align 8, !tbaa !0
%4 = add nsw i64 %3, %.06
%indvars.iv.next = add i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, %n
br i1 %exitcond, label %._crit_edge, label %.lr.ph
(defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st)
(declare (xargs :stobjs st
:guard (and (integerp %.06) (natp %indvars.iv) (natp %n)
(natp %array))))
(let ((%2 (getelementptr %array %indvars.iv 8)))
(let ((%3 (load-i64l %2 st)))
(let ((%4 (ifix (+ %3 %.06))))
(let ((%indvars.iv.next (nfix (+ %indvars.iv 1))))
(let ((%exitcond (if (= %indvars.iv.next %n) 1 0)))
(if (= %exitcond 1) %4
(@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st))))))))
17
© 2013 Rockwell Collins All rights reserved.
(defun @sumarr_%.lr.ph (%.06 %indvars.iv %n %array st)
(declare (xargs :measure (nfix (- (nfix %n) (nfix %indvars.iv)))
:stobjs st
:guard (and (integerp %.06) (natp %indvars.iv) (natp %n)
(natp %array) (< %indvars.iv %n))))
(if (not (and (mbt (integerp %.06)) (mbt (natp %indvars.iv)) (mbt (natp %n))
(mbt (natp %array)) (mbt (< %indvars.iv %n)))) %.06
(let ((%2 (getelementptr %array %indvars.iv 8)))
(let ((%3 (load-i64l %2 st)))
(let ((%4 (ifix (+ %3 %.06))))
(let ((%indvars.iv.next (nfix (+ %indvars.iv 1))))
(let ((%exitcond (if (= %indvars.iv.next %n) 1 0)))
(if (= %exitcond 1) %4
(@sumarr_%.lr.ph %4 %indvars.iv.next %n %array st)))))))))
Example—ACL2
18
© 2013 Rockwell Collins All rights reserved.
Tail Recursion
• Note that the translated function for the LLVM loop becomes a tail-recursive function (uses an accumulator) in ACL2.
• Tail-recursive functions are nice for execution, since an arbitrary number of recursive tail calls can be made without exhausting the stack.
• However, tail-recursive functions are not convenient for reasoning because they pollute the induction scheme.
• We can generate non-tail-recursive functions operating over simple lists from tail-recursive, stobj-based functions. This technique is called Hardin’s Bridge*.
19
*in memory of Scott Hardin, a civil engineer who designed several physical bridges,
and a man who valued rigor. He was the father of one of the authors.
© 2013 Rockwell Collins All rights reserved.
20
Hardin's Bridge
for(k=0; k< *SZ*; k++) {
res = op(d[k], res);
}
Form: Imperative and
operating over an array
defiteration
(x-tail k res st)
(x-iter j res st)
Form: Tail recursive with
mutable state
Form: Non-tail-recursive
with mutable state
(defun x (res d)
…
(if (endp d) res
(op (car d)
(x res (cdr d)))) Form: Non-tail-recursive
and operating over a list
1 2
1 2
3
3
defiteration
© 2013 Rockwell Collins All rights reserved.
Applying Hardin’s Bridge
We use the bridge technique to prove that @sumarr_%.lr.ph is
equal to the following non-tail-recursive function:
(defun sumlist64 (res lst)
(declare (xargs :measure (len lst)))
(cond ((not (true-listp lst)) (ifix res))
((endp lst) (ifix res))
(t (+ (ifix (load-i64ll (take 8 lst)))
(sumlist64 res (nthcdr 8 lst))))))
Proving properties about @sumarr_%.lr.ph can then be accomplished by proving them instead about sumlist64, a non-
tail-recursive function better suited for theorem proving
21
© 2013 Rockwell Collins All rights reserved.
Current State of the Translator
• We use the def::ung macro to automatically define the
domain of recursive functions.
– This allows recursive functions to be admitted in ACL2 without manually adding measures.
• We added support for modular arithmetic (e.g., fixed width addition).
• We have rerun the sumarr example with the current translator.
– No editing of the translated code was needed.
– Recursive function admitted automatically via def::ung
– Fixed width addition is preserved in the non-tail-recursive spec.
22
© 2013 Rockwell Collins All rights reserved.
Limitations of the Translator
• Exceptions
• Indirect call instructions
23
© 2013 Rockwell Collins All rights reserved.
Future Work
• Stack analysis and data structure analysis
• LLVM DataLayout directives (global endianness, alignment/padding specification)
• LLVM intrinsic functions (there are a large number of these).
• Variable-length argument lists
• Attempt to eliminate cycles in the call graph by code rewrites when possible (rather than just blindly emitting mutual-recursion).
24
© 2013 Rockwell Collins All rights reserved.
Conclusion
• Built an LLVM-to-ACL2 translator
– Produced an executable ACL2 specification
• Tail recursion
• Efficient execution with in-place updates via ACL2’s stobj mechanism
– Demonstrated that the translation produces working ACL2 code for a recursive example program
• Presented technique for reasoning about tail-recursive ACL2 functions that execute in-place
– Utilizes formally proven Hardin’s bridge to non-tail-recursive versions operating on lists
• Tested examples with global variables, pointers, and string constants.
25