
CIL: Infrastructure for C Program Analysis and Transformation

George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer

http://www.cs.berkeley.edu/~necula/cil

ETAPS – CC ’02 Friday, April 12


What is CIL?

Distills the C language into a few key forms with precise semantics

- Parser + IR + program merger for C
- Maintains types, close ties to source
- Highly structured, clean subset of C
- Handles ANSI/GCC/MSVC


Why CIL?

Analyses and transformations

- Easy to use: impersonates the compiler and linker, so a project builds with
    $ make project CC=cil
- Easy to work with: converts away tricky syntax, leaves just the heart of the language, and separates concepts


C Feature Separation

CIL separates language components:

- pure expressions
- statements with side effects
- control flow, with an embedded CFG

It also:

- keeps all programmer names
- introduces temporaries to serialize side effects
- simplifies scoping
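To make the separation of side effects concrete, here is a hand-written sketch of how a loop whose condition hides side effects might be lowered. It approximates the style only and is not actual CIL output: the loop becomes `while (1)` with an explicit `break`, and each increment becomes its own assignment through a temporary. The names `copy_orig`, `copy_lowered`, and `lowering_agrees` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Original C: the copy and both increments hide inside the loop condition. */
static void copy_orig(char *d, const char *s) {
    while ((*d++ = *s++))
        ;
}

/* Hand-written CIL-style lowering (an approximation, not real CIL output):
   an infinite loop with an explicit break, one side effect per statement,
   and temporaries holding the pre-increment pointers. */
static void copy_lowered(char *d, const char *s) {
    char *tmp_d;
    const char *tmp_s;
    char c;
    while (1) {
        tmp_d = d;
        d = d + 1;
        tmp_s = s;
        s = s + 1;
        c = *tmp_s;
        *tmp_d = c;
        if (!c)
            break;
    }
}

/* Runs both versions on s and checks that they produce the same copy. */
static int lowering_agrees(const char *s) {
    char a[64], b[64];
    assert(strlen(s) < sizeof a);
    copy_orig(a, s);
    copy_lowered(b, s);
    return strcmp(a, b) == 0 && strcmp(a, s) == 0;
}
```

The lowered form is longer, but every statement now does exactly one thing, which is what an analysis wants to see.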


Example: C Lvalues

An expression referring to a region of storage.

Example: rec[1].fld[2] may involve 1, 2, or 3 memory accesses:

- 1 if rec and fld are both arrays
- 2 if either one is a pointer
- 3 if rec and fld are both pointers

Syntax (AST) is insufficient
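The sketch below shows why the syntax alone does not settle the count: all four hypothetical functions read an lvalue of the same shape, `X[1].fld[2]`, and only the declarations differ. The access counts in the comments describe straightforward, unoptimized code; an optimizing compiler may fold or cache loads.

```c
#include <assert.h>

/* Access counts below assume straightforward, unoptimized code. */

/* rec and fld both arrays: 1 access (the int itself). */
struct SA { int fld[3]; };
static struct SA rec_aa[2] = { { {0, 0, 0} }, { {0, 0, 7} } };

/* rec a pointer, fld an array: 2 accesses (load rec, then the int). */
static struct SA *rec_pa = rec_aa;

/* rec an array, fld a pointer: 2 accesses (load fld, then the int). */
struct SP { int *fld; };
static int store[3] = { 0, 0, 9 };
static struct SP rec_ap[2] = { { store }, { store } };

/* rec and fld both pointers: 3 accesses (load rec, load fld, then the int). */
static struct SP *rec_pp = rec_ap;

/* Syntactically the same lvalue shape, X[1].fld[2], in every function. */
static int read_aa(void) { return rec_aa[1].fld[2]; }
static int read_pa(void) { return rec_pa[1].fld[2]; }
static int read_ap(void) { return rec_ap[1].fld[2]; }
static int read_pp(void) { return rec_pp[1].fld[2]; }
```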


CIL Lvalues

An expression referring to a region of storage:

    lval   ::= <base, offset>
    base   ::= Var(varinfo) | Mem(exp)
    offset ::= None | Field(f, offset) | Index(exp, offset)


CIL Lvalues

Example: rec[1].fld[2] becomes either (rec and fld both arrays):

    <Var(rec), Index(1, Field(fld, Index(2, None)))>

or (rec and fld both pointers):

    <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None>

Full static and operational semantics
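One way to picture the grammar is as a C tagged union. The sketch below is a hypothetical encoding (CIL is not defined this way), with expressions cut down to constants, `Lvalue(...)` reads, and `+`. It builds the two forms of `rec[1].fld[2]` plus a mixed one, and counts memory accesses as the nested `Lvalue` reads plus the final load, which recovers the 1, 2, 3 counts from the C Lvalues example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C encoding of the grammar:
     lval ::= <base, offset>   base ::= Var(x) | Mem(exp)
     offset ::= None | Field(f, offset) | Index(exp, offset)
   Expressions are cut down to constants, Lvalue(lval) reads, and a + b. */
typedef struct Exp Exp;
typedef struct Lval Lval;
typedef struct Offset Offset;

typedef enum { E_CONST, E_LVAL, E_PLUS } EKind;
struct Exp { EKind kind; long c; const Lval *lv; const Exp *a, *b; };

typedef enum { O_NONE, O_FIELD, O_INDEX } OKind;
struct Offset { OKind kind; const char *field; const Exp *index; const Offset *rest; };

typedef enum { B_VAR, B_MEM } BKind;
struct Lval { BKind kind; const char *var; const Exp *mem; const Offset *off; };

static int reads_in_lval(const Lval *lv);

/* Loads performed while evaluating e: each Lvalue(lv) read is one load. */
static int reads_in_exp(const Exp *e) {
    switch (e->kind) {
    case E_CONST: return 0;
    case E_LVAL:  return 1 + reads_in_lval(e->lv);
    case E_PLUS:  return reads_in_exp(e->a) + reads_in_exp(e->b);
    }
    return 0;
}

static int reads_in_offset(const Offset *o) {
    if (o->kind == O_NONE) return 0;
    int here = (o->kind == O_INDEX) ? reads_in_exp(o->index) : 0;
    return here + reads_in_offset(o->rest);
}

/* Loads needed to compute the address of lv (not the access to lv itself). */
static int reads_in_lval(const Lval *lv) {
    int n = (lv->kind == B_MEM) ? reads_in_exp(lv->mem) : 0;
    return n + reads_in_offset(lv->off);
}

/* Memory accesses to read lv: address computation plus the final load. */
static int memory_accesses(const Lval *lv) { return reads_in_lval(lv) + 1; }

static const Exp c1 = { E_CONST, 1, NULL, NULL, NULL };
static const Exp c2 = { E_CONST, 2, NULL, NULL, NULL };
static const Offset none = { O_NONE, NULL, NULL, NULL };
static const Offset idx2 = { O_INDEX, NULL, &c2, &none };         /* Index(2, None) */
static const Offset fld_idx2 = { O_FIELD, "fld", NULL, &idx2 };   /* Field(fld, Index(2, None)) */
static const Offset idx1_fld_idx2 = { O_INDEX, NULL, &c1, &fld_idx2 };
static const Offset fld_none = { O_FIELD, "fld", NULL, &none };

/* Both arrays: <Var(rec), Index(1, Field(fld, Index(2, None)))> */
static const Lval form_arrays = { B_VAR, "rec", NULL, &idx1_fld_idx2 };

/* 1 + Lvalue(<Var(rec), None>): rec read as a pointer */
static const Lval rec_lv = { B_VAR, "rec", NULL, &none };
static const Exp rec_val = { E_LVAL, 0, &rec_lv, NULL, NULL };
static const Exp rec_plus_1 = { E_PLUS, 0, NULL, &c1, &rec_val };

/* rec a pointer, fld an array:
   <Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, Index(2, None))> */
static const Lval form_mixed = { B_MEM, NULL, &rec_plus_1, &fld_idx2 };

/* Both pointers:
   <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None> */
static const Lval fld_lv = { B_MEM, NULL, &rec_plus_1, &fld_none };
static const Exp fld_val = { E_LVAL, 0, &fld_lv, NULL, NULL };
static const Exp fld_plus_2 = { E_PLUS, 0, NULL, &c2, &fld_val };
static const Lval form_pointers = { B_MEM, NULL, &fld_plus_2, &none };
```

Because every memory read is an explicit `Lvalue(...)` node, the count falls out of a plain tree walk, with no type information needed.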


Semantics

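CIL gives a syntax-directed semantics. Example judgment:

$$\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \Downarrow (\&x, \tau)}$$

Here Γ is the environment, Var(x) is the lvalue form, and (&x, τ) is its meaning: an address paired with a type.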


CIL Lvalue Semantics

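$$\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \Downarrow (\&x, \tau)} \qquad \frac{\Gamma \vdash e : \mathrm{Ptr}(\tau)}{\Gamma \vdash \mathrm{Mem}(e) \Downarrow (e, \tau)}$$

$$\frac{\Gamma \vdash b \Downarrow (a, \tau)}{\Gamma \vdash \mathrm{None}@b \Downarrow (a, \tau)}$$

$$\frac{\Gamma \vdash b \Downarrow (a_1, \mathrm{Arr}(\tau_1)) \qquad \Gamma \vdash o@(a_1 + e \cdot |\tau_1|, \tau_1) \Downarrow (a_2, \tau_2)}{\Gamma \vdash \mathrm{Index}(e, o)@b \Downarrow (a_2, \tau_2)}$$

$$\frac{\Gamma \vdash o@b \Downarrow (a, \tau)}{\Gamma \vdash \langle b, o \rangle \Downarrow (a, \tau)}$$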
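The rules can be read as a recursive evaluator that threads an (address, type) pair through the offset. The sketch below is a hypothetical C rendering, not CIL's implementation: index expressions are restricted to constants, a `Mem` base carries the pointer value that `e` already evaluated to, and the `Field` case, which does not appear among the rules on this slide, is added by analogy with `Index`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type descriptors: just enough for int, arrays and structs. */
typedef enum { T_INT, T_ARR, T_STRUCT } TKind;
typedef struct Type {
    TKind kind;
    size_t size;                            /* |t| in bytes */
    const struct Type *elem;                /* T_ARR: element type */
    const size_t *field_offs;               /* T_STRUCT: byte offset of each field */
    const struct Type *const *field_types;  /* T_STRUCT: type of each field */
} Type;

/* offset ::= None | Field(f, offset) | Index(e, offset), with e a constant. */
typedef enum { O_NONE, O_FIELD, O_INDEX } OKind;
typedef struct Offset {
    OKind kind;
    long n;                                 /* field number or index value */
    const struct Offset *rest;
} Offset;

/* base ::= Var(x) | Mem(e). For Mem, addr is the already-evaluated pointer e
   and type is t from the premise  |- e : Ptr(t). */
typedef enum { B_VAR, B_MEM } BKind;
typedef struct { BKind kind; void *addr; const Type *type; } Base;

/* A meaning (a, t): an address and the type of the storage there. */
typedef struct { char *addr; const Type *type; } Loc;

/* Var(x) => (&x, t)   and   Mem(e) => (e, t) */
static Loc eval_base(const Base *b) {
    Loc l = { (char *)b->addr, b->type };
    return l;
}

/* o@(a, t) => (a', t') */
static Loc eval_offset(const Offset *o, Loc in) {
    switch (o->kind) {
    case O_NONE:                            /* None@b => meaning of b */
        return in;
    case O_INDEX: {                         /* needs Arr(t1); step e * |t1| */
        assert(in.type->kind == T_ARR);
        const Type *t1 = in.type->elem;
        Loc next = { in.addr + o->n * (long)t1->size, t1 };
        return eval_offset(o->rest, next);
    }
    case O_FIELD: {                         /* by analogy: step to field n */
        assert(in.type->kind == T_STRUCT);
        Loc next = { in.addr + in.type->field_offs[o->n], in.type->field_types[o->n] };
        return eval_offset(o->rest, next);
    }
    }
    return in;
}

/* <b, o> => meaning of o@b */
static Loc eval_lval(const Base *b, const Offset *o) {
    return eval_offset(o, eval_base(b));
}

/* Example: struct S { int fld[3]; } rec[2]; */
struct S { int fld[3]; };
static struct S rec[2];
static const Type t_int = { T_INT, sizeof(int), NULL, NULL, NULL };
static const Type t_fld = { T_ARR, sizeof(int[3]), &t_int, NULL, NULL };
static const size_t s_offs[] = { offsetof(struct S, fld) };
static const Type *const s_types[] = { &t_fld };
static const Type t_S = { T_STRUCT, sizeof(struct S), NULL, s_offs, s_types };
static const Type t_rec = { T_ARR, sizeof rec, &t_S, NULL, NULL };

static const Offset o_none = { O_NONE, 0, NULL };
static const Offset o_idx2 = { O_INDEX, 2, &o_none };
static const Offset o_fld  = { O_FIELD, 0, &o_idx2 };  /* Field(fld, Index(2, None)) */
static const Offset o_idx1 = { O_INDEX, 1, &o_fld };   /* Index(1, Field(fld, ...)) */

/* <Var(rec), Index(1, Field(fld, Index(2, None)))> */
static Loc eval_rec_1_fld_2(void) {
    Base b = { B_VAR, rec, &t_rec };
    return eval_lval(&b, &o_idx1);
}

/* <Mem(p), Field(fld, Index(2, None))>  where p : Ptr(struct S) */
static Loc eval_mem_p_fld_2(struct S *p) {
    Base b = { B_MEM, p, &t_S };
    return eval_lval(&b, &o_fld);
}
```

Both lvalues should denote the same storage when `p` points at `rec[1]`, and the evaluator agrees with the address C itself computes for `&rec[1].fld[2]`.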


CIL Source Fidelity

Original source:

    typedef struct { int fld[3]; } * Myptr;
    Myptr rec;
    rec[2].fld[1] = 'h';

CIL output:

    struct __anonstruct1 {
        int fld[3] ;
    };
    typedef struct __anonstruct1 * Myptr;
    Myptr rec;
    (rec + 2)->fld[1] = (int)'h';

SUIF 2.2.0-4 output:

    typedef int __ar_1[3];
    struct type_1 { __ar_1 fld; };
    struct type_1 * rec;
    (((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) =(104);
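A quick way to check the fidelity claim is to run the original lvalue and CIL's rewritten form side by side. The helper names below are hypothetical; the sketch only confirms that both writes land on the same `int`.

```c
#include <assert.h>
#include <stdlib.h>

/* Declarations from the original source on the slide. */
typedef struct { int fld[3]; } * Myptr;

/* Hypothetical helpers: perform each write and return the address it hit. */
static int *write_original(Myptr rec) {
    rec[2].fld[1] = 'h';
    return &rec[2].fld[1];
}

static int *write_cil(Myptr rec) {
    (rec + 2)->fld[1] = (int)'h';          /* CIL's form of the same lvalue */
    return &(rec + 2)->fld[1];
}

static int same_location(void) {
    Myptr rec = calloc(3, sizeof *rec);     /* room for rec[0] .. rec[2] */
    if (rec == NULL) return 0;
    int *p = write_original(rec);
    int *q = write_cil(rec);
    int ok = (p == q) && (*q == 'h');
    free(rec);
    return ok;
}
```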


Corner Cases

Your analysis will not have to handle:

    return ({goto L; p;}) && ({L: 5;});
    return &(--x ? : z) - & (x++, x);

Full handling of GNU-isms and MSVC-isms:

- attributes
- initializers


Corner Cases

Your analysis will not have to handle:

    return ({goto L; p;}) && ({L: 5;});

which CIL turns into:

    int tmp;
    goto L;
    if (p) {
    L:  tmp = 1;
    } else {
        tmp = 0;
    }
    return tmp;
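Wrapped in a hypothetical function `corner_case`, the lowered code compiles as ordinary C. Because `goto L` jumps straight into the then-branch, the function returns 1 for every `p`, mirroring the intended meaning of the original, where control lands in `({L: 5;})` and 5 is non-zero.

```c
#include <assert.h>

/* Hypothetical wrapper around the lowered form of
   return ({goto L; p;}) && ({L: 5;}); */
static int corner_case(int p) {
    int tmp;
    goto L;            /* jump straight into the then-branch */
    if (p) {
    L:
        tmp = 1;       /* 5 is non-zero, so the && yields 1 */
    } else {
        tmp = 0;
    }
    return tmp;
}
```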


StackGuard Transform

Cowan et al., USENIX ’98

- Buffer overrun defense
- Push the return address on a private stack
- Pop it before returning
- Only change functions with local arrays

40 lines of commented code with CIL

- Quite easy: uses visitors for tree replacement, explicit returns, etc.
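A minimal sketch of what a function might look like after such a transform, assuming a hypothetical private stack (`shadow_stack`, `shadow_push`, `shadow_pop_check`) and the GCC/Clang builtin `__builtin_return_address(0)`. This version compares on the way out and calls `abort()` on a mismatch; that policy is one choice among several and is not taken from the slide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical private stack of return addresses. */
#define SHADOW_MAX 1024
static void *shadow_stack[SHADOW_MAX];
static int shadow_top = 0;

static void shadow_push(void *ra) {
    if (shadow_top >= SHADOW_MAX) abort();
    shadow_stack[shadow_top++] = ra;
}

/* Pop and compare; a mismatch means the return address was overwritten. */
static void shadow_pop_check(void *ra) {
    if (shadow_top <= 0 || shadow_stack[--shadow_top] != ra) abort();
}

/* A function with a local array, written as the transform might leave it:
   push on entry, check before its single return. */
static int sum_digits(const char *s) {
    shadow_push(__builtin_return_address(0));
    char buf[16];
    int total = 0;
    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (int i = 0; buf[i] != '\0'; i++)
        if (buf[i] >= '0' && buf[i] <= '9')
            total += buf[i] - '0';
    shadow_pop_check(__builtin_return_address(0));
    return total;
}
```

Functions without local arrays would be left untouched, which keeps the overhead confined to the functions that can overflow a stack buffer.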


Other Transforms

Size of each transform, in lines of code:

- Instrument and log all calls: 150
- Eliminate break, continue, switch: 110
- 1 memory access per assignment: 100
- Make each function have a single return statement: 90
- Make all stack arrays heap-allocated: 75
- Log all value/addr memory writes: 45
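For the single-return transform, a before/after pair in plain C shows the shape of the rewrite: each `return e;` becomes an assignment to a result variable followed by a jump to one exit label. The names `retres` and `return_label` are illustrative.

```c
#include <assert.h>

/* Before: three return statements. */
static int sign_orig(int x) {
    if (x < 0) return -1;
    if (x == 0) return 0;
    return 1;
}

/* After: one return; every other exit assigns the result and jumps there. */
static int sign_oneret(int x) {
    int retres;
    if (x < 0) { retres = -1; goto return_label; }
    if (x == 0) { retres = 0; goto return_label; }
    retres = 1;
    goto return_label;
return_label:
    return retres;
}
```

A single exit point is also what makes transforms like StackGuard simple: the epilogue code goes in one place.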


Whole-Program Merger

C has incremental compilation and linking, coupled with a weak module system!

Example (vortex / gcc / c++2c):

    /* foo.c */
    struct list {
        int head;
        struct list * tail;
    };
    struct list * mylist;

    /* bar.c */
    struct chain {
        int head;
        struct chain * tail;
    };
    extern struct chain * mylist;
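Merging `foo.c` and `bar.c` requires recognizing that `struct list` and `struct chain` have the same shape even though their tags differ. Below is a minimal structural-equivalence check over a hypothetical type representation. It ignores tags and field names and, to terminate on recursive types, treats a pair of structs already under comparison as equal (a coinductive assumption). CIL's merger may use a different algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type representation: int, pointer, or struct. */
typedef enum { K_INT, K_PTR, K_STRUCT } Kind;
typedef struct Ty {
    Kind kind;
    const struct Ty *to;                 /* K_PTR: pointee */
    const char *tag;                     /* K_STRUCT: tag, ignored when comparing */
    int nfields;                         /* K_STRUCT */
    const struct Ty *const *fields;      /* K_STRUCT: field types, in order */
} Ty;

/* Struct pairs currently under comparison, assumed equal so that
   recursive types such as struct list terminate. */
#define MAX_ASSUME 64
static const Ty *assume_a[MAX_ASSUME], *assume_b[MAX_ASSUME];
static int nassume = 0;

static int struct_equiv(const Ty *a, const Ty *b) {
    if (a->kind != b->kind) return 0;
    switch (a->kind) {
    case K_INT:
        return 1;
    case K_PTR:
        return struct_equiv(a->to, b->to);
    case K_STRUCT: {
        if (a->nfields != b->nfields) return 0;
        for (int i = 0; i < nassume; i++)
            if (assume_a[i] == a && assume_b[i] == b) return 1;
        if (nassume == MAX_ASSUME) return 0;   /* too deep: conservatively "different" */
        assume_a[nassume] = a;
        assume_b[nassume] = b;
        nassume++;
        int ok = 1;
        for (int i = 0; ok && i < a->nfields; i++)
            ok = struct_equiv(a->fields[i], b->fields[i]);
        nassume--;
        return ok;
    }
    }
    return 0;
}

/* foo.c: struct list  { int head; struct list  *tail; };
   bar.c: struct chain { int head; struct chain *tail; }; */
static const Ty t_int = { K_INT, NULL, NULL, 0, NULL };
static const Ty list_ty, chain_ty;      /* tentative definitions, completed below */
static const Ty list_ptr  = { K_PTR, &list_ty,  NULL, 0, NULL };
static const Ty chain_ptr = { K_PTR, &chain_ty, NULL, 0, NULL };
static const Ty *const list_fields[]  = { &t_int, &list_ptr };
static const Ty *const chain_fields[] = { &t_int, &chain_ptr };
static const Ty list_ty  = { K_STRUCT, NULL, "list",  2, list_fields };
static const Ty chain_ty = { K_STRUCT, NULL, "chain", 2, chain_fields };

/* A struct that should not match: struct pair { int a; int b; }; */
static const Ty *const pair_fields[] = { &t_int, &t_int };
static const Ty pair_ty = { K_STRUCT, NULL, "pair", 2, pair_fields };
```

Once the two structs are found equivalent, the merger can pick one representative so that `mylist` ends up with a single type.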


Merging a Project

- Determine which files to merge
- Merge the files:
  - handle file-scoped identifiers
  - C uses name equivalence for types, but modules need structural equivalence

Key: Each global identifier has 1 type!


Other Merger Details

- Remove duplicate declarations (every file includes <stdio.h>)
- Match a struct pointer with no defined body in file A to the defined body in file B
- Be careful when picking representatives


How Does it Work?

1. Make the project, passing all files through CIL
2. Run your transform and analysis
3. Emit simplified C
4. Compile the simplified C with GCC/MSVC

… and it works!


Large Programs

| Program     | #LOC (*.[ch]) | Notes             |
|-------------|---------------|-------------------|
| SPECINT95   | 360K          |                   |
| GIMP-1.2.2  | 800K          | large libraries   |
| linux-2.4.5 | 2.5M          | 132% compile time |
| ACE (in C)  | 2M            | 2000 files        |

Used in the CCured and BLAST projects


Merged Kernel Stats

Stock monolithic Linux 2.4.5 kernel: http://manju.cs.berkeley.edu/cil/vmlinux.c

Statistics:

| Before                   | After           |
|--------------------------|-----------------|
| 324 files                | One 12.5MB file |
| 11.3 M-words             | 1.5 M-words     |
| 7.3 M-LOC (post-process) | 470 K-LOC       |

Built with:

    $ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"


Conclusion

CIL distills C to a precise, simple subset:

- easy to analyze
- well-defined semantics
- close to the original source

Well-suited to complex analyses and source-to-source transforms:

- Parses ANSI/GCC/MSVC C
- Rapidly merges large programs


Questions?

Try CIL out:

http://www.cs.berkeley.edu/~necula/cil

Complete source, documentation and test cases freely available