Certifying (RISC) Machine Code Safe from Aliasing (OpenCert 2013)


Peter T. Breuer, University of Birmingham, UK

Jonathan P. Bowen, London South Bank University, UK

Little and Large Problem

● Small arithmetic unit in an embedded processor
  – 40-bit arithmetic

● Large memory unit
  – 64-bit addressing

● What do we do with the extra wires?

Hardware Aliasing

● What happens to the extra wires?
  – it depends on the hardware

● 4 + 0xfffffffffffffffc = 0x0000000000000000 or 0xfffff00000000000?

● Both mean 0
  – if you use arithmetic to calculate address 0
  – sometimes you get the 0 you want
  – sometimes not!
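A minimal Python sketch of the "little and large" arithmetic above. The function names and the behaviour of the narrow adder are hypothetical: we assume the low 40 bits are added properly while the 24 extra wires simply carry the OR of the operands' top bits. Real hardware may leave a different pattern on the extra wires; the point is only that two adders which agree on the low 40 bits can disagree on the full 64-bit address.

```python
MASK64 = (1 << 64) - 1   # 64-bit address bus
MASK40 = (1 << 40) - 1   # 40-bit arithmetic unit


def add_full64(a: int, b: int) -> int:
    """An adder as wide as the address bus: wraps modulo 2**64."""
    return (a + b) & MASK64


def add_40bit_alu(a: int, b: int) -> int:
    """Hypothetical narrow adder: only the low 40 bits are summed;
    the 24 extra wires carry the OR of the operands' top bits."""
    low = ((a & MASK40) + (b & MASK40)) & MASK40
    high = (a | b) & MASK64 & ~MASK40
    return high | low


minus_four = (-4) & MASK64                 # 0xfffffffffffffffc
print(hex(add_full64(4, minus_four)))      # 0x0
print(hex(add_40bit_alu(4, minus_four)))   # 0xffffff0000000000
```

Both results have all-zero low 40 bits, so the arithmetic unit regards each as 0, yet the memory unit receives two different 64-bit addresses.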

Also happens in a KPU

● A KPU is an encrypted processor
  – instead of 4 - 4 = 0
  – it does 99900 - 99900 = 78763298
    • homomorphism conditions on encrypted arithmetic guarantee correct behaviour
  – real encryption is always one-to-many
    • the encoding of 0 is 9896861
    • 99900 - 99900 = 78763298 ≠ 9896861
    • another encoding of 0 is 78763298
  – encrypted arithmetic gives different results
    • depending on how you do the calculation
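The one-to-many behaviour can be imitated with a toy encoding, sketched in Python below. Nothing here is real encryption: `encode`, `decode` and `kpu_sub` are hypothetical, a "ciphertext" is simply value + P*r with P = 1000, and the randomiser r of a subtraction's result is an arbitrary function of the inputs. That is enough to show that a ciphertext minus itself need not give back any particular encoding of 0.

```python
P = 1000  # toy plaintext modulus: decode(c) = c % P


def encode(value: int, r: int) -> int:
    """One-to-many toy encoding: every r gives a valid encoding of value."""
    return value % P + P * r


def decode(c: int) -> int:
    return c % P


def kpu_sub(c1: int, c2: int) -> int:
    """Hypothetical encrypted subtraction: the decoded result is correct,
    but the randomiser of the output depends on the inputs."""
    r = (31 * (c1 // P) + 17 * (c2 // P) + 5) % 10007
    return encode(decode(c1) - decode(c2), r)


four = encode(4, 99)                            # one encoding of 4
zero_a = kpu_sub(four, four)                    # '4 - 4', computed one way
zero_b = kpu_sub(encode(7, 2), encode(7, 2))    # '7 - 7', computed another way
print(zero_a, zero_b, decode(zero_a), decode(zero_b))   # 4757000 101000 0 0
```

Both results decode to 0, but as raw values they differ from each other and from 0, which is exactly what an address-sensitive memory unit would notice.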

Problem

● How to check that a program is safe from hardware aliasing

● Where `hardware aliasing' means that arithmetic on addresses does not always give the same result.

– Trust only exactly the same calculation

  – because 4 - 4 != 0
  – it's `equivalent' to 0, not identical!

In both cases we can imagine ...

● Values have invisible extra bits
  – e.g. 42.1101101
  – different extra bits represent different encodings of '42'

● Arithmetic ignores but mutates the extra bits
  – 42.1101101 + 42.1100001 = 84.0110110

● The memory unit is sensitive to the invisible extra bits
  – it can't see just '42'

● Needs loving care from programmer
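The "invisible extra bits" picture can be modelled directly. In this hypothetical Python sketch a `Word` carries a visible value plus 7 extra bits; addition gets the visible part right but scrambles the extra bits (an XOR followed by a 7-bit rotate, chosen arbitrarily), and memory is a dictionary keyed on the whole word. Only an address computed by exactly the same calculation finds the stored data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    value: int   # what the arithmetic unit 'sees'
    extra: int   # 7 invisible extra bits


def add(a: Word, b: Word) -> Word:
    """Visible part added correctly; extra bits mutated by an arbitrary
    (hypothetical) scramble: XOR, then rotate left by one within 7 bits."""
    x = a.extra ^ b.extra
    return Word(a.value + b.value, ((x << 1) | (x >> 6)) & 0x7F)


memory: dict = {}   # the memory unit keys on ALL the bits

base = Word(40, 0b1101101)
addr1 = add(base, Word(2, 0b0000000))                    # '42', one way
addr2 = add(Word(21, 0b1010101), Word(21, 0b0110011))    # '42', another way
memory[addr1] = "hello"

print(addr1.value == addr2.value)         # True: both are '42'
print(memory.get(addr2))                  # None: a different alias
print(memory.get(add(base, Word(2, 0))))  # hello: the same calculation
```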

How to deal with hardware aliasing

● The left program returns a different alias of SP to the caller

Bad (left):

    Subroutine foo:
        SP -= 32        # 8 local vars
        … code …
        SP += 32        # destroy frame
        return

Good (right):

    Subroutine foo:
        GP = SP
        SP -= 32
        … code …
        SP = GP
        return
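The two subroutines can be run against a hypothetical narrow adder (low 40 bits added, top 24 bits the OR of the operands' top bits) to see the difference. `alu_add`, `foo_bad` and `foo_good` are illustrative names, and the caller's SP value 0x1000 is arbitrary.

```python
MASK64 = (1 << 64) - 1
MASK40 = (1 << 40) - 1


def alu_add(a: int, b: int) -> int:
    """Hypothetical 40-bit adder on 64-bit words: the low 40 bits are
    summed, the top 24 bits are the OR of the operands' top bits."""
    low = ((a & MASK40) + (b & MASK40)) & MASK40
    return ((a | b) & MASK64 & ~MASK40) | low


def foo_bad(sp: int) -> int:
    sp = alu_add(sp, (-32) & MASK64)  # SP -= 32   (8 local vars)
    # ... code ...
    sp = alu_add(sp, 32)              # SP += 32   (destroy frame)
    return sp                         # the SP the caller gets back


def foo_good(sp: int) -> int:
    gp = sp                           # GP = SP
    sp = alu_add(sp, (-32) & MASK64)  # SP -= 32
    # ... code ...
    sp = gp                           # SP = GP: exactly the caller's value
    return sp


caller_sp = 0x1000
print(hex(foo_bad(caller_sp)))    # 0xffffff0000001000: an alias of 0x1000
print(hex(foo_good(caller_sp)))   # 0x1000
```

The bad version agrees with the caller's SP in the low 40 bits only; the good version never does arithmetic on the value it hands back.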

Regard machine code as compiled from a Stack Machine (SM) control language

● Good code:

    cspt GP     # copy stack pointer to GP
    push 32     # make 32B space on stack
    …
    rspf GP     # restore stack pointer from GP
    return

What makes that SM code safe?

● No access outside the current frame
  – the stack access commands are
      get 10 gp   # contents of stack cell 10 ..
      put 10 gp   # .. transferred to/from register gp
  – safe if all access offsets are in the current frame's range
    • there is only one way to access stack content ..
    • .. by offset from the current stack pointer
  – can only make a new frame, not shift sp
    • push 32
  – can only return sp to a value saved earlier
    • cspt gp … rspf gp
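A minimal Python model of the SM stack discipline just described. `StackMachine` and its method names are hypothetical; frames are lists of cells rather than bytes, and the saved stack pointer is represented as a frame depth. Because the only ways to touch the stack are `push`, bounded `get`/`put`, and `cspt`/`rspf`, an out-of-frame access is caught immediately and `rspf` can only restore a value that `cspt` saved.

```python
class StackMachine:
    """Toy model of the SM stack discipline (hypothetical API).
    Frames are lists of cells, not bytes; the stack pointer is a frame depth."""

    def __init__(self) -> None:
        self.frames: list = []   # the current frame is the last one
        self.regs: dict = {}

    def push(self, size: int) -> None:
        """push n: make a new frame of n cells (the only way sp moves down)."""
        self.frames.append([None] * size)

    def _current(self, n: int) -> list:
        if not self.frames or not 0 <= n < len(self.frames[-1]):
            raise IndexError(f"offset {n} is outside the current frame")
        return self.frames[-1]

    def get(self, n: int, r: str) -> None:
        """get n r: register r := cell n of the current frame."""
        self.regs[r] = self._current(n)[n]

    def put(self, n: int, r: str) -> None:
        """put n r: cell n of the current frame := register r."""
        self._current(n)[n] = self.regs[r]

    def cspt(self, r: str) -> None:
        """cspt r: copy the stack pointer into register r."""
        self.regs[r] = ("sp", len(self.frames))

    def rspf(self, r: str) -> None:
        """rspf r: restore the stack pointer from a value saved by cspt."""
        saved = self.regs.get(r)
        if not (isinstance(saved, tuple) and saved[0] == "sp"):
            raise ValueError(f"rspf {r}: {r} holds no saved stack pointer")
        del self.frames[saved[1]:]


sm = StackMachine()
sm.cspt("gp")            # cspt gp
sm.push(32)              # push 32
sm.regs["t0"] = "x"
sm.put(10, "t0")         # put 10 t0
sm.get(10, "t1")         # get 10 t1
sm.rspf("gp")            # rspf gp
print(sm.regs["t1"], len(sm.frames))   # x 0
```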

Heap access

● Deal with that later!
  – look for the array and string treatment in the text

Verifying SM code

● Means verifying that all stack accesses are within the current frame boundary

● That's so easy! Check n in 'get n r'.

● But we have machine code, not SM code!
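Checking SM code statically really is that simple. The sketch below is a hypothetical Python checker over SM programs written as tuples; it treats an offset as valid when 0 <= n < frame size (ignoring the width of each access) and also insists that `rspf r` uses a register still holding the value saved by `cspt r`.

```python
def check_sm(program: list) -> None:
    """Hypothetical static bounds check for SM code. Raises ValueError
    unless every get/put offset lies inside the current frame and every
    rspf restores a stack pointer saved earlier by cspt."""
    frames: list = []   # sizes of the frames pushed so far
    saved: dict = {}    # register -> frame depth saved by cspt
    for op, *args in program:
        if op == "push":
            frames.append(args[0])
        elif op in ("get", "put"):
            n, r = args
            if not frames or not 0 <= n < frames[-1]:
                raise ValueError(f"{op} {n} {r}: outside the current frame")
            if op == "get":
                saved.pop(r, None)          # r is overwritten
        elif op == "cspt":
            saved[args[0]] = len(frames)
        elif op == "rspf":
            if args[0] not in saved:
                raise ValueError(f"rspf {args[0]}: no saved stack pointer")
            del frames[saved[args[0]]:]
        elif op != "return":
            raise ValueError(f"unknown SM command {op!r}")


good = [("cspt", "gp"), ("push", 32), ("get", 10, "t0"),
        ("rspf", "gp"), ("return",)]
check_sm(good)   # passes silently
```

A program such as `[("push", 32), ("get", 40, "t0")]` is rejected, as is one that overwrites `gp` with `get` before `rspf gp`.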

Machine code looks like this

    mov  gp sp        # cspt gp
    addi sp sp -32    # push 32
    …
    mov  sp gp        # rspf gp
    jr   ra           # return

● Is it compiled from safe SM code?

To prove m/c safe

● Apply Hoare-like rules of reasoning
  – whose names are the SM code that the m/c is supposed to be compiled from

● Requires a human being to choose the rule
  – or an automaton to search the solution space

– Either way, it's deduction-guided disassembly
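Deduction-guided disassembly can be pictured as a lookup from machine instructions to SM rules, parameterised by which register is annotated as the stack pointer. The rule table in this Python sketch is hypothetical and covers only the instruction forms in the machine-code example (`mov`, `addi`, `ld`/`st`, `jr`); `None` means no rule fits under that annotation. Trying two annotations shows why the choice matters.

```python
import re
from typing import Optional


def disassemble(instr: str, sp: str) -> Optional[str]:
    """Hypothetical rule table: name the SM command a MIPS-like instruction
    would be compiled from, given which register is the stack pointer."""
    op, *args = instr.split()
    if op == "mov" and len(args) == 2:
        dst, src = args
        if src == sp:
            return f"cspt {dst}"               # copy stack pointer to dst
        if dst == sp:
            return f"rspf {src}"               # restore stack pointer from src
        return f"mov {dst} {src}"              # ordinary register move
    if op == "addi" and len(args) == 3 and args[0] == args[1] == sp:
        k = int(args[2])
        return f"push {-k}" if k < 0 else None  # new frame only, never shift sp up
    if op in ("ld", "st") and len(args) == 2:
        m = re.fullmatch(r"(\d+)\((\w+)\)", args[1])
        if m and m.group(2) == sp:
            return f"{'get' if op == 'ld' else 'put'} {m.group(1)} {args[0]}"
    if op == "jr" and args == ["ra"]:
        return "return"
    return None                                # no rule under this annotation


code = ["mov gp sp", "addi sp sp -32", "mov sp gp", "jr ra"]
for guess in ("sp", "gp"):   # try different stack-pointer annotations
    print(guess, [disassemble(i, guess) for i in code])
# sp ['cspt gp', 'push 32', 'rspf gp', 'return']
# gp ['rspf sp', None, 'cspt sp', 'return']
```

Only the annotation that makes every instruction match a rule yields a disassembly into SM code.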

Example

● Think about a 32B current frame

    { sp=c32!10; (10)=x }  ld gp 10(sp)  [get 10 gp]  { sp=c32!10; (10)=gp=x }

● 'c32!10' means a pointer to a 32B frame
  – already written at offset 10

● (10)=x means stack cell 10 holds an x-thing

● The machine code is 'ld gp 10(sp)'
  – load register gp from offset 10 from the stack pointer

● The name of the rule is 'get 10 gp'
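The example rule can be written as a function from precondition to postcondition. In this hypothetical Python sketch, `StackPtr(32, frozenset({10}))` stands for `c32!10` and the key `"(10)"` for stack cell 10; `rule_get` refuses to fire unless the offset lies in the current frame and the cell has already been written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackPtr:
    size: int             # c32 -> frame size 32
    written: frozenset    # !10 -> offsets already written


def rule_get(pre: dict, n: int, r: str) -> dict:
    """Hypothetical rule [get n r] for 'ld r n(sp)':
    { sp=c<size>!n; (n)=x }  ld r n(sp)  { sp=c<size>!n; (n)=r=x }"""
    sp = pre["sp"]
    if not isinstance(sp, StackPtr) or not 0 <= n < sp.size:
        raise ValueError(f"get {n} {r}: offset outside the current frame")
    if n not in sp.written:
        raise ValueError(f"get {n} {r}: cell ({n}) has not been written")
    post = dict(pre)              # the precondition itself is left unchanged
    post[r] = pre[f"({n})"]       # r now has the type of cell (n)
    return post


pre = {"sp": StackPtr(32, frozenset({10})), "(10)": "x"}
post = rule_get(pre, 10, "gp")
print(post["gp"], post["(10)"])   # x x
```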

Types

● The logic is based on a stack machine model
  – it manipulates types in registers, stack and heap

● c32 – pointer to a stack frame of size 32
  – only accessed by bounded offset from the pointer

● u10 – array of size 10 on the heap
  – can only be accessed by offset from a fixed base

● c1 – string accessed in increments of 1
  – a string is like a stack of frames of size 1
  – stepping up `pops one off the stack'
  – access within the `current frame' only
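The access discipline of these types comes down to a bounds check on offsets. The classes `C` and `U` and the function `may_access` below are hypothetical Python stand-ins for c<n> and u<n>. For a c1 string only offset 0 lies in the current frame, so the pointer must be stepped up to reach later characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class C:
    """c<n>: pointer into a stack of n-byte frames (c32: stack frame, c1: string)."""
    n: int


@dataclass(frozen=True)
class U:
    """u<n>: array of n cells on the heap, accessed from a fixed base."""
    n: int


def may_access(t, offset: int) -> bool:
    """Every access is by a bounded offset: within the current frame
    for c<n>, within the array for u<n>."""
    return 0 <= offset < t.n


print(may_access(C(32), 10), may_access(C(32), 40))   # True False
print(may_access(U(10), 9), may_access(U(10), 10))    # True False
print(may_access(C(1), 0), may_access(C(1), 1))       # True False
```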


Typing

● Milner typing
  – assign type variables to every register and stack position within the current frame
  – calculate the effect of instructions
  – ambiguous modulo the assignment of a rule to each instruction
    • choosing the rule equals disassembling the instruction

● Proved: soundness
  – the assigned types say what really happens
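A much-reduced Milner-style inference in Python gives the flavour. The rule set and the type names (`"stackptr"`, `"retaddr"`, `("frame", n)`) are hypothetical, not the paper's: each register starts with a fresh type variable, the effect of each instruction is computed, uses generate equations that are solved by unification, and an instruction with no rule (such as `addi sp sp 32`) is an error. A register that is never constrained, like `gp` here, keeps its type variable, which is what "most general" means.

```python
class Var:
    """A type variable; equality and hashing are by identity."""
    _count = 0

    def __init__(self) -> None:
        Var._count += 1
        self.name = f"'t{Var._count}"

    def __repr__(self) -> str:
        return self.name


def resolve(t, subst):
    """Follow the substitution to an unbound variable or a concrete type."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def unify(a, b, subst) -> None:
    a, b = resolve(a, subst), resolve(b, subst)
    if a is b or a == b:
        return
    if isinstance(a, Var):
        subst[a] = b
    elif isinstance(b, Var):
        subst[b] = a
    else:
        raise TypeError(f"type clash: {a} vs {b}")


def infer(program, regs=("sp", "gp", "ra")):
    """Return the most general entry types of regs, or raise TypeError."""
    subst = {}
    entry = {r: Var() for r in regs}      # one type variable per register
    env = dict(entry)                     # current type of each register
    for instr in program:
        op, *a = instr.split()
        if op == "mov":                                  # dst gets src's type
            env[a[0]] = env[a[1]]
        elif op == "addi" and a[0] == a[1] == "sp" and int(a[2]) < 0:
            unify(env["sp"], "stackptr", subst)          # only a stack pointer may push
            env["sp"] = ("frame", -int(a[2]))            # sp now points at the new frame
        elif op == "jr":
            unify(env[a[0]], "retaddr", subst)           # jump target: a return address
            unify(env["sp"], entry["sp"], subst)         # sp must be exactly the caller's
        else:
            raise TypeError(f"no rule for {instr!r}")
    return {r: resolve(t, subst) for r, t in entry.items()}


print(infer(["mov gp sp", "addi sp sp -32", "mov sp gp", "jr ra"]))
# sp: 'stackptr', ra: 'retaddr', gp: still an unconstrained type variable
```

Dropping the restore (`["addi sp sp -32", "jr ra"]`) makes the final unification clash, and the slide's bad code fails earlier because `addi sp sp 32` matches no rule.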

Other Proved Things

● Termination
  – the Milner algorithm terminates
  – with a typing if one exists, with an error if not

● Uniqueness
  – the type found is the unique most general one
    • for a given annotation

● There are at most 32 valid annotations
  – they differ in the position of the stack pointer register

Conclusion

1. Disassemble machine code
   • human activity
2. Apply Milner typing
   • includes stack machine bounds verification
   • automated activity
3. Certify m/c as hardware-alias safe
   • steps 1 & 2 can be mixed/simultaneous
   • inference-guided disassembly
4. Apply to assembler in the Linux kernel

