Language-Based Security
Guest lecture in CSE 605: Advanced Programming Languages
November 15, 2006
Shambhu Upadhyaya, CSE Department, University at Buffalo
Motivation
Conventional security mechanisms treat programs as black boxes:
- encryption
- firewalls
- system calls/privileged mode, access control
But they cannot address important current and future security needs:
- downloaded, mobile code
- buffer overruns and other safety problems
Threat Model
- Attacker can send in-band data
- Attacker can force execution to jump to the data
[Diagram: a remote exploit injecting data into a running process]
Language-based security is one way to deal with this.
Outline
- The need for language-based security
- Safety analyses and transformations
- Making languages safe
- Certifying compilation and verification
- Runtime Environment Driven Program Safety – a case study
Acknowledgements: Fred Schneider and Greg Morrisett tutorial (Cornell); Ramkumar Chinchani (Cisco)
Computer security
Goal: prevent bad things from happening:
- clients not paying for services
- critical service unavailable
- confidential information leaked
- important information damaged
- system used to violate copyright
Conventional security mechanisms aren't up to the challenge.
1st Principle of Security Design
Least Privilege: each principal is given the minimum access needed to accomplish its task [Saltzer & Schroeder '75]
Examples:
+ Administrators don't run day-to-day tasks as root, so "rm -rf /" won't wipe the disk
- fingerd runs as root so it can access different users' .plan files; but then it can also "rm -rf /"
Problem: OS privilege is coarse-grained.
2nd Principle of Security Design
Keep the TCB small. Trusted Computing Base (TCB): the components whose failure compromises the security of a system.
Example: the TCB of an operating system includes the kernel, the memory protection system, and the disk image.
- Small/simple TCB: correctness can be checked/tested/reasoned about more easily, so it is more likely to work
- Large/complex TCB: likely contains bugs that enable security violations
Example Attack: buffer overflows

char buf[100];
…
gets(buf);

- Attacker gives long input that overwrites the function return address and local variables
- "Return" from the function transfers control to the payload code
[Stack diagram: buf, the saved return address, and the payload, with the stack pointer sp]
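A minimal, self-contained sketch of the vulnerable pattern (our example, not from the slides):

#include <stdio.h>

/* gets() performs no bounds check: input longer than 100 bytes runs
   past buf and overwrites the saved return address above it on the
   stack. gets() was removed in C11 for exactly this reason; the safe
   form is fgets(buf, sizeof buf, stdin). */
void vulnerable(void)
{
    char buf[100];
    gets(buf);   /* unchecked copy into a fixed-size stack buffer */
}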
Example Attack: format strings

fgets(s, n, sock);
…
fprintf(output, s);

- Attack: pass a string s containing a %n qualifier (which writes the number of characters output so far to an arbitrary location)
- Use it to overwrite the return address to "return" to malicious payload code in s
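The standard fix is to pass untrusted input strictly as data; a minimal sketch (the wrapper function is ours, for illustration):

#include <stdio.h>

void echo_line(FILE *output, const char *s)
{
    /* UNSAFE: if s came from the network it may contain %n, %s, ... */
    fprintf(output, s);
    /* Safe: s is treated purely as data, never as a format string.  */
    fprintf(output, "%s", s);
}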
NB: neither attack is viable in a type-safe language such as Java.
But who's going to rewrite 50 MLOC in Java/C#?
1988: Morris Worm
Penetrated an estimated 6,000 machines (5-10% of hosts at that time).
Used a number of clever methods to gain access to a host:
- brute-force password guessing
- a bug in the default sendmail configuration
- X Windows vulnerabilities, rlogin, etc.
- a buffer overrun in fingerd
Remarks: buffer overruns account for roughly half of CERT advisories. You'd think we could solve this, but…
1999: Love Bug & Melissa
Both were email-based viruses that exploited:
- a common mail client (MS Outlook)
- trusting (i.e., uneducated) users
- VB scripting extensions within messages to look up addresses in the contacts database and send a copy of the message to those contacts
- the scripts ran with the full privileges of the user
Melissa: hit an estimated 1.2 million machines.
Love Bug: caused an estimated $10B in damage.
When to enforce security
Possible times to respond to security violations:
- Before execution: analyze, reject, rewrite
- During execution: monitor, log, halt, change
- After execution: rollback, restore, audit, sue, call police
Language-based techniques
A complementary tool in the arsenal: programs don't have to be black boxes! Options:
1. Analyze programs at compile time or load time to ensure that they are secure
2. Check analyses at load time to reduce the TCB
3. Transform programs at compile/load/run time so that they don't violate security, or so that they log actions for later auditing
Fixing The Problem
[Diagram: static analysis, model-checking, and type-safety apply to the source program; runtime checks and anomaly detection apply to the binary executable; the compiler connects the two.]
Maturity of language tools
Some things have been learned in the last 25 years…
- How to build a sound, expressive type system that provably enforces run-time type safety → protected interfaces
- Type systems that are expressive enough to encode multiple high-level languages → language independence
- How to build fast garbage collectors → trustworthy pointers
- On-the-fly code generation and optimization → high performance
Three approaches to fixing the problem:

Static Analysis  – Pros: one-time effort
                 – Cons: undecidability of aliasing; false negatives
                 – Tools: BOON, CQUAL
Type-Based       – Pros: one-time effort; efficient
                 – Cons: weak type system; arbitrary typecasting
                 – Tools: Cyclone, CCured
Runtime Checks   – Pros: coverage; few false positives
                 – Cons: inefficient
                 – Tools: BoundsChecker, StackGuard
Security properties
What kinds of properties do we want to ensure programs or computing systems satisfy?
Safety properties: "nothing bad ever happens"
- a property that can be enforced using only the history of the program
- amenable to purely run-time enforcement
Examples:
- access control (e.g., checking file permissions on file open)
- memory safety (a process does not read/write outside its own memory space)
- type safety (data accessed in accordance with its type)
Information security: confidentiality
Confidentiality: valuable information should not be leaked by computation.
Also known as secrecy, though sometimes a distinction is made:
- Secrecy: the information itself is not leaked
- Confidentiality: nothing can be learned about the information
Simple (access control) version: only authorized processes can read from a file. But… when should a process be "authorized"?
End-to-end version: information should not be improperly released by a computation, no matter how it is used.
- Requires tracking information flow in the system
- Encryption provides end-to-end secrecy, but prevents computation
Information security: integrity
Integrity: valuable information should not be damaged by computation.
Simple (access control) version: only authorized processes can write to a file. But… when should a process be "authorized"?
End-to-end version: information should not be updated on the basis of less trustworthy information.
- Requires tracking information flow in the system
Privacy and Anonymity
Privacy: a somewhat vague term encompassing confidentiality, secrecy, and anonymity.
- Sometimes means: individuals (principals) and their actions cannot be linked by an observer
Anonymity: the identity of participating principals cannot be determined even if the actions are known (stronger than privacy).
Availability
The system is responsive to requests.
DoS attacks: attempts to destroy availability (perhaps by cutting off network access).
Fault tolerance: the system can recover from faults (failures) and remain available and reliable.
- Benign faults: not directed by an adversary; the usual province of fault-tolerance work
- Malicious or Byzantine faults: an adversary can choose the time and nature of the fault
  - Byzantine faults are attempted security violations, usually limited by not knowing some secret keys
Reference Monitor
Observes the execution of a program and halts the program if it is going to violate the security policy.
Common examples: memory protection, access control checks, routers, firewalls.
Most current enforcement mechanisms are reference monitors.
What policies?
Reference monitors can only see the past: they can enforce all safety properties, but not liveness properties.
Assumptions:
- the monitor can have access to the entire state of the computation
- the monitor can have arbitrarily large state
- but the monitor can't guess the future – the predicate it uses to determine whether to halt a program must be computable
Interpreters and sandboxing
Interpreter:
- easy to implement (small TCB)
- works with binaries (high-level language-independent)
- terrible execution overhead (25x? 70x?)
Sandboxing:
- Software Fault Isolation (SFI): code rewriting, a.k.a. "sandboxing"; requires that the code and data for a security domain lie in one contiguous segment
- Java sandbox model: restricts the functionality of the code
Type-safe languages
Software-engineering benefits of type safety:
- memory safety: no buffer overruns (an array subscript a[i] is only defined when i is in range for the array a)
- no worries about self-modifying code, wild jumps, etc.
Type safety can be used to construct a protected interface (e.g., a system call interface) that applies access rules to requests.
Java
Java is a type-safe language in which type safety is security-critical.
- Memory safety: programs cannot fabricate pointers to memory
- Type safety: private fields and methods of objects cannot be accessed without using the object's operations
The bytecode verifier ensures that compiled bytecode is type-safe.
Security operations
Each method has an associated protection domain, e.g., applet or local.
doPrivileged(P){S}:
- fails if the method's domain does not have privilege P
- switches from the caller's domain to the method's while executing statement S (think setuid)
checkPermission(P) walks up stack S doing:

for (f := pop(S); !empty(S); f := pop(S)) {
    if domain(f) does not have privilege P then error;
    if f is a doPrivileged frame then break;
}

Ensures integrity of the control flow leading to a security-critical operation.
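The walk can be modeled concretely; here is a hedged toy model in C (the names are ours, and the real JDK implementation differs):

#include <stdbool.h>

struct frame {
    bool has_priv;          /* does this frame's domain hold privilege P?   */
    bool is_do_privileged;  /* did this frame enter via doPrivileged(P){S}? */
};

/* frames[0] is the top of the stack; returns true iff access is allowed */
static bool check_permission(const struct frame *frames, int n)
{
    for (int i = 0; i < n; i++) {
        if (!frames[i].has_priv)
            return false;          /* some caller lacks P: deny      */
        if (frames[i].is_do_privileged)
            return true;           /* privilege amplified: stop walk */
    }
    return true;                   /* the whole stack holds P        */
}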
Some pros and cons?
Pros:
- a rich, dynamic notion of context that tracks some of the history of the computation; this could stop Melissa, Love Bug, etc.
- low overhead
Cons:
- implementation-driven (walking up stacks); could be checked statically [Wallach]
- policy is smeared over the program
- possible to code around the limited history, e.g., by having applets return objects that are invoked after the applet's frames are popped
Require type safety?
Write all security-critical programs in a type-safe high-level language (e.g., Java)?
- Problem 1: legacy code written in C, C++ → Solution: a type-safe, backwards-compatible C
- Problem 2: sometimes we need control over memory management → Solution: type-safe memory management
Can we have compatibility, type safety, and low-level control? We can get 2 out of 3:
- CCured [Necula et al. 2002]: emphasis on compatibility, memory safety
- Cyclone [Jim et al. 2002]: emphasis on low-level control, type safety
CCured [Necula, 2002]
Another type-safe C dialect, with different pointer classes:
- DYNAMIC: no static information; slow; all accesses checked
- SAFE: a memory- and type-safe pointer (or null)
- SEQ: a pointer to an array of data (like Cyclone's fat pointers)
[Diagram: SAFE and SEQ pointers inhabit the type-safe world; DYNAMIC pointers inhabit the memory-safe world]
Nonmodular, but produces fast C; the CCured converter is based on the BANE constraint-solving framework.
Code certification mechanisms
Problem: can you trust the code you run?
Code signing using digital signatures:
- too many signers; if you can't trust Microsoft,…
Idea: self-certifying code
- the code consumer can check the code itself to ensure it's safe
- the code includes annotations to make this feasible; checking annotations is easier than producing them
- a certifying compiler generates self-certifying code
- Java/JVM: the first real demonstration of the idea
PCC:
[Diagram: on the producer side, an optimizer emits machine code and a prover (which could be you) constructs invariants and a proof; together these form the "certified binary". On the consumer side, a verifier checks the binary against the security policy before it runs on the system; only the verifier and the policy sit in the trusted computing base.]
Making "Proof" Rigorous
Specify the machine-code semantics and the security policy using an axiomatic semantics:
 {Pre} ld r2, r1(i) {Post}
Given:
- a security policy (i.e., an axiomatic semantics and an associated logic for assertions)
- untrusted code, annotated with (loop) invariants
it's possible to calculate a verification condition: an assertion A such that if A is true, then the code respects the policy.
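For a concrete (hedged) instance, assume the policy requires every load to read inside a readable region [lo, hi]. Then for
 {Pre} ld r2, r1(i) {Post}
the generated condition is roughly
 Pre ⇒ (lo ≤ r1 + i ≤ hi ∧ Post[r2 := mem[r1 + i]])
i.e., the precondition must imply both that the address is readable and that the postcondition holds after the load. Discharging this implication for every instruction (using the loop invariants at back edges) shows the whole program respects the policy.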
Producer side
The code producer takes its code and the policy, then:
- constructs some loop invariants
- constructs the verification condition A from the code, policy, and loop invariants
- constructs a proof that A is true
The code, invariants, and proof together are shipped as the "certified binary".
Proof-carrying code
[Diagram, consumer side: the target code and the safety policy feed a verification-condition generator, which emits safety conditions; a proof checker then checks the supplied safety proof against those conditions.]
Code consumer side
Verifier (~5 pages of C code):
- takes the code, loop invariants, and policy
- calculates the verification condition A
- checks that the proof is a valid proof of A:
  - fails if some step doesn't follow from an axiom or inference rule
  - fails if the proof is valid, but is not a proof of A
Advantages of PCC
A generic architecture for providing and checking safety properties.
In principle:
- simple, small, and fast TCB
- no external authentication or cryptography
- no additional run-time checks
- "tamper-proof"
- precise and expressive specification of code safety policies
In practice: it is still hard to generate proofs for properties stronger than type safety; a certifying compiler is needed…
Summary
Extensible, networked systems need language-based security mechanisms.
- Analyze and transform programs before running: software fault isolation, inlined reference monitors
- Use safe programming languages to defend against attacks and for fine-grained access control: Java, C#, CCured, Cyclone
- Certifying compilation to simplify the TCB: JVM, TAL, PCC
- Static analysis of end-to-end system security: noninterference, static information flow, …
Other work and future challenges
- Lightweight, effective static analyses for common attacks (buffer overflows, format strings, etc.)
- Security types for secrecy in network protocols
- Self-certifying low-level code for object-oriented languages
- Applying interesting policies to PCC/IRM
- Secure information flow in concurrent systems
- Enforcing availability policies
See the website for a bibliography and more tutorial information: www.cs.cornell.edu/info/People/jgm/pldi03
Making the case
- Static analysis is one-time, but has poor coverage
- Runtime checks have good coverage, but per-variable checks are inefficient
- Type-based safety is efficient, but can be coarse-grained
Motivation: Integer Overflow Vulnerabilities and Attacks
A new vulnerability class:
- recently seen in openssh, pine, Sun RPC, and several other software packages
- cause: an attacker-controlled integer variable
void *alloc_mem(u_short size)
{
    u_short pad_size = 16;
    size = size + pad_size;   /* 16-bit unsigned arithmetic wraps around */
    return malloc(size);
}

With size = 65535, the addition wraps and size becomes 15 (!!), so malloc returns a much smaller block of memory than the caller expects.
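A hedged sketch of a guarded version (the wrapper name is ours; the check simply follows the 16-bit limit):

#include <limits.h>
#include <stdlib.h>

void *alloc_mem_checked(unsigned short size)
{
    const unsigned short pad_size = 16;
    /* Reject sizes whose padded value would wrap around in 16 bits. */
    if (size > USHRT_MAX - pad_size)
        return NULL;
    return malloc((size_t)size + pad_size);
}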
Program Security Is NOT Portable!
[Diagram: the same source or binary code judged safe on a 32-bit runtime environment can be unsafe on a 16-bit one.]
Buffer Overflows And Runtime Environment
Intel x86:
- the immediate caller's return address is on the stack
- for a successful attack, the callee has to return
- variable-width instruction set
PowerPC:
- the immediate caller's return address is in the link register
- for a successful attack, both the callee and the caller have to return
- fixed-width instruction set
NOTE: StackGuard has only been implemented for the x86 architecture. Porting to other architectures should be easy, but we haven't done it yet.
Observations
- Vulnerabilities are specific to a runtime environment
- CERT incident reports contain information such as architecture, software distribution, version, etc.
- Programming-language-level analysis is not adequate:
  - machine word size determines the behavior of numerical types
  - the operating system and compiler determine memory layout
Approach Outline
- Infer safety properties in the context of the runtime environment
- Enforce these properties (in the spirit of Java, except without a JVM)

Runtime Environment Driven Program Safety
Basic methodology: a type-based safety approach
- runtime-dependent interpretation: not merely an abstraction, but using actual values
- no new types
- also, can be efficient
Prototype implementation: ARCHERR
- implemented as a parser using flex and bison
- currently works on the 32-bit Intel/Linux platform
Detecting Integer Overflows
Machine word size is an important factor.
Main idea: analyze assignment and arithmetic operations in the context of the machine word size.
[Diagram: the same code on a 16-bit word size versus a 32-bit word size; the examples named were the Intel XScale processor (now in a 32-bit version) and the Intel Pentium processor]
Integers: Classical View
 x : int → x ∈ I, where I = (−∞, +∞)
Assignment:
 x, y : int ⇒ x = y
Arithmetic:
 succ(x : int) = (x + 1)
 pred(x : int) = (x − 1)
Integer Arithmetic Safety Checks
 if x ≥ 0 and y ≥ 0, then x + y asserts: x ≤ (MAXINT − y)
 if x ≥ 0 and y < 0, then x − y asserts: x ≤ (MAXINT + y)
 if x < 0 and y ≥ 0, then x − y asserts: x ≥ (MININT + y)
 if x < 0 and y < 0, then x + y asserts: x ≥ (MININT − y)
 x × y asserts: x ≥ MININT/y ∧ x ≤ MAXINT/y
 x / y asserts: y ≠ 0
 x % y asserts: y ≠ 0
Other Numerical Types
- short, long, unsigned short/long, etc.: similar analysis
- float, double, long double: floating-point values use a standard IEEE format; the analysis is more complex, but floating-point arithmetic is discouraged for efficiency reasons anyway
Other Operators
- Bitwise operators: << is multiplication by 2 (and can overflow); >> is division by 2 (and is safe)
- Logical operators? Not exactly arithmetic in nature
In A Program?

int foo(int x, int y)
{
    VALIDATE_ADD_INT(x, y);
    return (x + y);
}

Should the inserted check be the 16-bit check or the 32-bit check? That depends on the runtime environment.
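A minimal sketch of what VALIDATE_ADD_INT might expand to, following the signed-add rules above (ARCHERR's actual generated code is not shown in the talk, so this expansion is an assumption):

#include <limits.h>
#include <stdlib.h>

/* INT_MAX/INT_MIN come from the compilation target, which is what makes
   the check runtime-environment specific: a 16-bit int substitutes
   32767/-32768 here, a 32-bit int 2147483647/-2147483648. */
#define VALIDATE_ADD_INT(x, y)                              \
    do {                                                    \
        if (((y) >= 0 && (x) > INT_MAX - (y)) ||            \
            ((y) <  0 && (x) < INT_MIN - (y)))              \
            abort();   /* x + y would overflow: halt */     \
    } while (0)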
Annotations are inserted at compile time; the checking happens at run time.
A High-Level View
What have we actually achieved?
[Diagram: across runtime environments RE 1 and RE 2, a programmer's view versus an attacker's view; types retain their classical-sense properties, with automatic safety conversion]
Extending the Idea To Pointers
Common concept of segments – data, text, stack – but differences in the actual layout.
[Diagram: process address maps for Windows NT and Linux; on Linux, user space spans 0 GB up to 3 GB (0xBFFFFFFF), with system space above it up to 4 GB (0xFFFFFFFF)]
Similarities/Differences Between Integers and Pointers
- Pointers are represented as unsigned ints: one machine word represents the entire virtual address space
- Both pointers and integers have a type
- However, arithmetic is different: integers increment by 1; pointers increment by the size of the type they point to
- Valid values they can assume: for integers, just one interval; for pointers, several disjoint intervals
Bookkeeping Examples

char buffer[256];
char *p;

int main() {
    ADD_GLOBAL(buffer, sizeof(buffer));
    ADD_GLOBAL(&p, sizeof(p));
    …
}

int foo() {
    char buffer[256];
    ADD_STACK_FRAME();
    …
    DEL_STACK_FRAME();
    return 0;
}
Reading stack frames is currently implemented through the ESP and EBP register values.
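A hedged sketch of how the frame bounds might be read on 32-bit x86 with GCC (the helper name and bookkeeping are assumptions, not ARCHERR's published API):

static void record_stack_frame(void)
{
    void *ebp = __builtin_frame_address(0);            /* frame base (EBP) */
    void *esp;
    __asm__ volatile ("movl %%esp, %0" : "=r" (esp));  /* stack top (ESP)  */
    /* ... register [esp, ebp] as a valid stack address range ... */
    (void)ebp; (void)esp;
}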
Pointers: A Runtime-Dependent View
- Safe pointer assignment: a pointer variable p pointing to values of a given type must hold an address that falls within one of the valid address intervals recorded for the program
- Safe pointer arithmetic: the result of the arithmetic must obey the same rule
Pointer Check Examples

VALIDATE_PTR(q);
p = q;
  → is q a valid pointer? does [q, q + sizeof(*q)) lie inside a single valid range?

VALIDATE_PTR(&p[i]);
p[i] = 2;
  → is &p[i] a valid pointer? does [&p[i], &p[i] + sizeof(*(&p[i]))) lie inside a single valid range?

VALIDATE_PTR_ADD(p, 1);
p++;
  → is p + 1 a valid pointer, and does it belong to the same address range?
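A hedged sketch of the underlying dereference check (the range-list representation here is an assumption; ARCHERR keeps separate FIFO lists for the stack, data, and heap, as noted under Optimizations):

#include <stddef.h>

struct range { const char *start, *end; };  /* valid interval [start, end) */

/* An access of `size` bytes at `p` is allowed only if [p, p + size)
   lies entirely inside one recorded valid range. */
static int ptr_valid(const struct range *ranges, size_t n,
                     const void *p, size_t size)
{
    const char *lo = (const char *)p;
    for (size_t i = 0; i < n; i++)
        if (lo >= ranges[i].start && lo + size <= ranges[i].end)
            return 1;
    return 0;
}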
Additional Pointer Issues
- Function pointers: if not protected, they can lead to arbitrary code execution; maintain a separate list of function addresses and check against it
- Typecasting is a feature of C: private fields in structures are hidden behind void *, which leads to dynamic types
Optimizations
- Remove unnecessary checks using static analysis (currently for integer arithmetic)
- Speed up memory range lookups: maintain separate FIFO lists for the stack, data, and heap
- Pointer assignment is "safe"; dereferencing is not, so checks are needed only at dereferences
- Optimize initialization loops
Security Testing
Does this approach actually work? Tested on real-world programs, with vulnerabilities and exploits available at the SecurityFocus website.

Program              Vulnerability                  Detected?
sendmail (8.11.6)    Stack-based buffer overflow    YES
GNU indent (2.2.9)   Heap-based buffer overflow     YES
man (1.5.1)          Format string                  NO
pine (4.56)          Integer overflow               YES
Performance Testing
Scimark2 benchmark on 32-bit Intel/Linux 2.6.5, compared against CCured and BoundsChecker.

Performance hit (slowdown):
CCured                         1.5x
BoundsChecker                  35x
ARCHERR w/o pointer checks     2.3x
ARCHERR with pointer checks    2.5x
Impact On Code Size
Source code annotations cause bloat:
Source code bloat      1.5 – 2.5x
Runtime image bloat    1.2 – 1.4x
Limitations
- Source code annotations: no guarantees if code is not annotated; for well-documented binary libraries, write wrappers (e.g., validating that dst can hold 32 bytes before calling strncpy(dst, src, 32))
- May be considered inefficient for high-performance applications: security has a price, but it is not too high
Features
- Portable safety is runtime-environment dependent
- First work to show a systematic way to detect/prevent integer overflow attacks (currently on one architecture)
- Extended the idea to detect/prevent memory-based attacks (again, on one architecture)
- Security testing and performance evaluation
The comparison revisited, with ARCHERR added:

Static Analysis  – Pros: one-time effort
                 – Cons: undecidability of aliasing; false negatives
                 – Tools: BOON, CQUAL
Type-Based       – Pros: one-time effort; efficient
                 – Cons: weak type system; arbitrary typecasting
                 – Tools: Cyclone, CCured
Runtime Checks   – Pros: coverage; few false positives
                 – Cons: inefficient
                 – Tools: BoundsChecker, StackGuard, ARCHERR
Current Status And Future Work
- Code to be released soon (currently research grade)
- Investigating implementation on other runtime environments: 32-bit Intel/Windows PE32, 32-bit Intel/FreeBSD ELF, 32-bit SPARC/ELF
- Improve efficiency?
  - rndARCHERR – randomized runtime checks
  - static-analysis-driven optimizations