Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware

Preview:

Citation preview

• Co-founder and Chief Scientist at Lastline, Inc.– Lastline offers protection against zero-day threats and advanced

malware

• Professor in Computer Science at UC Santa Barbara– many systems security papers in academic conferences

• Part of Shellphish

Who are we?

• PhD Student at UC Santa Barbara– research focused primarily on binary security and embedded devices

• Part of Shellphish– team leader of Shellphish's effort in the DARPA Cyber Grand

Challenge

• Doesn't like peanut butter

Who are we?

- firmware- binary analysis- angr

What are we talking about?

The “Internet of Things”

Embedded software is everywhere

• Embedded Linux and user-space programs

• Custom OS and custom programs combined together in a binary blob– typically, the binary is all that you get– and, sometimes, it is not easy to get this off the device

What is on embedded devices?

Binary analysisnoun | bi·na·ry anal·y·sis | \ˈbī-nə-rē ə-ˈna-lə-səs\

1. The process of automatically deriving properties about the behavior of binary programs

2. Including static binary analysis and dynamic binary analysis

Binary Analysis

• Program verification• Program testing• Vulnerability excavation• Vulnerability signature generation

• Reverse engineering• Vulnerability excavation• Exploit generation

Goals of Binary Analysis

– reason over multiple (all) execution paths– can achieve excellent coverage– precision versus scalability trade-off

• very precise analysis can be slow and not scalable• too much approximation leads to wrong results (false positives)

– often works on abstract program model• for example, binary code is lifted to an intermediate representation

Static Binary Analysis

– examine individual program paths– very precise– coverage is (very) limited– sometimes hard to properly run program

• hard to attach debugger to embedded system• when code is extracted and emulated, what happens with calls to

peripherals?

Dynamic Binary Analysis

• Get the binary code

• Binaries lack significant information present in source

• Often no clear library or operating system abstractionso where to start the analysis from?o hard to handle environment interactions

Challenges of Static Binary Analysis

From Source to Binary Code

compile link

strip

From Source to Binary Code

compile link

striptype info

function names

variable names

jump targets

• (Linux) system call interface is great– you know what the I/O routines are

• important to understand what user can influence– you have typed parameters and return values– lets the analysis focus on (much smaller) main program

• OS is not there or embedded in binary blob– heuristics to find I/O routines– open challenge to find mostly independent components

Missing OS and Library Abstractions

• Library functions are great– you know what they do and can write a “function summary”– you have typed parameters and return values– lets the analysis focus on (much smaller) main program

• Library functions are embedded (like static linking)– need heuristics to rediscover library functions– IDA FLIRT (Fast Library Identification and Recognition Technology)– more robustness based on looking for control flow similarity

Missing OS and Library Abstractions

• Memory safety vulnerabilities– buffer overrun– out of bounds reads (heartbleed)– write-what-where

• Authentication bypass (backdoors)

• Actuator control!

Types of Vulnerabilities

Linux embedded device: HTTP server for management and video monitoring, with a known backdoor.

Backdoor!!!➔ Username: 3sadmin➔ Password: 27988303

Heffner, Craig. "Finding and Reversing Backdoors in Consumer Firmware." EELive! (2014).

Motivating Example

Authentication Bypass

Prompt

Authentication

Success Failure

Authentication Bypass

Prompt

Authentication

Success Failure

Backdoore.g. strcmp()

Authentication Bypass

Prompt

Authentication

Success Failure

Backdoore.g. strcmp()

Hard to find.

Authentication Bypass

Prompt

Success

Missing!

Modeling Authentication BypassPrompt

Authentication

Success Failure

Backdoore.g. strcmp()

Easier to find!

Hard to find.

Input DeterminismPrompt

Authentication

Success Failure

Backdoore.g. strcmp()

Can we determine the input needed to reach the successfunction, just by analyzing the code?

The answer is NO

Input Determinism

Prompt

Authentication

Success Failure

Backdoore.g. strcmp()

Can we determine the input needed to reach the successfunction, just by analyzing the code?

The answer is YES

Modeling Authentication BypassPrompt

Authentication

Success Failure

Backdoore.g. strcmp()

Easier to find!

But how?

• Without OS/ABI information:

• With ABI information:

Finding “Authenticated Point”

EXEC()

Using Binary Analysis to Hunt for Vulnerabilities

Program

Symbolic ExecutionSecurity policies Security

Policy Checker

POCs

Static Analysis

angr: A Binary Analysis Framework

Static Analysis Routines

Symbolic Execution Engine

Binary Loader

angr

angr: A Binary Analysis Framework

Static Analysis Routines

Symbolic Execution Engine

Binary Loader

angr

"How do I trigger path X or condition Y?"

Symbolic Execution

Input Determinism

Prompt

Authentication

Success Failure

Backdoore.g. strcmp()

Can we determine the input needed to reach the successfunction, just by analyzing the code?

"How do I trigger path X or condition Y?"

- Dynamic analysis- Input A? No. Input B? No. Input C? …- Based on concrete inputs to application.

- (Concrete) static analysis- "You can't"/"You might be able to"- Based on various static techniques.

We need something slightly different.

Symbolic Execution

"How do I trigger path X or condition Y?"

1. Interpret the application.2. Track "constraints" on variables.3. When the required condition is triggered,

"concretize" to obtain a possible input.

Symbolic Execution

Yan Shoshitaishvili
add simple example

Constraint solving:

❏ Conversion from set of constraints to set of concrete values that satisfy them.

❏ NP-complete, in general.

Constraints

x >= 10x < 100

x = 42

Symbolic Execution

Yan Shoshitaishvili
fix weird path

Pros

- Precise- No false positives (with

correct environment model)

- Produces directly-actionable inputs

Symbolic Execution - Pros and ConsCons

- Not scalable- constraint solving is np-

complete- path explosion

Our Case

Worst-Case

Worst-Case

angr: A Binary Analysis Framework

Static Analysis Routines

Symbolic Execution Engine

Binary Loader

angr

angr: A Binary Analysis Framework

Control-Flow GraphData-Flow AnalysisValue-Set Analysis

Static Analysis Routines

Symbolic Execution Engine

Binary Loader

angr

angr: A Binary Analysis Framework

Control-Flow GraphData-Flow AnalysisValue-Set Analysis

Static Analysis Routines

Symbolic Execution Engine

Binary Loader

angr

angr: A Binary Analysis Framework

Control-Flow GraphData-Flow AnalysisValue-Set Analysis

Static Analysis Routines

Symbolic Execution Engine

Binary Loader

angr

angr: A Binary Analysis Framework

Control-Flow GraphData-Flow AnalysisValue-Set Analysis

Static Analysis Routines

Symbolic Execution Engine

Binary Loader

angr

Example

cmp rbx, 0x1024 ja _OUT cmp [rax+rbx], 1337 je _OUT add rbx, 4

rbx?

What is rbx in the yellow square?mov rax, 0x400000mov rbx, 0

Symbolic execution: state explosion

Naive static analysis: "anything"

Range analysis: "< 0x1024"

Can we do better?

Memory access checks Type inference

Variable recovery Range recoveryWrapped-interval analysis

Value-set analysisAbstract interpretation

Value Set Analysis

Value Set Analysis - Strided Intervals4[0x100, 0x120],32

Stride Low High Size

0x100 0x10c 0x118

0x104 0x110 0x11c

0x108 0x114 0x120

Example

cmp rbx, 0x1024 ja _OUT cmp [rax+rbx], 1337 je _OUT add rbx, 4

rbx?

What is rbx in the yellow square?

1. 1[0x0, 0x0],64

2. 4[0x0, 0x4],64

3. 4[0x0, 0x8],64

4. 4[0x0, 0xc],64

5. 4[0x0, ∞],64

6. 4[0x0, 0x1024],64

mov rax, 0x400000mov rbx, 0

Widen

Narrow

1

234 56

angr: A Binary Analysis Framework

Control-Flow GraphData-Flow AnalysisValue-Set Analysis

Static Analysis Routines

Symbolic Execution Engine

Binary Loader

angr

CBvulnerable program

RBpatched program

POVexploit

CyberReasoning

System

The Cyber Grand Challenge!

The Shellphish CRS

CB

ProposedRBs

Autonomous vulnerability

scanning

Autonomous service

resiliency

PCAP

Test cases

POV

RB

Autonomous processing

Autonomous patching

ProposedPOVs

The Shellphish CRS

CB

ProposedRBs

Autonomous vulnerability

scanning

Autonomous service

resiliency

PCAP

Test cases

POV

RB

Autonomous processing

Autonomous patching

ProposedPOVs

- ipython-accessible- powerful analyses- versatile- well-encapsulated- open and expandable- architecture "independent"

Angr

Angr Mini-howto# ipython

In [1]: import angr, networkxIn [2]: binary = angr.Project("/some/binary")In [3]: cfg = binary.analyses.CFG()In [4]: networkx.draw(cfg.graph)

In [5]: explorer = binary.factory.path_group()In [6]: explorer.explore(find=0xc001b000)

➔ http://angr.io ➔ https://github.com/angr➔ angr@lists.cs.ucsb.edu

Pull requests, issues, questions, etc super-welcome! Let's bring on the next generation of binary analysis!

Angr - Open source!

Birthday: September 2013Total line numbers: 59950Total commits: ALMOST 9000!! (actually ~6000)

Value Set Analysis - Strided Intervals

4[0x100, 0x110],32 + 1 = 4[0x101, 0x111],32

4[0x100, 0x110],32 >> 1 = 2[0x80, 0x88],31

2[0x80, 0x88],31 << 1 = 1[0x100, 0x110],32

4[0x100, 0x110],32 ⋃ 4[0x102, 0x112],32 = 2[0x100, 0x112],32

4[0x100, 0x110],32 ⋂ 3[0x100, 0x110],32 = 12[0x100, 0x112],32

WIDEN (4[0x100, 0x110],32 ⋃ 4[0x100, 0x112],32) = 4[0x100, ∞],32

Recommended