53
Analysis and Visualization of Common Packers HITBSecConf2008 - Kuala Lumpur Ero Carrera - [email protected] Reverse Engineer at zynamics GmbH Chief Research Officer at VirusTotal

Analysis and Visualization of Common Packers - … · Analysis and Visualization of Common Packers HITBSecConf2008 ... Windbg Can do kernel-mode ... A Quick Survey on Automatic Unpacking

Embed Size (px)

Citation preview

Analysis and Visualization of Common Packers

HITBSecConf2008 - Kuala LumpurEro Carrera - [email protected]

Reverse Engineer at zynamics GmbH

Chief Research Officer at VirusTotal

Introduction

Originally meant to save space by reducing the redundancy in executable file formats

Simply compressed parts or the whole of the executable

Created a new "envelope" around it that restored the original executable and the passed control to it

The decompressing envelope did not much more than just restoring the executable

An historical perspective

Compression provided a trivial degree of obfuscation, but obfuscation nonetheless

Was easy to add additional measures in the decompressing envelope

Evolution of the techniques

Overview of the techniques

Import address table

Simple late reconstruction into an original form

Construction of new connectivity artifacts between the original code and imported modules

Strings

Destruction of informational components

Aimed at making tracing hard

Using SEHs triggered by hard to handle exceptions

Confuse debuggers throwing INTs they use

Calling hard-to-hook low level APIs/syscalls

Checking for hooks

Anti-debug

VM detection.

VMWare, VirtualPC, etc

Techniques aimed against specific tools

OllyDBG, IDA, Softice, etc

Anti-environment

Tricks detecting, confusing or aimed at crashing some of the most common tools

IDA

OllyDBG

Procdump

Softice

Breaking tools

Code obfuscation

Adding junk code, using opaque predicates

Code transformation

Virtual machines

Flow obfuscation (SEH, Nanomites)

Anti-analysis

Bochs

Provides with a high-level view

No need to worry about most of the anti-* techniques

Windbg

Can do kernel-mode debugging, hook syscalls, look deeper that user-mode tools

Inspection of physical/virtual memory

Memoryze, for the real hardcore

Tools

Obfuscation & Anti-Analysis

Most tools will linearly disassemble a chunk of code

Introduce and non-terminal flow branching instruction (not a ret or jmp)

Make it point later in the code, to the middle of what would be an instruction if disassembling linearly

Result => confusion

(latest IDA, 5.3, has some workarounds against this)

Also: indirect obfuscation through heavy optimization

Basic trickery against analysis algorithms

0101F001 60________ pusha

0101F002 E803000000 call near ptr loc_101F007+1

0101F007 E9EB045D45 jmp near ptr 465EF4F7h

Example: ASPack (original)

0101F001 60________ pusha

0101F002 E803000000 call loc_101F008

0101F007 E9________ db 0E9h ; T

0101F008 EB04______ jmp short loc_101F00E

Example: ASPack (fixed)

0DFE000000 or eax, 0FEh

3D02000000 cmp eax, 2

E901000000 jmp 0x1

75B8______ jnz short near ptr 0FFFFFFC9h

F3C001C0__ rep rol byte ptr [ecx], 0C0h

C3________ ret

Example: Linear Disassembly

0DFE000000 or eax, 0FEh

3D02000000 cmp eax, 2

E901000000 jmp 0x1 // HERE

75________ db 0x75

HERE: B8F3C001C0 mov eax, 0xc001c0f3

C3________ ret

Example: Linear Disassembly

0DFE000000 and eax, 2

3D02000000 cmp eax, 2

7401______ jz 0x01

75B8______ jnz short near ptr 0FFFFFFC6h

F3C001C0__ rep rol byte ptr [ecx], 0C0h

C3________ ret

Example: Opaque Predicate

0DFE000000 and eax, 2

3D02000000 cmp eax, 2

7401______ jz HERE

75________ db 0x75

HERE: B8F3C001C0 mov eax, 0xc001c0f3

C3________ ret

Example: Opaque Predicate

FunctionExecutable Image

Memory Page

Memory Page

Memory Page

Function Chunk

Function Chunk

Function Chunk

Function Chunk

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

Function Chunk

FunctionExecutable Image

Memory Page

Memory Page

Memory Page

Function Chunk

Function Chunk

Function Chunk

Function Chunk

Function Chunk

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

address instruction (operand, ...)

address instruction (operand, ...)

address instruction (operand, ...)

...

Shared Blocks

Function A Function Baddress ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

address ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

address ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

address ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

address ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

address ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

address ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

address ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

address ��������� (������ , ...)

address ��������� (������ , ...)

address ��������� (������ , ...)

...

Junk. Polymorphic and static

pusha

popa

Non-Standard Branching

Junk JMP insertion

Junk Code

018900CE 48________ dec eax

018900CF 60________ pusha

018900D0 B9FBDEF000 mov ecx, 0F0DEFBh

018900D5 50________ push eax

018900D6 9C________ pushf

018900D7 E912000000 jmp loc_18900EE

018900D7 [junk data]

Exmple: Junk Code I (Themida)

018900EE loc_18900EE:

018900EE E90E000000__ jmp loc_1890101

018900EE [junk data]

01890101 loc_1890101:

01890101 9D__________ popf

01890102 5E__________ pop esi

01890103 61__________ popa

01890104 0F844E06FA7A jz loc_7C830758

Exmple: Junk Code II (Themida)

Visual Basic, Java, Python, Ruby, Perl, .NET

Starforce, VMProtect, x86 Virtualizer, Themida/CodeVirtualizer

At a high-level it’s a: fetch, decode, handle algorithm

Virtual Machines

Real CPU

Virtual CPU

Virtualized Code

Real CPU

Standard Code

Runs Runs

Runs

Virtual CPU opcodes

Virtual CPU

Decoder

opA reg1, reg2

opB reg2

branchA XYZ

Real CPU opcodes

handler for opA

Fetch Instruction Pointer

1

handler for opB

handler for opC

handler for branchA

Execute handlerRegistersUpdate registers

3

4

Registers

-General Purpose

-Instruction Pointer

-Stack pointer

Decode2

...

Decoder

-Look up operand in table

-Call handler

Rolf Rolles and Boris Lau have already shown that optimization/reduction techniques can help

Translating to an intermediate representation and performing optimization in the code leads to reduced forms

You could use a tool like Peter’s “Find executable code” to discover instruction handlers from a memory dump of the VM

Virtual Machine Countermeasures

Some of the hardest current packers are VMProtect, Themida, Armadillo

They incorporate some complex, custom techniques

Usually commercial products protectors

Advanced Packers

Armadillo

Double process debugging, debug blocker

Nanomites

Strategic Code Splicing

Armadillo's invalid instructions

LOCK prefix

Invalid MOV

Armadillo

Parent process Child processDebug

INT 3Parent catches it

Look up address

Find target

Set target in child

context

Resume child Child process

INT 3Parent

catches it

Child's code

push ebp

mov ebp, esp

push 0

push 0

call XYZ

cmp eax, 0

INT 3

.

.

.

.

mov [UWZ], 0xff

pop ebp

ret

Transfer control

Debug

Transfer control

Themida

The general algorithm can be summarized as:

Retrieve the API's function body

Perform a basic analysis and disassembly

Reconstruct the API's function body inserting junk in between each of the real instructions

Re-assemble functionality, keep the semantics, change the syntax

Themida's API obfuscation

Executable Imported DLL

Standard Imports

Executable Imported DLL

Exported DLL Function

Internal DLL Function

A function references other code

Themida protected executable

DLL Function (Obfuscated)

Imported DLL

Exported DLL Function

Internal DLL Function

Some of the references are kept

The algorithm has limitations

References to other functions within the DLL are kept

Same for true branches of conditional branches

Those two points can allow us to do API discovery by studying their connectivity

Reconstruction

Adds lots of branching and junk

Keeps few "real" instructions per obfuscated block

IDA can “easily” deal with the branching

Although bogus calls break IDA analysis and lead to broken obfuscated functions

Some scripting can make this look better

Themida's obfuscation

Packing vs unpacking

Packing is not always a symmetric proces, sometimes it can't be undone perfectly

You won’t get the original process back

Can it be done generically? Some cases the answer is "mostly" yes

You will mostly always be able to obtain code close to its original form

Current state

skape documented an elegant trick on uninformed 10 a few weeks ago

Attacks a basic heuristic used by most “generic” unpackers

Tracking execution transfer to “dirty-memory”

Recent techniques

Virtual Address Range A Virtual Address Range A

WRITE

TIME

Virtual Address Range A

WRITE

Virtual Address Range A

MMU

Virtual Address Range A

Physical Memory

Virtual Address Range A

MMU

Virtual Address Range A

Physical Memory

WRITE

EXECUTE

Windbg can see the mappings from virtual to physical

Not to hard to spot doubled mapped regions

Bochs and other low level emulators can easily do it as well

Requires kernel-mode access or “higher”

Countermeasures

Reversing. Secrets of Reverse Engineering. Eldad Eilam

Déprotection semi-automatique de binaire, Yoann Guillot & Alexandre Gazet

http://metasm.cr0.org/SSTIC08-article-Guillot_Gazet-Deprotection_Semi_Automatique_Binaire.pdf

Virtual Machine Threats , Peter Ferrie

http://www.symantec.com/avcenter/reference/Virtual_Machine_Threats.pdf

References

A Quick Survey on Automatic Unpacking Techniques, Daniel Reynaud

http://indefinitestudies.wordpress.com/2008/09/25/automatic-unpacking/

Using dual-mappings to evade automated unpackers, skape

http://www.uninformed.org/?v=10&a=1

Dealing with Virtualization packer, Boris Lau

http://www.datasecurity-event.com/uploads/boris_lau_virtualization_obfs.pdf

References II

Rolf Rolles blog in OpenRCE

https://www.openrce.org/blog/browse/RolfRolles

Oreans Themida/CodeVirtualizer

http://www.oreans.com

ReWolf's x86 Virtualizer

http://www.openrce.org/blog/view/847/x86_Virtualizer_-_source_code

References III

VMProtect

http://www.vmprotect.ru/

Deroko's Nanomite's write up

http://www.phearless.org/i3/Nanomites_And_Misc_Stuff.txt

Memoryze, Mandiant

http://www.mandiant.com/software/memoryze.htm

References IV

Thanks Dhillon & the HitB crew!

Q&A