On the Viability of Memory Forensics in Compromised Environments

Zur Praktikabilität von Hauptspeicherforensik in kompromittierten Umgebungen

Submitted to the Faculty of Engineering of Friedrich-Alexander-Universität Erlangen-Nürnberg

for the degree of

DOKTOR-INGENIEUR

by

Johannes Stuettgen

from Herdecke

Approved as a dissertation by the Faculty of Engineering of Friedrich-Alexander-Universität Erlangen-Nürnberg

Date of the oral examination: 28.05.2015

Chair of the doctoral committee: Prof. Dr.-Ing. habil. Marion Merklein

Reviewers: Prof. Dr.-Ing. Felix Freiling, Prof. Dr. Michael Meier

Abstract

Memory forensics has become a powerful tool for the detection and analysis of malicious software. It provides investigators with an impartial view of a system, exposing hidden processes, threads, and network connections, by acquiring and analyzing physical memory. Because malicious software must be at least partially resident in memory in order to execute, it cannot remove all its traces from RAM. However, the memory acquisition process is vulnerable to subversion in compromised environments. Malicious software can employ anti-forensic techniques to intercept the acquisition and filter memory contents while they are copied.

In this thesis, we analyze 12 popular memory acquisition tools for Windows, Linux, and Mac OS X, and study their implementation with regard to how they enumerate and map memory. We find that all of the analyzed programs use the operating system to perform these tasks, and further illustrate this by implementing an open source memory acquisition framework for Mac OS X. In a survey of kernel rootkit techniques that prevent or filter physical memory access, we show that all 12 tested programs are vulnerable to anti-forensics, because they rely on the operating system for critical functions.

To eliminate this vulnerability, we develop an operating system independent approach that directly utilizes the hardware to enumerate and map memory. By interacting with the PCI controller, we are able to safely avoid memory mapped device buffers while acquiring the entire physical address space. We program the page tables directly to map memory, forcing the MMU to facilitate arbitrary physical memory access from our driver’s data segment. We implement our techniques in the open source memory acquisition frameworks Winpmem, Pmem, and OSXPmem, furthering the capabilities of memory acquisition software on the Windows, Linux, and Mac OS X platforms.

Finally, we apply our novel technique to related problems in memory forensics. Memory acquisition software for Linux can only be run on a system with the exact same kernel version and configuration as the system it was compiled on, due to dependencies on kernel data structures. We are able to create a minimal, kernel independent version of our module, which we inject into a compatible host module on the target. By hijacking the host’s data structures, we are able to load the infected module, redirect control flow, and communicate with it using a character device. A second innovative property of our acquisition approach is that, because we can enumerate the location of memory mapped device buffers, we are able to safely access memory regions unknown to the operating system. This allows us to acquire malicious firmware during the memory acquisition process. We present a survey of firmware code and data in the physical address space, and show how we can capture the BIOS, PCI option ROMs, and the ACPI tables using our approach. We implement plugins for the open source memory analysis framework Volatility, which are able to extract the ACPI tables from memory and analyze them for malicious behavior.

Summary

Memory forensics has developed into a powerful tool for the detection and analysis of malicious software. It provides investigators with an objective view of computer systems, with which hidden artifacts such as processes and network connections can be exposed by analyzing the contents of main memory. Because malicious software must be at least partially present in main memory in order to execute, it is impossible to remove all traces of an infection from memory at runtime. In compromised environments, however, there is a risk that access to main memory is subverted by anti-forensic methods.

In this thesis, we analyze the implementation of 12 widely used tools for creating memory images on Windows, Linux, and Mac OS X. Our investigation shows that all of these programs use the operating system to carry out critical tasks, which we illustrate by implementing the program OSXPmem. In a survey, we then give an overview of the various anti-forensic techniques at the operating system level, and show in an experiment that all 12 analyzed programs are vulnerable to anti-forensics.

We close this gap by developing operating system independent techniques for memory acquisition. By interacting directly with the PCI controller, we identify devices mapped into the physical address space, which allows us to safely access the remaining memory. We map this memory into the address space of our application by manipulating the data structures of the MMU. We implement our techniques in the open source programs Winpmem, Pmem, and OSXPmem, which enables investigators to use them on Windows, Linux, and Mac OS X.

Finally, we use our techniques to solve two further problems in memory forensics. Memory acquisition programs for Linux only run on systems with exactly the same kernel version and configuration they were compiled with, because they load a kernel module that depends on the kernel’s data structures. We solve this problem by injecting a minimal version of our module into a compatible “victim” module on the target system. To communicate with the kernel, we repurpose the victim’s data structures, which allows us to use our program on a wide range of Linux systems without recompiling it. The second innovative property of our approach is that we can safely access memory regions unknown to the operating system, because we know the location of the devices mapped into the address space. This allows us to access the firmware in the course of a memory investigation. We give an overview of the location of the BIOS, PCI option ROMs, and the ACPI tables in the physical address space, and implement techniques for the acquisition and analysis of firmware for the open source memory analysis software Volatility.

Acknowledgments

This thesis would not have been possible without the support of others. First and foremost, I would like to thank my supervisor Felix Freiling for his continuous advice and support during my time at the Security Research Group of the Department of Computer Science in Erlangen. Many thanks also go to Michael Meier, from the University of Bonn, for agreeing to be my second supervisor. I also thank my colleagues at the Security Research Group for a cheerful and friendly working atmosphere.

I would also like to extend my thanks to the Google Incident Response Team, for many interesting discussions and an exciting working environment. A special thank you goes to Michael Cohen, who, with his guidance and inspiration, facilitated two of the three papers this thesis is built upon.

In addition, I want to thank the following people (in alphabetical order) for helping me proofread this thesis and forge it into something legible: Michael Gruhn, Tilo Müller, Ben Stock, Heiner Stüttgen, and Stefan Vömel.

Finally, I want to thank Mathieu Suiche, for his commitment to forensic tool testing and for providing me with an evaluation version of Moonsols Windows Memory Toolkit, which allowed me to also test a commercial tool for anti-forensic resilience.

Contents

1 Introduction
1.1 Contributions
1.2 Related Work
1.3 Publications
1.4 Outline

2 Technical Background
2.1 x86 Architecture
2.1.1 The Physical Address Space
2.1.2 Memory Protection
2.1.3 The PCI Express Bus
2.2 Linux Kernel Modules
2.2.1 Module Binary Organization
2.2.2 Linking and Loading
2.3 System Firmware
2.3.1 Basic Input Output System
2.3.2 (Unified) Extensible Firmware Interface
2.3.3 PCI Option ROMs
2.3.4 Advanced Configuration and Power Interface
2.4 Summary

3 Memory Acquisition
3.1 Principles of Memory Acquisition
3.1.1 Criteria for Sound Memory Acquisition
3.1.2 Correctness of Existing Memory Acquisition Tools
3.1.3 Memory Image Formats
3.2 Software Memory Acquisition Techniques
3.2.1 Memory Acquisition Challenges
3.2.2 Operating System Memory Interfaces
3.2.3 Driver-Based Memory Acquisition
3.3 Summary

4 Anti-Memory Forensics
4.1 Anti-Forensic Techniques
4.2 Attacks on Memory Acquisition Software
4.2.1 Windows
4.2.2 Mac OS X
4.2.3 Linux
4.3 Passive Anti-Forensics
4.3.1 Hidden Memory
4.3.2 Evaluation
4.4 Summary

5 Anti-Forensic Resilient Memory Acquisition
5.1 Improving Memory Acquisition
5.1.1 Hardware-based Memory Enumeration
5.1.2 Hardware-based Memory Mapping
5.1.3 Evaluation
5.2 Discussion
5.2.1 Loading of Driver
5.2.2 Interception of Data Buffers
5.2.3 Debug Registers
5.2.4 Shadow Page Tables
5.2.5 Reliability and Stability
5.3 Summary

6 Kernel Independent Memory Acquisition on Linux
6.1 Compatibility of Linux Kernel Modules With Different Kernels
6.1.1 Bypassing Module Version Checking
6.1.2 Requirements for a Stable Approach
6.2 Reliable Loading of Generic Acquisition Modules
6.2.1 Parasitizing a Compatible Module
6.2.2 Code Injection into Kernel Modules
6.3 Redirection of Control Flow
6.3.1 Interception of Module Initialization
6.3.2 Communication with User Mode
6.3.3 Selection of a Suitable Host
6.4 Implementation of a Minimal Acquisition Module
6.5 Summary

7 Acquisition and Analysis of Compromised Firmware
7.1 Rootkit Strategies for Compromising Firmware
7.1.1 BIOS- and EFI-Based Attacks
7.1.2 PCI Option ROM-Based Attacks
7.1.3 ACPI-Based Attacks
7.2 Enumeration of Firmware in the Physical Address Space
7.2.1 Enumeration of the Physical Address Space
7.2.2 Mapping of Memory and Firmware Regions
7.3 Firmware Analysis
7.4 Evaluation
7.4.1 Stability and Correctness of the Acquisition Method
7.4.2 Comparison with Available Memory Acquisition Solutions
7.4.3 Detection of ACPI Rootkits
7.5 Discussion
7.5.1 Technological Limitations
7.5.2 Anti-Forensics
7.6 Summary

8 Conclusion
8.1 Summary
8.2 Future Work

Bibliography

List of Figures

2.1 Organization of the Background Chapter
2.2 Architecture of a North- and South-Bridge Based Chipset
2.3 Architecture of a Modern PCH Based Chipset
2.4 Memory Transaction Routing
2.5 Memory Map of a Haswell System
2.6 Virtual Address Space on an x86-64 System
2.7 Data Structures Involved in Virtual to Physical Address Translation
2.8 PCIe Protocol Layers
2.9 PCIe Architecture
2.10 PCI Configuration Space Addressing
2.11 PCIe Type 0 Configuration Space Header
2.12 PCI 32-Bit MMIO BAR Layout
2.13 PCIe Type 1 Configuration Space Header
2.14 ELF File Layout
2.15 Static vs. Dynamic Linking
2.16 Loading of a Kernel Module
2.17 ACPI Architecture
3.1 Space-Time Diagram of an Atomicity Violation
3.2 Space-Time Diagram of Integrity Violations
4.1 Effects of DKOM on /proc/iomem
4.2 Hidden Memory on Test System with 4 GB RAM
5.1 PTE Remapping Technique
6.1 Initialization of a Kernel Module
6.2 Relocation Hook of module->init
6.3 Relocation Hook of file_operations
7.1 Firmware Memory Ranges
7.2 Views on the Physical Address Space

List of Tables

4.1 Evaluation of Acquisition with Active Anti-Forensics
6.1 Host Modules by Kernel Version
7.1 Firmware Acquisition Capabilities of Memory Forensic Software
7.2 Classification of Operation Regions in the ACPI Test Data Set

Listings

3.1 Identifying Physical Memory Regions in /proc/kcore
3.2 Memory Mapping in OSXPmem
3.3 Accessing the Memory Map in OSXPmem
4.1 Attack on Windows Memory Management APIs
4.2 OS X Memory-Map Overwriting
4.3 DKOM Attack on Linux Memory Map
5.1 PCI BAR Sizing
6.1 Module Data Structure (The Linux Kernel Archives, 2013)

Acronyms

ACPI . . . . . . . . . . . . . . . . Advanced Configuration and Power Interface

AMD . . . . . . . . . . . . . . . . Advanced Micro Devices

AML . . . . . . . . . . . . . . . . ACPI Machine Language

API . . . . . . . . . . . . . . . . . Application Programming Interface

ASL . . . . . . . . . . . . . . . . . ACPI Source Language

ASLR . . . . . . . . . . . . . . . . Address Space Layout Randomization

ATM . . . . . . . . . . . . . . . . Automated Teller Machine

BAR . . . . . . . . . . . . . . . . . Base Address Register

BDF . . . . . . . . . . . . . . . . . Bus, Device, Function

BIOS . . . . . . . . . . . . . . . . Basic Input Output System

BSOD . . . . . . . . . . . . . . . . Blue Screen of Death

CAM . . . . . . . . . . . . . . . . Configuration Access Mechanism

CPU . . . . . . . . . . . . . . . . . Central Processing Unit

CS . . . . . . . . . . . . . . . . . . Code Segment

DKOM . . . . . . . . . . . . . . . Direct Kernel Object Manipulation

DMA . . . . . . . . . . . . . . . . Direct Memory Access

DMI . . . . . . . . . . . . . . . . . Direct Media Interface

DOS . . . . . . . . . . . . . . . . . Disk Operating System

DSDT . . . . . . . . . . . . . . . . Differentiated System Description Table

DTB . . . . . . . . . . . . . . . . . Directory Table Base

DWORD . . . . . . . . . . . . . . Double Word

DXE . . . . . . . . . . . . . . . . . Driver Execution Environment

EBDA . . . . . . . . . . . . . . . . Extended BIOS Data Area

ECAM . . . . . . . . . . . . . . . Enhanced CAM

EFI . . . . . . . . . . . . . . . . . Extensible Firmware Interface

ELF . . . . . . . . . . . . . . . . . Executable and Linkable Format

EPROM . . . . . . . . . . . . . . . Erasable Programmable ROM

FADT . . . . . . . . . . . . . . . . Fixed ACPI Description Table

FSB . . . . . . . . . . . . . . . . . Front Side Bus

GFX . . . . . . . . . . . . . . . . . Graphics

GPU . . . . . . . . . . . . . . . . . Graphical Processing Unit

GTT . . . . . . . . . . . . . . . . . Graphics (GFX) Translation Tables

HAL . . . . . . . . . . . . . . . . . Hardware Abstraction Layer

Haswell . . . . . . . . . . . . . . . Intel 4th Generation Core Architecture

HF . . . . . . . . . . . . . . . . . . High Frequency

I/O . . . . . . . . . . . . . . . . . Input/Output

ID . . . . . . . . . . . . . . . . . . Identifier

IDT . . . . . . . . . . . . . . . . . Interrupt Descriptor Table

iMC . . . . . . . . . . . . . . . . . Integrated Memory Controller

IOMMU . . . . . . . . . . . . . . . Input/Output MMU

IVT . . . . . . . . . . . . . . . . . Interrupt Vector Table

LFSR . . . . . . . . . . . . . . . . Linear Feedback Shift Register

LMAP . . . . . . . . . . . . . . . . Linux Memory Acquisition Parasite

LPC . . . . . . . . . . . . . . . . . Low Pin Count

MBR . . . . . . . . . . . . . . . . Master Boot Record

ME . . . . . . . . . . . . . . . . . Management Engine

MMIO . . . . . . . . . . . . . . . . Memory Mapped Input/Output

MMU . . . . . . . . . . . . . . . . Memory Management Unit

NVRAM . . . . . . . . . . . . . . Non-Volatile RAM

OS . . . . . . . . . . . . . . . . . . Operating System

OS X . . . . . . . . . . . . . . . . Apple Mac OS X

PAM . . . . . . . . . . . . . . . . Programmable Attribute Map

PCH . . . . . . . . . . . . . . . . . Platform Controller Hub

PCI . . . . . . . . . . . . . . . . . Peripheral Component Interconnect

PCIe . . . . . . . . . . . . . . . . . Peripheral Component Interconnect Express

PD . . . . . . . . . . . . . . . . . . Page Directory

PDE . . . . . . . . . . . . . . . . . PD Entry

PDPT . . . . . . . . . . . . . . . . Page Directory Pointer Table

PDPTE . . . . . . . . . . . . . . . PDPT Entry

PEI . . . . . . . . . . . . . . . . . Pre-EFI Initialization

PFN . . . . . . . . . . . . . . . . . Page Frame Number

PLT . . . . . . . . . . . . . . . . . Procedure Linkage Table

PML4 . . . . . . . . . . . . . . . . Page Map Level 4

PML4E . . . . . . . . . . . . . . . PML4 Entry

PMM . . . . . . . . . . . . . . . . POST Memory Manager

POST . . . . . . . . . . . . . . . . Power On Self Test

PS . . . . . . . . . . . . . . . . . . Page Size

PT . . . . . . . . . . . . . . . . . . Page Table

PTE . . . . . . . . . . . . . . . . . PT Entry

RAM . . . . . . . . . . . . . . . . Random Access Memory

ROM . . . . . . . . . . . . . . . . Read Only Memory

RSDP . . . . . . . . . . . . . . . . Root System Description Pointer

RSDT . . . . . . . . . . . . . . . . Root System Description Table

RX . . . . . . . . . . . . . . . . . . Receive

SATA . . . . . . . . . . . . . . . . Serial Advanced Technology Attachment

SEC . . . . . . . . . . . . . . . . . Security

SMI . . . . . . . . . . . . . . . . . System Management Interrupt

SMM . . . . . . . . . . . . . . . . System Management Mode

SMRAM . . . . . . . . . . . . . . System Management RAM

SPI . . . . . . . . . . . . . . . . . Serial Peripheral Interface

TCP/IP . . . . . . . . . . . . . . . Internet protocol suite

TLB . . . . . . . . . . . . . . . . . Translation Lookaside Buffer

TOLUD . . . . . . . . . . . . . . . Top of Lower Usable DRAM

TOUUD . . . . . . . . . . . . . . . Top of Upper Usable DRAM

TSEG . . . . . . . . . . . . . . . . Top of Main Memory Segment

TX . . . . . . . . . . . . . . . . . . Transmit

UEFI . . . . . . . . . . . . . . . . Unified Extensible Firmware Interface

UMA . . . . . . . . . . . . . . . . Uniform Memory Access

USB . . . . . . . . . . . . . . . . . Universal Serial Bus

VFS . . . . . . . . . . . . . . . . . Virtual Filesystem

VMM . . . . . . . . . . . . . . . . Virtual Machine Monitor

XROMBAR . . . . . . . . . . . . Expansion ROM Base Address Register

XSDT . . . . . . . . . . . . . . . . Extended Root System Description Table

YAML . . . . . . . . . . . . . . . . YAML Ain’t Markup Language

Chapter 1

Introduction

In 2013, a bank in Ukraine noticed that an Automated Teller Machine (ATM) was dispensing cash for no apparent reason. At seemingly random intervals, the machine would start emptying its money supply onto the street without user interaction. The security cameras showed that the money was being picked up by seemingly random strangers who just happened to pass by at the right moment. When computer security specialists analyzed the computers at the bank, they uncovered one of the biggest bank heists in history (The New York Times, 2015). A group of criminals had attacked the computers of over 100 banks worldwide, infecting key systems with the malicious software agent Carbanak (Kaspersky Labs, 2015). Aside from manipulating ATMs to dispense money, the software was used to monitor bank employees and discover how the banks conducted their operations. By impersonating bank employees, the group managed to transfer hundreds of millions of US dollars into offshore accounts. The transfers went unnoticed for months, because the criminals manipulated the banks’ internal bookkeeping records to hide the missing balance. Total financial losses are estimated to be between 300 million and 1 billion US dollars (Kaspersky Labs, 2015). Incidents like the Carbanak bank heist show that malicious software has become a significant threat to businesses worldwide. In fact, a recent study by McAfee estimates the global cost of cybercrime to be more than 400 billion US dollars (McAfee Inc., 2014).

Memory forensics, the process of acquiring and analyzing the contents of a computer’s RAM, has become an integral part of digital forensic investigations targeting malicious software, because it provides an impartial view of a computer system’s internal state (Walters and Petroni, 2007). It can be used to detect and analyze hidden processes, network connections, and other artifacts on computers infected by malicious software (Sutherland et al., 2008). Because malicious software must reside somewhere in memory to execute, it is impossible to hide all traces of the infection from RAM (Kornblum, 2006). To remain undetected, malicious software must therefore subvert the memory acquisition process to present analysis software with a filtered view of physical memory. There are multiple ways of obtaining a copy of memory, all with different characteristics, constraints, and susceptibility to subversion by malicious software.

Hardware-based memory acquisition methods like memory transplantation (Halderman et al., 2008) and bus attacks (Boileau, 2006) require physical access to the target system, which is not available in remote incident response scenarios. Furthermore, new technologies pose problems to established hardware memory acquisition techniques. Memory data scrambling, as employed by DDR3 memory controllers, poses a big challenge to memory transplantation attacks, as the acquired memory is scrambled by undocumented methods that have yet to be deciphered (Gruhn and Müller, 2013; Skochinsky, 2014). Bus-based memory acquisition techniques are impeded by the introduction of the Input/Output MMU (IOMMU), which allows software to configure the memory controller to protect certain memory regions from device access (Rutkowska, 2007).

Software-based memory acquisition methods don’t require physical access to the target system, but have their own problems. While previous work has shown that it is possible to leverage System Management Mode (SMM) to run memory acquisition software in an environment isolated from the potentially subverted system (Wang et al., 2011), this method requires a firmware modification, which makes it platform dependent and not portable. There has also been work to leverage hardware virtualization technology for memory acquisition (Martignoni et al., 2010), which requires the processor to support hardware virtualization and can only work if no other program (including malicious software) has already made use of this. Without the availability of firmware components or virtualization extensions, software has to resort to the operating system level to acquire physical memory.

Software memory acquisition techniques at the operating system level are the most versatile ones, requiring no preparation on the target system. They can be used to create a memory image on the local hard disk, send an image remotely over the network, and even to perform remote, live memory analysis (Cohen et al., 2011). However, because they run with the same privileges as malicious software on a potentially infected system, they are prone to subversion by anti-forensic techniques. This thesis aims at improving the resilience of memory acquisition software to subversion, raising the bar for criminals to hide their actions.

1.1 Contributions

This thesis consists of three major parts: First, we analyze the functionality of memory acquisition software and identify anti-forensic techniques that can subvert the acquisition process. Based on our findings, we develop a novel technique that is not subject to subversion by anti-forensics. Finally, we show how our approach can be adapted to solve two other memory forensic problems as well.

Memory Acquisition Operating System Internals. In Chapter 3, we present a survey of the current state of the art of software memory acquisition. We give an overview of the criteria we use to assess the quality of a memory image, and point out the importance of the correctness of an image for analysis. We then analyze a set of 12 popular memory acquisition frameworks for Windows, Linux, and Mac OS X, and survey their functionality with regard to two major tasks: memory enumeration and memory mapping. We find that all tested programs use the operating system to enumerate and map memory, and give an overview of the programming interfaces used for this purpose. Finally, we illustrate the technical details of memory acquisition software by implementing the OSXPmem tool for Mac OS X systems. At the time of release, this was the only software able to acquire memory on Mac OS X systems newer than version 10.8. Our results foster a better understanding of memory acquisition software and provide investigators with the much needed capability to acquire memory on recent versions of Mac OS X.

Anti-Memory Forensics. Because of the reliance on operating system services to enumerate and map memory, software memory acquisition can be subverted by malicious software running in kernel-mode. In Chapter 4, we give an overview of practical anti-forensic techniques against current memory acquisition software. We find that some tools employ undocumented functions to map memory instead of standard operating system interfaces, to avoid previously published anti-forensic techniques that filter the operating system memory interface (Bilby, 2006). We show that these undocumented functions can still be attacked using standard rootkit techniques like inline hooking and direct kernel object manipulation. Furthermore, we present new classes of anti-forensic attacks against memory enumeration that are able to selectively hide sections of physical memory. To prove our claims, we create proof-of-concept implementations of these attacks for Windows, Linux, and Mac OS X that disable several operating system programming interfaces used by memory acquisition software. Our evaluation shows that none of the 12 tested forensic tools is able to acquire memory on systems with anti-forensic modifications in place. This work serves as a demonstration of how easy it still is for malicious software to subvert the memory acquisition process, 6 years after the first public demonstration by the DDFY rootkit (Bilby, 2006).

Anti-Forensic Resilient Memory Acquisition. In Chapter 5, we present a new software memory acquisition technique that does not depend on operating system functionality. Instead of querying the operating system for available memory, we interact directly with the hardware to enumerate memory mapped device buffers in the physical address space. We then map the remaining regions of the physical address space into our driver’s virtual address space by directly manipulating the process’s page tables. Our evaluation shows that this approach is not vulnerable to the anti-forensic methods presented in the previous chapter. Our technique is implemented in the open source Winpmem (Cohen, 2012), Pmem for Linux (Cohen, 2011), and OSXPmem (Stuttgen, 2012) tools, and publicly released within the Rekall memory forensics framework (Cohen, 2014b). Our research raises the bar for malicious software to subvert memory forensic investigations, enabling investigators to detect and analyze malware that was previously invisible.

Kernel Independent Linux Memory Acquisition. One of the advantages of our novel memory acquisition technique is that it does not rely on any operating system functionality. In Chapter 6 we utilize this property to solve a key problem in Linux memory forensics: the requirement of having to compile a memory acquisition kernel module specifically for the target system. The reason for this is that Linux kernel modules are statically linked with the kernel at runtime, which requires them to be compiled with the exact same version of the kernel headers and configuration to be binary compatible. Since our method is kernel independent, we don’t require binary compatibility as long as we are able to load our module and communicate with it. To achieve this goal we implement a custom linker that is able to inject our memory acquisition module into a compatible host module on the target system. By modifying the relocation tables of the host, we instrument its data structures for communication with the kernel. This is stable because the host module was compiled with the correct configuration and headers, so we are actually using compatible data structures. Our method allows us to create a memory acquisition program that can be distributed as an executable and does not need to be re-compiled for the target system. This reduces the amount of preparation necessary to acquire memory on Linux systems, relieving investigators and shortening response times.

Acquisition and Analysis of Compromised Firmware. In Chapter 7, we explore a second innovative property of our new memory acquisition technique. Since our hardware-based memory enumeration method provides us with the location of memory mapped device buffers in the physical address space, we can safely access memory regions that are unknown to the operating system. This allows us to acquire code and data from the system firmware. We analyze the physical address space for firmware related regions, and conduct an experiment that shows we are able to acquire the BIOS, PCI option ROMs, and the ACPI tables this way. We develop plugins for the open source Volatility (Walters, 2014) memory analysis framework that extract the ACPI tables from a memory image and scan them for malicious behavior. We evaluate these tools by implementing a proof-of-concept ACPI rootkit, which we successfully detect using our methods. The developed tools and techniques enable investigators to acquire and analyze malware at the firmware level, which was previously impossible with memory forensic tools.

1.2 Related Work

The focus of this thesis lies on software memory acquisition techniques on the operating system level. However, there has been a considerable research effort on other methods, which we will outline in this section.

Memory Acquisition Using Hardware Virtualization. To isolate memory acquisition software from the potentially subverted operating system, previous work has suggested leveraging hardware virtualization extensions, which are available in most recent x86 processors (Intel Corporation, 2014b). They allow the memory acquisition program to load a Virtual Machine Monitor (VMM) to isolate itself from the operating system, making it impossible for malicious software to manipulate the acquisition software. By virtualizing physical memory on the fly, it is possible to create a memory image without the inconsistencies caused by concurrent system activity. Previous work includes the Hypersleuth framework (Martignoni et al., 2010) and Vis (Yu et al., 2012).

The major limitation of memory acquisition software based on hardware virtualization is the requirement of loading the VMM first. If another VMM is already active this approach cannot work, as there can only be one VMM active at the same time. This becomes a problem when malicious software makes use of a VMM to hide its activities from the operating system, as has been successfully demonstrated in the past (King and Chen, 2006; Rutkowska, 2006).

Firmware Assisted Memory Acquisition. Firmware can leverage SMM to perform management tasks transparently to the operating system. This mode can only be entered through a System Management Interrupt (SMI), and its code and data are protected by the memory controller in a special region of memory called System Management RAM (SMRAM). Wang et al. (2011) proposed to leverage this mode as a trusted and protected execution environment for memory acquisition software. SMMDumper (Reina et al., 2012) is a proof-of-concept that implements this idea. It is delivered as a firmware upgrade and injects itself into SMM on system boot. It then modifies the configuration of the interrupt controller to redirect keyboard interrupts to an SMI, where they are filtered for a specific command that initiates memory acquisition. When this command is received, SMMDumper directly accesses the network card and sends the contents of physical memory to an analysis system over the network.

While this method is resilient to subversion from the operating system level, there are a number of limitations that make it impractical for most cases at the moment. To install the program in SMM a firmware update is required, since SMRAM is locked before the firmware passes control to the operating system. This requires vendor support because firmware updates are cryptographically signed, and also involves a reboot to install the firmware update. It is also a platform dependent solution, because the software has to be adapted to work with the system’s firmware as well as ship with custom SMM drivers for the network card. The required amount of preparation and custom development for the specific target platform disqualifies this approach for most incident response scenarios.

Cold Boot Attacks. Halderman et al. (2008) have shown that by using simple cooling techniques it is possible to preserve memory contents over significant periods of time without power. Because memory cells are essentially capacitors, they don’t lose their state immediately when not powered. In fact, the stored charge slowly drains over time and needs to be refreshed periodically. The time until a memory cell loses its charge can be dramatically extended by lowering its temperature. By cooling memory modules to -50 ℃ with a simple can of compressed air, over 99.9% of the data can be recovered after a period of 60 seconds without power (Halderman et al., 2008). This allows investigators to quickly transplant the memory of a computer into another system, which then copies its contents to persistent storage.

However, recent advances in DRAM technology have made this technique considerably more difficult. Intel DDR3 integrated memory controllers mangle each data word with an undocumented scrambling system to reduce the effects of excessive di/dt, caused by successive 1s and 0s, on the data bus (Intel Corporation, 2013). Recent patents by Intel suggest the use of a Linear Feedback Shift Register (LFSR) (Mozak, 2011), which is randomly seeded at system boot by the system firmware. Without knowing the LFSR polynomial and seed for a specific system configuration it is impossible to recover the original contents of memory as seen by the system. At the time of writing there is no publicly known way of de-scrambling the contents of DDR3 modules and further research is indicated (Gruhn and Muller, 2013; Skochinsky, 2014).
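To illustrate why an unknown seed makes a transplanted memory image unreadable, the following minimal sketch XORs data with the keystream of a small Fibonacci LFSR. The 16-bit width, the tap positions, the seeds, and the function names are arbitrary assumptions chosen for illustration; the scrambler actually used by Intel memory controllers is undocumented and is not reproduced here.

```python
def lfsr_stream(seed, taps=(16, 14, 13, 11), nbits=16):
    """Yield pseudo-random bytes from a Fibonacci LFSR.

    Width and taps are illustrative assumptions, not the parameters
    of any real memory controller.
    """
    state = seed & ((1 << nbits) - 1)
    if state == 0:
        raise ValueError("an LFSR seeded with zero never leaves the zero state")
    while True:
        byte = 0
        for _ in range(8):
            # The feedback bit is the XOR of all tapped state bits.
            feedback = 0
            for tap in taps:
                feedback ^= (state >> (nbits - tap)) & 1
            out_bit = state & 1
            state = (state >> 1) | (feedback << (nbits - 1))
            byte = (byte << 1) | out_bit
        yield byte


def scramble(data, seed):
    """XOR data with the keystream; applying it twice with the same seed restores the data."""
    stream = lfsr_stream(seed)
    return bytes(b ^ next(stream) for b in data)


if __name__ == "__main__":
    plain = b"contents of a memory page"
    stored = scramble(plain, seed=0xACE1)            # what ends up in the DRAM cells
    print(scramble(stored, seed=0xACE1) == plain)    # same seed: recoverable
    print(scramble(stored, seed=0x1234) == plain)    # different seed: not recoverable
```

In this toy model, a system that boots with a different seed reads back data that no longer matches what was written, which mirrors the problem an investigator faces after moving scrambled modules to another machine.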

Warm Reboot Attacks. Depending on the installed firmware and system configuration, memory contents are sometimes preserved over warm reboots (Chow et al., 2005). If an investigator has the ability to boot a system from a custom medium like a USB flash drive, he can boot a small acquisition OS to create a memory image after forcing a warm reboot (Vidas, 2010). Because firmware might clear memory or at least overwrite some memory regions during boot, this method is not reliable and should only be used as a last resort. If it fails there is no other way to obtain a memory image, as the contents of memory have been irreversibly altered.

Direct Memory Access Attacks. Any device with bus master capability is able to initiate Direct Memory Access (DMA) transactions on the Peripheral Component Interconnect (PCI) bus without involving the CPU (PCI-SIG, 2002). This relieves the CPU from handling data transfers from devices to memory and vice versa. Since bus mastering basically allows any PCI device to read arbitrary memory regions, it is possible to acquire a memory image with a specially crafted PCI device. There have been multiple proof-of-concept implementations like the Tribble (Carrier and Grand, 2004), CoPilot (Petroni et al., 2004) and FRED (BBN Technologies, 2006) PCI cards. However, such cards need to be installed in the target system prior to an incident and are not generally available. There is only one commercially available product (WindowsSCOPE, 2014), which is very expensive¹.

The IEEE 1394 (Firewire), Thunderbolt and ExpressCard protocols all allow for DMA and thus can be utilized for memory acquisition (Hermann, 2014). There has been significant work on reading and writing to RAM through Firewire (Becher et al., 2005; Boileau, 2006), and open source software for memory acquisition is readily available (Witherden, 2010; Maartmann-Moe, 2013).

However, modern x86-64 systems implement an IOMMU that can remap and block memory transactions to and from devices (Intel Corporation, 2014d). This allows software to configure the memory controller in a way that protects arbitrary memory regions from rogue devices performing DMA (Rutkowska, 2007). Also, some operating systems disable DMA when the system is put to sleep to prevent DMA attacks when the device is stolen. Mac OS X Lion and later implement this protection with FileVault 2 (Garrison, 2011).

¹ At the time of writing, the WindowsSCOPE CaptureGUARD PCIe card cost $7999.

1.3 Publications

Parts of this thesis are based on three peer-reviewed academic papers the author has presented at international conferences over the past three years. To improve readability, we will not cite every section adapted from these papers again. This section serves as an overall reference that attributes each paper to the relevant chapters it was included in.

In our paper “Anti-Forensic Resilient Memory Acquisition” (Stuttgen and Cohen, 2013), written together with Michael Cohen, we created a software based memory acquisition technique that is resilient to current anti-forensic methods. The PCI memory enumeration technique was developed by Michael Cohen, while the author of this thesis created the anti-forensic survey as well as the memory mapping technique. The resulting research paper was mostly written by the author, with exception of the introduction, conclusion and PCI enumeration sections. Parts of this paper are used in Chapters 3, 4, and 5, but have been significantly expanded to more thoroughly cover all aspects of software memory acquisition.

Based on the techniques developed in our previous paper, we have developed a method to inject memory acquisition kernel modules into arbitrary kernels on Linux. The research paper “Robust Linux memory acquisition with minimal target impact” (Stuttgen and Cohen, 2014) was created under the guidance of Michael Cohen. All of the software development was accomplished by the author of this thesis, and, with the exception of the introduction, the resulting research paper was also written by the author of this thesis. The publication was honored with the best paper award at the DFRWS EU conference in Amsterdam, 2014. It forms the foundation of Chapter 6.

Chapter 7 explores the memory acquisition capabilities developed in our previous work to acquire and analyze malicious firmware on x86 systems. It is based on our publication “Acquisition and Analysis of Compromised Firmware Using Memory Forensics” (Stuttgen et al., 2015), which was created together with Stefan Voemel and Michael Denzel. The memory acquisition software, the BIOS and PCI option ROM experiments and the majority of the research paper were written by the author of this thesis. Stefan Voemel composed the introduction and parts of the background section, while Michael Denzel created the ACPI Volatility plugins and performed the ACPI evaluation under the guidance of the author.

1.4 Outline

This thesis is organized as follows: Chapter 2 provides the technical background necessary to understand our techniques. In Chapter 3, we present a survey on the practical details of software memory acquisition. Chapter 4 focuses on illustrating anti-forensic techniques against current software memory acquisition methods. Chapter 5 incorporates our insights into an anti-forensic resilient memory acquisition approach. In the next two chapters we explore the new capabilities of this approach. In Chapter 6, we create a Linux memory acquisition module that is compatible with a wide range of kernels without recompilation, by combining kernel module infection techniques normally utilized by rootkits with our new memory acquisition method. Chapter 7 focuses on using our technique to acquire malicious firmware. Finally, we conclude our work in Chapter 8, and present opportunities for future work.

Chapter 2

Technical Background

In this chapter, we will illustrate the basic concepts and techniques this thesis is built on. Our explanations provide the reader with the specialized technical knowledge necessary to understand our work. This is not intended to be a complete description of the x86-64 architecture, but a compact primer on the concepts utilized within this thesis. An exhaustive explanation of every architectural detail can be found in the work of other authors (Intel Corporation, 2014b; Corbet et al., 2005; Salihun, 2006).

Outline of the Chapter

This chapter is organized as follows: Section 2.1 presents an overview of the architecture of x86-64 systems, focusing on memory organization and management. We explain the different address spaces, as well as the components that route memory transactions through the system, most notably the Peripheral Component Interconnect Express (PCIe) bus. Section 2.2 then introduces the structure and operation of kernel modules for the Linux platform. Here we show how modules are organized, linked and loaded. Finally, Section 2.3 introduces the system firmware, its different components, and how they work.

This chapter can be read selectively depending on the reader’s experience and goal. Figure 2.1 depicts the requirements of each major area of the thesis. Section 2.1 illustrates core concepts utilized everywhere in this thesis and therefore should always be studied by the reader. In addition, Chapter 6 assumes an understanding of Linux kernel modules, which requires reading Section 2.2. Chapter 7 focuses on the system firmware, which is explained in Section 2.3.

Figure 2.1: Organization of the Background Chapter

Figure 2.2: Architecture of a North- and South-Bridge Based Chipset

2.1 x86 Architecture

In this section, we illustrate the core components that implement memory management on Intel x86-64 Central Processing Units (CPUs) with the Intel Core architecture. We focus on x86-64 systems in particular, as all of the work done in this thesis is targeted towards this architecture. Some of the finer details like the specific bus technology used or the exact location of some components differ on Advanced Micro Devices (AMD) based systems. However, the data structures relevant to memory routing and mapping are standardized and apply equally to computers with an AMD based CPU and chipset. The information we provide is prerequisite to understanding the technical details of this thesis.

Most of this section is based on an article by Drepper (2007), as well as the Intel System Architecture Manual (Intel Corporation, 2014b) and Intel Architecture Whitepaper (Turley, 2014). Readers interested in the architectural differences of AMD based systems are referred to the AMD64 Architecture Programmer’s Manual (Advanced Micro Devices, 2011).

Modern x86 computers consist of a multitude of interconnected components. There are CPU cores running the actual code, devices for input and output, and Random Access Memory (RAM). The system tying these modules together is called the chipset. For a long time, chipsets typically consisted of two components, the north- and the south-bridge. The north-bridge was responsible for high-speed devices like the graphics card and RAM. The south-bridge connected lower-speed devices like network interfaces or controllers for persistent storage. The architecture of such a system is illustrated in Figure 2.2.

Figure 2.3: Architecture of a modern PCH based chipset

In a classical north- and south-bridge architecture like the Intel 815, the entire chipset is located on the mainboard (Intel Corporation, 2000). The CPU is connected to the north-bridge over the Front Side Bus (FSB). North- and south-bridge are also tied together with a dedicated interface. On more modern Intel chipsets like the Platform Controller Hub (PCH) this is done using the Direct Media Interface (DMI) (Intel Corporation, 2014c).

For performance reasons, Intel began integrating north-bridge functionality into the CPU with the PCH architecture (Intel Corporation, 2009). In the PCH architecture, all north-bridge functionality is handled by a component in the CPU called the hostbridge. The hostbridge consists of a PCIe controller, an Integrated Memory Controller (iMC) and a DMI interface connecting it to the PCH.

The PCH also has PCIe functionality, to which all faster Input/Output (I/O) devices are connected. Other protocols like Serial Advanced Technology Attachment (SATA) or Universal Serial Bus (USB) are used to communicate with peripheral devices like hard-disks or the keyboard. The controllers for these protocols are located in the PCH and connected to the PCIe bus. Slow legacy devices like the RS-232 serial interface, PS/2 keyboards and mice are connected to the Low Pin Count (LPC) bus on the PCH. Finally, the Basic Input Output System (BIOS)¹ flash chip is attached to the PCH through the Serial Peripheral Interface (SPI). A more detailed (but still incomplete) depiction of a modern PCH based system is provided in Figure 2.3 (Intel Corporation, 2014c).

2.1.1 The Physical Address Space

Among other interfaces, each CPU is connected to its hostbridge via an address bus. While its main purpose is the addressing of RAM, it is also used for Memory Mapped Input/Output (MMIO) (Intel Corporation, 2014b, Chapter 14). MMIO is a form of I/O where the registers and/or memory of devices are mapped into the CPU’s view of memory by the hostbridge. Thus, whenever the CPU attempts to read data from a physical address the result is not always a memory read. There are a large number of devices that are mapped into the physical address space for performance reasons (interrupt controllers, graphics cards, firmware, network cards, etc.). The physical address space is the set of all valid addresses on the CPU’s memory address bus.

Note that there is also an address space called the DMA or bus address space. This refers to the physical address space from the view of devices performing DMA. Because the hostbridge can remap addresses coming from devices, the view of a device on the physical address space can be different than that of the CPU (Miller et al., 2015). However, this is not important for this thesis as we focus on software memory acquisition running on the CPU.

The Memory Bus. When the CPU initiates a memory transaction it travels through the hostbridge, which is responsible for routing the transaction to the appropriate device (Salihun, 2014). The hostbridge decodes the target of the transaction and forwards it either to the memory controller, if the target is memory, to the integrated Graphics Processing Unit (GPU), if it targets the GPU, or to the DMI controller, if the target is unknown.

The DMI connection interfaces the CPU package with the PCH, to which all lower speed devices are connected. The PCH also has a memory target decoder logic which is responsible for forwarding transactions from the DMI interface to the correct device and vice versa.

¹ Since 2005, more and more vendors are replacing the BIOS with the Unified Extensible Firmware Interface (UEFI) (Zimmer et al., 2010).

Figure 2.4: Routing of a memory transaction through the hostbridge and PCH to the sound card on a Haswell system (Salihun, 2014)

Figure 2.4 illustrates the routing of memory transactions through the chipset of an Intel 4th Generation Core Architecture (Haswell) system. The red line illustrates the path a transaction takes when the CPU reads from an address that is mapped to a buffer in the sound card. The memory target decoding logic in the hostbridge looks up the range and finds it assigned to the PCH. The read is thus passed through the DMI interconnect to the PCH. It is then decoded and forwarded to the PCI controller. The PCI controller then initiates a PCI transaction for this address. Finally, the transaction is claimed by the sound card and the data is passed back upstream until it reaches the CPU.
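The two-stage decoding described above can be sketched as a lookup in two tables of address windows. All window boundaries, target names, and function names below are made-up assumptions for illustration; on a real system the ranges are held in chipset and device registers and differ between machines.

```python
# Hypothetical decode windows of the hostbridge as (start, end, target).
# End addresses are exclusive; all values are invented for this sketch.
HOSTBRIDGE_WINDOWS = [
    (0x0000_0000, 0x8000_0000, "memory controller"),
    (0xD000_0000, 0xE000_0000, "integrated graphics"),
]

# Hypothetical decode windows of the PCH.
PCH_WINDOWS = [
    (0xF7F0_0000, 0xF7F0_4000, "PCIe: sound card"),
    (0xFF00_0000, 0x1_0000_0000, "SPI: BIOS ROM"),
]


def decode(addr, windows, default):
    """Return the target whose window contains addr, or default if none does."""
    for start, end, target in windows:
        if start <= addr < end:
            return target
    return default


def route(addr):
    """Return the list of hops a CPU memory transaction takes in this model."""
    path = ["hostbridge"]
    target = decode(addr, HOSTBRIDGE_WINDOWS, "DMI")
    path.append(target)
    if target == "DMI":
        # Unknown to the hostbridge: pass the transaction on to the PCH decoder.
        path.append("PCH")
        path.append(decode(addr, PCH_WINDOWS, "unclaimed"))
    return path


if __name__ == "__main__":
    for address in (0x0000_1000, 0xF7F0_0010, 0xE000_0000):
        print(hex(address), " -> ".join(route(address)))
```

A read from the hypothetical sound card buffer at 0xF7F00010 passes through the hostbridge, DMI, and the PCH before reaching the device, while a read below 2 GiB ends at the memory controller.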

System Address Ranges. The physical address space is the set of all addressable memory addresses and contains all system address ranges. While the x86-64 address space defines physical addresses to be 64 bit long, the effective width of the address bus, and thus the size of the address space, is implementation specific (Intel Corporation, 2014b, Section 3.3). All physical addresses that are routed to the same device are called a system address range. The exact number and location of all system address ranges depend on the chipset, firmware, and installed devices. Because of this, the physical address space layout is different between most systems. The exact location and size of these ranges are stored in registers inside the memory controller, as well as on the mapped devices themselves. A detailed explanation of how device memory is accessed and configured is the focus of Section 2.1.3.

Figure 2.5: Memory Map of a Haswell System (Intel Corporation, 2013)

Figure 2.5 illustrates the memory map of a Haswell system with more than 4 GiB of RAM. The physical address space (as seen by the CPU) is shown on the left, the actual physical memory is on the right. The CPU has 39 address lines supporting an effective physical address space of 512 GiB. This information is taken directly from the Haswell datasheet (Intel Corporation, 2013), where further details are available if necessary.

The address space begins with the Disk Operating System (DOS) legacy range, which occupies the first 1 MiB. The first 640 KiB are always mapped to RAM, while the remainder is mapped according to the Programmable Attribute Map (PAM) registers, depending on how it is used by the system firmware (see Section 2.3).

Next to the legacy range lies a large block of RAM that is directly mapped. Its end is determined by the location of the Top of Main Memory Segment (TSEG) range. All physical memory above this address is inaccessible to the Operating System (OS) and needs to be remapped.

The location of the TSEG range depends on the value of the TSEGMB register. The range is used by the system firmware and is only accessible in SMM. The entire range is thus inaccessible from normal CPU operations.

The Top of Lower Usable DRAM (TOLUD) denotes the border to the first MMIO region. Everything from TOLUD to 4 GiB is reserved for MMIO by the hostbridge in the physical address space. It forwards all transactions from the CPU into this region to the DMI bus for processing by the PCH. Physical memory located in this region needs to be reclaimed somewhere else in the address space, as it is inaccessible from this location. Depending on the graphics device used, the graphics card might also use some of the physical memory in this region to store the GFX Translation Tables (GTT). Also, if the CPU features an internal GFX card, it will use some of the memory in this region.

RAM located above 4 GiB is directly mapped into the physical address space and is referred to as another main memory address range. At the top of this range, RAM that was shadowed by the first MMIO range (OS invisible, reclaim) is remapped. The Top of Upper Usable DRAM (TOUUD) register marks the end of this range.

The remainder of the physical address space is again used for MMIO and decoded to the DMI bus. Firmware will map device memory individually in a non-overlapping way somewhere into this range.

Most CPUs also feature an embedded management processor, which is called Management Engine (ME) on Intel systems. The ME can allocate part of physical memory for its own use, which is depicted as the ME-Uniform Memory Access (UMA) region. This memory is not accessible by any device other than the ME.
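The layout described in this section can be summarized as a small classification routine. The sketch below is a simplification under stated assumptions: the register values are invented for a hypothetical machine with 8 GiB of RAM, the graphics and ME regions are left out, everything between TSEGMB and TOLUD is treated as TSEG, and RECLAIM_BASE as well as the function names are hypothetical.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

ADDRESS_LINES = 39                    # number of address lines, as stated in the text
ADDRESS_SPACE = 1 << ADDRESS_LINES    # 2^39 bytes = 512 GiB

# Invented register values for a hypothetical machine with 8 GiB of RAM.
TSEGMB = 3 * GIB - 8 * MIB
TOLUD = 3 * GIB
RECLAIM_BASE = 8 * GIB
TOUUD = RECLAIM_BASE + (4 * GIB - TOLUD)   # RECLAIM BASE + X


def classify(addr):
    """Name the system address range a CPU physical address falls into."""
    if not 0 <= addr < ADDRESS_SPACE:
        raise ValueError("address outside the physical address space")
    if addr < 1 * MIB:
        return "legacy"
    if addr < TSEGMB:
        return "main memory"
    if addr < TOLUD:
        return "TSEG"                 # simplification: graphics memory omitted
    if addr < 4 * GIB:
        return "MMIO"
    if addr < RECLAIM_BASE:
        return "main memory"
    if addr < TOUUD:
        return "reclaimed memory"
    return "MMIO"


def dram_offset(addr):
    """Translate a CPU physical address to an offset in DRAM, or None.

    Only directly mapped and reclaimed memory are handled; the legacy
    range and TSEG are deliberately left out of this sketch.
    """
    kind = classify(addr)
    if kind == "main memory":
        return addr
    if kind == "reclaimed memory":
        # DRAM shadowed by the MMIO hole below 4 GiB reappears at RECLAIM_BASE.
        return TOLUD + (addr - RECLAIM_BASE)
    return None


if __name__ == "__main__":
    print(ADDRESS_SPACE // GIB, "GiB physical address space")
    for address in (0x1000, 2 * GIB, 0xFEC0_0000, RECLAIM_BASE, TOUUD):
        print(hex(address), classify(address), dram_offset(address))
```

With these values, the 1 GiB of RAM hidden behind the MMIO region between TOLUD and 4 GiB becomes visible again between RECLAIM_BASE and TOUUD.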

2.1.2 Memory Protection

For security and stability reasons, most modern computer architectures offer a feature called virtual memory. In this concept each program runs inside its own “virtual address space”, isolated from all other programs. A special component in the CPU called the Memory Management Unit (MMU) is responsible for mapping all virtual address spaces into the physical address space. In this thesis we refer to this process as paging².

Paging is controlled by data structures managed by the OS, which are then parsed by the MMU. When paging is turned on the MMU will translate all addresses before they are put on the address bus. This mechanism further allows mapping files directly into memory or paging out parts of unused memory to disk, effectively using RAM as a cache for slower persistent storage. The MMU maintains a cache called the Translation Lookaside Buffer (TLB) to avoid having to walk the page tables repeatedly for every memory access.

² Every x86 CPU also supports a memory translation model called segmentation. However, segmentation is largely disabled when operating in 64 bit mode, which is why we ignore it in this thesis (Intel Corporation, 2014b, Section 3.3.3).

Intel x86-64 processors support two different implementations of address translation, two-level paging for 32 bit code (IA-32) and four-level paging for 64 bit code (IA-32e) (Intel Corporation, 2014b). Since our focus in this thesis is on 64 bit operating systems only, we will limit our explanation to IA-32e paging.

The Virtual Address Space. The set of virtual addresses the CPU can access is called the virtual address space. On x86-64 systems all virtual addresses have a size of 64 bits, resulting in a virtual address space of 2^64 bytes (16 EiB) (Intel Corporation, 2014b, Section 3.3.7). However, implementations of the architecture can choose to use a smaller effective virtual address size to improve efficiency. Intel x86-64 implementations today use an effective address length of 48 bits, resulting in a virtual address space of 2^48 bytes (256 TiB). All unused bits in 64 bit addresses are sign extended, to equal the most significant bit of the effective address (Intel Corporation, 2014b, Section 3.3.7.1).

In Figure 2.6, the virtual address space of a typical x86-64 Linux process is laid out. The exact layout of the virtual address space depends on the implementation; the contents of user- and kernel-space in this figure are just an example. The paging data structures are managed by the kernel, so the virtual address space can look different on systems with disparate operating systems.

Canonical addressing divides the address space into two halves, separated by the non-canonical space. Memory accesses to non-canonical addresses generate a general protection fault (Intel Corporation, 2014b, Section 3.3.7.1), which is why the whole range from 0x0000800000000000 to 0xffff7fffffffffff can be considered non-existing.
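The canonical form can be checked with a few bit operations. The following sketch assumes the 48 effective address bits mentioned above; the constant and function names are our own and purely illustrative.

```python
VA_BITS = 48  # effective virtual address width assumed in this sketch


def sign_extend(addr48):
    """Extend a 48 bit effective address to its 64 bit canonical form."""
    addr48 &= (1 << VA_BITS) - 1
    if addr48 >> (VA_BITS - 1):
        # Bit 47 is set, so bits 63..48 must all be set as well.
        return addr48 | (0xFFFF << VA_BITS)
    return addr48


def is_canonical(addr64):
    """Return True if bits 63..48 all equal bit 47."""
    top = addr64 >> (VA_BITS - 1)     # bits 63..47, 17 bits in total
    return top == 0 or top == (1 << 17) - 1


if __name__ == "__main__":
    for address in (0x00007FFFFFFFFFFF, 0x0000800000000000,
                    0xFFFF7FFFFFFFFFFF, 0xFFFF800000000000):
        print(f"{address:#018x} canonical: {is_canonical(address)}")
```

The lowest address that fails the check is 0x0000800000000000, and the first address that passes it again is 0xffff800000000000.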

In higher half kernel architectures like Linux, this separation is used to divide OS and user code (Tanenbaum and Bos, 2014). The lower half is used by the process itself and is called userspace. It contains the program’s code and data, as well as mapped files and libraries. Userspace spans from 0x0000000000000000 to 0x00007fffffffffff, for a total size of 128 TiB.

Because changing the address space to perform work in the kernel flushes the TLB, which causes a performance hit, operating systems like Microsoft Windows, Apple Mac OS X (OS X) and Linux use the higher half of each process address space to map a view of the kernel. Depending on the implementation, it contains one or more kernel stacks and heaps as well as the kernel code and data segments. The upper half is called kernel space and also has a size of 128 TiB.

[Figure 2.6: Virtual Address Space on an x86-64 System with 48 Address Bits. User space (program code, data, heap, mapped files, stack; 128 TiB) spans 0x0000000000000000 to 0x00007fffffffffff. It is followed by the non-canonical addresses. Kernel space (kernel code, kernel data, ...; 128 TiB) spans 0xffff800000000000 to 0xffffffffffffffff.]

Code running in user-space does not have access to kernel-space memory. The MMU enforces this by matching the privilege level of the running code against the privilege level of the memory region. Privileges follow a ring model, where the operating system runs in the innermost ring (0) with maximum privileges, while user-mode code runs in the outermost ring (3) with restricted access to memory. The details are beyond the scope of this thesis; interested readers are referred to the Intel Software Developer's Manual (Intel Corporation, 2014b, Volume 1, Section 6.3.5).

Paging Datastructures. IA-32e paging uses a 4-layer paging hierarchy to translate a virtual address into a physical address. This means there are 4 different kinds of tables, which are traversed by the MMU during page translation to find the physical address for a given virtual address. The results of these lookups are cached in the TLB, so the lookup does not have to be repeated for subsequent accesses into the same page.

Figure 2.7 depicts the data structures involved in translating a 4 KiB page. IA-32e paging also supports page sizes of 2 MiB and 1 GiB. For translation, the virtual address is separated into table indexes and a page offset. The indexes are used to find the entry relevant to this page in the respective data structure. After the physical page has been found, the offset is added to find the actual physical address of a specific byte.

[Figure 2.7: Datastructures Involved in Virtual to Physical Address Translation (IA-32e Paging) (Intel Corporation, 2014b, Chapter 4.5). The virtual address is split into a PML4 index (bits 47-39), a directory pointer index (bits 38-30), a directory index (bits 29-21), a table index (bits 20-12) and a page offset (bits 11-0). CR3 locates the PML4; the PML4E, PDPTE, PDE and PTE each supply 40 address bits that locate the PDPT, PD, PT and finally the physical page, to which the 12 bit offset is added.]

The CR3 register always points to the physical address of the first level data structure, called the Page Map Level 4 (PML4). This table contains 512 PML4 Entries (PML4Es) of 64 bit size. The MMU uses the most significant 9 bits of the effective virtual address (bits 47 to 39) as an index into the PML4 to find the PML4E for this specific page.

The PML4E contains, among flags and other bookkeeping data, the physical address of the Page Directory Pointer Table (PDPT) that manages the range of pages for this virtual address. The MMU then looks up the corresponding PDPTE by using the next 9 bits from the virtual address as an index.

Starting with the PDPT Entry (PDPTE), the Page Size (PS) flag becomes important. If it is set to 1, the PDPTE contains the physical address of a 1 GiB page encompassing this virtual address. If not, it references the Page Directory (PD) responsible for this address. The MMU then interprets the next 9 bits of the virtual address as an index to find the corresponding PD Entry (PDE) for this page.

The PDE is then checked for its PS flag. If it is set to 1, the PDE contains the physical address of a 2 MiB page. Otherwise, it points to a Page Table (PT) for this address range. Again, 9 bits from the virtual address act as an index into the PT, selecting the corresponding PT Entry (PTE).

Finally, the PTE contains the physical address of a 4 KiB page. The MMU then uses the remaining 12 bits as an offset into the page to compute the final physical address of the virtual address.

Please note that we have only skimmed the surface of x86-64 paging. Most of the details used for paging memory to disk or mapping files are not important for the understanding of this thesis, and have been implemented in different ways on individual operating systems. A more detailed explanation of paging, memory mapped files, shared memory and all other operating system specific memory management internals can be found in the work of other authors (Intel Corporation, 2014b; Russinovich et al., 2009; Duarte, 2009; Gorman, 2004; Levin, 2012).

2.1.3 The PCI Express Bus

The PCIe bus is a high-performance, general purpose I/O interconnect used to link most devices in modern computers with the chipset. It replaces the older parallel PCI bus with a faster serial bus. Examples of PCIe devices in computer systems include graphics and network cards, as well as SATA and USB controllers connecting peripheral devices. While the PCIe bus implements a new protocol and architecture, it is still backwards compatible with PCI and supports the legacy PCI configuration mechanism.

Because Chapters 5 and 7 in particular make use of certain PCIe features, we give a short introduction to the architecture, protocol and configuration of PCIe. Our explanations are by no means complete; we deliberately skip most of the low level details like error correction, flow control and the physical link. For more detailed information we refer the reader to the official specification, on which this section is based (PCI-SIG, 2010a).

PCIe Protocol. Similar to common networking protocol stacks such as the Internet protocol suite (TCP/IP), PCIe is a layered, packet switched protocol. Figure 2.8 provides a brief overview of the different layers. Packets are formed in the transaction layer, then passed down the stack until they are actually transmitted over the wire. When they arrive at the other endpoint, the individual layers decode the packets and extract the data.

On the physical layer, PCIe devices are connected through lanes, which are full duplex serial connections. A lane consists of two differential signaling pairs, one to Receive (RX) and one to Transmit (TX). Lanes can be bundled to links of 1x, 2x, 4x, 8x, 12x, 16x, and 32x lanes, where data is divided onto the lanes by bytewise striping.

The data link layer handles link management and data integrity. For link management it can construct its own packets that don't have a transaction packet embedded. For integrity it handles error detection by attaching and verifying error detecting codes, as well as retransmitting erroneous packets.

The transaction layer creates packets that communicate events such as memory reads, writes or signals. It also implements a credit based form of flow control. It supports four different address spaces for a transaction: Memory, I/O, Configuration and Message. Memory transactions are used to transfer data using MMIO, while I/O transactions use the CPU's I/O space. Configuration transactions are used to access a device's configuration space, which we will further explain in Section 2.1.3. Finally, message transactions are used for signaling between devices, for example to trigger interrupts.

[Figure 2.8: PCIe Protocol Layers (PCI-SIG, 2010a). Two devices, each with a transaction, data link and physical layer, connected through their RX and TX pairs.]

PCIe Fabric Architecture. PCIe is a point to point protocol. The set of all links between interconnected components is referred to as a fabric or hierarchy. An illustration of a PCIe fabric is provided in Figure 2.9. The fabric is composed of a root complex, multiple endpoints, a switch and a PCIe to PCI bridge.

The root complex is the root of the PCIe hierarchy and connects the CPU to the PCIe fabric. It can support one or more PCIe root ports, which each define a separate hierarchy domain. Each domain in turn can be composed of one or more endpoints, switches or bridges. For example, the root complex in Figure 2.9 connects four domains: GPU, PCI, Memory Controller and a switch with four endpoints.

Root ports do not have to be physically located in the root complex. For example, on a Haswell system the root complex is located in the CPU and provides ports to the integrated GPU, the memory controller and the PCH. However, the port for the PCH is linked through the chipset interconnect (Salihun, 2014) and physically located on the PCH (Intel Corporation, 2014c). So even if the PCH appears to have its own root complex, the PCH's root port is actually linked to the root complex in the CPU via DMI.

PCI-to-PCI bridges are the “routers” of the PCIe fabric. They have a primary (ingress) and secondary (egress) port, and can forward transactions from one port to the other in both directions. Each bridge is configured with a specific memory range and will claim all transactions that fall into that range on its ingress port.

[Figure 2.9: PCIe Architecture (PCI-SIG, 2010a). The CPU contains the root complex and the iMC. PCIe links connect the root complex to a PCIe endpoint (GPU), a PCIe endpoint (memory), a PCIe to PCI bridge leading to a PCI bus, and a switch with two legacy endpoints and two PCIe endpoints.]

Switches are logical assemblies of multiple virtual PCI-to-PCI bridges. To configuration software they look like multiple bridges. They forward transactions using memory address based routing, just like PCI-to-PCI bridges.

Finally, there are bridges to other protocols such as PCI. A PCIe to PCI bridge must comply with PCIe specifications on its PCIe port and connects a legacy PCI bus to the PCIe fabric.

PCIe Configuration. PCIe configuration is supported via two different mechanisms: the legacy PCI compatible Configuration Access Mechanism (CAM) and PCIe Enhanced CAM (ECAM). CAM is binary compatible with the old PCI configuration mechanism and is accessed through the CPU's I/O space, while ECAM is an extension to increase the size of the configuration space and only available through MMIO.

The techniques we developed in this thesis only access information that is available in both configuration spaces. For simplicity and backwards compatibility we decided to use the CAM mechanism, which is why we will not explain the extended configuration mechanism here. Readers interested in the details of ECAM configuration can find more information in the PCI specification (PCI-SIG, 2010a).

[Figure 2.10: PCI Configuration Space Addressing (PCI-SIG, 2010a). Bit 31: EN; bits 30-24: reserved; bits 23-16: bus number; bits 15-11: device number; bits 10-8: function number; bits 7-2: register; bits 1-0: always 0.]

Configuration transactions follow a PCI compatible addressing scheme, in which an address consists of 3 parts: Bus, Device, Function (BDF), separated by a colon and a dot. For example, the address of the host bridge is usually 00:00.0, implying bus 0, device 0 and function 0. The bus number refers to the PCI legacy bus topology of parallel buses linked via PCI-to-PCI bridges. This terminology has been carried over to PCIe for compatibility reasons, so buses correspond to links in the fabric managed by a specific bridge. The device number corresponds to exactly one device on a specific bus. A device is allowed to implement multiple independent services called functions. Each function must provide its own configuration space, which can be addressed with the function number.

CAM is accomplished through two Double Word (DWORD) sized registers in the system's I/O space, CONFIG_ADDRESS (0xCF8) and CONFIG_DATA (0xCFC). Software can access data in the configuration space by first writing a configuration address to CONFIG_ADDRESS, and then reading or writing the selected DWORD through CONFIG_DATA.

The format of the CONFIG_ADDRESS register follows the BDF notation of function addressing, as depicted in Figure 2.10. The most significant bit (EN, bit 31) is a flag that enables translation of I/O reads and writes to PCI configuration space transactions by the host bridge. It must be set to 1 for all configuration space access. Bits 24-30 are reserved for future use. The bus number is encoded as an 8 bit integer, allowing for 256 different buses per PCI domain. The device number occupies 5 bits, for 32 devices per bus. The next 3 bits are used for the function number, for a maximum of 8 functions per device. Finally, there are 6 bits that select the appropriate DWORD inside of the configuration space. With 64 addressable DWORDs, this leads to a total of 256 bytes of configuration data. Because configuration space access must be DWORD aligned, the last 2 bits are always set to zero.
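The encoding of a CONFIG_ADDRESS value can be sketched as follows. The function name is our own, and the sketch only computes the value; actually writing it to port 0xCF8 requires privileged port I/O, which is deliberately not shown here.

```python
CONFIG_ADDRESS = 0xCF8  # I/O port of the address register
CONFIG_DATA = 0xCFC     # I/O port of the data register


def config_address(bus: int, device: int, function: int, offset: int) -> int:
    """Encode a BDF plus register offset into a CONFIG_ADDRESS value."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < 256:
        raise ValueError("CAM only covers 256 bytes of configuration space")
    return (
        (1 << 31)            # EN: enable configuration space translation
        | (bus << 16)        # bits 23..16: bus number
        | (device << 11)     # bits 15..11: device number
        | (function << 8)    # bits 10..8: function number
        | (offset & 0xFC)    # bits 7..2: DWORD index, bits 1..0 forced to 0
    )
```

For the host bridge at 00:00.0 and register offset 0, this yields 0x80000000. An unaligned offset such as 0x3D is rounded down to the containing DWORD at 0x3C; software then extracts the desired byte from the DWORD read through CONFIG_DATA.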

Each PCIe function must implement a configuration space. While CAM configuration space has a size of 256 bytes and ECAM even fits 4096 bytes, the only strictly defined part of configuration space is the configuration space header. It is located in the first 64 bytes of configuration space. The layout of memory behind the header is implementation specific and organized into a linked list of so called capabilities. The capability pointer in the configuration header points to the start of this list.

There are two different types of configuration space headers, type 0 for endpoints and type 1 for the root complex, bridges and switches. Because we are only interested in the functionality related to device enumeration and memory mapping, we will ignore most of the details and focus only on the relevant fields in each header. For more information on other parts of these headers and the capability list, we encourage the reader to consult the PCI specification (PCI-SIG, 2010a, Chapter 7.4).

[Figure 2.11: PCIe Type 0 Configuration Space Header (PCI-SIG, 2010a). One DWORD per row, fields listed from bit 31 down to bit 0:
00h: Device Identifier (ID) | Vendor ID
04h: Status | Command
08h: Class Code | Revision ID
0Ch: BIST | Header Type | Latency Timer | Cache Line Size
10h: Base Address Register 0
...
28h: Cardbus CIS Pointer
...
30h: Expansion Read Only Memory (ROM) Base Address
34h: Reserved | Capabilities
...
3Ch: Deprecated | Interrupt Pin | Interrupt Line]

Figure 2.11 shows an abridged version of a type 0 configuration header. The first 16 bytes are identically laid out in both header types and used for general device control and bus enumeration. The vendor ID is a 16 bit integer and assigned by the PCI Special Interest Group, who also maintain a list of all vendors and their IDs (PCI-SIG, 2015). The device ID is assigned by each vendor individually to uniquely identify each device. The command register contains flags that control the behaviour of the device, for example if it responds to memory or I/O transactions, or if it is allowed to issue those transactions (bus mastering). Finally, the header type field defines the further layout of the configuration header. Its most significant bit also specifies if the device supports multiple functions.

Memory and I/O space mapping of device memory is performed using the Base Address Registers (BARs). Type 0 configuration headers have 6 BARs located adjacent to the fixed part of the header. Each BAR defines the start of a memory range that the device maps into the physical memory address space. The exact layout of a BAR is illustrated in Figure 2.12. The upper 28 bits (bits 31 to 4) determine the address of the range. The prefetchable flag is hardwired by the device to show if reads from this memory region have side effects on the device. If it is set, this memory region is guaranteed not to cause side effects on the device when read. The type field indicates if the BAR references a 32 bit (00) or 64 bit (10) region. For 64 bit regions the BAR is extended with the next BAR in the header, interpreted as the most significant part of the address. Finally, the least significant bit indicates if the BAR references an MMIO or an I/O space region. I/O BARs have a slightly different layout, but are not important for this thesis.

[Figure 2.12: PCI 32-Bit MMIO BAR Layout (PCI-SIG, 2002). Bits 31-4: base address; bit 3: prefetchable; bits 2-1: type; bit 0: I/O (0 for MMIO).]

[Figure 2.13: PCIe Type 1 Configuration Space Header (PCI-SIG, 2010a). One DWORD per row, fields listed from bit 31 down to bit 0:
00h: Device ID | Vendor ID
...
18h: Sec. Lat. Timer | Subordinate Bus Number | Secondary Bus Number | Primary Bus Number
...
20h: Memory Limit | Memory Base
24h: Prefetchable Memory Limit | Prefetchable Memory Base
28h: Prefetchable Base Upper 32 Bits
2Ch: Prefetchable Limit Upper 32 Bits
...
3Ch: Bridge Control | Interrupt Pin | Interrupt Line]

Software can determine the size of BAR ranges by writing a sequence of all 1s to the BAR. Devices must hardwire the low-order address bits to zero in such a way that performing a bitwise NOT on the value read back and then adding 0x01 yields the size of the range. For example, if software writes 0xFFFFFFFF to a BAR and then reads 0xFFFFFFE0, it performs a bitwise NOT (0x0000001F) and adds 0x01 to obtain a size of 32 (0x20). Because the lower 4 bits are used as flags, the minimum size of a BAR region is 16 bytes and those bits are set to 0 for this calculation.
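The size calculation and the flag layout of a 32 bit MMIO BAR can be sketched as follows. The function names are our own; the sketch operates on the value read back from the BAR and does not perform any configuration space access itself. Treating a readback of zero as an unimplemented BAR is an assumption on our part.

```python
def bar_size(readback: int) -> int:
    """Size in bytes of a 32 bit memory BAR, given the value read back
    after writing 0xFFFFFFFF to it."""
    mask = readback & 0xFFFFFFF0        # clear the four flag bits
    if mask == 0:
        return 0                        # assumption: BAR is not implemented
    return (~mask & 0xFFFFFFFF) + 1     # bitwise NOT, then add 1


def decode_memory_bar(value: int) -> dict:
    """Decode the fields of a 32 bit MMIO BAR value."""
    return {
        "io": bool(value & 0x1),                    # bit 0: I/O space indicator
        "is_64bit": (value >> 1) & 0x3 == 0b10,     # bits 2..1: type field
        "prefetchable": bool(value & 0x8),          # bit 3: prefetchable flag
        "base": value & 0xFFFFFFF0,                 # bits 31..4: base address
    }
```

Applied to the example above, bar_size(0xFFFFFFE0) returns 32. In practice, software saves the original BAR value before the probe and restores it afterwards, since the write temporarily destroys the configured base address.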

The type 1 configuration header starts to differ from type 0 at byte 16. For the sake of brevity we are going to focus only on transaction routing and memory mapping related fields. For more details we encourage the reader to refer to the PCI-to-PCI bridge specification (PCI-SIG, 1998), which we use as our main reference for this section. An abridged depiction of a type 1 header is provided in Figure 2.13. This header type only has two BARs, which have the same meaning as in type 0 headers.

The bus numbers in the next DWORD describe the bus topology in PCI notation. They are ignored by PCIe, but still set to be compatible with legacy software. The primary bus number is the number of the PCI bus to which the primary bridge interface is connected. The secondary number thus denotes the number of the bus connected to the secondary interface, while the subordinate bus number denotes the highest PCI bus that is behind the bridge.

Memory transaction routing is performed through the memory base and limit registers. If the memory limit register is set to a lower value than the memory base register, MMIO forwarding is disabled. In any other case, the bridge will forward all memory transactions that fall into the range between the base and limit on its ingress interface to the egress interface. The minimum size of MMIO ranges for bridges is 1 MiB, thus the lower 20 bits are hardwired to zero in the base and all 1s in the limit register. The prefetchable memory base and limit registers are optional and work the same way, except that the memory ranges they describe have no side effects on reads.

In conclusion, MMIO transactions from the physical address space to PCIe devices are routed through the PCIe fabric by PCI-to-PCI bridge compatible nodes depending on their address. Bridges and endpoints are configured by the firmware on system boot and can be relocated by the OS or drivers by programming the endpoint BARs and corresponding bridge memory registers. Software can enumerate this configuration by parsing PCI configuration space, which is also available on PCIe based systems.

2.2 Linux Kernel Modules

To foster an understanding of our Linux kernel module injection techniques in Chapter 6, we give a short overview of the anatomy of a kernel module and how it is linked and loaded.

The Linux kernel does not have a dedicated driver model like Windows or OS X. Instead, drivers are either compiled directly into the kernel or linked with the kernel binary at runtime as Loadable Kernel Modules (LKMs) (Corbet et al., 2005). LKMs are stored in files with the extension .ko and loaded through the insmod and modprobe programs by issuing the system call init_module.

[Figure 2.14: ELF file layout (based on TIS Committee, 1995). Linking view: ELF header, program header table (optional), section 1 to section n, section header table. Execution view: ELF header, program header table, segment 1, segment 2, ..., section header table (optional).]

2.2.1 Module Binary Organization

The executable file format on Linux systems is ELF (TIS Committee, 1995). It is a binary format composed of a generic ELF header, a number of program and section headers and finally the actual sections/segments which contain program code and data.

The ELF header stores information on the file class, the program's architecture, endianness, entry point and other generic details. There are four types of ELF files: executables, object files, shared objects and core dumps. Executables are ready to be loaded and run, while object files are intended to be further processed by a linker. Shared objects can be dynamically linked with other objects. Finally, core dumps are created during program crashes to store debugging information (Levine, 1999).

The loader relies on the segments to identify the file layout and decide which parts to map into memory with which permissions. The program header table stores information on the segment types and locations. It is therefore required in an executable but optional in an object file. The linker instead relies on sections to operate on the file. The section header table stores information on section location, type and size. The section headers are thus mandatory in an object file, but optional in an executable. Because of this dualism, there are actually two disparate views of the same ELF file, which are illustrated in Figure 2.14. Depending on which headers are consulted, the internal structure of the file is organized differently, resulting in a linking view and an execution view. Which one is relevant depends on the intended purpose.
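To make the two views more tangible, the following sketch reads the file type and the sizes of both header tables from a 64 bit little-endian ELF header. The field layout follows the ELF64 header definition; the function name and the returned dictionary keys are our own, and other ELF classes or byte orders are deliberately rejected to keep the example short.

```python
import struct

# e_type values for the four ELF file types discussed above
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def parse_elf64_header(data: bytes) -> dict:
    """Extract the file type and header table sizes from an ELF64 header."""
    if len(data) < 64 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles little-endian ELF64")
    (_ident, e_type, _machine, _version, _entry, e_phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = struct.unpack_from("<16sHHIQQQIHHHHHH", data)
    return {
        "type": ELF_TYPES.get(e_type, "UNKNOWN"),
        "program_headers": e_phnum,   # execution view (segments)
        "program_header_offset": e_phoff,
        "section_headers": e_shnum,   # linking view (sections)
        "section_header_offset": e_shoff,
    }
```

Applied to the first 64 bytes of a kernel module, such a parser would be expected to report the type REL and no program headers, since a relocatable object only carries the linking view.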

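[Figure 2.15: Static vs. Dynamic Linking. Dynamic linking: an executable (EXEC) and a shared object (DYN) with their .text sections, connected through the Global Offset Table (.got) and the GOT of the Procedure Linkage Table (.got.plt). Static linking: relocatable objects (REL) with their .text sections, connected through the relocation table (.rela.text).]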

Linux kernel modules are relocatable ELF object files, not executables. The obvious difference is that executable ELF files are processed by a loader, while relocatable objects are intended for a linker. Dependencies on other objects in an ELF executable are resolved by dynamic linking. In this process, external symbols are referenced through the Global Offset Table (.got) and Procedure Linkage Table (.got.plt), and resolved by the dynamic linker at runtime.

In contrast to this, relocatable ELF objects are statically linked using relocations. Each section with references to symbols in other sections or objects has a corresponding relocation table. Entries in these tables contain information on the specific symbol referenced, and how to patch a specific code or data reference with the final address of the symbol after it has been relocated. One or more of these relocatable objects can be linked together by placing them into their final position in the final executable or address space, after which the linker applies all relocations to patch the now final references directly into the code.
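How a linker patches a reference can be illustrated with two common x86-64 relocation types. The following is a toy sketch, not the kernel's implementation: it assumes the relocation formulas of the x86-64 ELF ABI (S + A for R_X86_64_64 and S + A - P for R_X86_64_PC32, where S is the symbol address, A the addend and P the address of the patched location), and the function name is our own.

```python
import struct

R_X86_64_64 = 1     # absolute 64 bit reference: S + A
R_X86_64_PC32 = 2   # 32 bit reference relative to the patched place: S + A - P


def apply_relocation(section: bytearray, section_addr: int, offset: int,
                     rtype: int, symbol_addr: int, addend: int) -> None:
    """Patch one reference inside a section that was placed at section_addr."""
    place = section_addr + offset                 # P: final address of the field
    if rtype == R_X86_64_64:
        value = (symbol_addr + addend) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", section, offset, value)
    elif rtype == R_X86_64_PC32:
        value = (symbol_addr + addend - place) & 0xFFFFFFFF
        struct.pack_into("<I", section, offset, value)
    else:
        raise ValueError("relocation type not covered by this sketch")
```

For example, a call instruction whose 32 bit operand sits at offset 4 of a section placed at 0x1000, targeting a symbol at 0x2000 with addend -4, receives the displacement 0xFF8. Once all entries of a relocation table have been applied in this manner, the section contains only final addresses.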

2.2.2 Linking and Loading

The actual loading process of a kernel module is started with a system call from user mode, and then handled by the kernel directly. We give a brief overview of the most important steps, as illustrated in Figure 2.16:

1. A user-mode process (usually insmod) loads the kernel module image into memory and issues an init_module system call.

2. The system call causes the kernel to dynamically allocate memory for the module and copy it into kernel space.

3. After the kernel has checked that the module is a valid ELF file, it starts to analyze the .modinfo section of the module. This section contains information on the exact version of kernel headers the module was compiled with. The kernel will refuse to load any modules that contain incompatible version magic.

4. If CONFIG_MODVERSIONS is enabled, the kernel will also check the version magic for every individual symbol the module imports. During compilation a list with this symbol magic is placed in the __versions section of the module. The kernel will also refuse to load a module with incompatible symbol version magic.

5. After the version check, the kernel invokes its internal linker to resolve all relocations in the module. This will replace any inter-section or external symbol references in the module with the actual addresses of these symbols in the running kernel, assimilating the module into the kernel image.

6. Finally, the kernel will link the module structure provided by the module into the module list and call the function pointer stored in module.init, which passes execution to the module's init_module function.

[Figure 2.16: Loading of a Kernel Module. insmod passes module.ko (with the sections .text, .rela.text, .init.text, __versions and .modinfo) to the kernel through init_module. The kernel side shows the stages sys_init_module, copy module from user, check modinfo, check module license and versions, apply relocations and do_init_module, annotated with the step numbers 1 to 6.]

In the context of the Linux kernel this means that loading a kernel module is actually the same thing as linking an executable. The kernel module is linked into the kernel executable, by the kernel itself, at runtime.


2.3 System Firmware

Because Chapter 7 focuses on the acquisition and analysis of firmware, we give a short overview of the different firmware components used on x86 systems. This section is based on work by Salihun (2006), as well as the PCI (PCI-SIG, 2002) and Advanced Configuration and Power Interface (ACPI) (Intel Corporation, 2014a) specifications.

The system firmware, i.e., the BIOS or the UEFI on more modern systems, is the first program that runs on the CPU when a computer is turned on. The respective code is saved in a non-volatile storage area, usually an EEPROM, on the mainboard of the machine. The chipset is initially configured to map the contents of this ROM into the physical address space from 0xF0000 to 0xFFFFF. The ROM is also aliased in a way that at least 16 bytes are mapped to the physical address 0xFFFFFFF0, the CPU's reset vector (Intel Corporation, 2014b, Chapter 9.1.4).

As soon as the power supply is stable and all clocks are synchronized, the reset line in the CPU is deasserted, and execution resumes at the processor's reset vector. At this time the CPU is in real mode, so even with segmentation it would normally not be able to reach this address. This problem is remedied by statically initializing the base address of the CS register to 0xFFFF0000 on reset. The CPU will thus fetch the first instructions from 0xFFFFFFF0 (not 0xFFF0), until the Code Segment (CS) base is reset by a jmp or call instruction. Firmware code at this address then performs a far jump into the code residing in the mapped firmware ROM (Intel Corporation, 2014b, Chapter 9.1.4).
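The address arithmetic behind this mechanism can be written out in a few lines. The sketch below assumes the usual real mode translation (selector shifted left by 4, plus a 16 bit offset) and an initial instruction pointer of 0xFFF0; the function names are our own.

```python
def real_mode_linear(selector: int, offset: int) -> int:
    """Linear address produced by ordinary real mode segmentation."""
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)


def reset_fetch_address(cs_base: int = 0xFFFF0000, ip: int = 0xFFF0) -> int:
    """Address of the first instruction fetch after reset, using the
    statically initialized CS base instead of selector * 16."""
    return cs_base + ip
```

The highest address reachable through regular real mode segmentation is real_mode_linear(0xFFFF, 0xFFFF), i.e. 0x10FFEF, which is far below the reset vector. With the special CS base, reset_fetch_address() yields 0xFFFFFFF0.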

The firmware code initially runs directly from the ROM chip. More precisely, only a small stub is directly executed at the beginning. The remaining instructions are typically compressed, because the firmware ROM is orders of magnitude slower than the system RAM. The stub is responsible for setting up the memory controller as well as the individual DRAM modules. The firmware then moves from ROM into RAM by manipulating the PAM registers to shadow the ROM-mapped regions with RAM, and decompressing its code image into memory.

As one of the last steps, the system firmware creates and maintains a runtime environment with basic I/O services. It starts initializing devices on the PCI bus and maps their registers and memory into the physical address space as required. It is during this phase that the final layout of the physical address space is determined.

While this procedure is similar in BIOS as well as UEFI based systems, the runtime environment after this step is fundamentally different.

2.3.1 Basic Input Output System

The BIOS runtime environment operates in 16-bit real mode. It is responsible for creating an Interrupt Vector Table (IVT) in order to support a set of simple operations, e.g. sending output to the screen or reading data from a hard disk. The latter functionality is required to drive the bootstrapping process and load the boot manager as well as the operating system later on. Precisely, the BIOS reads the code of the boot manager from the Master Boot Record (MBR) of the first hard disk into memory, and directly transfers execution to it. The boot manager, in turn, loads the operating system and further prepares the system environment. For these tasks, the primary BIOS services are used. In the last step, the operating system takes over interrupt handling by setting up an appropriate Interrupt Descriptor Table (IDT). By switching to the new IDT, direct access to the BIOS services is lost.

2.3.2 (Unified) Extensible Firmware Interface

Contrary to BIOS-based firmware, UEFI operates in 32-bit protected mode. The boot process comprises a distinct Security (SEC) phase in which the integrity of the firmware is explicitly checked, and secure booting is facilitated. In a second, so-called Pre-EFI Initialization (PEI) phase, similar tasks as during early BIOS initialization are performed. However, at the end of this phase, EFI provides a structured Driver Execution Environment (DXE) for drivers and services. These are not loaded from the MBR but from the file system of a designated EFI System Partition. The location of the respective images is specified in Non-Volatile RAM (NVRAM) on the mainboard. Drivers, bootloaders, and the OS can interact with the firmware through specific protocols. After the operating system has started, it still has access to some firmware interfaces through the so-called UEFI runtime services.

2.3.3 PCI Option ROMs

Because the system firmware has no internal knowledge of the functionality of attached devices, code specifically required for device initialization is provided by the devices themselves. For PCI and PCIe devices this code is located on a ROM chip on the device. For a more detailed explanation of device initialization code the reader is referred to the official specification (PCI-SIG, 2010b, Section 6.3), on which this section is based.

During Power On Self Test (POST) the firmware maps these ROMs into the physical address space by configuring the Expansion ROM Base Address Register (XROMBAR) in the device's configuration space (see Section 2.1.3).

Expansion ROMs can contain multiple code images, one for each supported architecture. Each image is aligned to 512 bytes and starts with a header describing its contents. The location of subsequent images depends on the size of the previous image. Firmware parses each header and selects the image appropriate for its architecture.

The header consists of a 2 byte signature, 22 bytes (16h) of architecture dependent data and a 2 byte offset to the so called PCI Data Structure. The PCI data structure contains information on the architecture for which the image was built, the size of the image and the device this image was designed for. For x86 compatible images the header additionally has fields which store the offset of the INIT function, as well as the amount of memory required for initialization.
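The image chain of an expansion ROM can be walked with a short parser. The following is a sketch under assumptions that go beyond the text above and reflect our reading of the PCI firmware specification: the signature bytes 55h AAh, the pointer to the PCI data structure at offset 18h, the "PCIR" signature of that structure, and the offsets of its vendor ID, device ID, image length (in units of 512 bytes), code type and indicator fields. The function name is our own.

```python
import struct


def parse_option_rom(rom: bytes) -> list:
    """Walk the chain of images in an expansion ROM dump."""
    images = []
    base = 0
    while base + 0x1A <= len(rom):
        if rom[base:base + 2] != b"\x55\xaa":       # assumed ROM signature
            break
        (pcir_off,) = struct.unpack_from("<H", rom, base + 0x18)
        pcir = base + pcir_off                      # PCI data structure
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break
        vendor, device = struct.unpack_from("<HH", rom, pcir + 0x04)
        (units,) = struct.unpack_from("<H", rom, pcir + 0x10)
        code_type, indicator = struct.unpack_from("<BB", rom, pcir + 0x14)
        images.append({"offset": base, "vendor": vendor, "device": device,
                       "size": units * 512, "code_type": code_type})
        if indicator & 0x80 or units == 0:          # bit 7 marks the last image
            break
        base += units * 512                         # next image follows directly
    return images
```

Under these assumptions, firmware-like code would pick the entry whose code type matches its own architecture and ignore the others.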

POST code on x86 systems must copy the appropriate image into a writable region of memory and pass control to the INIT function. Option ROM code will then run and initialize the device. Memory allocated to option ROM code must be writable to allow initialization code to unpack and modify its own image in memory.

On x86 systems the POST Memory Manager (PMM) is responsible for allocating memory to device initialization code. It uses the memory area spanning from 0xC0000 to 0xEFFFF for storage of code and data. For legacy reasons the first 64 KB are reserved for video cards.

Note that version 3.0 of the PCI firmware specification has changed the way memory is managed during POST. The PMM is now allowed to allocate memory above 1 MB even while the OS is running (PCI-SIG, 2010b). The location of regions outside of the previously described area is implementation specific and not standardized.

2.3.4 Advanced Configuration and Power Interface

In this section, we give a brief overview of ACPI. Because Chapter 7 deals with acquisition and analysis of ACPI malware from memory, we will focus purely on the code execution mechanics of ACPI. For further information we refer the reader to the official ACPI specification (ACPI Promoters Corporation, 2013), on which this section is based.

ACPI defines a platform independent interface between the OS and the hardware. Figure 2.17 illustrates the interaction between ACPI, the OS and the hardware. The interface consists of three components: Registers, System Description Tables, and Firmware.

ACPI System Description Tables characterize the hardware and what needs to be done to make it function. They are supplied to the OS by the ACPI firmware. Additionally, ACPI tables contain Definition Blocks with code to control the hardware. This code is supplied in the ACPI Machine Language (AML), which is an abstract language that is executed by an AML interpreter in the OS.

ACPI registers are actually part of the platform hardware. They refer to the part of the hardware that is constrained by the ACPI specification. The ACPI firmware in turn amounts to the part of the platform firmware that implements the ACPI interface. It consists of routines that manage power and system sleep states, and is responsible for supplying the ACPI tables to the OS.

Figure 2.17: ACPI Architecture (based on ACPI Promoters Corporation, 2013). Components shown: ACPI registers (platform hardware), ACPI firmware and ACPI tables (platform firmware), and the kernel, device drivers, and ACPI driver with AML interpreter (operating system).

When the system boots, ACPI firmware copies the ACPI tables into an arbitrary memory region. To enable the OS to find them, it must place a structure called the Root System Description Pointer (RSDP) into the first 1 KB of the Extended BIOS Data Area (EBDA) or the firmware ROM image between 0xE0000 and 0xFFFFF³ on a 16 byte boundary. This structure contains a signature ("RSD PTR "), a checksum, and a pointer to the Extended Root System Description Table (XSDT)⁴, which in turn points to all other ACPI tables. The OS scans the specified memory regions for the signature, validates the checksum, and, if successful, follows the pointer inside to locate the XSDT.
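The scan for the RSDP can be illustrated with a short C sketch that operates on a copy of one of the candidate regions. It is a sketch under stated assumptions rather than the code of any particular OS: the function names are hypothetical, and the rule that the first 20 bytes of the structure must sum to zero modulo 256 is an assumption taken from the ACPI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIGNATURE "RSD PTR "
#define RSDP_V1_LENGTH 20 /* assumed: bytes covered by the checksum */

/* Returns 1 if all bytes of the candidate sum to zero modulo 256. */
static int rsdp_checksum_ok(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}

/*
 * Scans a copy of a memory region for the RSDP. Candidates are only
 * considered on 16 byte boundaries. Returns the offset of the first
 * candidate with a valid checksum, or -1 if there is none.
 */
static long find_rsdp(const uint8_t *region, size_t size)
{
    assert(region != NULL);
    for (size_t off = 0; off + RSDP_V1_LENGTH <= size; off += 16) {
        if (memcmp(region + off, RSDP_SIGNATURE, 8) != 0)
            continue;
        if (rsdp_checksum_ok(region + off, RSDP_V1_LENGTH))
            return (long)off;
    }
    return -1;
}
```

The checksum step matters for forensic tools as well, because the signature string alone may also occur in unrelated data.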

The XSDT is the root directory from which all other tables are discovered. The OS then loads tables such as the Fixed ACPI Description Table (FADT), which in turn leads to the Differentiated System Description Table (DSDT). The DSDT contains code and data in AML format, which is executed in the AML interpreter of the OS upon initialization. There are 15 other ACPI tables, some of which are optional and don’t have to be present on every ACPI implementation. A detailed list of all available tables and their function can be found in the ACPI specification (ACPI Promoters Corporation, 2013).

The OS uses these tables to interact with the hardware without a need for any platform specific knowledge. All hardware details are embedded in the AML code, so the OS just needs to interact with the definition blocks to enumerate hardware and interact with it.

³ On UEFI systems the RSDP is provided to the bootloader in the UEFI System Table, so there is no need to scan for it.

⁴ On 32 bit systems the legacy Root System Description Table (RSDT) is used instead of the XSDT.


2.4 Summary

In this chapter, we have introduced the concepts and technologies that drive the memory architecture on modern x86 systems. We have introduced the physical address space and explained how the PCIe protocol ties devices and memory together. Furthermore, we have presented the virtual address space and the mechanisms and data structures responsible for address translation. These concepts form the foundation on which software memory acquisition is built, and are necessary to understand memory enumeration and mapping techniques.

In addition, we have given a short overview of the architecture of Linux kernel modules. This is important to understand the relinking techniques we introduce in Chapter 6 to load our acquisition module into arbitrary kernels.

We have also laid out the components of the system firmware that run under the hood of typical x86 computers. This forms the base for Chapter 7, where we focus on the acquisition of firmware code and data in the course of forensic memory acquisition.


Chapter 3

Memory Acquisition

Memory acquisition is the process of obtaining a copy of the physical memory of a system for analysis. It is the first step of a memory forensics investigation, in which insights into a computer system are gained by analyzing its physical memory. Memory forensics is very useful for the discovery of rootkits, which manipulate the OS into hiding their presence. Because a rootkit’s code, data, processes and threads need to exist somewhere in memory in order to run, it is impossible for a rootkit to remove all its traces from physical memory. This problem is referred to as the rootkit paradox, which memory analysis techniques can exploit to detect and analyze rootkits (Kornblum, 2006).

In this chapter, we give an overview of the field of memory acquisition. It serves as background and motivation for the main contributions of this thesis. We first explore the theoretical foundations of the field and then present a study on the technical principles that facilitate memory acquisition in current tools. We focus especially on the correctness of memory images, which we introduce as the main metric to measure the quality of memory acquisition techniques in regard to the rootkit threat.


This chapter is outlined as follows: First, we give a definition of the memory acquisition process and depict the characteristics of the resulting memory image in more detail in Section 3.1. We introduce criteria that define forensically sound memory acquisition and present an evaluation that examines popular memory acquisition tools with regard to these criteria. In Section 3.2 we examine current software memory acquisition mechanisms and tools. We describe operating system memory interfaces for Windows, Linux and OS X and provide an overview of the technical details of a selection of third party tools. Finally, we present OSXPmem, an open source memory acquisition tool we developed for the OS X platform. Our insights further the understanding of the OS internals involved in acquiring physical memory, which serves as a foundation for understanding their attack surface.


3.1 Principles of Memory Acquisition

Modern memory analysis frameworks like Rekall (Cohen, 2014b) and Volatility (Walters, 2014) require a copy of physical memory from the system under investigation (Ligh et al., 2014). This copy is referred to as a memory image and its creation as memory acquisition. In line with the work of Schatz (2007a) and Vomel and Freiling (2012) we define a memory image as follows:

Definition 1. A memory image is an exact copy of all physical memory ranges of a computer system at a specific point in time.

This implies three important requirements. The copy must be exact, meaning there must not be any errors in the image. It must be complete, which means all physical memory ranges must be copied. And finally, the copy must be created at a specific point in time, which implies the image must be taken at once, not over a long period of time.

3.1.1 Criteria for Sound Memory Acquisition

Memory analysis techniques can only be reliable if the memory image they operate on corresponds exactly to Definition 1. If the image is not an exact copy of memory at the time of acquisition, this can result in incorrect analysis results.

In addition to problems resulting from errors in the acquisition program, software memory acquisition is prone to concurrency issues. Because the acquisition process is not an atomic operation and takes time to complete, the acquisition time is a time span rather than a discrete value. Software running in parallel to the acquisition program can write to parts of the system's memory while it is copied into the image, leading to inconsistencies in the image called memory smear (Richard and Case, 2014).

To be able to measure and evaluate the quality of memory acquisition procedures and thus the resulting memory image, several authors have proposed criteria for memory acquisition quality (Afek et al., 1993; Schatz, 2007b; Inoue et al., 2011; Vomel and Freiling, 2012). We use the model proposed by Vomel and Freiling (2012) to classify our work, because it combines all relevant aspects into three independent criteria: correctness, atomicity, and integrity. In the following we give a short overview of these criteria, which we use to assess the methods developed in this thesis.

Correctness Regarding a set of memory regions, a memory image is correct if, for each of these regions, the data captured in the memory image matches the contents of that region at the point in time it is duplicated (Vomel and Freiling, 2012).

While this seems trivial, there are at least three things that can go wrong during acquisition, resulting in an incorrect image:


1. Errors in the acquisition software can result in memory regions being copied to wrong parts of the image. For example, a popular open source tool had a bug that resulted in memory regions being stored at the wrong offset in a raw image (Suiche, 2009a). As a result, virtual to physical address translation became impossible on the image, because physical pointers were no longer valid.

2. A broken memory enumeration procedure can result in some pages not being written to the memory image at all. Such an image is incorrect because the content of these pages is missing. For example, in a comparison of mdd (ManTech CSI, Inc., 2009) and win32dd (Suiche, 2009b), mdd produced an image that was 118 pages smaller than the win32dd image (Inoue et al., 2011). This implies that at least one of the programs did not enumerate memory correctly, resulting in an incorrect image.

3. Malicious software can subvert the acquisition process and manipulate the resulting image. For example, Milkovic (2012) developed a proof of concept software capable of hiding malware artefacts from memory images. By manipulating the buffers of the image file as it is being written, the malware causes the acquisition software to write an incorrect image.

Atomicity As we will see in Section 3.2, software memory acquisition is not an atomic operation. Some regions of memory can be overwritten by concurrent system activity before they can be copied to the image. The atomicity of a memory image is quantified by the amount of memory that is changed by concurrent system activity during the imaging operation.

An atomicity violation happens when the contents of a memory region are modified by concurrent system activity before it can be acquired, but the cause of the modification is not present in the image (Vomel and Freiling, 2012). State of the art memory analysis tools are unable to identify atomicity violations in a memory image. They have to assume that they are operating on an atomic image to be able to function. However, under the right conditions, this can lead to problems and even incorrect results.

Vomel and Freiling (2012) illustrate concurrent activity using space-time diagrams, which visualize dependencies of memory operations over time. Each memory region is represented by a horizontal line, and dots on that line represent an operation (read or write) on that region.

A simplified example is given in Figure 3.1. The illustration consists of processes p ∈ P operating on memory regions r ∈ R at times t ∈ T. A process p1 allocates a region of memory r3 at time t0 and stores some data in it at time t1. The page table entry for this allocation is stored in r1. Simultaneously, memory acquisition software p2 starts and copies the page tables into the image. Before it can also copy the contents of the mapped page, p1 releases this region of memory (at time t2). Shortly after that (at time t3), another process p3 allocates the same memory region with its page tables stored in r2. It copies incriminating data into r3 at time t4. Finally, p2 copies the data from the page into the image at t5.

Figure 3.1: Space-Time Diagram of an Atomicity Violation (memory regions r1 to r3 over times t0 to t5, with operations by processes p1, p2 and p3)

Memory analysis software operating on the image will attribute the data in r3 to p1, because the page tables show it mapped into this process. The image is correct; after all, the page tables did contain this information at the time they were copied. The analysis is also performed correctly, since the software interprets the page tables in the appropriate way.

However, the result of the analysis is wrong, because the image is not atomic. The events leading to a change in ownership of r3 at t2 and t3 are not visible in the image, while the incriminating data copied into r3 at t4 is present, because the acquisition process is interleaved with concurrent system activity by p1 and p3. This results in incorrect analysis results in spite of a correct image, caused by a lack of atomicity.

Integrity Similarly to atomicity, the level of integrity of an image also depends on the amount of memory that changes during the acquisition. However, it does not depend on causality. Instead, the integrity of an image is measured relative to a point in time. Intuitively, this is the time where the acquisition started, but the definition allows for any arbitrary point in time if necessary (Vomel and Freiling, 2012).

A simple example for integrity violations is the change the imaging software itself causes during acquisition. The program needs to be copied into memory, libraries have to be loaded and copy operations fill buffers. All these operations cause memory to be overwritten after the imaging process has started. Thus, the integrity of the image is violated in regard to the starting time of the acquisition process.

But not only the memory imaging program can affect the integrity of a memory image. Figure 3.2 illustrates system activity during a memory acquisition procedure.

Figure 3.2: Space-Time Diagram of Integrity Violations (memory regions r1 to r3 over time, with operations by processes p1, p2 and p3 and the reference times t1 and t2)

Processes p1 and p2 run in parallel with the acquisition software p3. The acquisition software is started at time t1. There are no atomicity violations, because the causality of all memory writes is preserved. However, the integrity of the image is violated with respect to t1, because the value contained in r3 at that point in time is not the value that is copied to the image. The process p2 overwrites this value before the imager copies the memory region, causing the integrity violation. Any potential evidence stored in r3 at t1 is lost.

Note that the integrity of the image is intact in regard to a second point in time t2. All values that existed in r1 to r3 at t2 are copied to the image intact. The writes that happen after t2 are not important, because the respective regions have already been captured.

3.1.2 Correctness of Existing Memory Acquisition Tools

In a previous publication (Vomel and Stuttgen, 2013), which is not part of this thesis, we built an evaluation platform for memory acquisition software based on the criteria introduced in Section 3.1.1. The platform consists of a modified version of the Bochs x86 emulator (The Bochs Project, 2013) that has been extended with an instrumentation module. The module is capable of monitoring and logging all memory operations in the emulated environment and can even identify the responsible thread. This information can later be used to track causality and identify potential atomicity violations.

We have evaluated three memory acquisition tools for Windows: mdd (ManTech CSI, Inc., 2009), WinPMEM (Cohen, 2012) and Win32dd (Suiche, 2009b). We chose these tools because the source was freely available, which simplified the integration of the platform's hypercall mechanism.


In our tests we immediately discovered errors in mdd (ManTech CSI, Inc., 2009) and Win32dd (Suiche, 2009b) that caused both programs to create an incorrect image. To avoid system instability both programs skip MMIO regions in the physical address space (see Section 3.2.3). However, when writing subsequent memory regions to the image they did not pad the gaps with zeroes, instead directly writing the adjacent memory region next to the previous one.

This is an interesting bug, because the image is complete, so it looks correct at first glance. The difference becomes apparent if we go back to Figure 2.5 and have a look at the physical address space. If we just remove all regions mapped to devices and shift the memory regions to be adjacent to each other, the address of all memory regions except the first one changes. This invalidates any physical memory reference that points into one of the upper regions. As shown in Section 2.1.2, all virtual address translation data structures rely on physical address references. Because memory analysis software relies on virtual to physical address translation to bridge the semantic gap in a memory image, any analysis technique more advanced than string extraction becomes impossible.

Other results of our evaluation indicate that software memory acquisition does have severe problems regarding atomicity and integrity. However, for the purpose of this thesis we will concentrate on correctness, as this is the critical criterion in regard to rootkit manipulation. While rootkits cannot remove themselves from memory, they can subvert the memory acquisition process to remove themselves from the memory image. Memory analysis techniques aimed at rootkit discovery and analysis thus rely on an absolutely correct image to function.

3.1.3 Memory Image Formats

Memory images can be stored in a variety of formats, the most notable differences being sparseness and inclusion of metadata. The simplest format for a memory image is a raw image. It is a binary file that represents an exact copy of the physical address space of the system it was acquired from. Inaccessible regions are zero padded and the resulting file has the same size as the physical address space. This is done to preserve the physical address of data in the image. Each memory region is located at the same offset in the image file as it was stored in the physical address space. Physical memory references can thus be resolved by interpreting them as a file offset. Note that because of the padded MMIO regions this file can be much larger than the amount of memory installed in the system. Because of its simplicity, the format is supported by most major memory analysis frameworks.
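The effect of padding on address resolution can be sketched in C as follows. The mem_run type and both function names are hypothetical and only serve as illustration. The first function models a padded raw image, where the file offset equals the physical address. The second models an image in which the memory regions are written back to back without padding, so that every region after the first gap ends up at a lower offset than its physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous range of physical memory (hypothetical layout type). */
struct mem_run {
    uint64_t start;  /* physical start address */
    uint64_t length; /* size in bytes */
};

/*
 * Padded raw image: gaps are zero filled, so data backed by RAM is found
 * at a file offset equal to its physical address. Returns -1 for
 * addresses that are not backed by RAM (the image holds padding there).
 */
static int64_t padded_offset(const struct mem_run *runs, size_t n,
                             uint64_t paddr)
{
    assert(runs != NULL);
    for (size_t i = 0; i < n; i++)
        if (paddr >= runs[i].start && paddr - runs[i].start < runs[i].length)
            return (int64_t)paddr;
    return -1;
}

/*
 * Unpadded image: runs are stored back to back, so the offset of a byte
 * is the total size of all earlier runs plus its offset inside its run.
 */
static int64_t packed_offset(const struct mem_run *runs, size_t n,
                             uint64_t paddr)
{
    uint64_t written = 0;

    assert(runs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (paddr >= runs[i].start && paddr - runs[i].start < runs[i].length)
            return (int64_t)(written + (paddr - runs[i].start));
        written += runs[i].length;
    }
    return -1;
}
```

An analysis tool that treats an unpadded file as a raw image effectively applies padded_offset to data laid out according to packed_offset, which is why physical references into the upper regions no longer resolve.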

More sophisticated formats adopt a sparse approach. By providing metadata on the physical address of memory regions and their respective file offsets in a header of the file, they don’t have to carry padding for inaccessible memory regions. Because of this sparseness, they are much smaller than a raw image and roughly amount to the size of physical memory. Examples include the LiME format (Sylve, 2012), Executable and Linkable Format (ELF) and Mach-O core files, and the Microsoft crashdump format.

ELF and Mach-O core files are binary file formats to store dumps of virtual memory and debugging information, which makes them well suited to also store physical memory images. However, at the time of writing there is no general method of storing metadata together with the memory image, with some tools inventing their own formats. For example, WinPMEM (Cohen, 2012) is able to store additional metadata like the page file or the address of the kernel's page tables in a YAML Ain’t Markup Language (YAML) footer at the end of any container file. However, no standards exist in this regard and the YAML data generated by WinPMEM is currently usable with the Rekall (Cohen, 2014b) memory forensic framework only.

3.2 Software Memory Acquisition Techniques

To identify the different points in the memory acquisition process where software can be intercepted by malware, we study the inner workings of open and closed source memory acquisition software on the Windows, Linux and Mac OS X platforms. To discover the most common ways of accessing memory on these platforms, we studied the implementation of all open source memory acquisition tools that were available at the time of writing and capable of acquiring memory from a 64 bit version of the respective OS:

• WinPMEM (Cohen, 2012)

• Win32dd (Suiche, 2009b)

• mdd (ManTech CSI, Inc., 2009)

• fmem (Kollar, 2010)

• pmem (Cohen, 2011)

• LiME (Sylve, 2012)

To get a complete picture, we also reverse engineered the most popular closed source memory acquisition applications that were freely available:

• FTK Imager (AccessData, 2012)

• DumpIt (Suiche, 2011)

• Memoryze (Mandiant, 2011)

• MacMemoryze (Mandiant, 2012)

• WindowsMemoryReader (ATC-NY, 2012b)

• MacMemoryReader (ATC-NY, 2012a)

3.2.1 Memory Acquisition Challenges

To create a correct image of the entire physical memory on a system, software must solve two main challenges. As illustrated in Section 2.1.2, all software started by the user runs in an isolated virtual address space with no control over physical memory. A memory acquisition program running in user space can only access its own memory, and thus can never create a complete memory image as defined in Section 3.1. This limitation can only be bypassed by code running in system mode, which means the acquisition program needs help from the OS or a driver to map all physical memory into its virtual address space.

But even with direct access to the physical address space, software must enumerate the address space layout and determine the location of the physical memory regions. As shown in Section 2.1.1, this layout varies on most machines, and memory regions are interleaved with MMIO regions mapped to device memory. Reading from an MMIO region can cause an interrupt on the device, leading to data corruption and system crashes.

This leads us to two main challenges software must solve to successfully acquire memory:

1. Software must reliably map all physical memory regions into its virtual address space.

2. Software must enumerate the entire physical address space and identify all physical memory and MMIO regions.

These challenges apply equally to all x86 systems running in protected or long mode, regardless of the OS.

3.2.2 Operating System Memory Interfaces

Most major operating systems have dedicated interfaces to physical memory intended for debugging or legacy software. Some of these interfaces can also be used to create a memory image for forensic purposes. In the following, we will give an overview of the different interfaces available on Windows, Linux and OS X.

Microsoft Windows The Microsoft Windows family of operating systems provides the section object \\.\Device\PhysicalMemory to software that needs to access physical memory. This object presents a section object interface to the physical address space. Since Windows Server 2003 Service Pack 1, this object can only be accessed from kernel space, so a driver is needed to use it (Microsoft Corporation, 2013). Memory acquisition software can simply map regions of physical memory into its own address space from this object¹ and write them to disk or the network.

¹ Section objects can be mapped using the ZwMapViewOfSection() Application Programming Interface (API).

In addition, Windows has the ability to write memory dumps on system crashes. The Windows kernel adopts a fail fast policy, meaning it reacts to inconsistencies by shutting down with what is commonly known as a blue screen (Russinovich et al., 2009). When the kernel detects a problem, it calls the KeBugCheckEx function. This function first disables all interrupts on all CPUs, writes a memory dump to the region on disk occupied by the page file, and then halts the system while displaying an error message on a blue background.

Because system activity is halted and the memory image is written by the OS itself, the atomicity of the image is better than with other software approaches (Vomel and Freiling, 2011). However, the memory image is incomplete by default and only contains kernel memory (Russinovich et al., 2009). While it is possible to configure the system to include all physical memory in the image, this requires reconfiguration and a reboot. Software can register hooks to the crashdump function that get called before the memory image is written (Russinovich et al., 2009), which makes it trivial for malware to hide from this mechanism. Also note that the crashdump function effectively brings down the system, which can result in loss of data and forces a reboot. Because of its limitations and the required preparation to achieve a complete memory dump on demand, this method is not well suited for most forensic scenarios.

Finally, it is possible to exploit the hibernation mode on Windows to acquire memory. When Windows goes into suspend-to-disk state, it saves the state of the processor and memory to disk into the so called hibernation file. While there has been work to analyze and utilize these files in the course of memory forensic investigations, they are not guaranteed to be complete and the format varies between different Windows versions (Ruff and Suiche, 2007).

Linux Linux has a special character device that is usually available as /dev/mem. Similarly to the Windows physical memory section object, it provides a file-like view of the physical address space. It is used for legacy software, e.g. the X-Server uses it on systems where the graphics driver does not offer direct access to the video card's framebuffer and configuration registers (Lineberry, 2009). Because it can be abused to escalate privileges and install rootkits, kernel 2.6.26 introduced a new config option CONFIG_STRICT_DEVMEM (van de Ven, 2008). This option restricts /dev/mem access to the first megabyte of physical memory, which makes it unsuitable for memory acquisition (Lineberry, 2009).

Linux kernels that have the CONFIG_PROC_KCORE config option enabled have the /proc/kcore file used for kernel debugging. This file exports the kernel's virtual address space as an ELF core dump. Because the Linux kernel maps all physical memory into its virtual address space on x86-64 systems (Kleen, 2004), it is possible to extract a physical memory image from the kcore file (Ligh et al., 2014). A proof of concept implementation exists in the Volatility project (Walters, 2014) and we have also implemented this method into Linux Memory Acquisition Parasite (LMAP), our Linux memory acquisition platform introduced in Chapter 6.

To acquire a physical memory image from kcore, LMAP parses the ELF header of the /proc/kcore file. Listing 3.1 illustrates a simplified version of the algorithm.

    for (size_t i = 0; i < ehdr.e_phnum; i++) {
        ...
        // Only add segment if inside kernel's physical memory mapping
        if (phdr.p_vaddr >= 0xffff880000000000 &&
            phdr.p_vaddr <= 0xffffc80000000000) {
            memory_map_append(
                mm, phdr.p_vaddr, phdr.p_filesz, phdr.p_offset
            );
        }
    }

Listing 3.1: Identifying Physical Memory Regions in /proc/kcore

LMAP iterates over each ELF program header and checks the virtual address of the corresponding segment. Segments with a virtual address between 0xffff880000000000 and 0xffffc80000000000 belong to the direct kernel physical memory mapping and need to be copied. The p_offset field shows the location of this segment in the kcore file, while the p_filesz field describes the size of the segment. By iterating over all segments in the file, LMAP enumerates the physical address space and adds segments containing physical memory to its memory map. These segments are then copied into the memory image by reading from the stored file offsets in /proc/kcore.
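For illustration, the enumeration step can be written as a self-contained user space sketch that works on a buffer holding the ELF header and program headers. This is not LMAP's code: the kcore_run type and kcore_collect_runs function are hypothetical, the sketch relies on the Linux <elf.h> definitions, and the conversion of a virtual address to a physical address by subtracting the start of the direct mapping is an assumption that only holds for kernels using the fixed mapping base named above.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRECT_MAP_START 0xffff880000000000ULL
#define DIRECT_MAP_END   0xffffc80000000000ULL

/* One physical memory range found in the core file (hypothetical type). */
struct kcore_run {
    uint64_t paddr;       /* assumed: p_vaddr minus direct mapping base */
    uint64_t file_offset; /* where the segment data starts in the file */
    uint64_t size;        /* number of bytes stored in the file */
};

/* Collects all PT_LOAD segments inside the direct mapping; returns count. */
static size_t kcore_collect_runs(const uint8_t *image, size_t image_size,
                                 struct kcore_run *out, size_t max_out)
{
    Elf64_Ehdr ehdr;
    size_t found = 0;

    assert(image != NULL && out != NULL);
    if (image_size < sizeof(ehdr))
        return 0;
    memcpy(&ehdr, image, sizeof(ehdr));
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_phentsize < sizeof(Elf64_Phdr))
        return 0;

    for (size_t i = 0; i < ehdr.e_phnum && found < max_out; i++) {
        Elf64_Phdr phdr;
        uint64_t pos = ehdr.e_phoff + (uint64_t)i * ehdr.e_phentsize;

        if (pos > image_size - sizeof(phdr))
            break; /* program header lies outside the buffer */
        memcpy(&phdr, image + pos, sizeof(phdr));
        if (phdr.p_type != PT_LOAD)
            continue;
        /* Only keep segments inside the kernel's direct mapping. */
        if (phdr.p_vaddr >= DIRECT_MAP_START &&
            phdr.p_vaddr <= DIRECT_MAP_END) {
            out[found].paddr = phdr.p_vaddr - DIRECT_MAP_START;
            out[found].file_offset = phdr.p_offset;
            out[found].size = phdr.p_filesz;
            found++;
        }
    }
    return found;
}
```

In practice the buffer would be filled by reading the beginning of /proc/kcore with root privileges, and each collected run would then be copied from its file offset into the image.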

Mac OS X Early versions of OS X provided the /dev/mem and /dev/kmem device files similarly to Linux systems. However, these devices were disabled by Apple with the move to the x86 architecture (Singh, 2006). They can be re-enabled by setting the kmem boot argument, which requires a reboot (Halvorsen and Clarke, 2011). Because of this, they are only useful in cases where the system can be prepared for memory acquisition before the incident.

While it should also be possible to exploit the hibernation file on OS X to acquire memory, we are not aware of publicly available solutions to this problem. However, there are indications this is being investigated (Ruff and Suiche, 2007).

3.2.3 Driver-Based Memory Acquisition

Some operating systems like Mac OS X or Linux since kernel 2.6.26 do not offer direct physical memory access. On these systems custom drivers are needed to access the entire physical address space and create a memory image. Also, OS physical memory devices like \\.\Device\PhysicalMemory are obvious targets for malicious software (Bilby, 2006). This warrants the use of more robust and stealthy methods to access physical memory.


Microsoft Windows The standard API for memory acquisition on Windows is the \\.\Device\PhysicalMemory section object. Because this interface is built into the OS for the purpose of physical memory access, it is considered to be the most stable approach.

However, memory acquisition drivers increasingly use undocumented or repurposed APIs for mapping of physical memory to evade interception by malicious software. These include the MmMapIoSpace symbol originally intended for drivers to map MMIO regions of devices, as well as the MmMapMemoryDumpMdl symbol used by the kernel's own crashdump facilities.

Memory enumeration is commonly achieved by calling the MmGetPhysicalMemoryRanges function, which returns the contents of MmPhysicalMemoryBlock. This data structure contains an array that stores the physical address and size of all available physical memory ranges in the system (Cohen, 2014a).

Linux The restrictions of /dev/mem through the CONFIG_STRICT_DEVMEM option have forced developers to pursue an alternate route of physical memory access. The Red Hat crash utility, for example, is a debugging tool to investigate system crashes by analyzing kernel memory. It requires access to the entire physical address space to work and relied on /dev/mem before kernel 2.6.26. To work around CONFIG_STRICT_DEVMEM, the “crash” kernel module was developed to provide functionality similar to the unrestricted /dev/mem device (Anderson, 2008). This module can be used to get access to physical memory from userspace, which can then be copied into an image by use of a file copying tool like “dd”.

Based on this idea, multiple implementations of /dev/mem-like modules for forensic memory acquisition have been developed (Kollar, 2010; Cohen, 2011). However, with the exception of the pmem module, these tools lack any safeguards for the address space regions read. Userspace tools reading from their device node therefore have to make sure not to read from MMIO regions to ensure system stability (see Section 3.2). Software can retrieve information on the physical address space layout from /proc/iomem, which exports the memory resource tree to userspace. Regions marked as “System RAM” are guaranteed to be backed by physical memory and safe to read.
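A userspace tool could extract the readable ranges from /proc/iomem along the following lines. This is a minimal sketch with a hypothetical function name, assuming the usual line format "start-end : name" with hexadecimal, inclusive addresses and indented child resources. On recent kernels the addresses are reportedly zeroed for unprivileged readers, so the file would have to be read as root.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parses one line of /proc/iomem, e.g. "00100000-bfffffff : System RAM".
 * Returns 1 and stores the inclusive range for top level "System RAM"
 * entries, and 0 for every other line.
 */
static int parse_system_ram(const char *line, uint64_t *start, uint64_t *end)
{
    unsigned long long s, e;
    char name[64];

    assert(line != NULL && start != NULL && end != NULL);
    /* Child resources are indented; only top level entries are used. */
    if (line[0] == ' ')
        return 0;
    if (sscanf(line, "%llx-%llx : %63[^\n]", &s, &e, name) != 3)
        return 0;
    if (strcmp(name, "System RAM") != 0)
        return 0;
    *start = s;
    *end = e;
    return 1;
}
```

Calling this function for every line read with fgets() yields the list of ranges that may be read from the device node, while everything else is skipped or zero padded.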

There are several problems with this approach. First of all, memory reading is performed with a block-wise file copying tool such as dd. This uses a lot of memory and is rather slow, because each page is first copied to userspace, and then copied back into the kernel for writing to disk or sending over the network. This also causes a lot of memory to be overwritten, which violates forensic principles and can destroy evidence resident in memory regions that have already been freed (Sylve, 2012). Furthermore, it is very easy for malicious software, even from user space, to intercept the operation and filter or modify the memory image.


Recent research on Linux memory acquisition has focused on moving most of the operation to kernel space to minimize these problems. For example, the LiME and pmem tools obtain information on the address space layout directly from kernel mode by parsing the iomem_resource tree (Sylve, 2012; Cohen, 2011). Furthermore, LiME avoids the buffer copying issues of user mode imagers by writing the image directly from kernel mode (Sylve, 2012).

Mac OS X The Mac OS X kernel actually consists of multiple components that each run in system mode and thus have access to physical memory. The hardware-specific details are managed by the platform expert, which is a kernel object that interfaces other components with the system's buses (Singh, 2006). Memory and task management is ultimately performed by the Mach component of the OS X kernel, which is based on the Mach microkernel (Accetta et al., 1986). However, most other kernel functionality is implemented by a task running in system mode which is based on BSD (Singh, 2006). Finally, the IOKit provides a C++-based environment for drivers (Levin, 2012). While it is possible to load generic kernel extensions, the preferred method of loading a driver is through the IOKit.

Memory enumeration on OS X can be achieved by obtaining the firmware memory map. When the system boots, the Extensible Firmware Interface (EFI) passes information on the layout of the physical address space to the platform expert via the so-called boot arguments. Memory acquisition software can utilize this information to enumerate physical memory and avoid accessing MMIO regions. Kernel extensions can access the boot arguments directly through the MemoryMap member of the bootArgs symbol in the platform expert's state.

The IOKit communicates with the platform expert through driver connection points called nubs (Apple Inc., 2013b, I/O Kit Architecture). It receives a copy of the boot arguments through the root nub during initialization, which it stores in the IOPlatformArgs (Singh, 2006). IOKit drivers can find the root nub using the IOService::getServiceRoot function and then get the IOPlatformArgs from there.

There are multiple APIs available for physical memory access. Software can call directly into Mach memory management functions or use the IOKit as a wrapper. However, Mach memory management symbols are not exported, so the officially sanctioned way of mapping memory is through the IOKit. The two most commonly used interfaces in the IOKit for mapping memory are the IOMemoryDescriptor and IOService APIs (Apple Inc., 2009, 2013a). We will go into this in more detail in our description of the OSXPmem kernel extension in Section 3.2.3.

The first freely available tool for Mac OS X memory acquisition was MacMemoryReader (Inoue et al., 2011). It uses the DTrace framework to obtain the memory map from user mode by reading the PE_state.bootArgs from the platform expert. It then loads a generic kernel extension that creates a /dev/mem character device that maps memory using an IOMemoryDescriptor. Unfortunately, MacMemoryReader is not open source and the project seems to have been abandoned. At the time of writing the project website was unreachable (ATC-NY, 2012a).

page_desc = IOMemoryDescriptor::withPhysicalAddress(
    page, PAGE_SIZE, kIODirectionIn
);
...
page_map = page_desc->createMappingInTask(
    kernel_task, 0, kIODirectionIn, 0, 0
);
...
*vaddr = (void *)(page_map->getAddress());
...
uiomove64((uint64_t)vaddr_page, (uint32_t)chunk_len, uio);

Listing 3.2: Memory Mapping in OSXPmem

A free alternative to MacMemoryReader is MacMemoryze (Mandiant, 2012). It utilizes an IOKit driver to enumerate memory by getting the memory map from the IOKit root nub. It then services a /dev/mem character device using the IOService API.

OSXPmem At the time of writing only one free memory acquisition tool for OS X existed, and it was closed source (ATC-NY, 2012a). Its development has since ceased, and the old versions are not capable of acquiring memory on recent versions of OS X such as 10.9 and 10.10. To fill this gap we developed the program OSXPmem, which is able to acquire memory from all recent OS X versions from 10.6 to 10.10. It consists of a user space tool that creates the memory image, as well as a generic kernel extension facilitating physical memory access and enumeration. Interaction with the kernel extension is accomplished through a character device at /dev/pmem.

OSXPmem maps memory using the IOKit IOMemoryDescriptor class. We provide a simplified version of the relevant code in Listing 3.2. We create an IOMemoryDescriptor for a physical address and then call its createMappingInTask function, which makes the IOKit map the requested page into the virtual address space of the kernel's Mach task. From there we can copy it to user space on reads to the device file using the uiomove64 function.

For flexibility reasons, we don’t restrict the read offsets for the device file, allowing access to the entire physical address space. To ensure system stability, the user space component must enumerate the physical address space so that it doesn’t read from MMIO regions. This is accomplished by parsing the EFI memory map. Because the memory map is not available from user space, the kernel extension provides an interface for user space programs to obtain it. The relevant parts of the program are illustrated in Listing 3.3. The kernel extension first obtains a pointer to the boot arguments from the platform expert. It then stores references to the EFI memory map and its size, and uses the copyout function to copy it to user space upon request.

boot_args *ba = (boot_args *)PE_state.bootArgs;
mmap = (EfiMemoryRange *)ba->MemoryMap;
mmap_size = ba->MemoryMapSize;
...
copyout(mmap, *((uint64_t *)buffer), mmap_size);

Listing 3.3: Accessing the Memory Map in OSXPmem

To create a memory image, the user space component of OSXPmem first loads the kernel extension and obtains the memory map. It then iterates over the memory map and reads all valid memory regions from the /dev/pmem device file. The result is written either as a raw binary file, an ELF core file, or a Mach-O core dump for analysis.
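The iteration over the memory map can be sketched in a few lines of Python. The EfiRange tuple, the readable_runs function, and the example map are hypothetical simplifications of ours, and the set of memory types treated as readable is an assumption for illustration; it is not taken from the OSXPmem sources.

```python
from typing import Iterable, List, NamedTuple, Tuple

EFI_PAGE_SIZE = 4096  # UEFI memory descriptors count 4 KiB pages.

# Assumption: memory types (numbered as in the UEFI specification) that this
# sketch treats as RAM-backed. Reserved (0), unusable (8), and memory-mapped
# I/O (11, 12) are deliberately left out.
READABLE_TYPES = frozenset({1, 2, 3, 4, 5, 6, 7, 9, 10})


class EfiRange(NamedTuple):
    """Hypothetical, simplified view of one EFI memory map entry."""
    mem_type: int
    physical_start: int
    number_of_pages: int


def readable_runs(memory_map: Iterable[EfiRange]) -> List[Tuple[int, int]]:
    """Return merged [start, end) byte ranges that are considered readable."""
    runs: List[Tuple[int, int]] = []
    for entry in sorted(memory_map, key=lambda e: e.physical_start):
        if entry.mem_type not in READABLE_TYPES or entry.number_of_pages == 0:
            continue
        start = entry.physical_start
        end = start + entry.number_of_pages * EFI_PAGE_SIZE
        if runs and runs[-1][1] == start:
            runs[-1] = (runs[-1][0], end)  # Extend the adjacent run.
        else:
            runs.append((start, end))
    return runs


# Made-up example map: two conventional ranges, one reserved page,
# one loader data range, and one MMIO window.
EXAMPLE_MAP = [
    EfiRange(7, 0x00000000, 0x9F),
    EfiRange(0, 0x0009F000, 0x1),
    EfiRange(7, 0x00100000, 0x100),
    EfiRange(2, 0x00200000, 0x10),
    EfiRange(11, 0xE0000000, 0x1000),
]

if __name__ == "__main__":
    for start, end in readable_runs(EXAMPLE_MAP):
        print(f"read {start:#x}..{end:#x}")
```

Note that an empty memory map yields no runs at all, so an imager built this way reads nothing if the map it receives is empty.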

3.3 Summary

In this chapter we have given an overview of the theoretical and practical details of software memory acquisition. We have introduced criteria for sound memory acquisition and given examples as well as a short overview of an evaluation of these criteria.

Furthermore, we have pointed out the importance of correctness of memory images for the purpose of rootkit detection. Because rootkits cannot remove themselves from memory entirely, they must subvert this property to remain invisible.

Finally, we have presented an overview of the inner workings of most freely available memory acquisition software. We have defined the tasks of memory access and memory enumeration as the two critical steps to obtain a correct memory image. By studying open source tools and reverse engineering other freely available memory forensic software, we have compiled an overview of how software can solve these tasks on the Windows, Linux and OS X platforms.

Note that our results show that all publicly available memory acquisition software completely relies on operating system interfaces to enumerate and access memory. This can be abused by malware to subvert the acquisition process, resulting in an incorrect image that has been cleaned of malware traces. Investigators have no way of knowing if the created image is correct, and may therefore draw false conclusions and remain oblivious of the malware present on the system.

The remainder of this thesis is dedicated to identifying such deceptive techniques and to developing methods to create correct memory images in spite of sophisticated malware with anti-forensic capabilities.


Chapter 4

Anti-Memory Forensics

Some of the most widely used kernel rootkit techniques are interception of system call APIs (hooking) and Direct Kernel Object Manipulation (DKOM). By hooking kernel APIs directly, rootkits can filter the view of the system that is presented to detection and analysis software (Hoglund and Butler, 2005). DKOM attacks directly manipulate kernel data structures to hide processes, threads, network connections and other malware traces (Butler, 2004). Because the OS itself has been subverted in this scenario, it can no longer be trusted to deliver accurate information. Memory forensics provides a more reliable view of the system state, and thus is increasingly often used to detect and analyze malicious software.

Anti-memory-forensic techniques attack either the acquisition or analysis phase of memory forensic investigations. Analysis software like the Volatility framework (Walters, 2014), which relies mostly on memory scanning, can be subverted by destroying or manipulating certain kernel data structures needed for its operation. For example, it was possible to prevent analysis by Volatility by overwriting the KdDebuggerDataBlock.OwnerTag string (Haruyama and Suzuki, 2012). Because this data structure is only used for kernel debugging, destroying it does not impact regular system operation. Other work proposes flooding the address space with thousands of fabricated data structures intended to distract and overwhelm investigators with false positives (Williams and Torres, 2014). While attacks on the analysis phase can be effective, they have the drawback that they can be overcome with sufficient effort. As investigators improve their methods to deal with these techniques, they can re-analyze previously acquired memory images and uncover evidence they missed in past analysis.

Successful attacks on the acquisition phase are permanent, because of the volatile nature of RAM. If malware succeeds in hiding its traces from a memory image, it is very unlikely investigators will be able to come back with improved tools and acquire another image. By the time they do, concurrent activity on the system and/or system reboots will have erased most of the data of interest. The problem with current memory acquisition software is that it relies on OS APIs to enumerate and access physical memory. Malware can use the same techniques that are already employed to subvert system calls to filter the view software has on physical memory. For example, the DDFY rootkit intercepts access to the \\.\Device\PhysicalMemory object on Windows systems by using a filter driver (Bilby, 2006). Dementia expands on this principle by filtering arbitrary traces from file system buffers of the memory image while it is being written to disk (Milkovic, 2012). The Shadow Walker rootkit uses a hook in the page fault handler of the OS to desynchronize the data and instruction TLBs (Sparks and Butler, 2005). This approach can hide control flow modifications because the CPU will get different data on instruction fetches than a memory acquisition tool acquires by reading from the same region of memory.

In this chapter, we present an overview of the current state of the art in anti-forensics against software memory acquisition, and analyze different techniques in regard to the two main tasks we identified in the previous chapter: memory mapping and memory enumeration. Based on our analysis of memory acquisition software internals in Section 3.2.3, we develop a generic DKOM attack on memory enumeration. We develop proof-of-concept implementations for Windows, Linux and OS X that are able to subvert all publicly available memory acquisition software. We then identify a method of hiding arbitrary code and data from memory acquisition software by utilizing hidden regions of memory that are unknown to the OS.


This chapter is organized as follows: In Section 4.1, we classify different anti-memory-forensic techniques in regard to their targeted part of the acquisition process. We show how an attacker can hide code and data from memory images by intercepting memory mapping and memory enumeration APIs. In Section 4.2, we propose practical anti-memory-forensic techniques that attack the memory mapping and enumeration APIs on Windows, Linux and OS X. We evaluate each of the developed methods on a broad selection of publicly available memory acquisition software. Finally, in Section 4.3, we introduce a novel, passive technique for hiding code and data from memory acquisition software.

4.1 Anti-Forensic Techniques

As we have illustrated in Section 3.2.1, all software memory acquisition tools must solve two primary challenges: memory mapping and memory enumeration. These represent key points in the memory acquisition process that anti-forensic malware can attack to hide its traces.

Attacks on Memory Mapping Memory acquisition software must map all physical memory into its virtual address space to be able to access it. As we have shown in Section 3.2.2, this is accomplished with the help of the OS. By intercepting the memory mapping APIs in the OS, rootkits can selectively replace regions of memory that contain traces of their existence with a benign copy of this region. This allows them to transparently remove evidence from the memory image.

Attacks on Memory Enumeration To prevent access to MMIO regions, which can destabilize the system, software must enumerate the physical address space and identify all physical memory regions. This information is passed on to the OS by the firmware, which is generally not available to drivers at runtime. Software can query the OS for the memory map, which provides an overview of the physical address space. Rootkits can intercept these APIs to hide specific regions from memory acquisition software. While it is not possible to hide all system modifications this way, rootkits can still hide their code and data from a memory image. Depending on the implementation of the OS it is also possible to perform a DKOM attack on the memory map directly. This is even stealthier because it does not require any redirection of control flow and can’t be detected by integrity checks.

As we will show in Section 4.3, memory enumeration can even be cheated using passive techniques. There are small regions of memory in the physical address space that the OS doesn’t know about. Their existence is a result of constraints during POST, and they are so small that their loss is considered acceptable by the OS. By identifying those regions, malware can move code and data out of known memory without actively interfering with either the OS or memory acquisition software itself. Since these memory regions don’t exist from the OS perspective, they will not be acquired into the image.

4.2 Attacks on Memory Acquisition Software

In this section, we demonstrate practical attacks on memory acquisition software by patching all relevant OS APIs to return an error instead of performing their intended task. This makes it impossible for software to access or enumerate physical memory. With this capability it is also trivial to selectively hide code and data from the image by employing one of the strategies that have been described in previous work (Bilby, 2006; Milkovic, 2012). We test our attacks against a wide range of freely available memory acquisition software on Windows, Linux and OS X.

4.2.1 Windows

In our analysis of Windows memory acquisition software in Section 3.2.2, we have identified three APIs used by software to access physical memory, as well as one API for memory enumeration. Furthermore, we noticed that some memory acquisition tools rely on debugging data structures in the OS for the creation of memory images in crashdump format. By manipulating these data structures we can prevent them from writing the crashdump image.

Memory Enumeration As mentioned in Section 3.2.1, memory acquisition drivers need to enumerate the physical address space prior to acquisition. On the Microsoft Windows family of operating systems, all tested drivers use the undocumented symbol MmGetPhysicalMemoryRanges() to obtain a map of the physical address space. By patching this function to always return NULL, which is the failure indicator for this function, we prevent drivers from learning about the physical address space layout. As reading from device memory can crash the kernel, this effectively prevents memory acquisition. Usage of this API is discouraged by Microsoft, so regular drivers don’t use it and patching it does not impact system stability. An actual rootkit could of course simply return a modified version of the memory map, which excludes ranges it is trying to hide. Acquisition would then appear successful, while being incomplete.
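The two outcomes described above, a failed enumeration and a silently filtered one, can be illustrated with a small Python model. The callable passed to plan_acquisition is a hypothetical stand-in for an OS enumeration API, and all names and values are ours; this is a sketch of the reasoning, not driver code.

```python
PAGE_SIZE = 4096


class EnumerationError(RuntimeError):
    """Raised when the memory map cannot be obtained."""


def plan_acquisition(get_physical_ranges):
    """Return the page-aligned physical addresses an imager would read.

    `get_physical_ranges` is a hypothetical stand-in for an OS enumeration
    API. It returns a list of (base, length) tuples, or None on failure.
    """
    ranges = get_physical_ranges()
    if ranges is None:
        # Reading blindly is not an option: unknown regions may be MMIO.
        raise EnumerationError("no memory map available")
    pages = []
    for base, length in ranges:
        pages.extend(range(base, base + length, PAGE_SIZE))
    return pages


if __name__ == "__main__":
    honest = plan_acquisition(lambda: [(0x0000, 0x4000)])
    # A filtered map that omits the page at 0x1000.
    filtered = plan_acquisition(lambda: [(0x0000, 0x1000), (0x2000, 0x2000)])
    missing = sorted(set(honest) - set(filtered))
    print([hex(page) for page in missing])
```

In the filtered case the imager has no error to report; the omitted page simply never appears in its read plan.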

Memory Mapping To actually access physical memory, acquisition drivers need to map it into the kernel's virtual address space (see Section 3.2.1). The three kernel APIs commonly used for this purpose are ZwMapViewOfSection, MmMapIOSpace and the undocumented symbol MmMapMemoryDumpMdl. For demonstration purposes, we patch MmMapMemoryDumpMdl to return NULL. As this symbol is undocumented and usage is discouraged, we can patch it without affecting system stability. Because the other two APIs are often used by drivers, patching them can destabilize the system. However, a more sophisticated rootkit can easily install hooks that filter mapping operations on hidden pages. This would also be a very reliable modification, subverting any memory acquisition tools using the other two APIs.

Debugger Block Hiding The static kernel structure KdDebuggerDataBlock is used by memory acquisition software to find the base address of the kernel image and several non-exported symbols. It can be found by scanning for the OwnerTag member, which is the static string “KDBG”. Haruyama and Suzuki already demonstrated that overwriting this tag is effective in thwarting analysis by frameworks like Volatility (Haruyama and Suzuki, 2012). This technique can even disrupt memory acquisition, as some drivers rely on the KDBG to resolve some symbols.
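To make the dependence on the tag concrete, the following Python sketch scans a byte buffer for a four-byte tag. The find_tag function and the toy buffer are hypothetical; real analysis frameworks apply additional checks to each candidate, which we leave out here.

```python
def find_tag(memory, tag=b"KDBG"):
    """Return every offset at which `tag` occurs in `memory`."""
    offsets = []
    position = memory.find(tag)
    while position != -1:
        offsets.append(position)
        position = memory.find(tag, position + 1)
    return offsets


if __name__ == "__main__":
    # Toy buffer with the tag placed at an arbitrary offset.
    image = bytearray(64)
    image[16:20] = b"KDBG"
    print(find_tag(image))

    # Once the tag is replaced, a tag-based scan has nothing to anchor on.
    image[16:20] = b"MOOF"
    print(find_tag(image))
```

The second scan returns no candidates, which is the effect a scanner that relies solely on the tag would observe.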

Evaluation We have created a small kernel patcher (shown in Listing 4.1) to demonstrate these techniques. Note that because of kernel patch protection this script will not work on 64-bit kernels without disabling Patch Guard (Microsoft Corporation, 2006). However, as more and more rootkits subvert this protection (Rusakov, 2011, 2012; Allievi, 2014), we believe it is safe to assume an attacker is able to do this. For testing purposes we have enabled debug mode on our test systems, which disables Patch Guard.

The script requires WinPmem 1.6.0 (Cohen, 2012) with write support to be loaded, which is used to get unrestricted access to physical memory. The script utilizes the Rekall memory forensic framework (Cohen, 2014b) to locate the kernel and its symbols in memory. It then patches the previously mentioned enumeration and mapping functions to return NULL, and overwrites the KdDebuggerDataBlock.OwnerTag with the meaningless string “MOOF”.

from rekall import session
from rekall.plugins.overlays import windows

def KernelApiPatch(session, symbol, patch):
    # Overwrite the first bytes of the given kernel symbol.
    session.profile.get_constant_object(
        symbol, "String"
    ).write(patch)

def PatchKDBG(session):
    # Replace the "KDBG" owner tag of the debugger data block.
    session.profile.get_constant_object(
        "KdDebuggerDataBlock", "_KDDEBUGGER_DATA64"
    ).Header.OwnerTag.write("MOOF")

if __name__ == "__main__":
    session = session.Session(
        filename=r"\\.\pmem",
        autodetect=["nt_index", "pe"],
        profile_path=[
            "http://profiles.rekall-forensic.com"
        ]
    )
    # xor rax, rax; ret (makes the function return NULL)
    shellcode = "\x48\x31\xc0\xc3"
    KernelApiPatch(
        session, "MmGetPhysicalMemoryRanges", shellcode
    )
    KernelApiPatch(
        session, "MmMapMemoryDumpMdl", shellcode
    )
    PatchKDBG(session)

Listing 4.1: Attack on Windows Memory Management APIs

We evaluated our proof-of-concept techniques against several popular memory acquisition tools. For this study, we have requested evaluation copies of “Moonsols Dumpit”, “HBGary Fastdump Pro”, “GMG Systems Kntdd” and “Guidance’s WinEn” for the purpose of forensic tool testing. Only Moonsols responded positively to our request. Additionally, we included open source or free tools such as Memoryze (Mandiant, 2011), FTK Imager (AccessData, 2012), WinPmem (Cohen, 2012), and WindowsMemoryReader (ATC-NY, 2012b). We believe that most other tools exhibit similar deficiencies. However, since we are unable to test these, readers are encouraged to use the provided test script in Listing 4.1 to reproduce these tests themselves.



Program      Version  Format  KDBG  GetPhysicalMemoryRanges  MapMemoryDumpMdl
Memoryze     2.0      raw     ✓     ✗                        ✓
FTK Imager   3.1.2    raw     ✓     ✗                        ✓
Win64dd      1.4.0    raw     ✓/✗   ✗                        ✗
Win64dd      1.4.0    dmp     ✗     ✗                        ✗
DumpIt       1.4.0    raw     ✓     ✗                        ✗
WinPmem      1.3.1    raw     ✗     ✗                        ✓
WinPmem      1.3.1    dmp     ✗     ✗                        ✓
WMR          1.0      raw     ✓     ✗                        ✓
WMR          1.0      dmp     ✓     ✗                        ✓

Table 4.1: Evaluation of Acquisition with Active Anti-Forensics

Our test system is an x86-64 Intel computer with 8 GiB of RAM, running a fully patched Windows 7 x86-64 with Service Pack 1. We have tested the tools using their default settings to produce a raw image. In cases where the tool could produce an image in the crashdump format, the tests were repeated for this format. All patches in Listing 4.1 were tested individually, as well as simultaneously. A summary of the evaluation results is depicted in Table 4.1. The ✓ symbol means the acquisition tool was able to create an image of memory despite the employed anti-forensic method; the ✗ signals a failed acquisition.

The data shows that every tested acquisition tool was subverted by at least one of the tested anti-forensic methods. After employing all anti-forensic techniques simultaneously, none of the tools were able to acquire a single byte of memory. Some tools even crashed the kernel while trying, a very undesirable effect when analysing production systems. This may be due to missing error checking within the acquisition tool, which may assume that kernel memory management APIs can never fail.
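The first claim can be checked mechanically against Table 4.1. The Python sketch below transcribes the table, with the mixed Win64dd result recorded as "mixed"; the encoding and the names are ours and serve only as an illustration.

```python
METHODS = ("KDBG", "MmGetPhysicalMemoryRanges", "MmMapMemoryDumpMdl")

# Results transcribed from Table 4.1, keyed by (program, image format).
# "ok" corresponds to a check mark, "fail" to a cross.
RESULTS = {
    ("Memoryze", "raw"): ("ok", "fail", "ok"),
    ("FTK Imager", "raw"): ("ok", "fail", "ok"),
    ("Win64dd", "raw"): ("mixed", "fail", "fail"),
    ("Win64dd", "dmp"): ("fail", "fail", "fail"),
    ("DumpIt", "raw"): ("ok", "fail", "fail"),
    ("WinPmem", "raw"): ("fail", "fail", "ok"),
    ("WinPmem", "dmp"): ("fail", "fail", "ok"),
    ("WMR", "raw"): ("ok", "fail", "ok"),
    ("WMR", "dmp"): ("ok", "fail", "ok"),
}


def failed_methods(row):
    """Return the methods that prevented acquisition for one table row."""
    return [method for method, result in zip(METHODS, row) if result == "fail"]


if __name__ == "__main__":
    for (program, image_format), row in RESULTS.items():
        print(program, image_format, failed_methods(row))
    print(all(failed_methods(row) for row in RESULTS.values()))
```

Every row contains at least one failure, because patching MmGetPhysicalMemoryRanges stopped all tested configurations.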

Mandiant Memoryze Destroying the KDBG Owner Tag had no impact on the performance of Memoryze. Also, hooking MmMapMemoryDumpMdl had no effect, as Memoryze only supports the \\.\Device\PhysicalMemory and MmMapIOSpace methods for mapping physical memory. Hooking MmGetPhysicalMemoryRanges caused Memoryze to crash the kernel immediately, making it impossible to acquire any memory at all and forcing the target machine to reboot without an error message.

AccessData FTK Imager Similarly to Memoryze, destroying the KDBG Owner Tag or hooking MmMapMemoryDumpMdl did not affect FTK Imager, as it maps memory by calling ZwMapViewOfSection on the \\.\Device\PhysicalMemory device. However, hooking MmGetPhysicalMemoryRanges resulted in an empty image, without any apparent warnings.

Moonsols Win64dd When creating a raw image, the destruction of the KDBG Owner Tag resulted in spontaneous reboots during acquisition with Win64dd. In our tests, an incomplete dump of 100 MB was created before the fault occurred. The log did not include any error messages. Similar behaviour was experienced when hooking MmGetPhysicalMemoryRanges or MmMapMemoryDumpMdl, which is the default memory mapping method of Win64dd. The tool behaved in the same way when creating a crash dump (dmp). However, when providing all arguments on the command line and creating a raw image, the KDBG method did not cause Win64dd to crash anymore. It was still impossible to create a crashdump, though. We presume Win64dd’s interactive mode queries the driver for some information that triggers it to search for the KDBG, regardless of the image format.

Moonsols DumpIt Moonsols offers a packaged version of its memory acquisition tools called DumpIt. This tool only supports the raw output format and does not seem to be affected by overwriting of the KDBG Owner Tag. It is still vulnerable to the other two anti-forensic methods.

WinPmem Overwriting the KDBG Owner Tag causes WinPmem to fail. In contrast to other tools we tested, it does not crash the kernel. However, there is no error message indicating the reason for the failure. The hooking of MmGetPhysicalMemoryRanges also causes an abort, displaying an error message about the failure to obtain the memory geometry. Hooking MmMapMemoryDumpMdl does not affect WinPmem, as it utilizes the \\.\Device\PhysicalMemory and MmMapIOSpace methods for memory mapping.

ATC-NY WindowsMemoryReader The KDBG method did not affect WindowsMemoryReader at all. It was even able to create a crashdump. However, the resulting dump could not be parsed by WinDBG completely, as the contained KDBG block was corrupted. Hooking of MmMapMemoryDumpMdl had no effect, as it is not used by WindowsMemoryReader. The MmGetPhysicalMemoryRanges method, however, completely disabled both raw and dmp output. It caused an error in the driver to crash the kernel, immediately rebooting the host.

4.2.2 Mac OS X

The demonstrated problems are not Windows specific. We have also conducted experiments with other operating systems, with similar results. On Mac OS X 10.8 Mountain Lion, we have tested MacMemoryReader in version 3.0.2 (ATC-NY, 2012a), as well as OSXPmem (Stüttgen, 2012) version RC1. Both function in a similar way, with the same inherent problems malicious software can exploit.

On EFI-enabled systems, rather than using the BIOS Interrupt 0x15 routine, memory geometry is obtained by calling an EFI boot service. The platform expert component of the OS X kernel obtains the memory map from EFI and stores a pointer to this structure in the symbol PE_state.bootArgs.MemoryMap. Zeroing this structure, or simply zeroing the size, will prevent acquisition drivers from obtaining a map of the physical address space, effectively preventing acquisition. Of course, a more sophisticated rootkit could modify this map to exclude any protected data. Acquisition will then succeed, without any indication of subversion. However, hidden data will not be included in the image, reducing its evidentiary efficacy.

This procedure is very easy to implement; a possible implementation is depicted in Listing 4.2. In our tests a simple kernel extension calling this 2-line function completely prevented OSXPmem and MacMemoryReader from acquiring even a single byte of memory.

void destroy_efi_memory_map(void) {
    // Access boot arguments through platform expert,
    // and zero size member of EFI Memory Map.
    boot_args *ba = (boot_args *)(PE_state.bootArgs);
    ba->MemoryMapSize = 0;
}

Listing 4.2: OS X Memory-Map Overwriting

Similarly to Windows acquisition tools, OS X physical memory mapping can also be easily subverted. On OS X, physical memory mapping is achieved by creating an IOMemoryDescriptor object, and then calling its createMappingInTask() method. By hooking either the constructor or the mapping method, malicious software can perform the exact same attacks as with the above-mentioned Windows memory mapping functions. Because this API is also used by regular drivers, we refrain from destroying it like in our Windows experiment, as this would cause system instability unless fully implemented with filtering capability.

4.2.3 Linux

The Linux kernel maintains a tree of data structures called iomem_resource, describing the physical address space of the system. The tree is built during boot and subsequently refined when drivers declare responsibility for a specific region in the physical address space. Each resource identifies a specific memory region and contains information on the location of this region in the physical address space as well as a name, some flags, and pointers to other regions in the tree. When the kernel initializes memory regions backed by RAM, it assigns the static string “System RAM” as a name. As we have shown in Section 3.2.3, this name is used in all publicly available memory acquisition tools for Linux to identify memory regions backed by RAM. Similar to OS X, this presents a choke point in the system which can be abused by malware to subvert the memory enumeration procedure.

struct resource *p = &iomem_resource;
for (p = p->child; p != NULL; p = p->sibling) {
    if (!strcmp(p->name, "System RAM")) {
        disable_writeprotect();
        *((char *)p->name + 8) = 'O';
        enable_writeprotect();
        break;
    }
}

Listing 4.3: DKOM Attack on Linux Memory Map

To prove how vital this data structure is for memory enumeration, we have devised a technique that stops Linux memory acquisition tools by patching one byte of kernel data, as shown in Listing 4.3. We traverse the iomem_resource tree and locate nodes that have a reference to the “System RAM” string. When we encounter a reference to this string, we set the byte at offset 8 to “O”.¹

Figure 4.1 shows an illustration of the iomem_resource tree before and after the modification. As a result of replacing the “A” in RAM with an “O”, all “System RAM” regions are now named “System ROM”. Memory acquisition software that uses the name of a region to identify its type is led to believe the system does not have any memory installed at all, and skips over the RAM regions.

This technique is naturally not very stealthy. An investigator will find it rather hard to believe that a system has no installed memory and will quickly notice the suspiciously large ROM regions. However, the investigator would still have a hard time acquiring memory from this system. Note that a rootkit could take this one step further and just remove a small region of memory that contains the data it wants to hide. Memory acquisition would then succeed, but the rootkit's code and data would be missing from the image.

We have tested this method on a system with an Intel x86-64 CPU, 8 GiB of RAM and Ubuntu 14.04 with kernel 3.13.0-45-generic. Our test suite included pmem for Linux (Cohen, 2011) and the most recent version of LiME (Sylve et al., 2012).²

¹ Because the string is located in a read-only region of memory, we briefly disable memory write protection by modifying the WP flag in the CR0 register.

² Downloaded on February 9th, 2015.


Before:

00000000-00000fff : reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c8fff : Video ROM
000c9000-000c99ff : Adapter ROM
000ca000-000cc3ff : Adapter ROM
000f0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-07ffdfff : System RAM
  01000000-017344c3 : Kernel code
  017344c4-01d1e2ff : Kernel data
  01e77000-01fdffff : Kernel bss

After:

00000000-00000fff : reserved
00001000-0009fbff : System ROM
0009fc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c8fff : Video ROM
000c9000-000c99ff : Adapter ROM
000ca000-000cc3ff : Adapter ROM
000f0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-07ffdfff : System ROM
  01000000-017344c3 : Kernel code
  017344c4-01d1e2ff : Kernel data
  01e77000-01fdffff : Kernel bss

Figure 4.1: Effects of DKOM on /proc/iomem

Both programs were unable to acquire a single byte of memory after we modified the RAM string and simply wrote an empty image.
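The effect on a name-based enumeration can be reproduced on the data from Figure 4.1. The following Python sketch models the top-level entries as tuples and counts the bytes an imager would consider acquirable before and after the rename. The helper names are ours, and treating these entries as top-level is our reading of the figure.

```python
# Top-level entries from Figure 4.1 as inclusive (start, end, name) tuples.
BEFORE = [
    (0x00000000, 0x00000FFF, "reserved"),
    (0x00001000, 0x0009FBFF, "System RAM"),
    (0x0009FC00, 0x0009FFFF, "reserved"),
    (0x000A0000, 0x000BFFFF, "PCI Bus 0000:00"),
    (0x000C0000, 0x000C8FFF, "Video ROM"),
    (0x000C9000, 0x000C99FF, "Adapter ROM"),
    (0x000CA000, 0x000CC3FF, "Adapter ROM"),
    (0x000F0000, 0x000FFFFF, "reserved"),
    (0x00100000, 0x07FFDFFF, "System RAM"),
]


def rename_ram(entries):
    """Model the one-byte change: the character at offset 8 becomes 'O'."""
    renamed = []
    for start, end, name in entries:
        if name == "System RAM":
            name = name[:8] + "O" + name[9:]
        renamed.append((start, end, name))
    return renamed


def acquirable_bytes(entries):
    """Sum the sizes of all regions a name-based imager would read."""
    return sum(end - start + 1 for start, end, name in entries
               if name == "System RAM")


if __name__ == "__main__":
    print(acquirable_bytes(BEFORE))
    print(acquirable_bytes(rename_ram(BEFORE)))
```

After the rename no region matches the expected name, so the computed total drops to zero, consistent with the empty images we observed.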

4.3 Passive Anti-Forensics

We use the term passive anti-forensics to describe methods that refrain from interfering with the system at all. Because they do not modify any code or data on the system, they are very hard to detect from an acquired image, even after their existence has become known.

4.3.1 Hidden Memory

The memory map provided by the system firmware does not give any details on the precise location of all memory regions in the physical address space. When the OS obtains the memory map from the firmware, it can also decide to partially ignore this information. This results in regions of physical memory that are not known to the OS, which we call hidden memory. We have discovered this phenomenon while implementing our hardware-based memory enumeration method detailed in Chapter 5.

In Figure 4.2, we compare the actual physical address space layout of our test system with the view provided by the BIOS E820 map and the MmGetPhysicalMemoryRanges API on Windows. The blue regions represent memory that is visible in the respective view and safe to use, while the red regions must not be accessed at all because they represent MMIO.

The physical memory map of the system is shown on the left, and was obtained by applying our PCI-based memory enumeration method (see Section 5.1.1). This method reliably determines all MMIO regions in the physical address space (shown in red). Note that aside from static memory regions like BIOS and option ROM areas, we cannot determine which regions contain memory and which are not mapped at all. However, this view tells us accurately which regions are safe to read and which are not.

[Figure: three views of the physical address space from 0x00000000 to 0xFFFFFFFF, shown side by side. Left: Physical Memory Map (EBDA, Video Window, PCI Option ROMs, Lower BIOS, Upper BIOS, PCI MMIO, APIC + BIOS ROM). Center: BIOS E820 View (Memory, Reserved, ACPI Reclaim). Right: Memory Manager View (Available).]

Figure 4.2: Hidden memory on Test System with 4 GB RAM

The area in the center of the figure represents the view of the physical address space as supplied by BIOS interrupt 0x15 with AX=0xE820. This is the memory map as seen by the bootloader, which then passes it to the OS. We have obtained this data by using the undocumented x86 BIOS emulator in the Hardware Abstraction Layer (HAL) of Windows (Chappell, 2010). This API allows us to directly call BIOS interrupts and obtain the BIOS memory map through the same channel as the bootloader (Chappell, 2011). The resulting view shows which regions in the physical address space contain memory that can safely be used by the OS. However, it includes neither all memory available on the system nor all MMIO regions.
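For readers unfamiliar with the E820 format, the following Python sketch decodes a buffer of packed 20-byte entries (64-bit base, 64-bit length, 32-bit type). How such a buffer is obtained is outside the scope of the sketch; the parse_e820 function and the sample values are hypothetical and do not reproduce the map of our test system.

```python
import struct

# One E820 entry: base address, length, and type, all little-endian.
E820_ENTRY = struct.Struct("<QQI")

E820_TYPES = {
    1: "usable",
    2: "reserved",
    3: "ACPI reclaimable",
    4: "ACPI NVS",
    5: "bad memory",
}


def parse_e820(buffer):
    """Decode packed E820 entries into (base, length, type name) tuples."""
    entries = []
    for base, length, kind in E820_ENTRY.iter_unpack(buffer):
        entries.append((base, length, E820_TYPES.get(kind, "unknown")))
    return entries


if __name__ == "__main__":
    # Made-up sample: one usable range followed by one reserved range.
    raw = (E820_ENTRY.pack(0x00000000, 0x0009FC00, 1)
           + E820_ENTRY.pack(0x0009FC00, 0x00000400, 2))
    for base, length, name in parse_e820(raw):
        print(f"{base:#010x} +{length:#x} {name}")
```

Only entries of the usable type describe memory that the OS may hand out freely; the other types are either off limits or need special handling.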



The right side of the figure represents the view of the Windows memory manager. It was obtained by calling the MmGetPhysicalMemoryRanges API on Windows. This API does not report reserved or MMIO regions, only available memory. Similar to the BIOS memory map, it can only be used to enumerate memory that is safe to use by the OS, but it does not inform the user about MMIO regions that must be avoided.

Note that the decreasing completeness of these views leads to regions of memory that are invisible to the Windows memory manager.

• There is a range of 3072 bytes between 0x0009F000 and 0x0009FC00. The firmware claims the EBDA region behind this for its own use. Because the Windows memory manager only handles whole pages, it ignores the trailing 0xC00 bytes in the first memory region. Note that the size of the EBDA is not standardized and can vary from system to system. This means that the amount of memory hidden here also varies in size on different systems.

• The 128 KiB range between 0x000C0000 and 0x000E0000 is reserved for PCI option ROMs. This region must be backed by RAM for option ROMs to execute (PCI-SIG, 2002). There are no guarantees this region will not be used by option ROMs in the future, so overwriting it can destabilize the system. As it is considered tedious to reclaim memory from this region, it is ignored by the OS.

• The BIOS memory regions between 0x000E0000 and 0x00100000 are normally also backed by RAM. While the actual mapping is controlled by the PAM registers in the memory controller, firmware migrates to RAM during initialization to increase performance (Salihun, 2006), so it is very likely that this region contains memory.

• The ACPI reclaim range also consists of memory. Firmware marks this range as reclaimable to prevent its use before the OS power management has obtained the ACPI tables from here. After that, it’s available for use by the system. Most systems don’t spend the effort to reclaim it though, as it is rather small. In our tests it never appeared in the Windows memory manager’s available ranges.
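The arithmetic behind the first bullet can be checked with a short sketch. It assumes a 4 KiB page size and a memory manager that simply truncates a region to the last whole page; the function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Assumed page size of the page-granular view (4 KiB on x86). */
#define PAGE_SIZE_4K 4096u

/* Bytes at the end of a region that a page-granular view drops. */
uint64_t hidden_tail_bytes(uint64_t region_end)
{
    return region_end & (PAGE_SIZE_4K - 1);
}

/* End of the region as seen by a view that only handles whole pages. */
uint64_t page_aligned_end(uint64_t region_end)
{
    return region_end & ~(uint64_t)(PAGE_SIZE_4K - 1);
}
```

For a first memory region ending at 0x0009FC00, this yields an aligned end of 0x0009F000 and 0xC00 (3072) bytes that fall outside the page-granular view.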

There can be other regions in the physical address space that are not mentioned in the memory maps of the BIOS or Windows memory manager. When the BIOS sets up MMIO mappings, it can shadow regions that contain RAM with device memory. Also, some operating systems don’t bother reclaiming small regions like the ACPI tables, as they only add up to a few KiB of memory. Malware can utilize these regions to store code and data outside of the scope of OS controlled memory, if it can successfully enumerate them. Since all memory acquisition tools we tested only acquire the available regions from the Windows memory manager, the hidden regions are not part of the memory image.



4.3.2 Evaluation

The size and location of hidden memory regions depend on the chipset, firmware and hardware devices, and differ from system to system. We have observed cases where only the firmware regions and a very small region in front of the EBDA exist, as well as a case with many hidden memory regions intermingled with MMIO regions. To get an overall impression of the amount of available hidden memory, we have performed experiments to acquire hidden memory on Windows with memory acquisition software.

We have located regions of hidden memory on our test system by identifying segments outside of available regions of the Windows memory manager that are not mapped by devices. We analyzed these regions with a memory probing scheme, where we first discarded any regions that were not completely zeroed. We then wrote known values to the remaining memory regions and read them back to ensure the write persisted. On our test system we found 52 pages of hidden memory with this method.
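The two-step probing scheme can be sketched as follows. This is a minimal illustration that operates on an already mapped page; the function name, the test pattern, and the final restore of the zeroed state are assumptions and are not taken from the original tooling.

```c
#include <stddef.h>
#include <stdint.h>

#define PROBE_PAGE_SIZE 4096u

/* Returns 1 if the page behaves like unused RAM, 0 otherwise.
 * 'page' is assumed to be a readable and writable mapping of the candidate page. */
int probe_page(volatile uint8_t *page)
{
    size_t i;
    int persisted = 1;

    /* Step 1: discard pages that are not completely zeroed. */
    for (i = 0; i < PROBE_PAGE_SIZE; i++)
        if (page[i] != 0)
            return 0;

    /* Step 2: write known values and read them back. */
    for (i = 0; i < PROBE_PAGE_SIZE; i++)
        page[i] = (uint8_t)(i ^ 0xA5);
    for (i = 0; i < PROBE_PAGE_SIZE; i++)
        if (page[i] != (uint8_t)(i ^ 0xA5))
            persisted = 0;

    /* Assumption: restore the zeroed state so the probe leaves no trace. */
    for (i = 0; i < PROBE_PAGE_SIZE; i++)
        page[i] = 0;

    return persisted;
}
```

On real hardware the pointer would refer to a mapping of a physical page outside the ranges reported by the memory manager, and the first step protects regions that already hold data from being overwritten.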

We then tested if memory acquisition software was able to acquire this memory by filling all hidden memory segments with a known string. We performed the test with the same selection of tools as in the previous evaluation for Windows, by acquiring a raw memory image with each program, and comparing the data at the corresponding offsets with the known string. Note that some tools provide multiple settings on which memory regions to acquire. Because the hidden memory segments lie inside regions reported as MMIO by the operating system, we have acquired the test images using the most extensive setting possible. Mandiant Memoryze, AccessData FTK Imager, Winpmem and Moonsols DumpIt didn’t allow the acquisition of anything other than the “available” regions. In the resulting image, the hidden regions were zero-padded, except for Memoryze, which uses the 0xBA byte for padding. The known string was not acquired.
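The comparison step of this experiment can be illustrated with a small sketch that inspects a raw image at the offset of a hidden segment. The function name, the result codes, and the marker string used in any call are hypothetical; only the two padding bytes (0x00 and 0xBA) come from the observations above.

```c
#include <stddef.h>
#include <string.h>

enum segment_result { SEG_ACQUIRED, SEG_PADDED, SEG_OTHER, SEG_MISSING };

/* Classify the image contents at 'offset' relative to the known marker string. */
enum segment_result check_segment(const unsigned char *image, size_t image_len,
                                  size_t offset, const char *marker)
{
    size_t len = strlen(marker);
    size_t i;

    /* The image ends before the segment, e.g. after a crash during imaging. */
    if (offset > image_len || len > image_len - offset)
        return SEG_MISSING;

    if (memcmp(image + offset, marker, len) == 0)
        return SEG_ACQUIRED;

    /* Padding: a run of identical bytes that are either 0x00 or 0xBA. */
    if (image[offset] != 0x00 && image[offset] != 0xBA)
        return SEG_OTHER;
    for (i = 1; i < len; i++)
        if (image[offset + i] != image[offset])
            return SEG_OTHER;
    return SEG_PADDED;
}
```

Because a raw image maps file offsets directly to physical addresses, the offset of each hidden segment equals its physical address.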

ATC-NY WindowsMemoryReader allows very fine tuning of the parts of memory that are acquired. It even resolves all device DMA mappings and provides options to include them in the image. Unfortunately, it regards regions that are neither “available” nor memory mapped I/O as non-existent, so they cannot be selected for acquisition. They were zero padded in the image, and the known string could not be acquired. When using the most extensive setting “-r” to acquire all resources, the system crashed before the entire memory could be acquired.

Moonsols Win64dd is an exception in this test, because it provides a mode that acquires the entire physical address space. In our test this did acquire the known string from the first 3 hidden memory segments. However, the machine crashed and rebooted while imaging the reserved memory region containing the 4th segment. This resulted in the image file being incomplete, missing the last hidden segment.



4.4 Summary

In this chapter, we have presented a study on the most common anti-forensic techniques against software memory acquisition. We have organized the techniques in regard to which phase in the acquisition process they target: memory mapping or memory enumeration. We have used the knowledge gained in the last chapter to locate specific APIs on Windows, Linux, and OS X that can be intercepted to subvert the memory acquisition process. Furthermore, we have developed new techniques that target the memory map on these platforms to either selectively exclude memory from the image or even make acquisition completely impossible.

We have developed a proof-of-concept implementation for 64 bit Windows 7 systems that demonstrates that all freely available memory acquisition tools today are vulnerable to these techniques. In addition, we have provided implementations of our memory enumeration attack for Windows, Linux, and OS X, and evaluated them with all publicly available tools for their respective platform.

Finally, we have shown the existence of hidden memory, which is memory in the system that is unknown to the OS. We found that the amount of hidden memory varied from around 3 KiB to over 200 KiB in our test environment. In an experiment we have demonstrated that none of the freely available memory acquisition tools for Windows was able to acquire all of the hidden memory regions. In fact, only Win64dd was able to acquire any hidden memory at all, but it crashed the system in the middle of the acquisition process.

Note that none of the presented techniques are particularly hard for an attacker to perform. We were able to disable all available memory acquisition tools for Windows with a 29 line Python script; on Linux and OS X we even succeeded by changing a single byte of kernel memory. From this we conclude that the current software memory acquisition techniques are ill suited for malware detection, as it is trivial for a rootkit to subvert the acquisition process and filter evidence from the image.

Hardware memory acquisition techniques are not without problems either and cannot replace software methods in all cases because of physical access requirements and the sometimes negative effects on the stability of the system. Because of this we see an urgent need for software memory acquisition techniques that are more resilient to malware subversion. In the next chapter we will show that by interacting directly with the hardware it is possible to remove the potentially subverted operating system from the acquisition process, which significantly reduces the attack surface.


Chapter 5

Anti-Forensic Resilient Memory Acquisition

As we have seen in the previous chapters, all publicly available memory acquisition software depends entirely on the operating system to achieve its two most critical tasks: memory mapping and memory enumeration. This enables malicious software to prevent memory acquisition with minimal effort on Windows, Linux and OS X. Kernel rootkits can intercept the APIs used by memory acquisition software to access and enumerate physical memory, which allows them to filter the resulting image at will (Bilby, 2006; Milkovic, 2012).

In this chapter, we advance the field of forensic memory acquisition by developing software techniques to map and enumerate physical memory without relying on the potentially subverted OS. We achieve this by interacting directly with the hardware to enumerate MMIO regions in the physical address space, and then programming the MMU’s data structures to map the remaining regions into the virtual address space. We show that these techniques are resilient to the attacks presented in Chapter 4, and discuss their potential attack surface.


This chapter is outlined as follows: In Section 5.1, we develop memory mapping and enumeration techniques that are independent of operating system APIs. In Section 5.2, we discuss the anti-forensic resilience of our approach. We present a number of conceivable attacks against our methods and discuss possible countermeasures. We conclude the chapter with a short summary of our work in Section 5.3.

5.1 Improving Memory Acquisition

To become more resilient against anti-forensic attacks, memory acquisition software must stop relying on the potentially subverted OS to perform its task. We achieve this goal by accessing the hardware directly, rather than relying on kernel APIs. Our driver is therefore not vulnerable to the simple anti-forensic techniques demonstrated in Chapter 4. Additionally, avoiding non-standard APIs makes it harder to differentiate our acquisition driver from ordinary drivers without thorough code analysis.


5.1.1 Hardware-based Memory Enumeration

As discussed in Section 3.2, obtaining the physical memory map via firmware service routines can only be done early in the operating system’s boot sequence. While Windows provides an undocumented real-mode emulator to access the BIOS (Chappell, 2010), it is trivial for malware to extend the approach we presented in Section 4.2 to filter this API as well.

As we have also shown in Section 4.3, data may be hidden in reserved regions which are not used by the operating system. Forensic memory acquisition tools should aim to recover all available data, including data in reserved regions. However, the danger with reading from MMIO regions is that hardware may become activated, crash the system, corrupt data, or get physically damaged. Therefore, rather than finding the memory regions which are safe to read (e.g., via the MmGetPhysicalMemoryRanges routine), we instead directly enumerate the memory ranges which are not safe to read, and avoid those.

As we have shown in Section 2.1.1, the physical address space is mainly composed of regions backed by memory, as well as regions that are routed to the PCIe fabric, which are allocated dynamically by the firmware during POST. As we have illustrated in Section 2.1.3, the PCIe CAM protocol can be used to enumerate all active PCI devices, and locate their MMIO regions through the BAR registers. Secondary buses on the main PCI bus must also reserve memory ranges for themselves, which can also be read using this method by enumerating PCI-to-PCI bridge memory base and limit registers (PCI-SIG, 1998). For a detailed explanation of the configuration procedure please refer to Section 2.1.3.

To enumerate all MMIO regions on the PCI bus, we read the “vendor ID” field from the configuration space of each possible BDF address. The read will return the invalid ID 0xffff if there is no device with this address. If we instead get a valid ID, we parse all non-I/O BAR registers to determine the location and size of any MMIO regions of the device. We have to handle endpoint (type 0) and bridge (type 1) configuration space differently, as they implement a different number of BARs. Also, devices can implement 64 bit BARs, which in turn leads to different offsets and sizes in configuration space.
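The distinctions drawn in this paragraph can be sketched as two small helpers. The bit positions follow the standard PCI BAR and header type encodings; the function and enum names are hypothetical and do not correspond to identifiers in the driver.

```c
#include <stdint.h>

enum bar_kind { BAR_IO, BAR_MEM32, BAR_MEM64, BAR_RESERVED };

/* Decode the flag bits of a raw BAR value. */
enum bar_kind classify_bar(uint32_t bar)
{
    if (bar & 0x1)                 /* bit 0 set: I/O space BAR, skipped for MMIO */
        return BAR_IO;
    switch ((bar >> 1) & 0x3) {    /* bits 2:1: memory BAR type */
    case 0x0: return BAR_MEM32;    /* 32 bit address */
    case 0x2: return BAR_MEM64;    /* 64 bit address, occupies two BAR slots */
    default:  return BAR_RESERVED;
    }
}

/* Number of BARs to parse for a given header type register value. */
int bar_count(uint8_t header_type)
{
    switch (header_type & 0x7f) {  /* bit 7 is the multi-function flag */
    case 0:  return 6;             /* endpoint (type 0) */
    case 1:  return 2;             /* PCI-to-PCI bridge (type 1) */
    default: return 0;             /* other layouts are not handled in this sketch */
    }
}
```

A caller that encounters BAR_MEM64 would combine the current BAR with the following one to form the upper 32 address bits and then skip that next slot.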

In Listing 5.1, we illustrate the process of extracting the start and end address from a 32 bit BAR. The read_pci_config and write_pci_config functions translate the given bus, device, function and configuration offset into a PCI configuration address as shown in Figure 2.10 in Section 2.1.3. They write this address to the CONFIG_ADDRESS I/O port and then read from or write to the CONFIG_DATA port to initiate a PCI configuration transaction on the bus. We use these functions to first read the contents of the BAR and the command register of this function to create a backup, as BAR sizing requires writing to both of them. Then we disable transaction decoding on the device by writing 0s to the command register. This prevents the device from claiming any transactions while we modify its BAR. We then write a sequence of 1s to the BAR and read it back to determine the number of hardwired bits, which determine the BAR’s mask. After restoring the BAR from the previously obtained backup, we re-enable transaction decoding by restoring the command register from its backup. The base of the MMIO region described by this BAR is then obtained by discarding the least significant four bits, which are used as flags and have nothing to do with the MMIO address. The end of the region is obtained by adding the inverse mask to the base.

```c
u32 mask = 0;
u32 start = 0;
u32 end = 0;
// Back up the BAR and the command register, sizing overwrites both
u32 bar = read_pci_config(bus, dev, fun, offset);
u16 command = read_pci_config(bus, dev, fun, PCI_COMMAND);

// Disable transaction decoding
write_pci_config(bus, dev, fun, PCI_COMMAND, 0);

// Get BAR size
write_pci_config(bus, dev, fun, offset, 0xffffffff);
mask = read_pci_config(bus, dev, fun, offset) & 0xfffffff0;
write_pci_config(bus, dev, fun, offset, bar);

// Re-enable transaction decoding
write_pci_config(bus, dev, fun, PCI_COMMAND, command);

// Strip flags from BAR to get start of MMIO region
start = bar & 0xfffffff0;
// Get end of range by adding inverse mask
end = ~mask + start;
```

Listing 5.1: PCI BAR Sizing
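The address translation and the final range computation can be tried in isolation with the following sketch. It assumes the conventional CONFIG_ADDRESS layout of the legacy PCI configuration mechanism (enable bit 31, bus in bits 23:16, device in bits 15:11, function in bits 10:8, register offset in bits 7:2) and assumes that Figure 2.10 depicts this layout. The helper names are hypothetical.

```c
#include <stdint.h>

/* Conventional I/O ports of the legacy configuration mechanism. */
#define PCI_CONFIG_ADDRESS_PORT 0xCF8
#define PCI_CONFIG_DATA_PORT    0xCFC

/* Build the value that is written to CONFIG_ADDRESS for one register. */
uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t fun, uint8_t offset)
{
    return 0x80000000u                      /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1f) << 11)
         | ((uint32_t)(fun & 0x07) << 8)
         | (uint32_t)(offset & 0xfc);       /* dword aligned register offset */
}

/* Derive the inclusive MMIO range from a saved BAR and its sizing mask. */
void bar_range(uint32_t bar, uint32_t mask, uint32_t *start, uint32_t *end)
{
    *start = bar & 0xfffffff0u;             /* strip the four flag bits */
    *end = ~mask + *start;                  /* inverse mask equals size minus one */
}
```

For example, a BAR value of 0xF0000008 with a sizing mask of 0xFFFE0000 describes a 128 KiB region from 0xF0000000 to 0xF001FFFF.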

Note that CAM is performed by interacting with the PCI root complex through port I/O, not operating system APIs, and cannot be hooked in the usual way. In addition to the discovered MMIO regions, standard memory regions that are assigned to hardware, such as the ISA bus hole ranges, are automatically added to the list of excluded memory ranges. Also, there might be other devices that are not registered on the PCI bus but might have memory mapped into the physical address space. Examples include the High Precision Event Timer (HPET) on the LPC Bus, as well as local APICs, I/O APICs and BIOS ROMs. While it is possible to locate MMIO ranges used by these devices by parsing the MP (Intel Corporation, 1997) or ACPI Tables (Hewlett-Packard et al., 2011), these tables are not expected to be updated after the system has booted (Hewlett-Packard et al., 2011; Intel Corporation, 1997), making them an easy target for rootkit manipulation. While there are programming rules enforcing register alignment for reads in some of these MMIO regions like the HPET or APIC (Intel Corporation, 2014b), reading them does not violate any of the documented constraints and did not cause any problems in our experiments. However, some devices might exist that cause problems when being read and don’t adhere to the PCI specifications. A broad evaluation of different devices should be the focus of future research.

Once we have obtained a list of all PCIe MMIO regions, we need to determine the highest addressable physical memory region in the system. Whilst the OS stores this value internally, we do not wish to query it as it could be compromised. Calculating or obtaining this value from the hardware is not trivial, since, as described in Section 2.1.1, some regions might not be mapped to RAM at all. Because of memory reclaiming, the highest physical memory address can be much higher than the total size of installed memory. We therefore allow this setting to be user selectable, and prefer to acquire past the end of physical memory, which simply yields zeros on reads.
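The resulting acquisition policy (read every page up to a user selected limit unless it overlaps an excluded range) can be sketched as follows. The structure and function names are hypothetical, and the 4 KiB page size is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* One excluded range, e.g. a PCI MMIO region or the ISA hole (end is inclusive). */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

/* Returns 1 if the 4 KiB page at page_addr overlaps any excluded range. */
int page_is_excluded(uint64_t page_addr, const struct phys_range *excl, size_t n)
{
    uint64_t page_end = page_addr + 0xFFF;
    size_t i;

    for (i = 0; i < n; i++)
        if (page_addr <= excl[i].end && page_end >= excl[i].start)
            return 1;
    return 0;
}

/* Counts the pages below 'limit' that would actually be read. */
uint64_t count_readable_pages(uint64_t limit, const struct phys_range *excl, size_t n)
{
    uint64_t addr, count = 0;

    for (addr = 0; addr < limit; addr += 0x1000)
        if (!page_is_excluded(addr, excl, n))
            count++;
    return count;
}
```

In an acquisition loop, excluded pages would be written to the image as padding so that file offsets keep matching physical addresses.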

5.1.2 Hardware-based Memory Mapping

The main function of the kernel’s memory mapping APIs is to set up the page tables used by the MMU to point to the respective page frame in physical memory (see Section 2.1.2). Memory acquisition software relies on these APIs because interfering with the kernel’s management of the page tables is risky: it is subject to synchronization requirements and demands a detailed understanding of kernel page table management. Especially on multi-core systems, race conditions can occur when different cores manipulate page table entries simultaneously.

However, the kernel’s memory mapping APIs are limited and, as we have shown in Chapter 4, their use makes memory acquisition software vulnerable to anti-forensic attacks. To be resilient against such attacks, it is important to directly create the page table entries that map physical memory.

To avoid directly manipulating the kernel’s own page tables, which would be a complex endeavor with high potential for crashes and system instability, we ask the kernel to allocate a single non-pageable page for our own use. This causes the kernel to create a PTE to our own private allocation. Since this memory is non-paged, we can be confident that the mapping to this memory will not be modified by the kernel while we are using it, guaranteeing that our driver has exclusive access to this PTE.

There are multiple methods to achieve this, depending on the operating system. On Windows, the regular non-paged pool allocations usually have large page PTEs (2 MiB), which would complicate our technique. Instead, we create an unused, page sized, static char array for this purpose within the driver’s binary. We then make sure this allocation does not get paged out while the driver is loaded by calling the MmProbeAndLockPages routine. On Linux we use vmalloc and on OS X IOMallocAligned for this purpose. The created page-sized mapping is further referred to as the “Rogue Page”, because we will abuse its PTE for mapping physical memory. Rather than using the APIs the operating system offers to drivers that need to manipulate the page tables, we perform a very common operation (memory allocation), which allows us to keep a lower profile and make it harder for malware to identify our module as a memory acquisition driver.

[Figure 5.1 shows a 48 bit virtual address split into the fields PML4 (bits 47-39), Directory Ptr. (bits 38-30), Directory (bits 29-21), Page Table (bits 20-12) and Offset (bits 11-0). Starting at CR3, the 9 bit indexes select the PML4E, PDPTE, PDE and PTE, each of which holds a 40 bit physical frame number. The PTE of the rogue page is redirected from the allocated page to the target page, and the address is flushed from the TLB.]

Figure 5.1: PTE Remapping Technique

The driver then walks the page tables directly using the value of the CR3 register to find the Directory Table Base (DTB), and then determines the virtual address of the responsible PTE. While page table data structures are referenced using physical addresses, most operating systems have the page tables permanently mapped into the kernel address space for quick access. As illustrated in Figure 5.1, the driver first obtains the address for the PML4 from the CR3 register. It then uses parts of the virtual address of the rogue page to locate the corresponding PML4E and PDPTE, which it uses to find the PDE and finally the PTE. The PTE, in turn, contains the Page Frame Number (PFN), which is the physical offset of the page divided by the page size.
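The index extraction used during this walk can be sketched as follows, assuming 4-level paging with 4 KiB pages as depicted in Figure 5.1. The structure and function names are hypothetical. The sketch only computes indexes and table base addresses; it does not dereference any page table memory.

```c
#include <stdint.h>

/* Indexes into the four paging structures plus the byte offset in the page. */
struct va_indexes {
    unsigned pml4;    /* bits 47-39 */
    unsigned pdpt;    /* bits 38-30 */
    unsigned pd;      /* bits 29-21 */
    unsigned pt;      /* bits 20-12 */
    unsigned offset;  /* bits 11-0  */
};

struct va_indexes split_va(uint64_t va)
{
    struct va_indexes ix;

    ix.pml4   = (unsigned)((va >> 39) & 0x1ff);
    ix.pdpt   = (unsigned)((va >> 30) & 0x1ff);
    ix.pd     = (unsigned)((va >> 21) & 0x1ff);
    ix.pt     = (unsigned)((va >> 12) & 0x1ff);
    ix.offset = (unsigned)(va & 0xfff);
    return ix;
}

/* Physical base of the next table referenced by CR3 or by a table entry. */
uint64_t table_base(uint64_t entry)
{
    return entry & 0x000FFFFFFFFFF000ULL;   /* bits 51-12 */
}
```

Each table entry is 8 bytes wide, so the entry for a given index is located at table_base plus eight times that index, translated to a virtual address before it is read.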

For each physical page we wish to access (further referred to as the “Target Page”), the driver changes the PFN in the PTE to match the physical address of the target page. It then flushes the virtual address of the rogue page from the TLB. All further reads from the virtual address of the rogue page will now be performed from the physical target page by the system’s MMU. Once the TLB is flushed, the MMU will automatically translate our buffer’s virtual address into the physical page in hardware.
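The PTE update itself amounts to replacing the frame number bits while leaving all flag bits untouched. The sketch below assumes the x86-64 PTE layout with the frame number in bits 51-12; the names are hypothetical. The privileged steps (writing the value back to the live PTE and invalidating the TLB entry of the rogue page, for example with the invlpg instruction) are deliberately omitted because they only work in kernel mode.

```c
#include <stdint.h>

/* Frame number field of an x86-64 PTE (bits 51-12). */
#define PTE_PFN_MASK 0x000FFFFFFFFFF000ULL

/* Return a copy of 'pte' that points at the page containing target_phys. */
uint64_t pte_with_target(uint64_t pte, uint64_t target_phys)
{
    return (pte & ~PTE_PFN_MASK) | (target_phys & PTE_PFN_MASK);
}

/* Extract the page frame number (physical address divided by the page size). */
uint64_t pte_pfn(uint64_t pte)
{
    return (pte & PTE_PFN_MASK) >> 12;
}
```

A driver would store the original PTE value before the first remapping and restore it when acquisition is finished, so that the rogue page again refers to its own allocation.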

This algorithm does not call any operating system functionality at all, once the rogue page has been locked into memory. We simply write to the PTE address directly and copy memory out of the rogue page to the user space buffers.

Note that, depending on the caching type in the PTE that holds the original mapping to a physical page, writing to the rogue page can cause cache incoherence and is strongly discouraged. Thus, operating systems usually prevent the creation of an incompatible second mapping to the same physical page (Vidstrom, 2006). However, this is not a problem for the purpose of memory acquisition, as we only need to read from this mapping. Of course it is possible that reading from the rogue page results in stale data that has already been replaced in one of the CPU caches. Because of the inherent atomicity and integrity issues that come with any software based acquisition procedure (see Section 3.1.1), the image contains stale data already, so this is not problematic. By effectively bypassing the operating system in the creation of the rogue mapping, this approach is even more powerful than using one of the APIs that would prevent the mapping in some situations.

5.1.3 Evaluation

We have integrated these techniques into the open source acquisition tool Winpmem (Cohen, 2012). We then tested it against all anti-forensic techniques presented in Chapter 4 on a Windows 7 x64 virtual machine as well as a physical Intel Ivy-Bridge System with 4 GiB of RAM running Windows 8 x64. Both systems were equipped with Intel 510 Series Solid State Drives, to minimize the storage bottleneck when writing the image. The tool was able to acquire the entire address space on both fully compromised systems with a broken KDBG and hooks in MmMapMemoryDumpMdl and MmGetPhysicalMemoryRanges. It also correctly acquired the contents of hidden memory.

We did not witness non-deterministic stability issues, like we experienced with Win64dd and WindowsMemoryReader when acquiring the entire address space. Our approach is generally more stable than current established techniques, because we are in no danger of triggering any API checks in the kernel (see Section 5.2.5). While the PCI memory enumeration technique is available optionally with a special command line switch, our hardware-based memory mapping technique has been the default setting for Winpmem on 64 bit systems since version 1.5.5¹.

It is not possible to do an exact performance evaluation against other approaches, as we acquire a large amount of memory that is not accessible to other tools, which is why we have to read and write more data. However, in comparison to current techniques our approach is significantly slower. For example, the unpatched version of Winpmem wrote a zero padded image of the 4.8 GiB physical address space on our test machine in 22 seconds at 218 MiB/s. Our tool created a 6.3 GiB image in 3 minutes and 20 seconds, which took about 9 times as long, at 31.5 MiB/s. While this does have a negative impact on the atomicity of the image, we believe it to be sufficient in real world scenarios, given the benefits the technique provides. Depending on the chosen storage medium, the bottleneck could also be the network or hard-disk. Furthermore, we believe I/O throughput can be significantly improved in the future, by mapping bigger ranges of memory. Our current implementation writes each page separately, which cannot utilize the large file I/O buffers of the operating system in an optimal way.

1 See git commit 2f375a3f6e398af940f0de53cb734e27f2a872de in March 2014 (Cohen, 2014b).


5.2 Discussion

Our approach is not complicated to implement and yet raises the bar for anti-forensic methods significantly. Since we do not rely on the OS for mapping physical pages or enumerating memory, simple hooking techniques, such as demonstrated in DDFY (Bilby, 2006), are ineffective. By flushing the rogue page from the TLB just before copying the memory out, we remove the effect of desynchronized TLB attacks (Sparks and Butler, 2005). Also, the technique is completely operating system independent and works on all x86 systems. We have successfully tested it on Windows, Linux, and OS X systems with implementations based on the Linux pmem (Cohen, 2011), Winpmem (Cohen, 2012), and OSXPmem (Stuttgen, 2012) drivers. The following discussion evaluates our solution against possible anti-forensic attacks that a rootkit might implement.

5.2.1 Loading of Driver

Our memory acquisition technique depends on being able to run in kernel mode. The obvious countermeasure a rootkit can implement is to prevent our driver from being loaded into kernel-mode, for example by hooking the Service Control Manager interface.

Although our driver requires access to system-mode, there are few signatures that can be employed to detect our driver’s intentions. Currently, it is trivial for a rootkit to identify a memory acquisition driver simply by inspecting the module’s import table. This is especially true for a driver that uses undocumented functions which are not usually imported by legitimate drivers (e.g. MmMapMemoryDumpMdl as is used by the Win64DD driver (MoonSols, 2012)).

By rejecting the driver from loading, the rootkit reveals its existence, so it must only do this as a last resort, when it is certain that a forensic agent is running. Since our driver does not import any special OS functions, a more thorough analysis must be conducted to determine its intentions.

5.2.2 Interception of Data Buffers

Once physical memory is accessible, memory acquisition drivers typically write it to disk, to a network socket, or copy it to user-mode buffers. A simple anti-forensic technique is to mark certain regions of memory using a magic string and then hook all kernel file operations and kernel to user-space copy operations, searching for the magic strings. If these are found, the rootkit has an opportunity to scrub the data.

This attack can be easily circumvented by encrypting or obfuscating the raw data as it is copied to userspace. We can use simple RC4 encryption to prevent the rootkit from identifying the data as it is passed from kernel-space to user-space.
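A compact RC4 routine of the kind suggested here could look as follows. This is a textbook implementation given only as a sketch; the structure and function names are hypothetical, and how the key is shared between the driver and the user-space tool is left open.

```c
#include <stddef.h>
#include <stdint.h>

struct rc4_state {
    uint8_t s[256];
    uint8_t i, j;
};

/* Key scheduling: initialize the permutation from the key. */
void rc4_init(struct rc4_state *st, const uint8_t *key, size_t keylen)
{
    unsigned k;
    uint8_t j = 0, tmp;

    for (k = 0; k < 256; k++)
        st->s[k] = (uint8_t)k;
    for (k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->s[k] + key[k % keylen]);
        tmp = st->s[k]; st->s[k] = st->s[j]; st->s[j] = tmp;
    }
    st->i = 0;
    st->j = 0;
}

/* XOR the buffer in place with the keystream (encrypts and decrypts). */
void rc4_crypt(struct rc4_state *st, uint8_t *buf, size_t len)
{
    size_t n;
    uint8_t tmp;

    for (n = 0; n < len; n++) {
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->s[st->i]);
        tmp = st->s[st->i]; st->s[st->i] = st->s[st->j]; st->s[st->j] = tmp;
        buf[n] ^= st->s[(uint8_t)(st->s[st->i] + st->s[st->j])];
    }
}
```

The driver would apply rc4_crypt to each page before it leaves the kernel, and the user-space tool would apply the same operation with an identically initialized state to recover the original contents, so that a magic string never crosses the hooked copy routines in plain form.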



5.2.3 Debug Registers

An effective anti-forensic technique is the use of the debug registers to alert the rootkit of reading certain memory regions (Halfdead, 2008). Modern CPUs have a set of debug registers which can be used to set hardware breakpoints on memory access (Intel Corporation, 2014b). The processor can monitor four distinct memory access breakpoints stored in debug registers DR0-DR3. Ordinarily, the debug registers contain a virtual address and trap when the processor accesses the breakpoint in the virtual address space. This kind of breakpoint is ineffective against our imaging driver since, in the kernel’s virtual address space, we are accessing our own private memory page. The PTE manipulation simply makes the desired physical memory page available through this virtual page.

However, the Debug Control Register (DR7) can configure the breakpoint to be an I/O read or write breakpoint. This has the effect of generating a trap when the CPU executes an in or out instruction with an operand matching the breakpoint. Our acquisition process is not affected by this (since we do not use in/out instructions to read physical memory). Unfortunately, our PCI introspection routine which is used to enumerate MMIO regions utilizes these instructions to access PCI configuration space.

A malicious rootkit can thus hook our PCI enumeration routine and cause a fabricated PCI device to appear in the PCIe hierarchy by returning a forged configuration space buffer when querying for a specific device ID. This configuration can claim that the fabricated device is occupying a specific memory region for MMIO, causing our tool to exclude it from the imaging process.

To become resilient against debug register I/O emulation attacks like these, we could switch to using the ECAM configuration method of PCIe. With ECAM, PCIe configuration space is not accessed through the I/O space, but is mapped directly into the physical address space using MMIO. By using our direct page remapping technique to access it, our tool would no longer be susceptible to I/O breakpoints. However, we need a way to reliably locate PCIe configuration space in memory without using the OS or I/O space. We leave an implementation of this technique as future work.
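To make the ECAM idea more concrete, the following sketch computes the physical address of a configuration register under the standard ECAM layout, in which each function occupies 4 KiB of configuration space. The function name is hypothetical, and the base address is an assumption that would have to be discovered by some trusted means, which is exactly the open problem described above.

```c
#include <stdint.h>

/* Physical address of a configuration register within the ECAM window.
 * ecam_base is assumed to be known; locating it reliably is left open. */
uint64_t ecam_address(uint64_t ecam_base, uint8_t bus, uint8_t dev,
                      uint8_t fun, uint16_t offset)
{
    return ecam_base
         + (((uint64_t)bus << 20)
          | ((uint64_t)(dev & 0x1f) << 15)
          | ((uint64_t)(fun & 0x07) << 12)
          | (uint64_t)(offset & 0xfff));
}
```

The page containing this address could then be read through the rogue page, replacing the in and out instructions that I/O breakpoints are able to intercept.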

5.2.4 Shadow Page Tables

Another weakness of our technique is that it relies on the OS to find the page tables in the first place. All addresses in CR3 and the page tables are physical addresses. Hence walking the page tables requires a physical-to-virtual translation function, which relies on the OS. A rootkit could hook this translation function and employ a shadow-paging approach to hook write access to PTEs. This would require removing write access to the page tables and hooking the page fault handler (Ooi, 2009).

There is no way to prevent a rootkit from doing this, nor to detect it, but there is a simple solution for this problem. If the memory driver creates its own page tables and changes CR3 to point to these custom tables, we can remain in complete control over the translation process without alerting the rootkit. The details of this implementation are left for future research as well.

5.2.5 Reliability and Stability

The Windows kernel adopts a fast fail policy to minimize data corruption in case of an error (Russinovich et al., 2009). There are a lot of checks in place that try to detect misbehaving drivers. When an inconsistency is detected, the kernel creates a so called bugcheck, commonly known as the Blue Screen of Death (BSOD). This immediately halts all system activity and prints an error message (see Section 3.2.2). From a forensic perspective this is undesirable, because the BSOD effectively shuts down the system and makes further memory acquisition difficult. While the bugcheck causes the kernel to write a crashdump, this dump is not complete by default and partially overwrites the page file, resulting in the destruction of evidence. It is therefore crucial that the acquisition method is as robust as possible.

We believe our technique is generally more stable than others because we do not call any kernel APIs during memory acquisition. Thus, we bypass any internal checks in the kernel which could cause a BSOD. For example, page table manipulation is not allowed at interrupt level and attempting to do so causes a BSOD. However, since we operate outside of the kernel’s APIs we bypass the checks in this case and potentially avoid a number of cases which can lead to a BSOD.

5.3 Summary

In this chapter we have shown a technique for software memory acquisition that does not rely on operating system support for its two most critical tasks: memory enumeration and memory mapping. Instead, we consult the hardware itself to enumerate all memory regions that are unsafe to read. We then map the safe regions into the driver’s virtual address space by programming the data structures used by the MMU directly. We have implemented this technique in the open source memory acquisition programs WinPmem, PMEM, and OSXPmem, and used them to evaluate our approach on the Windows, Linux, and OS X platforms.

Our evaluation showed that our method can reliably acquire all physical memory, even on systems subverted with the active and passive anti-forensic techniques introduced in Chapter 4. We did not witness system instability; instead, we argue that the memory mapping technique used is actually more stable than interacting with the OS, at least on Windows, because it is not subject to driver validation and interrupt level safeguards.

We have discussed possible countermeasures and found that it is hard for malware to identify a driver using our technique as a forensic agent, due to the lack of memory management APIs used. The only two possibilities for intercepting our techniques on the same privilege level would be the use of I/O port hooking using hardware debug registers, or shadow paging to sandbox page table manipulations. Both techniques require significant effort and introduce changes into the system by which the rootkit could be detected. Finally, we have discussed how to improve our approach to become resilient against both of these attacks, by switching from CAM to ECAM configuration and using our own private page tables.

Note that our technique does not depend on any operating system functionality, except for a single page sized memory allocation. It is also not limited to acquiring just memory. As long as reads from the mapped region do not have side effects, it can also be used to acquire the contents of ROM chips or other devices. The next two chapters explore these new capabilities by using our approach to implement a Linux memory acquisition kernel module compatible with a wide range of kernels without recompiling, as well as acquiring firmware code and data to detect and analyze BIOS, UEFI, option ROM and ACPI rootkits.


Chapter 6

Kernel Independent Memory Acquisition on Linux

As we have shown in Section 3.2.2, there are two mechanisms that allow access to physical memory from user-space on Linux: /dev/mem and /proc/kcore. However, the /dev/mem device is restricted on most systems and /proc/kcore is not always enabled. Also, user-space memory acquisition tools are vulnerable to even basic malware techniques like LD_PRELOAD based shared library rootkits (Ligh et al., 2014). Because of this, memory acquisition on Linux systems typically requires loading a kernel module.

The Linux kernel checks modules for having the correct version and checksums before loading, and will refuse to load a kernel module pre-compiled on a different kernel version or configuration to the one being acquired. This check is necessary, since the layout of internal kernel data structures varies between versions and configurations, and a module calling kernel APIs with incompatible data structures will result in system instability and potentially a crash.

For incident response this requirement makes memory acquisition problematic, since responders often do not know in advance which kernel version they will need to acquire. It is not always possible to compile the kernel module on the acquired system, which may not even have compilers or kernel headers installed. Some Linux memory acquisition solutions aim to solve this problem by maintaining a vast library of kernel modules for every possible distribution and kernel version (Raytheon Pikewerks, 2013). While this works well as long as the specific kernel is available in the library, it is hard to maintain and cannot cover cases where the kernel has been custom-compiled or just is not common enough to be awarded a place in the library. This is especially the case on mobile phones. Phone vendors often publish the kernel version they used, but the configuration and details on all vendor specific patches are frequently not known, severely impeding memory acquisition (Sylve, 2012).

Rootkit authors have also encountered the same problem when trying to infect kernels where the build environment is not available. Recent work for Android shows that while it is trivial to bypass module version checking, it is still a hard problem to identify the layout of data structures in unknown binary kernels (You, 2012). In the Android case this problem is solved by restricting dependencies to very few kernel symbols and reverse engineering their data structures on the fly using heuristics (You, 2012).

A solution for data structure layout detection could be live disassembly of functions which are known to be stable and use certain members in these data structures. Recent work has shown that it is possible to dynamically determine the offsets of particular members in certain data structures used in memory management, file I/O and the socket API (Case et al., 2010).

Kernel integrity monitoring systems also face similar problems, as they have to monitor dynamic data and need to deduce its type and structure to analyze it. Since this changes with different kernel versions, these systems need to infer the kernel’s data layout from external sources. The KOP (Carbone et al., 2009) and MAS (Weidong et al., 2012) frameworks are designed to monitor integrity of dynamic kernel data structures. Their approach involves statically analyzing the kernel source code and debug symbols to infer type information for dynamic data. However, they rely on the kernel source code and debug symbols for the exact running kernel being available in advance, which is exactly the dependency we cannot guarantee in the incident response scenario.

Since the hardware-assisted memory acquisition technique we presented in Chapter 5 does not require access to kernel APIs, it is able to function in any kernel, regardless of its version or configuration. To solve the problem of having to custom-build a kernel module for every target system, we have developed a method to load a minimal module into a running kernel using a parasitic approach. Most modern kernels have a large number of legitimate kernel modules, compiled specifically for the running kernel, already present on the system. Our approach locates a suitable existing kernel module (host module), injects a minimal memory acquisition module into it (parasite module) and loads the combined module into the kernel. The resulting modified kernel module is fully compatible with the running kernel. All data structures accessed by the kernel are taken from the host module, and were in fact compiled with compatible kernel headers and config options. However, control flow is diverted from the host module to the parasite module by modifying static linking information. This allows the parasite module’s code to use the host’s data structures for communication with kernel APIs.


This chapter is organized as follows: In Section 6.1, we give an overview of the existing methods of loading incompatible modules into the Linux kernel, and define the requirements a module faces to be safely loaded into an unknown kernel. In Section 6.2, we then present an approach of using rootkit techniques to inject a memory acquisition module into an existing compatible module on the system. In Section 6.3, we develop techniques to redirect control flow and hijack data structures from existing modules by manipulating their relocation tables. Finally, in Section 6.4, we discuss the implementation of a minimal acquisition module that utilizes these techniques so that it can be loaded into arbitrary kernels without recompilation.


6.1 Compatibility of Linux Kernel Modules With Different Kernels

    struct module {
        enum module_state state;
        struct list_head list;
        char name[MODULE_NAME_LEN];
        ...
    #ifdef CONFIG_UNUSED_SYMBOLS
        ...
    #endif
    #ifdef CONFIG_MODULE_SIG
        bool sig_ok;
    #endif
        ...
        /* Startup function. */
        int (*init)(void);
        ...
    };

Listing 6.1: Module Data Structure (The Linux Kernel Archives, 2013)


As we have seen in Section 2.2, Linux kernel modules are object files that are linked directly into the running kernel. Because they run at the same privilege level as the kernel, there is no protection of kernel memory from their actions, which is why an error in a module can lead to kernel data corruption and thus to a kernel panic.

Furthermore, because the kernel is directly linked with the module object file, it actually uses some of the module’s data structures. For example, each module contains a special section called .gnu.linkonce.this_module, which holds a static data structure generated in the compilation process. The layout of this data structure is defined in the kernel headers, and it is used by the kernel for bookkeeping and managing of the module. An abbreviated version is shown in Listing 6.1. It is linked into the module list and the kernel will regularly access its members.

Figure 6.1 shows a part of the kernel’s code that loads the module. After loading and relocating the module, the kernel directly dereferences the module.init member to call the module’s initialization function. The offset of init relative to the start of the module struct depends on three factors:

• The configuration of the compiler affects the size of the individual members, as well as the padding in between. If any of these change, the location of all following members of the structure is shifted.

• The configuration of the kernel affects which ifdef directives evaluate to true. For example, if CONFIG_MODULE_SIG is enabled the structure contains an additional boolean member before the init pointer, shifting its location backwards.

Module:

    struct module __this_module
    __attribute__((section(".gnu.linkonce.this_module"))) = {
        .name = KBUILD_MODNAME,
        .init = init_module,
        ...
    };

Kernel:

    static int do_init_module(struct module *mod) {
        ...
        /* Start the module */
        if (mod->init != NULL)
            ret = do_one_initcall(mod->init);
        ...
        /* Now it’s a first class citizen! */
        mod->state = MODULE_STATE_LIVE;
        ...
    }

Figure 6.1: Initialization of a Kernel Module

• The struct layout can also change between different kernel versions. A new kernel version might add or remove a member of the struct in front of init, shifting its location.

Forcing the module to be compiled with the exact same kernel version, configuration and compiler settings ensures that all APIs are compatible and structs have the exact same layout in both the module and the kernel. If the number of members, their order, the compiler’s padding settings, or the presence of conditional members differs between kernel and module, certain members (for example the init pointer) will be at a different offset than the kernel expects. The call to mod->init might then result in a call to something entirely different, such as uninitialized data or even unmapped memory. This can easily result in a kernel crash, forcing a reboot or leading to possible data loss or corruption.

As we have seen in this section, it is crucial to compile LKMs for the exact kernel they are to be loaded into. It is important to use the same kernel headers, config and compiler as were used to build the target kernel. There are numerous safeguards in place to prevent incompatible modules from loading. Disabling or circumventing this protection can cause undefined behaviour and/or data corruption and should not be attempted.
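The effect of a conditional member on the position of init can be reproduced in user space. The following is a minimal sketch, assuming two heavily simplified stand-ins for struct module; the struct names, the 60-byte name array and the helper init_shift are hypothetical and chosen only for illustration, they are not the kernel's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a kernel built WITHOUT
 * CONFIG_MODULE_SIG. The real struct module has many more members. */
struct module_without_sig {
    int state;
    char name[60];
    int (*init)(void);
};

/* The same stand-in for a kernel built WITH CONFIG_MODULE_SIG:
 * one extra boolean member sits in front of init. */
struct module_with_sig {
    int state;
    char name[60];
    bool sig_ok;
    int (*init)(void);
};

/* Number of bytes by which the extra member moves init. A module built
 * against one layout and loaded by a kernel expecting the other would
 * have its init pointer read from the wrong location. */
static size_t init_shift(void)
{
    size_t without = offsetof(struct module_without_sig, init);
    size_t with = offsetof(struct module_with_sig, init);

    assert(with >= without); /* an added member never moves init forward */
    return with - without;
}
```

On common ABIs the shift equals the pointer alignment rather than the size of the boolean, because the compiler pads the new member up to the alignment of the function pointer that follows it. This is the compiler-dependent padding mentioned in the first factor above.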



6.1.1 Bypassing Module Version Checking

There are multiple ways to get around the version check and load a module even if it was compiled for a different kernel version. However, for the reasons mentioned before this should only be a last resort, as it can result in undefined behaviour, data corruption or worse.

The kernel config option CONFIG_MODULE_FORCE_LOAD allows modules without valid version magic to be loaded by using the “--force” option of the modprobe program. In many cases, if the module was compiled on a very closely related kernel (e.g. only the last digit is different) for the same distribution, this will work. For larger differences this technique could cause a kernel crash and is usually not recommended.

Because it is hard to verify if the versions are compatible without comparing the kernel headers and configuration, this option essentially allows for a gamble with the possibility of a very bad outcome. Documentation clearly states that “Forced module loading sets the ‘F’ (forced) taint flag and is usually a really bad idea.” (The Linux Kernel Archives, 2013, init/Kconfig), which is the reason few production kernels are compiled with this configuration option enabled.

Even without the forced loading option enabled, the kernel can still be tricked into accepting an incompatible module by modification of the .modinfo and __versions sections. The version magic is not cryptographically signed, so it can simply be extracted from a valid module on the target system and used to replace the incompatible magic stored in another module. Because the module now contains valid magic strings for this kernel version and all its imported symbols, the version check will pass and the kernel will allow the module to be loaded. Nevertheless, the inherent danger with this is the same as with forced loading. It can result in undefined behavior, a kernel crash and data loss.
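To make the role of the .modinfo section more concrete: its contents are a sequence of NUL-terminated "key=value" strings, one of which carries the key vermagic. The sketch below looks up a key in such a blob. It assumes the section bytes have already been read into a buffer; the function name modinfo_get is hypothetical, and locating the section inside the ELF file is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up `key` in a .modinfo-style blob, i.e. a sequence of
 * NUL-terminated "key=value" strings. Returns a pointer to the value
 * inside the blob, or NULL if the key is absent or the blob is malformed. */
static const char *modinfo_get(const char *blob, size_t len, const char *key)
{
    size_t klen, pos = 0;

    assert(blob != NULL && key != NULL);
    klen = strlen(key);

    while (pos < len) {
        const char *entry = blob + pos;
        const char *end = memchr(entry, '\0', len - pos);
        size_t elen;

        if (end == NULL)
            break;                      /* unterminated trailing entry */
        elen = (size_t)(end - entry);
        if (elen > klen && memcmp(entry, key, klen) == 0 && entry[klen] == '=')
            return entry + klen + 1;    /* value starts after the '=' */
        pos += elen + 1;                /* skip entry and its NUL byte */
    }
    return NULL;
}
```

Comparing the vermagic value of a candidate module with that of a module known to load on the target shows at a glance whether the version check would pass. It says nothing about whether the data structure layouts actually match.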

Finally, the kexec system call offers another way to insert code into system mode. “[K]exec is a system call that enables you to load and boot into another kernel from the currently running kernel” (The Linux man-pages, 2012). This can be used to load a custom acquisition kernel, replacing the old one, similar to the approach taken by the Body Snatcher tool (Schatz, 2007a). However, this will render the old kernel unusable and there is no way to recover from this into the state the system was in before. Additionally, this system call only exists on kernels compiled with CONFIG_KEXEC enabled, so there is no guarantee that it will be available.

6.1.2 Requirements for a Stable Approach

Multiple problems have to be solved to load an incompatible kernel module in a reliable manner without affecting system stability. The first is the matter of getting system mode code execution. We need the ability to insert arbitrary code into the running kernel and pass control to it. This involves bypassing the version check and handing the kernel a valid struct module with a module->init pointer under our control.

For this to work it is also necessary to predict the layout of the kernel’s data structures. The module data structure in particular is needed to get code execution in the first place, but usage of many kernel APIs also requires creation of specific data structures with the correct layout. For example, the creation of a device inode to communicate with user mode requires a kernel module to have a valid file_operations data structure with correctly positioned pointers to the relevant driver functions (such as read, write, and llseek).

The more APIs a kernel module wants to employ, the more data structures have to be used, which increases the necessary knowledge of the layout of the running kernel’s data structures. This implies that the problem becomes much easier to solve if the memory acquisition module uses as few APIs as possible. Some Linux memory acquisition solutions have a rich feature set, such as writing to disk from kernel mode or dumping memory over the network (Sylve, 2012). However, this requires knowledge of the layout of data structures used in the Virtual Filesystem (VFS) or network sockets. Additionally, some existing tools parse the iomem_resource tree to enumerate physical memory mappings (to avoid acquiring MMIO regions as shown in Section 3.2.1). Kernel APIs mapping the virtual address space or even allocating memory can be difficult to use without detailed knowledge of the running kernel’s data structures and APIs. Ideally, an acquisition module for this scenario should use as few kernel APIs as possible.

6.2 Reliable Loading of Generic Acquisition Modules

A technique for loading a generic memory acquisition kernel module simplifies the acquisition process for incident responders on Linux systems. Investigators can concentrate on the incident and stop worrying about the exact kernel version of the target system and about prebuilding compatible kernel modules. Because the acquisition technique we developed in Chapter 5 does not use any kernel APIs, we can use it to acquire memory on kernels where the layout of data structures is unknown. We simply inject our module into a compatible module we find on the target system, and then redirect the control flow.

6.2.1 Parasitizing a Compatible Module

The first step in parasitizing a compatible module is to locate a valid kernel module for the running kernel that is suitable as a host. On most distributions the directory /lib/modules/ contains a large number of kernel modules for different devices, which have all been compiled with the correct headers and configuration and thus are compatible for linking into the running kernel. Code injection into one of these modules not only allows us to pass the kernel version checks but also ensures that the struct module linked into the kernel is compatible.

Parasitizing an existing kernel module is not a novel technique. It has previously been employed by malware authors as a stealthy persistence technique (Truff, 2003). Because a kernel module is a relocatable object, it is easy to add new code and data to it using standard tools. It can essentially be relinked with another module to combine both into a single object file. This can be done using the linker ld or by copying individual sections using objcopy. The Adore-ng rootkit (Stealth, 2004), for example, uses this technique to hide its kernel module inside a legitimate one on infected systems and gain code execution when the host is loaded on startup. The method is documented to work on a wide variety of kernels, from the 2.4 series (Truff, 2003) to more current 2.6 and 3.0 kernels (Styx, 2012).

To divert control flow in the infected module, malware rewrites the symbol names of initialization functions. By renaming init_module to something else and changing the name of the injected initialization routine to init_module, the kernel linker will insert the address of the injected routine into the struct module->init member when relocating the module. When the kernel initializes the loaded module it will thus call the malware’s code, not the host’s.

While this technique provides a stable method to solve the first problem of getting code execution, it does not address the problem of learning the struct layout of the running kernel. For our use case, we are interested in other data structures a host module has to offer. If we can find a kernel module on the target system that contains all data structures the parasite kernel module needs in order to use the kernel APIs, we can parasitize this module and make use of them ourselves.

Because code references in the host module’s data structures are resolved by the kernel linker on load through relocations, the relocation tables of the host module contain information on the data structure layout. This can be exploited to patch pointers in relocated data structures on module load to suit our needs, without having to know anything about their layout.

6.2.2 Code Injection into Kernel Modules

Previous work used the linker ld to link code into the host module (Truff, 2003; Styx, 2012). However, this complicates the build process because it either needs the linker available on the target system, or it is necessary to first copy a suitable module from the target to a system with a suitable build environment, infect it there and then copy the result back. Both options are undesirable when responding to an incident, as they change the target’s state and increase forensic impact.

Therefore it is prudent to implement a custom linker that can perform this process on the fly in memory when executed on the target. The linker has to be able to insert entries into section header, symbol and string tables and add sections to the binary. We have created the elfrelink C library for this purpose (Stuttgen, 2014). It is able to inject ELF object files into each other and migrate the required symbol, string and relocation tables automatically.

6.3 Redirection of Control Flow

Once we are able to inject code into a kernel module, we need to divert the control flow away from the host to the parasite. This can be performed by using a technique we call “Relocation Hooking”. The technique is commonly used to manipulate entries in the Procedure Linkage Table (PLT) to hook calls to dynamic libraries in ELF executables (Shoumikhin, 2010). The general idea is that the linker will use information in the relocation tables to patch the program’s control flow, thus manipulation of these tables can force the linker to patch a program for us.

Relocation tables are arrays of relocation entries, each describing the use of a symbol in a specific location of the program. They provide information on how this code needs to be patched to reference the actual address of this symbol, as soon as it has been loaded and its address is known. Because references and addressing are highly architecture dependent, a large number of different types of relocations exist. On the x86-64 architecture a relocation table is an array of struct Elf64_Rela, storing the offset in the code where the relocation will be performed, information on the type of relocation, the index of the referenced symbol, and an addend. Depending on the type of relocation, the addend has to be added to the symbol offset, for example when patching an RIP relative reference in position independent code. There are 37 different types of relocation on x86-64 (Matz et al., 2012), of which only 5 are actually used in kernel modules (The Linux Kernel Archives, 2013, arch/x86/kernel/module.c).

6.3.1 Interception of Module Initialization

Each kernel module contains a data structure called __this_module, which is automatically generated from the module source code at compile time through macro expansion. The resulting definition is available in the generated .mod.c file, and linked into the module using the relocation table for its section (.gnu.linkonce.this_module). This data structure is then used by the kernel to call the initialization code pointed to by __this_module->init. The relocation table for this section has an entry that instructs the kernel to patch the address of the init_module function into this member of the struct. By modifying the symbol index in that relocation entry we can make the linker patch any symbol we want into the struct when the module is loaded. Thus it is sufficient to find this relocation entry and change its symbol index to the one of the parasite’s initialization function to get code execution. This process is illustrated in Figure 6.2.

    struct module __this_module __attribute__(
        (section(".gnu.linkonce.this_module"))) = {
        .name = KBUILD_MODNAME,
        .init = init_module,
    #ifdef CONFIG_MODULE_UNLOAD
        .exit = cleanup_module,
    #endif
        .arch = MODULE_ARCH_INIT,
    };

[Diagram of the kernel module’s sections: .gnu.linkonce.this_module, .rela.gnu.linkonce.this_module, .text, .text, .data; hook target: init_parasite]

Figure 6.2: Relocation Hook of module->init

Note that we do not need to know anything about the layout of __this_module at all to do this. All information needed to patch this data structure is available in the relocation entry and the patch itself is performed by the linker.
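The core of such a hook is a small rewrite of the relocation table. The following sketch retargets every entry that refers to one symbol index so that it refers to another, leaving offset, type and addend untouched. The struct reloc_entry mirrors the fields of Elf64_Rela under a local name, and hook_relocations is a hypothetical helper; finding the symbol indices of init_module and of the parasite's initialization function in the symbol table is assumed to have happened already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of an ELF-64 relocation entry with addend. */
struct reloc_entry {
    uint64_t r_offset;
    uint64_t r_info;    /* symbol index in the high 32 bits, type in the low 32 */
    int64_t  r_addend;
};

/* Redirect all relocations that reference old_sym to new_sym.
 * Only the symbol index changes; the linker still patches the same
 * location with the same relocation type, so no knowledge of the
 * patched data structure's layout is needed.
 * Returns the number of entries that were changed. */
static size_t hook_relocations(struct reloc_entry *tab, size_t n,
                               uint32_t old_sym, uint32_t new_sym)
{
    size_t i, hooked = 0;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++) {
        uint32_t sym  = (uint32_t)(tab[i].r_info >> 32);
        uint32_t type = (uint32_t)(tab[i].r_info & 0xffffffffu);

        if (sym == old_sym) {
            tab[i].r_info = ((uint64_t)new_sym << 32) | type;
            hooked++;
        }
    }
    return hooked;
}
```

Applied to the relocation table of the .gnu.linkonce.this_module section, a single changed entry is enough for the linker to store the parasite's address in the init member.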

6.3.2 Communication with User Mode

Even after we have achieved code execution, we still lack a method of communicating with user space. A memory acquisition driver needs to receive instructions from user space on which physical pages to acquire and needs to pass these pages back to user space.

One of the simplest and most commonly used methods for system- to user-mode communication in Linux is the character device. A kernel module can create a data structure called file_operations, which contains function pointers for operations like read, write, and llseek. The module then registers a major number with the kernel, which will link the data structure to any inode referencing that major number. The system call mknod can be used from user space to create such an inode. Any file operations on this inode will be dispatched to the functions referenced in the corresponding file_operations data structure.
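From the perspective of a user-space imaging tool, this interface reduces to seeking to a physical address and reading a page. The sketch below shows that side only, under the assumption that the driver interprets the file offset as a physical address; the function names and the idea of passing the device path as a parameter are hypothetical, and creating the inode with mknod is not shown.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the device node of the acquisition driver. The path is supplied
 * by the caller because the node's name depends on how it was created. */
static int open_acquisition_device(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY);
}

/* Read one page located at physical address `phys` through the device.
 * The seek ends up in the driver's llseek handler, the read in its read
 * handler. Returns 0 on success and -1 on any error or short device. */
static int read_phys_page(int fd, uint64_t phys, void *buf, size_t page_size)
{
    size_t done = 0;

    assert(buf != NULL);
    if (lseek(fd, (off_t)phys, SEEK_SET) == (off_t)-1)
        return -1;
    while (done < page_size) {
        ssize_t n = read(fd, (char *)buf + done, page_size - done);

        if (n <= 0)
            return -1;  /* error or end of device before a full page */
        done += (size_t)n;
    }
    return 0;
}
```

Because the functions only rely on seek and read, they can be exercised against an ordinary file before being pointed at a real device node.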

If the host module implements a character device, it must already have a compatible version of this struct in its .data or .rodata section (usually kernel modules initialize their file_operations statically at compile time). To populate the function pointers in this struct there have to be relocation entries for this section, because the functions are placed in another section whose address is not known until it is loaded. When the kernel loads the module, the linker relocates the sections and then places the addresses of all relevant functions into the file_operations data structure, by parsing the corresponding relocation table.

We can exploit this process by modifying the relocation table of the host to point to a symbol of our choice instead of the original read and llseek functions exported by the host module, as illustrated in Figure 6.3. When the parasitized module is loaded, the kernel linker will patch the data structure with function pointers to the parasite’s read and llseek functions instead. The parasite can then call the register_chrdev API in the kernel with a pointer to this struct, which is guaranteed to be compatible with the running kernel. Knowledge of the file_operations layout is not necessary, because the relocation entries contain the necessary information. Our pointers will be placed at the correct offsets by the linker and any read or llseek calls to a device inode with our major number will be dispatched to the parasite’s read or llseek functions.

    static struct file_operations lp_fops = {
        .owner = THIS_MODULE,
        .llseek = lp_llseek,
        .read = lp_read,
    };

[Diagram of the kernel module’s sections: .data, .rela.data, .text, .text, .data; hook targets: parasite_llseek, parasite_read]

Figure 6.3: Relocation Hook of file_operations

6.3.3 Selection of a Suitable Host

Due to the need for certain symbols and structs, this approach will not work with arbitrary kernel modules. However, most distributions ship with a large number of modules to handle many different hardware devices, which are found in /lib/modules/`uname -r`. We can scan this directory and select a host module that satisfies the following criteria:

• It contains a symbol with an _fops suffix in the .data or .rodata section, which indicates it has a file_operations data structure available.

• It contains symbols with _read and _llseek suffixes, with relocation entries into the file_operations data structure. This is necessary for us to successfully patch file_operations.

• It imports the symbols register_chrdev and copy_to_user, which the parasite needs to register the file operations struct with a major number and copy data to user buffers when called for read.

If we find such a module on the target we can load it into memory, inject the acquisition module, hook the relocations and then pass it to the init_module system call for linking into the kernel.
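The name-based part of these criteria can be expressed as a simple predicate over a module's symbol table. The sketch below assumes the symbols have already been extracted into an array; struct sym_info, has_suffix and is_suitable_host are hypothetical names. It deliberately checks names only and does not verify which section a symbol lives in or whether relocation entries into the file_operations struct exist, so it is a pre-filter rather than a full implementation of the selection. The _copy_to_user variant is accepted as well, since hosts may import either name depending on kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one symbol table entry of a candidate host. */
struct sym_info {
    const char *name;
    bool undefined;  /* true if the symbol is imported from the kernel */
};

static bool has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);

    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Name-based pre-filter for host modules: the module must define
 * *_fops, *_read and *_llseek symbols and import register_chrdev
 * plus one of the copy_to_user variants. */
static bool is_suitable_host(const struct sym_info *syms, size_t n)
{
    bool fops = false, rd = false, seek = false, reg = false, copy = false;
    size_t i;

    assert(syms != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const char *name = syms[i].name;

        if (syms[i].undefined) {
            reg  = reg  || strcmp(name, "register_chrdev") == 0;
            copy = copy || strcmp(name, "copy_to_user") == 0
                        || strcmp(name, "_copy_to_user") == 0;
        } else {
            fops = fops || has_suffix(name, "_fops");
            rd   = rd   || has_suffix(name, "_read");
            seek = seek || has_suffix(name, "_llseek");
        }
    }
    return fops && rd && seek && reg && copy;
}
```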


6.4 Implementation of a Minimal Acquisition Module

As mentioned in Section 6.1.2, it is important that the memory acquisition module imports as few kernel symbols as possible. While it is possible to employ the same technique for other data structures as used on the module and file_operations data structures, this increases the requirements on the host module.

For each additional API we want to use, we add a dependency that must be satisfied by some module on the target. This decreases the number of suitable modules, reducing the chance of finding a suitable host.

We have developed a minimal physical memory acquisition module, which only relies on the register_chrdev and copy_to_user symbols. The module is based on the techniques introduced in Chapter 5, and maps memory without kernel support. This is accomplished by directly editing the page tables and manually remapping parts of the module’s data segment to the desired physical page.
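The arithmetic behind such a remapping is small. The following user-space sketch assumes x86-64 with 4 KiB pages, where a page table entry holds the physical frame address in bits 12 to 51 and flags in the remaining bits; the macro and function names are hypothetical. It only computes the new entry value and the page table index for a virtual address. Writing the entry and flushing the stale TLB entry can only be done in kernel mode and is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITABLE  (1ULL << 1)
#define PTE_ADDR_MASK 0x000ffffffffff000ULL  /* physical frame, bits 12..51 */

/* Index into the lowest-level page table for a virtual address:
 * nine bits directly above the 12-bit page offset. */
static unsigned pt_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 12) & 0x1ff);
}

/* Build a PTE that keeps all flag bits of old_pte but points at the
 * page-aligned physical address phys instead of the original frame. */
static uint64_t pte_retarget(uint64_t old_pte, uint64_t phys)
{
    assert((phys & 0xfff) == 0);  /* must be page aligned */
    return (old_pte & ~PTE_ADDR_MASK) | (phys & PTE_ADDR_MASK);
}
```

Keeping the flags of an entry that already maps a page of the module's own data segment avoids having to decide on permission and caching bits from scratch.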

Commonly, memory acquisition modules perform memory enumeration in kernel mode by parsing the iomem_resource tree (Sylve, 2012; Cohen, 2011). However, this requires knowledge of the layout of the resource data structure. We removed this functionality from the kernel module, and leave the detection of physical memory layout to the user-space imaging tool. It can achieve this by parsing /proc/iomem from user-space, or by using PCI introspection as shown in Section 5.1.1.
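Parsing /proc/iomem from user space is straightforward, because each line has the form "start-end : name" with hexadecimal addresses, and nested resources are indented. The sketch below extracts top-level "System RAM" ranges from a single line; struct mem_range and parse_iomem_line are hypothetical names, and reading the file line by line is left to the caller.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

struct mem_range {
    uint64_t start;  /* first physical address of the range */
    uint64_t end;    /* last physical address of the range (inclusive) */
};

/* Parse one line of /proc/iomem. Returns 1 and fills *out if the line
 * describes a top-level "System RAM" range, 0 for everything else
 * (device regions, nested child resources, malformed lines). */
static int parse_iomem_line(const char *line, struct mem_range *out)
{
    uint64_t start, end;
    int name_pos = 0;

    assert(line != NULL && out != NULL);
    if (line[0] == ' ')
        return 0;  /* indented: child of another resource */
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %n",
               &start, &end, &name_pos) < 2 || name_pos == 0)
        return 0;
    if (strncmp(line + name_pos, "System RAM", 10) != 0)
        return 0;
    out->start = start;
    out->end = end;
    return 1;
}
```

The resulting ranges tell the imaging tool which physical addresses to request from the driver, so that device memory regions are never read.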

In our original implementation of PTE Remapping we used the preempt_disable and preempt_enable symbols to ensure the module’s thread cannot be interrupted and resumed on another CPU. Because the TLB of another CPU might still contain the old mapping for the remapped page, this could result in a corrupted image. Use of these symbols implies we would have to find a valid version magic on the target, which we do not want to rely on. We have replaced them by simply using the cli/sti instructions to disable interrupts for the brief period of remapping and copying a page. We also removed debug logging from the module, as not every suitable host module might import printk.

Furthermore, we removed all dynamic memory allocation from the pmem module, and placed all data structures into the data segment. This even allows us to get rid of the kmalloc, vmalloc, kfree and vfree symbols, as each module might use a different memory allocation API and we do not want to limit our selection of target modules this way.

Another important detail we discovered when trying to make a module as version independent from the running kernel as possible is that some config options affect APIs. For example, the copy_to_user API is an inline function calling _copy_to_user after performing some debug bookkeeping on kernels with a specific config option enabled¹. Compiling in an environment where this option is enabled will result in a symbol dependency that kernels compiled without it cannot satisfy, thus limiting the scope where the module can be successfully loaded. This also causes problems when scanning for suitable hosts, as they import _copy_to_user when this option is enabled and copy_to_user when compiled without it. We have solved this problem by explicitly calling _copy_to_user in our module, and modifying the symbol table to use the correct one depending on what the host uses. Since copy_to_user essentially calls _copy_to_user, this does not affect the code’s correctness or stability.

¹ Kernels that are older than 3.0 have the CONFIG_DEBUG_SPINLOCK_SLEEP option, newer ones have CONFIG_DEBUG_ATOMIC_SLEEP.

Distribution    Kernel Version    Modules Available    Modules Suitable
Fedora 10       2.6.27            1746                 4
Fedora 15       2.6.38            2280                 14
Fedora 16       3.1               2384                 14
Ubuntu 8.04     2.6.24            1939                 6
Ubuntu 12.10    3.8               3708                 14
Ubuntu 13.10    3.11              3957                 15

Table 6.1: Host Modules by Kernel Version

Finally, the build environment needs to be slightly tweaked, because some configuration options trigger dependencies on symbols that might not be available on the target system. For example, if the CONFIG_FUNCTION_TRACER option is enabled, all functions will call the symbol __fentry__ at the beginning to enable ftrace functionality in the kernel (Rostedt, 2009). Any module compiled with this will depend on the __fentry__ symbol, which is not available on kernels without ftrace.

We have evaluated our approach on multiple Linux distributions and kernel versions to provide data on how big the difference in kernel version can actually be while still being able to obtain a physical memory image. We compiled our parasite module on an Ubuntu system with kernel 3.8.0-34. We do not believe this technique will work on 2.4 kernels due to massive changes in module loading and relocation architecture, so we did not test these (Salzman et al., 2001).

We have tested our module on six different kernels and distributions as shown in Table 6.1. All tested systems had a number of suitable modules available, with newer kernels providing 14 to 15 different suitable host modules, and older kernels 4 to 6.

Our technique was successful in acquiring memory from all tested systems without crashes or any other major problems.


6.5 Summary

In this chapter we have illustrated the creation of an ELF relinking library thatis capable of injecting a kernel module into another module, while taking care ofall string, symbol, and relocation table dependencies. With a technique we havenamed relocation hooking we have leveraged the information contained in a modules’relocation tables to steal its data structures and use them to interact with the kernelin a stable manner.

Furthermore, we have developed a physical memory acquisition kernel module that is independent of the version of the running kernel. It has been stripped down to the bare essentials and requires only two kernel APIs to function, because it uses our kernel-independent memory mapping technique developed in Chapter 5. With the relinking library we are able to load the binary module on any Linux kernel between 2.6.38 and 3.10, regardless of configuration or compiler options.

Testing shows our approach has no negative impact on system stability and provides reliable access to physical memory. This simplifies memory forensic procedures significantly and allows for physical memory acquisition even on systems where kernel headers are not available. It also minimizes the impact on the target system, as there is no need to install a build environment and compile software on the system that is to be analysed.


Chapter 7

Acquisition and Analysis of Compromised Firmware

In 2010, computer security researcher Dragos Ruiu noticed some very strange behaviour in his computers (Goodin, 2013). A number of his machines would suddenly delete data or change their configuration without prompting. He started investigating this issue and claimed to have discovered a malware species that infects the system firmware and actively propagates to other computers by modifying the firmware of connected USB devices. If a USB flash drive from an infected machine was plugged into a clean system, it would suddenly exhibit the same symptoms. He even suspected the malware to communicate with other infected systems that were not connected to any network by use of High Frequency (HF) sounds. The malware in this case was subsequently named BadBIOS.

Ruiu was never able to prove his claims and present evidence of the malware's existence. And while the capabilities of BadBIOS might sound like something straight out of a science fiction movie, researchers have shown that bridging of air gaps using HF is indeed viable (Hanspach and Goetz, 2013; O’Malley and Choo, 2014). Nohl et al. (2014) also demonstrated that it is possible to infect the firmware of USB devices with malicious software that can completely take over a system in fractions of a second without requiring user interaction. So even if BadBIOS might have only existed in Ruiu's imagination, it is possible to create such software.

BIOS and UEFI have also been successfully attacked by researchers in the past (Wojtczuk and Tereshkin, 2009; Loukas, 2012). Firmware attacks have even been spotted in the wild. For example, the Mebromi malware has the ability to infect specific versions of Award BIOS to ensure its persistence on infected hosts (Giuliani, 2013). Recently leaked documents also show that state actors have been using this attack vector for a long time. The NSA internally advertises software called DEITYBOUNCE, which has been capable of infecting the BIOS of Dell servers since 2007 (Schneier, 2014).

What makes this threat so dangerous is that it is extremely hard to detect. There is no anti-virus software at the firmware level, and SMM allows malware to leverage an execution environment that is completely hidden from the rest of the system. To detect and analyze malicious firmware, it is necessary to obtain the contents of the firmware ROMs. This can be accomplished either with a hardware Erasable Programmable ROM (EPROM) programmer (The Coreboot Project, 2009) or with software that interacts with the ROM chip (The Flashrom Team, 2013). For example, the Copernicus project (Butterworth et al., 2013) aims at extracting malicious firmware code and data directly over the SPI bus. Because this approach is vulnerable to malicious software running in SMM, the latest implementation utilizes the Intel TXT extensions (Intel Corporation, 2014e) to isolate the acquisition module from other parts of the system (Kovah et al., 2014).

As we will show in this chapter, current memory forensic technology is also completely oblivious to malicious firmware. In order to close this research gap, we present a comprehensive study on current firmware rootkit techniques and the traces they leave on infected systems, and propose methods for identifying them in the course of memory forensic investigations. Utilizing the memory mapping and enumeration methods we illustrated in Chapter 5, we show that it is possible to read firmware code and data from the system's memory bus. With this knowledge, we develop tools and techniques to integrate firmware acquisition into the forensic memory acquisition process. Our insights are implemented in standard open-source tools, which are published as part of the Rekall project (Cohen, 2014b). We evaluate our work using a proof-of-concept ACPI rootkit implementation and manipulated firmware images.


The remainder of this chapter is outlined as follows: In Section 7.1, we present a survey of current firmware rootkit techniques and their implications for memory forensics. We then describe a method for enumerating and acquiring firmware code and data from the memory bus in Section 7.2. In Section 7.3, we discuss the analysis of the acquired data, followed by an evaluation of how well these insights are already incorporated in common forensic suites and applications in Section 7.4. Aspects and limitations that need to be considered when applying the respective concepts in real-world investigations are discussed in Section 7.5. We conclude with a short summary of our work in Section 7.6.

7.1 Rootkit Strategies for Compromising Firmware

In this section we present a survey of the current state of the art in x86 firmware-based malware techniques. We group exploits by technology used and point out the traces that are recoverable using memory forensics. While firmware rootkits are highly target-specific and require a lot of in-depth knowledge to develop, malware authors have demonstrated that building working prototypes is feasible, and various approaches have already been adopted by different species “in the wild” (Giuliani, 2013).

7.1.1 BIOS- and EFI-Based Attacks

As Bulygin et al. (2014) report, a huge number of BIOS/EFI attacks were successfully carried out in the past. Despite update signature verification, secure boot, and other security measures at the firmware level, many feasible attack vectors still exist. In the following, we give a brief overview of common system compromise strategies.

When an x86 computer is first switched on, the ROM containing the firmware is initially writable through the SPI bus. This functionality is necessary to permit legitimate installation of new firmware updates. Before control is handed to the operating system, however, SPI flash must be properly locked down to prevent software from overwriting the ROM. Many vendors fail at this task and leave the respective areas open for manipulation (Bulygin, 2013; Bulygin et al., 2013). As a consequence, malicious code may flash the firmware ROM directly from kernel space and incorporate malevolent functionality.

In addition, most BIOS update implementations do not require a cryptographic signature. They process any source file as long as it matches a given format. This flaw was exploited by the Mebromi rootkit to infect versions of Award BIOS (Bulygin et al., 2014). In contrast, modern firmware technologies based on EFI are more wary of such attack vectors and attempt to verify update requests more rigorously. However, the respective algorithms may contain unintentional errors and thus be susceptible themselves, as Wojtczuk and Tereshkin (2009) argue.

Even with all software measures perfectly implemented, a malicious adversary at an arbitrary position in the supply chain can modify a system’s firmware with the help of a flash programmer. As recently outlined by Brossard (2012), the original firmware image can be replaced with a malicious one using open firmware like Coreboot (Minnich, 2014), SeaBIOS (O’Connor, 2014), or iPXE (Brown, 2014).

All these attacks ultimately result in reprogramming of the firmware flash ROM. As laid out in Section 2.3, this ROM chip is mapped on the memory bus from 0xF0000 to 0xFFFFF. It is thus possible to include this region in a memory image for analysis.

7.1.2 PCI Option ROM-Based Attacks

Because some PCI devices require custom initialization, system firmware loads and executes any option ROM provided by devices during boot time. This code runs in firmware context while SPI flash is unlocked and can therefore patch the firmware ROM effortlessly. For instance, as Brossard (2012) points out, it is possible to load a bootkit over the built-in Wifi or WiMax devices of the system by flashing a malicious option ROM onto a network card. Thereby, firewalls or intrusion detection systems can be bypassed.

A vulnerable firmware version can also be directly exploited over the network: Triulzi (2010) outlines techniques for remotely reflashing the firmware of specific network cards. Even worse, because PCI devices have unrestricted access to physical memory, additional malicious code may be downloaded in order to further propagate into the local network.



Last but not least, a system may also be compromised by attaching a malicious device to a hardware port and initiating a subsequent reboot. For example, Loukas (2012) shows how an Apple computer may be infected with malware by connecting a small ethernet adapter to the Thunderbolt port. Because Thunderbolt hardware has direct access to the PCI bus and thus to physical memory, the machine is prone to attack, as explained above. Additionally, Hudson (2014) demonstrates that it is possible to infect the EFI from a malicious Thunderbolt option ROM.

The previously described attacks result in the introduction of one or more new PCI option ROMs into the system. Firmware maps this ROM somewhere into the physical address space and stores a pointer to its location in PCI configuration space. Similarly to the firmware ROM, option ROMs can also be read over the memory bus, and thus their code can also be included in a memory image. Furthermore, firmware copies option ROMs into the option ROM memory area (0xC0000 - 0xE0000) for execution (see Section 2.3.3). This area is actually RAM and should also be included in a memory image.
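To make the pointer in PCI configuration space more concrete, the following minimal Python sketch decodes the Expansion ROM Base Address register from a raw copy of a device's configuration space. It assumes a type 0 (endpoint) header, where the PCI specification places this register at offset 0x30 with bit 0 as the decode-enable flag. The function names are hypothetical and are not taken from the tools discussed in this thesis, and whether the enable bit is still set at acquisition time depends on the firmware.

```python
import struct

EXPANSION_ROM_BAR_OFFSET = 0x30   # type 0 (endpoint) configuration header
ROM_ENABLE_BIT = 0x1              # bit 0: address decoding enabled
ROM_ADDRESS_MASK = 0xFFFFF800     # bits 31:11 hold the base address


def expansion_rom_base(config_space):
    """Return the physical base address of a device's option ROM, or None."""
    if len(config_space) < EXPANSION_ROM_BAR_OFFSET + 4:
        raise ValueError("need at least the first 52 bytes of configuration space")
    header_type = config_space[0x0E] & 0x7F   # bit 7 only flags multi-function devices
    if header_type != 0:
        return None   # bridges keep this register elsewhere; not handled in this sketch
    (bar,) = struct.unpack_from("<I", config_space, EXPANSION_ROM_BAR_OFFSET)
    if not bar & ROM_ENABLE_BIT:
        return None   # ROM decoding is disabled, so nothing is mapped
    return bar & ROM_ADDRESS_MASK


def looks_like_option_rom(data):
    """PCI option ROM images start with the signature bytes 0x55 0xAA."""
    return data[:2] == b"\x55\xaa"
```

A caller would read the configuration space of every enumerated device, collect the returned base addresses, and check the first bytes read at each address with looks_like_option_rom before adding the range to the image.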

7.1.3 ACPI-Based Attacks

ACPI programs run in kernel space and therefore have full permission to operate on the physical address space. Even though sensitive data structures could theoretically be protected efficiently by filtering the respective instructions in the AML virtual machine, such restrictions have not yet been implemented in any major operating system to the best of our knowledge. Neither Linux up to kernel 3.15 nor Windows up to version 8 has security measures in place to prevent ACPI programs from subverting the system core. Because the ACPI tables are provided by the firmware, they are implicitly trusted. In the presence of a skilled adversary, this assumption may be potentially devastating.

The vulnerability we have just outlined can be exploited in several ways: First, it is possible to patch the ACPI tables directly in the firmware image. In addition, because the tables are copied to memory and must be identified by the operating system, a malicious bootkit has the chance of modifying them prior to this process. Alternatively, a manipulated version of the tables can be placed right in front of the firmware-provided copy. Since the location of tables is not strictly defined and must be retrieved by the operating system with the help of a signature-based scan (see Section 2.3.4), only the manipulated version is found, while the original and legitimate code is never executed. As a consequence, an ACPI rootkit may be embedded in the firmware ROM on the mainboard, in any PCI option ROM, on a connected PCI device, or even as part of an EFI driver module. Detection and removal of such a threat is cumbersome, and most of the described methods even survive a complete wipe of the hard disk.

A proof-of-concept implementation of an ACPI rootkit for the Linux kernel has already been published (see Heasman, 2006). The rootkit hooks all unused system calls by overwriting the sys_ni_syscall() function with the instructions call ebx; ret;. Because the ebx register is controlled by code running in user space, effectively all programs with an arbitrary privilege level are able to execute code in kernel space. The concept can be used, e.g., to illegitimately gain additional permissions or load additional kernel rootkits even if kernel module loading has been disabled. However, at the time of this writing, we are not aware of these insights being actively abused by malicious programs “in the wild”.

No matter what the original attack vector was, the ACPI tables have to be placed into RAM for the OS to find and execute them. If not already present, they should definitely be included in a memory image. If they are supplied by the firmware or a malicious option ROM, all firmware ROMs should be included in the memory image.

7.2 Enumeration of Firmware in the Physical Address Space

As we have shown in Section 7.1, there are many regions in the physical address space that contain firmware code and data. Figure 7.1 illustrates the layout of the address space on a machine we specifically set up for testing. Regions containing physical RAM are not highlighted and are marked “Memory”. These are already acquired with a standard physical memory dump (see Section 3.2). The blue regions contain firmware code or data that can be accessed through the memory bus. They have to be incorporated into the memory image if firmware analysis is to be performed. Regions marked in red represent memory-mapped I/O and must not be touched. Merely reading from these regions can cause an interrupt on the device, thus leading to data corruption and system crashes.

7.2.1 Enumeration of the Physical Address Space

As pointed out in Section 3.2, memory acquisition software commonly relies on the operating system to identify and map physical memory. More precisely, imaging programs duplicate solely those parts of the address space that are explicitly marked as RAM. On Microsoft Windows, the MmGetPhysicalMemoryRanges API can be used to query the memory manager for the physical memory layout. However, further but less common methods do exist: On systems with a BIOS, for instance, the firmware memory map may be queried in real mode by setting the eax register to 0xE820 and repeatedly invoking interrupt 0x15. This method is usually applied by the boot manager, and the retrieved information is passed to the operating system for further processing. During runtime, it is not advisable to manually switch to real mode from a driver, as this can cause system instabilities. Fortunately, since Windows Vista, the kernel’s HAL includes an undocumented BIOS emulation module that permits drivers to access BIOS services directly (Chappell, 2010).

Figure 7.1: Firmware Memory Ranges (physical memory ranges of the test machine from 0x00000000 to 0xFFFFFFFF; regions labeled Memory, EBDA, Video Window, PCI Option ROMs, Lower BIOS, Upper BIOS, ACPI Tables, and PCI MMIO)
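The firmware memory map returned by the 0xE820 query is a list of fixed-size range descriptors. The following minimal Python sketch decodes such a list after it has been captured. It assumes the common 20-byte entry layout (64-bit base, 64-bit length, 32-bit type) and the type codes defined by the ACPI specification; the name parse_e820 and the sample values used with it are illustrative assumptions, not output of the tools described here.

```python
import struct

# Type codes as defined for the firmware memory map in the ACPI specification.
E820_TYPES = {
    1: "usable RAM",
    2: "reserved",
    3: "ACPI reclaimable",
    4: "ACPI NVS",
    5: "bad memory",
}
ENTRY = struct.Struct("<QQI")   # base address, length, type (20 bytes, little endian)


def parse_e820(raw):
    """Decode a buffer of E820 entries into (start, end, type name) tuples."""
    if len(raw) % ENTRY.size:
        raise ValueError("buffer is not a whole number of 20-byte entries")
    ranges = []
    for base, length, kind in ENTRY.iter_unpack(raw):
        name = E820_TYPES.get(kind, "unknown (%d)" % kind)
        ranges.append((base, base + length, name))   # end is exclusive
    return ranges
```

Only entries reported as usable RAM would be copied by a conventional imager; the reserved and ACPI entries are the ones that matter for firmware analysis.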

Each memory enumeration method provides a unique view of the physical address space. None of them is entirely accurate though, because most devices (especially on the PCI bus) are not directly managed by the operating system but by a vendor-supplied driver. In Figure 7.2, we present a comparison of three major sources of information on the physical address space. Regions that contain firmware code or data are marked in blue, while regions that are reserved by devices and must not be read are marked in red. Unmarked regions are unknown: they might be backed by memory or mapped by devices. If the location of device MMIO regions is unknown, software must not access unmarked regions to ensure system stability.

The most incomplete view of the physical address space is returned when querying the MmGetPhysicalMemoryRanges API in the Windows memory manager, as seen on the right of the figure. As we have argued in the previous section, memory imaging programs only acquire those ranges that are identified as being “available” by the operating system. For safety reasons, other areas are ignored, including regions of memory that are used by the firmware. For this reason, memory images obtained through this method are not suited for firmware examinations. On the test system we analyzed, the resulting memory image contains only two ranges of physical memory. The remaining regions in the image are either zero-padded or not part of the image at all (e.g., when using the crash dump approach (Microsoft Corporation, 2011)).

Figure 7.2: Views on the Physical Address Space (three columns from left to right: Static + PCI, BIOS E820, and Memory Manager; each spans 0x00000000 to 0xFFFFFFFF)

As depicted in the center part of Figure 7.2, the BIOS provides a better view of the physical address space. In addition to the memory regions identified by the memory manager, the BIOS also keeps track of memory used by ACPI. Furthermore, there are 3,072 bytes of memory right at the end of the first memory region that the operating system does not know about (hidden memory, as illustrated in Section 4.3). (U)EFI offers a similar service to the BIOS memory map. However, because it is a boot service, it is no longer available once the boot manager has handed control to the operating system. The layout and classification of memory ranges is the same though.



The most exhaustive map of the physical address space can be constructed by combining knowledge from the architecture specifications with an enumeration of PCI configuration space. This view is illustrated on the left side of Figure 7.2: As discussed in Section 2.3, the physical address space layout in the first megabyte is well-defined. There are designated regions in the physical address space for PCI option ROM execution, the BIOS/UEFI, and the EBDA. Note that the firmware ROMs in these regions are no longer actually mapped ROMs. For performance reasons, firmware migrates into memory during initialization (see Sections 2.3 and 2.3.3). It is therefore safe to read from these addresses and perform memory acquisition just like with regions that are explicitly marked as RAM.

The memory layout above the first megabyte is not defined and depends on the amount of installed memory as well as on the number of installed devices. Because the latter map registers and memory into this part of the address space, simply iterating through the entire area would be dangerous, since the respective operations could trigger interrupts and result in undefined behaviour and the loss of data. Therefore, in order to avoid instabilities, software needs to consult the firmware or operating system about which areas are safe to read.

Because the ACPI tables lie somewhere outside of the memory regions reported by the operating system, it is prudent to acquire as much memory from the upper part as safely possible. Furthermore, it is trivial for malware to hook the kernel memory enumeration APIs and hide from the acquisition. Because the real danger of accessing memory outside the available regions comes from touching PCI device memory, it is best to simply exclude all MMIO regions and acquire all remaining sections. This can be accomplished by use of our PCI memory enumeration technique shown in Section 5.1.1.
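The exclusion step amounts to subtracting a set of intervals from the address space. The minimal Python sketch below shows this computation under the assumption that the MMIO ranges have already been collected from PCI configuration space; the function name subtract_ranges is hypothetical, and the ranges in the usage example are made-up values, not measurements from our test machines.

```python
def subtract_ranges(start, end, excluded):
    """Return the half-open ranges within [start, end) not covered by `excluded`.

    `excluded` is an iterable of (start, end) tuples that may overlap
    and may appear in any order.
    """
    safe = []
    cursor = start
    for ex_start, ex_end in sorted(excluded):
        if ex_end <= cursor:
            continue              # lies entirely below what we already handled
        if ex_start >= end:
            break                 # everything from here on is out of scope
        if ex_start > cursor:
            safe.append((cursor, ex_start))
        cursor = max(cursor, ex_end)
    if cursor < end:
        safe.append((cursor, end))
    return safe
```

For a 32-bit address space with two hypothetical MMIO windows at 0xE0000000-0xE8000000 and 0xF0000000-0xF0020000, subtract_ranges(0, 0x100000000, windows) yields three ranges that could be read without touching device memory.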

To sum up, the non-red regions on the left of Figure 7.2 do not necessarily contain RAM. Reading from parts of the physical address space that are not mapped simply returns zeroes¹. The resulting image is significantly larger than an image that solely comprises ranges marked as “available”, but it includes the entire firmware code and data.

7.2.2 Mapping of Memory and Firmware Regions

Some of the firmware regions in the physical address space we have identified are actually RAM. The ACPI tables, the EBDA, and the PCI option ROM area in the first MB are stored in memory and can thus be accessed using conventional methods like kmap. Others, like PCI option ROMs, are memory-mapped I/O, which can cause problems with standard kernel memory mapping functions due to caching constraints. While it is possible to use ioremap_nocache on Linux or MmMapIoSpace on Windows to access them, we prefer to bypass the operating system for accessing device memory. If an area of memory has already been mapped by a driver or even the kernel itself, care has to be taken to conform to caching attributes to avoid memory corruption. The Windows kernel will actually prevent any attempts to map memory that has already been mapped with different caching attributes, making use of standard operating system memory mapping facilities unreliable (Vidstrom, 2006).

¹ It is possible that some systems return another pattern or even data that is still on the bus from a previous read. However, we have not witnessed such behavior during our tests.

We can use the PTE remapping technique described in Section 5.1.2 to map firmware memory. In fact, our implementation in Chapter 5 is already capable of acquiring firmware this way. Because our method uses a separate mapping and is guaranteed to only read from this mapping, we can avoid running into problems with cache coherence and alignment requirements. The operating system cannot interfere with this because we bypass the memory management APIs and create the mapping manually. The resulting memory image now contains all memory, firmware code, and data, and can be analyzed using standard tools like Rekall (Cohen, 2014b) or Volatility (Walters, 2014).
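As a rough illustration of what creating a mapping manually involves, the following Python sketch composes the value of a 4 KiB page table entry for a given physical address. The bit positions are those defined by the x86-64 architecture; the choice of flags (read-only, optionally uncached via PCD and PWT under the default PAT configuration) and the function names are assumptions for illustration and do not reproduce the driver code from Chapter 5.

```python
PAGE_SIZE = 4096
PTE_PRESENT = 1 << 0                     # entry is valid
PTE_WRITABLE = 1 << 1                    # left clear here: acquisition only reads
PTE_PWT = 1 << 3                         # page-level write-through
PTE_PCD = 1 << 4                         # page-level cache disable
PHYS_ADDR_MASK = 0x000FFFFFFFFFF000      # bits 12..51 hold the page frame address


def make_pte(phys_addr, uncached=False):
    """Build a read-only PTE value that points at the given physical page."""
    if phys_addr % PAGE_SIZE:
        raise ValueError("physical address must be page aligned")
    if phys_addr & ~PHYS_ADDR_MASK:
        raise ValueError("physical address exceeds 52 bits")
    pte = phys_addr | PTE_PRESENT
    if uncached:
        pte |= PTE_PCD | PTE_PWT         # suitable for device-backed ranges
    return pte


def pte_frame(pte):
    """Extract the physical page address from a PTE value."""
    return pte & PHYS_ADDR_MASK
```

In a driver, the resulting value would be written into the page table entry that backs a scratch page, followed by a TLB flush for that virtual address, before the page is read.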

7.3 Firmware Analysis

Firmware implementations are platform dependent, and executable formats and code compression schemes vary from vendor to vendor. It is out of the scope of this thesis to present generic firmware code analysis and verification solutions. However, since the memory locations of firmware code are clearly defined, it is trivial to disassemble it with the Rekall dis plugin or extract it to the filesystem with the dump plugin for analysis with specialized software like IDA Pro (Hex-Rays, 2005).

ACPI code, on the other hand, allows for more automation on the analysis side. We have created two plug-ins for the Volatility (Walters, 2014) and Rekall (Cohen, 2014b) frameworks, one for dumping the ACPI tables from a memory image, and another one for scanning the respective tables for potential rootkits.

To acquire the ACPI tables from memory, we have mirrored the process used by the OS to find them (see Section 2.3.4). First, a signature-based scan for the RSDT is performed. When the RSDT is found, we follow the pointers inside to locate the other ACPI tables. Our plugin then writes the tables out to the filesystem for analysis.
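The lookup can be sketched in a few lines of Python. This sketch follows the path defined by the ACPI specification, in which a 16-byte aligned Root System Description Pointer (signature "RSD PTR ", checksummed over its first 20 bytes) holds the address of the RSDT; this is an assumption about the general mechanism and may differ in detail from the plugin's actual implementation. It further assumes a flat image in which file offsets equal physical addresses, and the function names are hypothetical.

```python
import struct

RSDP_SIGNATURE = b"RSD PTR "   # defined by the ACPI specification
RSDP_V1_LENGTH = 20            # signature, checksum, OEM ID, revision, RSDT address
SDT_HEADER_LENGTH = 36         # common header of every system description table


def find_rsdp(image, start=0, end=None):
    """Scan 16-byte aligned offsets for a correctly checksummed RSDP.

    Returns (rsdp_offset, rsdt_address), or None if nothing is found.
    """
    end = len(image) if end is None else min(end, len(image))
    first = (start + 15) & ~15
    for offset in range(first, end - RSDP_V1_LENGTH + 1, 16):
        candidate = image[offset:offset + RSDP_V1_LENGTH]
        if not candidate.startswith(RSDP_SIGNATURE):
            continue
        if sum(candidate) & 0xFF != 0:
            continue   # signature matched by chance, checksum is wrong
        (rsdt_address,) = struct.unpack_from("<I", candidate, 16)
        return offset, rsdt_address
    return None


def rsdt_entries(image, rsdt_offset):
    """Return the 32-bit table pointers stored after the RSDT header."""
    signature, length = struct.unpack_from("<4sI", image, rsdt_offset)
    if signature != b"RSDT" or length < SDT_HEADER_LENGTH:
        raise ValueError("no RSDT at this offset")
    count = (length - SDT_HEADER_LENGTH) // 4
    return list(struct.unpack_from("<%dI" % count, image, rsdt_offset + SDT_HEADER_LENGTH))
```

Each returned pointer leads to another table with the same 36-byte header, whose length field tells the dumper how many bytes to write out. The sketch does not verify the RSDT checksum, which a real tool should do.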

For analysis-related tasks, we first decompile the tables and, in a second step, examine them for signs of malicious behavior. The central technique for manipulating kernel memory from an ACPI program is the definition of so-called operation regions. They determine which part of the address space will be modified. Our method for detection of malicious behavior is thus to identify all operation regions that reference kernel memory. An investigator can then use this information to focus on exactly those sections of the ACPI program during investigation.

The plugin utilizes the official AML decompiler (Intel Corporation, 2014a) to transform the AML code into ACPI Source Language (ASL). The resulting ASL code is subsequently scanned, and all operation regions referencing critical memory, i.e., parts of physical memory that contain kernel code and data, are flagged as suspicious.
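A simplified version of such a scan is sketched below in Python. It looks for OperationRegion declarations in decompiled ASL, which take the form OperationRegion (Name, Space, Offset, Length), and compares statically known SystemMemory ranges against a list of kernel memory ranges. The function names, the classification labels, and the kernel_ranges parameter are assumptions made for this illustration; the plug-in itself may work differently.

```python
import re

ASL_CONSTANTS = {"Zero": 0, "One": 1}


def split_top_level(args):
    """Split an argument string on commas that are not nested in parentheses."""
    parts, depth, current = [], 0, []
    for ch in args:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_constant(token):
    """Return an integer for literal operands, or None for dynamic ones."""
    if token in ASL_CONSTANTS:
        return ASL_CONSTANTS[token]
    try:
        return int(token, 0)
    except ValueError:
        return None   # variable, method call, or arithmetic expression


def classify_regions(asl, kernel_ranges):
    """Label every operation region as suspicious, benign, or unknown."""
    results = []
    for line in asl.splitlines():
        match = re.search(r"OperationRegion\s*\((.*)\)", line)
        if not match:
            continue
        args = split_top_level(match.group(1))
        if len(args) != 4:
            continue
        name, space, offset_text, length_text = args
        if space != "SystemMemory":
            results.append((name, "benign"))     # does not address physical memory
            continue
        offset, length = parse_constant(offset_text), parse_constant(length_text)
        if offset is None or length is None:
            results.append((name, "unknown"))    # target only known at run time
            continue
        overlaps = any(offset < hi and offset + length > lo for lo, hi in kernel_ranges)
        results.append((name, "suspicious" if overlaps else "benign"))
    return results
```

Regions labeled unknown correspond to declarations whose operands are computed at run time and therefore cannot be resolved by static inspection.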

7.4 Evaluation

We have evaluated the created tools for stability, correctness, and, in the case of the ACPI triage plug-in, for rate of detection and number of false positives and negatives. We set up several physical as well as virtual machines and created duplicates of their physical address space. We used the following machines:

• A Lenovo x220 notebook with an Intel Sandy Bridge CPU and 8 GB of DDR3 RAM running Ubuntu 12.04 x64

• A Dell workstation with an Intel Ivy Bridge CPU and 8 GB of DDR3 RAM running Windows 8.1 x64

• A virtual machine based on VirtualBox with 4 GB of RAM running Debian 7 x64 with Kernel 3.2.41

• A virtual machine based on VirtualBox with 2 GB of RAM running Windows 7 SP1 x64

7.4.1 Stability and Correctness of the Acquisition Method

All acquisition operations were successfully completed every time. We could identify the firmware regions in every image with corresponding data. We were not able to verify the firmware though, because we did not have access to EEPROM reprogramming hardware and thus did not have access to the original contents of the firmware ROM. Additionally, because most firmware implementations are compressed to save space, proper verification would require reverse engineering of the firmware compression algorithm and analysis of the decompressed ROM image. To establish correct firmware acquisition without access to the ROM nonetheless, we leveraged features of virtualization software. Specifically, qemu-kvm (Linux Kernel Organization, 2014) permits loading custom BIOS images through the -bios command line option. With the help of the -option-rom parameter, it is possible to load a custom Option ROM as well.

We started a qemu-kvm-based virtual machine with a version of SeaBIOS (O’Connor, 2014) and an iPXE Option ROM (Brown, 2014). By acquiring memory from inside the virtual machine, we obtained an image with known BIOS and PCI Option ROM code. We were able to find fragments of the iPXE and SeaBIOS images in the created memory images at their expected locations. In addition, we could attribute parts of the dumped firmware to the supplied ROM images. Other parts were heavily modified though and are likely to have been space-optimized in memory. Further experiments are needed in the future to confirm these assumptions.
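One simple way to look for fragments of a known ROM image in a memory dump is to search for fixed-size chunks of the ROM. The Python sketch below illustrates the idea; it is not the procedure we used, and the function name and the 64-byte chunk size are arbitrary assumptions.

```python
def matched_chunks(rom, dump, chunk_size=64):
    """Return (rom_offset, dump_offset) pairs for ROM chunks found in the dump.

    Chunks consisting of a single repeated byte (typical padding) are skipped
    because they would match almost anywhere.
    """
    matches = []
    for rom_offset in range(0, len(rom) - chunk_size + 1, chunk_size):
        chunk = rom[rom_offset:rom_offset + chunk_size]
        if len(set(chunk)) <= 1:
            continue
        dump_offset = dump.find(chunk)   # first occurrence only
        if dump_offset != -1:
            matches.append((rom_offset, dump_offset))
    return matches
```

Comparing the reported dump offsets with the expected load addresses, such as the option ROM area or the BIOS region in the first megabyte, shows whether a fragment was found where the firmware should have placed it.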

Acquisition Tool          Firmware Acquired
Memoryze                  ✗
FTK Imager                ✗
Moonsols DumpIt           ✗
WinPmem                   ✗
WinPmem (pci)             ✓
WindowsMemoryReader       ✗
LiMe                      ✗
Pmem                      ✗
Pmem (pci)                ✓

Table 7.1: Firmware Acquisition Capabilities of Memory Forensic Software

7.4.2 Comparison with Available Memory Acquisition Solutions

We have evaluated a large group of freely available memory acquisition solutions to see if they are capable of correctly obtaining firmware code and data. The results of our evaluation are depicted in Table 7.1. An entry labeled with the extension pci means that the respective version of the program supports PCI address space enumeration (see Section 7.2.1). As can be seen, only those two versions were able to acquire all firmware code and data. All other tools simply imaged the “available” ranges supplied by the Windows Memory Manager or, on Linux systems, by the iomem_resource tree (see Section 3.2.2), so their images do not contain any firmware-related code or data.

7.4.3 Detection of ACPI Rootkits

We created a simple ACPI rootkit that is capable of modifying the Linux kernel and setting up a hidden backdoor, analogously to the proof-of-concept application by Heasman (2006) as described in Section 7.1.3. The rootkit was installed on five virtual machines running Fedora 19, Ubuntu 12.04, Debian 7, OpenSuse 12.3, and Windows XP as well as two physical Intel Sandy Bridge systems running Ubuntu 12.04. Each system was analyzed with the help of the scanner plug-in we developed for the Volatility framework (see Section 7.3). Further tests were conducted with non-infected ACPI tables of original manufacturers as well as manually manipulated tables that covered a wide range of malicious accesses to kernel memory. The objective of our experiments was to examine ACPI-related data structures and automatically distinguish potentially infected components from legitimate program parts. In total, 299 operation regions were evaluated. The corresponding results are shown in Table 7.2.

             Correctly Classified   Falsely Classified   ∑
Malicious    13.0%                  16.4%                29.4%
Benign       61.9%                                       61.9%
Unknown      8.7%                                        8.7%
∑            83.6%                  16.4%                100%

Table 7.2: Classification of Operation Regions in the ACPI Test Data Set

As can be seen, the scanner flagged 29.4% of all operation regions as malicious. In reality, however, only 13.0% of all regions represented true rootkit activity. The remaining 16.4% were erroneously reported due to legitimate memory accesses in the AML virtual machine. In contrast, 61.9% of the operation regions were correctly recognized as benign and do not reference any kernel memory. Last but not least, 8.7% of the regions could not be evaluated because their respective arguments were dynamic. If the parameters of a region depend on a variable or the result of a function call, it is impossible to determine the target of the operation with static code analysis. Evaluating those would require the state of the runtime environment at the time they are executed. These regions can thus not be classified and have to be manually analyzed.

Our results can be summarized as follows: On the one hand, due to our plug-in, 61.9% of all operation regions do not need to be examined in detail and may safely be ignored in the course of an investigation. As such, forensic practitioners benefit from considerable time savings and are able to focus on the relevant sections of an ACPI program. On the other hand, with 16.4%, the number of false positives is still rather high. As we have already indicated, these misclassifications stem from the fact that we were unable to distinguish accesses to regions that belong to legitimate ACPI memory from those that access actual kernel data structures. To decrease the false positive rate, an in-depth analysis of the ACPI environment of the kernel would be necessary. For this task, further research must be conducted in the future.

7.5 Discussion

Even though our approach is capable of reliably acquiring all firmware code and data and may be easily integrated with existing memory forensic procedures, practitioners have to be aware of technological limitations. We briefly discuss these in the following sections.


7.5.1 Technological Limitations

Some firmware rootkits cannot be detected with software-based memory forensic methods. Any rootkit that completely isolates itself from the CPU-accessible memory falls into this category. SMM rootkits, for instance, patch the BIOS to inject code into System Management Mode. This code is run when an SMI is triggered. System Management Mode comprises its own address space, i.e., SMRAM, and is strictly separated from accesses by kernel or user space applications. This restriction is enforced by the memory controller and cannot be bypassed when the respective configuration registers have been set up correctly. By similar reasoning, malicious programs running on the ME (Stewin and Bystrov, 2012) cannot be discovered. The only way of obtaining a copy of the respective memory regions would be to perform a RAM transplantation attack (Halderman et al., 2008). For this purpose, physical access to the machine and a system reboot would be required. On systems with DDR3 RAM there is currently no way to do this due to data scrambling (see Section 1.2).

7.5.2 Anti-Forensics

It is also possible for firmware to hide or even wipe malicious code and data from RAM before the acquisition process commences. If the only malicious component that is still in memory at runtime resides in SMRAM, it is protected by the memory controller and will not appear in the memory image. Any bootstrapping code in the firmware can be wiped from memory after performing its designated task. In this situation, the only way of acquiring the malicious code is by either using a flash programmer to physically read the ROM chip or running a tool like Copernicus (Butterworth et al., 2013) if Intel TXT is available.

7.6 Summary

In this chapter, we have discussed possibilities for rootkits and other sophisticated malicious applications to compromise x86 systems at the firmware level. Although still rarely seen “in the wild”, these types of attacks are highly dangerous and may be particularly devastating because the base of the machine is subverted at a very early point in time, and corresponding traces are easily overlooked during typical system investigation routines. As we have seen, common memory forensic solutions available on the market to date fail to properly acquire the respective regions of the physical address space and are therefore ill-prepared for such an incident. We have adapted the techniques developed in Chapter 5 to enable investigators to acquire firmware code and data in the course of a memory forensic investigation. We have also created two plug-ins for the Volatility and Rekall forensic frameworks to facilitate inspection of the ACPI environment and discover traces of malevolent behavior more quickly.

Chapter 8

Conclusion

Memory forensics has become a powerful tool in the arsenal of incident responders, forensic investigators, and malware analysts. It can provide an unfiltered view of the internals of operating systems and programs, uncovering artifacts hidden by malicious software, such as processes, threads, and network connections. As memory is volatile and cannot be accessed by user-mode programs directly, its contents must be made available for analysis by acquiring them into a memory image. Because of the physical access requirement and practicality issues elaborated in Section 1.2, memory acquisition is mostly performed by software. This process is vulnerable to anti-forensics by malicious software, a problem we address in this thesis.

8.1 Summary

In Chapter 3 we have given an overview of the current state of the art of software memory acquisition, which we define as the process of creating a copy of physical memory called a memory image. We have outlined the criteria that we use to classify the quality of memory images, and pointed out the importance of correctness in regard to obtaining a “true” and complete copy of the system’s physical memory. We have analyzed the two main challenges software must solve to acquire physical memory: memory enumeration and memory mapping. Memory enumeration refers to the task of locating RAM in the physical address space. Because the physical address space is not contiguous and contains MMIO regions interleaved with regions backed by RAM, software must determine the location of all RAM regions to avoid accessing device memory, which can cause system instability. Memory mapping refers to the creation of a mapping in the virtual memory of the acquisition process. Because of memory protection, software cannot access physical memory directly, but has to create an entry in the page tables to get access to a specific physical page. In our analysis of 12 forensic memory acquisition programs for Windows, Linux, and OS X, we found that all of them rely on the operating system to enumerate and map physical memory. We have given an overview of the operating system APIs used by the software, and implemented a memory acquisition framework for OS X called OSXPmem.
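The enumeration problem described above can be illustrated with a small sketch. It assumes that a list of device (MMIO) ranges is already known and computes the remaining ranges of the physical address space that an acquisition tool could read without touching device memory. The function name and the example addresses are hypothetical and only serve as an illustration; they are not taken from any of the analyzed tools.

```python
def safe_ranges(space_end, mmio_regions):
    """Return the [start, end) ranges of [0, space_end) not covered by MMIO.

    mmio_regions is an iterable of (start, end) tuples with exclusive end
    addresses; the regions may overlap and may be given in any order.
    """
    ranges = []
    cursor = 0  # lowest address not yet classified
    for start, end in sorted(mmio_regions):
        if start > cursor:
            # Gap between the previous MMIO region and this one.
            ranges.append((cursor, min(start, space_end)))
        cursor = max(cursor, end)
        if cursor >= space_end:
            break
    if cursor < space_end:
        # Everything above the last MMIO region.
        ranges.append((cursor, space_end))
    return ranges


# Hypothetical layout: two device windows inside a 64 KiB address space.
layout = safe_ranges(0x10000, [(0xA000, 0xC000), (0x4000, 0x5000)])
```

In this example, `layout` contains the three gaps surrounding the two device windows. Whether a gap is actually backed by RAM is a separate question that this sketch does not answer.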

The reliance of memory acquisition software on the operating system for its most critical tasks makes it prone to subversion by anti-forensic software. In Chapter 4, we have given an overview of anti-forensic techniques against memory enumeration and memory mapping, as well as passive techniques that utilize unknown regions of physical memory we call hidden memory. To demonstrate the severity of the problem, we have implemented a selection of anti-forensic techniques for Windows, Linux, and OS X. Using these proof-of-concept implementations, we have performed an evaluation of the 12 memory acquisition tools introduced in the previous chapter. We found that none of the analyzed programs were able to acquire a memory image with our anti-forensic techniques in place. The techniques we have demonstrated are generic and can be extended by an attacker to selectively hide information from memory acquisition tools. The simplicity of these methods emphasizes the need for software memory acquisition techniques that are resilient against anti-forensic attacks.

To counter the attacks presented in Chapter 4, we have developed a software memory acquisition technique that does not rely on the operating system for memory enumeration and mapping, which we introduce in Chapter 5. Instead of enumerating available physical memory regions, we query the hardware directly to identify all MMIO regions that are mapped into the physical address space. This enables our software to safely access the entire physical address space while avoiding reads from device memory, which can destabilize the system. We map memory by allocating a page of memory we call the rogue page. By walking the page tables, we locate the page table entry used by the MMU to map the physical frame for the rogue page. We then directly modify the frame number in this entry to point to the target page. After flushing the rogue page from the TLB, this causes the MMU to direct all further memory accesses for the rogue page to the target page in physical memory. Our evaluation showed that we can reliably acquire all physical memory with this technique, even on systems that have been subverted by our anti-forensic tools. Finally, we have discussed possible anti-forensic techniques that could still work against our approach. We have identified debug register rootkits and shadow paging as the only conceivable attacks on the same privilege level that could work, and presented ideas on how to further improve our technique to be resilient against these two methods.
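The frame number modification at the heart of the rogue page technique can be sketched as plain bit arithmetic. The sketch below assumes the x86-64 layout of a 4 KiB page table entry, in which bits 12 to 51 hold the physical frame address and the remaining bits hold flags. The function name and example values are hypothetical. A real driver performs this write on the live page table entry in kernel mode and then flushes the rogue page from the TLB, which a user-space sketch cannot show.

```python
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
# Bits 12..51 of an x86-64 page table entry hold the physical frame address.
FRAME_MASK = ((1 << 52) - 1) & ~OFFSET_MASK


def retarget_pte(pte, target_phys):
    """Return a copy of pte whose frame bits point at target_phys.

    All flag bits (present, writable, NX, ...) are preserved, so the
    virtual address of the rogue page keeps its access rights.
    """
    if target_phys & OFFSET_MASK:
        raise ValueError("target address must be page aligned")
    if target_phys & ~FRAME_MASK:
        raise ValueError("target address exceeds 52 physical address bits")
    return (pte & ~FRAME_MASK) | target_phys


# Hypothetical entry: NX set, frame 0x12345000, low flag bits 0x067.
old_pte = 0x8000000012345067
new_pte = retarget_pte(old_pte, 0xABCDE000)
# In a driver, the next step would be a TLB flush for the rogue page
# (for example via the invlpg instruction) before reading from it.
```

Only the frame bits change; the flags of the original entry are carried over unchanged.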

One of the key benefits of our memory enumeration and mapping method introduced in Chapter 5 is that it is operating system independent. This makes it ideal for solving a problem in Linux memory acquisition: the requirement of having to compile an acquisition kernel module on a system with the exact same configuration as the target, or even worse, on the target itself. This is mandatory to maintain system stability, because the layout of data structures changes with different versions and configurations of the kernel. In Chapter 6, we have illustrated the creation of a minimal memory acquisition module for Linux that is independent of the kernel version and configuration used. We have adapted methods normally used in rootkits to inject this module into a compatible host module on the target, and then instrument the host module’s data structures to redirect control flow and communicate with kernel APIs. This approach has allowed us to create a memory acquisition program that can be distributed as a statically linked binary. It is able to relink a dynamically selected host module on the target system on the fly, converting it into a memory acquisition module that is fully compatible with the running kernel.

A second novel property of our approach is that, due to MMIO enumeration, it can safely acquire more than just physical memory. In Chapter 7, we have given an overview of memory used by the system firmware and shown that current publicly available memory acquisition software is incapable of acquiring firmware code and data. With the use of the memory enumeration and mapping techniques developed in Chapter 5, we were able to acquire all firmware memory regions that are not protected by SMM, including the BIOS/UEFI ROM, PCI option ROMs, and the ACPI tables. To aid investigators in their analysis of malicious firmware, we have developed plugins for memory analysis frameworks that help identify ACPI code accessing operating system memory regions.

8.2 Future Work

While the techniques we have developed in this thesis have furthered the anti-forensic resilience of software memory acquisition, there is still potential for improvement. To become immune to debug-register-based PCI device simulation, our memory enumeration procedure can be improved to use the PCIe-based ECAM mechanism to access PCI configuration space. It is also possible to obtain memory geometry directly from the memory controller, making PCI enumeration unnecessary. The Intel iMC, for example, makes its memory configuration registers available through MMIO (Intel Corporation, 2013). Because these registers are directly used for memory routing, they are locked once the physical address space is configured. This makes them a reliable source of information regarding the address space layout, because it is impossible for malicious software to tamper with them. It also reduces the danger of accessing regions of memory mapped to devices not on the PCI bus, making the approach more stable.
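To illustrate the ECAM mechanism mentioned above: with ECAM, the configuration space of every PCI function is memory mapped at a fixed offset from a platform-specific base address, so no I/O port access is needed. The following sketch computes that address using the layout defined in the PCI Express Base Specification (4 KiB of configuration space per function). The base address used in the example is a made-up value; on a real system it has to be discovered from the platform, and the function name is hypothetical.

```python
def ecam_address(ecam_base, bus, device, function, offset=0):
    """Return the physical address of a PCI configuration register via ECAM.

    Layout per the PCI Express specification: bus in bits 20..27,
    device in bits 15..19, function in bits 12..14, register offset
    in bits 0..11.
    """
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus must be in 0..255")
    if not 0 <= device <= 0x1F:
        raise ValueError("device must be in 0..31")
    if not 0 <= function <= 0x7:
        raise ValueError("function must be in 0..7")
    if not 0 <= offset <= 0xFFF:
        raise ValueError("offset must be in 0..4095")
    return ecam_base + ((bus << 20) | (device << 15) | (function << 12) | offset)


# Hypothetical ECAM base address; register 0x10 is the first base
# address register (BAR0) in a type 0 configuration header.
ASSUMED_ECAM_BASE = 0xE0000000
bar0_addr = ecam_address(ASSUMED_ECAM_BASE, bus=0, device=31, function=3, offset=0x10)
```

Reading the value at such an address still requires mapping the physical page, for example with the rogue page technique summarized earlier.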

Furthermore, instead of using a rogue page to map memory, we suggest utilizing a private page table hierarchy. While this approach is much more complicated, it is resilient against shadow paging and solves the problem of requiring a physical-to-virtual translation function for operating systems with kernel Address Space Layout Randomization (ASLR). This can also improve the acquisition speed of our approach, because the private page tables can map the entire physical address space, which allows us to use larger buffers and reduces the number of necessary TLB flushes.

In addition to the correctness of images, their atomicity and integrity are also problematic when acquired by software. Recent research has shown that low levels of atomicity in an image make it difficult to integrate the page file into the memory analysis process (Richard and Case, 2014). By hooking the page fault handler and marking all memory as non-writable, software could implement a lazy-dumping approach similar to the one used in virtualization-based memory acquisition software (Martignoni et al., 2010).
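The lazy-dumping idea can be illustrated with a small user-space simulation. This is only a sketch of the principle under simplifying assumptions: pages are modeled as byte arrays, and the hypothetical `write` method stands in for the hooked page fault handler that would intercept the first write to a write-protected page. It is not an implementation of the cited system.

```python
class LazyDump:
    """Simulate copy-before-write acquisition of a set of memory pages."""

    def __init__(self, memory):
        self.memory = memory  # list of bytearray pages (the live system)
        self.image = {}       # page index -> bytes captured for the image

    def write(self, page, offset, data):
        """Stand-in for the page fault handler on a write-protected page."""
        if page not in self.image:
            # First write since acquisition started: save the old content.
            self.image[page] = bytes(self.memory[page])
        self.memory[page][offset:offset + len(data)] = data

    def dump_remaining(self):
        """Copy all pages that were never written to and return the image."""
        for index, content in enumerate(self.memory):
            if index not in self.image:
                self.image[index] = bytes(content)
        return [self.image[i] for i in range(len(self.memory))]


live = [bytearray(b"AAAA"), bytearray(b"BBBB")]
dumper = LazyDump(live)
dumper.write(1, 0, b"XX")        # the system keeps running and modifies page 1
image = dumper.dump_remaining()  # image reflects the state at acquisition start
```

Although the live page is modified during acquisition, the image holds the contents as they were when acquisition started, which is the atomicity property the approach aims for.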

Finally, with the injection of a memory acquisition module into arbitrary Linux kernels we have solved the kernel version problem for the acquisition side, but not for the analysis side. For sophisticated analysis of the acquired memory dump we need to gather information on symbols and data structures. The Rekall (Cohen, 2014b) and Volatility (Walters, 2014) projects, for example, refer to this as a profile. This profile is usually built by compiling a kernel module with debugging information for the exact kernel version on the target, which is then parsed to extract data structure layout and symbol information (Hale, 2013). When the kernel version and configuration are not known or available, this is not possible. However, information on the kernel’s data structure layout is contained in the relocation tables of modules and the kernel binary itself. Future work can utilize this information to build a partial profile for the target system. By extracting and parsing this information we can get an understanding of the layout of parts of certain data structures. With this knowledge we can build a partial profile without having access to kernel headers and configuration files.

Bibliography

AccessData (2012). FTK Imager. http://www.accessdata.com/, 2012.

Accetta, Mike; Baron, Robert; Bolosky, William; Golub, David; Rashid, Richard; Tevanian, Avadis; Young, Michael (1986). Mach: A New Kernel Foundation for UNIX Development. In Proceedings of the USENIX Summer Conference (pp. 93–112), 1986.

ACPI Promoters Corporation (2013). Advanced Configuration and Power Interface Specification – Revision 5.0 Errata A. http://acpi.info/DOWNLOADS/ACPI_5_Errata%20A.pdf, 2013.

Advanced Micro Devices (2011). AMD64 Architecture Programmer’s Manual. http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/, 2011.

Afek, Yehuda; Attiya, Hagit; Dolev, Danny; Gafni, Eli; Merritt, Michael; Shavit, Nir (1993). Atomic Snapshots of Shared Memory. Journal of the ACM, Volume 40(4), pp. 873–890, 1993.

Allievi, Andrea (2014). Understanding and Defeating Windows 8.1 Kernel Patch Protection. http://www.nosuchcon.org/talks/2014/D2_01_Andrea_Allievi_Win8.1_Patch_protections.pdf, 2014.

Anderson, David (2008). Red Hat Crash Utility. http://people.redhat.com/anderson/crash_whitepaper, 2008.

Apple Inc. (2009). IOKit Device Driver Design Guidelines. https://developer.apple.com/library/mac/documentation/DeviceDrivers/Conceptual/WritingDeviceDriver/, 2009.

Apple Inc. (2013a). IOMemoryDescriptor Class Reference. https://developer.apple.com/library/mac/documentation/Kernel/Reference/IOMemoryDescriptor_reference/, 2013.

Apple Inc. (2013b). Kernel Programming Guide. https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/KernelProgramming, 2013.

ATC-NY (2012a). MacMemoryReader. http://cybermarshal.com/index.php/cyber-marshal-utilities/mac-memory-reader, 2012.

ATC-NY (2012b). WindowsMemoryReader. http://cybermarshal.com/index.php/cyber-marshal-utilities/windows-memory-reader, 2012.

BBN Technologies (2006). FRED: Forensic RAM Extraction Device. http://www.ir.bbn.com/~vkawadia/, 2006.

Becher, Michael; Dornseif, Maximillian; Klein, Christian N. (2005). FireWire – All Your Memory Are Belong To Us. In Proceedings of the Annual CanSecWest Applied Security Conference, 2005.

Bilby, Darren (2006). Low Down and Dirty: Anti-Forensic Rootkits. In Proceedings of Black Hat Japan, 2006.

Boileau, Adam (2006). Hit by a Bus: Physical Access Attacks with Firewire. In Proceedings of Ruxcon, 2006.

Brossard, Jonathan (2012). Hardware Backdooring is Practical. https://media.blackhat.com/bh-us-12/Briefings/Brossard/BH_US_12_Brossard_Backdoor_Hacking_Slides.pdf, 2012.

Brown, Michael (2014). iPXE. http://ipxe.org/, 2014.

Bulygin, Yuriy (2013). Evil Maid Just Got Angrier – Why Full-Disk Encryption With TPM is Insecure on Many Systems. https://cansecwest.com/slides/2013/Evil%20Maid%20Just%20Got%20Angrier.pdf, 2013.

Bulygin, Yuriy; Bazhaniuk, Oleksandr; Furtak, Andrew; Loucaides, John (2014). Summary of Attacks Against BIOS and Secure Boot. http://www.c7zero.info/stuff/DEFCON22-BIOSAttacks.pdf, 2014.

Bulygin, Yuriy; Furtak, Andrew; Bazhaniuk, Oleksandr (2013). A Tale of One Software Bypass of Windows 8 Secure Boot. https://media.blackhat.com/us-13/us-13-Bulygin-A-Tale-of-One-Software-Bypass-of-Windows-8-Secure-Boot-Slides.pdf, 2013.

Butler, Jamie (2004). DKOM (Direct Kernel Object Manipulation). Black Hat Windows Security, 2004.

Butterworth, John; Kallenberg, Corey; Kovah, Xeno; Herzog, Amy (2013). Bios Chronomancy: Fixing the Core Root of Trust for Measurement. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security (pp. 25–36). ACM, 2013.

Carbone, Martim; Cui, Weidong; Lu, Long; Lee, Wenke; Peinado, Marcus; Jiang, Xuxian (2009). Mapping kernel objects to enable systematic integrity checking. In Proceedings of the 16th ACM conference on Computer and communications security (pp. 555–565). ACM, 2009.

Carrier, B.D.; Grand, J. (2004). A hardware-based memory acquisition procedure for digital investigations. Digital Investigation, Volume 1(1), pp. 50–60, 2004.

Case, Andrew; Marziale, Lodovico; Richard, Golden G (2010). Dynamic recreation of kernel data structures for live forensics. Digital Investigation, 7, pp. S32–S40, 2010.

Chappell, Geoff (2010). The x86 BIOS Emulator. http://www.geoffchappell.com/studies/windows/km/hal/api/x86bios/call.htm, 2010.

Chappell, Geoff (2011). Viewing the Firmware Memory Map. http://www.geoffchappell.com/studies/windows/km/hal/api/x86bios/fwmemmap.htm?tx=7, 2011.

Chow, Jim; Pfaff, Ben; Garfinkel, Tal; Rosenblum, Mendel (2005). Shredding Your Garbage: Reducing Data Lifetime through Secure Deallocation. In Proceedings of the 14th Conference on USENIX Security Symposium, 2005.

Cohen, Michael (2011). PMEM - physical memory driver. http://code.google.com/p/volatility/source/browse/branches/scudette/tools/linux, 2011.

Cohen, Michael (2012). The PMEM Memory acquisition suite. http://code.google.com/p/volatility/source/browse/branches/scudette/tools/windows/winpmem, 2012.

Cohen, Michael (2014a). How to stop memory acquisition by changing one byte. http://rekall-forensic.blogspot.de/2014/03/how-to-stop-memory-acquisition-by.html, 2014.

Cohen, Michael (2014b). Rekall Memory Forensic Framework. http://www.rekall-forensic.com, 2014.

Cohen, M.; Bilby, D.; Caronni, G. (2011). Distributed Forensics and Incident Response in the Enterprise. Digital Investigation, 8, pp. S101–S110, 2011.

Corbet, Jonathan; Rubini, Alessandro; Kroah-Hartman, Greg (2005). Linux Device Drivers. O’Reilly, third edition, 2005.

Drepper, Ulrich (2007). What every programmer should know about memory. Red Hat, Inc, 11, 2007.

Duarte, Gustavo (2009). How the Kernel Manages Your Memory. http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/, 2009.

Garrison, Todd (2011). Mac OS Lion Forensic Memory Acquisition Using IEEE 1394. http://www.frameloss.org/wp-content/uploads/2011/09/Lion-Memory-Acquisition.pdf, 2011.

Giuliani, Marco (2013). Mebromi: The First BIOS Rootkit in the Wild. http://www.webroot.com/blog/2011/09/13/mebromi-the-first-bios-rootkit-in-the-wild/, 2013.

Goodin, Dan (2013). Meet “badBIOS”, the mysterious Mac and PC malware that jumps airgaps. http://arstechnica.com/security/2013/10/meet-badbios-the-mysterious-mac-and-pc-malware-that-jumps-airgaps/, 2013.

Gorman, Mel (2004). Understanding the Linux virtual memory manager. Prentice Hall, 2004.

Gruhn, Michael; Müller, Tilo (2013). On the Practicability of Cold Boot Attacks. In Proceedings of the 8th International Conference on Availability, Reliability and Security (ARES) (pp. 390–397). IEEE, 2013.

Halderman, J. Alex; Schoen, Seth D.; Heninger, Nadia; Clarkson, William; Paul, William; Calandrino, Joseph A.; Feldman, Ariel J.; Appelbaum, Jacob; Felten, Edward W. (2008). Lest We Remember: Cold-Boot Attacks on Encryption Keys. In Proceedings of the 17th USENIX Security Symposium, 2008.

Hale, Michael (2013). Linux Support in Volatility. http://code.google.com/p/volatility/wiki/LinuxMemoryForensics, 2013.

Halfdead (2008). Mystifying the debugger for ultimate stealthness. Phrack, 0x0c, pp. 0x08, 2008.

Halvorsen, Ole Henry; Clarke, Douglas (2011). OS X and iOS Kernel programming. Apress, 2011.

Hanspach, Michael; Goetz, Michael (2013). On Covert Acoustical Mesh Networks in Air. Journal of Communications, Volume 8(11), 2013.

Haruyama, T.; Suzuki, H. (2012). One-byte Modification for Breaking Memory Forensic Analysis. http://media.blackhat.com/bh-eu-12/Haruyama/bh-eu-12-Haruyama-Memory_Forensic-Slides.pdf, 2012.

Heasman, John (2006). Implementing and Detecting an ACPI BIOS Rootkit. http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-Heasman.pdf, 2006.

Hermann, Uwe (2014). Physical memory attacks via Firewire/DMA - Part 1: Overview and Mitigation. http://www.hermann-uwe.de/blog/physical-memory-attacks-via-firewire-dma-part-1-overview-and-mitigation, 2014.

Hewlett-Packard; Intel; Microsoft; Phoenix-Technologies; Toshiba (2011). ACPI Specification 5.0. http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf, 2011.

Hex-Rays (2005). IDA: The Interactive Disassembler. https://www.hex-rays.com, 2005.

Hoglund, Greg; Butler, James (2005). Rootkits: Subverting the Windows Kernel. Addison Wesley, 2005.

Hudson, Trammell (2014). Thunderstrike: EFI bootkits for Apple MacBooks, 2014.

Inoue, Hajime; Adelstein, Frank; Joyce, Robert A (2011). Visualization in testing a volatile memory forensic tool. Digital Investigation, 8, pp. S42–S51, 2011.

Intel Corporation (1997). MultiProcessor Specification. http://download.intel.com/design/archives/processors/pro/docs/24201606.pdf, 1997.

Intel Corporation (2000). Intel 815 Chipset Family. http://download.intel.com/design/chipsets/datashts/29068801.pdf, 2000.

Intel Corporation (2009). Intel 5 Series Platform Controller Hub (PCH) Datasheet. http://www.intel.de/content/dam/www/public/us/en/documents/datasheets/8-series-chipset-pch-datasheet.pdf, 2009.

Intel Corporation (2013). Desktop 4th Generation Intel Core Processor Family Datasheet Volume 2. http://www.intel.com/assets/pdf/datasheet/317607.pdf, 2013.

Intel Corporation (2014a). ACPI Component Architecture. https://acpica.org/, 2014.

Intel Corporation (2014b). Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3 System Programming Guide, 2014.

Intel Corporation (2014c). Intel 8 Series Platform Controller Hub (PCH) Datasheet. http://www.intel.de/content/dam/www/public/us/en/documents/datasheets/8-series-chipset-pch-datasheet.pdf, 2014.

Intel Corporation (2014d). Intel Virtualization Technology for Directed I/O: Specification. http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html, 2014.

Intel Corporation (2014e). Trusted Compute Pools with Intel Trusted Execution Technology. http://www.intel.com/content/www/us/en/architecture-and-technology/trusted-execution-technology/malware-reduction-general-technology.html, 2014.

Kaspersky Labs (2015). The Great Bank Robbery: the Carbanak APT. https://securelist.com/blog/research/68732/the-great-bank-robbery-the-carbanak-apt/, 2015.

King, Samuel T; Chen, Peter M (2006). SubVirt: Implementing malware with virtual machines. In Security and Privacy, 2006 IEEE Symposium on (pp. 14–pp). IEEE, 2006.

Kleen, Andi (2004). Virtual memory map with 4 level page tables. https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt, 2004.

Kollar, Ivor (2010). Forensic RAM dump image analyser. Department of Software Engineering, Charles University, Prague, 2010.

Kornblum, Jesse (2006). Exploiting the rootkit paradox with windows memory analysis. International Journal of Digital Evidence, Volume 5(1), pp. 1–5, 2006.

Kovah, Xeno; Butterworth, John; Kallenberg, Corey; Cornwell, Sam (2014). Copernicus 2: SENTER the Dragon. http://www.mitre.org/publications/technical-papers/copernicus-2-senter-the-dragon, 2014.

Levin, Jonathan (2012). Mac OS X and IOS Internals: To the Apple’s Core. John Wiley & Sons, 2012.

Levine, John R. (1999). Linkers and Loaders. Morgan Kaufmann, 1999.

Ligh, Michael Hale; Case, Andrew; Levy, Jamie; Walters, AAron (2014). The Art of Memory Forensics: Detecting Malware and Threats in Windows, Linux, and Mac Memory. John Wiley & Sons, 2014.

Lineberry, Anthony (2009). Malicious Code Injection via /dev/mem. Black Hat Europe, 2009.

Linux Kernel Organization (2014). Kernel Virtual Machine. git://git.kernel.org/pub/scm/virt/kvm/kvm.git, 2014.

Loukas, K (2012). De Mysteriis Dom Jobsivs–Mac EFI Rootkits. http://ho.ax/De_Mysteriis_Dom_Jobsivs_Black_Hat_Paper.pdf, 2012.

Maartmann-Moe, Carsten (2013). Inception. http://www.breaknenter.org/projects/inception/, 2013.

Mandiant (2011). Memoryze™. http://www.mandiant.com/resources/download/memoryze, 2011.

Mandiant (2012). Memoryze™ for the Mac. https://www.mandiant.com/resources/download/mac-memoryze, 2012.

ManTech CSI, Inc. (2009). mdd. http://sourceforge.net/projects/mdd/, 2009.

Martignoni, Lorenzo; Fattori, Aristide; Paleari, Roberto; Cavallaro, Lorenzo (2010). Live and Trustworthy Forensic Analysis of Commodity Production Systems. In Proceedings of the 13th International Conference on Recent Advances in Intrusion Detection (RAID), 2010.

Matz, Michael; Hubicka, Jan; Jaeger, Andreas; Mitchell, Mark (2012). System V Application Binary Interface. http://refspecs.linuxfoundation.org/elf/x86-64-abi-0.99.pdf, 2012.

McAfee Inc. (2014). Net Losses: Estimating the Global Cost of Cybercrime. http://www.mcafee.com/de/resources/reports/rp-economic-impact-cybercrime2.pdf, 2014.

Microsoft Corporation (2006). Kernel Patch Protection: FAQ. http://msdn.microsoft.com/en-us/library/windows/hardware/gg487353.aspx, 2006.

Microsoft Corporation (2011). Windows Feature Lets You Generate a Memory Dump File by Using the Keyboard. http://support.microsoft.com/?scid=kb%3Ben-us%3B244139&x=5&y=9, 2011.

Microsoft Corporation (2013). Device\PhysicalMemory Object. http://technet.microsoft.com/en-us/library/cc787565%28v=ws.10%29.aspx, 2013.

Milkovic, Luka (2012). Defeating Windows memory forensics. http://events.ccc.de/congress/2012/Fahrplan/events/5301.en.html, 2012.

Miller, David S.; Henderson, Richard; Jelinek, Jakub (2015). Dynamic DMA mapping Guide. https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt, 2015.

Minnich, Ron (2014). Coreboot. http://www.coreboot.org/, 2014.

MoonSols (2012). Windows Memory Toolkit. http://moonsols.com/product, 2012.

Mozak, C.P. (2011). Suppressing power supply noise using data scrambling in double data rate memory systems. http://www.google.com/patents/US7945050, 2011. US Patent 7,945,050.

Nohl, Karsten; Krißler, Sascha; Lell, Jakob (2014). BadUSB - On accessories that turn evil. Blackhat. https://srlabs.de/blog/wp-content/uploads/2014/07/SRLabs-BadUSB-BlackHat-v1.pdf, 2014.

O’Connor, Kevin (2014). SeaBIOS. http://www.seabios.org/SeaBIOS, 2014.

O’Malley, Samuel Joseph; Choo, Kim-Kwang Raymond (2014). Bridging the Air Gap: Inaudible Data Exfiltration by Insiders. In 20th Americas Conference on Information Systems (AMCIS 2014) (pp. 7–10), 2014.

Ooi, Tsukasa (2009). Stealthy Rootkit: How bad guy fools live memory forensics? http://www.slideshare.net/a4lg/stealthy-rootkit-how-bad-guy-fools-live-memory-forensics-pacsec-2009, 2009.

PCI-SIG (1998). PCI-to-PCI Bridge Architecture Specification, 1998.

PCI-SIG (2002). PCI Local Bus Specification 3.0, 2002.

PCI-SIG (2010a). PCI Express Base Specification Revision 3.0, 2010.

PCI-SIG (2010b). PCI Firmware 3.1 Specification. https://www.pcisig.com/specifications/conventional/pci_firmware/, 2010.

PCI-SIG (2015). PCI Vendor ID Search. https://www.pcisig.com/membership/vid_search/, 2015.

Petroni, Nick L.; Fraser, Timothy; Molina, Jesus; Arbaugh, William A. (2004). Copilot – A Coprocessor-Based Kernel Runtime Integrity Monitor. In Proceedings of the 13th USENIX Security Symposium, 2004.

Raytheon Pikewerks (2013). Linux Incident Response with Second Look. http://secondlookforensics.com/linux-incident-response/, 2013.

Reina, Alessandro; Fattori, Aristide; Pagani, Fabio; Cavallaro, Lorenzo; Bruschi, Danilo (2012). When Hardware Meets Software: A Bulletproof Solution to Forensic Memory Acquisition. In Proceedings of the 28th Annual Computer Security Applications Conference, 2012.

Richard, Golden G; Case, Andrew (2014). In lieu of swap: Analyzing compressed RAM in Mac OS X and Linux. Digital Investigation, 11, pp. S3–S12, 2014.

Rostedt, Steven (2009). Debugging the kernel using Ftrace. http://lwn.net/Articles/365835/, 2009.

Ruff, Nicolas; Suiche, Matthieu (2007). Enter Sandman. In Proceedings of the 5th Annual PacSec Applied Security Conference, 2007.

Rusakov, Vyacheslav (2011). TDL4 Rootkit. http://www.securelist.com/en/analysis/204792157/TDSS_TDL_4, 2011.

Rusakov, Vyacheslav (2012). XPAJ: Reversing a Windows x64 Bootkit. http://www.securelist.com/en/analysis/204792235/XPAJ_Reversing_a_Windows_x64_Bootkit#5, 2012.

Russinovich, Mark E.; Solomon, David A.; Ionescu, Alex (2009). Microsoft Windows Internals. Microsoft Press, 5th Edition, 2009.

Rutkowska, J. (2006). Introducing Blue Pill. http://theinvisiblethings.blogspot.de/2006/06/introducing-blue-pill.html, 2006.

Rutkowska, J. (2007). Beyond the CPU: Defeating hardware based RAM acquisition. In Proceedings of BlackHat DC, 2007.

Salihun, Darmawan (2006). BIOS Disassembly Ninjutsu Uncovered. A-List Publishing, 2006.

Salihun, Darmawan (2014). System Address Map Initialization in x86/x64 Architecture Part 2: PCI Express-Based Systems. http://resources.infosecinstitute.com/system-address-map-initialization-x86x64-architecture-part-2-pci-express-based-systems/, 2014.

Salzman, Peter Jay; Burian, Michael; Pomerantz, Ori (2001). The Linux kernel module programming guide. TLDP: http://tldp.org/LDP/lkmpg/2.4/html, 2001.

Schatz, B. (2007a). BodySnatcher: Towards reliable volatile memory acquisition by software. Digital Investigation, 4, pp. 126–134, 2007.

Schatz, Bradley (2007b). Recent Developments in Volatile Memory Forensics. http://www.schatzforensic.com.au/presentations/BSchatz-CERT-CSD2007.pdf, 2007.

Schneier, Bruce (2014). DEITYBOUNCE: NSA Exploit of the Day. https://www.schneier.com/blog/archives/2014/01/nsa_exploit_of.html, 2014.

Shoumikhin, Anthony (2010). Redirecting functions in shared ELF libraries. http://www.codeproject.com/KB/library/elf-redirect.aspx, 2010.

Singh, Amit (2006). Mac OS X internals: a systems approach. Addison-Wesley Professional, 2006.

Skochinsky, Igor (2014). Intel ME: Two Years Later. https://ruxconbreakpoint.com/assets/2014/slides/bpx-Breakpoint%202014%20Skochinsky.pdf, 2014.

Sparks, S.; Butler, J. (2005). Shadow Walker: Raising the bar for rootkit detection. In Proceedings of Black Hat Japan (pp. 504–533)., 2005.

Stealth (2004). The Adore-Ng Rootkit. http://packetstormsecurity.com/files/32843/adore-ng-0.41.tgz.html, 2004.

Stewin, Patrick; Bystrov, Iurii (2012). Understanding DMA Malware. In Proceedings of the 9th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), 2012.

Stüttgen, Johannes (2012). OSXPmem. http://code.google.com/p/pmem/wiki/OSXPmem, 2012.

Stüttgen, Johannes (2014). Elfrelink: An ELF code injection library. https://github.com/google/rekall/tree/master/tools/linux/lmap/elfrelink, 2014.

Stüttgen, Johannes; Cohen, Michael (2013). Anti-Forensic Resilient Memory Acquisition. Digital Investigation, 10, pp. S105–S115, 2013.

Stüttgen, Johannes; Cohen, Michael (2014). Robust Linux memory acquisition with minimal target impact. Digital Investigation, 11, pp. S112–S119, 2014.

Stüttgen, Johannes; Vömel, Stefan; Denzel, Michael (2015). Acquisition and Analysis of Compromised Firmware Using Memory Forensics. In Proceedings of the 2nd Annual DFRWS Europe Conference (DFRWS-EU 2015 Dublin), 2015.

Styx (2012). Infecting loadable kernel modules, kernel versions 2.6.x/3.0.x. Phrack, Volume 0x0e(0x44), pp. 0x0b, 2012.

Suiche, Matthieu (2009a). Reply to HBGary. http://www.msuiche.net/2009/11/16/reply-to-hbgary-and-personal-notes/, 2009.

Suiche, Matthieu (2009b). Win32dd. http://www.msuiche.net/tools/win32dd-v1.2.1.20090106.zip, 2009.

Suiche, Matthieu (2011). MoonSols DumpIt goes mainstream. http://www.moonsols.com/2011/07/18/moonsols-dumpit-goes-mainstream/, 2011.

Sutherland, Iain; Evans, Jon; Tryfonas, Theodore; Blyth, Andrew (2008). Acquiring Volatile Operating System Data Tools and Techniques. ACM SIGOPS Operating Systems Review, Volume 42(3), pp. 65–73, 2008.

Sylve, Joe (2012). LiME – Linux Memory Extractor. In Proceedings of the 7th ShmooCon Conference, 2012.

Sylve, Joe; Case, Andrew; Marziale, Lodovico; Richard, Golden G (2012). Acquisition and analysis of volatile memory from android devices. Digital Investigation, Volume 8(3), pp. 175–184, 2012.

Tanenbaum, Andrew S; Bos, Herbert (2014). Modern operating systems. Prentice Hall Press, 2014.

The Bochs Project (2013). Bochs – The Cross Platform IA-32 Emulator. http://bochs.sourceforge.net/, 2013.

The Coreboot Project (2009). Developer Manual/Tools. http://www.coreboot.org/Developer_Manual/Tools, 2009.

The Flashrom Team (2013). Flashrom. http://www.flashrom.org/, 2013.

The Linux Kernel Archives (2013). The Linux Kernel Source Code. https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.12.tar.xz, 2013.

The Linux man-pages (2012). kexec_load - load a new kernel for later execution. http://man7.org/linux/man-pages/man2/kexec_load.2.html, 2012.

The New York Times (2015). Bank Hackers Steal Millions via Malware. http://www.nytimes.com/2015/02/15/world/bank-hackers-steal-millions-via-malware.html, 2015.

TIS Committee (1995). Tool Interface Standard Executable and Linking Format (ELF) Specification v1.2. http://refspecs.linuxbase.org/elf/elf.pdf, 1995.

Triulzi, Arrigo (2010). The Jedi Packet Trick Takes over the Deathstar. http://www.alchemistowl.org/arrigo/Papers/Arrigo-Triulzi-CANSEC10-Project-Maux-III.pdf, 2010.

Truff (2003). Infecting loadable kernel modules. Phrack, Volume 0x0b(0x3d), pp. 0x0a, 2003.

Turley, Jim (2014). The Basics of Intel Architecture. http://www.intel.com/content/www/us/en/intelligent-systems/embedded-systems-training/ia-introduction-basics-paper.html, 2014.

van de Ven, Arjan (2008). Introduce /dev/mem restrictions with a config option. http://lwn.net/Articles/267427/, 2008.

Vidas, Timothy (2010). Volatile Memory Acquisition via Warm Boot Memory Survivability. In Proceedings of the 43rd Hawaii International Conference on System Sciences, 2010.

Vidstrom, Arne (2006). Forensic memory dumping intricacies - PhysicalMemory, DD, and caching issues. http://ntsecurity.nu/onmymind/2006/2006-06-01.html, 2006.

Vömel, Stefan; Freiling, Felix C (2011). A survey of main memory acquisition and analysis techniques for the windows operating system. Digital Investigation, Volume 8(1), pp. 3–22, 2011.

Vömel, Stefan; Freiling, Felix C (2012). Correctness, atomicity, and integrity: Defining criteria for forensically-sound memory acquisition. Digital Investigation, Volume 9(2), 2012.

Vömel, Stefan; Stüttgen, Johannes (2013). An Evaluation Platform for Forensic Memory Acquisition Software. In Proceedings of the 13th Annual DFRWS Conference, 2013.

Walters, Aaron (2014). Volatility Framework. https://github.com/volatilityfoundation/volatility, 2014.

Walters, Aaron; Petroni, Nick L. (2007). Volatools: Integrating Volatile Memory Forensics into the Digital Investigation Process. In Proceedings of Black Hat DC, 2007.

Wang, J.; Zhang, F.; Sun, K.; Stavrou, A. (2011). Firmware-assisted Memory Acquisition and Analysis tools for Digital Forensics. In Systematic Approaches to Digital Forensic Engineering (SADFE), 2011 IEEE Sixth International Workshop on (pp. 1–5).: IEEE, 2011.

Weidong, Cui; Zhilei, Xu; Marcus, Peinado; Ellick, Chan (2012). Tracking rootkit footprints with a practical memory analysis system. In Proceedings of the 21st USENIX conference on Security symposium (pp. 42–42).: USENIX Association, 2012.

Williams, Jake; Torres, Alissa (2014). ADD - Complicating Memory Forensics Through Memory Disarray. http://www.mediafire.com/view/h7bmcscbtyaeb6r/ADD_Shmoocon.pdf, 2014.

WindowsSCOPE (2014). CaptureGUARD. http://www.windowsscope.com/, 2014.

Witherden, Freddie (2010). libforensic1394. https://freddie.witherden.org/tools/libforensic1394/, 2010.

Wojtczuk, Rafal; Tereshkin, Alexander (2009). Attacking Intel BIOS. https://www.blackhat.com/presentations/bh-usa-09/WOJTCZUK/BHUSA09-Wojtczuk-AtkIntelBios-SLIDES.pdf, 2009.

You, Dong-Hoon (2012). Android platform based linux kernel rootkit. Phrack, Volume 0x0e(0x44), pp. 0x06, 2012.

Yu, Miao; Lin, Qian; Li, Bingyu; Qi, Zhengwei; Guan, Haibing (2012). Vis: Virtualization Enhanced Live Forensics Acquisition for Native System. Digital Investigation, Volume 9(1), pp. 22–33, 2012.

Zimmer, Vincent; Rothman, Michael; Marisetty, Suresh (2010). Beyond BIOS: developing with the unified extensible firmware interface. Intel Press, 2010.
