11
DIGITAL FORENSIC RESEARCH CONFERENCE Surveying The User Space Through User Allocations By Andrew White, Bradley Schatz and Ernest Foo From the proceedings of The Digital Forensic Research Conference DFRWS 2012 USA Washington, DC (Aug 6 th - 8 th ) DFRWS is dedicated to the sharing of knowledge and ideas about digital forensics research. Ever since it organized the first open workshop devoted to digital forensics in 2001, DFRWS continues to bring academics and practitioners together in an informal environment. As a non-profit, volunteer organization, DFRWS sponsors technical working groups, annual conferences and challenges to help drive the direction of research and development. http:/dfrws.org

Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

DIGITAL FORENSIC RESEARCH CONFERENCE

Surveying The User Space Through User Allocations

By

Andrew White, Bradley Schatz and Ernest Foo

From the proceedings of

The Digital Forensic Research Conference

DFRWS 2012 USA

Washington, DC (Aug 6th - 8th)

DFRWS is dedicated to the sharing of knowledge and ideas about digital forensics

research. Ever since it organized the first open workshop devoted to digital forensics

in 2001, DFRWS continues to bring academics and practitioners together in an

informal environment.

As a non-profit, volunteer organization, DFRWS sponsors technical working groups,

annual conferences and challenges to help drive the direction of research and

development.

http:/dfrws.org

Page 2: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

Surveying the user space through user allocations

Andrew White*, Bradley Schatz, Ernest FooQueensland University of Technology, Brisbane, Australia

Keywords:Memory forensicsUser spaceWindows XPWindows 7Malware analysis

a b s t r a c t

Previous research into memory forensics has focused on understanding the structure andcontents of the kernel space portions of physical memory, and mostly ignored the contentsof the user space. This paper describes the results of a survey of user space virtual addressallocations in the Windows XP and Windows 7 operating systems, comprehensivelyidentifying the kernel and user space metadata required to identify such allocations. Newtechniques for determining the role and content of those allocations are identified,significantly increasing the proportion of allocations for which the role and function isunderstood. The validity of this approach is evaluated and a detailed analysis of the datastructures involved provided. An implementation of this approach is presented which iscapable of identifying all user space allocations, and for those allocations identifying fora high percentage, the role of those allocations, even for complex applications.

ª 2012 A. White, B. Schatz & E. Foo. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Memory forensics provides a valuable way of analysingthe contents of physical memory, in order to obtain tran-sient information that would not necessarily be present ondisk. Previous research in this area has focused on under-standing and interpreting the layout and contents of thekernel portion of memory, in order to facilitate the devel-opment of memory analysis tools that recreate the capa-bilities of previously used live response tools. While thiswork has been invaluable, allowing the examination ofprocesses, drivers, network sockets and other useful arte-facts, it has not been taking into account the full picture.

Previous research has focused almost exclusively on thedata the operating system itself is using, not the data of theuser applications running on that operating system. Asa result, the data and data structures used by these appli-cations have not been examined, resulting in a lack ofmethodologies for understanding and interpreting thisapplication data. Without such methods, it is not possible

to extract useful information from the memory of anapplication, such as user credentials and chat logs, withoutfirst undertaking significant reverse engineering of theapplication itself.

In order to explain the data being used by an application,this paper presents an approach for describing the purposeof all memory allocations in the user space. This approachuses the Virtual Address Descriptors (VADs) proposed byprevious research (Dolan-Gavitt, 2007) to identify all ofthese allocations, and then determines the purpose of theseallocations using numerous sources of kernel and userspace metadata. A sample implementation of this approachis then provided as a plugin for the Volatility framework(Volatile Systems, 2011), with an evaluation of both itsvalidity and effectiveness. All of this research is based onextensive analysis of both the Windows XP SP3 32-bit andWindows 7 SP1 32-bit operating systems and their innerworkings. Unless otherwise stated, allfindings apply to bothof the operating systems being studied.

The contributions of this paper are the detailed analysesof metadata sources that can be used to describe thepurpose of user space allocations, and a plugin thatimplements this approach. The structure of this paper is asfollows. Section 2 introduces the concept of user alloca-tions, and Section 3 describes the ways in which metadata

* Corresponding author.E-mail addresses: [email protected] (A. White), [email protected].

au (B. Schatz), [email protected] (E. Foo).

Contents lists available at SciVerse ScienceDirect

Digital Investigation

journal homepage: www.elsevier .com/locate/di in

1742-2876/$ – see front matter ª 2012 A. White, B. Schatz & E. Foo. Published by Elsevier Ltd. All rights reserved.doi:10.1016/j.diin.2012.05.005

Digital Investigation 9 (2012) S3–S12

Page 3: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

that describes these allocations can be located. Section 4outlines the implementation of this approach as a Vola-tility plugin. Section 5 demonstrates the validity of thisapproach and its effectiveness, and these results are furtherdiscussed in Section 6. The related work is then presentedin Section 7, before the paper is concluded in Section 8.

2. User allocations

Each processwithinWindows has its own virtual addressspace, and this virtual address space is divided into twoparts,the user space, and the kernel space (Russinovich andSolomon, 2005). As the names suggest, the user space iswhereuserapplicationcodeanddata is stored, and thekernelspace is where code and data used by the kernel is stored.

Windows manages the use of user space memory byallocating memory in chunks of contiguous virtual addressspace ranges, which we term user allocations. The memoryranges which these user allocations occupy are describedwithin a process by Virtual Address Descriptors (VADs).These VADs are stored in a self balancing binary tree, andeach time a new user allocation is made, a new VAD isadded to the tree to describe the memory ranges used bythat allocation (Russinovich and Solomon, 2005). Theseuser allocations are the highest level of abstraction possiblewithin the user space of a process, and serve as a suitableframework within which to further explore the user space.An experiment to verify the suitability of VADs for locatingthese user allocations is described in Section 5.1.

While previous work has shown how these user allo-cations can be identified, techniques to determine thepurpose of these allocations have been limited. Knowingthe roles of these user allocations makes it possible todetermine which allocations are relevant in a particularline of enquiry. For example, if the value of a user variable isthe search, knowledge of where the application data isstoredwould significantly reduce the number of allocationsrequiring analysis. Without such knowledge, all the allo-cations used by the application would require analysis,including allocations which are not even relevant to theexecution of the program.

Fig.1 provides a comparison of which contents and rolesof user allocations have been determined by previousresearch, and those newly determined by our research. The

next section is dedicated to describing how each of thesenew roles can be determined.

3. User allocation metadata

To describe the contents and roles of these user alloca-tions within the user space, sources of metadata whichdescribe these allocationsmust be consulted. These sourcesof metadata can be divided based on whether they exist inthe kernel space, or in the user space. Generally speaking,the sources of metadata in the kernel focus on the role ofthe user allocation, while the sources of metadata in theuser space focus on the content of the user allocation.Unless otherwise stated, all of the following informationhas been determined through our own analyses ofWindows XP SP3 and Windows 7 SP1 systems.

3.1. Kernel space metadata sources

The first source of metadata for a user allocation is ofcourse the VAD that describes that allocation. In addition toproviding the location of the user allocation, the VAD alsoprovides two other pieces of information. The first is thepermissions with which that memory was allocated, whichare enforced in software by the Windows MemoryManager. These permissions can take on a variety of values,and are defined in WinNT.h.

The second piece of useful information is the type of theuser allocation, which can either be private or mapped. Aprivate allocation only exists within the one user addressspace, whereas a mapped allocation exists in more thanone address space. For example, these multiple addressspaces could be the user and kernel portions of the onevirtual address space, or multiple virtual address spaces.The type of an allocation is determined by the presence ofa pointer in the ControlArea field, if a pointer exists, theallocation is mapped, otherwise it is private.

Aside from a VAD, a user allocation can potentially havetwo different types of objects that provide metadata aboutthe allocation, a _FILE_OBJECT and a _SECTION_OBJECT. Asthe names suggest, a file object describes an allocationbacked by a file on disk, while a section object describesa section, which is the term Windows uses to describe anallocation that is shared between multiple processes. Bothof these objects are used to identify when physical memorypages are shared between multiple address spaces, whichmeans they only ever apply to mapped allocations. In orderto locate these objects, an understanding of the datastructures involved in shared memory, the control area andthe segment, are first required.

A control area, described by a _CONTROL_AREA datastructure, is responsible for keeping track of how manyaddress spaces an allocation has been mapped to. Asegment, described by a _SEGMENT data structure, keepstrack of the page table entries used by an allocation that isshared, such that for each address space it exists in, italways points to the same physical pages.

These two data structures are highly related, to theextent that a control area has a pointer to its correspondingsegment in the Segment field, and a segment has a pointerits corresponding control area in the ControlArea field.Fig. 1. Contents and roles capable of being determined.

A. White et al. / Digital Investigation 9 (2012) S3–S12S4

Page 4: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

Control areas and segments always point to each other,essentially forming a pair of data structures.

As previously mentioned, a mapped allocation is onethat has a valid pointer in the ControlArea field. This field, asthe name suggests, points to a control area, making it easyto find the control area and corresponding segment thatdescribe an allocation.

If a mapped allocation represents a file object, thecontrol area will have a pointer to a valid file object in theFilePointer field, as noted by Dolan-Gavitt (2007). This fileobject will reveal several pieces of information about thefile, including its address on disk, which allows moretraditional techniques from disk forensics to be used.

If the FilePointer field of the control area is null, then themapped allocation represents a section. Unlike with fileobjects, the control area does not have a field pointing tothe section object, and neither does the segment. Instead,to locate relevant section objects, the object manager needsto be parsed.

The object manager is responsible for the creation anddeletion of kernel objects, and allowing the retrieval ofspecific objects by their handle (Russinovich and Solomon,2005). Handles are designed to be passed on to user code,as a method of allowing access to specific system resources,such as the file object for an open file. Each process hasa link to its process handle table, which contains thehandles that are in use by that process.

By parsing through the handle table of each process,section objects can be found. Section objects take the formof a _SECTION_OBJECT, which contains a variety of fields.However, in our experiments, the only field which is alwaysused is the Segment field, which although the Windowssymbol files indicate should point to a _SEGMENT_OBJECT,in fact points to a _SEGMENT.

This makes it possible to determine which sharedmemory sections are in use by a process, by relating thesegments found by parsing the VAD to those found byinterpreting the sections in the objectmanager. An exampleof this relationship is shown in Fig. 2. Onekeypoint to note isthat in parsing the object manager, the handle table of allprocessesmust be parsed, not just the process that owns thevirtual address space in question. This is required as whenone process maps shared memory into another, only one ofthese processes requires access to the handle.

Unlike the file object, the section object by itselfprovides limited information about the allocation, howeverit does facilitate the retrieval of two useful pieces of

information. The first is the name of the section object,which is recorded in the object manager and can bedetermined while finding the section object. While thename of the section may reveal some information about itscontents, in our observations it often contained a blankstring or a string filled with non-printable characters. Thesetypes of section names are quite common, which couldindicate that these non-printable characters are being usedto store some other sort of information.

The second piece of useful information is the processthat created the section. When a segment is pointed to bya section object, the u1 field represents the CreatingProcessfield. The value of this field is a pointer to an _EPROCESSstructure, and as the name suggests, this allows the processthat created the section to be determined. Since suchsections are generally used for inter-process communica-tion, it is often the case that the creating process isa different process to the process being examined. Whenthe creating process is the same process, this could indicatethat this section is being shared with other processes, orthat the process is sharing information with part of thekernel.

One point to note is that it is possible for a VAD to havean associated file object and section object at the sametime. Such an occurrence is commonly the case with thenatural language support (.nls) files which are mapped asread-only into every process. The significance of this is thatthe same view of the file is being shared between multipleprocesses, not a unique view of the file as is typically ach-ieved with the copy-on-write style permissions used forDLL files.

While a VAD entry describes which part of the virtualaddress space a user allocation occupies, it is the metadataof this entry obtained from file and section objects thatallows the role of the allocation to be determined. Eventhough the description of these allocations is at a high level,with no explicit knowledge of the underlying data struc-tures, the information can still be used in a variety of ways.

An example would be inferring the behaviour ofa process based on the file and section objects presentwithin its user space. If a process loads a DLL namedWINHTTP.dll from the system root into the address space,one may hypothesise that the process is using the HTTPprotocol. Likewise, if a section named ShimSharedMemory isbeing loaded, this would indicate that the Windows ShimEngine is being used, a part of Windows responsible forproviding compatibility to older programs.

Object Handle TableObject Handle Table

ControlArea: 81a7bee0VAD @ 8186dee8

VAD @ 81a722a0

Segment: e1755008Control Area @ 81a7bee0

Segment: e1755008Section @ e18fb708

Object Handle Table

ControlArea: 81a7bee0Segment @ e1755008

...

Fig. 2. Finding shared memory.

A. White et al. / Digital Investigation 9 (2012) S3–S12 S5

Page 5: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

3.2. User space metadata sources

Although the role of numerous user allocations can bedescribed using kernel sources of metadata, it is notpossible to describe all user allocations in this way.However, additional sources of metadata exist in the userspace, which can be located from known structures in thekernel space. For the most part, these user space metadatasources describe the content of the user allocations, inparticular those that are created when a process is loaded.These default user allocations are often required for theapplication to function, or are used by some Windowscomponent to provide helpful functionality.

The two main sources of metadata with which thecontent of all other user allocations can be described arethe Process Environment Block, _PEB, and the ThreadEnvironment Block, _TEB. While previous work hasdescribed how these can be used to list the modules loadedby a process, find the process parameters (Betz, 2005), andlocate the stacks of a process (Arasteh and Debbabi, 2007),the PEB contains numerous more fields that, prior to thiswork, have not been identified in the literature.

The useful fields of the PEB are summarised in Fig. 3.This information has been taken from theWindows XP SP3symbol files, and updated with some type specific infor-mation for some of the fields.

The ImageBaseAddress, Ldr and ProcessParameters fieldswere all fields first used by theMemparser tool (Betz, 2005).These fields can be used to find the virtual address of theprocess executable, list the loaded modules and retrievethe process parameters respectively. Although the list ofloaded modules could potentially be identified using fileobjects from the VAD, this provides an alternative methodwith which to verify that information.

There are two key pieces of information which have notbeen noted by previous work about the process parame-ters. The first is that the data structure it uses, _RTL_U-SER_PROCESS_PARAMETERS, always resides within its own

user allocation. The second is that the Environment field ofthis data structure can be used to find another user allo-cation, which contains a series of strings relating to envi-ronment variables. While this allows two additionalallocations to be explained on Windows XP, the same doesnot apply to Window 7. Although Windows 7 continues touse the same data structures to store this information,these data structures are stored in the default heap ratherthan in their own allocations.

TheProcessHeap,NumberOfHeapsandProcessHeapsallowthe location of the heaps used by the process, each of whichreside in their own allocation. These heaps are found fromthe ProcessHeaps field, which is a pointer to a list of pointersto heaps, the size of which is given by NumberOfHeaps. Thefirst heap of this list is also referenced by the ProcessHeapfield,which is considered to be thedefault heap. There is alsoone heap that does not appear in this list, which is the read-only shared heap, pointed to by the values of ReadOnly-SharedMemoryBase and ReadOnlySharedMemoryHeap.

Each of these heaps begin with a _HEAP object, whichcontains two fields that can be used to describe further userallocations. The first is the field which gives the heapsegments. OnWindows XP, this is the Segments field, whichis an array of pointers to _HEAP_SEGMENT data structureson Windows XP, whereas on Windows 7, this is the Seg-mentListEntry field, which is a double linked list of the samedata structure. A heap segment describes the heap withinits current allocation, and as such, every heap always has atleast one segment. If a heap extends to a separate userallocation however, that allocation will have its own heapsegment. The second is the VirtualAllocdBlocks field, whichis a double linked list of _HEAP_VIRTUAL_ALLOC_ENTRY datastructures. A heap virtual alloc entry is used when a heapreceives a request for memory exceeding a threshold size,at which point rather than the data being stored in theexisting heap allocation it is stored within its own separateuser allocation. As can be seen, the location and parsing ofthese heaps potentially allows the content of even moreuser allocations to be determined.

Returning to the PEB, the next significant fields are theAnsiCodePageData, OemCodePageData and UnicodeCase-TableData fields. These fields all point to different locationswithin the same user allocation, which is the code page.This code page is present in every process, and containsa series of tables containing information relating to ANSI,OEM and Unicode character encoding information. Whilethe specific data structure used to store this information isnot known, it is likely to be a simple array.

If a process has a GUI element, it will have a GDI sharedhandle table, which is pointed to by the GdiSharedHand-leTable field of the PEB. This user allocation is a simplearray of GDITableEntry data structures (Nasarre, 2003), andas the name suggests, is shared between all processes witha GUI element. Since these entries give both a virtualmemory address and an owning process for each GDIobject, it is possible to parse these entries to determinewhich allocations of the process contain local GDIinformation.

When a process uses theWindows Shim Engine, the PEBwill potentially contain a valid pointer in the pShimDatafield. This field will point to a private allocation containingFig. 3. Relevant fields of the PEB (XP).

A. White et al. / Digital Investigation 9 (2012) S3–S12S6

Page 6: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

a small amount of data relating to the Shim Engine in anunknown data format.

Towards the end of the PEB are the Activa-tionContextData and SystemDefaultActivationContextDatafields. These fields store an activation context, which isa mechanism used by Windows to redirect an applicationto load specific versions of resources, such as DLL files. Anapplication can potentially have two user allocations asa result of this functionality, one for the system defaultactivation context, and one for the activation contextspecified by the process. Although the data structure usedfor this purpose is not known, it begins with a distinctmagic value of Actx.

While Windows 7 adds numerous new fields to the PEB,only two commonly provide useful metadata. These are thepContextData and WerRegistrationData fields. Although thename of the first field would suggest it is related to acti-vation contexts, this does not seem to be the case, as it doesnot start with the magic value of Actx. As such, the purposeof this allocation is not known. The second field relates todata used for Windows Error Reporting functionality, andwhile the data structure used for this purpose is unknown,it is commonly paged out or for the most part unpopulated.

As demonstrated by the vast amount of metadata thePEB provides, it is invaluable source for determining thecontent of user allocations. Although a portion of theseallocations relate to default data structures located in everyprocess, this knowledge allows the exclusion of theseallocations when their information is not required. Many ofthe allocations identified however are directly related toapplication data, such as the heap, allowing these alloca-tions to be searched for key pieces of user information, suchas login credentials.

4. Implementation

The interpretation techniques discussed in the previoussections have been implemented as a plugin for the Vola-tility Framework (Volatile Systems, 2011). Volatility waschosen for the implementation as it not only provides thebase functionality required for memory analysis, such asmemory address translation, but also provides numerousplugins covering functionality described in previousresearch. This allowed the development to focus only onimplementing new functionality, rather than spendingtime reimplementing previous research.

Our plugin, Userspace.py, uses all of the aforementionedmetadata sources to describe the contents of allocationswithin user space memory. This is achieved by followingthree main steps. Firstly, the specified process is locatedand its VAD Tree is parsed, using previous Volatility func-tionality. Secondly, the PEB and TEBs of the process arelocated, and the metadata within them is used to describethe relevant allocations. Finally, the control areas of eachVAD are parsed to locate file objects, and then the handletable parsed to link the remaining undescribed user allo-cations to relevant section objects.

The development of this tool was conducted in parallelwith the research, in order to provide an easy way of testingand validating hypotheses formed about the contents ofvarious data structures. Small, custom programs with

known functionality were created for the purposes ofanalysis, before moving onto more complex, existingapplications. Windows XP SP3 and Windows 7 SP1 werethe versions of Windows onwhich this research was based,and as such numerous memory dumps of this operatingsystem were made to facilitate development. Thesememory dumps were all created using multiple virtualmachines, with a variety of different memory sizes.

An example of the output from this implementation isshown in Fig. 4. This demonstrates the plugin runningagainst malloc.exe, a simple C application that allocatesa few KBs of memory, fills it with data and then sleeps. Amore complex example using the Windows 7 version ofnotepad.exe is shown in Fig. 5.

The plugin is available at http://github.com/a-white/.

5. Tool evaluation

The evaluation of the aforementioned implementationwas undertaken in three ways. Firstly, the assumption thata VAD tree completely describes a user space was tested, asthe plugin relies on the VADs when identifying the userallocations. Secondly, the correctness of the plugin wasvalidated through the extensive comparison of the resultsobtained against manual analysis of the involved datastructures using existing tools. Lastly, once the results wereknown to be accurate, the plugin was then comparedagainst existing approaches in order to investigate thebenefit obtained from the use of the plugin.

All evaluations took place using Windows XP SP3 andWindows 7 SP1 virtual machines and the memory dumpsmade using these virtual machines. A variety of differentvirtual machine software was employed for this purpose.

5.1. VAD tree completeness

The outlined approach and subsequent plugin rely onthe assumption that the VADs can be used to identify alluser allocations within a virtual address space. If this werenot the case, the VAD tree would not provide a suitableframework within which to analyse user space memory, asit would not allow the complete identification of userallocations. Without first identifying these user allocations,their purpose cannot be determined, affecting thecompleteness of the results from the plugin.

In order to validate this assumption, a simple experi-ment to test the inverse was devised. A simple Volatilityplugin, VadCompleteness.py was created, which checksevery process to see if a virtual address not described bya VAD maps to a physical memory page. This plugin wasthen used to analyse a series of Windows XP SP3 andWindows 7 SP1 memory images.

The surprising result of this analysis was that the VADsnever completely described the user space of a process. Allprocesses running on aWindows XP SP3 orWindows 7 SP1system were found to have a single page mapped at theaddress 0x7FFE0000 which was not described by a VAD. Inaddition to being mapped at the same virtual address inevery process, this virtual address always corresponded tothe same physical page, meaning it was identical in eachprocess.

A. White et al. / Digital Investigation 9 (2012) S3–S12 S7

Page 7: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

This page is in fact the _KUSER_SHARED_DATA datastructure, which is shared between all virtual memoryaddress spaces. Numerous systemwide variables are storedwithin this structure, such as the system time, the time-zone, and the location of the system root.

Aside from this one page however, the VADs correctlydescribed which virtual memory ranges mapped to phys-ical memory pages for every process. While this demon-strates that the VAD tree provides a suitable framework forthe analysis of user space memory, special care must betaken to first ensure that no additional addresses outsidethe memory ranges described by the VADs map to physicalmemory pages. For this reason,Userspace.py checks to see ifany virtual addresses that are not described by a VAD mapto a physical address, and outputs any such addresses.

5.2. Plugin validation

Once the accuracy of the VAD based approach had beenverified, the validation process then turned to the pluginitself. Validating the results of the plugin required the indepth analysis of the data structures involved, and someexisting tools were employed to achieve this.

The main tool employed was WinDbg (Microsoft, 2010),a Microsoft debugging program, which was used to analysethe data structures contained within Windows memory.This tool was used as it provides a way to access thenumerous operating system data structures described bythe Windows symbol files, and use these data structures to

dump the specified address inmemory. All validation of theresults obtained from the plugin was performed by usingWinDbg in this way to manually analyse specific datastructures, and ensure that the resulting output from theplugin accurately represented these data structures. Onelimitation of WinDbg is that it only operates on crashdumps, a Windows specific memory image format, whichprevented the use of other memory capture methods.

A selection of tools from the Sysinternals Suite(Russinovich, 2012) were also used in validating theresults of the plugin. Since each of the tools were designedfor live response, not memory analysis, this required thatthey be run on the virtual machine while simultaneouslyexperimenting with test applications. Of these tools,VMMap, WinOBj and NotMyFault were the most heavilyused. The VMMap tool provided a high-level view of thecontents of each user allocation, which served as a usefulstarting point for analysis. The WinObj tool provideda method of exploring the object manager, allowing thevalidation of the information retrieved from the objectmanager through Volatility. Finally, the NotMyFault tool,which can cause a variety of system errors, was essentialfor creating the memory dumps in crash dump format forWinDbg.

5.3. Plugin comparison

In order to compare the Userspace.py to the existingapproaches, this required a tool that implemented all of

Fig. 4. Userspace.py running against malloc.exe on Windows XP. Asterisks indicate allocations described by Existing.py.

A. White et al. / Digital Investigation 9 (2012) S3–S12S8

Page 8: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

these approaches. Unfortunately, no such tool existed,although the majority of the functionality required forthese approaches had been implemented in Volatility. Thismeant a plugin to easily replicate this behaviour could becreated. For the purposes of this section, we created Exist-ing.py. This plugin identifies all user allocations using theVAD tree, checks each of these VADs for control areas withfile objects, locates the process parameters from the PEB,and locates the stacks from the TEBs.

Since the plugins describe the contents of user spacememory for a particular process, a set of processes wererequired for analysis. Four processes of increasingcomplexity were chosen for this purpose. The first, mallo-c.exe, was a simple C program created during developmentof the plugin. The behaviour of this program is very simple,it allocates a few KBs of memory, fills it with constantvalues, then sleeps for 20 s. The second, calc.exe, is thedefault calculator program for Windows. For each memory

Fig. 5. Userspace.py running against notepad.exe on Windows 7. OS.

A. White et al. / Digital Investigation 9 (2012) S3–S12 S9

Page 9: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

image, the calculator was opened, and a simple calculationperformed. The third, notepad.exe, is the simple text editingprogram that comes preinstalled with Windows. In eachcaptured memory image, notepad was used to open a newdocument, enter a paragraph of text, and then save the file.The final and most complex program chosen was iexplor-e.exe, the preinstalled web browser on Windows, whichwaswas opened and allowed to completely load the defaulthome page for each memory image. For the programsincluded by default on Windows, the default version afterinstallation was used, no updates were applied.

The results of comparing Existing.py and Userspace.pyover a series of memory images on both Windows XP andWindows 7 can be seen in Fig. 6. For each analysedprogram, the average number of user allocations for thisprogram across the memory images is shown, followed byon average how many of those entries were able to bedescribed by the Existing.py and Userspace.py plugins.

6. Discussion

As can be seen from Fig. 6, our plugin provides a signif-icant improvement on the existing approaches in terms ofdetermining the contents and roles of user allocations.Since our plugin implements a superset of the existingapproaches, there is no scenario in which the existingapproaches could outperform our plugin.

Comparing the results across the two operating systemsprovides some interesting results. Since the defaultversions of each Windows program were used in theexperiments, this meant that the custom malloc.exeprogramwas the only program to remain identical for bothoperating systems. For this simple program, the number ofmemory allocations decreased, partially due to the lack ofdedicated allocations for the environment and processparameters data structures and partially due to reductionin loaded .nls files.

For the default Windows applications however, thenumber of allocations doubled for calc.exe and note-pad.exe, and increased by 80% for iexplorer.exe. As is quiteobvious from running both versions of Internet Explorer onthe two operating systems, iexplorer.exe has changeddramatically between the two versions of Windows, andcan almost be treated as a different program. Despite thedifferences in code between these versions however, theresults for both versions of Internet Explorer remainedalmost the same.

Unlike Internet Explorer, Notepad seemed identicalbetween Windows versions save for the updated visualtheme on Windows 7, making the increase in allocationsseem as purely the result of additional operating systemcomplexity introduced by Windows 7. Even with thisadditional complexity however, the results favoured theWindows 7 version of notepad with a 10% increase inidentified allocations for userspace.py over the WindowsXP version. Given the similar increase in the results forexisting.py however, this is likely due to an increase in theDLL files required for the Windows 7 version.

While for Internet Explorer and Notepad the identifiedallocations both improved across versions of Windows, theWindows 7 version of Calculator demonstrated a signifi-cant drop. While like Notepad, the visual theme ofCalculator was updated for Windows 7, unlike Notepad italso introduced new functionality. Since none of this newfunctionality was used when taking the memory imageshowever, it is unlikely that this is the cause of the increasein unexplained allocations. Instead, this would suggestthat some alternate method is being used to allocate ormanage memory, for which no metadata has yet beenidentified.

Being able to accurately describe the contents and rolesof user allocations allows the significant reduction of thesearch space when looking for specific information. Insteadof analysing the whole user space, only a small subset nowrequires analysis. Depending on the information beingsearched for, this subset could range from a few allocationsright down to a specific data structure.

In addition, it facilitates the analysis of data not previ-ously possible. For example, some newly identified datastructure could be used to identify some artefact previouslyoverlooked, or the shared memory sections could revealinformation about process or driver communication thatcould prove useful in investigations.

6.1. Limitations

While the tool is capable of providing information aboutthe contents of a high percentage of user allocations, it cannot explain every possible allocation. With the increase inthe complexity of the application, it can be seen that thecompleteness of the descriptions offered by the plugindecreases. Since it was not clear how the increasedcomplexity of the programwas affecting the results, a smallexperiment focussing on notepad.exe was performed.

Fig. 6. Allocations identified by existing.py and userspace.py.

A. White et al. / Digital Investigation 9 (2012) S3–S12S10

Page 10: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

For each memory image, as before, notepad was openedand a paragraph of text was entered. This time however,a memory capture was taken prior to saving the file, thenanother memory capture taken after the file had beensaved. The results of this experiment can be seen in Fig. 7.As can be seen from the results, saving the file causedsignificantly more user allocations to be made, particularlyonWindows 7. After saving the file, in terms of known userallocations, more files were mapped, more threads createdand one additional section was allocated when comparedto before saving. Numerousmore unknown user allocationswere also created. In terms of functionality, saving requiredsignificantly more GUI activity, such as selecting where ondisk to save the file, and device driver interaction, in orderto save the file to the disk.

When comparing these results to malloc.exe, which asa terminal application has no GUI component, it can beseen that increased GUI activity is one reason for thedecrease in completeness of the results. Manual analysis ofunknown allocations also confirms this, as GUI relatedterms such as Combobox appear in many of them.

Aside from concerns about the location of GUI relatedinformation, there were also other potential causes ofunexplained user allocations.

As with any sort of memory analysis, paging is an issue.Since Volatility does not support accessing an accompyingpagefile, thismeans that anydata thathasbeenpaged todiskis inaccessible. While the majority of the metadata sourceswill always remain inmemory, there are some sources, suchas TEBs and heaps, that can be paged to disk, preventing theextraction of their relatedmetadata. Combining the page filewith the memory image would allow this problem to beovercome, but would likely introduce new inconsistencyissues between the state of the two data sources.

One potential issue could the fact that data structuresare being capturedwhile they are beingmodified. Althoughvirtual machines were being used, which are capable oftaking a snapshot of memory at an instantaneous point intime, it is likely that at this point, some part of memory wasin the process of being freed. Since there are potentiallymany data structures that require modification to removea single user allocation, the lack of information couldpotentially be the result of a lack of consistency betweenthese data structures. For example, although rare, instancesof control areas were found where the segment pointer didnot point to a segment but pointed to some other unknowndata structure. The same kind of inconsistency couldpotentially occur with the virtual to physical addresstranslation process. While these sorts of inconsistenciescan be detected through sanity checks of the object refer-ences, there is not enough redundancy in the metadata toallow their correction.

A more likely cause of these unexplained user alloca-tions however is the use of some undocumented WindowsAPI that has not accurately been accounted for. This couldmean that metadata about these allocations exist, but theysimply reside in a data structure that has yet to be under-stood, or the link to the data structure that provides themetadata has not been found.

One example whereby the metadata exists but is refer-enced by an unknown data structure is with mappedallocations. Although the vast majority of these mappedallocations can be explained through the use of a file objector section object as described in Section 3.1, there exist VADstructures, which point to a valid pair of control area andsegment objects, for which no file or section objects exist.This could potentially be caused by two factors, eithera type of object that has not been accounted for is refer-encing the segment, or an unknown source of sectionobjects has not been examined. More research into theinternals of the Windows memory manager would berequired to overcome this issue.

7. Related work

The majority of previous research in memory forensicshas focused on describing the kernel, and few contributionshave been made that further the ability to describe thecontents of user space.

Memparser (Betz, 2005) was one of the first memoryanalysis tools, and was capable of retrieving some userspace information, such as the modules loaded and processparameters by using the PEB. Although the tool was capableof dumping this information, it provided no further userspace analysis except for dumping the accessible pages.

Dolan-Gavitt (2007) was the first to allow the explora-tion of the user space by retrieving the VADs froma memory image. Aside from describing the VAD datastructures and how to determine if a VAD representeda file, the analysis of these VADs was taken no further.

Arasteh and Debbabi (2007) used the user and kernelstacks to recreate the execution history of a process.Although they provide a method of locating the user stackof a thread, the location of other user space data structureswas not in the scope of their research. However, they dopresent an alternative method of finding the executable ofa process, locating the executable file through the use ofa section object.

Other research, while not expanding on the ability todescribe the user space, has focused on analysing the userspace without explicit knowledge of the data structuresinvolved.

Hejazi et al. (2009) analysed API call traces on the stackto locate useful data structures, such as those dealing with

Fig. 7. Analysis of the effect of saving on the results.

A. White et al. / Digital Investigation 9 (2012) S3–S12 S11

Page 11: Surveying the user space through user allocations...space ranges, which we term user allocations. The memory ranges which these user allocations occupy are described within a process

encryption. This however was performed without anyknowledge of how data was stored in the user space,limiting their ability to retrieve user data without signifi-cant reverse engineering of the API call.

Specific applications have also been analysed, such asSkype (Simon and Slay, 2010) and Pidgin (Simon and Slay,2011) for data of interest, such as chat logs. One keyprocess in the analysis of Pidginwas the comparison of howindividual pages within an address space changed overtime. The use of VAD allocations could have significantlyimproved this process, allowing the removal of irrelevantpages such as those related to known DLL files. Given thetime required to analyse individual applications however,this approach does not scale to the huge range of potentialapplications that a target computer could be running.

To foster thedevelopmentofmemoryanalysis, numeroustoolkits have been developed over the years (Okolica andPeterson, 2010; Petroni et al., 2006; Walters and Petroni,2007). The Volatility Framework (Volatile Systems, 2011)however, has become the de facto standard for memoryanalysis, with its extensive plugin support allowing thesimple creation of new analysis tools. Numerous of thepreviously mentioned techniques have already been reim-plemented as plugins for Volatility, and it is for this reasonthat we chose to implement our work as a Volatility plugin.

8. Conclusion

This paper has presented an approach to determine thecontents and roles of user allocations. The user allocationsdescribed by the VAD tree have been shown to be a reliableframework upon which to describe the user space, anda detailed analysis of how various sources of metadata canbe used to further describe these user allocations has beenshown. A sample implementation of this approach wasthen provided, the results of whichwere validated and thendemonstrated to be able to consistently describe thecontents and roles of a high percentage of user allocationseven when dealing with complex applications.

8.1. Future work

While the provided plugin is capable of describing themajority of the user allocations of a process, it is not capableof describing all of them. More research is required tolocate the reasons for these unexplained user allocations,and update the plugin accordingly. Analysis of thesusceptibility of these techniques to malicious modificationis also required.

In addition, the presented work needs to be extended tosupport Windows Vista and the soon to be releasedWindows 8, as well as the 64-bit versions of the currentlysupported operating systems. This will involve the location

of any new artefacts, as well as the verification that theexisting artefacts have remained the same. The plugin canthen be updated to account for these additional versions ofWindows and validated against them.

References

Arasteh AR, Debbabi M. Forensic memory analysis: from stack and code toexecution history. Digital Investigation 2007;4(S1):114–25.

Betz C. MemParser, http://www.dfrws.org/2005/challenge/memparser.shtml; 2005.

Dolan-Gavitt B. The VAD tree: a process-eye view of physical memory.Digital Investigation 2007;4(S1):62–4.

Hejazi SM, Talhi C, Debbabi M. Extraction of forensically sensitive infor-mation from windows physical memory. Digital Investigation 2009;6(S1):121–31.

Microsoft. Debugging tools for Windows 32-bit version, http://msdn.microsoft.com/en-us/windows/hardware/gg463016; 2010.

Nasarre C. Detect and Plug GDI Leaks in your code with two powerfultools for Windows XP, http://msdn.microsoft.com/en-au/magazine/cc188782.aspx; 2003.

Okolica J, Peterson GL. Windows operating systems agnostic memoryanalysis. Digital Investigation 2010;7:48–56.

Petroni NL, Walters A, Fraser T, Arbaugh WA. FATKit: a framework for theextraction and analysis of digital forensic data from volatile systemmemory. Digital Investigation 2006;3(4):197–210.

Russinovich M. Windows sysinterals, http://technet.microsoft.com/en-us/sysinternals; 2012.

Russinovich ME, Solomon DA. Microsoft Windows internals. 4th ed.Redmond, Washington: Microsoft Press; 2005.

Simon M, Slay J. Recovery of Skype application activity data from physicalmemory. In: Proceedings of the 5th International Conference onAvailability, Reliability, and Security; 2010. p. 283–8.

Simon MP, Slay J. Recovery of pidgin chat communication artefacts fromphysical memory: a pilot test to determine feasibility. In: Proceedingsof the Sixth International Conference on Availability, Reliability andSecurity. IEEE; 2011. p. 183–8.

Volatile Systems. The volatility framework: volatile memory artifactextraction utility framework, https://www.volatilesystems.com/default/volatility; 2011.

Walters A, Petroni N. Volatools: integrating volatile memory forensicsinto the digital investigation process. Black Hat DC; 2007.

Andrew White is currently a full-time PhD Student at the InformationSecurity Institutewithin the Queensland University of Technology, locatedin Brisbane, Australia. His research primarily focuses on memory analysistechniques, and how they can be utilised to find evidence of malware.

Dr. Bradley Schatz is the director of the independent digital forensicsconsultancy Schatz Forensic, and an adjunct Associate Professor at theQueensland University of Technology (QUT). Dr. Schatz’ forensic practiceprovides forensic services primarily to the legal sector, where his advice issought in relation to matters ranging from intellectual property theft tocomputer intrusions. Bradley’s research currently focuses on digitalevidence in control systems, enterprise environments, and volatilememory.

Dr. Ernest Foo is an active researcher in the area of information andnetwork security. Dr. Foo has worked extensively in the field of electroniccommerce protocols investigating secure protocols for electronictendering and electronic contracting in the Australian constructionindustry. Dr. Foo has broad interests having published in the area of formalanalysis of privacy and identity management protocols as well asproposing secure reputation systems for wireless sensor networks.Recently Dr. Foo has been looking into research in the area of secureSCADA systems and memory forensics.

A. White et al. / Digital Investigation 9 (2012) S3–S12S12