
[IEEE 2013 Eighth International Conference on Availability, Reliability and Security (ARES) - Regensburg, Germany (2013.09.2-2013.09.6)] 2013 International Conference on Availability,


Model-Based Generation of Synthetic Disk Images for Digital Forensic Tool Testing

York Yannikos, Christian Winter

Fraunhofer Institute for Secure Information Technology SIT

Darmstadt, Germany

{firstname.lastname}@sit.fraunhofer.de

Abstract: Testing digital forensic tools is important to determine relevant tool properties like effectiveness and efficiency. Since many different forensic tool categories exist, different testing techniques and especially suitable test data are required. Considering test data for disk analysis and data recovery tools, synthetic disk images provide significant advantages compared to disk images created from real-world storage devices.

In this work we propose a framework for generating synthetic disk images for testing digital forensic analysis tools. The framework provides functionality for building models of real-world scenarios in which data on a storage device like a hard disk is created, changed, or deleted. Using such a model our framework allows simulating actions specified in the model in order to generate synthetic disk images with realistic characteristics. These disk images can then be used for testing the performance of forensic disk analysis and data recovery tools.

Keywords: Synthetic disk image generation; forensic tool testing; disk analysis tools; Markov chains; model-based simulation

I. INTRODUCTION

Today one challenge in digital forensics is how to deal with

large amounts of data. Forensic analysis tools must be capable

of processing hard disks with several terabytes efficiently

while keeping false positive and false negative rates low.

Since many open source as well as commercial tools exist for

different forensic analysis procedures it is crucial to measure

tool performance, including effectiveness and efficiency, by

using unified testing strategies.

Test data is as important as the testing strategy. Finding suitable test data for the very different categories of forensic analysis tools can be difficult: comprehensive tool testing requires extensive knowledge about the data, i. e. what kinds of traces, attacks, and evidence it contains. Unfortunately, little such information is available for most real-world data used for tool testing.

In this work we propose a framework which allows gener-

ating synthetic disk images to test digital forensic tools for file

system analysis and data recovery. Each synthetic disk image

generation process is based on a model which has to be built

in advance. This can also be done with our framework.

In Sect. II we give a short overview of the current state of forensic tool testing. Sect. III summarizes the advantages and disadvantages of using synthetic data and real-world data for testing. In Sect. IV a short definition of Markov processes is given. We describe our model-based approach in Sect. V, followed by the description of our framework architecture in Sect. VI. In Sect. VII we demonstrate the model building process using a sample scenario. An evaluation of our framework is given in Sect. VIII, followed by a conclusion in Sect. IX.

This work was supported by CASED (www.cased.de).

II. TOOL TESTING IN DIGITAL FORENSICS

Tool testing is especially important in digital forensics: investigators rely on results produced by forensic analysis tools, and the evidence they find is often used in court.

The National Institute of Standards and Technology created the

Computer Forensics Tool Testing (CFTT) program where com-

mercial as well as open source forensic tools are continuously

being tested [1]. A collection of recent test results for disk

imaging software, tools for forensic media preparation, hard-

and software write blockers, and mobile forensics software is

given in [2].

A methodology for validating and verifying digital forensic

tools with focus on search functionality has been proposed

in [3]. The authors use their methodology in [4] to present

a validation/verification framework for tools used to create

forensically sound copies of evidence data.

III. TEST DATA

Digital forensic tool testing needs suitable test data to

provide reliable results. Test data can either be data collected

from real-world samples like a seized storage device or it can

be synthetic data especially created for testing purposes. An

important question is whether real-world data or synthetic data

should be used for digital forensic tool testing.

A. Real-world Data

The main advantage of real-world data is that the data

characteristics are relevant for real-world cases. Hence real-

world data seems to be an ideal basis for testing digital forensic

tools. However, real-world hard disk data can be difficult to use

due to legal restrictions or availability. In principle, real-world data could easily be collected by simply buying used hard disks. Their previous owners often believe that their data has been deleted just by formatting the hard disk, by removing the partition layout, or sometimes even by using the trash bin of the installed operating system. However, with suitable forensic tools the chances of recovering most or all of the data are very high. Nevertheless, some countries have very strict privacy laws which state that private data of a third party must not be used for any purpose if the owner did not give permission or merely showed the intention to have the data deleted (even if the deletion itself was not successful).

2013 International Conference on Availability, Reliability and Security

978-0-7695-5008-4/13 $26.00 © 2013 IEEE

DOI 10.1109/ARES.2013.65

Other sources of real-world data are hard disk images

voluntarily provided by third parties. However, the number

and size of freely available real-world hard disk images is

rather small. Real-world hard disks or disk images from

forensic investigation cases are also rarely available for other

parties than law enforcement agencies. With more than 30

TB probably one of the largest collections of real-world data

available for research is described in [5].

Even if real-world hard disk data is available it may be

of only limited use for testing purposes since many different

categories of digital forensic tools exist. For instance a real-

world disk image which does not include many deleted files

provides no suitable test set for testing file recovery tools.

Additionally, no ground truth is known about such a disk

image: There is no better knowledge about what data exists

on the disk image than that provided by the most accurate

recovery tools.

B. Synthetic Data

The advantage of using synthetic disk images for digital

forensic tool testing is having full knowledge about what data

exists in the image. Even the origin and type of small file frag-

ments are known in contrast to real-world disk images where

such knowledge can hardly be achieved. Another advantage is

the availability: Synthetic disk images can be created on de-

mand, with user-defined characteristics, in arbitrary numbers,

and with arbitrary size. This enables a digital forensic tool

tester to generate the data with exactly those characteristics

required for specific test cases without having to search for

suitable real-world data.

Note that synthetic data is only as realistic as the underlying model used for data generation. The more accurately the model represents real-world characteristics, the closer a simulation based on that model is to the corresponding real-world process, and therefore the more realistic the generated data is. Building a suitable model with nearly real-world characteristics is crucial for synthetic data which aims to compete with real-world data.

In [6] the authors presented a tool set for generating

synthetic disk images using virtual machines. In a more recent

work the authors evaluated their tool set by using it for student

education in forensic computing [7]. The authors mention that

the process to generate a synthetic disk image with their tool

set has to be programmed in advance and therefore requires

specific programming skills from the user. In comparison, our

framework uses a high-level model based on Markov processes

for simulating user actions and provides native support for

file system creation/operations without using virtualization.

This allows creating very different synthetic disk images

for identical scenarios in an efficient way without requiring

additional programming.

A small set of synthetic disk images usable for testing

purposes is available at [8]. No information is given about

how these disk images have been created so we assume that

standard Unix tools have been used.

IV. MARKOV PROCESSES

Our framework for synthetic disk image generation is based

on 3LSPG which we proposed in [9]. With our framework we

first build a model where we use Markov processes to simulate

real-world actions for a specific scenario: We define a set of

actions to be simulated together with a probability for each

action to be performed. For possible transitions between two

actions, i. e. when an action may be performed immediately

after another, we calculate the transition probabilities.

Each Markov process we use within a model is defined by an aperiodic irreducible discrete-time Markov chain consisting of a non-empty finite set of states S (i. e. a set of actions), a state probability vector π (i. e. the probability for each action to be performed), and a state transition probability matrix P (i. e. the probabilities of moving from one action to another). Additionally, π = π · P holds, i. e. π is the stationary distribution of the chain. π can easily be calculated if P is given; however, calculating P for a given π is not trivial.

When building a model for simulating real-world actions

we think that it is more reasonable to define states (actions)

with a given state probability rather than with a given state

transition probability. Therefore, we assume that π is given

and a suitable P has to be found. For this problem we

presented an efficient solution in [10].
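The relation π = π · P can be illustrated with a short sketch. The following Python code is a minimal illustration and not the method of [10]: it approximates π for a given P by repeated multiplication (power iteration), and it shows the trivial construction of a P for a given π in which every row equals π. The function names and the two-state example are assumptions made for illustration, and the trivial construction ignores any restriction on which transitions are allowed.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate pi with pi = pi * P by power iteration.

    Assumes P is the row-stochastic matrix of an aperiodic irreducible chain.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def trivial_transition_matrix(pi):
    """Return the matrix whose rows all equal pi; it satisfies pi = pi * P."""
    return [list(pi) for _ in pi]


# Two-state example with values chosen for illustration only.
P_example = [[0.9, 0.1],
             [0.5, 0.5]]
pi_example = stationary_distribution(P_example)
```

For the two-state example the balance condition 0.1 · π1 = 0.5 · π2 gives π = (5/6, 1/6), which the iteration approaches.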

V. MODEL-BASED SYNTHETIC DISK IMAGE GENERATION

To generate synthetic disk images with user-specified char-

acteristics we propose a framework which can be used for

modeling and simulating specific real-world actions on a hard

disk responsible for data creation or modification. Examples

for such actions are downloading, copying, removing, or over-

writing a file. The real-world actions are typically performed

by one or several users of a computer system where the hard

disk is installed.

In order to generate a synthetic disk image with our frame-

work, we first build a model which describes a specific real-

world scenario including a set of actions. After building the

model we use our framework to run a simulation based on

the model, i. e. simulating all actions specified in the model,

whereby a synthetic disk image is generated which can be

used for testing purposes. This process is depicted in Fig. 1.

In the following we propose basic components for modeling

real-world actions on hard disks which are relevant from a

digital forensic point of view.

A. Components for Model Building

Our framework allows creating specific models representing

different real-world scenarios. It provides three component

categories for model building which are described in the

following.

Fig. 1. Process for generating a synthetic disk image based on a real-world scenario

• Actions: An action is a state i within a process where

a specific activity is performed. In Markov processes

each state occurs with a specific probability πi and is

successor and predecessor of at least one other state.

Therefore, in a model built with our framework no state

probability is equal to 0. Additionally, for each possible

transition between two states a state transition probability

0 < pij ≤ 1 is specified. In linear processes state

probabilities and state transition probabilities are not

relevant and therefore not defined.

• Processes: A process denotes a sequence of actions to

be performed in a simulation. The actions are performed

either in a linear process, i. e. only a single time and

one after another, or in a Markov process, i. e. a spec-

ified number of times and based on the state transition

probabilities. Processes can either be globally active or

inactive which means that they are either run during the

simulation or ignored at run time.

– Linear Process: A linear process denotes a sequence

of actions including one start action and one end

action. Each action is performed exactly one time

and has exactly one predecessor (except the start

action) and one successor (except the end action).

Linear processes are useful for performing prepa-

ration steps for a subsequent Markov process or

finishing tasks after a simulation. For example a

linear process can be used for initially creating an

empty disk image file of a specific size and then

formatting the image file with a file system before

starting to simulate file system activity using Markov

processes.

– Markov Process: Markov processes are discrete-time

processes used for simulating actions which result

in synthetic data. Each Markov process within a

model is defined as a set of states (actions) with

state transitions and corresponding state probabilities

as well as state transition probabilities. The state

probabilities are defined while building the model

and used to calculate the state transition probabilities

with linear programming as shown in [10].

• Global Objects: A global object denotes a specific object

which is accessible or reachable at any time and by any

action within a model. For example when simulating

actions on a synthetic disk image, the disk image file

itself would be defined as global object.
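The three component categories can be pictured as simple data structures. The sketch below is an assumption about how such a model might be represented in Python; the class names, fields, and the validate method are hypothetical and do not reflect the framework's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Action:
    name: str
    # State probability pi_i; None for actions of a linear process.
    probability: Optional[float] = None


@dataclass
class GlobalObject:
    name: str
    kind: str  # e.g. "disk_image" or "file_pool" (illustrative labels)


@dataclass
class Process:
    name: str
    kind: str  # "linear" or "markov"
    actions: List[Action] = field(default_factory=list)
    # (from_action, to_action) -> p_ij; only used by Markov processes.
    transitions: Dict[Tuple[str, str], float] = field(default_factory=dict)
    active: bool = True  # inactive processes are ignored at run time

    def validate(self) -> None:
        """Check the constraints stated for linear and Markov processes."""
        if self.kind == "markov":
            probs = [a.probability for a in self.actions]
            if any(p is None or p <= 0 for p in probs):
                raise ValueError("every Markov state needs a probability > 0")
            if abs(sum(probs) - 1.0) > 1e-6:
                raise ValueError("state probabilities must sum to 1")
            if any(not 0 < p <= 1 for p in self.transitions.values()):
                raise ValueError("transition probabilities must be in (0, 1]")
        elif self.kind == "linear":
            if any(a.probability is not None for a in self.actions):
                raise ValueError("linear processes define no probabilities")
        else:
            raise ValueError("unknown process kind: " + self.kind)
```

A Markov process with a zero state probability is rejected by validate, mirroring the requirement that no state probability is equal to 0.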

B. Model Building Process

In order to simulate a real-world scenario with our frame-

work a suitable model must be built using the previously

described components. In the following the steps for model

building are described.

1) Subject definition: In the first step the simulation subjects performing different actions are defined. In the corresponding real-world scenario the subjects are e. g. users of the computer system whose hard disk the disk image file is created from.

2) Global object definition: In the second step all global

objects required for performing actions are defined. At

least the resulting synthetic disk image file has to be

defined as global object.

3) Process definition: In this step the number and type of

processes within the model are defined.

4) Process sequence definition: In this step the sequence

of all defined processes is defined. Linear processes

are typically used for pre- or post-processing Markov

processes where the simulation is done. Within a model

each process can have at most one predecessor and one

successor. A linear process may also succeed another

linear process: This enables creating several linear pro-

cesses with different outcome which could be set active

or inactive at run time.

5) Action definition: The next step in building a model for

synthetic disk image generation is defining the actions

which are to be simulated. Such actions are performed

on the synthetic disk image and are typically file system

operations or direct writes to the image file. Each action

must be part of a previously defined process.

6) Action transition definition: After subject and action

definition the possibilities to transit from one action to

another are defined. Actions can only transit from one

to another within the same process.

7) Probability derivation: In this step the probabilities for

actions to be performed are defined (state probabilities),

e. g. by deriving them from statistics about file system

operations, from studies about user behavior, or from

assumptions about fictive scenarios.

8) Probability calculation: Based on the previous step the

probabilities to transit from one action to another are

calculated (state transition probabilities). This is done


automatically within the framework using linear pro-

gramming. Also, the feasibility of the Markov processes

is checked.
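Steps 6 to 8 can be sketched in code. The framework calculates the transition probabilities with linear programming [10]; the sketch below deliberately swaps in a different and simpler technique, iterative proportional fitting, only to illustrate what is being computed: a matrix P that uses only the allowed transitions and satisfies π = π · P. All names are hypothetical. A large residual from stationarity_error suggests that no feasible P exists for the given transitions.

```python
def transition_matrix_ipf(pi, allowed, iterations=1000):
    """Find P with pi = pi * P using only allowed transitions (IPF sketch).

    pi      -- state probabilities, all > 0, summing to 1
    allowed -- allowed[i][j] is True if action j may directly follow action i
    """
    n = len(pi)
    # flow[i][j] approximates the joint probability of "i followed by j".
    flow = [[1.0 if allowed[i][j] else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        for i in range(n):  # scale row i so that it sums to pi[i]
            row_sum = sum(flow[i])
            if row_sum == 0.0:
                raise ValueError("state %d has no allowed successor" % i)
            flow[i] = [x * pi[i] / row_sum for x in flow[i]]
        for j in range(n):  # scale column j so that it sums to pi[j]
            col_sum = sum(flow[i][j] for i in range(n))
            if col_sum == 0.0:
                raise ValueError("state %d has no allowed predecessor" % j)
            for i in range(n):
                flow[i][j] *= pi[j] / col_sum
    # Convert joint flows into conditional transition probabilities.
    return [[x / sum(row) for x in row] for row in flow]


def stationarity_error(pi, P):
    """Return the largest absolute difference between pi * P and pi."""
    n = len(pi)
    product = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return max(abs(a - b) for a, b in zip(product, pi))


# Example: four actions, every transition allowed except repeating an action.
example_pi = [0.2, 0.4, 0.333, 0.067]
example_allowed = [[i != j for j in range(4)] for i in range(4)]
example_P = transition_matrix_ipf(example_pi, example_allowed)
```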

VI. FRAMEWORK ARCHITECTURE

Our framework incorporates functionality for model build-

ing and simulation. Global Objects and Actions are im-

plemented as modules using a unified interface which we

specified to allow an easy extension of the framework. The

framework has been implemented in Java SE 1.7 to ensure

cross-platform compatibility.

In the following we describe the specific framework mod-

ules we implemented:

A. Global Object Modules

The following Global Object modules have been imple-

mented:

• Synthetic Disk Image File: A synthetic disk image file

denotes a file which is initially created as a byte sequence

consisting of a user-specified number of null bytes. An

image file can (but does not have to) be formatted with

a file system in order to write a file/directory structure

on the image file during a simulation. Additionally, raw

binary data without any file system information can be

written directly to the image file. After applying a model

and running all linear/Markov processes, the resulting

synthetic disk image file can be used for testing.

• File Pool: A file pool denotes a source of files to be used

during simulation. Actions performed on a synthetic disk

image during a simulation typically include writing files

to the image. With our framework such files can either

be downloaded from the Internet or taken from a local

file pool, e. g. an intranet file storage server or specific

directories on a local hard disk.
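Creating the initial image file of null bytes can be sketched in a few lines. The following Python code is an illustrative assumption, not the framework's Java implementation; the function name and file name are hypothetical.

```python
import os
import tempfile


def create_empty_image(path, size_bytes):
    """Create an image file consisting of size_bytes null bytes."""
    with open(path, "wb") as image:
        # Extending an empty file with truncate() yields null bytes; many
        # platforms store the result as a sparse file.
        image.truncate(size_bytes)


# Usage: create a 1 MiB image in a temporary directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, "synthetic.img")
    create_empty_image(image_path, 1024 * 1024)
    image_size = os.path.getsize(image_path)
    with open(image_path, "rb") as image:
        only_null_bytes = image.read() == bytes(image_size)
```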

B. Action Modules

The following Action modules have been implemented:

• Wait: The wait action is a placeholder action where

nothing relevant happens.

• Create File System: This action formats a disk image with

a specific file system. Currently we implemented native

support for the FAT16, FAT32, and ext2 file systems.

• Create File: This action writes a file to the disk image

using the underlying file system. The file can either be

taken from a file pool, or can contain predefined static

data or random data.

• Delete File: This action removes a file from the disk image using the underlying file system. A file removed by this action therefore still exists on the disk image (unless it has been overwritten), but the corresponding file references in the file system meta data have been removed or marked as deleted.

• Write Raw Data: This action is the same as the Create File action but does not use the file system, i. e. writing data using this action is equivalent to using the dd tool to write to a block device. We can use this e. g. to simulate disk corruption.

• Download File: This action downloads a random file

from the Internet and stores it on the disk image like

the Create File action. Examples for a random file are

freely available pictures from an online picture database

or binary files from open source software repositories.

• Export Disk Image: This action exports the generated

synthetic disk image.

• Export Disk Image Map: This action exports an image

map which provides information about the location of

each file or file fragment on the generated synthetic disk

image. The image map allows a very detailed comparison

of analysis results from different forensic tools with the

actual data existing on the disk image. This provides a

significant advantage compared to using real-world disk

images.

• Import Disk Image: This action imports an already gener-

ated synthetic disk image with its image map (if available)

for further modification.
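The interplay of the Write Raw Data action and the image map can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the image map layout (a list of dictionaries with offset, length, and source) are hypothetical, since the actual map format is not specified here.

```python
import os
import tempfile


def write_raw_data(image_path, offset, data, image_map, source):
    """Write data at a byte offset, bypassing any file system, and log it."""
    with open(image_path, "r+b") as image:
        image.seek(offset)
        image.write(data)
    # Record where the fragment was placed so that tool results can later be
    # compared against the known content of the image.
    image_map.append({"offset": offset, "length": len(data), "source": source})


# Usage: write one fragment into a 4 KiB image of null bytes.
with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, "synthetic.img")
    with open(image_path, "wb") as image:
        image.write(bytes(4096))
    image_map = []
    write_raw_data(image_path, 512, b"FRAGMENT", image_map, "example.bin")
    with open(image_path, "rb") as image:
        content = image.read()
```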

C. General User Interface

The framework provides a user interface for model building

and simulation. The user interface also includes a logging

component which provides detailed information for each per-

formed action and process together with a time stamp. The

data provided by the logging component is also used for

creating the disk image map if required. In Fig. 4 a screenshot

of the user interface is shown.

VII. GENERATING A DISK IMAGE FOR A SAMPLE

SCENARIO

In the following we describe a sample real-world scenario

and provide a suitable model in order to generate a synthetic

disk image with our framework.

A. Scenario

Our example scenario focuses on testing different file

carvers. Many commercial as well as open source file carvers

exist – however, there is no comprehensive comparison of

their performance available. File carving tools basically search

raw binary data for header/footer byte signatures of previously

deleted files in order to recover them. While simple file carvers

assume that any data between a found header and footer is not

fragmented and belongs to the same file, advanced file carvers

are also able to handle a specific amount of fragmentation.
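The behavior of a simple file carver can be sketched as follows. The example assumes JPEG signatures (start-of-image bytes FF D8 FF and end-of-image bytes FF D9) purely for illustration; the function name is hypothetical, and the sketch treats everything between a header and the next footer as one contiguous file, which is exactly the assumption that fails for fragmented files.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker followed by a marker prefix
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve(data, header=JPEG_HEADER, footer=JPEG_FOOTER):
    """Return (start, end) byte ranges from each header to the next footer."""
    results = []
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        end = data.find(footer, start + len(header))
        if end == -1:
            break
        end += len(footer)  # make the range include the footer bytes
        results.append((start, end))
        pos = end
    return results
```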

In our sample scenario we assume that disk images with the

following properties provide suitable test data for file carvers:

• The disk image contains a significant number of deleted

files without any reference from a potentially existing file

system.

• The deleted files cover a large number of different file types.

• A significant proportion of the deleted files is fragmented.

• For fragmented files the number of fragments lies within

a specific interval, e. g. 2 to 10 fragments.

Fig. 2. Example set of actions to create a synthetic disk image with fragmented files

Fig. 2 shows a possible first set of actions performed on a

synthetic disk image in order to create deleted and fragmented

files:

1) Create an initial empty disk image file

2) Format the disk image file with a file system (creating

a reserved area for file system meta information at the

beginning of the image file)

3) Download a random file from the Internet and store it

on the disk (creating a reference in the file system)

4) Create a second file on the disk, e. g. by copying it from

a specific source (again creating a reference in the file

system)

5) Delete the downloaded file from the disk (removing the

file system reference but not overwriting the file)

6) Download a third file and store it on the disk (creating

a reference in the file system)

7) Write a small number of random data fragments to the

disk without respecting file system structure (overwriting

any existing data and therefore fragmenting files)

After applying such a set of actions the resulting disk image

can be used to test the performance of different file carvers.

However, all but the first two steps have to be repeated several

times to create synthetic disk images with a reasonable size

and a significant number of files and file fragments on it.

B. Model

For the described sample scenario we use our framework

to build a suitable model. Based on that model we can then

generate a synthetic disk image file which can be used as

test data for different file carvers. Additional meta information

should also be generated along with the disk image file,

e. g. to reuse the disk image file in another simulation with

our framework. In the following we apply the previously

introduced steps of the model building process.

First we define a typical user of a computer system with an installed hard disk as the subject responsible for the actions to be simulated. For our model the relevant actions performed by the user are those file system operations where data is written to or deleted from the hard disk.

Then we define two global objects: a synthetic disk image file on which all simulated actions are performed, and a file pool which can be used as a source of files to write to the disk image.

Based on the scenario properties we specify a sequence of

three processes with different actions and action transitions

which can be used to generate a disk image suitable for testing

file carvers:

1) Linear process

   • Action la11: Create File System

2) Markov process

   • Action ma11: Create File

   • Action ma12: Download File

   • Action ma13: Delete File

   • Action ma14: Write Raw Data

3) Linear process

   • Action la21: Export Disk Image

   • Action la22: Export Disk Image Map

Fig. 3 gives an overview of the model, including the sequence of the three processes and the possible transitions between the actions of the Markov process.

Fig. 3. Sample model to generate a synthetic disk image for the described scenario

Note that the creation of a file system and using file system

operations like Create File or Delete File are not really

required to generate a disk image which fits our scenario

– just using the Write Raw Data action with a suitable

data source would be enough. However, using a file system and file system operations makes storing files on the disk image significantly easier, since we do not have to handle bookkeeping such as file position offsets, file sizes, etc. ourselves. Additionally, using a file system provides realistic, file-system-specific fragmentation.

The last steps of building the model are deriving suitable

probabilities for the actions in the Markov process as well as

calculating the action transition probabilities. For the actions

ma11, . . . ,ma14 of the Markov process we define the proba-

bilities π1, . . . , π4 as elements of the state probability vector

π as follows:

π = (π1, . . . , π4) = (0.2 , 0.4 , 0.333 , 0.067)

Note that these probabilities are exemplary and not based on

any statistics of hard disk operations caused by user behavior

or likewise. We plan to derive the probabilities from real-world

observations in our future research.

Based on the state probability vector π we use our frame-

work to calculate feasible transition probabilities and define

a state transition probability matrix P which satisfies the

previously mentioned requirement π = π · P :

\[
P = \begin{bmatrix}
0.067 & 0.799 & 0.067 & 0.067 \\
0.134 & 0.067 & 0.732 & 0.067 \\
0.386 & 0.48 & 0.067 & 0.067 \\
0.067 & 0.799 & 0.067 & 0.067
\end{bmatrix}
\]

Note that this specific matrix P is also exemplary and no

unique solution. By calculating P we complete the model

building process. Fig. 4 shows a screenshot of the model

builder interface of our framework using the described ex-

ample.
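The requirement π = π · P can be checked numerically for the example values. The following Python sketch uses the published π and P; the variable names are chosen for illustration. Because the published values are rounded to three decimals, the product matches π only up to a small deviation.

```python
pi = [0.2, 0.4, 0.333, 0.067]
P = [
    [0.067, 0.799, 0.067, 0.067],
    [0.134, 0.067, 0.732, 0.067],
    [0.386, 0.480, 0.067, 0.067],
    [0.067, 0.799, 0.067, 0.067],
]

# Each row of a transition matrix must sum to 1.
row_sums = [sum(row) for row in P]

# Component j of pi * P is the sum over i of pi[i] * P[i][j].
pi_times_P = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]

# Largest difference between pi * P and pi, expected to be rounding-sized.
max_deviation = max(abs(a - b) for a, b in zip(pi_times_P, pi))
```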

C. Simulation

After the model has been built with our framework the simulation process has to be configured, i. e. the properties for each action, subject, and global object have to be defined.

This includes e. g. defining how many actions a subject should

perform during the simulation or specifying a source for files

to be written on the disk image file during a Create File action.

Finally, when the configuration has been finished the syn-

thetic disk image generation process, i. e. the simulation, can

be started and then runs without requiring any user interaction.

When the simulation finishes a synthetic disk image file has

been generated with the properties specified by the scenario-

based model.
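The core of such a simulation is a random walk over the actions of the Markov process. The sketch below is an assumption about how this could look, not the framework's implementation: the function name, the seed parameter, and the choice of a random start state are hypothetical, and the actions are only recorded by name instead of being executed on a disk image.

```python
import random

ACTIONS = ["Create File", "Download File", "Delete File", "Write Raw Data"]
PI = [0.2, 0.4, 0.333, 0.067]
P = [
    [0.067, 0.799, 0.067, 0.067],
    [0.134, 0.067, 0.732, 0.067],
    [0.386, 0.480, 0.067, 0.067],
    [0.067, 0.799, 0.067, 0.067],
]


def simulate_actions(actions, transition_matrix, steps, seed=None):
    """Return a sequence of action names sampled from the Markov chain."""
    rng = random.Random(seed)
    state = rng.randrange(len(actions))  # arbitrary start state
    sequence = []
    for _ in range(steps):
        sequence.append(actions[state])
        # Pick the next action according to the row of the current action.
        state = rng.choices(range(len(actions)),
                            weights=transition_matrix[state])[0]
    return sequence
```

Over a long run the relative frequency of each action approaches its state probability, e. g. roughly 40% of the sampled actions are Download File.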

VIII. EVALUATION

To evaluate our framework we used the previously described

model for the sample scenario with a slight modification: We exchanged the Download File action with a second Create File action to avoid potential delays or timeouts while downloading files from Internet sources.

For the simulation process we used the following configu-

ration:

• Synthetic disk image file size: 1500 MB

• File system to be used: FAT32

• File pool location: 2 directories with a total of 2219 files

Fig. 5. Evaluation results for 10 simulation runs where a synthetic disk image was generated (average values shown as gray line). Values as labeled in the plots, listed for simulation runs 1 to 10:

(a) Required time in seconds for each simulation run: 31, 36, 44, 34, 34, 32, 37, 32, 38, 35

(b) Used space of the generated synthetic disk image after each simulation run, in percent: 46, 61, 87, 50, 51, 48, 66, 43, 68, 46

(c) Number of allocated (light gray) and deleted (dark gray) files on the generated synthetic disk image after each simulation run: allocated 167, 187, 232, 180, 165, 161, 214, 152, 203, 157; deleted 140, 112, 70, 115, 128, 133, 95, 137, 107, 138

For the Write Raw Data action we used the following

configuration:

• Data source: 1 random file from file pool

• Number of fragments to write: between 2 and 10, chosen

randomly

• Size of each fragment: between 32 bytes and 16 KiB,

chosen randomly

• Location on the disk image to write the data to: chosen

randomly, excluding areas containing file system meta

data

For the Create File actions the same file pool was specified

as source.
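The Write Raw Data configuration above can be sketched as a small planning function. This is an illustration under a simplifying assumption: file system meta data is treated as one reserved area at the beginning of the image, ending at a hypothetical reserved_end offset, whereas a real file system also keeps meta data elsewhere. The function and parameter names are assumptions.

```python
import random


def plan_fragments(image_size, reserved_end, rng,
                   min_count=2, max_count=10,
                   min_size=32, max_size=16 * 1024):
    """Return a list of (offset, size) pairs for raw fragment writes."""
    count = rng.randint(min_count, max_count)  # 2 to 10 fragments
    plan = []
    for _ in range(count):
        size = rng.randint(min_size, max_size)  # 32 bytes to 16 KiB
        # Keep the fragment inside the image and outside the reserved area.
        offset = rng.randint(reserved_end, image_size - size)
        plan.append((offset, size))
    return plan
```

For a 1500 MB image, plan_fragments(1500 * 1000 * 1000, 1024 * 1024, random.Random()) yields between 2 and 10 write positions behind an assumed 1 MiB reserved area.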


Fig. 4. Screenshot of the model builder interface of the framework

All file system operations were performed using specific libraries which we developed for our framework. At the time of writing, our framework supports read/write operations for ext2, FAT16, and FAT32.

With this configuration we did 10 simulation runs and measured the required time, the used disk space of each resulting synthetic disk image, and the number of allocated and deleted files on each disk image. All simulation runs were performed on a PC with a 2.8 GHz dual-core CPU and 4 GiB system memory, running Ubuntu Linux. The individual results for each simulation run are shown in Figs. 5(a), 5(b), and 5(c).

We were able to completely finish each simulation run in

about 35 seconds on average. After each run an individual

synthetic disk image was created, containing an average of 182

allocated and 118 deleted files, where the former were using

57% of the image disk space. This shows that a synthetic disk

image containing a reasonable amount of data can be generated

in short time with our framework.
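The averages quoted above follow from the per-run values labeled in Fig. 5. The short Python check below recomputes them; the variable names are chosen for illustration.

```python
# Per-run values as labeled in Fig. 5(a), 5(b), and 5(c).
seconds = [31, 36, 44, 34, 34, 32, 37, 32, 38, 35]
used_percent = [46, 61, 87, 50, 51, 48, 66, 43, 68, 46]
allocated = [167, 187, 232, 180, 165, 161, 214, 152, 203, 157]
deleted = [140, 112, 70, 115, 128, 133, 95, 137, 107, 138]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


averages = {
    "seconds": mean(seconds),            # 35.3, i.e. about 35 seconds
    "used_percent": mean(used_percent),  # 56.6, i.e. about 57%
    "allocated": mean(allocated),        # 181.8, i.e. about 182 files
    "deleted": mean(deleted),            # 117.5, i.e. about 118 files
}
```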

IX. CONCLUSION

In this work we provided a short overview about the current

state of forensic tool testing. We described the advantages of

using synthetic data for testing instead of real-world data and

showed that only little work has been done regarding synthetic

test data generation. To support digital forensic tool testing

we proposed a framework which allows generating synthetic

disk images suitable for testing. In order to use the framework

a real-world scenario has to be chosen initially in which

users perform actions which result in suitable test data, e. g. a

computer system where users create, delete, and modify files

on a hard disk. Our framework provides functionality to build a model for such a scenario and is able to simulate the actions contained in the model. In this way a synthetic disk image can be created which has the characteristics specified in the model.

Additionally, an image map can be created which contains

detailed information about the content of each single fragment

of the disk image. The synthetic disk image can then be used

to test the performance of disk analysis and data recovery tools

while the image map provides a reference for comparing tool

analysis results.

Additional work has to be done regarding modeling relevant

real-world scenarios. Especially deriving realistic probabilities

for the different actions within a model is important. A survey

about actions performed by users of different file systems

and operating systems could be interesting and may help to

significantly improve the model building process.

REFERENCES

[1] National Institute of Standards and Technology (NIST), “Computer Forensic Tool Testing (CFTT) Program,” 2013. [Online]. Available: http://www.cftt.nist.gov/

[2] National Institute of Standards and Technology (NIST), “Computer Forensics Tool Testing Handbook,” 2012. [Online]. Available: http://www.cftt.nist.gov/CFTT-Booklet-Revised-02012012.pdf

[3] Y. Guo, J. Slay, and J. Beckett, “Validation and verification of computer forensic software tools–searching function,” Digital Investigation, vol. 6, no. Supplement 1, pp. S12–S22, 2009, the Proceedings of the Ninth Annual DFRWS Conference.

[4] Y. Guo and J. Slay, “A Function Oriented Methodology to Validate and Verify Forensic Copy Function of Digital Forensic Tools,” in International Conference on Availability, Reliability, and Security (ARES). IEEE, 2010, pp. 665–670.

[5] S. Garfinkel, “Lessons learned writing digital forensics tools and managing a 30TB digital evidence corpus,” Digital Investigation, vol. 9, pp. S80–S89, 2012.

[6] C. Moch and F. Freiling, “The forensic image generator generator (forensig2),” in IT Security Incident Management and IT Forensics, 2009. IMF’09. Fifth International Conference on. IEEE, 2009, pp. 78–93.

[7] C. Moch and F. C. Freiling, “Evaluating the Forensic Image Generator Generator,” in Digital Forensics and Cyber Crime. Springer, 2012, pp. 238–252.

[8] B. Carrier, “Digital Forensics Tool Testing Images,” 2010. [Online]. Available: http://dftt.sourceforge.net/

[9] Y. Yannikos, F. Franke, C. Winter, and M. Schneider, “3LSPG: Forensic tool evaluation by three layer stochastic process-based generation of data,” in Computational Forensics: Fourth International Workshop, IWCF 2010, ser. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, Nov 2010.

[10] Y. Yannikos, C. Winter, and M. Schneider, “Synthetic Data Creation for Forensic Tool Testing: Improving Performance of the 3LSPG Framework,” in Seventh International Conference on Availability, Reliability and Security (ARES). IEEE, 2012, pp. 613–619.