The Evolution of File Carving Presenters: Muhammad Mohsin Butt(g201103010) COE589 Paper Presentation

Page 1: The Evolution of File Carving

The Evolution of File Carving

Presenters: Muhammad Mohsin Butt (g201103010)

COE589 Paper Presentation

Page 2: The Evolution of File Carving

Contents

• Introduction
• Background
• Traditional Recovery
• File Carving
• Smart Carver
• Conclusion

Page 3: The Evolution of File Carving

Introduction

• This survey presents various file carving techniques.
• File carving is a forensic technique that recovers data based on file structure and content.
  • No file system metadata is required.
• The main focus of this paper is on file carving techniques for fragmented data.

Page 4: The Evolution of File Carving

Background

• File System
  • The part of the OS that manages creation, deletion, allocation, and various other functions on files.
  • FAT32 and NTFS are the most common file systems on Windows.
• The basic unit of data storage on a disk is the cluster.
• Cluster sizes are usually multiples of 512 bytes.

Page 5: The Evolution of File Carving

Background

• Recovery in FAT32 (File Allocation Table)
• Files can be allocated in different ways:
  • Contiguous allocation
  • Linked allocation
  • Indexed allocation

Page 6: The Evolution of File Carving

Background

• Contiguous Allocation
• Linked Allocation

Page 7: The Evolution of File Carving

Background

• Indexed Allocation

Page 9: The Evolution of File Carving

Traditional Recovery Techniques

• These recovery techniques use the file system's metadata to recover data.
• Data storage in FAT32

Page 10: The Evolution of File Carving

Traditional Recovery Techniques

• Deletion and recovery in FAT32

Page 11: The Evolution of File Carving

File Carving

• What if no file system metadata is available?
• File carving recovers data without using file system information.
• Knowledge of the structure of the files to be recovered is used instead.
• File carving can be divided into two categories:
  • File carving for non-fragmented data
  • File carving for fragmented data

Page 12: The Evolution of File Carving

File Carving (First Generation)

• Performs well for non-fragmented data.
• In forensics, user data (images, documents, etc.) is what matters for recovery.
• The search pool is reduced by removing operating system files, which are detected using their MD5 hashes and keywords.
• Byte sequences at prescribed offsets are used to identify files.

Page 13: The Evolution of File Carving

File Carving (First Generation)

• Header and footer information of the files to be recovered is used.
  • A JPEG image's header cluster begins with the byte sequence FFD8.
  • A JPEG image's footer cluster contains the byte sequence FFD9.
• Some files do not have a footer.
  • A BMP image stores the file size (and hence the number of clusters) and other information in its header.
  • The number of unallocated clusters indicated by the BMP header is merged for recovery.
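The two first-generation strategies above can be sketched in a few lines. This is a minimal illustration, not the code of any real carver: it assumes the disk image is already in memory as `bytes`, that clusters are 512 bytes, and that files are unfragmented. The function names `carve_jpegs` and `carve_bmp` are hypothetical.

```python
import struct

JPEG_HEADER = b"\xff\xd8"  # start-of-image marker
JPEG_FOOTER = b"\xff\xd9"  # end-of-image marker


def carve_jpegs(image: bytes, cluster_size: int = 512) -> list[bytes]:
    """Header-to-footer carving: a header at a cluster start, up to the first footer."""
    carved = []
    for start in range(0, len(image), cluster_size):
        if image[start:start + 2] != JPEG_HEADER:
            continue
        end = image.find(JPEG_FOOTER, start + 2)
        if end != -1:
            carved.append(image[start:end + 2])
    return carved


def carve_bmp(image: bytes, start: int):
    """Header-plus-size carving: a BMP stores its file size at offset 2 (little-endian)."""
    if len(image) < start + 6 or image[start:start + 2] != b"BM":
        return None
    (size,) = struct.unpack_from("<I", image, start + 2)
    if size < 6 or start + size > len(image):
        return None  # implausible size or file runs past the image
    return image[start:start + size]
```

Both functions return wrong data as soon as a foreign cluster sits between header and footer, which is the fragmentation problem the rest of the deck addresses.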

Page 14: The Evolution of File Carving

File Carving (First Generation)

• The Foremost tool implements both header-to-footer carving and carving based on the header plus file-size information.
• Scalpel, built on the Foremost engine, improved the performance and memory usage of these file carving techniques.
• Both suffer degraded performance when the data is fragmented.

Page 15: The Evolution of File Carving

Fragmentation

• As files are edited, modified, and deleted, most hard drives become fragmented.
• Fragmentation also depends on the allocation strategy of the file system.
• Fragmentation in forensically important files such as email stores and Word documents is high. Why?
  • Because of constant editing, deletion, and addition, PST files are among the most fragmented.
• Wear-leveling algorithms in solid-state drives (SSDs) also cause fragmentation.

Page 16: The Evolution of File Carving

Fragmentation

Fragmented File Recovery

Page 17: The Evolution of File Carving

Graph Theoretic Carvers

• Provide recovery of fragmented files.
• Recovery is formulated as a Hamiltonian path problem.
• Solved using alpha-beta heuristics.

Page 18: The Evolution of File Carving

Hamiltonian Path Problem

• Given a set of clusters, find a permutation of these clusters that recovers the correct file.
• Identify pairs that are adjacent in the original document.
• Assign weights between clusters that represent the likelihood of one cluster following the other in the original file.
• The best permutation is the one that maximizes the sum of the candidate weights of adjacent clusters.

Page 19: The Evolution of File Carving

Hamiltonian Path Problem

• Formulated as a graph:
  • Vertices represent clusters.
  • Edges carry the weights between clusters.
• The problem reduces to finding a maximum-weight Hamiltonian path in this graph.
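To make the formulation concrete, the sketch below finds the maximum-weight Hamiltonian path by brute force. It is an illustration only: exhaustive search is feasible for a handful of clusters, and it is not the alpha-beta heuristic mentioned earlier. The function name `best_ordering`, the `weights[a][b]` matrix layout, and the assumption that the header cluster is known and comes first are all hypothetical.

```python
from itertools import permutations


def best_ordering(weights, header=0):
    """Return the cluster order with the largest sum of adjacent-pair weights.

    weights[a][b] is the likelihood that cluster b directly follows cluster a.
    The header cluster is assumed to be known and is fixed as the first vertex.
    """
    n = len(weights)
    others = [c for c in range(n) if c != header]
    best_path, best_score = None, float("-inf")
    for perm in permutations(others):
        path = (header, *perm)
        score = sum(weights[a][b] for a, b in zip(path, path[1:]))
        if score > best_score:
            best_path, best_score = path, score
    return list(best_path), best_score


# Four clusters whose true order is 0, 2, 1, 3.
weights = [
    [0, 1, 9, 1],
    [1, 0, 1, 8],
    [1, 7, 0, 1],
    [1, 1, 1, 0],
]
print(best_ordering(weights))  # ([0, 2, 1, 3], 24)
```

The number of permutations grows factorially with the cluster count, which is why practical carvers rely on heuristics instead.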

Page 20: The Evolution of File Carving

Assigning Weights

• Weight assignment is the key step in this type of carving.
• The Prediction by Partial Matching (PPM) technique is used for assigning weights.
• PPM works well for text.

Page 21: The Evolution of File Carving

Assigning Weights

• Weight assignment in images

Page 22: The Evolution of File Carving

K-Vertex Disjoint Path Problem

• The Hamiltonian path method assumes that all the clusters belong to the same file.
• In real systems, multiple files are fragmented together.
• Headers of the various files are identified from the pool of clusters.
• A graph is again formed using weights.
• k disjoint paths are then found in this graph using various algorithms, where k is the number of headers found in the previous step.
• Developed primarily for recovering images.

Page 23: The Evolution of File Carving

K-Vertex Disjoint Path Problem

• There are various algorithms for finding k disjoint paths.
• Unique Path (UP) algorithms provide the best performance.
  • Each cluster is assigned to only one file.
  • An incorrect assignment may result in two files being recovered incorrectly.
• Parallel Unique Path algorithm
• Shortest Path First algorithm

Page 24: The Evolution of File Carving

Parallel Unique Path (PUP)

• A variation of Dijkstra's single-source shortest path algorithm.

1. Start with k headers and a pool of clusters.
2. Find the best cluster match for each of the headers.
3. From the matches found in the previous step, take the best one and assign it to its header.
4. Remove the chosen cluster from the pool of available clusters.
5. Find the best match for the newly assigned cluster, and repeat from step 3 until all files are recovered.
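The numbered steps can be sketched as a greedy loop. This is a simplified illustration under stated assumptions: `cost(a, b)` is a hypothetical function where a lower value means cluster `b` is a better successor of cluster `a`, cluster identifiers are integers, and the loop stops when the pool is empty instead of detecting the end of each file.

```python
def parallel_unique_path(headers, pool, cost):
    """Grow all k files in parallel, always extending the globally best match."""
    paths = {h: [h] for h in headers}  # one path per header cluster
    available = set(pool)
    while available:
        # Steps 2-3: score the best successor of each file's current tail,
        # then keep only the single best (cost, header, cluster) triple.
        _, h, c = min(
            (cost(path[-1], c), h, c)
            for h, path in paths.items()
            for c in available
        )
        paths[h].append(c)   # assign the cluster to that file
        available.remove(c)  # step 4: each cluster belongs to one file only
    return paths
```

Because every candidate is rescored on each pass, this version is quadratic in the pool size; keeping the per-file best matches cached between iterations avoids most of that work.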

Page 25: The Evolution of File Carving

Parallel Unique Path (PUP).

Page 26: The Evolution of File Carving

Shortest Path First

• This algorithm is based on the idea that the best recoveries have the lowest average path costs.
• The average path cost is the sum of the weights between the clusters of a recovered file divided by the number of clusters.
• Takes one image at a time and reconstructs it.
• After reconstruction, the clusters used are not removed from the cluster pool.
• This process is repeated for all the images.
• Of all the recovered images, the one with the lowest average path cost is taken as the best recovery.
• Clusters associated with the best recovery are then removed.
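A minimal sketch of this loop follows. The assumptions are illustrative: `cost(a, b)` is a hypothetical pairwise cost (lower is better), every file is taken to span a known number of clusters `length`, and each trial reconstruction is a simple greedy walk.

```python
def greedy_path(header, pool, cost, length):
    """Reconstruct one file greedily and return (path, average path cost)."""
    path, avail, total = [header], set(pool), 0.0
    while len(path) < length and avail:
        nxt = min(avail, key=lambda c: (cost(path[-1], c), c))
        total += cost(path[-1], nxt)
        path.append(nxt)
        avail.remove(nxt)
    return path, total / len(path)


def shortest_path_first(headers, pool, cost, length):
    remaining, pool, recovered = list(headers), set(pool), {}
    while remaining:
        # Every remaining file is reconstructed against the same, unreduced pool.
        trials = {h: greedy_path(h, pool, cost, length) for h in remaining}
        # The recovery with the lowest average path cost is accepted ...
        best = min(remaining, key=lambda h: (trials[h][1], h))
        recovered[best] = trials[best][0]
        # ... and only its clusters are removed before the next round.
        pool -= set(recovered[best][1:])
        remaining.remove(best)
    return recovered
```

Each round reconstructs every unrecovered file again, which is consistent with this method being slower than PUP.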

Page 28: The Evolution of File Carving

Results

• Shortest Path First provides an accuracy of 88%.
• PUP provides an accuracy of 83% but is faster.
• Both require edge weights to be precomputed.
• For large hard drives, the need to compute weights by checking the likelihood between every pair of clusters is a major drawback.

Page 29: The Evolution of File Carving

BiFragment Gap Carving

• Most fragmented real-world data is bi-fragmented (split into two fragments).
• This technique works for files with a known header and footer.
• Files must be decodable or otherwise verifiable via their structure.
• Works by testing combinations of the clusters between an identified header and footer.
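The search can be sketched as follows: every possible run of foreign clusters (the gap) between header and footer is removed in turn, and the remainder is handed to a validator. The names `bifragment_gap_carve` and `is_valid_zlib` are hypothetical, and a zlib stream stands in here for any decodable file format.

```python
import zlib


def is_valid_zlib(data: bytes) -> bool:
    """Stand-in validator: the candidate must decompress without error."""
    try:
        zlib.decompress(data)
        return True
    except zlib.error:
        return False


def bifragment_gap_carve(clusters, header_idx, footer_idx, validate):
    """Try every gap position and length between the header and footer clusters."""
    region = clusters[header_idx:footer_idx + 1]
    n = len(region)
    whole = b"".join(region)
    if validate(whole):  # no gap at all: the file was not fragmented
        return whole
    for gap_len in range(1, n - 1):
        # The gap may never swallow the header (index 0) or the footer (index n - 1).
        for gap_start in range(1, n - gap_len):
            candidate = b"".join(region[:gap_start] + region[gap_start + gap_len:])
            if validate(candidate):
                return candidate
    return None
```

The number of candidates grows quadratically with the distance between header and footer, so the approach is practical mainly when that distance is small.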

Page 30: The Evolution of File Carving

BiFragment Gap Carving

Page 31: The Evolution of File Carving

Smart Carver

• Can work on fragmented and non-fragmented data.
• Supports a wide variety of file types.
• Preprocessing
  • Data clusters are decrypted or decompressed.
• Collating
  • Classification of clusters into various file types.
• Reassembly

Page 32: The Evolution of File Carving

Smart Carver (Preprocessing)

• Compressed and encrypted drives are decompressed/decrypted in this stage.
• Known clusters are removed from the disk based on file system metadata.
• This increases speed and reduces the amount of data for the next phases.
• Allocated files and operating-system-specific data can be pruned, since they have no use in forensics.

Page 33: The Evolution of File Carving

Smart Carver (Collating)

• Classifies the disk clusters as belonging to certain file types.
• Reduces the cluster pool when recovering files of each type.
• Keyword/pattern matching
  • Looks for byte sequences that reveal the type of a cluster.
  • E.g. <html> tags in a cluster suggest that it belongs to an HTML file.
• ASCII character frequency
  • A high frequency of ASCII characters indicates that the data is not video or image data.
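Both heuristics fit in a few lines. The sketch below is illustrative: the 0.9 threshold, the label names, and the function names `printable_ratio` and `collate` are assumptions chosen for the example, not values from the survey.

```python
def printable_ratio(cluster: bytes) -> float:
    """Fraction of bytes that are printable ASCII, tab, or newline characters."""
    if not cluster:
        return 0.0
    printable = sum(1 for b in cluster if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(cluster)


def collate(cluster: bytes, text_threshold: float = 0.9) -> str:
    """Assign a coarse type label to one cluster."""
    lowered = cluster.lower()
    # Keyword/pattern matching: HTML tags point to an HTML file.
    if b"<html" in lowered or b"</html>" in lowered:
        return "html"
    # ASCII frequency: mostly printable bytes means it is unlikely to be image or video data.
    if printable_ratio(cluster) >= text_threshold:
        return "text"
    return "binary"


print(collate(b"<HTML><body>hi</body>"))  # html
print(collate(b"hello world\n" * 10))     # text
print(collate(bytes(range(256))))         # binary
```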

Page 34: The Evolution of File Carving

Smart Carver (Collating)

• File fingerprints
  • Use the Byte Frequency Distribution (BFD) to determine the type of a file.
  • The BFD is generated by building a histogram of the byte values in the file.
  • A centroid model for each file type is created using the mean and standard deviation of each byte value's frequency.
  • These methods still have trouble differentiating JPEG from ZIP.
  • Still a hot research topic.
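A centroid model of this kind can be sketched as below. The distance measure (absolute deviation from the mean, scaled by the standard deviation plus a small `eps`) is an assumption made for illustration; the slides do not say which measure is used. All function names are hypothetical.

```python
from statistics import mean, pstdev


def bfd(data: bytes) -> list[float]:
    """Byte frequency distribution: normalised histogram over the 256 byte values."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1
    return [c / total for c in counts]


def build_centroid(samples):
    """Per-byte-value mean and standard deviation over the training samples."""
    dists = [bfd(s) for s in samples]
    means = [mean(col) for col in zip(*dists)]
    stds = [pstdev(col) for col in zip(*dists)]
    return means, stds


def distance(data: bytes, centroid, eps: float = 1e-3) -> float:
    """Assumed measure: deviation from the mean, weighted down where the type varies."""
    means, stds = centroid
    return sum(abs(f - m) / (s + eps) for f, m, s in zip(bfd(data), means, stds))


def classify(data: bytes, centroids: dict) -> str:
    """Pick the file type whose centroid is closest to the data's BFD."""
    return min(centroids, key=lambda name: distance(data, centroids[name]))
```

Compressed formats such as JPEG and ZIP both have nearly flat byte histograms, so their centroids end up close together, which matches the difficulty noted above.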

Page 35: The Evolution of File Carving

Smart Carver (Reassembly)

• Reassembly can be done by:
  • Finding the starting fragment of a file, which contains the header.
  • Merging clusters belonging to the same fragment.
  • Finding the fragmentation point, i.e. the last cluster in the current fragment.
  • Finding the starting point of the next fragment.
  • Finding the ending point of the last fragment, i.e. the last cluster, which contains the footer.

Page 36: The Evolution of File Carving

Smart Carver (Reassembly)

• Merging of similar clusters can be done in two ways.
• Keyword/dictionary
  • Applies when a word is formed across the boundary between two clusters.
  • E.g. one cluster ends with “he” and the next starts with “llo World”; the two can be merged.
• File structure
  • File structure can help in merging. A length field in a header indicates the length of the data that follows. E.g. in a PNG file, if the length value is k, then the CRC of the associated data is stored after k bytes of data. If the data in between matches that CRC, all the clusters in between can be merged; otherwise fragmentation is present.
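The PNG case can be sketched as a check on a single chunk. A PNG chunk consists of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC-32 computed over the type and data. The function name `chunk_is_contiguous` and the assumption that the candidate clusters are already concatenated into one buffer are illustrative.

```python
import struct
import zlib


def chunk_is_contiguous(buf: bytes, offset: int) -> bool:
    """Check the PNG chunk whose length field starts at `offset`.

    True means the stored CRC matches, so every cluster the chunk spans can be
    merged. False means the chunk is cut off or a fragmentation point lies inside it.
    """
    if offset + 8 > len(buf):
        return False
    (length,) = struct.unpack_from(">I", buf, offset)
    end = offset + 8 + length  # end of the chunk data, start of the CRC
    if end + 4 > len(buf):
        return False
    (stored_crc,) = struct.unpack_from(">I", buf, end)
    # The CRC covers the chunk type and data, not the length field.
    return zlib.crc32(buf[offset + 4:end]) == stored_crc
```

When the check fails, a carver still has to locate the fragmentation point inside the chunk, for example by trying candidate clusters until the CRC matches.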

Page 37: The Evolution of File Carving

Smart Carver (Reassembly)

• Sequential Hypothesis Testing Parallel Unique Path (SHT-PUP) algorithm for reassembly.
• A modification of the PUP algorithm.
• As in PUP, the best match is found for each of the k available headers, and the best of these is selected.
• The clusters immediately following the newly assigned cluster are then tested using sequential hypothesis testing until a fragmentation point is reached.

Page 38: The Evolution of File Carving

Smart Carver (Reassembly)

• Sequential hypothesis testing
  • This is done using the weight vector, i.e. the weights of all clusters in the pool.
  • Two hypotheses are tested:
    • One says that the clusters belong in sequence to the fragment.
    • The other says that they do not.
  • The ratio [equation not recovered] is used to test the hypothesis.
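The ratio was an image on the slide and is missing from this transcript. As a hedged reminder of the general form only, a sequential hypothesis test of this kind is usually written as a likelihood ratio over the weights $Y_1, \dots, Y_n$ observed so far. The labels $H_0$ (the clusters continue the fragment) and $H_1$ (they do not) and the thresholds $\tau^{+}$ and $\tau^{-}$ are naming assumptions, not taken from the slide:

```latex
\Lambda(Y) = \frac{\Pr(Y_1, \dots, Y_n \mid H_1)}{\Pr(Y_1, \dots, Y_n \mid H_0)},
\qquad
\text{decide }
\begin{cases}
H_1 & \text{if } \Lambda(Y) > \tau^{+},\\
H_0 & \text{if } \Lambda(Y) < \tau^{-},\\
\text{test the next cluster} & \text{otherwise.}
\end{cases}
```

Under this reading, accepting $H_1$ marks a fragmentation point, while accepting $H_0$ merges the tested clusters into the current fragment.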

Page 39: The Evolution of File Carving

Conclusion

• Various file carving methods for fragmented files are presented in the survey.
• Finding the best weights is still an open research issue.