
Introduction Collecting, Searching and Sorting evidencenflaw/EIE4114Sem12019-20/part2s.pdf · Recovering data is the first step in analyzing an investigation’s data ... E.g., hard



Collecting, Searching and Sorting evidence

Introduction
  Recovering data is the first step in analyzing an investigation’s data
  Recent studies: big volume of data
  Law enforcement: crime investigation
    Police: Cyber Security and Technology Crime Bureau
  Commercial sector: data breaches (theft of corporate data)
    PwC HK: Forensic Technology Solutions

Introduction
  Each suspect in a criminal case: 5 hard disks, 140 CDs or DVDs, 4 memory cards and USB sticks
  Business cases: 31 hard disks, 14 terabytes of data for one case
  FBI’s regional computer forensics lab (2013): 5973 TB of data from 7273 exams
  Audit report (2015): 1566 outstanding cases, 57% of which had waited between 91 days and over 2 years

Introduction
  Recent studies: anti-forensics tools delete files, overwrite clusters multiple times, and create large volumes of data of certain types
  Discussion:
    Collecting evidence: file system, file deletion
    Techniques for recovering files
    Existing tools and challenges


Storage media
  Media where data is stored for long-term preservation
    Cf. primary memory (RAM and cache memories are short-term storage)
    E.g., hard drives, USB flash drives, memory cards
  Physical size of storage media
    E.g., the C: partition is 200 GB but the physical size is 250 GB: is there another partition?

Example
  A hard drive contains partitions; a partition contains a file system
  File system: structure used to control how data is stored
  Common file systems:
    Ext4: common on Linux
    NFS: common for network storage
    FAT32: common on surveillance video and thumb drives
    NTFS: Windows systems

File Storage
  Files are stored in a file system
  Files: sequences of binary data (bits and bytes)
  Data is stored in clusters or blocks
  Blocks corresponding to a file may be
    Stored contiguously on disk
    Split and stored all over the disk

Example: NTFS
  An NTFS file system begins with a metadata structure called the partition boot sector
  The partition boot sector records where the master file table (MFT) is located
    MFT: dictionary of all files and folders on the NTFS partition
  For each file or folder, the MFT record contains information about the name and the actual file data
  The MFT record describes which clusters on the hard drive house the file


Example: NTFS
  Create a file:
    Get an MFT record
    Small file: stored in the MFT record itself
    Large file: allocate clusters and store the file in those clusters
  Delete a file: only the MFT record is deleted; the clusters holding the data are left as they are

Example: FAT Storage
  Files: f1.doc, f2.txt, f3.jpg

  Root table entries
    Filename   Starting block
    f1.doc     102
    f2.txt     106
    f3.jpg     110

  FAT
    Block   Next block
    101     Free
    102     103
    103     104
    104     105
    105     108
    106     107
    107     EOF
    108     109
    109     EOF
    110     111
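The two tables above can be read as a linked list: the root table gives the first block, and the FAT gives each following block until EOF. The sketch below is a toy model of the slide's tables only (the dictionaries and the function name `block_chain` are illustrative assumptions, not a real FAT parser).

```python
# Toy model of the slide's tables; real FAT structures are binary on-disk records.
EOF, FREE = "EOF", "Free"

# Root table: filename -> starting block (values taken from the slide).
root_table = {"f1.doc": 102, "f2.txt": 106, "f3.jpg": 110}

# FAT: block -> next block (values taken from the slide; the slide stops at block 110).
fat = {
    101: FREE, 102: 103, 103: 104, 104: 105, 105: 108,
    106: 107, 107: EOF, 108: 109, 109: EOF, 110: 111,
}


def block_chain(filename):
    """Return the blocks of a file in order by following the FAT links."""
    chain = []
    block = root_table[filename]
    while block not in (EOF, FREE):
        chain.append(block)
        block = fat[block]
    return chain


print("f1.doc:", block_chain("f1.doc"))
print("f2.txt:", block_chain("f2.txt"))
```

Note that f1.doc is fragmented in this example: its chain jumps from block 105 to block 108 because blocks 106 and 107 belong to f2.txt. The chain for f3.jpg is not followed here because the slide's table ends at block 110.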

Deleting a file
  The entry in the file system is updated to indicate its deleted status
  Clusters that were previously allocated to the file become unallocated and can be reused to store a new file
  But: the data are left on the disk until a new file overwrites them

Example: File deletion
  Delete f1.doc

  Root table entries
    Filename   Starting block
    ?1.doc     102
    f2.txt     106
    f3.jpg     110

  FAT
    Block   Next block
    101     Free
    102     Free
    103     Free
    104     Free
    105     Free
    106     107
    107     EOF
    108     Free
    109     Free
    110     111

  The contents of f1.doc have not been deleted: only the first character of the name and the FAT entries for its blocks have changed

Major types of file structures
  Contiguous: stored in blocks in a logical order of sequence
  Fragmented: one or more chunks are not stored in sequential order (happens when files are added, deleted or modified)
    Linear (logical order), non-linear
  Partial files:
    Incomplete files: some portions of the file are unavailable (overwritten by other data)

Example
  [Figure: four example block layouts labelled A, B, C and D]

Major types of file structures
  Embedded files:
    Contents of one file are added or stored inside another file, e.g., a JPEG inside a Word document
  File systems become large
    Large hard disks: inexpensive, common
    Huge number of files and fragments
  Individual files are usually lightly fragmented
  Causes of fragmentation
    Low disk space
    Appending more data to an existing file


Major types of file structures
  Studies: 6% of all files recovered were fragmented
    File systems always try to allocate disk space in a way that minimizes file fragmentation, to reduce seek time and improve file system performance
  File types of forensic interest (AVI, JPG, …) show higher fragmentation than file types of little interest (BMP, TXT, …)
    JPEG: 16%, AVI: 17%
    PST (Outlook email): 58%
    Word doc: 17%

Mobile devices
  Android applications: Facebook, Twitter, WhatsApp, WeChat, Chrome, Gmail, …
  Seldom fragmented: executable files (.apk, .dex)
  Seriously fragmented: database files (.db, .db-journal, .wal)

Evidence collection
  Search for evidence in the complete file system, including recovering deleted files
  File carving:
    Recovery of file fragments from a digital storage device without assistance from the file system
    Scanning the raw bytes of the disk and reassembling them

Evidence collection
  File carving:
    Possible even if the file system metadata has been completely destroyed
    Possible even if the files are deleted
      Delete: means removing the knowledge of where the file is, but not removing the file content
    Possible to recover files that have been renamed to hide what the file actually is
    Possible to recover data that is embedded in another file (JPEG inside a doc)


Techniques for File carving
  Tools have been developed to automate the process of carving for various file types
    Foremost, Scalpel, DataLifter, PhotoRec
    Specialized forensic tools: EnCase, FTK, X-Ways
  Can be used to extract files from physical memory dumps from mobile devices and from raw network traffic

Tools for file carving
  Need to understand how the tools carve files
  Not a substitute for knowledge
  Understand the limitations of the tools

Techniques for File Carving
  Header-footer
    Recover files based on known headers
    Used in EnCase, Foremost, Scalpel
  File structure
    Header-footer + internal layout of a file
    Used in Foremost, PhotoRec
  Content-based (semantics)

Header-Footer Carving
  Most basic carving technique
  Steps
    Scan for the header of a file type
    Once found, scan for the file type’s footer
    File = bytes between header and footer; copy byte-by-byte
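The three steps above can be sketched in a few lines. This is a minimal illustration on an in-memory byte string, assuming the JPEG signatures FF D8 FF and FF D9; the function name `carve_header_footer` and the synthetic "disk image" are made up for the example and do not reflect how Foremost or Scalpel are implemented.

```python
JPEG_HEADER = bytes.fromhex("FFD8FF")
JPEG_FOOTER = bytes.fromhex("FFD9")


def carve_header_footer(data, header=JPEG_HEADER, footer=JPEG_FOOTER):
    """Return every byte range that starts at a header and ends at the next footer."""
    carved = []
    pos = 0
    while True:
        start = data.find(header, pos)          # step 1: scan for the header
        if start == -1:
            break
        end = data.find(footer, start + len(header))  # step 2: scan for the footer
        if end == -1:
            break
        end += len(footer)
        carved.append(data[start:end])          # step 3: copy the bytes in between
        pos = end
    return carved


# Synthetic image: filler bytes, one fake JPEG, more filler bytes.
fake_jpeg = JPEG_HEADER + b"\xe0" + b"image-data" + JPEG_FOOTER
disk_image = b"\x00" * 16 + fake_jpeg + b"\x11" * 16
recovered = carve_header_footer(disk_image)
print(len(recovered), "candidate file(s) carved")
```

The sketch assumes the file is contiguous; if another file's blocks sit between the header and the footer, they are copied into the result as well.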


Examples of File Signatures

  Header          Footer   File type
  FFD8FF          FFD9     jpg, jpeg
  424D                     BMP
  FFFB                     MP3 without ID3 tag
  494433                   MP3 with ID3 tag
  D0CF11E0                 doc
  52494646                 wav
  25504446                 pdf
  474946383761    003B     GIF

File signatures: www.garykessler.net/library/file_sigs.html

Header-Footer Carving
  Problems:
    Header/footer markers are short, so they may produce many results (false positives)
    Cannot handle fragmented or partial files
    Cannot carve files without fixed headers (text/html)


Fragmented example
  [Figure: header-footer carving applied to a fragmented file]

Variations
  Estimate the file size through various means
    Header-maximum file size carving
      Carve a fixed number of bytes after locating the header
    Header-embedded file size carving
      Find out the file size from the information available in the “header”

Header-Maximum File Size Carving
  Carve a fixed number of bytes from the beginning of a possible file
  Steps
    Scan for the header of a file type
    Extract a fixed number of bytes
      Size determined by trial and error
  Can be useful even for files with footers
    JPEG: stores thumbnails within the image, and a thumbnail is another JPEG
    Not affected if additional data is appended to the end of the JPEG

Header-Maximum File Size Carving
  Same problems as header-footer carving
  Always returns results much larger than the original file
    Manual process to discard the additional data
  If the guess for the maximum size is too small, the file is carved incorrectly
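A minimal sketch of header-maximum file size carving, to make the trade-off above concrete. The function name `carve_header_max_size`, the sample data and the two size guesses are assumptions chosen for illustration only.

```python
def carve_header_max_size(data, header, max_size):
    """Carve max_size bytes (fewer at the end of the data) from every header hit."""
    carved = []
    pos = data.find(header)
    while pos != -1:
        carved.append(data[pos:pos + max_size])
        pos = data.find(header, pos + 1)
    return carved


# A 12-byte "file" (header BM + 10 bytes) followed by 4 bytes of unrelated data.
original_file = b"BM" + b"A" * 10
data = b"\x00\x00" + original_file + b"\x99" * 4

too_small = carve_header_max_size(data, b"BM", 8)    # guess below the real size
too_large = carve_header_max_size(data, b"BM", 32)   # guess above the real size
print(len(too_small[0]), len(too_large[0]))
```

With the small guess the carved result is cut short; with the large guess it contains the whole file plus the trailing bytes, which then have to be discarded by hand.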


Header-Embedded File Size Carving
  Many file types embed information about the file size in the first few bytes
  Find out the size of the file by analyzing the embedded information
  Steps
    Scan for the header of a file type
    Determine the file size by reading the size bytes
    Extract that many bytes

Header-Embedded File Size Carving
  [Figure: example of a file size field embedded in a header]
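As one concrete instance of the steps above, the BMP format stores the total file size as a 4-byte little-endian integer immediately after its 2-byte "BM" header (424D in the signature table). The sketch below uses that field; the function name `carve_bmp`, the sanity check and the synthetic data are assumptions for illustration, and BMP is chosen here as an example rather than taken from the slide's figure.

```python
import struct

BMP_HEADER = b"BM"          # 42 4D
BMP_FILE_HEADER_SIZE = 14   # size of the BMP file header in bytes


def carve_bmp(data):
    """Carve BMP candidates using the file size stored at offset 2 of the header."""
    carved = []
    pos = data.find(BMP_HEADER)
    while pos != -1:
        if pos + 6 <= len(data):
            # Bytes 2..5 after the header start: file size, little-endian unsigned int.
            (size,) = struct.unpack_from("<I", data, pos + 2)
            # Sanity check: size must cover the file header and fit in the data.
            if BMP_FILE_HEADER_SIZE <= size <= len(data) - pos:
                carved.append(data[pos:pos + size])
        pos = data.find(BMP_HEADER, pos + 1)
    return carved


# Synthetic 20-byte "BMP": header, size field (20), then 14 zero bytes.
fake_bmp = BMP_HEADER + struct.pack("<I", 20) + bytes(14)
disk_image = b"\xff" * 5 + fake_bmp + b"\xee" * 7
print(len(carve_bmp(disk_image)), "candidate file(s) carved")
```

Unlike the maximum-size variation, the carved result ends exactly where the header says the file ends, but the approach still assumes the file is stored contiguously.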

File Structure Based Carving
  Carve by using knowledge about the internal file structure
    Metadata
    Header, footer, identifier strings, size information, etc.
  Can be used to detect cases of fragmentation if the file structure data is detailed and extensive

Example: File structure
  JPEG file
    Header
      Start of image: FF D8
      EXIF info
    Start of image data
      A series of sections
    End of image data
    Footer
      End of image (FF D9)


File structure
  PNG file
    Header bytes
    Size of the next section
    IHDR: identifier of the next section
    12 bytes: unstructured data
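The PNG layout lends itself to structure-based checking: after the 8-byte signature, every section (chunk) is a 4-byte big-endian length, a 4-byte type such as IHDR, the payload, and a 4-byte CRC-32 over type and payload. The sketch below walks that structure and flags chunks whose CRC does not match, which is one way a carver might notice foreign data inside a file. The helper names `make_chunk` and `walk_png_chunks` and the tiny synthetic PNG are assumptions for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def make_chunk(chunk_type, payload):
    """Build one PNG chunk: length, type, payload, CRC-32 of type + payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def walk_png_chunks(data):
    """Return (type, length, crc_ok) per chunk, stopping at IEND or truncated data."""
    if not data.startswith(PNG_SIGNATURE):
        return []
    chunks = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        end = pos + 8 + length + 4
        if end > len(data):
            break  # chunk claims more bytes than are available
        payload = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack_from(">I", data, pos + 8 + length)
        crc_ok = (zlib.crc32(chunk_type + payload) & 0xFFFFFFFF) == stored_crc
        chunks.append((chunk_type.decode("ascii", "replace"), length, crc_ok))
        if chunk_type == b"IEND":
            break
        pos = end
    return chunks


# Synthetic PNG skeleton: 13-byte IHDR payload, a dummy IDAT payload, empty IEND.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
png = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"IDAT", b"abc") + make_chunk(b"IEND", b"")
for entry in walk_png_chunks(png):
    print(entry)
```

The IDAT payload here is not valid compressed image data; only the container structure is being exercised.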

Challenges in File carving
  The original file may be fragmented
    A carving process that assumes all portions of the file were stored contiguously on the disk will fail
    It may salvage fragments of multiple files and incorrectly combine them into a single container
  Content-based carving
    Main idea: read each individual block and analyze its contents to find out whether it belongs to a particular file

Content-based Carving
  Main idea:
    Fragmentation can occur only at block boundaries
      Block: size of the smallest data unit that can be written to a storage medium (sector or cluster size)
      One block belongs to one single file
  Information entropy
    Entropy: measure of randomness
    Large changes in entropy indicate that the sector belongs to a different file

Entropy
  Example 1: tossing a coin
    Possible outcomes: head/tail
    Prob(head) = Prob(tail) = 1/2
    Entropy = $-\sum_{n=1}^{N} p_n \log_2 p_n$

Entropy
  Example 2: In a bin, there are balls of four different colors: red, yellow, blue and green. There are 9 red balls, 1 yellow ball, 1 blue ball and 1 green ball.
    Entropy = $-\sum_{n=1}^{N} p_n \log_2 p_n$

Entropy
  Example 3: In a bin, there are balls of four different colors: red, yellow, blue and green. There are 3 red balls, 3 yellow balls, 3 blue balls and 3 green balls.
    Entropy = $-\sum_{n=1}^{N} p_n \log_2 p_n$
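The three examples can be evaluated directly from the formula. The sketch below is a small check, assuming a fair coin in Example 1 and taking the probabilities in Examples 2 and 3 as ball counts divided by the 12 balls in the bin; the function name `entropy` is illustrative.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


coin = entropy([1 / 2, 1 / 2])                      # Example 1
skewed_bin = entropy([9 / 12, 1 / 12, 1 / 12, 1 / 12])  # Example 2
uniform_bin = entropy([3 / 12] * 4)                 # Example 3

print(f"coin: {coin:.3f} bits")
print(f"9/1/1/1 bin: {skewed_bin:.3f} bits")
print(f"3/3/3/3 bin: {uniform_bin:.3f} bits")
```

The coin gives 1 bit and the evenly filled bin gives 2 bits, the maximum for four outcomes. The bin dominated by red balls gives roughly 1.21 bits: the more predictable the draw, the lower the entropy.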

Sliding entropy
  Sliding window
    Measure the average value of the bytes
  Entropy formula: $H = -\sum_{n=1}^{N} p_n \log_2 p_n$
    N: total number of different values
    $p_n$: probability of the n-th value
  Bytes: 8 bits, values 0 to 255
  Entropy: 0 to 8
    4 to 6: text and HTML blocks
    7 to 8: zip and JPEG blocks
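A minimal sketch of a sliding entropy measurement, assuming a 512-byte window that advances one window at a time (the window size, step and function names are illustrative choices, not values from the slides). Pseudo-random bytes stand in for compressed data, and repeated English text stands in for a text block.

```python
import math
import random
from collections import Counter


def byte_entropy(block):
    """Entropy in bits per byte of one block, from its byte-value frequencies."""
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_entropy(data, window=512, step=512):
    """Entropy of each window as it slides over the data."""
    return [
        byte_entropy(data[i:i + window])
        for i in range(0, len(data) - window + 1, step)
    ]


rng = random.Random(0)  # fixed seed so the example is repeatable
random_like = bytes(rng.getrandbits(8) for _ in range(2048))  # stand-in for zip/JPEG data
text_like = b"the quick brown fox jumps over the lazy dog " * 50

profile = sliding_entropy(random_like + text_like)
print([round(h, 2) for h in profile])
```

The first windows (random-looking data) score well above the later ones (text), so the point where the profile drops marks a likely boundary between two kinds of content. A 512-byte window holds too few samples to reach the theoretical maximum of 8, so measured values for compressed data sit somewhat below it.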


Studies
  [Figure: entropy of txt and jpg files]

Studies
  [Figure: entropy of an MP3 file, its zip version and its encrypted version]

Example
  [Figure: example entropy plot]

Sliding entropy
  Calculate the entropy of each block of the data
  If the blocks contain compressed data, the entropy of these blocks will be similar
  If there is a sudden change in entropy, that block doesn’t belong to the PNG image data


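Example: sliding entropy
  [Figure: entropy plot between block 11619 and block 11820]

Example: sliding entropy
  Remove the section where the entropy drops
  [Figure: entropy plot after removing the low-entropy section]

Data in between zip files
  [Figure: entropy plot of data lying between zip files]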

Current Research approach 1
  Stage 1: Header/footer
  Stage 2: Complete JPEG file for segment 2
  Stage 3: Decoding to RGB
  Stage 4: Fragmentation point: high CED value
    CED is computed from ED(boundary) and ED(nearby)
    Boundary: RGB values of pixels on both sides of the boundary
    Nearby: RGB values of pixels on one side of the boundary

Current Research approach 1
  Stage 6: Aim: construct the file from header to footer
    Join segments together

Current Research approach 2
  Graph approach
  Assume all file clusters are randomized
  Step 1: identify headers/footers

Current Research approach 2
  Step 2: for each header, find the best match (using similarity)
    The similarity calculation depends on the content of the cluster
      Image file: check block similarity
      Text file: check word likelihood

Current Research approach 2
  [Figure: graph of clusters weighted by probability/likelihood]

Comparison of different methods
  Lots of different tools/methods for file carving
  Performance comparison:
    Carving quality
    Memory and space used
  Terminology:
    Positive: a file that is correctly carved from the dataset

Quality
  Terminology:
    False positive: a carving result which is not a positive
    False negative: a file that is present in the dataset, but was not carved

                     In dataset: Yes    In dataset: No
    Recovered: Yes   Positive           False positive
    Recovered: No    False negative

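Quality
  Recall: proportion of the files in the dataset that are recovered
    $\text{Recall} = \frac{tp}{tp + fn}$
  Precision: proportion of the recovered files that are correct
    $\text{Precision} = \frac{tp}{tp + fp}$
  F measure: controls the user’s preference between recall and precision
    $F_{\text{measure}} = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}}$
    P: precision, R: recall, $\alpha$: weight between 0 and 1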


Example
  Consider a total of 10,000 files, of which 100 are fragmented. Suppose that a tool reports 200 fragmented files, but only 60 of these are correct. Determine:
    Recall
    Precision
    Accuracy
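One way to work through the example is to fill in the four cells of the confusion matrix first. The sketch below does that; the function name `carving_metrics` is illustrative, and accuracy is taken in its usual sense of (tp + tn) / total, which the slides do not define explicitly.

```python
def carving_metrics(total, actual, reported, correct):
    """Derive tp, fp, fn, tn from the counts, then recall, precision and accuracy."""
    tp = correct               # reported and really fragmented
    fp = reported - correct    # reported but not fragmented
    fn = actual - correct      # fragmented but not reported
    tn = total - tp - fp - fn  # everything else
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / total,
    }


result = carving_metrics(total=10_000, actual=100, reported=200, correct=60)
for name, value in result.items():
    print(name, value)
```

The counts come out as tp = 60, fp = 140, fn = 40 and tn = 9760, giving recall 0.6, precision 0.3 and accuracy 0.982. The high accuracy mostly reflects the 9760 non-fragmented files that were never in question, which is why recall and precision are the more informative figures here.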

Performance Analysis
  Public datasets:
    FAT carving test dataset (15 files): dftt.sourceforge.net/test11
    DFRWS 2006 challenge image (32 files): dfrws.org/2006/challenge
    Basic data carving test: http://dftt.sourceforge.net/test11/index.html
  Simple datasets: good results
  Complex datasets: poor results
  Fragmentation of files: major impact

Tools comparison
  Look at
    Percentage of files recovered
    The correctness and reliability of the tool output
    Processing speed of the tool
  Requirement:
    Process roughly 100 GB of data per day, i.e., 1.16 MB per second
    A tool that handles less than 0.58 MB per second is impractical
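The 1.16 MB per second figure follows from spreading 100 GB over the seconds in a day, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes):

```latex
\frac{100 \times 10^{9}\ \text{bytes}}{24 \times 3600\ \text{s}}
  = \frac{10^{11}\ \text{bytes}}{86400\ \text{s}}
  \approx 1.16 \times 10^{6}\ \text{bytes/s}
  = 1.16\ \text{MB/s}
```

The 0.58 MB per second threshold is half of that rate, which corresponds to about 50 GB per day.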

Tools comparison: example 1
  Dataset
    Basic data carving test: http://dftt.sourceforge.net/test11/index.html
  Contains
    Valid doc, jpeg, wav, pdf, zip, gif, xls files
    Invalid jpg file (header has been modified)
    Deleted ppt, wmv files
  Contains only contiguous files
  E.g., PhotoRec: can find all files except the invalid jpg file (because its header information is not correct)

Tools comparison: example 1
  [Figure: results for example 1]

Tools comparison: example 2
  Another test set
    http://old.dfrws.org/2006/challenge/layout.shtml
  Contains
    JPEG, zip, html, txt, Word files
    Fragmented files

Tools comparison: example 2
  Example:
    One JPEG, non-fragmented
    One JPEG, non-fragmented, larger than a typical default maximum file size
    One JPEG, non-fragmented, but the sector before it has 0xffd8 in the first two bytes
    One JPEG fragmented with text in between
    One JPEG fragmented with a Word document in between
    One JPEG fragmented with random data in between

Tools comparison: example 2
  Example:
    One JPEG fragmented with a JPEG in between
    Two JPEGs that are intertwined
    One JPEG, non-fragmented, that is REALLY big
    One JPEG fragmented with a single sector in between that starts with 0xffd9
  E.g., PhotoRec:
    Performance drops because the dataset is more complicated
    Contiguous + fragmented files

Tools comparison: example 2
  [Figures: results for example 2]

General Findings
  MPEG, ZIP:
    Difficult to carve because of common header values
  Scalpel: header-based carving
  PhotoRec: structure-based carving
  Contiguous files: good performance
  Fragmented files: not easy

New approaches for carving


Use of file carving to solve data hiding
  Concealing a file: change its name to mislead digital investigators
    Renaming an illegal photograph from xxx.jpg to xxx.exe
  Need to check the file header (file signature)
    The file xxx.exe, which has a JPEG header (FF D8 FF), will be correctly recognized as a graphics file
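The signature check described above can be sketched as a comparison between the file extension and the type implied by the first bytes. The signatures are taken from the file signature table earlier in these slides; the function names, the alias table and the sample bytes are assumptions for illustration.

```python
import os

# Header signatures from the file signature table (subset).
SIGNATURES = {
    bytes.fromhex("FFD8FF"): "jpg",
    bytes.fromhex("424D"): "bmp",
    bytes.fromhex("25504446"): "pdf",
    bytes.fromhex("474946383761"): "gif",
    bytes.fromhex("D0CF11E0"): "doc",
}

# Extensions accepted for each detected type (illustrative, not exhaustive).
EXTENSION_ALIASES = {"jpg": {"jpg", "jpeg"}}


def detect_type(first_bytes):
    """Return the file type implied by the header bytes, or None if unknown."""
    for magic, kind in SIGNATURES.items():
        if first_bytes.startswith(magic):
            return kind
    return None


def extension_mismatch(filename, first_bytes):
    """True when the header identifies a type that the extension does not match."""
    detected = detect_type(first_bytes)
    if detected is None:
        return False
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension not in EXTENSION_ALIASES.get(detected, {detected})


jpeg_start = bytes.fromhex("FFD8FFE000104A464946")
print(detect_type(jpeg_start), extension_mismatch("xxx.exe", jpeg_start))
```

A renamed xxx.exe that starts with FF D8 FF is reported as a JPEG with a mismatching extension, while the same bytes in xxx.jpg raise no flag.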

Steganography?
  It hides information inside image files
  Two types: insertion and substitution
  Insertion
    Hidden data is not displayed when viewing the original file
    Need to analyze the data structure carefully
  [Figure: file with a hidden message inserted]


Steganography?
  Substitution
    Replaces bits with other bits of data
    Usually changes the last two LSBs (least significant bits)

    Original pixel   Altered pixel
    1010 1010        1010 1001
    1001 1101        1001 1110
    1111 0000        1111 0011
    0011 1111        0011 1100
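The table above can be reproduced with two bit operations: clear the two least significant bits of each pixel value, then write two secret bits in their place. In the sketch below the secret bit pairs (01, 10, 11, 00) are inferred from the difference between the original and altered columns; the function names are illustrative.

```python
def embed_two_lsbs(pixel, secret_bits):
    """Replace the two least significant bits of an 8-bit value with secret_bits."""
    return (pixel & 0b11111100) | (secret_bits & 0b11)


def extract_two_lsbs(pixel):
    """Read back the two least significant bits."""
    return pixel & 0b11


original = [0b10101010, 0b10011101, 0b11110000, 0b00111111]
secret = [0b01, 0b10, 0b11, 0b00]  # inferred from the slide's altered column

altered = [embed_two_lsbs(p, s) for p, s in zip(original, secret)]
for before, after in zip(original, altered):
    print(f"{before:08b} -> {after:08b}")
```

Each value changes by at most 3 out of 255, which is why the alteration is hard to see in the image, and reading the two low bits of every altered value recovers the hidden data.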

Steganography
  Detect variations of the graphic image
    When applied correctly, you cannot detect hidden data in most cases
  Check whether the file size, image quality, or file extension has changed
  Clues to look for:
    Duplicate files with different hash values
    Steganography programs installed on the suspect’s drive

Data and File carving
  DeepSound: http://jpinsoft.net/DeepSound
    A steganography tool and audio converter that hides secret data in audio files. The application also enables you to extract secret files directly from audio files or audio CD tracks.

General Guideline


Device Status
  Difference in data collection:
    Power off:
      Find data stored on static memory (e.g., hard drive)
      Principle: eliminate any chance of modifying the actual evidence
      Write blockers are used between the digital evidence and the computer
      Create a disk image: bit-by-bit copy of the original data (hash comparison)
    Power on:
      Able to collect volatile data
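The hash comparison step for a disk image can be sketched as follows: hash the source and the image in chunks and compare the digests. SHA-256, the function names and the temporary demo files are assumptions for illustration; the slides only state that a hash comparison is made, and in practice the source device would be read through a write blocker.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_source(source_path, image_path):
    """True when the image is a bit-for-bit copy of the source."""
    return sha256_of_file(source_path) == sha256_of_file(image_path)


# Demo with small temporary files standing in for a drive and its images.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.dd")
    image = os.path.join(workdir, "image.dd")
    tampered = os.path.join(workdir, "tampered.dd")
    payload = bytes(range(256)) * 64
    contents = ((source, payload), (image, payload), (tampered, payload[:-1] + b"\x00"))
    for path, content in contents:
        with open(path, "wb") as handle:
            handle.write(content)
    exact_copy_ok = image_matches_source(source, image)
    tampered_ok = image_matches_source(source, tampered)

print(exact_copy_ok, tampered_ok)
```

Changing a single byte of the image produces a different digest, which is what makes the comparison usable as evidence that the copy is faithful.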

Device Status
  Difference in data collection:
    Power on:
      Can examine whether any of the active hard drives are encrypted
      Full disk encryption: all data on the hard drive is encrypted when the device is off
      Good chance to collect unencrypted data

Conducting Live Investigation
  Document what is visually present on screen
    Active programs (active windows, date/time settings, log files?)
  Collect volatile data
    Memory is a good source for finding passwords and data from encrypted communication in plain text
    Use a memory capture tool (e.g., FTK Imager)

Conducting Live Investigation
  Find out the computer install date, OS version, list of users, registered owner, …
  Find out time zone information and clock settings
  Find network drive maps, or remote storage media
  Data on the hard drive: e.g., pagefile, hiberfile, etc.
    Pagefile: used by the computer when it needs to swap parts of the working memory and dump them somewhere else (browser artifacts)
    Hiberfile: saves the current machine state when the computer is in hibernation


Summary
  Collecting evidence:
    File storage: clusters, fragmentation
    File deletion: data remains in clusters
  Techniques for recovering files
    Header-footer, file structure, content-based approach
  Existing tools and challenges

Summary
  Because of the large volume of data
    Investigator: analyze data and understand inter-relationships
    Gold standard: analyze all files to ensure nothing is overlooked
    Now: “intelligence-based”: a subset of files is analyzed, depending upon the intelligence provided to the investigator
      The aim is not to find every piece of evidence, but rather sufficient evidence to determine innocence or guilt

Summary
  File carving vs keyword searching:
    File carving: looks for data that fits into known file structures and interprets that data in light of these structures
    Keyword searching: searches for content that matches one or more keywords or keyword patterns
    Finding structures matching known structures vs finding data matching known data