An Automated Timeline Reconstruction Approach for Digital Forensic Investigations
Written by Christopher Hargreaves and Jonathan Patterson
Presented by Jason McKenzie
November 8th, 2013
Introduction
Reconstruction: a process in which an event or series of events is carefully examined in order to find out or show exactly what happened (Merriam-Webster)
Provenance: the origin or source of something
Low-level PC event: File modification, registry key update
High-level PC event: Connection of a USB device, like a USB stick
Goal: Construct a software prototype using Python to automatically reconstruct a timeline of events using low-level events to infer high-level events and their provenance
Background
Reconstruction is an essential aspect of digital forensics
Key challenge in digital forensics is the large volume of information that needs to be analyzed
Population owns an increasing number of digital devices
Tools exist that automate the extraction phase of a digital investigation and are useful for examining events that have occurred
There is a demand for explaining the sequence of digital events, and a tool to automatically reconstruct the events and produce a timeline is needed
Related Work
Related work comprises solutions that incorporate some form of (non-automatic) timeline generation
Timelines based on file system times
Uses metadata from file systems to create a timeline
Modified, Accessed, and Created (MAC) times
The Sleuth Kit generates timeline from file activity
Encase creates graphical “Timeline” view
Times at which the contents of files were examined are not captured in file system metadata, which is a limitation of this approach
Related Work (continued)
Timelines including time from inside files
Cyber Forensic Time Lab (CFTL)
Extracts system times from FAT and NTFS hard drives and some file types
Has incomplete source information of extracted events
Log2timeline
Has several enhancements and options that when combined could produce a timeline
Carbone and Bean addressed the need for a rich, event filled timeline in their paper “Generating computer forensic super-timelines under Linux” in 2011
Key to creating an event filled timeline is to capture more event times
Related Work (continued)
Visualizations
Encase
Visual Timeline
Zeitline
Imports file system times from other programs through the use of Import Filters
Atomic events: events directly imported from the system
Complex events: comprised of atomic and other complex events
Allows for filtering, searching, and combination of atomic events into complex events
Aftertime
Performs enhanced timeline generation
Visualizes results as a histogram
Related Work (continued)
Summary
Importance of recovering times from inside files and using file system metadata
Two key challenges:
Too many events to effectively analyze
Difficult to visualize what is going on in the timeline due to the number of events
Highlighting patterns of activity to indicate areas of interest, and maintaining records of the source of extracted data, are both important
Methodology
As expressed previously, large volume of events creates a problem for analysis and an inability to visualize the timeline
To counteract this, an approach is being researched that automates the process of combining “low-level” events into “high-level” events
Automating this conversion produces a summary of activity that helps direct the investigation
To facilitate this, a software prototype was constructed
Methodology (continued)
Should frameworks be expanded to accommodate a timeline reconstruction system?
It would take extensive work to build upon an existing framework, like log2timeline
Best to implement a new framework rather than adjust existing data structures or work around legacy languages
Python 3 was chosen for this project due to the readability of its code
Design
Overall design
Python Digital Forensic Timeline (PyDFT)
Supports low-level event extraction and high-level event reconstruction
Also supports case management, conversion between different date and time formats, and basic GUIs
Design (continued)
Generation of low-level events
Overview
Low-level events are file system times and times extracted from within files
Analysis is performed on a mounted file system, NOT on a disk image directly
Recommended approach is to mount disk image in read-only mode using Linux or Mac OS X
Extraction of file system times
Master File Table ($MFT)
Accessed directly on Linux or Mac OS X using NTFS driver from Tuxera
Created, modified, accessed, and entry modified times from the Standard Information Attribute are used to build four events for each file
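A minimal sketch of this step (function and field names are assumptions, not PyDFT's actual API): the four Standard Information Attribute timestamps of one $MFT record each become a separate low-level event.

```python
from datetime import datetime

# The four SIA timestamps named in the slide above.
SIA_TIME_TYPES = ["created", "modified", "accessed", "entry_modified"]

def events_from_mft_record(path, sia_times):
    """sia_times maps each SIA time type to a datetime for one file.

    Returns one low-level event per timestamp, each recording its
    provenance ($MFT / SIA) so it can be traced back to raw data.
    """
    events = []
    for time_type in SIA_TIME_TYPES:
        events.append({
            "date_time": sia_times[time_type],
            "evidence": f"{path} ({time_type})",
            "provenance": {"source": "$MFT", "attribute": "SIA", "path": path},
        })
    return events

record_times = {t: datetime(2013, 11, 8, 11, 28, 30) for t in SIA_TIME_TYPES}
evts = events_from_mft_record("/mnt/image/report.doc", record_times)
```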
Design (continued)
Generation of low-level events (continued)
Times from inside files
The Extraction Manager calls GetTimesFromInsideFiles() on each file in the mounted file system and checks whether a time extractor exists for it
If one is found, it is given the file pointer, file name, and file path to extract time information
Any time information extracted is added to low-level timeline
Time extractors exist for browsing history (Chrome, Firefox, Internet Explorer), Skype, Windows Live Mail, etc.
Design (continued)
Generation of low-level events (continued)
Parsers and bridges
Parsers: process raw data structures and recover data in a usable form
Bridges: take information from parsers and map it to a low-level event object
This design approach makes it easier to accommodate new parsers and makes parser code easier to reuse
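The parser/bridge split above can be sketched as follows (function names and the Chrome history row layout are illustrative assumptions, not PyDFT's actual code): the parser understands the raw format, and the bridge maps its output onto the standard low-level event fields.

```python
def parse_chrome_history_row(row):
    """Parser: recover usable data from one raw history record."""
    url, title, visit_time = row
    return {"url": url, "title": title, "visit_time": visit_time}

def bridge_chrome_visit(parsed, db_path):
    """Bridge: map parser output onto a low-level event object,
    recording which file the event came from."""
    return {
        "date_time": parsed["visit_time"],
        "evidence": f"Chrome visit to {parsed['url']}",
        "provenance": {"file": db_path, "parser": "chrome_history"},
    }

event = bridge_chrome_visit(
    parse_chrome_history_row(
        ("http://example.com", "Example", "2013-11-08T11:28:30")),
    "/mnt/image/Users/jo/History",
)
```

Because only the bridge knows about the low-level event format, the same parser could be reused by another tool unchanged.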
Design (continued)
Generation of low-level events (continued)
Traceability
If extractor returns a low-level event, it also points to the raw data that produced the event.
Different types of provenance based upon event
Low-level event format
Different events have different provenance and therefore different fields
Id, date_time_min, date_time_max, evidence, provenance, etc.
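The low-level event format might be modeled as below; the field names id, date_time_min, date_time_max, evidence, and provenance come from the slide, while the types and defaults are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class LowLevelEvent:
    id: int
    date_time_min: str   # earliest time the event could have occurred
    date_time_max: str   # latest time (equal to min when the time is exact)
    evidence: str        # human-readable description of the event
    provenance: dict = field(default_factory=dict)  # pointer to raw data

e = LowLevelEvent(1, "2013-11-08T11:28:30", "2013-11-08T11:28:30",
                  "File modified: report.doc",
                  {"source": "$MFT", "offset": 1024})
```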
Design (continued)
Generation of low-level events (continued)
Backing store for the low-level timeline
A persistent backing store is required because low-level events are Python objects that would otherwise have to be held in memory
SQLite was chosen as the backing store; it allows for multiple advanced queries
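A minimal sketch of the backing store; the table schema is an assumption based on the fields listed above, not the prototype's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real run would use an on-disk file
conn.execute("""CREATE TABLE low_level_events (
                    id INTEGER PRIMARY KEY,
                    date_time_min TEXT,
                    date_time_max TEXT,
                    evidence TEXT,
                    provenance TEXT)""")
conn.execute("INSERT INTO low_level_events VALUES (?, ?, ?, ?, ?)",
             (1, "2013-11-08T11:28:30", "2013-11-08T11:28:30",
              "File modified: report.doc", "$MFT record for report.doc"))

# Advanced queries become simple SQL, e.g. everything in a time window:
rows = conn.execute("""SELECT evidence FROM low_level_events
                       WHERE date_time_min >= ? AND date_time_max <= ?""",
                    ("2013-11-08T00:00:00", "2013-11-09T00:00:00")).fetchall()
```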
Summary
Extraction manager extracts low-level events that are converted to a standard format and added to timeline
Timeline stored in SQLite
Fields like date/time, provenance, and information about the raw data
Design (continued)
Reconstruction of high-level events
Overview
Use of predetermined rules using plug-in scripts to automatically convert low-level events to high-level events
Basic event matching using test events
SQLite requires knowledge of SQL
By creating a test event with all the conditions of the low-level event, it is possible to add events to the high-level timeline without extensive knowledge of SQL
A comparison match (not an exact match) is performed between test events and low-level events
Matching field values can be turned into SQL searches for those fields, which then create high-level events
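One way this can work, as a sketch (the field names follow the low-level format above; the use of LIKE patterns and the function name are assumptions): the test event's fields are translated into a SQL query, so the analyst never writes SQL directly.

```python
import sqlite3

def match_test_event(conn, test_event):
    """Build a SQL query from a test event's field/pattern pairs.

    Field names come from the test event itself (trusted input in this
    sketch); values are bound as parameters.
    """
    clauses = " AND ".join(f"{field} LIKE ?" for field in test_event)
    sql = f"SELECT id, evidence FROM low_level_events WHERE {clauses}"
    return conn.execute(sql, tuple(test_event.values())).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE low_level_events (id INTEGER, evidence TEXT)")
conn.execute("INSERT INTO low_level_events VALUES "
             "(1, 'Chrome visit to http://www.google.com/search?q=foo')")

# A test event describing "any Google search seen in browser history":
matches = match_test_event(conn, {"evidence": "%google.com/search%"})
```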
Design (continued)
Reconstruction of high-level events (continued)
Matching multiple artefacts
“Test events” serve as triggers and any matches are used to construct a hypothesis of a high-level event
Low-level timeline created in memory for a specific period determined by the analyzer
Analyzer searches for all low-level events occurring in this period
Low-level events that match are considered supporting artefacts
Expected events that are not found are considered contradictory artefacts
One or more high-level events are created based upon these artefacts
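The analyzer logic above can be sketched as follows (the window size, event shape, and pattern strings are illustrative assumptions): a trigger event opens a time window, matches inside the window become supporting artefacts, and expected-but-missing patterns become contradictory artefacts.

```python
from datetime import datetime, timedelta

def analyze(trigger, timeline, expected_patterns, window=timedelta(minutes=5)):
    """Temporal proximity matching around one trigger event."""
    start, end = trigger["time"] - window, trigger["time"] + window
    in_window = [e for e in timeline
                 if start <= e["time"] <= end and e is not trigger]
    # Events in the window that match an expected pattern support the
    # hypothesis; expected patterns with no match contradict it.
    supporting = [e for e in in_window
                  if any(p in e["evidence"] for p in expected_patterns)]
    found = {p for e in supporting for p in expected_patterns
             if p in e["evidence"]}
    contradictory = [p for p in expected_patterns if p not in found]
    return supporting, contradictory

t0 = datetime(2013, 11, 8, 11, 28, 30)
timeline = [
    {"time": t0, "evidence": "Setup API entry for USB found"},
    {"time": t0 + timedelta(seconds=40), "evidence": "Registry key USBSTOR updated"},
]
supporting, contradictory = analyze(timeline[0], timeline,
                                    ["USBSTOR", "setupapi.log"])
```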
Design (continued)
Reconstruction of high-level events (continued)
High level event format
Similar to low-level event format
Includes fields such as files, trigger_evidence_artefact, supporting_evidence_artefact, and contradictory_evidence_artefact
High-level timeline output
Not stored in SQLite
Exports to XML and individual high-level event HTML reporting
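The XML export could look something like this sketch (element and attribute names are assumptions; the prototype's actual schema is not shown in the slides):

```python
import xml.etree.ElementTree as ET

def export_high_level(events):
    """Serialize high-level events, keeping their trigger evidence."""
    root = ET.Element("high_level_timeline")
    for e in events:
        ev = ET.SubElement(root, "event", id=str(e["id"]))
        ET.SubElement(ev, "description").text = e["description"]
        ET.SubElement(ev, "trigger_evidence_artefact").text = e["trigger"]
    return ET.tostring(root, encoding="unicode")

xml = export_high_level([{
    "id": 1,
    "description": "Google search for 'how to hack wifi'",
    "trigger": "Chrome history entry",
}])
```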
Design (continued)
Reconstruction of high-level events (continued)
Summary
Searches the timeline through the use of “test events” that have similarities to desired low-level events
One or more matches lead to one or more high-level events
Since low-level event information is preserved, it can still point to the raw data that generated the low-level event
Produces two timelines
Low-level event timeline (not very readable)
High-level event timeline (human readable)
Results
Examples of high-level events constructed
Google searches
11:28:30 Google search for ‘how to hack wifi’
USB device connection
“Setup API entry for USB found (VID:07AB PID:FCF6 Serial:07A80207B128BE08)”
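The Google search example above can be produced from a browser-history low-level event along these lines (a sketch; the URL and output format are illustrative, not the prototype's exact output):

```python
from urllib.parse import urlparse, parse_qs

def google_search_event(time_str, url):
    """Recover the search terms from a Google search URL's q= parameter
    and phrase them as a human-readable high-level event."""
    query = parse_qs(urlparse(url).query).get("q", [""])[0]
    return f"{time_str} Google search for '{query}'"

desc = google_search_event(
    "11:28:30", "http://www.google.com/search?q=how+to+hack+wifi")
```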
Results (continued)
Visualization
Since there are usually not a large number of high-level events, it is possible to use a third-party program like TimeFlow to display them graphically
In the high-level timeline below there are 2894 low-level events that have occurred (obviously not displayed)
Results (continued)
Performance
Calculations based on an Intel Core 2 Duo (2.28–2.8 GHz) with 4–8 GB of RAM
1 Million events, ~2min per analyzer, 22 analyzers = 44 minutes to process 1 million events
Equivalent to other indexing or searching forensics tools (“start search and walk away”)
No plans to optimize performance
Evaluation
The results reinforce that matching low-level events with “test events” (termed “temporal proximity pattern matching”) is effective at automatically creating high-level events
Need to develop more analyzers and time extractors to further reinforce feasibility of “temporal proximity pattern matching”
Need to implement low-level extractors that are currently not available for some aspects of the disk like Recycle Bin
Need to determine if keeping high-level provenance of information is required since the associated low-level provenance is preserved
Evaluation (continued)
Although performance is within limits compared to other forensics tools, a bottleneck exists because each analyzer searches through the timeline linearly for patterns
More analyzers mean a greater bottleneck
Needs optimization for multi-core processors
Optimization of SQLite secondary indexing could improve performance
Need to implement a way of verifying target PC’s clock is correct
Need more robust testing of the prototype
Future work
Creation of more low-level event extractors
Creation of more analyzers
Formalizing low-level event information
Inputting data from other tools
Testing of framework against real world data
Adding complexity to analysis scripts, such as Bayesian networks
Development of more robust visual data tools for timelining
Conclusions
Illustrates the possibility of using pattern matching to automatically reconstruct high-level, human-understandable events, producing a readable visualization of the timeline
Preserves provenance of low-level events
Not intended to replace a full forensic analysis by an experienced, trained analyst