DESCRIPTION
This presentation discusses lessons learned in developing digital forensics tools.
Lessons Learned in digital forensics
Writing digital forensics (DF) tools is difficult because of the diversity of data types that need to be processed, the need for high performance, the skill set of most users, and the requirement that the software run without crashing. Developing this software is dramatically easier when one possesses a few hundred disks of other people's data for testing purposes. This paper presents some of the lessons learned by the author over the past 14 years developing DF tools and maintaining several research corpora that currently total roughly 30TB.
Abstract
As the field of digital forensics (DF) continues to grow, many people find themselves engaged in the once obscure practice of writing DF software. Few of today's forensic tool developers have formal training in software development or design; many do not even see themselves as programmers, saying that they are writing "scripts", not programs.
Introduction
Meaning of digital forensics software
Software that is used to analyze disk images, memory dumps, network packet captures, program executables, office documents, web pages, and container files.
1- Criminal investigations.
2- Internal investigations.
3- Audits.
All of which have different standards for chain-of-custody, admissibility, and scientific validity.
The use of DF tools
Hackers hide data in several ways
In images, using watermarking and steganography techniques; these can be caught through artifacts, copy-forgery detection, and analysis of data patterns. In unallocated partitions on the hard disk and in remapped bad sectors. In NTFS alternate data streams (ADS), e.g. C:\notepad file.txt:hide (data hidden in the stream is not reflected in the file size).
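The slides give no code for these detection techniques. One standard heuristic for flagging encrypted or steganographically hidden payloads (an illustration, not the presenter's method) is Shannon entropy: ordinary text sits well below the 8-bit-per-byte maximum, while concealed or encrypted data approaches it.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Plain text stays well under 5 bits/byte; random (or encrypted, or
# well-hidden stego) payloads approach the 8-bit maximum.
text = b"the quick brown fox jumps over the lazy dog" * 10
random_blob = os.urandom(4096)
```

High entropy alone does not prove hiding (compressed files score high too), so in practice this is only a triage filter.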
To delete files securely for good, use the Gutmann algorithm, which overwrites the data 35 times with random and specially chosen patterns.
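A minimal sketch of multi-pass overwriting, simplified from the real Gutmann scheme (which interleaves 35 passes of fixed, encoding-specific bit patterns with random passes); the function name and default pass count are illustrative:

```python
import os

def overwrite_file(path: str, passes: int = 3) -> None:
    """Overwrite a file's contents in place with random data, `passes` times.
    (Simplified: the real Gutmann scheme uses 35 passes mixing fixed,
    encoding-specific patterns with random data.)"""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())  # push each pass to the device

# Usage sketch:
# overwrite_file("secret.txt", passes=35)
```

Note that wear-leveling on SSDs can leave old copies of the data untouched, so in-place overwriting is only meaningful for magnetic media.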
Distinct Sector Hashes for Target file detection
1- Hashing files to check for file changes.
2- Hashing sectors to discover changes in file segments.
3- The algorithm relies on probability sampling: it will not hash the whole drive because of the processing time required.
4- Looking for distinct hashes and repeated file patterns, using Government data and OpenMalware to detect malware by known hashes.
5- The algorithm treats finding the sectors that need to be inspected as an urn-statistics problem, like finding red beans in an urn of red and black beans.
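The urn calculation can be made concrete: the chance that a random sample of n sectors hits at least one of M target sectors on an N-sector drive is 1 - C(N-M, n)/C(N, n). A small sketch (the helper name is mine):

```python
from math import comb

def detection_probability(total_sectors: int, target_sectors: int,
                          samples: int) -> float:
    """P(at least one target sector is drawn) when sampling `samples`
    sectors without replacement -- the 'red beans in an urn' calculation.
    Requires samples <= total_sectors."""
    # comb(n, k) returns 0 when k > n, so a guaranteed hit yields 1.0.
    return 1.0 - comb(total_sectors - target_sectors, samples) / comb(
        total_sectors, samples)
```

This is why sampling pays off: even a modest number of target sectors on a large drive is found with near certainty by a sample far smaller than the drive.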
Finding distinct and repeated hashes in hard disk sectors
Using different data structures and testing the speed for the file system
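A minimal sketch of sector hashing and distinct/repeated classification, assuming an in-memory image and MD5 (the hash choice and helper names are illustrative, not the paper's code):

```python
import hashlib
from collections import Counter

SECTOR_SIZE = 512  # bytes; 4096 on "advanced format" drives

def sector_hashes(image: bytes, sector_size: int = SECTOR_SIZE):
    """Yield (sector_number, md5_hexdigest) for each sector of a disk image."""
    for i in range(0, len(image), sector_size):
        yield i // sector_size, hashlib.md5(image[i:i + sector_size]).hexdigest()

def classify(image: bytes):
    """Split sector hashes into distinct (seen once) and repeated sets."""
    counts = Counter(h for _, h in sector_hashes(image))
    distinct = {h for h, c in counts.items() if c == 1}
    repeated = {h for h, c in counts.items() if c > 1}
    return distinct, repeated

# A sector whose hash matches a known target file's sector hash -- and which
# is distinct (e.g. not the all-zero sector) -- is strong evidence that the
# file was present on the drive.
```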
Network forensics challenges: cloud computing created challenges that required new tools.
New frontiers in network intrusion, starting from the firewall. Emerging network forensic areas:
Social networks Data mining
Digital imaging and data visualization
Network forensics
Applying network forensics in critical infrastructures
Botnets
Wireless networks (still lacking good forensic tools)
Sink holes: accept, analyze, and forensically store attack traffic
Installs forensic tools at layers 0-2
SCADA (Supervisory control and data acquisition) Challenges
Smart phone security challenges
The smart phone threat model shows malware spreading from the application layer to the communication layer and finally to the resource layer, where the malware hijacks the phone's resources and sends multimedia messages to premium accounts.
The challenge of data diversity:
1- Processing incomplete or corrupt data.
2- Why will data not validate?
3- Windows inconsistencies.
4- Eliminating data that are consistent.
Data scale challenges:
1- The amount of data.
2- Applying big data solutions to DF.
Lessons in digital forensics
Sub-linear algorithms for reading sectors
One solution to the performance bottleneck is to adopt sub-linear algorithms that operate by sampling data. Sampling is a powerful technique and can frequently find select data on a large piece of media with a high degree of precision.
But sampling cannot prove the absence of data: the only way to establish that there are no written sectors on a hard drive is to read every sector.
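A sketch of the sampling idea, with the caveat from the text built in: a negative result never proves absence. The `read_sector` callback and the parameters are assumptions for illustration:

```python
import random

def sample_nonblank(read_sector, total_sectors: int, n_samples: int,
                    seed=None) -> bool:
    """Randomly sample sectors; return True if any sampled sector contains
    a non-zero byte. A False result does NOT prove the drive is blank --
    only reading every sector can establish that."""
    rng = random.Random(seed)
    for lba in rng.sample(range(total_sectors), n_samples):
        if any(read_sector(lba)):
            return True
    return False

# Usage sketch against an in-memory "drive" of 10,000 sectors:
sectors = [bytes(512)] * 10_000
sectors[1234] = b"\xff" * 512  # one written sector hiding among blanks
found = sample_nonblank(lambda lba: sectors[lba], len(sectors), 2_000, seed=1)
```

With 2,000 samples of 10,000 sectors, a single written sector is still missed 80% of the time; sampling shines when the sought-after data occupies many sectors.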
Temporal diversity: the never-ending upgrade cycle
Many computer users have learned that upgrades are a disruptive process that needs to be carefully managed. As a result, many organizations run out-of-date operating systems and only move to newer ones when they buy new hardware.
1- Upgrading forensics tools
2- Software versions to be upgraded
3- EnCase forensics tool
4- Intelligent forensics tools
Human capital demands and limitations
1- Users of DF software come overwhelmingly from law enforcement, with little or no background in computer science. They are generally deadline-driven and over-worked.
2- Examiners with substantial knowledge in one area (e.g., NTFS semantics) routinely encounter problems requiring knowledge of other areas (e.g., Macintosh malware) for which they have no training.
3- Developers, in turn, need skills such as opcodes, multi-threading, the organization of processes and operating system data structures, networking, and supercomputing.
Recovering data is hard in practice: recovering data from hard drives typically involves decoding data that is fragmented or partially overwritten. There are also funding problems. The differences between Windows Explorer and EnCase Forensic are not obvious to the uninitiated; DF is a difficult process that looks easy, which is not a good formula for continued funding.
The CSI Effect
Lessons learned managing a research corpus
This project started in 1998 and has expanded to include data from hard drives, cell phones, digital cameras and other devices. Today the corpus includes nearly a million redistributable files downloaded from US Government web servers, disk images from thousands of hard drives purchased around the world, and several terabytes of "realistic" scenarios manufactured by students.
Corpus management --technical issues
1- Imaging ATA drives
Lesson: read the documentation for the computer that you are using.
Lesson: make the most of the tools that you have and follow the technical innovations they force upon you. (You are dealing with hard disks built on different technologies, whether stream-data processing or bulk data processing for compression, reading file fragments and data segments.)
2-Automation as the key to corpus management
Needed a process for capturing the hard disk make,model, serial number. Lesson: automation is key; any process that involves manual record keeping is going to introduce inaccuracies that will be hard to detect and correct. Lesson: useful data will outlive the system in which it is stored, so make provisions to move the data when you design the system.
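A minimal sketch of automated record keeping, assuming a plain CSV log (a deliberately simple format, so the data can outlive the system that wrote it, per the second lesson); the field names and values are illustrative:

```python
import csv
from dataclasses import dataclass, asdict

@dataclass
class DriveRecord:
    """Minimal acquisition log entry (field names are illustrative)."""
    make: str
    model: str
    serial: str
    acquired: str  # ISO date, captured automatically, never typed by hand

def log_drive(path: str, rec: DriveRecord) -> None:
    """Append one record to a CSV log, writing the header on first use."""
    fields = list(asdict(rec))
    try:
        is_new = open(path).readline() == ""
    except FileNotFoundError:
        is_new = True
    with open(path, "a", newline="") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        if is_new:
            w.writeheader()
        w.writerow(asdict(rec))
```

In a real pipeline the make/model/serial would come from querying the drive itself (e.g. via smartctl or hdparm output), not from an operator.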
3-Evidence file formats(customer container file)
The author's attempts to use his own container files did not work well, and he had to use standard containers from programs like EnCase and FTK.
Lesson: avoid developing new file formats whenever possible. Lesson: kill your darlings.
4-Crashes from bad drives
The causes of crashes are many: kernel memory may be overwritten, the drive may be faulty, or data may be transferred to incorrect memory locations.
Lesson: many technical options remain unexplored.
5- Drive failures produce better data
Algorithm 1: an algorithm that reads from the first sector of the disk to the last and, upon encountering an error, jumps to the last sector of the drive, then repeatedly skips toward the front, reads a few sectors, and repeats. This works for a single error but not for multiple errors.
Algorithm 2: a disk imaging program called aimage, which implemented a variety of recovery algorithms, such as repeatedly rereading the problematic section; randomly seeking and reading; jumping ahead a few hundred kilobytes at each error; and reading from the last sector toward the first.
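The "jump ahead at each error" strategy can be sketched against a simulated drive (the skip distance, sector size, and names are illustrative; aimage's real implementation combined several such strategies):

```python
def image_with_skips(read_sector, total_sectors: int, skip: int = 200):
    """Read sectors front-to-back; on a read error, record the failing LBA,
    skip ahead `skip` sectors, and resume.
    Returns (recovered: dict[lba -> bytes], bad: list[lba])."""
    recovered, bad = {}, []
    lba = 0
    while lba < total_sectors:
        try:
            recovered[lba] = read_sector(lba)
            lba += 1
        except IOError:
            bad.append(lba)
            lba += skip  # leap past the (possibly large) damaged region
    return recovered, bad

# Simulated drive with a damaged region at sectors 100-149:
def fake_read(lba):
    if 100 <= lba < 150:
        raise IOError(f"bad sector {lba}")
    return b"\x00" * 512

data, bad = image_with_skips(fake_read, 1000, skip=10)
```

The trade-off is visible in the return value: skipping keeps the imaging run alive across a damaged region at the cost of leaving readable sectors inside each skipped window unvisited (a second pass can come back for them).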
Lessons learned
Lesson: Drives with some bad sectors invariably have more sensitive information on them than drives that were in working condition when they were decommissioned.
Lesson: do research, and only maintain software that implements a particular function when no other software is available.
6- Numbering and naming
Algorithm 1: generating file names randomly, which proved to be a waste of time. Lesson: names must be short enough to be usable but long enough to be distinct.
When I started acquiring data outside the US I discovered that the country of origin was the most important characteristic of a disk image. I adopted a naming scheme in which the first two characters are the ISO country code, followed by a two-digit batch number, a dash, and a four digit item number. (For example, CN07-0045 is the 45th disk of the 7th batch acquired from China.) Assigning a batch number allows different individuals in the same country to assign their own numbers.
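The naming scheme is regular enough to generate and validate mechanically. A sketch (the regex and helper names are mine; the format is from the text):

```python
import re

# Two-letter ISO country code, two-digit batch, dash, four-digit item number.
NAME_RE = re.compile(r"^(?P<country>[A-Z]{2})(?P<batch>\d{2})-(?P<item>\d{4})$")

def make_name(country: str, batch: int, item: int) -> str:
    """Format a corpus item name, e.g. make_name('CN', 7, 45) -> 'CN07-0045'."""
    return f"{country.upper()}{batch:02d}-{item:04d}"

def parse_name(name: str):
    """Return (country, batch, item) or raise ValueError on a malformed name."""
    m = NAME_RE.match(name)
    if not m:
        raise ValueError(f"not a corpus name: {name!r}")
    return m["country"], int(m["batch"]), int(m["item"])
```

Validating on ingest catches hand-typed names before they pollute the corpus, which is exactly the class of manual-record-keeping error the earlier automation lesson warns about.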
Lesson: although it is advantageous to have names that contain no semantic content, it is significantly easier to work with names that have some semantic meaning.
7- Path names
• Lesson: place access-control information as near to the root of a path name as possible.
8- Anti-virus and indexing
Lesson: Configure anti-virus scanners and other indexing tools to ignore directories that might contain raw forensic data.
9- Distribution and updates
Lesson: solutions developed by other disciplines for distributing large files rarely work well when applied to DF without substantial reworking.
Corpus management–policy issues
1- Privacy issues
Lesson: just because something is legal, you may wish to think twice before you do it.
2- Illegal content: financial data, passwords, and copyright
Lesson: never sell access to DF data, even if you have personal ownership.
Lesson: understand copyright law before copying other people's data.
Lesson: make sure your intent is scientific research, not fraud, so that any collection of access devices you create does not constitute criminal activity (credit card fraud).
3- Illegal content: pornography
Lesson: do not give minors access to real DF data; do not intentionally extract pornography from research corpora.
4- Institutional Review Boards
Lesson: while IRBs exist to protect human subjects, many have expanded their role to protect institutions and experimenters. Unfortunately this expanded role occasionally decreases the protection afforded human subjects. And even with the IRB watching over you, it's important to watch your back.
Lessons learned developing DF tools
1- Platform and language2- Parallelism and high performance computing3- All-in-one tools vs. single-use tools4- Evidence container file formats
1- Platform and language
1- The easiest way to write multi-platform tools is to write command-line programs in C, C++, C#, Java or Python, as programs written in these languages transfer easily between the three platforms (Windows, Linux, Mac OS).
2- Although C has historically been the DF developer's language of choice, we have shifted to C++ so that we can use the STL collection and container classes.
3- Java has a reputation for being slow, especially for computationally intensive applications.
4- While it is easy to write programs in Python, experience to date has shown that these programs are slow and memory-intensive.
2-Parallelism and high performance computing
Multithreading and high performance computing do not always work well because of communication bottlenecks, and the host computer's processor often outperforms GPUs due to I/O bottlenecks, especially when processing data at many gigabytes per second.
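A small illustration of why parallel hashing is easy to write but rarely the bottleneck: CPython's hashlib releases the GIL on large buffers, so even plain threads scale the CPU-bound hashing, leaving I/O as the limiting factor in practice. Names and chunk size are illustrative:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB work units

def hash_chunks(image: bytes, workers: int = 4):
    """MD5 each 1 MiB chunk of an image in parallel. Threads suffice here
    because hashlib releases the GIL on large buffers; in a real tool the
    disk, not the CPU, is usually what limits throughput."""
    chunks = [image[i:i + CHUNK] for i in range(0, len(image), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: hashlib.md5(c).hexdigest(), chunks))
```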
3- All-in-one tools vs. single-use tools
My experience argues that it is better to have a single tool than many: if there are many tools, most investigators will want to have them all. Splitting functionality into multiple tools complicates tool management without providing any real benefit to practitioners. Much of what a DF tool does (data ingest, decoding and enumerating data structures, preparing a report) is required no matter what kind of output is desired. There is a finite cost to packaging, distributing, and promoting a tool; when a tool has many functions, this cost is amortized across a wider base.
4- Evidence container file formats
1- Processing inputs in any format: tools should process inputs in any format and transparently handle disk images in raw, split-raw, EnCase, or AFF formats.
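Transparent handling usually starts with magic-byte detection. A sketch of a dispatcher: the EnCase/EWF signature is the documented one, but the AFF prefix here is from memory and should be verified against the AFFLIB sources before use; split-raw has no magic at all, so the filename is the only clue.

```python
EWF_MAGIC = b"EVF\x09\x0d\x0a\xff\x00"  # EnCase .E01 segment file signature
AFF_MAGIC = b"AFF"                      # assumption: leading bytes of an AFF file

def sniff_format(header: bytes, filename: str = "") -> str:
    """Guess an evidence container format from its first bytes (and, for
    magic-less split-raw, its filename)."""
    if header.startswith(EWF_MAGIC):
        return "ewf"        # EnCase .E01 family
    if header.startswith(AFF_MAGIC):
        return "aff"
    if filename.endswith(".001"):
        return "split-raw"  # raw image split into numbered segments
    return "raw"            # no magic: treat as a flat sector dump
```

A real tool would dispatch from this guess to a format-specific reader so that the rest of the pipeline only ever sees a flat stream of sectors.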
2-With network packets the situation is better, with pcap being the universal format.
Famous digital forensics tools
EnCase
FTK
Nuix
Intella
PTK Forensics
Microsoft COFEE
Conclusion
1- Digital forensics is an exciting area in which to work, but it is exceedingly difficult because of the diversity of data that needs to be analyzed, the size of the data sets, and the mismatch between the technical skills of users and the difficulty of the work.
2- These problems are likely to get worse over time, and our only way to survive the coming crisis is to concentrate on the development of new techniques that leverage our advantage: the ability to collect and maintain large data sets of other people's information.
3- In building and maintaining this corpus, the author encountered many problems that are increasingly relevant to others in the field. This paper describes some of the lessons learned in the course of that research.