ForNet: A Distributed Forensic Network
Kulesh Shanmugasundaram
http://isis.poly.edu/projects/fornet/


Security Fails….

What Mechanisms Do We Have?
• Logging mechanisms and audit trails
• Alerts/logs from security components:
    • Log only perceived security threats
• Host logs:
    • First thing to get disabled upon intrusion
    • An insider would rather use a host without any logs
    • Mobility and wireless networks create new problems
• Packet logs:
    • Usually at the edges, hence easily blinded
    • Can’t keep data for long
    • E.g., Infinistream, NFR, NetWitness, SilentRunner

What Do We Need?
• A reliable mechanism for analysis and attribution
• An effective response model
    • Current response model is mostly manual
    • Response time is in days or weeks
    • Digital evidence disappears quickly
• Tools that complement security components
    • Need some safety net to fall back on
    • Forensics is that net

Our Solution…
• Securely collect, store, disseminate, and process synopses of network traffic.
• In other words, a device analogous to a surveillance camera for the network. Even better: a co-operating network of such surveillance cameras.
• Goal of Project ForNet: development of tools, techniques, and infrastructure to aid rapid investigation and identification of cyber crimes.

ForNet: A Unified, End-to-End Approach to Network Forensics

SynApp: equipped routers or hosts. Primary function is to create synopses of network traffic. May have limited query processing and storage component as well.

Forensic Server: Responsible for archiving synopses, query processing and routing, and enforcing monitoring and security policies for the domain.

ForNet Domain: A domain covered by single monitoring and privacy policies.

Design Goals and Trade-Offs

1. “Complete” & “Correct” Evidence
2. Longevity
3. Succinctness
4. Accessibility of Evidence
5. Security & Privacy of Evidence
6. Ubiquity
7. Incremental Deployment
8. Modular
9. Scalable Design

Research Challenges

Networking, Distributed Systems
• Identification of events
• Architecture of ForNet
• Query Routing
• Fault Tolerance

Theory
• Design of Synopses
• Quantitative Analysis

Systems
• High-speed packet capture
• Storage systems
• Databases
• GUI

Legal Issues
• Legally admissible evidence
• Chain of custody
• Privacy

Security
• Secure architecture
• Attacks on ForNet
• Use of ForNet

Components of ForNet

Architectural components can be grouped under the following functions:

1. Data Collection
2. Data Retention
3. Data Dissemination

Components: SynApp, Monitoring Policy, Query Language, Panorama, Forensic Server, Privacy Policy

Architecture of a SynApp

[Figure: block diagram of a SynApp. Components: Network Stream, Network Filter, Buffer Manager, Synopsis Controller, Synopsis Engine (HBF, Neoflow, Histograms), Configuration Manager, Monitoring Policy, Synopses Library. Output: synopses sent to the Forensic Server.]

Synopses in ForNet

Data type: Link
    Potential evidence: protocol layer (IPs, ports, protocol, TOS, TTL); packet (packet sizes, inter-arrival times)
    Synopses: Connection record, Neoflow, Header-dump, Neoflow-NG*

Data type: Link Content
    Potential evidence: type of payload (audio, video…), IM keywords, content
    Synopses: HBF, Packet-dump

Data type: Mappings
    Potential evidence: MAC to IP, DNS to IP, VirtualHost to IP, AS-Name to IP
    Synopses: MAC Records, DNS Records, HTTP Records*, BGP Records*

Data type: Aggregates
    Potential evidence: #Packets, delays, etc.
    Synopses: Histogram, SCBF*

* Currently being implemented

Architecture of Forensic Server

[Figure: block diagram of the Forensic Server. Components: Archive Manager, Query Parser, Privacy Manager, Query Planner, XML Generator, Configuration Manager, Security Manager. Synopsis stores: HBF, Neoflow, DNS Records, Histogram, BGP Tables. Analysis modules: Scan Detection, Find Zombies, Stepping Stones. Input: query from an analyst. Output: response to the query.]

Forensic Server
• The Forensic Server has the following functions:
    • Data archiving
    • Query processing
    • Policy management and enforcement
• Data archiving:
    • Periodically receives data from SynApps and archives it on disk.

Forensic Server (cont…)
• Query Processor:
    • XML-based query language
    • No relational databases (uses Berkeley DB)
    • Does coalescence of synopses
    • Given a query, can locate the appropriate synopses
    • Generates and executes a query plan to answer the query
• Distributed query processing:
    • Currently being implemented to talk to nearby forensic servers to answer queries and corroborate evidence
• Policy Enforcer: two types of policies in ForNet:
    • Monitoring Policy: informs user of what is being monitored
    • Privacy Policy: informs user of how and with whom data is shared, etc.

Panorama
• A user interface (GUI client) for:
    • Building queries
    • Data mining
    • Visualization

Architecture of Panorama

[Figure: block diagram of Panorama. Components: Communication Manager, XML Parser, Plug-in Manager, User Interface Manager. Plug-ins shown: Chart Generator, Network Graphs, 3D Link Maps, Summary Views, Data Cleansing, Miners. Output: queries and various commands to the Forensic Server.]

Capturing Evidence on the Internet
• The Internet is a gigantic state machine:
    • To support forensics we need to know its precise state at any given time!
    • Therefore, we need to keep track of state transitions
• State transitions can be characterized by:
    1. Links/connections between nodes
    2. Link content
    3. Protocol mappings and aliases
    4. Aggregates generated by state transitions
• Archiving raw traffic for this information is overkill!

What is a Synopsis?

Data structures and algorithms for representing a set of elements succinctly, with a predefined loss of information, while retaining the ability to recall the original set of elements with a preset accuracy.

Properties of a Good Synopsis
• Contains enough data to answer certain classes of queries
    • Who sent payload “xyz”?
    • What did host bug.poly.edu send?
• Contains enough data to quantify confidence in its answers
    • I’m 99.37% sure bug.poly.edu sent “XYZ”
• Has a small memory footprint and is easy to update
    • Need 20GB/day to keep 1TB/day of raw network data
    • Need to compute two hashes per packet
• Resource requirements are tunable
    • Can only afford 3GB/day? Adjust the accuracy to accommodate this.
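The tunable trade-off between storage budget and accuracy can be illustrated with the standard Bloom filter formulas. This is only a sketch: it treats a plain Bloom filter as a stand-in for a ForNet synopsis, and the function names and the figure of 10^10 inserted elements per day are hypothetical, chosen for illustration rather than taken from the project.

```python
import math

def false_positive_rate(bits_per_element: float, num_hashes: int) -> float:
    """Approximate false-positive rate of a standard Bloom filter."""
    return (1.0 - math.exp(-num_hashes / bits_per_element)) ** num_hashes

def best_num_hashes(bits_per_element: float) -> int:
    """Hash count that roughly minimizes the false-positive rate (about (m/n) * ln 2)."""
    return max(1, round(bits_per_element * math.log(2)))

def accuracy_for_budget(budget_bytes: float, num_elements: int):
    """Return (bits per element, hash count, false-positive rate) for a daily budget."""
    bits_per_element = budget_bytes * 8 / num_elements
    k = best_num_hashes(bits_per_element)
    return bits_per_element, k, false_positive_rate(bits_per_element, k)

# Hypothetical workload: 10**10 elements inserted per day.
for budget in (20e9, 3e9):  # 20 GB/day versus 3 GB/day
    bits, k, fp = accuracy_for_budget(budget, 10**10)
    print(f"{budget / 1e9:.0f} GB/day: {bits:.1f} bits/element, k={k}, FP~{fp:.2e}")
```

Under these assumptions, shrinking the budget leaves fewer bits per element and a higher false-positive rate, which is the sense in which accuracy is adjusted to fit the available resources.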

Advantages of Using Synopses
• Succinct representation of raw data makes it possible to transfer network data to disks
• Sharing/transferring raw data over the network is impossible, but synopses can be moved to remote sites
• Query processing would be expensive with raw data
    • What’s the frequency of traffic to port 80 in the past week? (raw data vs. a histogram)
• Easily adaptable to various resource requirements
    • Can adapt the size and processing requirements of a Bloom Filter based on available hardware resources and network load
• Can retain potential evidence for months!
• Allows for cascading different techniques!
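The port-80 question in the list above shows why an aggregate synopsis is cheap to query. The sketch below keeps per-day packet counts by destination port, so a weekly frequency query touches seven small counters instead of a week of raw packets. The class and method names are hypothetical and are not ForNet's histogram format.

```python
from collections import Counter, defaultdict

class PortHistogram:
    """Per-day packet counts by destination port (a tiny aggregate synopsis)."""

    def __init__(self) -> None:
        self.days = defaultdict(Counter)  # day number -> Counter of ports

    def add(self, day: int, dst_port: int, packets: int = 1) -> None:
        """Record packets seen on a given day for a destination port."""
        self.days[day][dst_port] += packets

    def frequency(self, dst_port: int, first_day: int, last_day: int) -> int:
        """Total packets to dst_port between first_day and last_day, inclusive."""
        return sum(
            self.days[day][dst_port]
            for day in range(first_day, last_day + 1)
            if day in self.days  # avoid creating empty entries while querying
        )

hist = PortHistogram()
hist.add(day=1, dst_port=80, packets=1200)
hist.add(day=2, dst_port=80, packets=900)
hist.add(day=2, dst_port=22, packets=40)
print(hist.frequency(80, first_day=1, last_day=7))
```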

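Cascading of Synopses

[Figure: cascading of synopses across network tiers. Tiers shown: Enterprise (100 Mbps), Subnet (50 Mbps), Edge (1 Gbps), Core (40 Gbps). Synopsis groups shown: CR-BF and topology maps; SCBF, simple CR, and topology maps; Neoflow, HBF, and DNS mappings; payload sampling and MAC-layer maps.]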

Packet Digests & Bloom Filters
• Snoeren et al. used them successfully in SPIE for single-packet traceback (“Hash-Based IP Traceback”)
• Space efficient: 16 bits per packet (m/n = 16) and 8 hashes (k = 8)
    • False positive (FP) rate = 5.74 × 10^-4
    • No false negatives!
• However, suppose we don’t have packets
    • We only have some excerpts of payload
    • Don’t know where the excerpt was aligned in the packet
• Extend Bloom Filters to support excerpt/substring matching
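A minimal Bloom filter makes the figures above concrete. The sketch below is not SPIE's implementation: deriving k hash positions from SHA-256 with an index prefix is an assumption made for illustration, and the class and function names are hypothetical. The helper `expected_fp_rate` evaluates the usual approximation (1 - e^(-kn/m))^k, which for m/n = 16 and k = 8 gives roughly 5.74 × 10^-4.

```python
import hashlib
import math

class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int, seed: bytes = b"") -> None:
        self.m = num_bits
        self.k = num_hashes
        self.seed = seed  # a per-filter seed changes every hash position
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(self.seed + i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

def expected_fp_rate(bits_per_element: float, num_hashes: int) -> float:
    """Standard approximation of the false-positive rate."""
    return (1.0 - math.exp(-num_hashes / bits_per_element)) ** num_hashes

packets = [f"packet-{i}".encode() for i in range(1000)]
bf = BloomFilter(num_bits=16 * len(packets), num_hashes=8)
for p in packets:
    bf.add(p)
print(all(p in bf for p in packets), f"{expected_fp_rate(16, 8):.2e}")
```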

Block-based Bloom Filter

• Create s-byte blocks of the payload:
    P = p0 p1 p2 … p(n/s)
• Append each block’s offset (in the payload):
    P = (p0|0) (p1|s) (p2|2s) … (p(n/s)|n/s)
• Insert each block into a Bloom Filter:
    BBF = (p0|0) (p1|s) (p2|2s) (p3|3s) … (p(n/s)|n/s)
• To query, create s-byte blocks of the query string:
    Q = q0 q1 q2 … q(l/s)
• Try all possible offsets:
    (q0|0): no match, since H1(q0|0)=1, H2(q0|0)=1, H3(q0|0)=0
    (q0|s): match, q0 = p1
    (q1|2s): match, q1 = p2
    (q2|3s): match, q2 = p3
• Result: “q0q1q2” was seen in a payload at offset ‘s’
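The insert and query steps of a Block-based Bloom Filter can be sketched as follows. This is an illustrative sketch rather than the ForNet code: the class name, the SHA-256 based hashing, and the default filter size are assumptions. It also simplifies matters by assuming the excerpt starts on a block boundary and by ignoring a trailing partial block.

```python
import hashlib

class BlockBloomFilter:
    """Bloom filter over (block, offset) pairs of s-byte payload blocks."""

    def __init__(self, block_size: int, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.s = block_size
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits // 8)
        self.max_len = 0  # longest payload seen; bounds the offsets tried in queries

    def _positions(self, block: bytes, offset: int):
        key = block + b"|" + str(offset).encode()
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def _has(self, block: bytes, offset: int) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(block, offset))

    def insert(self, payload: bytes) -> None:
        """Insert every full s-byte block together with its payload offset."""
        self.max_len = max(self.max_len, len(payload))
        for off in range(0, len(payload) - self.s + 1, self.s):
            for p in self._positions(payload[off:off + self.s], off):
                self.bits[p >> 3] |= 1 << (p & 7)

    def query(self, excerpt: bytes) -> list:
        """Return candidate payload offsets of a block-aligned excerpt."""
        s = self.s
        blocks = [excerpt[i:i + s] for i in range(0, len(excerpt) - s + 1, s)]
        if not blocks:
            return []
        last = self.max_len - len(blocks) * s
        return [off for off in range(0, last + 1, s)
                if all(self._has(b, off + i * s) for i, b in enumerate(blocks))]

bbf = BlockBloomFilter(block_size=4)
bbf.insert(b"GET /index.html HTTP/1.1")
print(bbf.query(b"/index.h"))
```

Because a Bloom filter has no false negatives, the true offset is always among the returned candidates. An excerpt whose alignment is unknown would additionally need to be tried at each of the s possible byte shifts.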

“Offset Collisions”

P1 = A B R A C A
P2 = C D A B R A

BBF = (A|0) (B|s) (R|2s) (A|3s) (C|4s) (A|5s)
      (C|0) (D|s) (A|2s) (B|3s) (R|4s) (A|5s)

• For query strings such as “AD”, “CB”, “DR”, and “AA”, the BBF falsely identifies them as seen in the payload!
• This happens because the BBF cannot distinguish between blocks of P1 and blocks of P2.
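The P1/P2 example can be reproduced in a few lines. To keep the effect deterministic, the sketch below uses an exact Python set in place of the Bloom filter, so every spurious match is caused by offset collisions alone and not by hash false positives. The function names are hypothetical, and the block size is s = 1.

```python
def bbf_items(payload: str, s: int = 1) -> set:
    """(block, offset) pairs that a BBF would insert for one payload."""
    return {(payload[o:o + s], o) for o in range(0, len(payload) - s + 1, s)}

def bbf_query(items: set, excerpt: str, s: int = 1) -> list:
    """Offsets at which every block of the excerpt is present in the shared set."""
    blocks = [excerpt[i:i + s] for i in range(0, len(excerpt) - s + 1, s)]
    if not blocks:
        return []
    max_off = max(off for _, off in items)
    return [off for off in range(0, max_off + 1, s)
            if all((b, off + i * s) in items for i, b in enumerate(blocks))]

# Both payloads share one filter, as in the example above.
stored = bbf_items("ABRACA") | bbf_items("CDABRA")
for q in ("AD", "CB", "DR", "AA"):
    seen = any(q in p for p in ("ABRACA", "CDABRA"))
    print(q, "reported at offsets", bbf_query(stored, q), "actually present:", seen)
```

None of the four strings occurs in either payload, yet each is reported: its blocks are individually present at the right offsets, only in different payloads.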

Hierarchical Bloom Filter

• An HBF is basically a set of BBFs for geometrically increasing sizes of blocks.

P =
    level-0: (p0|0) (p1|s) (p2|2s) … (p(n/s)|n/s)
    level-1: (p0p1|0) (p2p3|2s) … (p(n/s-1)p(n/s)|(n/s-1))
    level-2: (p0p1p2p3|0) …

Q =
    level-0: (q0|0) (q1|s) (q2|2s) (q3|3s) … (q(l/s)|l/s)
    level-1: (q0q1|0) (q2q3|2s) … (q(l/s-1)q(l/s)|(l/s-1))
    level-2: (q0q1q2q3|0) …

• Querying is similar to BBF.
• Matches at each level can be confirmed a level above.
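The level-by-level confirmation can be sketched as below. As a labeled simplification, an exact set stands in for the Bloom filter, only full blocks are inserted, the excerpt is assumed to be block-aligned, and a higher-level block is checked only where the excerpt covers an aligned block of that size. The function names are hypothetical.

```python
def hbf_items(payload: str, s: int, levels: int) -> set:
    """(block, offset) pairs for block sizes s, 2s, 4s, ... (one per level)."""
    items = set()
    for level in range(levels):
        size = s * 2 ** level
        for off in range(0, len(payload) - size + 1, size):
            items.add((payload[off:off + size], off))
    return items

def hbf_query(items: set, excerpt: str, s: int, levels: int, max_len: int) -> list:
    """Offsets where the excerpt matches at level 0 and at every higher level."""
    n = len(excerpt) - len(excerpt) % s  # use whole blocks only
    hits = []
    if n == 0:
        return hits
    for off in range(0, max_len - n + 1, s):
        ok = True
        for level in range(levels):
            size = s * 2 ** level
            start = -(-off // size) * size  # first aligned position >= off
            for p in range(start, off + n - size + 1, size):
                if (excerpt[p - off:p - off + size], p) not in items:
                    ok = False
                    break
            if not ok:
                break
        if ok:
            hits.append(off)
    return hits

stored = hbf_items("ABRACA", 1, 2) | hbf_items("CDABRA", 1, 2)
print(hbf_query(stored, "AD", 1, 2, 6), hbf_query(stored, "ABRA", 1, 2, 6))
```

With two levels, the spurious level-0 match for "AD" is rejected because the block (AD|0) was never inserted, while the genuine excerpt "ABRA" is still found. In this simplified sketch an excerpt too short to cover an aligned higher-level block, such as a two-block string at an odd offset, cannot be confirmed a level above.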

Adapting an HBF for ForNet
• So far an HBF can attest to the presence of a bit-string in payloads
• We need to tie this bit-string to source and/or destination hosts
• Our approach:
    • Similar to tying an offset to a block/bit-string
    • In addition to inserting (block||offset), also insert (block||offset||hostid)
    • Hostid could be (srcIP||dstIP)

Coalescence of Synopses
• HBF requires: source IP, destination IP, excerpt
• But where do we get the source IP and destination IP?
• Connection record/Neoflow:
    • Given a time interval (T1, T2), can give us the list of source and destination IPs
• Coalesce these synopses

Example query flow:
1. Analyst to Query Processor: Who sent “DEADBEEF” between time T1 and T2?
2. Query Processor to Connection record/Neoflow: Give all connections between time T1 and T2.
3. Connection record/Neoflow to Query Processor: (source, destination) IPs of connections between time T1 and T2
4. Query Processor to HBF: Did (source, destination) IP send “DEADBEEF” between time T1 and T2?
5. HBF to Query Processor: YES/NO
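The coalescence flow can be sketched end to end: connection records supply candidate (source, destination) pairs for the interval, and the payload synopsis answers yes or no for each pair. Everything named below is hypothetical. In particular, `PayloadIndex` is an exact-set stand-in for an HBF keyed by (block, offset, hostid), and a single index is assumed to cover the whole interval from T1 to T2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConnectionRecord:
    src: str
    dst: str
    start: float
    end: float

class PayloadIndex:
    """Exact-set stand-in for an HBF storing (block, offset, hostid) triples."""

    def __init__(self, block_size: int) -> None:
        self.s = block_size
        self.items = set()
        self.max_len = 0

    def insert(self, payload: bytes, src: str, dst: str) -> None:
        self.max_len = max(self.max_len, len(payload))
        for off in range(0, len(payload) - self.s + 1, self.s):
            self.items.add((payload[off:off + self.s], off, (src, dst)))

    def sent(self, excerpt: bytes, src: str, dst: str) -> bool:
        """Did this (src, dst) pair carry the block-aligned excerpt?"""
        s = self.s
        blocks = [excerpt[i:i + s] for i in range(0, len(excerpt) - s + 1, s)]
        if not blocks:
            return False
        last = self.max_len - len(blocks) * s
        return any(all((b, off + i * s, (src, dst)) in self.items
                       for i, b in enumerate(blocks))
                   for off in range(0, last + 1, s))

def who_sent(excerpt, t1, t2, connections, index):
    """Coalesce: candidate pairs from connection records, confirmed by the index."""
    candidates = {(c.src, c.dst) for c in connections if c.start <= t2 and c.end >= t1}
    return sorted(pair for pair in candidates if index.sent(excerpt, *pair))

conns = [ConnectionRecord("10.0.0.5", "192.0.2.7", 100, 110),
         ConnectionRecord("10.0.0.9", "192.0.2.7", 105, 120)]
idx = PayloadIndex(block_size=4)
idx.insert(b"xxxxDEADBEEFyyyy", "10.0.0.5", "192.0.2.7")
idx.insert(b"hello world!", "10.0.0.9", "192.0.2.7")
print(who_sent(b"DEADBEEF", 90, 130, conns, idx))
```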

ForNet in Intranet Usage
• Investigation based on payload characteristics
    • Determine victims of worms, trojans, and other malware
    • Detection of potential victims of phishing and spyware
    • Source of IP theft
• Investigation based on connection characteristics
    • Detection of zombies
    • Detection of malware based on connection pattern
• Investigation based on aggregate characteristics
    • Insider abuse
    • Network resource abuse
• Etc.

ForNet Deployed on Internet
• Investigation based on payload characteristics
    • Traceback based on partial content of single packets
    • Source of malware, worms, etc.
• Investigation based on connection characteristics
    • Stepping stone detection
• Investigation based on aggregate characteristics
    • Attack attribution

Current Status
• Implemented a PC-based SynApp device for placement within an intranet.
    • Connection records, HBF, DNS, Neoflow, Mappings
• Implemented Forensic Server with simple querying capabilities.
    • Current Forensic Server has 1.3TB of storage with over 3 months’ worth of data from the edge-router and two subnets
    • Normal bandwidth consumption of the network is about 1 to 2 TB/day
    • Synopses reduce this traffic to about 20GB/day
    • A 4TB Forensic Server will take over operations in July
• Implemented Panorama (GUI Client)

Tracking MyDoom
• Recorded all email traffic for a week
    • Using HBF and raw traffic
    • Was not aware of MyDoom during this collection
• When signatures became available we used them to query the system
    • To find hosts that are infected in our network
    • To find how the hosts were infected
• Some statistics:
    • 679 hosts originated at least one copy of the virus
    • 52 of which were in our network
    • These hosts sent out copies of the virus to 2011 hosts outside our network boundary

Analyzing MyDoom Infections…

MyDoom’s Weekly Progress…

Neoflow-NG
• Keeps track of everything Neoflow has, plus:
    • Individual packet sizes
        • Be able to recall the packet sizes in the same sequence
    • Inter-arrival times of packets
        • Be able to recall the arrival times of individual packets
    • Content type of flow
        • Be able to recall types of content carried by flows
        • Audio, Video, Plain-Text, Encrypted, Compressed, etc.
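A record with the extra Neoflow-NG fields might look like the sketch below. The slides do not specify what Neoflow itself stores, so the flow-key fields (addresses, ports, protocol) and all names here are assumptions. The point illustrated is that storing packet sizes in order, plus inter-arrival gaps, is enough to recall both the size sequence and the individual arrival times.

```python
from dataclasses import dataclass, field
from itertools import accumulate
from typing import List, Optional

@dataclass
class FlowRecordNG:
    # Hypothetical flow key; the actual Neoflow fields are not given in the slides.
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: str
    content_type: str = "unknown"  # e.g. audio, video, plain-text, encrypted
    packet_sizes: List[int] = field(default_factory=list)      # in arrival order
    inter_arrival: List[float] = field(default_factory=list)   # gaps in seconds
    first_seen: Optional[float] = None
    last_seen: Optional[float] = None

    def add_packet(self, timestamp: float, size: int) -> None:
        """Append one packet, storing its size and the gap since the previous one."""
        if self.first_seen is None:
            self.first_seen = timestamp
        else:
            self.inter_arrival.append(timestamp - self.last_seen)
        self.last_seen = timestamp
        self.packet_sizes.append(size)

    def arrival_times(self) -> List[float]:
        """Recall individual arrival times from the first timestamp and the gaps."""
        if self.first_seen is None:
            return []
        return list(accumulate(self.inter_arrival, initial=self.first_seen))

flow = FlowRecordNG("10.0.0.5", "192.0.2.7", 51000, 80, "tcp", content_type="plain-text")
for ts, size in [(100.0, 60), (100.5, 1500), (102.0, 40)]:
    flow.add_packet(ts, size)
print(flow.packet_sizes, flow.arrival_times())
```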

Investigating Network Resource Abuse

Total Network Bandwidth Composition

Detection of Network Proxies

Zombie Detection
• Are there any zombies in the Poly network? And if so, how do we identify them?
• Criterion: hosts that portscan and use IRC are likely zombies.
• Tabulate the amount of IRC activity for each host.
• Tabulate the amount of portscan activity for each host.
• For each host: ZOMBIE_LIKELIHOOD = PORTSCANS * IRC_FLOWS
• Sort all hosts by their ZOMBIE_LIKELIHOOD.
• List the top 200 hosts.
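The ranking procedure above is small enough to sketch directly. The per-host tallies are assumed to be available already as dictionaries (how they are extracted from the synopses is not shown), and the function name and sample hosts are hypothetical. Hosts with a zero score are dropped, since the criterion requires both portscanning and IRC use.

```python
def rank_zombie_candidates(portscans: dict, irc_flows: dict, top_n: int = 200) -> list:
    """Score each host as PORTSCANS * IRC_FLOWS and return the top_n hosts."""
    scores = {host: count * irc_flows.get(host, 0) for host, count in portscans.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # A zero score means the host was missing one of the two behaviours.
    return [(host, score) for host, score in ranked if score > 0][:top_n]

# Hypothetical tallies per host.
portscans = {"10.0.0.5": 40, "10.0.0.9": 3, "10.0.0.12": 500}
irc_flows = {"10.0.0.5": 12, "10.0.0.9": 200, "10.0.0.77": 90}
for host, score in rank_zombie_candidates(portscans, irc_flows):
    print(host, score)
```

A product, unlike a sum, ranks a host highly only when both activities are present, which matches the stated criterion.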

Work currently in progress…
• Identification of useful network events
    • Network is the virtual crime scene that holds evidence in the form of network events
• Developing efficient synopses
    • Handling connection-oriented and connectionless traffic
    • Techniques for payload attribution (esp. IRC/IM)
    • Automatic cascading of various synopsis techniques
• Analysis and mining techniques
    • Zombie detection
    • Stepping stone detection

Work currently in progress…
• Integration of information from synopses across networks
    • Development of a protocol for secure communication between the various ForNet components
• Query processing and storage of synopses
    • A query language that is transparent to the various underlying synopsis techniques
    • A query management system to interpret the language for the underlying database
    • Various storage and garbage collection strategies for collected SynApp data
    • Storage and query processing infrastructure for Forensic Servers

Research Challenges
• Integration of information from synopses across networks
    • Real power of ForNet is realized when information from SynApps is fused to answer queries
    • Development of a protocol for secure communication between the various ForNet components
• Query processing and storage of synopses
    • A query language that is transparent to the various underlying synopsis techniques
    • A query management system to interpret the language for the underlying database
• Analysis
    • A framework to compare the effectiveness and trade-offs of synopses

Attacks on ForNet
• Resource exhaustion:
    • Flood the network with random bits of data
• Malicious transformation:
    • Create packets of length (blocksize - 1)
• Stuffing:
    • Stuff every other block with application-dependent escape characters
    • For smaller blocks we can try to guess; for larger blocks it is not possible!
• Exploiting collisions:
    • Hash collisions
        • Very unlikely for strong hash functions
        • We use a random seed for every HBF, which makes it more difficult
    • Packet collisions
        • A possibility in Block-based Bloom Filters but not in HBFs
• Streaming transformations:
    • Encryption, compression

For more information visit: http://isis.poly.edu/projects/fornet/