Measuring Adversaries Vern Paxson International Computer Science Institute / Lawrence Berkeley National Laboratory [email protected] June 15, 2004

[Figure: growth plot, data courtesy of Rick Adams. Trend-line labels: 80% growth/year, 60% growth/year, 596% growth/year.]

The Point of the Talk

• Measuring adversaries is fun:

– Increasingly of pressing interest

– Involves misbehavior and sneakiness

– Includes true Internet-scale phenomena

– Under-characterized

– The rules change

The Point of the Talk, cont’d

• Measuring adversaries is challenging:

– Spans very wide range of layers, semantics, scope

– New notions of “active” and “passive” measurement

– Extra-thorny dataset problems

– Very rapid evolution: arms race

Adversaries & Evasion

• Consider passive measurement: scanning traffic for a particular string (“USER root”)

• Easiest: scan for the text in each packet

– No good: text might be split across multiple packets

• Okay, remember text from previous packet

– No good: out-of-order delivery

• Okay, fully reassemble byte stream

– Costs state …

– … and still evadable
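The three matching strategies above can be sketched in a few lines of Python. This is a minimal illustration, not any particular monitor's implementation: packets are modelled as hypothetical `(sequence_number, payload)` tuples, and the reassembly step assumes no gaps or overlaps.

```python
SIGNATURE = b"USER root"

def per_packet_match(packets):
    """Naive scan: look for the signature inside each payload on its own."""
    return any(SIGNATURE in payload for _, payload in packets)

def carry_over_match(packets):
    """Remember the tail of the previous packet, in arrival order."""
    tail = b""
    for _, payload in packets:
        window = tail + payload
        if SIGNATURE in window:
            return True
        # Keep just enough bytes to catch a signature spanning two packets.
        tail = window[-(len(SIGNATURE) - 1):]
    return False

def reassembled_match(packets):
    """Order by sequence number, then scan the whole stream (needs per-connection state)."""
    stream = b"".join(payload for _, payload in sorted(packets))
    return SIGNATURE in stream

# The signature split across two packets, in order and out of order.
split = [(0, b"USER "), (5, b"root\r\n")]
reordered = [(5, b"root\r\n"), (0, b"USER ")]

print(per_packet_match(split))       # per-packet scan misses the split signature
print(carry_over_match(split))       # carry-over catches it when packets arrive in order
print(carry_over_match(reordered))   # but misses it when they arrive out of order
print(reassembled_match(reordered))  # full reassembly catches it, at the cost of state
```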

Evading Detection Via Ambiguous TCP Retransmission

The Problem of Evasion

• Fundamental problem passively measuring traffic on a link: Network traffic is inherently ambiguous

• Generally not a significant issue for traffic characterization …

• … But is in the presence of an adversary: Attackers can craft traffic to confuse/fool monitor
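To make the ambiguity concrete, here is a small sketch in which the same sequence range is sent twice with different contents. The two reassembly policies ("first copy wins" and "last copy wins") are illustrative assumptions; the point is only that a monitor cannot tell from the packets alone which stream the receiver ends up with.

```python
def reassemble(segments, policy):
    """Rebuild a byte stream from (seq, data) segments given in arrival order.

    policy="first": keep the first byte seen at each stream position.
    policy="last":  let later segments overwrite earlier ones.
    """
    stream = {}
    for seq, data in segments:
        for offset, byte in enumerate(data):
            pos = seq + offset
            if policy == "last" or pos not in stream:
                stream[pos] = byte
    return bytes(stream[pos] for pos in sorted(stream))

segments = [
    (0, b"USER "),
    (5, b"nice"),  # original transmission
    (5, b"root"),  # "retransmission" of the same range with different data
]

print(reassemble(segments, "first"))  # b'USER nice'
print(reassemble(segments, "last"))   # b'USER root'
```

A monitor that applies one policy while the end host applies the other scans a different byte stream than the one the host actually processes.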

The Problem of “Crud”

• There are many such ambiguities attackers can leverage

• A type of measurement vantage-point problem

• Unfortunately, these occur in benign traffic, too:

– Legitimate tiny fragments, overlapping fragments

– Receivers that acknowledge data they did not receive

– Senders that retransmit different data than originally

• In a diverse traffic stream, you will see these:

– What is the intent?

Countering Evasion-by-Ambiguity

• Involve end-host: have it tell you what it saw

• Probe end-host in advance to resolve vantage-point ambiguities (“active mapping”)

– E.g., how many hops to it?

– E.g., how does it resolve ambiguous retransmissions?

• Change the rules: perturb

– Introduce a network element that “normalizes” the traffic passing through it to eliminate ambiguities

• E.g., regenerate low TTLs (dicey!)

• E.g., reassemble streams & remove inconsistent retransmissions
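One way to picture the "remove inconsistent retransmissions" step is a normalizer that remembers the bytes it has already forwarded and rewrites any later copy to match. The sketch below is an assumption about how such an element could behave, not a description of a specific normalizer; segments are hypothetical `(seq, data)` tuples.

```python
def normalize(segments):
    """Forward only bytes consistent with what was already forwarded.

    Returns a list of (seq, data, consistent) in arrival order. Where a
    segment overlaps earlier data, its bytes are rewritten to the copy
    that was seen first, and `consistent` records whether it disagreed.
    """
    seen = {}  # stream position -> byte already forwarded (this is the state cost)
    out = []
    for seq, data in segments:
        rewritten = bytearray()
        consistent = True
        for offset, byte in enumerate(data):
            pos = seq + offset
            if pos in seen and seen[pos] != byte:
                consistent = False
            rewritten.append(seen.setdefault(pos, byte))
        out.append((seq, bytes(rewritten), consistent))
    return out

result = normalize([(0, b"USER "), (5, b"nice"), (5, b"root")])
for seq, data, consistent in result:
    print(seq, data, consistent)
```

After normalization every downstream observer, monitor and end host alike, sees the same bytes, so the choice of reassembly policy no longer matters.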

Adversaries & Identity

• Usual notions of identifying services by port numbers and users by IP addresses become untrustworthy

• E.g., backdoors installed by attackers on non-standard ports to facilitate return / control

• E.g., P2P traffic tunneled over HTTP

• General measurement problem: inferring structure

Adversaries & Identity: Measuring Packet Origins

• Muscular approach (Burch/Cheswick)

– Recursively pound upstream routers to see which ones perturb flooding stream

• Breadcrumb approach:

– ICMP ISAWTHIS

• Relies on high volume

– Packet marking

• Lower volume + intensive post-processing

• Yaar’s PI scheme yields general tomography utility

Yields general technique: power of introducing small amount of state inside the network

Adversaries & Identity: Measuring User Origins

• Internet attacks invariably come not from the attacker's own personal machine but from a stepping stone: a previously compromised intermediary.

• Furthermore, via a chain of stepping stones.

• Manually tracing attacker back across the chain is virtually impossible.

• So: want to detect that a connection going into a site is closely related to one going out of the site.

• Active techniques? Passive techniques?

Measuring User Origins, cont’d

• Approach #1 (SH94; passive): Look for similar text

– For each connection, generate a 24-byte thumbprint summarizing per-minute character frequencies

• Approach #2 (USAF94) - particularly vigorous active measurement:

– Break in to upstream attack site

– Rummage through its logs

– Recurse

• Approach #3 (ZP00; passive): Leverage unique on/off pattern of user login sessions:

– Look for connections that end idle periods at the same time.

– Two idle periods correlated if ending times differ by ≤ δ sec.

– If enough periods coincide ⇒ stepping stone pair.

– For A → B → C stepping stone, just 2 correlations suffice

– (For A → B → … → C → D, 4 suffice.)
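The timing idea in Approach #3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: each connection is reduced to a list of packet timestamps in seconds, and the idle threshold `t_idle`, the tolerance `delta`, and the required number of coincidences are placeholder values chosen for the example.

```python
def idle_period_ends(timestamps, t_idle):
    """Times at which a connection resumes after at least t_idle seconds of silence."""
    ends = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev >= t_idle:
            ends.append(cur)
    return ends

def coincidences(ends_a, ends_b, delta):
    """Count idle-period ends in A that have an end in B within delta seconds."""
    return sum(1 for a in ends_a if any(abs(a - b) <= delta for b in ends_b))

def looks_like_stepping_stone(ts_a, ts_b, t_idle=0.5, delta=0.08, min_coincidences=2):
    """Flag a connection pair whose idle periods end together often enough."""
    ends_a = idle_period_ends(ts_a, t_idle)
    ends_b = idle_period_ends(ts_b, t_idle)
    return coincidences(ends_a, ends_b, delta) >= min_coincidences

# Hypothetical traces: `outbound` echoes `inbound` a few tens of ms later.
inbound = [0.0, 0.1, 2.0, 2.1, 5.0, 5.2]
outbound = [0.02, 0.12, 2.03, 2.13, 5.04, 5.22]
unrelated = [0.5, 1.2, 1.9, 2.6, 3.3, 4.0]

print(looks_like_stepping_stone(inbound, outbound))
print(looks_like_stepping_stone(inbound, unrelated))
```

Only timing is used, never payload, which is why the approach still applies when the traffic is encrypted.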


• Works very well, even for encrypted traffic

• But: easy to evade, if attacker cognizant of algorithm

– C’est la arms race

• And: also turns out there are frequent legit stepping stones

• Untried active approach: imprint traffic with low-frequency timing signature unique to each site (“breadcrumb”). Deconvolve recorded traffic to extract.

Global-scale Adversaries: Worms

• Worm = Self-replicating/self-propagating code

• Spreads across a network by exploiting flaws in open services, or fooling humans (viruses)

• Not new --- Morris Worm, Nov. 1988

– 6-10% of all Internet hosts infected

• Many more small ones since …

… but came into its own July, 2001

Code Red

• Initial version released July 13, 2001.

• Exploited known bug in Microsoft IIS Web servers.

• 1st through 20th of each month: spread. 20th through end of each month: attack.

• Spread: via random scanning of 32-bit IP address space.

• But: failure to seed random number generator ⇒ linear growth

Reverse engineering enables forensics
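The effect of the unseeded generator can be illustrated with a toy experiment. The sketch assumes each instance draws uniformly random 32-bit addresses and uses Python's `random.Random` as a stand-in for the worm's generator; the instance and probe counts are arbitrary.

```python
import random

def scan_targets(seed, count):
    """Return the first `count` pseudo-random 32-bit addresses an instance would probe."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]

INSTANCES = 50
PROBES = 1000

# Static seed: every infected host walks the same address sequence.
static = [scan_targets(seed=0, count=PROBES) for _ in range(INSTANCES)]
unique_static = len({addr for targets in static for addr in targets})

# Per-instance seeds: instances cover different parts of the address space.
varied = [scan_targets(seed=i, count=PROBES) for i in range(INSTANCES)]
unique_varied = len({addr for targets in varied for addr in targets})

print(unique_static, unique_varied)
```

With a shared seed, 50 instances together probe no more distinct addresses than one instance does, so each new infection adds little new coverage. With distinct seeds, coverage scales with the number of instances, which is what allows the spread to accelerate.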

Code Red, cont’d

• Revision released July 19, 2001.

• Payload: flooding attack on www.whitehouse.gov.

• Bug led to it dying for date ≥ 20th of the month.

• But: this time random number generator correctly seeded. Bingo!

Worm dies on July 20th, GMT

Measuring Internet-Scale Activity: Network Telescopes

• Idea: monitor a cross-section of Internet address space to measure network traffic involving wide range of addresses

– “Backscatter” from DoS floods

– Attackers probing blindly

– Random scanning from worms

• LBNL’s cross-section: 1/32,768 of Internet

– Small enough for appreciable telescope lag

• UCSD, UWisc’s cross-section: 1/256.
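The "telescope lag" remark follows from simple probability. Assuming a source that probes uniformly at random over the whole 32-bit address space (an idealization), the sketch below computes how likely a telescope covering a given fraction is to have seen the source after some number of probes. The per-second scan rate used in the printout is a hypothetical value for illustration.

```python
def prob_seen(fraction, scans):
    """Probability that at least one of `scans` uniform random probes hits the telescope."""
    return 1.0 - (1.0 - fraction) ** scans

def expected_scans_until_seen(fraction):
    """Mean number of probes until the first hit (geometric distribution)."""
    return 1.0 / fraction

LBNL = 1 / 32768   # cross-section quoted for LBNL
SLASH8 = 1 / 256   # cross-section quoted for UCSD and UWisc

RATE = 10.0  # hypothetical probes per second for a single infected host
for name, fraction in (("LBNL", LBNL), ("/8", SLASH8)):
    scans = expected_scans_until_seen(fraction)
    print(name, scans, "probes on average,", scans / RATE, "seconds at", RATE, "probes/s")
```

Under these assumptions the smaller telescope needs 128 times as many probes, on average, before it first observes a given source.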

Spread of Code Red

• Network telescopes give lower bound on # infected hosts: 360K.

• Course of infection fits classic logistic.

• That night (20th), worm dies … except for hosts with inaccurate clocks!

• It just takes one of these to restart the worm on August 1st …

Could parasitically analyze sample of 100K’s of clocks!
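The "classic logistic" mentioned above is the solution of the simple epidemic equation da/dt = K·a·(1 − a), where a is the fraction of vulnerable hosts infected. A minimal sketch follows; the rate `rate` and midpoint `t_half` are free parameters, and the values used in the printout are placeholders rather than fitted values from the talk.

```python
import math

def logistic_fraction(t, rate, t_half):
    """Fraction of vulnerable hosts infected at time t.

    Solves da/dt = rate * a * (1 - a) with a(t_half) = 0.5.
    """
    return 1.0 / (1.0 + math.exp(-rate * (t - t_half)))

# Placeholder parameters: time in hours, midpoint at hour 10.
for hour in range(0, 21, 4):
    print(hour, round(logistic_fraction(hour, rate=1.8, t_half=10.0), 4))
```

Early on the curve grows almost exponentially, because nearly every probe that finds a vulnerable host finds an uninfected one. It flattens as the remaining pool of uninfected hosts shrinks.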

The Worms Keep Coming

• Code Red 2:

– August 4th, 2001

– Localized scanning: prefers nearby addresses

– Payload: root backdoor

– Programmed to die Oct 1, 2001.

• Nimda:

– September 18, 2001

– Multi-mode spreading, including via Code Red 2 backdoors!

Code Red 2 kills off Code Red 1

Code Red 2 settles into weekly pattern

Nimda enters the ecosystem

Code Red 2 dies off as programmed

CR 1 returns thanks to bad clocks

Nimda hums along, slowly cleaned up

With its predator gone, Code Red 1 comes back, still exhibiting monthly pattern

80% of Code Red 2 cleaned up due to onset of Blaster

Code Red 2 re-released with Oct. 2003 die-off

Code Red 1 and Nimda endemic

Code Red 2 re-re-released Jan 2004

Code Red 2 dies off again

Detecting Internet-Scale Activity

• Telescopes can measure activity, but what does it mean??

• Need to respond to traffic to ferret out intent

• Honeyfarm: a set of “honeypots” fed by a network telescope

• Active measurement w/ an uncooperating (but stupid) remote endpoint

Internet-Scale Adversary Measurement via Honeyfarms

• Spectrum of response ranging from simple/cheap auto-SYN acking to faking higher levels to truly executing higher levels

• Problem #1: Bait

– Easy for random-scanning worms, “auto-rooters”

– But for “topological” or “contagion” worms, need to seed honeyfarm into application network ⇒ Huge challenge

• Problem #2: Background radiation

– Contemporary Internet traffic rife with endemic malice. How to ignore it??

Measuring Internet Background Radiation -- 2004

• For good-sized telescope, must filter:

– E.g., UWisc /8 telescope sees 30Kpps of traffic heading to nonexistent addresses

• Would like to filter by intent, but initially don’t know enough

• Schemes - per source:

– Take first N connections

– Take first N connections to K different ports

– Take first N different payloads

– Take all traffic source sends to first N destinations
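Two of the per-source schemes listed above are easy to sketch as small stateful filters. The class and method names are invented for illustration, and sources and destinations are plain strings.

```python
from collections import defaultdict

class FirstNConnectionsFilter:
    """Pass only the first N connection attempts from each source address."""

    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(int)

    def admit(self, src):
        self.counts[src] += 1
        return self.counts[src] <= self.n


class FirstNDestinationsFilter:
    """Pass all traffic a source sends to the first N distinct destinations it contacts."""

    def __init__(self, n):
        self.n = n
        self.dests = defaultdict(set)

    def admit(self, src, dst):
        seen = self.dests[src]
        if dst in seen:
            return True          # already-admitted destination: keep everything
        if len(seen) < self.n:
            seen.add(dst)        # still under the per-source destination budget
            return True
        return False


by_count = FirstNConnectionsFilter(2)
print([by_count.admit("192.0.2.1") for _ in range(4)])

by_dest = FirstNDestinationsFilter(1)
print(by_dest.admit("192.0.2.1", "198.51.100.7"),
      by_dest.admit("192.0.2.1", "198.51.100.8"),
      by_dest.admit("192.0.2.1", "198.51.100.7"))
```

The destination-based variant keeps whole conversations intact for the destinations it admits, which matters when a responder needs to see more than the first packet to judge intent.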

Responding to Background Radiation

Hourly Background Radiation Seen at a 2,560-address Telescope

Measuring Internet-scale Adversaries: Summary

• New tools & forms of measurement:

– Telescopes, honeypots, filtering

• New needs to automate measurement:

– Worm defense must be faster-than-human

• The lay of the land has changed:

– Endemic worms, malicious scanning

– Majority of Internet connection (attempts) are hostile (80+% at LBNL)

• Increasing requirement for application-level analysis

The Huge Dataset Headache

• Adversary measurement particularly requires packet contents

– Much analysis is application-layer

• Huge privacy/legal/policy/commercial hurdles

• Major challenge: anonymization/agents technologies

– E.g. [PP03] “semantic trace transformation”

– Use intrusion detection system’s application analyzers to anonymize trace at semantic level (e.g., filenames vs. users vs. commands)

– Note: general measurement increasingly benefits from such application analyzers, too
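To give a flavor of anonymizing at the semantic level, the sketch below rewrites the arguments of already-parsed FTP commands: user names and file names become stable keyed pseudonyms, the command verb is kept, and any other argument is removed. This is an illustrative assumption about what such a transformation could look like, not the [PP03] implementation; the function names and the policy are invented for the example.

```python
import hashlib
import hmac

def pseudonym(key, kind, value):
    """Map a sensitive field to a stable keyed token; `kind` keeps namespaces apart."""
    digest = hmac.new(key, f"{kind}:{value}".encode(), hashlib.sha256).hexdigest()
    return f"{kind}-{digest[:8]}"

def anonymize_ftp_command(key, line):
    """Rewrite a command's argument according to its meaning, keeping the verb."""
    verb, _, arg = line.partition(" ")
    if not arg:
        return verb
    if verb.upper() == "USER":
        return f"{verb} {pseudonym(key, 'user', arg)}"
    if verb.upper() in ("RETR", "STOR"):
        return f"{verb} {pseudonym(key, 'file', arg)}"
    # Fail closed: arguments of anything not explicitly handled are dropped.
    return f"{verb} <removed>"

key = b"per-trace secret (hypothetical)"
for line in ("USER alice", "PASS hunter2", "RETR notes.txt", "NOOP"):
    print(anonymize_ftp_command(key, line))
```

Because the mapping is keyed and deterministic, the same user or file maps to the same token throughout a trace, so analyses that count or correlate entities still work without exposing the original values.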

Attacks on Passive Monitoring

• State-flooding:

– E.g. if tracking connections, each new SYN requires state; each undelivered TCP segment requires state

• Analysis flooding:

– E.g. stick, snot, trichinosis

• But surely just peering at the adversary we’re ourselves safe from direct attack?

Attacks on Passive Monitoring

• Exploits for bugs in passive analyzers!

• Suppose protocol analyzer has an error parsing unusual type of packet

– E.g., tcpdump and malformed options

• Adversary crafts such a packet, overruns buffer, causes analyzer to execute arbitrary code

• E.g. Witty, BlackIce & packets sprayed to random UDP ports

– 12,000 infectees in < 60 minutes!
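The defensive side of the parsing problem can be sketched with TCP options, which use a kind/length/value layout. The parser below checks every length field against the buffer before using it. It is a generic illustration of the checks an analyzer needs, not a reconstruction of any specific tcpdump or BlackIce bug. Python itself cannot overrun a buffer, but the same checks are what a C parser must perform.

```python
def parse_tcp_options(raw):
    """Parse a TCP options field into (kind, data) pairs, rejecting malformed input.

    Raises ValueError rather than trusting an attacker-supplied length.
    """
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:   # End of option list
            break
        if kind == 1:   # No-operation: single byte, no length field
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("option truncated before length byte")
        length = raw[i + 1]
        if length < 2:
            # A length of 0 or 1 would make a naive parser loop forever or misparse.
            raise ValueError("option length smaller than its own header")
        if i + length > len(raw):
            raise ValueError("option length runs past end of options field")
        options.append((kind, bytes(raw[i + 2:i + length])))
        i += length
    return options

# Well-formed: MSS option (kind 2, length 4), two NOPs, end of list.
print(parse_tcp_options(bytes([2, 4, 0x05, 0xB4, 1, 1, 0])))
```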

Summary

• The lay of the land has changed

– Ecosystem of endemic hostility

– “Traffic characterization” of adversaries as ripe as characterizing regular Internet traffic was 10 years ago

– People care

• Very challenging:

– Arms race

– Heavy on application analysis

– Major dataset difficulties

Summary, cont’d

• Revisit “passive” measurement:

– evasion

– telescopes/Internet scope

– no longer isolated observer, but vulnerable

• Revisit “active” measurement

– perturbing traffic to unmask hiding & evasion

– engaging attacker to discover intent

• IMHO, this is “where the action is” …

• … And the fun!
