Vern Paxson, Stefan Savage, George Varghese, Geoff Voelker, Nick Weaver

Collaborative Center for Internet Epidemiology and Defenses (CCIED)

Technical Advisory Board Meeting


Mark Allman, Juan Caballero, Martin Casado, Jay Chen, Simon Crosby, Weidong Cui, Cristian Estan, Ranjit Jhala, Jaeyeon Jung, Chris Kanich, Jayanth Kumar Kannan, Erin Kenneally, Kirill Levchenko, Justin Ma, Marvin McNett, David Moore, Michelle Panik, Colleen Shannon, Sumeet Singh, Alex Snoeren, Amin Vahdat, Erik Vandekieft, Michael Vrable, Ming Woo-Kawaguchi, Vinod Yegneswaran

Welcome!

First some context…
  This isn’t a “sales pitch”
  We created a TAB for our benefit
  We want to improve the effectiveness of the project and we think you can help

…and some ground rules
  We’re going to give some informal presentations
  Ask questions and give informal feedback anytime
  The meeting today is private, but nothing is confidential
  We have some specific high-level focus questions that we’d like you to think about and give feedback

Focus questions for the TAB

1. Are we considering the right threats?
2. Are there other technical approaches we should be considering?
3. Are we missing any important partnership opportunities?
4. Are we missing any key capabilities on our team?
5. What education/training is necessary/missing for practitioners in the field? How can we best help here?

Agenda

9:30-10:30   Intro
10:45-12:00  Data Collection (Honeyfarms)
12:00-1:30   Lunch
1:30-1:45    Potpourri
1:45-2:30    Detection/Defense
2:30-3:00    Future
3:30-4:30    TAB Breakout
4:30-5:30    TAB Feedback
Dinner

For the rest of our time…

Motivation and scope
  What we promised NSF
  Research & education

Prior activity and background
  Monitoring
  Analyses
  Defense

Motivation: threat transformation

Traditional threats
  Attacker manually targets high-value system/resource
  Defender increases cost to compromise high-value systems
  Biggest threat: insider attacker

Modern threats
  Attacker uses automation to target all systems at once (can filter later)
  Defender must defend all systems at once
  Biggest threats: software vulnerabilities & naïve users

Driving economic forces

No longer just for fun, but for profit
  SPAM forwarding (MyDoom.A backdoor, SoBig), credit card theft (Korgo), DDoS extortion, etc.
Symbiotic relationship: worms, bots, SPAM, DDoS, etc.
Fluid third-party exchange market (millions of hosts for sale)
  Going rate for SPAM proxying: 3-10 cents/host/week
    Seems small, but a 25k botnet gets you $40k-130k/yr
  Raw bots: $1+/host
  Special orders
Generalized search capabilities are next
“Virtuous” economic cycle
Bottom line: compromised hosts are a platform
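The yearly revenue figure quoted for SPAM proxying follows from simple arithmetic. A minimal sketch, assuming every host is rented out all 52 weeks of the year at the quoted rates (the function name and the 52-week assumption are illustrative, not from the slides):

```python
def yearly_proxy_revenue(hosts, cents_per_host_week, weeks_per_year=52):
    """Yearly revenue in dollars from renting `hosts` out as SPAM proxies."""
    return hosts * cents_per_host_week * weeks_per_year / 100


# Quoted going rate: 3 to 10 cents per host per week, for a 25k-host botnet.
low = yearly_proxy_revenue(25_000, 3)
high = yearly_proxy_revenue(25_000, 10)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions the low end comes to $39,000 and the high end to $130,000, which matches the rounded $40k-130k/yr range on the slide.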

Overall CCIED Scope

Developing understanding and technology to address the threats of large-scale host compromise

CCIED’s research responsibilities

Internet Epidemiology: Understanding
  What kinds of new attacks are going on?
  What are their limits?

Automated Network Defenses: Reacting
  Stop new attacks without humans in the loop

Legal and Economic issues: Worrying
  What are liability issues?
  How to create forensic and commercial value?

CCIED’s education responsibilities

We are committed to providing a yearly workshop to help train researchers and the workforce (interpreted broadly) in these issues
  Input appreciated on this: the format, and who the best short-term audience might be

Curriculum development
  Worm/virus segments for undergrad and grad classes

Year one milestones

Development and deployment of large-scale network worm detection system (telescope/simple honeyfarm)

Testing of prototype in-line defenses (scan suppression, signature extraction)

Legal issues related to both technologies

Initial Worm/Virus curriculum for security courses

CCIED Web Portal running

Ancient history – independent groups

In the late 90s, Paxson deploys the Bro IDS at LBL and starts looking at network-based intrusions

In 2000, UCSD develops “network telescope”-based backscatter DoS inference technique

See: Paxson, Bro: a System for Detecting Intruders in Real Time, USENIX Security, 1998, and Moore et al, Inferring Internet Denial of Service Activity, USENIX Security, 2001

Code Red

Code Red epidemic takes off in 2001, first large-scale network worm in over a decade

Selects IP address at random and probes for vulnerability

Monitored via telescopes
  ~360,000 hosts in a day
  Slow admin response
  Didn’t do much

Growth matches logistic function

See: Moore et al, CodeRed: a Case study on the Spread of an Internet Worm, IMW 2002, and Staniford et al, How to 0wn the Internet in your Spare Time, USENIX Security 2002

Code Red is only proof of concept

Better targeting possible
  Biased: local biases faster and more likely to hit
  Topological: exploit application-level networks (e.g. e-mail, p2p apps, google vs searchers, etc)
  Hitlist: predetermine vulnerable hosts (at least some)
    Metaserver worms – exploit directory servers for this purpose
  Permutation scanning: don’t duplicate effort
  Contagion worms: hide in existing communication patterns

More destructive payload possible
  Toast disk, toast bios, patch microcode
  Simple cost models suggest multi-billion costs achievable

Call for Cyber-CDC

See: Staniford et al, How to 0wn the Internet in your Spare Time, USENIX Security 2002, and Weaver et al, A Worst-case Worm, WEIS 2004

How well must defense work?

Containment strategy
  “Sharable” signatures offer huge advantages

Reaction Time
  For CodeRed densities: 3hrs for 10 probes/sec, 2mins for 1000 probes/sec

Deployment
  Need to interdict most paths
  Worms form world’s best overlay net

[Figure: % infected (95th perc.) vs. reaction time; panels: Address Filtering, Content Filtering; reaction time shown in minutes and in hours]

See: Moore et al, Internet Quarantine: Requirements for Containing Self-Propagating Code, Infocom 2003

[Figure: Content Filtering; axes: probes/second vs. reaction time]

[Figure: CodeRed-like Worm; % infected at 24 hours (95th perc.), from 25% to 100%, for deployment at Top 10, Top 20, Top 30, Top 40, Top 100, All]

Aside

Around this time both groups are providing input to Anup Ghosh (DARPA) for a new program: Dynamic Quarantine

We join forces and put in a joint proposal
  Highest-rated proposal for DQ
  Project then classified (then reclassified again!)

Group stays in touch…

A pretty fast outbreak: Slammer (2003)

First ~1min behaves like classic random scanning worm
  Doubling time of ~8.5 seconds
  CodeRed doubled every 40mins
>1min worm starts to saturate access bandwidth
  Some hosts issue >20,000 scans per second
  Self-interfering (no congestion control)
Peaks at ~3min
  >55 million IP scans/sec
90% of Internet scanned in <10mins
Infected ~100k hosts (conservative)

See: Moore et al, The Spread of the Sapphire/Slammer Worm, IEEE Security & Privacy, 1(4), 2003
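To put the two doubling times side by side, here is a rough back-of-the-envelope sketch. It assumes idealized, unconstrained exponential growth from a single infected host, so it ignores both the saturation of the vulnerable population and the bandwidth limits described above; the function name and the 100,000-host target are illustrative choices.

```python
import math


def time_to_reach(target_hosts, doubling_time, initial_hosts=1):
    """Time for an idealized exponential outbreak to grow to target_hosts.

    The result is in the same unit as doubling_time.
    """
    return doubling_time * math.log2(target_hosts / initial_hosts)


# Slammer: ~8.5 second doubling time; Code Red: ~40 minute doubling time.
slammer_seconds = time_to_reach(100_000, 8.5)
code_red_minutes = time_to_reach(100_000, 40)

print(f"Slammer:  {slammer_seconds / 60:.1f} minutes")
print(f"Code Red: {code_red_minutes / 60:.1f} hours")
```

Under these assumptions Slammer needs a little over two minutes to reach 100,000 hosts, while a Code Red-like worm needs roughly eleven hours. Real outbreaks slow down as they saturate, so these are optimistic (fast) estimates.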

Was Slammer really fast?

Yes, it was orders of magnitude faster than CR
No, it was poorly written and unsophisticated
Who cares? It is literally an academic point
  The current debate is whether one can get < 500ms
  Bottom line: way faster than people!

See: Staniford et al, The Top Speed of Flash Worms, ACM WORM, 2004

Aside: How to think about worms

Reasonably well described as infectious epidemics
  Simplest model: homogeneous random contacts

Classic SI model
  N: population size
  S(t): susceptible hosts at time t
  I(t): infected hosts at time t
  β: contact rate
  i(t): I(t)/N, s(t): S(t)/N

\[ \frac{dI}{dt} = \beta \frac{I S}{N}, \qquad \frac{dS}{dt} = -\beta \frac{I S}{N} \]

\[ \frac{di}{dt} = \beta \, i \, (1 - i) \]

\[ i(t) = \frac{e^{\beta (t - T)}}{1 + e^{\beta (t - T)}} \]

courtesy Paxson, Staniford, Weaver
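The SI model above can be checked numerically. The sketch below compares the closed-form logistic solution with a simple forward-Euler integration of di/dt = β i (1 - i). The values of `beta` and `i0` are illustrative assumptions, not figures from the slides, and T is treated as the constant of integration: the time at which half the population is infected.

```python
import math


def logistic_fraction(t, beta, T):
    """Closed-form infected fraction i(t) = e^(beta (t-T)) / (1 + e^(beta (t-T)))."""
    x = math.exp(beta * (t - T))
    return x / (1.0 + x)


def integrate_si(beta, i0, t_end, dt=0.001):
    """Forward-Euler integration of di/dt = beta * i * (1 - i) from t = 0."""
    i = i0
    for _ in range(round(t_end / dt)):
        i += beta * i * (1.0 - i) * dt
    return i


beta = 1.8   # assumed contact rate (per hour), for illustration only
i0 = 1e-5    # assumed initial infected fraction

# Choose T so that the closed form satisfies i(0) = i0.
T = math.log((1.0 - i0) / i0) / beta

print(f"half the population infected after T = {T:.2f} hours")
print(f"closed form at T:      {logistic_fraction(T, beta, T):.3f}")
print(f"Euler integration at T: {integrate_si(beta, i0, T):.3f}")
```

The two values agree closely, which is a quick sanity check that the logistic curve really is the solution of the differential equation. Lowering `beta` (containment) or `i0` simply pushes T later without changing the shape of the curve.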

What’s important?

There are lots of improvements to the model…
  Chen et al, Modeling the Spread of Active Worms, Infocom 2003 (discrete time)
  Wang et al, Modeling Timing Parameters for Virus Propagation on the Internet, ACM WORM ’04 (delay)
  Ganesh et al, The Effect of Network Topology on the Spread of Epidemics, Infocom 2005 (topology)
… but the bottom line is the same. We care about two things:

How likely is it that a given infection attempt is successful?
  Target selection (random, biased, hitlist, topological,…)
  Vulnerability distribution (e.g. density – S(0)/N)

How frequently are infections attempted?
  β: Contact rate

What can be done?

Reduce the number of susceptible hosts
  Prevention: reduce S(t) while I(t) is still small (ideally reduce S(0))

Reduce the contact rate
  Containment: reduce β while I(t) is still small

This is where most of our work has focused

Scan Detection

Basic idea: detect scanning behavior indicative of worms and shoot down hosts

Threshold Random Walk algorithm
  Scanners will not usually succeed
  Track ratio of failed connection attempts to connection attempts per IP address; should be small
  Can be approximated for line-rate implementation in hardware (being built by Nick)

See: Jung et al, Fast Portscan Detection Using Sequential Hypothesis Testing, Oakland 2004, Weaver et al, Very Fast Containment of Scanning Worms, USENIX Security 2004
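The idea behind Threshold Random Walk can be sketched as a sequential hypothesis test per source address: each first-contact connection attempt moves a log-likelihood score up on failure and down on success, and the source is classified once the score crosses a threshold. This is a simplified software sketch, not the hardware approximation mentioned above. The class name, the success probabilities `theta0` and `theta1`, and the target detection and false-positive rates are all illustrative assumptions.

```python
import math
from collections import defaultdict


class ThresholdRandomWalk:
    def __init__(self, theta0=0.8, theta1=0.2, p_detect=0.99, p_false=0.01):
        # theta0 / theta1: assumed probability that a first-contact connection
        # succeeds for a benign source / for a scanner.
        self.step_success = math.log(theta1 / theta0)              # moves score down
        self.step_failure = math.log((1 - theta1) / (1 - theta0))  # moves score up
        self.upper = math.log(p_detect / p_false)                  # declare scanner
        self.lower = math.log((1 - p_detect) / (1 - p_false))      # declare benign
        self.score = defaultdict(float)
        self.seen = set()     # (src, dst) pairs already counted
        self.verdict = {}     # src -> "scanner" or "benign"

    def observe(self, src, dst, succeeded):
        """Record one connection attempt and return the current verdict for src."""
        if src in self.verdict:
            return self.verdict[src]
        if (src, dst) in self.seen:
            return "pending"  # only first contacts to a destination count
        self.seen.add((src, dst))
        self.score[src] += self.step_success if succeeded else self.step_failure
        if self.score[src] >= self.upper:
            self.verdict[src] = "scanner"
        elif self.score[src] <= self.lower:
            self.verdict[src] = "benign"
        return self.verdict.get(src, "pending")


trw = ThresholdRandomWalk()
# A source whose first contacts to four different hosts all fail.
scan_results = [trw.observe("10.0.0.66", f"192.168.1.{n}", False) for n in range(4)]
print(scan_results)
```

With these assumed parameters a source is flagged after four consecutive failed first contacts, and cleared after four consecutive successes. Tightening `p_false` or narrowing the gap between `theta0` and `theta1` increases the number of observations needed before a decision.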

Content sifting

Key idea: quickly infer content signature for new worm
  Assume there exists some (relatively) unique invariant bitstring W across all instances of a particular worm

Two consequences
  Content Prevalence: W will be more common in traffic than other bitstrings of the same length
  Address Dispersion: the set of packets containing W will address a disproportionate number of distinct sources and destinations

Content sifting: find W’s with high content prevalence and high address dispersion and drop that traffic
  By using approximate data structures can be implemented at line-rate

See: Singh et al, Automated Worm Fingerprinting, OSDI 2004.
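The two tests, content prevalence and address dispersion, can be illustrated with a small sketch. For clarity it uses exact dictionaries and sets rather than the approximate data structures needed for a line-rate implementation, and the substring length and thresholds are tiny illustrative values chosen for the toy traffic below, not parameters from the cited work.

```python
from collections import defaultdict


def sift(packets, length=4, prevalence_min=3, dispersion_min=3):
    """Return substrings that are both prevalent and widely dispersed.

    packets: iterable of (src, dst, payload) with payload as bytes.
    """
    count = defaultdict(int)      # substring -> number of packets containing it
    sources = defaultdict(set)    # substring -> distinct source addresses
    dests = defaultdict(set)      # substring -> distinct destination addresses
    for src, dst, payload in packets:
        # Count each distinct substring once per packet.
        subs = {payload[i:i + length] for i in range(len(payload) - length + 1)}
        for sub in subs:
            count[sub] += 1
            sources[sub].add(src)
            dests[sub].add(dst)
    return {
        sub for sub, c in count.items()
        if c >= prevalence_min
        and len(sources[sub]) >= dispersion_min
        and len(dests[sub]) >= dispersion_min
    }


packets = [
    # Worm-like traffic: same invariant content, many sources and destinations.
    ("10.0.0.1", "10.0.1.1", b"aaWORMbb"),
    ("10.0.0.2", "10.0.1.2", b"ccWORMdd"),
    ("10.0.0.3", "10.0.1.3", b"eeWORMff"),
    # Benign traffic: repeated content, but always between the same two hosts.
    ("10.0.0.9", "10.0.1.9", b"GET /index"),
    ("10.0.0.9", "10.0.1.9", b"GET /index"),
    ("10.0.0.9", "10.0.1.9", b"GET /index"),
]
suspicious = sift(packets)
print(suspicious)
```

The repeated benign request passes the prevalence test but fails address dispersion, so only the invariant worm substring is reported. Requiring both conditions is what keeps popular but legitimate content from being flagged.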

CCIED formed in 2004

Joint UCSD/ICSI collaboration
$6.2M from NSF over 5 years
  Synergistic support from Microsoft, HP, Intel, VMware, CNS
Between 20-25 people involved
Our first year of operation completes in November

Questions?
