Collaborative Center for Internet Epidemiology and Defenses (CCIED) Stefan Savage Department of Computer Science & Engineering University of California,

Collaborative Center for Internet Epidemiology and Defenses (CCIED)

Stefan Savage

Department of Computer Science & Engineering
University of California, San Diego

Context: threat transformation

• Traditional threats
  – Attacker manually targets high-value system/resource
  – Defender increases cost to compromise high-value systems
  – Biggest threat: insider attacker

• Modern threats
  – Attacker uses automation to attack many resources at once (filter later)
  – Defender must defend all systems at once
  – Biggest threat: software bugs and naïve users

Technical enablers

• Wide-open communications architecture
  – IP model: anyone can send anything to anyone
  – Federated management, minimal authentication

• Vulnerable computing platforms
  – One software bug -> millions of compromised hosts
  – Naïve users -> don’t even need software bugs

• Lack of meaningful deterrence
  – Little forensic attribution/audit capability
  – Inefficient investigatory mechanisms/prosecutorial incentives

Bigger problem: Economic Drivers

• In the last six years, emergence of profit-making malware
  – Anti-spam efforts force spammers to launder e-mail through compromised machines (starts with MyDoom.A, SoBig)
  – “Virtuous” economic cycle transforms nature of threat

• Commoditization of compromised hosts
  – Fluid third-party exchange market (millions of hosts)
    • Raw bots (range from pennies to dollars)
    • Value-added tier: SPAM proxying (more expensive)

• Innovation in both host substrate and its uses
  – Sophisticated infection and command/control networks: platform
  – SPAM, piracy, phishing, identity theft, DDoS are all applications

DDoS for sale

• Emergence of economic engine for Internet crime
  – SPAM, phishing, spyware, etc.

• Fluid third-party markets for illicit digital goods/services
  – Bots ~$0.5/host, special orders, value-added tiers
  – Cards, malware, exploits, DDoS, cashout, etc.


Botnet Spammer Rental Rates

September 2004 postings to SpecialHam.com, Spamforum.biz

>20-30k always online SOCKs4, url is de-duped and updated
> every 10 minutes. 900/weekly, Samples will be sent on
> request. Monthly payments arranged at discount prices.
• 3.6 cents per bot week

>$350.00/weekly - $1,000/monthly (USD)
>Type of service: Exclusive (One slot only)
>Always Online: 5,000 - 6,000
>Updated every: 10 minutes
• 6 cents per bot week

>$220.00/weekly - $800.00/monthly (USD)
>Type of service: Shared (4 slots)
>Always Online: 9,000 - 10,000
>Updated every: 5 minutes
• 2.5 cents per bot week
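The per-bot prices on this slide can be roughly reproduced with simple arithmetic. The sketch below is one plausible way to get there, not necessarily how the slide's figures were derived: it assumes the weekly price is divided by the midpoint of the advertised "always online" range, and that the unlabeled "900/weekly" is in US dollars. The function name and posting labels are illustrative.

```python
def cents_per_bot_week(weekly_price_usd, bots_low, bots_high):
    """Weekly rental price divided by the midpoint of the advertised
    'always online' range, expressed in US cents (assumed method)."""
    midpoint = (bots_low + bots_high) / 2
    return 100.0 * weekly_price_usd / midpoint


# (label, weekly price, low end of range, high end of range) from the postings.
# The currency of the first posting is an assumption.
postings = [
    ("SOCKS4 list", 900.00, 20_000, 30_000),
    ("Exclusive (one slot)", 350.00, 5_000, 6_000),
    ("Shared (4 slots)", 220.00, 9_000, 10_000),
]

for label, price, low, high in postings:
    rate = cents_per_bot_week(price, low, high)
    print(f"{label}: {rate:.1f} cents per bot-week")
```

The results land within a fraction of a cent of the slide's figures; small differences depend on which point in the advertised range is used as the denominator.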

Bot Payloads

Structural asymmetries

• Defenders reactive, attackers proactive
  – Defenses public, attacker develops/tests in private
  – Arms race where best case for defender is to “catch up”

• New defenses expensive, new attacks cheap
  – Defenses are sunk costs/business model, attacker agile and not tied to particular technology

• Defenses hard to measure, attacks easy to measure
  – Few security metrics (no “evidence-based” security); attackers monetize directly, which drives attack quality

• Minimal deterrent effect


CCIED

• Collaborative Center for Internet Epidemiology and Defenses (“Seaside”)
  – Joint UCSD/ICSI project, 1 of 4 National CyberTrust Centers
  – Focused on threats posed by large-scale host compromise
    • Worms, viruses, botnets, DDoS, spam, spyware, etc.

• Three key areas of work
  – Internet epidemiology: measuring/understanding attacks
  – Automated defenses: blocking/stopping attacks
  – Economic drivers: why attacks are happening

• See: http://www.ccied.org


Detecting Outbreaks

• Both defense and deterrence are predicated on getting good intelligence
  – Need to detect, characterize and analyze new malware threats
  – Need to do it quickly across a very large number of events

• Classes of monitors
  – Network-based
  – Endpoint-based

• Monitoring environments
  – In-situ: real activity as it happens
    • Network/host IDS
  – Ex-situ: “canary in the coal mine”
    • HoneyNets/Honeypots

Network Telescopes

• Idea: Unsolicited packets are evidence of global phenomena
  – Backscatter: response packets sent by victims provide insight into global prevalence of DoS attacks (and who is getting attacked)
  – Scans: request packets can indicate an infection attempt from a worm (and who is currently infected, growth rate, etc.)

• Very scalable: CCIED Telescope monitors 17M+ IP addrs
  – (> 1% of all routable addresses of the Internet)

Moore et al, Inferring Internet Denial-of-Service Activity, USENIX Security, 2001.

Backscatter analysis

• Monitor block of n IP addresses

• Expected # of backscatter packets given an attack of m packets:

$$E(X) = \frac{nm}{2^{32}}$$

• Extrapolated attack rate R’ is a function of measured backscatter rate R:

$$R' \ge R\,\frac{2^{32}}{n}$$
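The backscatter extrapolation can be tried out numerically. This is a minimal sketch, assuming the attacker spoofs source addresses uniformly at random across the 32-bit IPv4 space, so a monitor of n addresses sees a fraction n/2^32 of the victim's responses. The function names are illustrative, and the extrapolated rate is a lower bound because some attack packets never produce a response.

```python
IPV4_SPACE = 2 ** 32  # number of possible 32-bit IPv4 addresses


def expected_backscatter(n_monitored, m_attack_packets):
    """E(X): expected backscatter packets seen by a monitor of n addresses
    when the victim answers m uniformly spoofed attack packets."""
    return n_monitored * m_attack_packets / IPV4_SPACE


def extrapolated_attack_rate(n_monitored, measured_rate):
    """R': lower bound on the attack rate at the victim, scaled up from
    the backscatter rate R measured at the monitor."""
    return measured_rate * IPV4_SPACE / n_monitored


# Example: a monitor covering 2**24 addresses sees 1/256 of the responses,
# so 10 backscatter packets/sec implies at least 2560 attack packets/sec.
slash8 = 2 ** 24
print(extrapolated_attack_rate(slash8, 10))
```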

Attacks over time

Example: Periodic attack (1hr per 24hrs)

Measuring worm growth

CodeRed infects 360,000 hosts in 14 hours in 2001

Moore et al, Code Red: a case study on the spread and victims of an Internet worm, ACM IMW, 2002

Code Red was slow

• Slammer worm released January 2003
  – First ~1 min behaves like classic scanning worm (doubles in 8.5 secs)
  – >1 min worm saturates access bandwidth
    • Some hosts issue > 20,000 scans/sec
    • Self-interfering
  – Peaks at ~3 min
    • >55 million IP scans/sec
  – 90% of Internet scanned in <10 mins

Moore et al, The Spread of the Sapphire/Slammer Worm, IEEE Security & Privacy, 1(4), 2003
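To get a feel for what an 8.5-second doubling time means, the sketch below computes the growth factor of the infected population over a given interval. It assumes purely exponential growth, which the slide says holds only for roughly the first minute, before bandwidth saturation sets in. The function name and parameters are illustrative.

```python
def growth_factor(elapsed_secs, doubling_time_secs=8.5):
    """Multiplicative growth of the infected population after elapsed_secs,
    assuming unconstrained exponential growth (no bandwidth saturation)."""
    return 2.0 ** (elapsed_secs / doubling_time_secs)


# Growth over the first minute, in steps.
for t in (8.5, 30.0, 60.0):
    print(f"after {t:5.1f} s: x{growth_factor(t):.1f}")
```

Under this assumption the population grows by more than two orders of magnitude within the first minute, which is why response has to be automated rather than manual.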

Scalability/Fidelity Tradeoff in detection

Most scalable
• Network Telescopes (passive)
• Telescopes + Responders (iSink, honeyd, Internet Motion Sensor)
• VM-based Honeynet (e.g., Collapsar)
• Live Honeypot
Highest fidelity

Potemkin Honeyfarm

• Provide the illusion of millions of honeypots
  – But use a much smaller set of physical resources
  – 1 Million IP addresses on 10s of physical hosts

• Gateway multiplexes traffic onto multiple virtual machines (VMs)

• VMM multiplexes multiple VMs on physical servers

Vrable et al., Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm, SOSP 2005.

Was largest high-fidelity honeyfarm on planet

Potemkin Operation

• Packet received by gateway

• Dispatched to honeyfarm server

• VM instantiated
  – Adopts destination IP address
  – Creation must be fast enough to maintain illusion (creation via copy)

• Many VMs will be created
  – Must be resource efficient (copy-on-write representation)
  – Can support 100s of simultaneous VMs per server
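The operation described above (instantiate a VM on first contact, bind it to the destination address, share state copy-on-write) can be illustrated with a toy model. This is only a sketch of the idea, not Potemkin's implementation, which works at the VMM level. The class names, page keys and in-memory "reference image" are all hypothetical.

```python
from types import MappingProxyType

# Read-only reference image shared by every VM (hypothetical contents).
REFERENCE_IMAGE = MappingProxyType({
    "page0": b"kernel",
    "page1": b"services",
})


class VirtualMachine:
    """A VM that stores only the pages it has modified (copy-on-write)."""

    def __init__(self, ip):
        self.ip = ip
        self.private_pages = {}  # pages written by this VM only

    def read(self, page):
        # Fall back to the shared image for pages never written.
        return self.private_pages.get(page, REFERENCE_IMAGE[page])

    def write(self, page, data):
        # The shared image is untouched; the change stays private to this VM.
        self.private_pages[page] = data


class Gateway:
    """Creates a VM the first time a destination address is contacted."""

    def __init__(self):
        self.vms = {}

    def dispatch(self, dst_ip):
        vm = self.vms.get(dst_ip)
        if vm is None:
            vm = VirtualMachine(dst_ip)  # cheap: no pages are copied
            self.vms[dst_ip] = vm
        return vm


gateway = Gateway()
victim = gateway.dispatch("10.0.0.1")
victim.write("page1", b"infected")
print(victim.read("page1"), gateway.dispatch("10.0.0.2").read("page1"))
```

Because a new VM starts with an empty set of private pages, creation cost and memory use scale with what each VM changes rather than with the size of the image.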

Outbreak Defense

• Modern worms can infect >1M hosts/sec

• Need to detect and block new outbreaks << 1 sec [Moore et al, Infocom03]

PACKET HEADER

SRC: 11.12.13.14.3920 DST: 132.239.13.24.5000 PROT: TCP

PACKET PAYLOAD (CONTENT)

00F0 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
0100 90 90 90 90 90 90 90 90 90 90 90 90 4D 3F E3 77 ............M?.w
0110 90 90 90 90 FF 63 64 90 90 90 90 90 90 90 90 90 .....cd.........
0120 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ................
0130 90 90 90 90 90 90 90 90 EB 10 5A 4A 33 C9 66 B9 ..........ZJ3.f.
0140 66 01 80 34 0A 99 E2 FA EB 05 E8 EB FF FF FF 70 f..4...........p
. . .


Earlybird: Line-rate network inference of worm signatures [Singh et al, OSDI04]

Key issue: how to learn popular strings with high in and out degree, without maintaining per-string state

Precise signature identification < 1ms

Singh et al., Automated Worm Fingerprinting, OSDI 2004.
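One way to approach the key issue above is to count content prevalence in a fixed-size set of hashed counters, and to track source and destination addresses only for content that is already prevalent. The following is a much-simplified sketch in that spirit, not the Earlybird implementation. The window length, thresholds, counter sizes, the use of `hashlib.blake2b`, and the exact Python sets used for address dispersion are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict

SUBSTRING_LEN = 8          # hypothetical content window, in bytes
STAGES = 3                 # independent hash stages
COUNTERS_PER_STAGE = 1024  # fixed memory, regardless of distinct strings
PREVALENCE_THRESHOLD = 3   # repetitions before a string is tracked
DISPERSION_THRESHOLD = 2   # distinct sources AND destinations to flag


class ContentSifter:
    def __init__(self):
        self.stages = [[0] * COUNTERS_PER_STAGE for _ in range(STAGES)]
        # Address sets exist only for strings that became prevalent.
        self.sources = defaultdict(set)
        self.destinations = defaultdict(set)

    def _bucket(self, stage, chunk):
        digest = hashlib.blake2b(
            chunk, digest_size=8, salt=bytes([stage])
        ).digest()
        return int.from_bytes(digest, "big") % COUNTERS_PER_STAGE

    def _is_prevalent(self, chunk):
        # Increment one counter per stage; the minimum bounds the true count
        # from above, so a rare string is unlikely to pass every stage.
        counts = []
        for stage in range(STAGES):
            bucket = self._bucket(stage, chunk)
            self.stages[stage][bucket] += 1
            counts.append(self.stages[stage][bucket])
        return min(counts) >= PREVALENCE_THRESHOLD

    def observe(self, src, dst, payload):
        """Process one packet; return substrings that look like worm content."""
        suspects = set()
        for i in range(len(payload) - SUBSTRING_LEN + 1):
            chunk = payload[i:i + SUBSTRING_LEN]
            if not self._is_prevalent(chunk):
                continue
            self.sources[chunk].add(src)
            self.destinations[chunk].add(dst)
            if (len(self.sources[chunk]) >= DISPERSION_THRESHOLD
                    and len(self.destinations[chunk]) >= DISPERSION_THRESHOLD):
                suspects.add(chunk)
        return suspects
```

Feeding the same payload from several distinct source/destination pairs eventually returns its substrings as suspects, while content seen once, or from a single source, is never flagged.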

Today

• We are increasingly focused on better mapping the economics of on-line crime
  – Botnet infiltration
  – Spam conversion
  – Buying and selling of stolen credit cards, bank accounts, botnets, etc.

• The hope is to find economic bottlenecks and focus defenses there

Spam

• The oldest e-crime profit generator

• > 100B spam e-mails sent/day (Ironport)

• Wide range of campaigns
  – Scams: pharma, software, Rolex, jobs, porn, ...
  – Phishing: banks (e.g. BoA), e-commerce, etc.
  – Web exploits, XSS & social engineering

• Key question: what is ROI?
  – Costs can be estimated, but we don’t know sales conversion rate

Courtesy Stuart Brown, modernlifisrubbish.co.uk

How Pharma Spam works?

Key opportunity

• Spam is increasingly sent by botnets

• Botnets are increasingly self-organizing

• Can infiltrate botnet C&C network
  – Observe who is getting spammed
  – Observe what spam is being sent
  – Observe which addresses get delivered to
  – Change templates in transit

Kanich, Kreibich, Levchenko, Enright, Paxson, Voelker and Savage, Spamalytics: an Empirical Analysis of Spam Marketing Conversion, ACM CCS 2008

http://canadianpharma.com

http://ucsdpharma.com

Spam pipeline

Campaign     Sent    MTA          Inbox  Visits           Conversions
E-card       83.6M   21.1M (25%)  ---    3,827 (0.005%)   316 (0.00037%)
Pharma       347.5M  82.7M (24%)  ---    10,522 (0.003%)  28 (0.000008%)
(unlabeled)  40.1M   10.1M (25%)  ---    2,721 (0.005%)   225 (0.00056%)

• Pharma: 12M spam emails for one “purchase”

• E-card: 1 in 10 visitors execute the binary
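The two headline ratios on this slide follow directly from the pipeline counts. The sketch below recomputes them. The counts are the rounded figures from the slide, and assigning the 347.5M row to the pharma campaign and the 83.6M row to the e-card campaign is an inference from that arithmetic, not something the slide labels explicitly. Function and key names are illustrative.

```python
# Rounded pipeline counts as shown on the slide (row attribution is inferred).
campaigns = {
    "pharma": {"sent": 347_500_000, "visits": 10_522, "conversions": 28},
    "e-card": {"sent": 83_600_000, "visits": 3_827, "conversions": 316},
}


def emails_per_conversion(stage_counts):
    """How many spam emails were sent for each final conversion."""
    return stage_counts["sent"] / stage_counts["conversions"]


def visit_to_conversion_rate(stage_counts):
    """Fraction of site visitors who went on to convert."""
    return stage_counts["conversions"] / stage_counts["visits"]


for name, counts in campaigns.items():
    print(
        f"{name}: {emails_per_conversion(counts) / 1e6:.1f}M emails per "
        f"conversion, {visit_to_conversion_rate(counts):.1%} of visitors convert"
    )
```

For the pharma row this gives a little over 12 million emails per conversion, and for the e-card row a visitor conversion rate a bit under one in ten, in line with the slide's summary lines.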

Questions?


Collaborative Center for Internet Epidemiology and Defenses

http://ccied.org

What’s next: Value-chain characterization

• Value-chain characterization
  – Empirical map establishing links between criminal groups and enablers
    • Affiliate programs, botnets, fast flux networks, registrars, payment processors, SEO/traffic partners, fulfillment/manufacturing
    • Data mining across huge data feeds we’ve built or established relationships for
  – Social network among criminal groups
    • Semantic Web mining

New: Fulfillment measurements

• About to start purchasing a wide range of spam-advertised products
  – Watches
  – Pharma
  – Traffic

• Cluster purchases based on
  – Merchant and processor
  – Packaging (postmark, forensic analysis of paper)
  – Artifacts of manufacturing process (e.g., FT-NIR on drugs)

New: Bot-based spam filter generation

• Observations
  – Modest number of bots send most spam
  – Virtually all bots use templates with simple rules to describe polymorphism
  – Templates+dictionaries ≈ regex describing spam to be generated
  – If we can extract or infer these from the botnets, we have a perfect filter for all the spam generated by the botnet
  – Very specific filters, extremely low FP risk

http://www.marshal.com/trace/spam_statistics.asp

random letters and numbers

phrases from a dictionary

Early results (last week)
• 0 FP with 50 examples
• 0 FN on Storm with 500 examples

Still tuning for other botnets
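The observation that templates plus dictionaries amount to a regular expression can be made concrete with a small sketch. The template syntax here (`{rand:N}` for N random letters or digits, `{dict:NAME}` for a phrase drawn from a named dictionary) is invented for illustration and is not the format used by any particular botnet. The function name, example domain and dictionary are likewise hypothetical.

```python
import re

# Hypothetical macro syntax: {rand:6} or {dict:greeting}
MACRO = re.compile(r"\{(?:rand:(\d+)|dict:(\w+))\}")


def template_to_regex(template, dictionaries):
    """Compile a spam template into a regex matching every message the
    template could generate (under the assumed macro syntax)."""
    parts = []
    pos = 0
    for macro in MACRO.finditer(template):
        # Literal text between macros must match exactly.
        parts.append(re.escape(template[pos:macro.start()]))
        rand_len, dict_name = macro.groups()
        if rand_len is not None:
            # Random letters and numbers of a fixed length.
            parts.append("[A-Za-z0-9]{%d}" % int(rand_len))
        else:
            # Any one phrase from the named dictionary.
            words = dictionaries[dict_name]
            parts.append("(?:%s)" % "|".join(re.escape(w) for w in words))
        pos = macro.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


dictionaries = {"greeting": ["Hello", "Hi there"]}
template = "{dict:greeting}, visit http://example.test/{rand:6} today"
spam_filter = template_to_regex(template, dictionaries)

print(bool(spam_filter.fullmatch("Hi there, visit http://example.test/a1B2c3 today")))
print(bool(spam_filter.fullmatch("Hello, please visit our site today")))
```

Because the literal parts of the template must match exactly, a filter built this way is very specific, which is consistent with the low false-positive risk noted on the slide.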
