1 Abusing the Network: Spam in All its forms Joshua Goodman, Microsoft Research with slides from...
41
1 Abusing the Network: Spam in All its forms Joshua Goodman, Microsoft Research with slides from Geoff Hulten and all the hard work done by other people, including Robert Rounthwaite, David Heckerman, John Platt, Carl Kadie, Eric Horvitz, Scott Yih, Geoff Hulten, Nathan Howell, Micah Rupersburg, George Webb, Ryan Hamlin, Kevin Doerr, Elissa Murphy, Derek Hazeur, Bryan Starbuck, lots of people at Hotmail and Outlook and Exchange and MSN…
1 Abusing the Network: Spam in All its forms Joshua Goodman, Microsoft Research with slides from Geoff Hulten and all the hard work done by other people,
1 Abusing the Network: Spam in All its forms Joshua Goodman,
Microsoft Research with slides from Geoff Hulten and all the hard
work done by other people, including Robert Rounthwaite, David
Heckerman, John Platt, Carl Kadie, Eric Horvitz, Scott Yih, Geoff
Hulten, Nathan Howell, Micah Rupersburg, George Webb, Ryan Hamlin,
Kevin Doerr, Elissa Murphy, Derek Hazeur, Bryan Starbuck, lots of
people at Hotmail and Outlook and Exchange and MSN
Slide 3
Overview Introduction to spam Theres a ton of it, and people
hate it Kinds of Spam(lots) Techniques Spammers Use Can be used for
other kinds of spam Solutions to spam Machine Learning, Fuzzy
Hashing, Turing Tests, Blackhole lists, etc. Can apply to other
kinds of spam Conclusion
Slide 4
InfoWorld Poll July 25, 2003
Slide 5
Pew Internet Study Numbers 25% of email users say spam has
reduced their overall email use 76% of email users are bothered by
offensive or obscene content of spam. Economics favor spam 7% of
email users report that they have ordered a product advertised in
spam. Cost of sending spam is only about.01 cents/message! If 1 in
100,000 people buy, and you earn $11, you make a profit
Slide 6
Kinds of Spam Advertising Wants to Be Free Email spam (Ill talk
about this in detail) Usenet spam (usenet now nearly useless in
many groups) Chat rooms Instant Messenger Popups (Web pages,
Adware) Search engine spam Link spam Word spam Blog spam VOIP?
Conclusion: If you can advertise for free, someone will
Slide 7
Chat Room Spam MSN closed its free chat rooms Spambots come in
and pretend to chat But really just advertising porn sites Some
spambots trivial Dont talk at all, but take up space Link to porn
spam in their profile Some spambots very sophisticated You can have
a short conversation with them before they try to convince you to
go to their website Randomized conversations so hard for users to
spot
Slide 8
joshuagood9: hi there super_christina12: hey there how u doin?
joshuagood9: are you a bot? super_christina12: im not a bot are u?
lol joshuagood9: are you a bot? super_christina12: i hate bots lol
joshuagood9: asl? super_christina12: im 20 f usa and u?
joshuagood9: asl? super_christina12: im 21 f usa and u?
joshuagood9: 74/M, WA super_christina12: nice age joshuagood9:
thank you super_christina12: yw sweety..could u do me a
favor..check out my homepage and my profile see if my cam works?
brb Chat Bot Conversation
Slide 9
super_christina12s home page
Slide 10
Instant Messenger Spam SPIM Send messages to people via IM
Microsoft solved this by requiring to get permission before IMing
Spammers put spam in their name so permission request message now
has spam!
Slide 11
Popup Spam Web page popups You go to a web page, and get a
popup May be a pop under that appears under all other windows, so
you dont even know where it came from Adware (e.g. Gator) Software
installed on your computer either without your permission, or where
permission is hidden deep in license agreement. Creates popups all
the time Messenger Spam (not IM) Method meant to deliver notices
like Printer is out of paper Spammers exploit it to create notices
like Buy a diploma
Slide 12
Search Engine Spam Link spam Search engines use number of links
to determine rankings Spammers create millions of pages that link
to their site Fake pages may be realistic and may be returned as
search results, too. Word spam Spammers put misleading words on
their page, e.g. celebrity names or technical terms Page is
actually porn Cloaking Put useful content on page when crawler
comes (use IP address or agent ID) Put spam on page when people
click on the link
Slide 13
VOIP spam? Make free or low-cost internet phone calls Internet
to internet is free Could be fully automated Speech recognition
someday Almost no cost Internet to POTS (fixed line) 2 cents/minute
Could be cheap foreign labor International calls too expensive over
traditional phone lines for most advertising, become more
affordable with VOIP
Slide 14
Techniques Spammers Use Examples of Tricks Sending spam Open
proxies Zombies (Lots of other nasty stuff, wont have time to talk
about today)
Slide 15
Weather Report Guy Content in Image Good Word Chaff Weather,
Sunny, High 82, Low 81, Favorite
Slide 16
Secret Decoder Ring Dude Another spam that looks easy Is
it?
Slide 17
Secret Decoder Ring Dude Character Encoding HTML word breaking
Pharmacy Produc t s
Slide 18
Diploma Guy Word Obscuring Dplmoia Pragorm Caerte a mroe
prosoeprus
Slide 19
Diploma Guy Word Obscuring Dipmloa Paogrrm Cterae a more
presporous
Slide 20
Diploma Guy Word Obscuring Dimlpoa Pgorram Cearte a more
poosperrus
Slide 21
Diploma Guy Word Obscuring Dpmloia Pragorm Caetre a more
prorpeosus
Slide 22
Diploma Guy Word Obscuring Dplmoia Pragorm Carete a mroe
prorpseous
Slide 23
More of Diploma Guy Diploma Guy is good at what he does
Slide 24
Trends in Spam Exploits (Hulten et al.) Exploit 2003 Spam 2004
Spam Delta (Absolute %) Description Word Obscuring 4%20%16%
Misspelling words, putting words into images, etc. URL Spamming
0%10% Adding URLs to non-spam sites (e.g. msn.com). Domain Spoofing
41%50%9% Using an invalid or fake domain in the from line. Token
Breaking 7%15%8% Breaking words with punctuation, space, etc. MIME
Attacks5%11%6% Putting non-spam content in one body part and spam
content in another. Text Chaff52%56%4% Random strings of
characters, random series or words, or unrelated sentences. URL
Obscuring22%17%-5% Encoding a URL in hexadecimal, hiding the true
URL with an @ sign, etc. Character Encoding5%0%-5%Pharmacy renders
into Pharmacy. Based on 1,200 spam messages sent to Hotmail
Slide 25
Sending Spam: Open Proxies These are web-page proxy servers
Used for getting web-pages past firewalls Should have nothing to do
with email Spammers exploit holes Exploit a hole that you can use
some proxies to send email Exploit another hole that anyone can
access the proxy-server Both holes must be present to use an open
proxy Spammers really love these Almost impossible to trace spammer
Spammer uses someone elses bandwidth Less incentive for owner to
close the proxy than to close open mail relays Everyone who abuses
things likes open proxies (example: click fraud on Google and
Overture advertisements)
Slide 26
Sending Spam: Zombies As much as 2/3 of spam may originate from
zombies now! Consumer computers taken over by viruses or trojans
Spammer tells them what to send Very difficult to trace Very cheap
for spammer
Slide 27
Blogs, Referrers For link spam, some spammers manually or
automatically add links to blogs Increases page rank of site Blog
site tends to have high page rank Some sites will put list of
referrer pages or proxy traffic somewhere on site Spammers generate
fake traffic to site to get entries, which are then indexed by
search engines Even secure features like these are vectors for
abuse
Filtering Technique Machine Learning Learn spam versus good
Problem: need source of training data Get users to volunteer GOOD
and SPAM 100,000 volunteers at Hotmail Should generalize well But
spammers are adapting to machine learning too Images, different
words, misspellings, etc. We use machine learning details
later
Slide 30
Filtering Technique Matching/Fuzzy Hashing Use Honeypots
addresses that should never get mail All mail sent to them is spam
Look for similar messages that arrive in real mailboxes Exact match
easily defeated Use fuzzy hashes How effective? The Madlibs attack
defeats exact match filters or fuzzy hashing Spammers already doing
this Good work from IBM and AOL (Josh Alspector) on solutions Make
Earn thousands of dollars lots of money working at home in the
comfort of your own house !!!.
Slide 31
Blackhole Lists Lists of IP addresses that send spam Open
relays, Open proxies, DSL/Cable lines, etc Easy to make mistakes
Open relays, DSL, Cable send good and spam Who makes the lists?
Some list-makers very aggressive Some list-makers too slow MSN
blocks e-mail from rival ISPs By Stefanie Olsen Staff Writer, CNET
News.com February 28, 2003, 2:34 PM PTStefanie Olsen Microsoft's
MSN said its e-mail services had blocked some incoming messages
from rival Internet service providers earlier this week, after
their networks were mistakenly banned as sources of junk mail. The
Redmond, Wash., company, which has nearly 120 million e-mail
customers through its Hotmail and MSN Internet services, confirmed
Friday it had wrongly placed a group of Internet protocol addresses
from AOL Time Warner's RoadRunner broadband service and EarthLink
on its "blocklist" of known spammers whose mail should be barred
from customer in- boxes. Once notified of the error by the two
ISPs, MSN moved the IP addresses "over to a safe list immediately,"
according to a Microsoft spokeswoman.
Slide 32
Postage Basic problem with email is that it is free Force
everyone to pay (especially spammers) and spam goes away Send
payment pre-emptively, with each outbound message, or wait for
challenge Multiple kinds of payment: Turing Test, Computation,
Money
Slide 33
Turing Tests (Naor 96) You send me mail; I dont know you I send
you a challenge: type these letters Your response is sent to my
computer Your message is moved to my inbox, where I read it
Slide 34
Computational Challenge (Dwork and Naor 92) Sender must perform
time consuming computation Example: find a hash collision Easy for
recipient to verify, hard for sender to find collision Requires say
10 seconds (or 5 minutes?) of sender CPU time (in background) Can
be done preemptively, or in response to challenge
Slide 35
$$$ Money Pay actual money (1 cent?) to send a message My
favorite variation: take money only when user hits Report Spam
button Otherwise, refund to sender Free for non-spammers to send
mail, but expensive for spammers Requires multiple monetary
transactions for every message sent expensive Who pays for
infrastructure?
Slide 36
The SmartProof Approach Overview Combines best aspects of
several previous techniques: Machine learning Challenge response
Postage (multiple techniques)
Slide 37
SmartProof: Selective Challenging Most challenge-response
approaches challenge every message We use machine learning to
challenge only some messages Definite spam deleted (saves
processing costs) Definite good passed through to inbox (avoids
annoying challenges, and avoids many challenges that will not be
answered.) Only possible spam, possible good is challenged
Slide 38
SmartProof: Sender Chooses Type of Proof Can auto-respond with
computation Least annoying to sender he may never see the challenge
Usable by people with disabilities Can respond by solving a Turing
Test Works for people with old computers or incompatible computers
or who do not want to download code Future?: Can respond with
micro-payment Works for small businesses. Hardest for spammers to
work around.
Slide 39
Why Email and Spam need Their Own Field/Conference Email is one
of top two applications Search is the other (TREC, SIGIR) Email is
why my grandfather and my wifes grandmother bought computers If you
are interested in the frontiers of the internet, you should be very
interested in the number one application on the internet
Slide 40
Example: Anti-Spoofing Cryptographic approaches S/MIME, PGP
Small adoption because of problems distributing keys need solutions
that work for email Systems/networking approaches DNS/IP
address-based approaches Combination approaches Put key in DNS
entry (e.g. Yahoos DomainKeys) Need a conference where the crypto
people and systems people and email people and spam people all come
together to compare and learn
Slide 41
Conference on Email and Anti-Spam www.ceas.cc How its
different: First academic-style research conference on email or
spam Plenty of informal conferences, industrial conferences
Thursday and Friday at Stanford If you cant come this year, make
sure you come next year, and submit papers!
Slide 42
Conclusion Machine learning filters combined with postage seem
like the best approach to stopping spam Any thing on the network
that can be abused will be abused