Searching the Searchers with SearchAudit
John P., Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy
University of California, Santa Cruz
USENIX Security Symposium, August 2010
A Presentation at Advanced Defense Lab
Slide 2
Outline
- Introduction
- Related Work
- Architecture
- Implementation Stage 1
- Implementation Stage 2
- Attack 1: Identifying Vulnerable Web Sites
- Attack 2: Forum Spamming
- Attack 3: Windows Live Messenger Phishing
- Conclusion
Slide 3
Introduction
- SearchAudit is a framework that identifies malicious queries in massive search engine logs to uncover their relationship with potential attacks.
- It uses a small set of malicious queries as a seed and generates regular expressions for detecting new malicious queries.
Slide 4
Introduction
Two stages:
- Identification: SearchAudit identifies malicious queries.
- Investigation: analyze those queries and the attacks of which they are part.
Slide 5
Introduction
- Enhanced detection capability
  - 400 becomes 4 million.
- Low false-positive rate
  - 2%
- Ability to detect new attacks
  - Forum spamming
- Facilitation of attack analysis
  - Analyze a series of phishing attacks that lasted for more than one year.
Slide 7
Related Work
- There is a significant amount of automated Web traffic on the Internet.
- Another study showed that more than 3% of all search traffic may be generated by stealthy search bots.
- What is the motivation of those search bots?
  - Search engine competitors
  - Studying search quality
  - Click fraud for monetary gain
  - Spreading infection (MyDoom, Santy)
  - Identifying victims
Slide 8
Related Work
- Using regular expression patterns
  - Honeycomb
  - Polygraph
  - Hamsa
  - AutoRE (a way to generate REs, taken from another research work)
Slide 10
Architecture
- Let attackers be our guides.
- Follow their activities and predict their future attacks.
Slide 11
Architecture
- Platform
  - Dryad/DryadLINQ
- Query Expansion
  - Take a small set of seed queries and expand them.
  - Extract the IPs that issued them and search again.
- Regular Expression Generation
  - Signature generation (AutoRE)
  - Eliminating redundancies
- Eliminating Proxies
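The query expansion step (seed queries, then the IPs that issued them, then every query from those IPs) can be sketched roughly as below. This is a minimal illustration, not the presented implementation: it assumes the log is a list of (ip, query) pairs, and the name `expand_queries` and the toy log entries are hypothetical. The real system runs on Dryad/DryadLINQ over much larger data.

```python
from typing import Iterable, Set, Tuple

LogEntry = Tuple[str, str]  # (ip, query); a simplified, assumed log schema


def expand_queries(log: Iterable[LogEntry], seeds: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (ips, expanded): the IPs that issued any seed query,
    and every query issued by those IPs."""
    entries = list(log)
    # Step 1: extract the IPs that issued at least one seed query.
    ips = {ip for ip, query in entries if query in seeds}
    # Step 2: search the log again for all queries from those IPs.
    expanded = {query for ip, query in entries if ip in ips}
    return ips, expanded


# Toy log (hypothetical data, documentation-range IP addresses).
log = [
    ("192.0.2.1", "seed dork"),
    ("192.0.2.1", "another dork"),
    ("192.0.2.2", "weather today"),
]
ips, expanded = expand_queries(log, {"seed dork"})
```

Here the second query from 192.0.2.1 is picked up only because that IP also issued a seed query, which is the point of the expansion.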
Slide 13
Architecture: Eliminating Proxies
- Most users in a geographical region have similar query patterns.
- The queries of mostly legitimate users will have a large overlap with the popular queries from the same /16 IP prefix.
- We label an IP as a proxy if the K most popular queries from that IP and the K most popular queries from that prefix overlap in m queries.
  - K = 100, m = 5
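The proxy test can be sketched as follows. This is a minimal illustration under stated assumptions: "overlap in m queries" is read as "at least m queries in common", queries are plain strings, and the helper names `top_k` and `is_proxy` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def top_k(queries: Iterable[str], k: int) -> List[str]:
    """Return the k most frequent queries."""
    return [q for q, _ in Counter(queries).most_common(k)]


def is_proxy(ip_queries: Iterable[str], prefix_queries: Iterable[str],
             k: int = 100, m: int = 5) -> bool:
    """Label an IP as a proxy when its top-k queries share at least m
    entries with the top-k queries of its /16 prefix (assumed reading)."""
    overlap = set(top_k(ip_queries, k)) & set(top_k(prefix_queries, k))
    return len(overlap) >= m


# Toy data with small k and m so the example stays readable.
prefix = ["news"] * 5 + ["weather"] * 4 + ["maps"] * 3 + ["mail"]
proxy_like = ["news", "weather", "maps", "news"]
bot_like = ["dork a", "dork b", "dork c"]

proxy_result = is_proxy(proxy_like, prefix, k=3, m=2)
bot_result = is_proxy(bot_like, prefix, k=3, m=2)
```

With the slide's values (K = 100, m = 5) the defaults apply unchanged; an IP whose popular queries look like the rest of its prefix is treated as a benign proxy and excluded.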
Slide 15
Data Description and System Setup
- Use 3 months of search logs from the Bing search engine:
  - February 2009 (when it was known as Live Search)
  - December 2009
  - January 2010
- Each month of sampled data contains around 2 billion pageviews.
- The 500 seed malicious queries are obtained from the hacker Web site milw0rm.com.
- It takes about 7 hours to process the 1.2 TB of sampled data.
Slide 16
Selection of REs
- Use cookies to identify the malicious queries.
- Benign proxies are eliminated.
- Use a threshold to pick regular expressions based on their scores.
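The slide does not define how a regular expression is scored, so the sketch below is only one plausible reading: it assumes the score is the fraction of matching queries that arrive with an empty cookie field, and it uses a hypothetical threshold of 0.9. The names `score` and `select_patterns` and the sample records are illustrative.

```python
import re
from typing import List, Tuple

QueryRecord = Tuple[str, str]  # (query, cookie); "" means an empty cookie field


def score(pattern: str, records: List[QueryRecord]) -> float:
    """Assumed score: share of matching queries sent without a cookie."""
    matched = [cookie for query, cookie in records if re.search(pattern, query)]
    if not matched:
        return 0.0
    return sum(1 for cookie in matched if cookie == "") / len(matched)


def select_patterns(patterns: List[str], records: List[QueryRecord],
                    threshold: float = 0.9) -> List[str]:
    """Keep only the patterns whose score reaches the threshold."""
    return [p for p in patterns if score(p, records) >= threshold]


# Hypothetical sample records.
records = [
    ("inurl:index.php?id=1", ""),
    ("inurl:index.php?id=7", ""),
    ("cheap flights", "abc"),
    ("cheap hotels", "def"),
    ("cheap dorks", ""),
]
patterns = [r"inurl:index\.php\?id=\d+", r"cheap \w+"]
selected = select_patterns(patterns, records)
```

The second pattern is dropped because most of the queries it matches carry cookies, which suggests it also captures ordinary users.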
Slide 17
Detection Results: Effect of Query Expansion and Regular Expression Matching
- Feeding the 500 malicious queries into SearchAudit, we find that 122 of the 500 queries appear in the February 2009 dataset.
  - 174 IPs issued these queries.
- Use the result to feed our system again.
  - 800 unique queries from 264 IPs
Slide 18
Detection Results
Slide 19
Effect of Incomplete Seeds
- Split the 122 seed queries into two sets:
  - 100 queries that were first posted on milw0rm.com before 2009
  - 22 queries that were posted in 2009
Slide 20
Looping Back Seed Queries
- Use the derived REs as new seeds and feed them back as input to SearchAudit.
Slide 22
Verification of Malicious Queries
- We lack ground truth information about whether a query is malicious or not, so we verify the results in two ways:
  - Check whether the query is reported on any hacker Web sites.
  - Check query behavior: whether the query matches individual bot or botnet features.
- For each query q returned by SearchAudit, issue the query "q AND (dork OR vulnerability)" to the search engine and save the results.
Slide 23
Verification of Queries Generated by Individual Bots
Two features help us distinguish bot queries from human queries:
- Cookie
  - Most bot queries do not enable cookies, resulting in an empty cookie field.
  - For normal users who do not clear their cookies, all queries carry the old cookies.
- Link clicked
  - Many bots do not click any link on the result page. Instead, they scrape the results off the page.
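The two bot features above can be measured per query as in the sketch below. It is a minimal illustration, assuming each log event records the cookie field and whether any result link was clicked; the `QueryEvent` type, the function name `bot_feature_rates`, and the sample events are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QueryEvent:
    query: str
    cookie: str    # empty string when the cookie field is empty
    clicked: bool  # True if any link on the result page was clicked


def bot_feature_rates(events: List[QueryEvent]) -> Tuple[float, float]:
    """Return (share of events with no cookie, share with no click)."""
    n = len(events)
    if n == 0:
        return 0.0, 0.0
    no_cookie = sum(1 for e in events if not e.cookie) / n
    no_click = sum(1 for e in events if not e.clicked) / n
    return no_cookie, no_click


# Hypothetical events for a single suspicious query.
events = [
    QueryEvent("some dork", "", False),
    QueryEvent("some dork", "", False),
    QueryEvent("some dork", "", False),
    QueryEvent("some dork", "c1", True),
]
rates = bot_feature_rates(events)
```

A query whose events mostly lack cookies and clicks looks more like scripted traffic than like human searching.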
Slide 24
Verification of Queries Generated by Individual Bots
Slide 25
Verification of Queries Generated by Botnets
If most of the IPs that issued malicious queries exhibit similar behavior, then it is likely that all these IPs were running the same script.
- User agent
  - Contains information about the browser and the version used
- Metadata
  - Records certain metadata that comes with the request
- Pages per query
  - Records the number of search result pages retrieved per query
- Inter-query interval
  - Denotes the time between queries issued by the same IP
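Two of the botnet features above, user agent and inter-query interval, can be summarized across IPs as in the sketch below. This is an illustration only: it assumes one user-agent string per IP and numeric timestamps in seconds, and the function names and sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dominant_user_agent_share(ip_user_agents: Dict[str, str]) -> Tuple[str, float]:
    """Return the most common user agent and the share of IPs using it."""
    if not ip_user_agents:
        raise ValueError("need at least one IP")
    agent, count = Counter(ip_user_agents.values()).most_common(1)[0]
    return agent, count / len(ip_user_agents)


def inter_query_intervals(timestamps: List[float]) -> List[float]:
    """Return the gaps (in seconds) between consecutive queries of one IP."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical data: four IPs that issued the same malicious query.
agents = {
    "192.0.2.1": "BotAgent/1.0",
    "192.0.2.2": "BotAgent/1.0",
    "192.0.2.3": "BotAgent/1.0",
    "192.0.2.4": "Mozilla/5.0",
}
agent, share = dominant_user_agent_share(agents)
gaps = inter_query_intervals([30.0, 0.0, 10.0, 20.0])
```

A high share for one user agent together with near-constant gaps is the kind of similar behavior that suggests the IPs run the same script.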
Slide 26
Verification of Queries Generated by Botnets
Slide 27
Verification of Queries Generated by Botnets
Slide 29
Analysis of Detection Results
- Large countries such as the USA, Russia, and China are responsible for almost half of the IPs issuing malicious queries.
- Vulnerable Web sites
  - Try to exploit these Web sites by SQL injection
    index.php?content=[?=#+;&:]{1,10}
  - Try to find particular software with known vulnerabilities
    Power by
- Forum spamming
  /includes/joomla.php site:.[a-zA-Z]{2,3}
- Windows Live Messenger phishing
Slide 30
Analysis of Detection Results
Slide 32
Identifying Vulnerable Web Sites
Applications of Vulnerability Searches
- Sample 5000 queries returned by SearchAudit.
- For every query q, we issue the query "q dork vulnerability".
- Obtain 80,490 URLs from 39,475 unique Web sites.
- Compare this list of random Web sites against a list of known phishing or malware sites.
  - PhishTank
  - Microsoft
- Test and show that many of these sites indeed have SQL injection vulnerabilities.
Slide 33
Identifying Vulnerable Web Sites
Slide 34
SQL Injection Vulnerabilities
- For the malicious queries, we look at the search results and crawl all of the links twice.
  - The first time, we crawl the link as is.
  - The second time, we add a single quote (') to the link.
- If the two pages are identical, this suggests that there is no obvious SQL injection vulnerability.
- If the second page has any kind of SQL error, then there might exist an SQL injection vulnerability.
- Of 14,500 URLs, we find that 1,760 URLs (12%) may have an SQL injection vulnerability.
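The two-crawl comparison can be sketched as below. This is a minimal illustration, not the presenters' crawler: the page fetcher is passed in as a function so the example runs without network access, the list of SQL error markers is an assumption, and the name `probe`, the `example.test` URLs, and the canned pages are hypothetical.

```python
from typing import Callable

# Assumed substrings that indicate a database error page.
SQL_ERROR_MARKERS = ("sql syntax", "mysql", "odbc", "syntax error")


def probe(url: str, fetch: Callable[[str], str]) -> str:
    """Fetch a URL as is and with a trailing single quote, then compare."""
    original = fetch(url)
    quoted = fetch(url + "'")
    if original == quoted:
        return "no obvious vulnerability"
    if any(marker in quoted.lower() for marker in SQL_ERROR_MARKERS):
        return "possible SQL injection"
    return "inconclusive"


# Canned responses standing in for real crawls.
pages = {
    "http://example.test/item?id=1": "<p>Item 1</p>",
    "http://example.test/item?id=1'": "You have an error in your SQL syntax",
    "http://example.test/about": "<p>About</p>",
    "http://example.test/about'": "<p>About</p>",
}
item_result = probe("http://example.test/item?id=1", lambda u: pages[u])
about_result = probe("http://example.test/about", lambda u: pages[u])
```

The "inconclusive" outcome covers pages that change without showing an SQL error, a case the slide does not classify.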
Slide 36
Forum-Spamming Attacks
- We manually identified 46 REs that are associated with forum spamming.
Slide 38
Forum-Spamming Attacks
Slide 39
Applications of Forum Searching Queries
- Use Project Honey Pot to identify Web spamming.
Slide 41
Windows Live MSN Phishing
- What is MSN phishing?
  http://[a-zA-Z0-9._]*. / http:// ?user=[a-zA-Z0-9._]*
Slide 42
Windows Live MSN Phishing
Slide 43
Characteristics of Compromised Accounts