John P., Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy
University of California, Santa Cruz
USENIX Security Symposium, August 2010

A Presentation at Advanced Defense Lab

Outline
Introduction
Related Work
Architecture
Implementation Stage 1
Implementation Stage 2
Attack 1: Identifying Vulnerable Web Sites
Attack 2: Forum Spamming
Attack 3: Windows Live Messenger Phishing
Conclusion

Introduction
SearchAudit is a framework that identifies malicious queries in massive search engine logs in order to uncover their relationship with potential attacks. It uses a small set of known malicious queries as seeds and generates regular expressions for detecting new malicious queries.

Introduction
SearchAudit works in two stages: identification and investigation. In the first stage, it identifies malicious queries. In the second stage, those queries and the attacks of which they are part are analyzed.

Introduction
Enhanced detection capability: 400 queries become 4 million.
Low false-positive rate: 2%.
Ability to detect new attacks, such as forum spamming.
Facilitation of attack analysis: analyzing a series of phishing attacks that lasted for more than one year.

Related Work
There is a significant amount of automated Web traffic on the Internet. Another study showed that more than 3% of all search traffic may be generated by stealthy search bots.
What motivates those search bots?
Search engine competitors
Studying search quality
Click fraud for monetary gain
Spreading infection (MyDoom, Santy)
Identifying victims

Related Work
Systems that use regular expression patterns:
Honeycomb
Polygraph
Hamsa
AutoRE (a way to generate regular expressions, taken from other research)

Architecture
Let attackers be our guides: follow their activities and predict their future attacks.

Architecture
Platform: Dryad/DryadLINQ.
Query expansion: take a small set of seed queries and expand it. Extract the IPs that issued the seed queries, then search the logs again for the queries those IPs issued.
Regular expression generation: signature generation (AutoRE), eliminating redundancies, and eliminating proxies.
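The query expansion step can be illustrated with a minimal Python sketch. It is not the DryadLINQ implementation used in the talk: the log is assumed to be an in-memory list of hypothetical `(ip, query)` pairs, and the function name `expand_queries` and the sample data are invented for illustration.

```python
def expand_queries(log, seed_queries):
    """Expand a seed set using a log given as a list of (ip, query) pairs.

    Returns the IPs that issued any seed query and every query those IPs issued.
    """
    seeds = set(seed_queries)
    # Step 1: find the IPs that issued at least one seed query.
    suspect_ips = {ip for ip, query in log if query in seeds}
    # Step 2: search the log again for all queries from those IPs.
    expanded = {query for ip, query in log if ip in suspect_ips}
    return suspect_ips, expanded


# Hypothetical sample log, for illustration only.
log = [
    ("10.0.0.1", "powered by examplecms"),
    ("10.0.0.1", "index.php?content=1"),
    ("10.0.0.2", "weather today"),
    ("10.0.0.3", "index.php?content=1"),
]
ips, expanded = expand_queries(log, ["powered by examplecms"])
print(sorted(ips))
print(sorted(expanded))
```

Note that 10.0.0.3 is not picked up by a single expansion pass even though it issued an expanded query; catching it would take another pass with the expanded queries as seeds, or a regular expression that generalizes them.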

Architecture: Eliminating Redundancies
Algorithm REGEX_CONSOLIDATE

Architecture: Eliminating Proxies
Most users in a geographical region have similar query patterns, so the queries of mostly legitimate users will overlap heavily with the popular queries from the same /16 IP prefix.
We label an IP as a proxy if the K most popular queries from that IP and the K most popular queries from its /16 prefix overlap in m queries, with K = 100 and m = 5.
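The proxy test can be sketched in a few lines of Python. This is a sketch under assumptions, not the system's code: "overlap in m queries" is read here as "at least m queries in common", and the helper names `prefix16`, `top_k`, and `is_proxy` are hypothetical.

```python
from collections import Counter


def prefix16(ip):
    """Return the /16 prefix of a dotted IPv4 address, e.g. '10.1.2.3' -> '10.1'."""
    return ".".join(ip.split(".")[:2])


def top_k(queries, k):
    """Return the set of the k most frequent queries in a list of queries."""
    return {query for query, _ in Counter(queries).most_common(k)}


def is_proxy(ip_queries, prefix_queries, k=100, m=5):
    """Label an IP as a proxy if its top-k queries share at least m queries
    with the top-k queries of its /16 prefix (assumed reading of the rule)."""
    overlap = top_k(ip_queries, k) & top_k(prefix_queries, k)
    return len(overlap) >= m


# Hypothetical data: popular queries in a prefix and two IPs inside it.
prefix_queries = ["news", "weather", "mail", "maps", "sports", "games"] * 10
proxy_like = ["news", "weather", "mail", "maps", "sports"]
bot_like = ["index.php?content=1", "news"]
print(is_proxy(proxy_like, prefix_queries))
print(is_proxy(bot_like, prefix_queries))
```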

Data Description and System Setup
Uses 3 months of sampled search logs from the Bing search engine: February 2009 (when it was known as Live Search), December 2009, and January 2010.
Each month of sampled data contains around 2 billion pageviews.
The 500 seed malicious queries are obtained from a hacker Web site, milw0rm.com.
It takes about 7 hours to process the 1.2 TB of sampled data.

Selection of Regular Expressions
Use cookies to identify the malicious queries.
Benign proxies are eliminated.
Use a threshold to pick regular expressions based on their scores.

Detection Results: Effect of Query Expansion and Regular Expression Matching
Feeding the 500 malicious queries into SearchAudit, we find that 122 of them appear in the February 2009 dataset, issued by 174 IPs.
Feeding this result into the system again yields 800 unique queries from 264 IPs.

Detection Results

Effect of Incomplete Seeds
Split the 122 seed queries into two sets: 100 queries that were first posted on milw0rm.com before 2009, and 22 queries that were posted in 2009.

Looping Back Seed Queries
Use the derived regular expressions as new seeds, feeding them back as input to SearchAudit.

Overall Matching Statistics

Verification of Malicious Queries
We lack ground truth about whether a query is malicious, so we verify the results in two ways:
Check whether the query is reported on any hacker Web sites.
Check whether the query's behavior matches individual bot or botnet features.
For each query q returned by SearchAudit, issue the query q AND (dork OR vulnerability) to the search engine and save the results.

Verification of Queries Generated by Individual Bots
Two features help us distinguish bot queries from human queries.
Cookie: most bot queries do not enable cookies, resulting in an empty cookie field. For normal users who do not clear their cookies, all queries carry the old cookies.
Link clicked: many bots do not click any link on the result page. Instead, they scrape the results off the page.
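As a rough illustration of the two features, the sketch below computes, for a set of query records, the fraction with an empty cookie field and the fraction with no clicked link. The record layout (dictionaries with `cookie` and `clicks` keys) and the function name are assumptions made for this example, not the log schema used in the study.

```python
def bot_feature_fractions(records):
    """Return (fraction with empty cookie, fraction with zero clicks)
    for a list of records shaped like {"cookie": str, "clicks": int}."""
    total = len(records)
    if total == 0:
        return 0.0, 0.0
    no_cookie = sum(1 for record in records if not record["cookie"])
    no_click = sum(1 for record in records if record["clicks"] == 0)
    return no_cookie / total, no_click / total


# Hypothetical records for queries matched by one regular expression.
records = [
    {"cookie": "", "clicks": 0},
    {"cookie": "", "clicks": 0},
    {"cookie": "", "clicks": 0},
    {"cookie": "abc123", "clicks": 0},
]
print(bot_feature_fractions(records))
```

High values on both fractions would be consistent with bot-generated queries; what counts as "high" is left open here because the slides do not give a cutoff.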

Verification of Queries Generated by Individual Bots

Verification of Queries Generated by Botnets
If most of the IPs that issued malicious queries exhibit similar behavior, then it is likely that all these IPs were running the same script. The behavior is compared on four features:
User agent: contains information about the browser and the version used.
Metadata: records certain metadata that comes with the request.
Pages per query: records the number of search result pages retrieved per query.
Inter-query interval: denotes the time between queries issued by the same IP.
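One simple way to quantify "most of the IPs exhibit similar behavior" is to measure, per feature, how large a share of IPs has the most common value. The slides do not say how similarity is actually measured, so the sketch below is only an assumed stand-in; `dominant_share` and the sample values are hypothetical.

```python
from collections import Counter


def dominant_share(values):
    """Return (most common value, fraction of entries that have it)
    for a non-empty list with one feature value per IP."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


# Hypothetical user-agent strings, one per IP that issued matching queries.
user_agents = ["agent-x"] * 9 + ["agent-y"]
value, share = dominant_share(user_agents)
print(value, share)
```

The same function could be applied to pages per query or to bucketed inter-query intervals; a share close to 1.0 on several features at once would suggest that the IPs run the same script.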

Verification of Queries Generated by Botnets

Analysis of Detection Results
Large countries such as the USA, Russia, and China are responsible for almost half the IPs issuing malicious queries.
Vulnerable Web sites:
Try to exploit these Web sites by SQL injection: index.php?content=[?=#+;&:]{1,10}
Try to find particular software with known vulnerabilities: Power by
Forum spamming: /includes/joomla.php site:.[a-zA-Z]{2,3}
Windows Live Messenger phishing
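The two complete patterns on this slide can be tried against sample queries with Python's `re` module. The slide notation is not valid Python regex as written, so the sketch assumes that the literal `.` and `?` characters are meant literally and escapes them; the dictionary keys and the sample queries are invented for illustration.

```python
import re

# Slide patterns, with literal '.' and '?' escaped (an assumption about intent).
PATTERNS = {
    "sql_injection": re.compile(r"index\.php\?content=[?=#+;&:]{1,10}"),
    "forum_spam": re.compile(r"/includes/joomla\.php site:\.[a-zA-Z]{2,3}"),
}


def matching_patterns(query):
    """Return the names of all patterns that match somewhere in the query."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(query)]


# Hypothetical queries for illustration.
print(matching_patterns("index.php?content=?=#"))
print(matching_patterns("/includes/joomla.php site:.de"))
print(matching_patterns("joomla tutorial"))
```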

Analysis of Detection Results

Identifying Vulnerable Web Sites: Applications of Vulnerability Searches
Sample 5000 queries returned by SearchAudit.
For every query q, we issue the query q dork vulnerability.
Obtain 80,490 URLs from 39,475 unique Web sites.
Compare this list of random Web sites against a list of known phishing or malware sites (PhishTank, Microsoft).
Test and show that many of these sites indeed have SQL injection vulnerabilities.

Identifying Vulnerable Web Sites

SQL Injection Vulnerabilities
For the malicious queries, we look at the search results and crawl each of the links twice.
The first time, we crawl the link as is.
The second time, we add a single quote (') to the link.
If the two pages are identical, it suggests that there is no obvious SQL injection vulnerability.
If the second page contains any kind of SQL error, there might be an SQL injection vulnerability.
Of 14,500 URLs, we find that 1,760 (12%) may have an SQL injection vulnerability.
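The comparison step of this two-crawl test can be sketched as a pure function over the two page bodies, leaving the crawling itself out. The list of error markers is a hypothetical stand-in for "any kind of SQL error", and the third outcome ("inconclusive") is an assumption for pages that differ without showing a recognized error.

```python
# Hypothetical substrings that suggest a database error page.
SQL_ERROR_MARKERS = ("sql syntax", "mysql_fetch", "unclosed quotation mark")


def classify(original_body, quoted_body):
    """Compare the page fetched as is with the page fetched after adding a quote."""
    if original_body == quoted_body:
        return "no obvious vulnerability"
    lowered = quoted_body.lower()
    if any(marker in lowered for marker in SQL_ERROR_MARKERS):
        return "possibly vulnerable"
    return "inconclusive"


# Hypothetical page bodies for illustration.
print(classify("<html>item 1</html>", "<html>item 1</html>"))
print(classify("<html>item 1</html>", "You have an error in your SQL syntax"))

# The reported ratio: 1,760 of 14,500 URLs.
print(round(1760 / 14500 * 100))
```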

Forum-Spamming Attacks
We manually identified 46 regular expressions that are associated with forum spamming.


Forum-Spamming Attacks

Applications of Forum-Searching Queries
Using Project Honey Pot to identify Web spamming.

Windows Live MSN Phishing
What is MSN phishing?
http://[a-zA-Z0-9._]*. / http:// ?user=[a-zA-Z0-9._]*

Windows Live MSN Phishing

Characteristics of Compromised Accounts

Conclusion
