
Research Article
Method for Detecting Core Malware Sites Related to Biomedical Information Systems

Dohoon Kim, Donghee Choi, and Jonghyun Jin

Agency for Defense Development, Daejeon 305-600, Republic of Korea

Correspondence should be addressed to Dohoon Kim; karmy01@add.re.kr

Received 5 December 2014; Accepted 17 February 2015

Academic Editor: Joongheon Kim

Copyright © 2015 Dohoon Kim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Most advanced persistent threat attacks target web users through malicious code within landing (exploit) or distribution sites. There is an urgent need to block the affected websites, and attacks on biomedical information systems are no exception. In this paper, we present a method for locating malicious websites that attempt to attack biomedical information systems. Our approach uses malicious code crawling to rank websites by their risk index, based on an analysis of the centrality among malware sites, and proactively eliminates the root of these sites by finding the core-hub node, thereby reducing unnecessary security policies. In particular, we dynamically estimate the risk index of the affected websites by analyzing various centrality measures and converting them into a single quantified vector. The proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.

1. Introduction

Various types of cyber-attacks have recently been attempted on biomedical information systems [1, 2]. This is mainly because the personal records stored in biomedical systems represent valuable financial information.

Unfortunately, current network security solutions are more vulnerable to advanced intelligent cyber-attacks [3] than to traditional cyber-attacks (e.g., distributed denial of service and spam). Because advanced persistent threat (APT) attacks [4, 5] focus on the weak points of a specific target and its context, it is very hard to establish which APT attack detection method and defense system are most appropriate for biomedical information systems.

APT attacks are generally delivered through malicious code exploit/landing/distribution sites, and infected user (or administrator) PCs [6] readily give attackers points of contact with biomedical information systems. Therefore, to defend against these targeted attacks and protect biomedical information systems, it is necessary to isolate in advance the contact points through which malicious code is disseminated, that is, the exploit/landing/distribution sites.

To defend against APT attacks on biomedical information systems, it is vital to analyze how the network between medical websites and related websites is formed. This is because APT attacks make use of various sociotechnological methods [7] and create as many links as possible with medical service users (patients), medical staff, and related people via various contacts. Above all, administrators should detect malicious code targeted at biomedical information systems at an early stage and block the core-hub node in order to cope with APT attacks.

Therefore, this paper proposes a methodology that blocks and eliminates malicious code at an early stage by detecting the core-hub node at the root of the network formed between the biomedical information system-targeted malicious code exploit/landing/distribution sites and the related websites. This paper also employs network analysis to estimate and manage the risk index of the detected malware sites by determining the potential risk factor of each exploit/landing/distribution point. In particular, we present a method for reprocessing malicious code so that it can be used as a reference for malicious code detection and management.

Furthermore, this paper supports the efficient classification, application, and management of massive blacklists of biomedical information system-targeted malware sites. In this paper, we measure the risk index of websites with links to biomedical information systems and produce a malicious URL risk index (MRI) from this reference index.

Hindawi Publishing Corporation, Computational and Mathematical Methods in Medicine, Volume 2015, Article ID 756842, 8 pages, http://dx.doi.org/10.1155/2015/756842

[Figure 1 diagram: internet users; landing site/distribution site; hopping sites (hopping section); exploit sites (exploit section); distribution sites. Legend: redirection, web connection, malicious code dissemination.]
Figure 1: Definition of landing (or exploit)/distribution sites including malicious code.

2. Background

To detect the core-hub node, it is first necessary to understand the entire framework of malicious code distribution and infection through malicious websites. It is also important to understand the typical methods of detecting such websites and certain risk estimation methods for the detection of malicious sites.

2.1. Malware Site Framework. To estimate the risk index of a malware site, we need to understand the dissemination route. Figure 1 illustrates the definition and operating principles of the malware site detection framework, which is the basis for risk index estimation.

As shown in Figure 1, the victim (i.e., an internet user) first visits the landing site connected with the distribution site, is then redirected to a hopping site or exploit site, and finally downloads the malicious code. The internet user is eventually infected by the malicious code and may be damaged by various secondary cyber-attacks (e.g., personal information leaks, system destruction, and other host-derived attacks).
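The route in Figure 1 can be viewed as a directed graph of redirections. The following Python sketch is illustrative only: it assumes a hypothetical redirect map with placeholder `.example` host names (not data from this paper) and treats a site that redirects nowhere else as the point where the malicious code is served, which is our assumption. It follows every redirection from a landing site breadth-first and reports those end points.

```python
from collections import deque


def redirect_endpoints(redirects: dict[str, list[str]], landing: str) -> list[str]:
    """Follow all redirections from `landing` (breadth-first) and return the
    sites that redirect nowhere else, i.e., the ends of the dissemination route."""
    seen = {landing}
    queue = deque([landing])
    endpoints = []
    while queue:
        site = queue.popleft()
        targets = redirects.get(site, [])
        if not targets:  # no further redirection: assumed to serve the code
            endpoints.append(site)
        for nxt in targets:
            if nxt not in seen:  # the `seen` set also guards against redirect loops
                seen.add(nxt)
                queue.append(nxt)
    return sorted(endpoints)


# Hypothetical route shaped like Figure 1: landing -> hopping -> exploit -> distribution.
route = {
    "landing.example": ["hop1.example", "hop2.example"],
    "hop1.example": ["exploit1.example"],
    "hop2.example": ["exploit1.example", "exploit2.example"],
    "exploit1.example": ["dist.example"],
    "exploit2.example": ["dist.example"],
}
print(redirect_endpoints(route, "landing.example"))  # ['dist.example']
```

Several redirection chains converge on one distribution site here, which is the kind of hub structure that the centrality analysis in Section 3 is designed to expose.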

2.2. Web Crawling-Based Malicious Site Detection. Most studies on malware sites have focused mainly on detection. These studies primarily apply a web crawling method that rapidly collects the URL information of websites through a web crawler-based search engine [8, 9]. However, unlike the web crawling applied by search engines, the web crawling used for malicious code collection selects and collects the executable files or compressed files that contain the malicious code.

The web crawler treats URLs with the .exe file extension, or responses whose HTTP headers carry the "application/octet-stream" content type, as executable files and downloads them. The crawler then inspects the headers of the downloaded files to confirm whether they are actually executable files. Compressed files and MS installation files are inspected and downloaded in the same way as executable files.
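A minimal Python sketch of the two checks described above follows. The function names, the signature table, and the restriction to the .exe extension are illustrative assumptions rather than the authors' crawler. The magic numbers themselves are standard: `MZ` for Windows executables, `PK\x03\x04` for ZIP archives, and `D0 CF 11 E0 A1 B1 1A E1` for OLE compound files, the container format used by .msi installers.

```python
from typing import Optional
from urllib.parse import urlparse

# Leading "magic" bytes of the file types mentioned in the text.
MAGIC_SIGNATURES = {
    b"MZ": "executable (PE/DOS)",
    b"PK\x03\x04": "compressed (ZIP)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE compound file (e.g., MSI)",
}


def is_download_candidate(url: str, content_type: Optional[str]) -> bool:
    """Pre-download check: .exe extension or an application/octet-stream response."""
    path = urlparse(url).path.lower()
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    return path.endswith(".exe") or mime == "application/octet-stream"


def classify_download(data: bytes) -> Optional[str]:
    """Post-download check: trust the file header rather than the URL or MIME type."""
    for magic, label in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return label
    return None


print(is_download_candidate("http://host.example/files/setup.EXE", None))  # True
print(classify_download(b"MZ\x90\x00\x03"))  # executable (PE/DOS)
print(classify_download(b"<html>"))  # None
```

The header check matters because both the extension and the Content-Type are attacker-controlled. Note that the OLE signature is shared with legacy Office documents, so a real crawler would need further inspection to confirm an installer.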

A number of web crawling-based automatic malicious code collection techniques have been proposed. Most of them search websites via web crawling, confirm whether the websites include malicious code, and then download/analyze the relevant content.

3. Analysis of the Risk Index of Biomedical Information System-Related Malware Sites

We first propose a method for estimating the risk index of biomedical information system-targeted malware sites. We then estimate the ultimate risk index by analyzing the potential threat through a correlation analysis between the distribution site and the other connected sites.

The following sections describe our approach for predicting the risk index of the exploit/landing sites that redistribute the malicious code. The risk of each individual exploit/landing site is calculated through this prediction.

3.1. Vector-Based Risk Index Estimation Method. We employ a risk vector calculation to estimate the risk index [10]. Just as a planar vector is represented by two real numbers, a three-dimensional vector is represented by three real numbers in a rectangular coordinate system.

In spatial rectangular coordinates, a point is specified by three real numbers measured along three mutually orthogonal axes passing through the origin O.


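[Figure 2 diagram: three-dimensional coordinate system with origin O and axes x (eigenvector centrality), y (betweenness centrality), and z (degree centrality); vectors V1 and V2; risk vectors MRI and MRI′.]
Figure 2: Entire analysis diagram for malicious code landing (or exploit)/distribution site risk estimation.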

We fix the three coordinate axes x, y, and z, set their positive directions, and then define the length scale.

As shown in Figure 2, three vectors (connectivity (degree), eigenvector, and betweenness) are used to estimate the risk index of malicious code landing (or exploit)/distribution sites, and the risk is expressed as the length of their vector sum [10]. The purpose is to express the different centrality values as a single length and thereby quantify the risk index.
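As a worked illustration of expressing three centrality values as a single length, take hypothetical values (not from the paper) for one node, with the axes assigned as in Figure 2. The same computation is formalized later as Eq. (4) in Step 3.

```latex
\begin{aligned}
\vec{v} &= (x, y, z) = (\mathrm{ECI}, \mathrm{BCI}, \mathrm{DCI}) = (0.1,\ 0.2,\ 0.2),\\
\lVert \vec{v} \rVert &= \sqrt{x^2 + y^2 + z^2} = \sqrt{0.01 + 0.04 + 0.04} = \sqrt{0.09} = 0.3.
\end{aligned}
```

A node scores high if it is strong on any one axis or moderately strong on all three, because each component contributes its square to the length.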

We thus determine which sites have the highest risk index and find the significance-based concentration degree of the corresponding sites by analyzing the central structure of the exploit/landing/distribution sites of malicious code connected to medical information systems. To interpret the various meanings more objectively, this paper represents a risk factor and estimates the ultimate risk index by analyzing the connectivity [11–13] (degree, eigenvector, and betweenness) of the distribution site and exploit/landing site and vectorizing the calculated values. We now define each element of the risk index for the detected malicious code exploit/landing/distribution sites.

(i) Degree Centrality Analysis of Nodes. This is defined as the number of links incident upon a node. The degree can be interpreted in terms of the immediate risk of a node catching whatever is flowing through the network (such as malware sites). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely, the in-degree and out-degree centrality.

(ii) Eigenvector Centrality Analysis of Nodes. This measures the influence of a node within a network. Relative scores are assigned to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.

(iii) Betweenness Centrality Analysis of Nodes. This is the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that the transfer of items follows the shortest path.
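The three measures can be computed on a crawled link graph roughly as follows. This Python sketch makes several assumptions. The graph is a dict of hypothetical site names. Eq. (1) (in Step 2 below) uses unit link weights and counts in- plus out-links. Links are treated as undirected for betweenness so that the sum over pairs j < k in Eq. (3) applies, computed with Brandes' algorithm. For the eigenvector index, it uses a PageRank-style damped iteration (damping d = 0.85, with dangling-node scores redistributed uniformly). This is a swapped-in stabilization, not the paper's Eq. (2) as written, because iterating Eq. (2) literally can drain to zero on acyclic redirect chains. Setting `damping=1.0` recovers the plain Eq. (2) update plus the dangling-node redistribution.

```python
from collections import deque

Graph = dict[str, list[str]]  # site -> sites it links/redirects to (directed)


def all_nodes(g: Graph) -> list[str]:
    nodes = set(g)
    for targets in g.values():
        nodes.update(targets)
    return sorted(nodes)


def degree_index(g: Graph) -> dict[str, float]:
    """Eq. (1) with unit weights: incident links (in + out) / (n - 1)."""
    nodes = all_nodes(g)
    deg = dict.fromkeys(nodes, 0)
    for u, targets in g.items():
        for v in targets:
            deg[u] += 1  # out-link of u
            deg[v] += 1  # in-link of v
    return {v: deg[v] / (len(nodes) - 1) for v in nodes}


def eigenvector_index(g: Graph, damping: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Damped form of Eq. (2): each node passes I(N_j)/l_j to each of its l_j targets."""
    nodes = all_nodes(g)
    n = len(nodes)
    score = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        dangling = sum(score[v] for v in nodes if not g.get(v))
        new = dict.fromkeys(nodes, (1.0 - damping) / n + damping * dangling / n)
        for u, targets in g.items():
            for v in targets:
                new[v] += damping * score[u] / len(targets)
        score = new  # scores keep summing to 1
    return score


def betweenness_index(g: Graph) -> dict[str, float]:
    """Eq. (3) via Brandes' algorithm on the undirected version of g."""
    nodes = all_nodes(g)
    adj = {v: set() for v in nodes}
    for u, targets in g.items():
        for v in targets:
            adj[u].add(v)
            adj[v].add(u)
    cb = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS counts shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):  # accumulate pair dependencies g_jk(n_i)/g_jk
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}  # each pair {j, k} was counted twice


star = {"landing": ["hub"], "hub": ["exploit1", "exploit2"]}
print(degree_index(star)["hub"])  # 1.0
print(betweenness_index(star)["hub"])  # 3.0
```

In the toy star, the hub touches every other node (DCI = 3/3) and lies on the shortest route of all three leaf pairs, which is exactly the core-hub profile the method looks for.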

3.2. Method to Estimate Malicious URL Risk Index (MRI). To estimate the risk index of the URL of a malicious code exploit/landing/distribution site, we follow the process in Figure 3.

(1) Step 1: Node Characteristic Classification. Landing (or exploit)/distribution site information is classified using the logs produced by the self-developed malicious code detection crawler, and the detection history is sorted by time from the unit logs of each malicious code exploit/landing/distribution site. The basic risk is also estimated from the following log information.

A. Node Characteristic. It is determined whether the infected site is an exploit/landing site or a distribution site. If there is no link to the detected malicious code (i.e., the site is the first infected site), the site is defined as a distribution site. If the URL of another site is exploited/distributed through the site, the site is defined as an exploit/landing site.

B. Malicious Code Exploit/Landing/Distribution Site Information. This is the URL of the detected malicious code exploit/landing/distribution site. An exploit/landing site can also become a distribution site: if the distribution site is eliminated by a self-developed or other detection system, the exploit/landing site takes over as the distribution site and continues to operate as a malicious code distribution site.

C. IP Address, Country Code & Site Survivability. Basic information is collected from the IP address and the related server location, and the current operating status is investigated. In particular, the survivability of the exploit/landing/distribution site is very important in estimating the risk index. Even if the site has been treated or isolated and is no longer operating, the possibility of reinfection remains if the weak point continues to be exposed. Therefore, survivability should be reflected in the risk index estimation.

(2) Step 2: Centrality Analysis of Nodes. The following three indices are applied to the centrality analysis of each node:

(i) Degree Centrality Analysis
(ii) Eigenvector Centrality Analysis
(iii) Betweenness Centrality Analysis


[Figure 3 diagram components: Crawling DB; Step 1: node characteristic classification; Step 2: degree, eigenvector, and betweenness centrality analysis of node; Step 3: 1st order risk analysis; Step 4: 2nd order risk analysis (distribution site risk analysis, exploit site risk analysis, weight value calculation); Step 5: MRI estimation.]
Figure 3: Entire analysis diagram for risk estimation of malicious code exploit/landing/distribution sites.

A. Degree Centrality Index (DCI)

(i) A node with more directly connected neighboring nodes has higher degree centrality. This index measures the scale of direct effects.

(ii) Degree centrality is calculated from the composition ratio of each node:

$$\mathrm{DCI} = \frac{\sum (\text{weight of incident link})}{\#\,\text{of nodes} - 1}, \qquad \text{time complexity: } O(n). \tag{1}$$

B. Eigenvector Centrality Index (ECI)

(i) Assume that node $N_j$ has $l_j$ links. If one of these links is connected to node $N_i$, the probability that $N_j$ passes to $N_i$ is $1/l_j$. Therefore, summing over the nodes $N_j$ linked to $N_i$, the ultimate ECI is as follows:

$$\mathrm{ECI} = I(N_i) = \sum_{j} \frac{I(N_j)}{l_j}. \tag{2}$$

C. Betweenness Centrality Index (BCI)

(i) The BCI measures the degree to which a node lies on the shortest routes between other nodes.

(ii) The betweenness centrality of a node is higher if the node connects more different node groups. The BCI indicates the degree to which a node functions as a bridge in the entire network.

(iii) The BCI makes it possible to find the intermediate URL that links information between fields.

(iv) Suppose that $g_{jk}$ is the number of shortest routes between nodes $j$ and $k$ in the network and $g_{jk}(n_i)$ is the number of those shortest routes that include node $i$. The probability that a shortest route between $j$ and $k$ includes node $i$ is then $g_{jk}(n_i)/g_{jk}$, and

$$\mathrm{BCI} = C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}. \tag{3}$$

If the main target node is constructed as a child node of depth 1, its degree increases; however, its BCI decreases according to (3).

(3) Step 3: 1st Order Risk Analysis. The 1st order risk is estimated by calculating the Euclidean length of the vector formed by the node analysis results from Step 2, that is, by applying the vector distance formula to the values calculated in Step 2:

$$r_1 = \sqrt{\mathrm{DCI}^2 + \mathrm{ECI}^2 + \mathrm{BCI}^2}. \tag{4}$$


(4) Step 4: 2nd Order Risk Analysis

A. Distribution Site Risk Analysis. The risk index is estimated by applying weights (overlapped infection history and survival ratio) to the 1st order risk. The distribution site risk is calculated from the value obtained in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual survival ratio ($S$).
∗ The survival ratio ($S$) reflects whether treatment has been given after infection (based on one year of information):

$$S_1\ (\text{treatment probability}) = \frac{\text{survival cases}}{\text{survival cases} + \text{treated cases}},$$

$$S_2\ (\text{failure probability}) = \frac{\text{treated cases}}{\text{survival cases} + \text{treated cases}},$$

$$r_2 = \begin{cases} r_1 \times I \times S_1, & \text{if the node has been treated},\\ r_1 \times I \times S_2, & \text{if the node has not been treated}. \end{cases} \tag{5}$$

B. Exploit/Landing Site Risk Analysis. The risk index is estimated by applying weights (overlapped infection history and exposure frequency) to the 1st order risk. The exploit site risk is calculated from the value obtained in Step 3, the overlapped infection history ($I$) of each node, and the actual exposure frequency in a search website ($E$):

$$r_3 = r_1 \times \frac{2 \times I \times E}{I + E} \quad \text{or} \quad r_3 = r_1 \times I. \tag{6}$$

(5) Step 5: Malicious URL Risk Index (MRI). The MRI is estimated from the 1st order risk analysis result and the 2nd order risk indices. Taking into account the characteristics of the corresponding node, the following formula combines the 1st order risk calculated in Step 3 with the risk index of each distribution/exploit site calculated in Step 4:

$$r_{\text{final}} = \sqrt{r_1^2 + r_2^2 + r_3^2}. \tag{7}$$
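Steps 3 to 5 (Eqs. (4)–(7)) can be sketched in Python as follows. Function and parameter names are hypothetical. The inputs (centrality indices, overlapped infection history I, survival and treatment counts, exposure frequency E) are assumed to come from Steps 1 and 2. Step 4 computes $r_2$ only for distribution sites and $r_3$ only for exploit/landing sites, while Eq. (7) combines both, so the sketch assumes that the term that does not apply to a node is 0. It also assumes that the alternative $r_3 = r_1 \times I$ is used when no exposure frequency is available. The paper states neither condition explicitly.

```python
import math
from typing import Optional


def first_order_risk(dci: float, eci: float, bci: float) -> float:
    """Eq. (4): length of the (DCI, ECI, BCI) vector."""
    return math.sqrt(dci**2 + eci**2 + bci**2)


def distribution_risk(r1: float, infections: float, survival_cases: int,
                      treated_cases: int, treated: bool) -> float:
    """Eq. (5): weight r1 by the overlapped infection history I and by S1 or S2."""
    total = survival_cases + treated_cases
    if total == 0:
        raise ValueError("no survival/treatment history for this node")
    s1 = survival_cases / total  # "treatment probability" as defined in the text
    s2 = treated_cases / total   # "failure probability" as defined in the text
    return r1 * infections * (s1 if treated else s2)


def exploit_risk(r1: float, infections: float, exposure: Optional[float]) -> float:
    """Eq. (6): weight r1 by the harmonic mean of I and E; fall back to r1 * I
    when E is unavailable (assumption) or when I + E = 0 (avoids division by zero)."""
    if exposure is None or infections + exposure == 0:
        return r1 * infections
    return r1 * (2 * infections * exposure) / (infections + exposure)


def malicious_url_risk_index(r1: float, r2: float = 0.0, r3: float = 0.0) -> float:
    """Eq. (7): r_final = sqrt(r1^2 + r2^2 + r3^2)."""
    return math.sqrt(r1**2 + r2**2 + r3**2)


# Hypothetical distribution site: treated, infected twice, 3 survival vs. 1 treated case.
r1 = first_order_risk(dci=0.2, eci=0.1, bci=0.2)  # about 0.3
r2 = distribution_risk(r1, infections=2, survival_cases=3, treated_cases=1, treated=True)
print(round(malicious_url_risk_index(r1, r2=r2), 4))  # 0.5408
```

The harmonic mean in Eq. (6) keeps $r_3$ small unless both the infection history and the search exposure are high. A large value of only one of them cannot dominate.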

4. Experimental Results

We conducted experiments to examine the performance of our zero-day detection method based on the MCC (our self-developed malicious code detection crawler).

For these experiments, we processed the detection log acquired by crawling biomedical information system-related malware sites with the developed MCC, in the log format described in Step 1.
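Processing the detection log in the Step 1 format involves grouping each site's detection history and sorting it by time. This might look like the following Python sketch. The record fields are hypothetical stand-ins for the Step 1 items (node characteristic, URL, IP address, country code, survivability). Using the number of detections per URL as the overlapped infection history I is our assumption; the paper does not specify how I is counted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DetectionLog:
    """One crawler detection record (hypothetical layout of the Step 1 fields)."""
    url: str
    node_type: str        # "distribution" or "exploit/landing"
    ip: str
    country_code: str
    alive: bool           # whether the site was still operating when checked
    detected_at: datetime


def summarize_history(logs: list[DetectionLog]) -> dict[str, dict]:
    """Group records per URL, sort each history by time, and summarize it."""
    by_url: dict[str, list[DetectionLog]] = defaultdict(list)
    for rec in logs:
        by_url[rec.url].append(rec)
    summary = {}
    for url, history in by_url.items():
        history.sort(key=lambda r: r.detected_at)  # detection history in time order
        latest = history[-1]
        summary[url] = {
            "node_type": latest.node_type,
            "detections": len(history),  # assumed proxy for overlapped infection history I
            "alive": latest.alive,       # survivability at the most recent check
            "first_seen": history[0].detected_at,
            "last_seen": latest.detected_at,
        }
    return summary
```

Keeping the most recent record's survivability, rather than any earlier one, reflects the point in Step 1 that a site treated earlier may still be reinfected later.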

The estimated risk values are intuitive in our proposed model; that is, our final interpretation is based on the crawling result. Additionally, the crawling method uses a blacklist or known patterns. Thus, our proposed model exhibits a low false positive rate.

The MCC detection method proceeds as follows. The attacker (hacker) inserts malicious code into a specific webpage by operating a malicious code distribution server on the internet or by hacking a vulnerable web server. The clients (or users) of the web server unknowingly visit the exploit/landing/distribution site containing the malicious code and download the malicious code. Eventually, the attacker collects the client accounts and various other information from the infected server and proceeds to act maliciously.

The proposed system searches/crawls 25 million sites on a continuous basis, detects/blocks the inserted malicious code, and establishes/operates a malicious code blacklist.

4.1. Analysis Results. As a post hoc study based on the results of the MCC operation over a specific period, our results support decision making for proactive responses and follow-up measures, enabling biomedical information system security experts or administrators to maximize their operational efficiency.

Figure 4 shows the MRI estimated through the 1st and 2nd order risk analyses after the detection of malicious URLs.

Table 1 lists the detected malicious code exploit/landing/distribution URLs (including both exploit/landing sites and distribution sites). The risk index is a relative value. If it were limited to the range 0–1, the minimum risk would be fixed at 0, but it is hard to set a clear standard for the maximum risk.

In this paper, we therefore use a relative risk index that fixes the minimum risk at 0 and identifies the high-risk core malware sites through prioritization.
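Because only the ordering matters for a relative index, prioritization reduces to ranking nodes by MRI. The following Python sketch uses the first seven MRI values from Table 1. The site labels are hypothetical placeholders because the URLs are masked. It keeps the top-ranked site of each node type, the kind of selection highlighted in Figure 4.

```python
# MRI values are the first seven rows of Table 1; the site labels are hypothetical.
table1_head = [
    ("dist-A", "distribution", 0.3965),
    ("dist-B", "distribution", 0.3505),
    ("dist-C", "distribution", 0.3058),
    ("exploit-A", "exploit", 0.3047),
    ("exploit-B", "exploit", 0.3026),
    ("exploit-C", "exploit", 0.3017),
    ("dist-D", "distribution", 0.3009),
]


def top_by_type(rows: list[tuple[str, str, float]], k: int = 1) -> dict[str, list[tuple[str, float]]]:
    """Rank sites by MRI (higher = riskier) and keep the top k of each node type."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    result: dict[str, list[tuple[str, float]]] = {}
    for site, node_type, mri in ranked:
        bucket = result.setdefault(node_type, [])
        if len(bucket) < k:
            bucket.append((site, mri))
    return result


print(top_by_type(table1_head))
# {'distribution': [('dist-A', 0.3965)], 'exploit': [('exploit-A', 0.3047)]}
```

Ranking within each node type matters because distribution sites dominate the top of Table 1. A single global cut-off would hide the riskiest exploit sites.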

4.2. Sensitivity Analysis. The detection rate of actual zero-day attacks can be measured using a sensitivity analysis based on the results given in Table 1. Among the malware sites related to zero-day attacks on biomedical information systems, we analyze distribution sites and exploit/landing sites. Table 2 shows the detection rate measurements based on actual data produced in a specific time window.

The results in Table 2 focus on the top five high-risk sites. For the malware site group with multipath, the number in parentheses denotes the number of exploit/landing sites actually connected with a distribution site. The percentages represent the average detection rate in a specific time window, and this detection performance is better than that in the preanalysis stage.

The average early detection rate of distribution sites and exploit/landing sites is also higher in this analysis than in the preanalysis stage. That is, the proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.

4.3. Visualization of Analysis Results. The risk index of each URL calculated in this paper can be analyzed by verifying whether it agrees with the weak point of the corresponding server.

This section analyzes the actual weak point based on the calculated risk index and uses an error analysis technique to verify whether this index agrees with the actual prioritization. Figure 4 visualizes the 1st highest-risk distribution site according to the MRI.

[Figure 4 image: malware site risk visualization marking the 1st highest-risk landing (or exploit) site and the 1st highest-risk distribution site.]
Figure 4: Visualization of malware site risk.

Table 1: MRI estimation result of exploit/landing/distribution sites.

| Node type | URL | MRI | Reliability (%) |
|---|---|---|---|
| Distribution site | http://222.∗∗∗.∗∗∗.∗∗∗/ch.html | 0.3965 | 91 |
| Distribution site | http://www.∗∗∗∗∗∗∗∗∗.com/New/index.html | 0.3505 | 92 |
| Distribution site | http://a1.∗∗∗∗∗∗∗∗∗.com/1/index.html | 0.3058 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.3047 | 95 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.3026 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.3017 | 94 |
| Distribution site | http://a2.∗∗∗∗∗∗∗∗∗.com/2/index.html | 0.3009 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.re.kr | 0.3003 | 92 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2993 | 92 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2991 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2983 | 91 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2982 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.org | 0.2970 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2969 | 96 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2968 | 95 |
| Exploit site | http://∗∗∗∗∗∗∗∗∗.co.kr | 0.2967 | 95 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2966 | 96 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2966 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2962 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2961 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2960 | 94 |

("∗∗∗∗∗∗∗∗∗": masked URL information of the malware site.)

Table 2: Average detection rate (%) of zero-day attacks for a given day.

| Priority of risk | Malware site group with multipath | Distribution site with single path | Landing (or exploit) site with single path |
|---|---|---|---|
| 1 | 23.3 (15) | 21.5 | 28.2 |
| 2 | 22.6 (9) | 31.6 | 32.4 |
| 3 | 14.7 (8) | 22.8 | 18.1 |
| 4 | 18.4 (10) | 19.7 | 32.3 |
| 5 | 21.2 (12) | 24.2 | 17.6 |
| Average early detection rate | 20.04 | 23.96 | 25.72 |
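The "Average early detection rate" row of Table 2 is the arithmetic mean of the five priority rows in each column. A few lines of Python can check this, using values transcribed from Table 2 in percent. The parenthesized site counts of the multipath column are not part of the average.

```python
from statistics import mean

# Detection rates (%) for risk priorities 1-5, transcribed from Table 2.
table2 = {
    "malware site group with multipath": [23.3, 22.6, 14.7, 18.4, 21.2],
    "distribution site with single path": [21.5, 31.6, 22.8, 19.7, 24.2],
    "landing (or exploit) site with single path": [28.2, 32.4, 18.1, 32.3, 17.6],
}

for column, rates in table2.items():
    print(f"{column}: {mean(rates):.2f}")
# malware site group with multipath: 20.04
# distribution site with single path: 23.96
# landing (or exploit) site with single path: 25.72
```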

The detection and elimination of high-risk malicious code exploit/landing/distribution sites related to biomedical information systems is supported by visualizing the 1st highest-risk exploit/landing/distribution site, as shown in Figure 4. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information.

We verify the performance of the proposed model based on static analysis. However, military, government, and similar organizations must filter out core malware sites dynamically using high-performance hardware platforms. For this reason, our method is a good example of a suitable defensive measure against APT attacks.

5. Related Work

Methods for detecting and analyzing websites that include malicious code can generally be divided into static and dynamic analysis.

5.1. Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.

Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.

Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in the HTML and URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google's phishing blacklist. They used several features obtained from domain information and page contents.

JSAND [20] used a machine learning approach to classify malicious JavaScript.

5.2. Dynamic Analysis. Dynamic analysis analyzes the server–client connection to detect and classify malicious URLs.

In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitation (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites.

Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, it is usually costly to follow the full redirection path and monitor each script execution in real time. Moreover, their accuracy is highly dependent on the malicious response of the webpage to vulnerable components.

Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and iframes pointing to exploit sites.

The main differences between the model proposed in this paper and previous approaches are as follows:

(i) The proposed model applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.

(ii) The proposed model detects and blocks the core-hub node using link data from the high-risk malicious websites observed over a specific period of time.

(iii) The proposed model prevents the dissemination of malicious websites at an early stage by blocking the link between the core malicious code distribution site and the exploit/landing site.

6 Conclusion

In this paper the 1st order risk of malware infection wasanalyzed using log information estimated by an MCC thatconsiders the DCI BCI and ECI of the main nodes based onthe priority of risk This provides a quantitative value of thepotential risk inherent in the corresponding site (node)

In addition, the risk indices of exploit sites and distribution sites were calculated by applying weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated from the 1st order risk and the risk indices of the distribution sites and exploit sites.
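The risk pipeline summarized above can be condensed into a few functions. This is a minimal Python sketch, not the authors' code: it assumes DCI, ECI, and BCI have already been computed for a node, follows Equations (4)–(7) of Section 3.2 as written (including S1 and S2 exactly as defined in Step 4), reads the "or" in the exploit-site formula as a fallback to r1 × I when no exposure frequency E is available, and sets the weighted term that does not apply to a node's type to 0 in the MRI. All function names and input values are hypothetical.

```python
import math


def first_order_risk(dci, eci, bci):
    """Eq. (4): r1 is the Euclidean length of the (DCI, ECI, BCI) vector."""
    return math.sqrt(dci ** 2 + eci ** 2 + bci ** 2)


def distribution_site_risk(r1, infections, survival_cases, treated_cases, treated):
    """Eq. (5): r2 = r1 * I * S1 for a treated node, r1 * I * S2 otherwise."""
    total = survival_cases + treated_cases
    if total == 0:
        raise ValueError("survival ratio needs at least one observed case")
    s1 = survival_cases / total  # S1 as defined in Step 4
    s2 = treated_cases / total   # S2 as defined in Step 4
    return r1 * infections * (s1 if treated else s2)


def exploit_site_risk(r1, infections, exposure=None):
    """Eq. (6): r3 = r1 * 2IE / (I + E), or r1 * I when E is unavailable (assumed)."""
    if exposure is None or infections + exposure == 0:
        return r1 * infections
    return r1 * (2 * infections * exposure) / (infections + exposure)


def malicious_url_risk_index(r1, r2=0.0, r3=0.0):
    """Eq. (7): MRI as the Euclidean length of (r1, r2, r3)."""
    return math.sqrt(r1 ** 2 + r2 ** 2 + r3 ** 2)


if __name__ == "__main__":
    # Hypothetical distribution site: two overlapped infections, one year of
    # history with 3 survival cases and 1 treated case, node already treated.
    r1 = first_order_risk(dci=0.3, eci=0.4, bci=0.0)
    r2 = distribution_site_risk(r1, infections=2, survival_cases=3,
                                treated_cases=1, treated=True)
    print(round(malicious_url_risk_index(r1, r2=r2), 4))
```

Because the MRI is an unnormalized vector length, its values are only meaningful relative to one another, which matches the relative, prioritization-oriented use of the index in the experiments.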

In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced by malicious code detection and vulnerability scanning tools.

As this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that enables proactive responses, going beyond the limits of reactive responses that rely solely on traditional malicious code detection tools.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. A. Hansen and N. M. Hansen, "A taxonomy of vulnerabilities in implantable medical devices," in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS '10), pp. 13–20, October 2010.

[2] C.-S. Park, "Security mechanism based on hospital authentication server for secure application of implantable medical devices," BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.

[3] E. Hutchins, M. Cloppert, and R. Amin, "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW '11), pp. 113–125, Academic Conferences, March 2011.

[4] N. Moran, "Understanding Advanced Persistent Threats: A Case Study," 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.

[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, "Secure model against APT in m-connected SCADA network," International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.

[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, "All your iFRAMEs point to us," in Proceedings of the USENIX Security Symposium, 2008.

[7] S. Lee and J. Kim, "WARNINGBIRD: detecting suspicious URLs in Twitter stream," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '12), 2012.

[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, "Collecting internet malware based on client-side honeypot," in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS '08), pp. 1493–1498, Hunan, China, November 2008.

[9] Y.-C. Cho and J.-Y. Pan, "Multiple-feature extracting modules based leak mining system design," The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.

[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, "A method for risk measurement of botnet's malicious activities," Information Journal, vol. 17, no. 1, pp. 165–180, 2014.

[11] C. Ni, C. Sugimoto, and J. Jiang, "Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically," in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.

[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, "Clustering networks' heterogeneous data in defining a comprehensive closeness centrality index," Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.

[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, "Betweenness centrality in some classes of graphs," International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.

[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: learning to detect malicious web sites from suspicious URLs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1245–1253, July 2009.

[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Identifying suspicious URLs: an application of large-scale online learning," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 681–688, 2009.

[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, "Zozzle: low-overhead mostly static JavaScript malware detection," in Proceedings of the USENIX Security Symposium, 2011.

[17] M. Cova, C. Kruegel, and G. Vigna, "Detection and analysis of drive-by-download attacks and malicious JavaScript code," in Proceedings of the 19th International World Wide Web Conference (WWW '10), pp. 281–290, April 2010.

[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: a fast filter for the large-scale detection of malicious web pages," in Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 197–206, 2011.

[19] C. Whittaker, B. Ryner, and M. Nazif, "Large-scale automatic classification of phishing pages," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '10), 2010.

[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, "JSand: complete client-side sandboxing of third-party JavaScript without browser modifications," in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC '12), pp. 1–10, ACM, December 2012.

[21] C. Seifert, I. Welch, and P. Komisarczuk, "HoneyC: the low-interaction client honeypot," in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS '07), Hamilton, New Zealand, 2007.

[22] J. Nazario, "PhoneyC: a virtual client honeypot," in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.

[23] Y.-M. Wang, D. Beck, X. Jiang et al., "Automated web patrol with Strider HoneyMonkeys," in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.

[24] The Honeynet Project, "Capture-HPC client honeypot," 2008, http://projects.honeynet.org/capture-hpc.

[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, "BLADE: an attack-agnostic approach for preventing drive-by malware infections," in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS '10), pp. 440–450, ACM, October 2010.

[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, "All your iFRAMEs point to us," in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.



[11] C Ni C Sugimoto and J Jiang ldquoDegree closeness andbetweenness application of group centrality measurements toexplore macro-disciplinary evolution diachronicallyrdquo in Pro-ceedings of the ISSI pp 1ndash13 Durban South Africa 2011

[12] F Barzinpour B Hoda Ali-Ahmadi S Alizadeh and S G JalaliNaini ldquoClustering networksrsquo heterogeneous data in defining acomprehensive closeness centrality indexrdquoMathematical Prob-lems in Engineering vol 2014 Article ID 202350 10 pages 2014

[13] S K Raghavan Unnithan B Kannan and M JathavedanldquoBetweenness centrality in Some classes of graphsrdquo Interna-tional Journal of Combinatorics vol 2014 Article ID 241723 12pages 2014

[14] J Ma L K Saul S Savage and G M Voelker ldquoBeyondblacklists learning to detectmaliciousweb sites from suspiciousURLsrdquo in Proceedings of the 15th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining (KDDrsquo09) pp 1245ndash1253 July 2009

[15] J Ma L K Saul S Savage and G M Voelker ldquoIdentifyingsuspicious URLs an application of large-scale online learningrdquoin Proceedings of the 26th Annual International Conference onMachine Learning (ICML rsquo09) pp 681ndash688 2009

[16] C Curtsinger B Livshits B Zorn and C Seifert ldquoZozzlelow-overhead mostly static javascript malware detectionrdquo inProceedings of the USENIX Security Symposium 2011

[17] M Cova C Kruegel and G Vigna ldquoDetection and analysis ofdrive-by-download attacks and malicious JavaScript coderdquo inProceedings of the 19th International World Wide Web Confer-ence (WWW rsquo10) pp 281ndash290 April 2010

[18] D Canali M Cova G Vigna and C Kruegel ldquoProphiler a fastfilter for the large-scale detection of malicious web pagesrdquo inProceedings of the 20th International Conference on World WideWeb (WWW rsquo11) pp 197ndash206 2011

[19] C Whittaker B Ryner and M Nazif ldquoLarge-scale automaticclassification of phishing pagesrdquo in Proceedings of the Sympo-sium on Network and Distributed System Security (NDSS rsquo10)2010

[20] P Agten S van Acker Y Brondsema P H Phung L Desmetand F Piessens ldquoJSand complete client-side sandboxing ofthird-party JavaScript without browser modificationsrdquo in Pro-ceedings of the 28th Annual Computer Security ApplicationsConference (ACSAC rsquo12) pp 1ndash10 ACM December 2012

[21] C Seifert I Welch and P Komisarczuk ldquoHoneyc the low-interaction client honeypotrdquo in Proceedings of the New ZealandComputer Science Research Student Conference (NZCSRCS rsquo07)Hamilton New Zealand 2007

[22] N Jose ldquoPhoneyC a virtual client honeypotrdquo in Proceedingsof the 2nd USENIX Conference on Large-Scale Exploits andEmergentThreats Botnets SpywareWorms andMore USENIXAssociation Berkeley Calif USA April 2009

[23] Y-MWang D Beck X Jiang et al ldquoAutomatedweb patrol withstrider honeymonkeysrdquo in Proceedings of the 2006 Network andDistributed System Security Symposium February 2006

[24] The Honeynet Project Capture-HPC client honeypot 2008httpprojectshoneynetorgcapture-hpc

[25] L Lu V Yegneswaran P Porras and W Lee ldquoBlade an attack-agnostic approach for preventing drive-by malware infectionsrdquoin Proceedings of the 17th ACM Conference on Computer andCommunications Security (CCS rsquo10) pp 440ndash450 ACM Octo-ber 2010

[26] N Provos P Mavrommatis M A Rajab and F Monrose ldquoAllyour iFRAMEs point to usrdquo in Proceedings of the 17th USENIXSecurity Symposium pp 1ndash15 2008


Computational and Mathematical Methods in Medicine 3

Figure 2: Entire analysis diagram for malicious code landing (or exploit)/distribution site risk estimation (axes: x, eigenvector centrality; y, betweenness centrality; z, degree centrality; origin O; vectors V1 and V2; risk points MRI and MRI′).

We fix the three coordinate axes $x$, $y$, and $z$, set the positive direction of the $x$, $y$, and $z$ axes, and then define the length scale.

As shown in Figure 2, three vectors (connectivity (degree), eigenvector, and betweenness) are used to estimate the risk index of malicious code landing (or exploit)/distribution sites, and the length is given by the vector sum [10]. The purpose is to express the different vector values as a single length and then quantify the risk index through that length.
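The length computation can be illustrated with a minimal sketch, assuming the three centrality scores of each site have already been computed and placed on the axes of Figure 2 (x = eigenvector, y = betweenness, z = degree). The site names and scores below are hypothetical values chosen only for the example.

```python
import math


def risk_length(eigenvector, betweenness, degree):
    """Length (Euclidean norm) of the (x, y, z) centrality vector of Figure 2."""
    return math.sqrt(eigenvector ** 2 + betweenness ** 2 + degree ** 2)


# Hypothetical (x, y, z) = (eigenvector, betweenness, degree) scores for two sites.
sites = {"site_a": (0.3, 0.4, 0.0), "site_b": (0.1, 0.2, 0.2)}

# Rank the sites by vector length, longest (highest risk) first.
for name, scores in sorted(sites.items(), key=lambda kv: risk_length(*kv[1]), reverse=True):
    print(name, round(risk_length(*scores), 4))
```

Here site_a has length 0.5 and site_b has length 0.3, so site_a is listed first. A site that is highly central on any one axis therefore receives a long vector even if its other scores are small.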

We thus determine which sites have the highest risk index, and we find the significance-based concentration degree of the corresponding sites by analyzing the central structure of the exploit/landing/distribution sites of malicious code connected to medical information systems. To interpret these structures more objectively, this paper represents each risk factor as a vector component and estimates the ultimate risk index by analyzing the connectivity [11–13] (degree, eigenvector, and betweenness centrality) of the distribution sites and exploit/landing sites and vectorizing the calculated values. We now define each element of the risk index for the detected malicious code exploit/landing/distribution sites.

(i) Degree Centrality Analysis of Nodes. This is defined as the number of links incident upon a node. The degree can be interpreted in terms of the immediate risk of a node catching whatever is flowing through the network (such as malicious code from malware sites). In the case of a directed network (where ties have direction), we usually define two separate measures of degree centrality, namely, in-degree and out-degree centrality.

(ii) Eigenvector Centrality Analysis of Nodes. This measures the influence of a node within a network. Relative scores are assigned to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.

(iii) Betweenness Centrality Analysis of Nodes. This is the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that the transfer of items follows the shortest path.
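The three measures can be prototyped on a small link graph. The sketch below is an illustration under stated assumptions, not the authors' implementation: the graph and its node names (d for a distribution site, l1 to l4 for landing sites) are hypothetical, and the graph is treated as undirected and unweighted. Degree centrality follows the normalization that Step 2 gives as Eq. (1) with every link weight set to 1, the "link share" score iterates the rule that Step 2 gives as Eq. (2), and betweenness uses Brandes' algorithm, a standard exact method for evaluating Eq. (3).

```python
from collections import deque

# Hypothetical undirected, unweighted link graph; each key lists its neighbours.
GRAPH = {
    "d": ["l1", "l2", "l3"],
    "l1": ["d", "l2"],
    "l2": ["d", "l1"],
    "l3": ["d", "l4"],
    "l4": ["l3"],
}


def degree_centrality(graph):
    """Eq. (1) with every link weight set to 1: incident links / (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def link_share_scores(graph, iterations=200):
    """Eq. (2)-style iteration: each node N_j passes I(N_j) / l_j to every neighbour."""
    score = {v: 1.0 / len(graph) for v in graph}
    for _ in range(iterations):
        score = {v: sum(score[u] / len(graph[u]) for u in graph[v]) for v in graph}
    return score


def betweenness_centrality(graph):
    """Eq. (3) via Brandes' algorithm on an unweighted, undirected graph."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma) to each node.
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair (j, k) was counted once from each end, so halve.
    return {v: b / 2 for v, b in bc.items()}


for name, func in [("degree", degree_centrality),
                   ("link share", link_share_scores),
                   ("betweenness", betweenness_centrality)]:
    print(name, {v: round(x, 3) for v, x in func(GRAPH).items()})
```

On this toy graph, d has degree centrality 0.75 and betweenness 4 (it lies on every shortest path between the triangle and l3/l4), while l3 has betweenness 3. Note that on an undirected, connected, non-bipartite graph the Eq. (2)-style iteration converges to each node's degree divided by twice the number of links (0.3 for d here), so it mainly adds information beyond degree centrality when the link graph is directed.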

3.2. Method to Estimate Malicious URL Risk Index (MRI). To estimate the risk index of the URL of a malicious code exploit/landing/distribution site, we follow the process in Figure 3.

(1) Step 1: Node Characteristic Classification. Landing (or exploit)/distribution site information is classified using the logs produced by the self-developed malicious code detection crawler, and the detection history is sorted by time from the unit logs of each malicious code exploit/landing/distribution site. The basic risk is also estimated with the following log information.

A. Node Characteristic. Whether the infected site is an exploit/landing site or a distribution site is confirmed. If there is no link to the detected malicious code (i.e., the information on the first infected site), the site is defined as a distribution site. If the URL of another site is exploited/distributed, the site is defined as an exploit/landing site.

B. Malicious Code Exploit/Landing/Distribution Site Information. This is the URL of the detected malicious code exploit/landing/distribution site. An exploit/landing site can also become a distribution site: if the distribution site is eliminated by a self-developed or other detection system, the exploit/landing site takes over as the distribution site and continues to operate as a malicious code distribution site.

C. IP Address, Country Code, and Site Survivability. Basic information is collected through the IP address and the related server location, and the current operating status is investigated. In particular, the survivability of the exploit/landing/distribution site is very important in estimating the risk index. Even if a site has been treated or isolated and is no longer operated, reinfection remains possible if the weak point continues to be exposed. Therefore, survivability should be reflected in the risk index estimation.

(2) Step 2: Centrality Analysis of Node. The following three indices are applied to the centrality analysis of each node:

(i) Degree Centrality Analysis
(ii) Eigenvector Centrality Analysis
(iii) Betweenness Centrality Analysis


Figure 3: Entire analysis diagram for risk estimation of malicious code exploit/landing/distribution sites (Crawling DB → Step 1, node characteristic classification → Step 2, degree, eigenvector, and betweenness centrality analysis of node → Step 3, 1st order risk analysis → Step 4, 2nd order risk analysis: distribution site risk analysis, exploit site risk analysis, weight value calculation → Step 5, MRI estimation).

A. Degree Centrality Index (DCI)

(i) A node that has more directly connected neighboring nodes has higher degree centrality; this index measures the scale of direct effects.

(ii) Degree centrality is calculated from the composition ratio of each node:

$$\mathrm{DCI} = \frac{\sum (\text{weight of incident links})}{(\#\text{ of nodes}) - 1}, \qquad \text{time complexity: } O(n). \tag{1}$$

B. Eigenvector Centrality Index (ECI)

(i) Assume that node $N_j$ has $l_j$ links. If one of these links is connected to node $N_i$, the probability that $N_j$ passes to $N_i$ is $1/l_j$. Therefore, the ultimate ECI is as follows, where the sum runs over the nodes $N_j$ linked to $N_i$:

$$\mathrm{ECI} = I(N_i) = \sum_{j} \frac{I(N_j)}{l_j}. \tag{2}$$

C. Betweenness Centrality Index (BCI)

(i) The BCI measures the degree to which a node is located on the shortest routes between other nodes.

(ii) The betweenness centrality of a node is higher if the node connects more different node groups. The BCI indicates the degree to which a node functions as a bridge in the entire network.

(iii) It makes it possible to find the intermediate URL that links information between fields.

(iv) Suppose that $g_{jk}$ is the number of shortest routes between nodes $j$ and $k$ in the network and $g_{jk}(n_i)$ is the number of those shortest routes that include node $i$. The fraction of shortest routes between $j$ and $k$ that include node $i$ is then $g_{jk}(n_i)/g_{jk}$, and

$$\mathrm{BCI} = C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}. \tag{3}$$

If the main target node is constructed as a child node of depth 1, its degree will increase; however, its BCI will decrease according to (3).

(3) Step 3: 1st Order Risk Analysis. The 1st order risk is estimated as the Euclidean length of the vector formed by the three centrality values calculated in Step 2:

$$r_1 = \sqrt{\mathrm{DCI}^2 + \mathrm{ECI}^2 + \mathrm{BCI}^2}. \tag{4}$$


(4) Step 4: 2nd Order Risk Analysis

A. Distribution Site Risk Analysis. The risk index is estimated by applying weights (overlapped infection history and survival ratio) to the result of the 1st order risk analysis. The distribution site risk is calculated from the value obtained in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual survival ratio ($S$). The survival ratio ($S$) reflects whether treatment has been given after infection (based on one year's information):

$$S_1 \ (\text{Treatment Probability}) = \frac{\text{Survival Cases}}{\text{Survival Cases} + \text{Treated Cases}},$$

$$S_2 \ (\text{Failure Probability}) = \frac{\text{Treated Cases}}{\text{Survival Cases} + \text{Treated Cases}},$$

$$r_2 = \begin{cases} r_1 \times I \times S_1 & \text{if the node has been treated,} \\ r_1 \times I \times S_2 & \text{if the node has not been treated.} \end{cases} \tag{5}$$

B. Exploit/Landing Site Risk Analysis. The risk index is estimated by applying weights (overlapped infection history and exposure frequency) to the result of the 1st order risk analysis. The exploit site risk is calculated from the value obtained in Step 3, the overlapped infection history ($I$) of each node, and the actual exposure frequency on a search website ($E$):

$$r_3 = r_1 \times \frac{2 \times I \times E}{I + E} \quad \text{or} \quad r_3 = r_1 \times I. \tag{6}$$

(5) Step 5: Malicious URL Risk Index (MRI). The MRI is estimated from the 1st order risk analysis result and the risk index. The following formula is deduced from the 1st order risk calculated in Step 3 and the risk index of each distribution/exploit site calculated in Step 4, considering the characteristics of the corresponding node:

$$r_{\text{final}} = \sqrt{r_1^2 + r_2^2 + r_3^2}. \tag{7}$$
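Steps 4 and 5 reduce to a few arithmetic operations once the per-node inputs are known. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: all input values are hypothetical, the fallback to $r_3 = r_1 \times I$ when no exposure count is supplied is an assumption (the text does not say when that form applies), and setting the unused term of Eq. (7) to 0 for a node that is only a distribution site is likewise an assumption.

```python
import math


def survival_weights(survival_cases, treated_cases):
    """Eq. (5): S1 = survival / (survival + treated), S2 = treated / (survival + treated)."""
    total = survival_cases + treated_cases
    return survival_cases / total, treated_cases / total


def distribution_site_risk(r1, infection_history, survival_cases, treated_cases, treated):
    """Eq. (5): r2 = r1 * I * S1 if the node has been treated, else r1 * I * S2."""
    s1, s2 = survival_weights(survival_cases, treated_cases)
    return r1 * infection_history * (s1 if treated else s2)


def exploit_site_risk(r1, infection_history, exposure=None):
    """Eq. (6): r3 = r1 * 2IE / (I + E); assumed to fall back to r1 * I when E is None."""
    if exposure is None:
        return r1 * infection_history
    return r1 * (2 * infection_history * exposure) / (infection_history + exposure)


def malicious_url_risk_index(r1, r2, r3):
    """Eq. (7): r_final = sqrt(r1^2 + r2^2 + r3^2)."""
    return math.sqrt(r1 ** 2 + r2 ** 2 + r3 ** 2)


# Hypothetical inputs: r1 from Step 3, I = 3 overlapped infections,
# 8 surviving and 2 treated cases over one year, node not treated.
r1 = 0.5
r2 = distribution_site_risk(r1, infection_history=3, survival_cases=8,
                            treated_cases=2, treated=False)
# Assumption: a pure distribution site contributes no exploit-site term, so r3 = 0.
print(round(malicious_url_risk_index(r1, r2, 0.0), 4))  # prints 0.5831
```

The factor $2IE/(I+E)$ in Eq. (6) is the harmonic mean of $I$ and $E$, so an exploit site is weighted heavily only when both its infection history and its search exposure are large.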

4. Experimental Results

We conducted experiments to examine the performance of our zero-day detection method based on the MCC.

For these experiments, we processed the detection logs acquired by crawling biomedical information system-related malware sites with the developed MCC, in the log format described in Step 1.

The estimated risk values are intuitive in our proposed model; that is, our final interpretation is based on the crawling result. Additionally, the crawling method uses a blacklist or known patterns. Thus, our proposed model exhibits a low false positive rate.

The MCC detection method proceeds as follows. The attacker (hacker) inserts malicious code into a specific webpage, either by operating a malicious code distribution server on the internet or by hacking a vulnerable web server. The clients (or users) of the web server involuntarily visit the exploit/landing/distribution site containing the malicious code and download the malicious code. Eventually, the attacker collects the client accounts and various other information from the infected server and proceeds to act maliciously.

The proposed system continuously searches/crawls 25 million sites, detects/blocks the inserted malicious code, and establishes/operates a malicious code blacklist.

4.1. Analysis Results. As a post hoc study based on the results of the MCC operation over a specific period, our results support decision making for proactive responses and follow-up measures, enabling biomedical information system security experts or administrators to maximize their operational efficiency.

Figure 4 shows the MRI estimated through the 1st and 2nd order risk analyses after the detection of malicious URLs.

Table 1 lists the detected malicious code exploit/landing/distribution URLs (including both exploit/landing sites and distribution sites). The risk index is a relative value: if it were limited to the range 0–1, the minimum risk would be fixed at 0, but it is hard to set a clear standard for the maximum risk.

In this paper, we therefore use a relative risk index that fixes the minimum risk at 0 and identifies the high-risk core malware sites through prioritization.

4.2. Sensitivity Analysis. The detection rate of actual zero-day attacks can be measured using a sensitivity analysis based on the results given in Table 1. Among the malware sites related to zero-day attacks targeting biomedical information systems, we analyze distribution sites and exploit/landing sites. Table 2 shows the detection rate measurements based on actual data produced in a specific time window.

The results in Table 2 focus on the top five high-risk sites. The multipath malware site group denotes the number of exploit/landing sites actually connected with a distribution site (given in parentheses in Table 2). The percentage represents the average detection rate in a specific time window, and this detection performance is better than in the preanalysis stage.

The average early detection rates of distribution sites and exploit/landing sites are also higher than in the preanalysis stage. That is, the proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.
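The last row of Table 2 is the arithmetic mean of the five priority rows in each column, which the short check below reproduces from the tabulated rates. The rates are the values reported in Table 2 (in percent); the dictionary keys and the helper name are only illustrative.

```python
# Detection rates (%) for risk priorities 1-5, copied from Table 2.
columns = {
    "malware site group with multipath": [23.3, 22.6, 14.7, 18.4, 21.2],
    "distribution site with single path": [21.5, 31.6, 22.8, 19.7, 24.2],
    "landing (or exploit) site with single path": [28.2, 32.4, 18.1, 32.3, 17.6],
}


def average(values):
    """Arithmetic mean of a list of detection rates."""
    return sum(values) / len(values)


for name, rates in columns.items():
    print(f"{name}: {average(rates):.2f}%")
```

It prints 20.04%, 23.96%, and 25.72%, matching the average early detection rates reported in Table 2.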

4.3. Visualization of Analysis Results. The risk index of each URL calculated in this paper can be analyzed by verifying whether the risk index agrees with the weak point of the corresponding server.

This section analyzes the actual weak point based on the calculated risk index and verifies whether this index agrees with the actual prioritization using an error analysis technique. Figure 4 visualizes the 1st highest-risk distribution site according to the MRI.

Figure 4: Visualization of malware site risk (highlighted: 1st highest-risk landing (or exploit) site; 1st highest-risk distribution site).

Table 1: MRI estimation result of exploit/landing/distribution sites.

| Node type | URL | MRI | Reliability (%) |
|---|---|---|---|
| Distribution site | http://222.∗∗∗.∗∗∗.∗∗∗/ch.html | 0.3965 | 91 |
| Distribution site | http://www.∗∗∗∗∗∗∗∗∗.com/New/index.html | 0.3505 | 92 |
| Distribution site | http://a1.∗∗∗∗∗∗∗∗∗.com/1/index.html | 0.3058 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.3047 | 95 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.3026 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.3017 | 94 |
| Distribution site | http://a2.∗∗∗∗∗∗∗∗∗.com/2/index.html | 0.3009 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.re.kr | 0.3003 | 92 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2993 | 92 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2991 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2983 | 91 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2982 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.org | 0.2970 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2969 | 96 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2968 | 95 |
| Exploit site | http://∗∗∗∗∗∗∗∗∗.co.kr | 0.2967 | 95 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2966 | 96 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2966 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2962 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2961 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2960 | 94 |

(“∗∗∗∗∗∗∗∗∗”: the URL information of the malware site.)

Table 2: Average detection rate (%) of zero-day attacks for a given day.

| Priority of risk | Malware site group with multipath | Distribution site with single path | Landing (or exploit) site with single path |
|---|---|---|---|
| 1 | 23.3 (15) | 21.5 | 28.2 |
| 2 | 22.6 (9) | 31.6 | 32.4 |
| 3 | 14.7 (8) | 22.8 | 18.1 |
| 4 | 18.4 (10) | 19.7 | 32.3 |
| 5 | 21.2 (12) | 24.2 | 17.6 |
| Average early detection rate | 20.04 | 23.96 | 25.72 |

The detection and elimination of high-risk malicious code exploit/landing/distribution sites related to biomedical information systems can be achieved by visualizing the 1st highest-risk exploit/landing/distribution site, as shown in Figure 4. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information.

We verify the performance of the proposed model based on static analysis. However, military, government, or similar organizations must filter out core malware sites dynamically on high-performance hardware platforms. For this reason, our method is a good example of a suitable defensive measure against APT attacks.

5. Related Work

Methods for detecting and analyzing websites that contain malicious code can generally be divided into static and dynamic analysis.

5.1. Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.

Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.

Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in the HTML and URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google's phishing blacklist. They used several features obtained from domain information and page contents.

JSAND [20] used a machine learning approach to classify malicious JavaScript.

5.2. Dynamic Analysis. Dynamic analysis analyzes the server–client connection to detect and classify malicious URLs.

In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitation (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites.

Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, it is usually costly to follow the full redirection path and monitor each script execution in real time. Moreover, their accuracy is highly dependent on the malicious response of the webpage to vulnerable components.

Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and iframes pointing to exploit sites.

The main differences between the model proposed in this paper and previous approaches are as follows:

(i) The model proposed in this paper applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.

(ii) The proposed model detects and blocks the core-hub node using link data from the high-risk malicious websites observed over a specific period of time.

(iii) The proposed model prevents the dissemination of malicious websites in the early stages by blocking the link between the core malicious code distribution site and the exploit/landing site.

6. Conclusion

In this paper, the 1st order risk of malware infection was analyzed using log information collected by an MCC, considering the DCI, BCI, and ECI of the main nodes based on the priority of risk. This provides a quantitative value of the potential risk inherent in the corresponding site (node).

In addition, the risk index of exploit sites and distribution sites was calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the 1st order risk analysis and the risk index of the distribution sites and exploit sites.

In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced by malicious code detection and vulnerability scanning tools.

As this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that realizes proactive responses beyond the limits of reactive responses that rely only on traditional malicious code detection tools.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. A. Hansen and N. M. Hansen, "A taxonomy of vulnerabilities in implantable medical devices," in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS '10), pp. 13–20, October 2010.

[2] C.-S. Park, "Security mechanism based on hospital authentication server for secure application of implantable medical devices," BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.

[3] E. Hutchins, M. Cloppert, and R. Amin, "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW '11), pp. 113–125, Academic Conferences, March 2011.

[4] N. Moran, "Understanding Advanced Persistent Threats – A Case Study," 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.

[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, "Secure model against APT in m-connected SCADA network," International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.

[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, "All your iframes point to us," in Proceedings of the USENIX Security, 2008.

[7] S. Lee and J. Kim, "WARNINGBIRD: detecting suspicious URLs in Twitter stream," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '12), 2012.

[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, "Collecting internet malware based on client-side honeypot," in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS '08), pp. 1493–1498, Hunan, China, November 2008.

[9] Y.-C. Cho and J.-Y. Pan, "Multiple-feature extracting modules based leak mining system design," The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.

[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, "A method for risk measurement of botnet's malicious activities," Information Journal, vol. 17, no. 1, pp. 165–180, 2014.

[11] C. Ni, C. Sugimoto, and J. Jiang, "Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically," in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.

[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, "Clustering networks' heterogeneous data in defining a comprehensive closeness centrality index," Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.

[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, "Betweenness centrality in some classes of graphs," International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.

[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: learning to detect malicious web sites from suspicious URLs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1245–1253, July 2009.

[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Identifying suspicious URLs: an application of large-scale online learning," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 681–688, 2009.

[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, "Zozzle: low-overhead mostly static JavaScript malware detection," in Proceedings of the USENIX Security Symposium, 2011.

[17] M. Cova, C. Kruegel, and G. Vigna, "Detection and analysis of drive-by-download attacks and malicious JavaScript code," in Proceedings of the 19th International World Wide Web Conference (WWW '10), pp. 281–290, April 2010.

[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: a fast filter for the large-scale detection of malicious web pages," in Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 197–206, 2011.

[19] C. Whittaker, B. Ryner, and M. Nazif, "Large-scale automatic classification of phishing pages," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '10), 2010.

[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, "JSand: complete client-side sandboxing of third-party JavaScript without browser modifications," in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC '12), pp. 1–10, ACM, December 2012.

[21] C. Seifert, I. Welch, and P. Komisarczuk, "HoneyC: the low-interaction client honeypot," in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS '07), Hamilton, New Zealand, 2007.

[22] N. Jose, "PhoneyC: a virtual client honeypot," in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.

[23] Y.-M. Wang, D. Beck, X. Jiang et al., "Automated web patrol with Strider HoneyMonkeys," in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.

[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.

[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, "Blade: an attack-agnostic approach for preventing drive-by malware infections," in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS '10), pp. 440–450, ACM, October 2010.

[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, "All your iFRAMEs point to us," in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.


4 Computational and Mathematical Methods in Medicine

Figure 3: Entire analysis diagram for risk estimation of malicious code exploit/landing/distribution sites. (Diagram blocks: Crawling DB; Step 1, node characteristic classification; degree, eigenvector, and betweenness centrality analysis of nodes; 1st order risk analysis; 2nd order risk analysis, comprising distribution site risk analysis, exploit site risk analysis, and weight value calculation; Step 5, MRI estimation.)

(A) Degree Centrality Index (DCI)

(i) A node with more directly connected neighboring nodes has a higher degree centrality; the DCI therefore measures the scale of direct effects.

(ii) Degree centrality is calculated from the composition ratio of each node:

$$\mathrm{DCI} = \frac{\sum \left(\text{weight of incident links}\right)}{\left(\#\text{ of nodes}\right) - 1}, \qquad \text{Time complexity: } O(n). \tag{1}$$
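Eq. (1) can be illustrated with a short Python sketch. This is not the authors' implementation: it assumes an undirected crawl graph stored as a dictionary `{node: {neighbor: link_weight}}`, and the node names and unit link weights in the toy graph are hypothetical.

```python
from typing import Dict

# Assumed (hypothetical) crawl-graph representation: {node: {neighbor: link_weight}}.
Graph = Dict[str, Dict[str, float]]


def degree_centrality_index(graph: Graph) -> Dict[str, float]:
    """Eq. (1): sum of incident link weights divided by (# of nodes - 1)."""
    n = len(graph)
    if n < 2:
        # With a single node the denominator (n - 1) would be zero.
        return {node: 0.0 for node in graph}
    return {node: sum(neighbors.values()) / (n - 1) for node, neighbors in graph.items()}


if __name__ == "__main__":
    # Toy graph: one distribution site linked to three exploit sites (unit weights).
    star = {
        "dist": {"ex1": 1.0, "ex2": 1.0, "ex3": 1.0},
        "ex1": {"dist": 1.0},
        "ex2": {"dist": 1.0},
        "ex3": {"dist": 1.0},
    }
    print(degree_centrality_index(star))  # dist -> 1.0, each exploit site -> 1/3
```

Dividing by (# of nodes - 1) rescales the weighted degree by the largest number of neighbors a node could have in an unweighted simple graph, so values from crawl graphs of different sizes stay comparable.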

(B) Eigenvector Centrality Index (ECI)

(i) Assume that node $N_j$ has $l_j$ links. If one of these links is connected to node $N_i$, the probability that $N_j$ passes to $N_i$ is $1/l_j$. The ECI is therefore

$$\mathrm{ECI} = I(N_i) = \sum_{j} \frac{I(N_j)}{l_j}, \tag{2}$$

where the sum runs over the nodes $N_j$ that link to $N_i$.
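Because $I(N_j)$ appears on both sides of Eq. (2), the index is usually obtained by iteration. The sketch below applies Eq. (2) as written (each node passes an equal share $I(N_j)/l_j$ to every node it links to), renormalizes, and stops once the scores settle. The directed link map, the iteration limit, the tolerance, and the choice to let a node without outgoing links pass nothing on are illustrative assumptions. Without a damping term, this kind of iteration is only guaranteed to converge on suitably connected (irreducible, aperiodic) graphs.

```python
from typing import Dict, List


def eigenvector_centrality_index(
    out_links: Dict[str, List[str]], max_iter: int = 500, tol: float = 1e-12
) -> Dict[str, float]:
    """Iterate Eq. (2), I(N_i) = sum_j I(N_j) / l_j, starting from uniform scores.

    Assumption: every link target is also a key of out_links.
    """
    nodes = list(out_links)
    if not nodes:
        return {}
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(max_iter):
        new_score = {node: 0.0 for node in nodes}
        for j, targets in out_links.items():
            if not targets:
                continue  # assumption: a node with no links passes nothing on
            share = score[j] / len(targets)  # I(N_j) / l_j
            for i in targets:
                new_score[i] += share
        total = sum(new_score.values()) or 1.0
        new_score = {node: value / total for node, value in new_score.items()}
        converged = max(abs(new_score[n] - score[n]) for n in nodes) < tol
        score = new_score
        if converged:
            break
    return score


if __name__ == "__main__":
    # Hypothetical directed links between one distribution site and two exploit sites.
    links = {"dist": ["ex1", "ex2"], "ex1": ["ex2"], "ex2": ["dist"]}
    print(eigenvector_centrality_index(links))  # approx. dist 0.4, ex1 0.2, ex2 0.4
```

In the toy graph, the fixed point satisfies I(dist) = I(ex2) and I(ex1) = I(dist)/2, which gives 0.4, 0.2, and 0.4 after normalization.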

(C) Betweenness Centrality Index (BCI)

(i) The BCI measures the degree to which a node lies on the shortest routes between other nodes.

(ii) The betweenness centrality of a node is higher if the node connects more distinct node groups. The BCI therefore indicates the degree to which a node functions as a bridge in the entire network.

(iii) The BCI makes it possible to find the intermediate URLs that link information between fields.

(iv) Suppose that $g_{jk}$ is the number of shortest routes between nodes $j$ and $k$ in the network and $g_{jk}(n_i)$ is the number of those shortest routes that pass through node $i$. The probability that a shortest route between $j$ and $k$ includes node $i$ is $g_{jk}(n_i)/g_{jk}$, and

$$\mathrm{BCI} = C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}. \tag{3}$$

If the main target node is constructed as a child node of depth 1, its degree increases; however, by (3), its BCI decreases.
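For small graphs, Eq. (3) can be evaluated directly by counting shortest routes with breadth-first search. The brute-force sketch below assumes an unweighted, undirected adjacency list (a simplification, since the crawled links may carry weights) and runs in roughly cubic time, so it only makes the definition concrete; Brandes' algorithm is the usual choice for large graphs. Node names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Tuple

# Assumed representation: unweighted, undirected adjacency lists {node: [neighbors]}.
Adjacency = Dict[str, List[str]]


def shortest_route_counts(adj: Adjacency, source: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """BFS from source: hop distance and number of shortest routes to each reachable node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest route to v
                count[v] += count[u]
    return dist, count


def betweenness_centrality_index(adj: Adjacency) -> Dict[str, float]:
    """Eq. (3): C_B(n_i) = sum over unordered pairs {j, k} of g_jk(n_i) / g_jk."""
    routes = {node: shortest_route_counts(adj, node) for node in adj}
    bci = {node: 0.0 for node in adj}
    for j, k in combinations(adj, 2):
        dist_j, count_j = routes[j]
        if k not in dist_j:
            continue  # j and k are disconnected, so the pair contributes nothing
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, count_i = routes[i]
            # i lies on a shortest j-k route iff d(j, i) + d(i, k) == d(j, k)
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                # g_jk(n_i) = (# shortest j-i routes) * (# shortest i-k routes)
                bci[i] += count_j[i] * count_i[k] / count_j[k]
    return bci


if __name__ == "__main__":
    star = {"dist": ["ex1", "ex2", "ex3"], "ex1": ["dist"], "ex2": ["dist"], "ex3": ["dist"]}
    print(betweenness_centrality_index(star))  # dist -> 3.0, each exploit site -> 0.0
```

In the star graph, every route between two exploit sites passes through the distribution site. This bridging role is exactly what the BCI is meant to expose.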

(3) Step 3: 1st Order Risk Analysis. The 1st order risk is estimated as the Euclidean length of the vector formed by the three node analysis results from Step 2:

$$r_1 = \sqrt{\mathrm{DCI}^2 + \mathrm{ECI}^2 + \mathrm{BCI}^2}. \tag{4}$$
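Eq. (4) reduces to a single line of code. The text does not say whether DCI, ECI, and BCI are rescaled to a common range before they are combined, and an unnormalized BCI can dominate the sum. This sketch therefore takes the three values exactly as given.

```python
import math


def first_order_risk(dci: float, eci: float, bci: float) -> float:
    """Eq. (4): Euclidean length of the (DCI, ECI, BCI) vector."""
    return math.sqrt(dci ** 2 + eci ** 2 + bci ** 2)


if __name__ == "__main__":
    print(first_order_risk(3.0, 4.0, 12.0))  # -> 13.0
```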


(4) Step 4: 2nd Order Risk Analysis.

(A) Distribution Site Risk Analysis. The risk index is estimated by applying weights (overlapped infection history and survival ratio) to the result of the 1st order risk analysis. The distribution site risk is calculated from the value obtained in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual survival ratio ($S$).
∗ The survival ratio ($S$) reflects whether treatment has been given after infection (based on one year's information):

$$
\begin{aligned}
\text{Treatment Probability } (S_1) &= \frac{\text{Survival Cases}}{\text{Survival Cases} + \text{Treated Cases}},\\
\text{Failure Probability } (S_2) &= \frac{\text{Treated Cases}}{\text{Survival Cases} + \text{Treated Cases}},\\
r_2 &= r_1 \times I \times S_1 \quad \text{(if the node has been treated)},\\
r_2 &= r_1 \times I \times S_2 \quad \text{(if the node has not been treated)}.
\end{aligned}
\tag{5}
$$
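Eq. (5) can be sketched as two small functions. The case counts would come from one year of infection and treatment records, as described above. The function and parameter names are hypothetical, and the numbers in the usage line are invented for illustration.

```python
from typing import Tuple


def survival_ratios(survival_cases: int, treated_cases: int) -> Tuple[float, float]:
    """Return (S1, S2) as defined in Eq. (5)."""
    total = survival_cases + treated_cases
    if total == 0:
        raise ValueError("S1 and S2 are undefined when no cases were observed")
    s1 = survival_cases / total  # "Treatment Probability" S1
    s2 = treated_cases / total   # "Failure Probability" S2
    return s1, s2


def distribution_site_risk(
    r1: float, infection_history: float, survival_cases: int, treated_cases: int, treated: bool
) -> float:
    """Eq. (5): r2 = r1 * I * S1 if the node was treated, else r1 * I * S2."""
    s1, s2 = survival_ratios(survival_cases, treated_cases)
    return r1 * infection_history * (s1 if treated else s2)


if __name__ == "__main__":
    # Hypothetical node: r1 = 0.5, I = 2, 3 surviving cases and 1 treated case.
    print(distribution_site_risk(0.5, 2.0, 3, 1, treated=True))   # -> 0.75
    print(distribution_site_risk(0.5, 2.0, 3, 1, treated=False))  # -> 0.25
```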

(B) Exploit/Landing Site Risk Analysis. The risk index is estimated by applying weights (overlapped infection history and exposure frequency) to the result of the 1st order risk analysis. The exploit site risk is calculated from the value obtained in Step 3, the overlapped infection history ($I$) of each distribution site node, and the actual exposure frequency on a search website ($E$):

$$r_3 = r_1 \times \frac{2 \times I \times E}{I + E} \quad \text{or} \quad r_3 = r_1 \times I. \tag{6}$$
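The factor $2IE/(I+E)$ in Eq. (6) is the harmonic mean of $I$ and $E$. The text gives two alternative forms but does not say when each applies. The sketch below makes an assumption: it uses the harmonic-mean form when an exposure frequency is available and falls back to $r_1 \times I$ otherwise. Names and inputs are hypothetical.

```python
from typing import Optional


def exploit_site_risk(
    r1: float, infection_history: float, exposure_frequency: Optional[float] = None
) -> float:
    """Eq. (6). Assumed rule: harmonic-mean form if E is known, else r1 * I."""
    if exposure_frequency is None:
        return r1 * infection_history
    denominator = infection_history + exposure_frequency
    if denominator == 0:
        return 0.0  # both weights are zero, so the node gets no 2nd order risk
    harmonic_mean = 2 * infection_history * exposure_frequency / denominator
    return r1 * harmonic_mean


if __name__ == "__main__":
    print(exploit_site_risk(0.5, 2.0, 6.0))  # -> 1.5
    print(exploit_site_risk(0.5, 2.0))       # -> 1.0
```

A harmonic mean is pulled toward the smaller of its two inputs. As a result, a site with a long infection history but almost no search exposure (or the reverse) is not weighted as heavily as one that scores high on both.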

(5) Step 5: Malicious URL Risk Index (MRI). The MRI is estimated from the 1st order risk analysis result and the risk index. The following formula combines the 1st order risk calculated in Step 3 with the risk index of each distribution/exploit site calculated in Step 4, taking into account the characteristics of the corresponding node:

$$r_{\text{final}} = \sqrt{r_1^2 + r_2^2 + r_3^2}. \tag{7}$$
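Eq. (7) and the resulting prioritization can be sketched as follows. The sketch assumes that each node receives only one Step 4 score, so a distribution site has $r_3 = 0$ and an exploit/landing site has $r_2 = 0$. The site names and risk values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def malicious_url_risk_index(r1: float, r2: float = 0.0, r3: float = 0.0) -> float:
    """Eq. (7): r_final = sqrt(r1^2 + r2^2 + r3^2)."""
    return math.sqrt(r1 ** 2 + r2 ** 2 + r3 ** 2)


def rank_by_mri(risks: Dict[str, Tuple[float, float, float]]) -> List[Tuple[str, float]]:
    """Sort sites by MRI, highest first, so core-hub candidates appear at the top."""
    scored = [(site, malicious_url_risk_index(*parts)) for site, parts in risks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical (r1, r2, r3) triples: one exploit site and one distribution site.
    sites = {"ex1": (1.0, 0.0, 2.0), "dist": (2.0, 2.0, 0.0)}
    for site, mri in rank_by_mri(sites):
        print(f"{site}: {mri:.4f}")
```

Because the MRI is used as a relative index with a minimum of 0, the sorted order itself carries the operational meaning rather than the absolute values.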

4. Experimental Results

We conducted experiments to examine the performance of our zero-day detection method based on the MCC.

For these experiments, we processed the detection log acquired by crawling biomedical information system-related malware sites with the developed MCC, in the log format stated in Step 1.

The estimated risk values in our proposed model are intuitive, because our final interpretation is based directly on the crawling result. Additionally, because the crawling method relies on a blacklist or known patterns, our proposed model exhibits a low false positive rate.

The MCC detection method proceeds as follows. The attacker (hacker) inserts malicious code into a specific webpage, either by operating a malicious code distribution server on the Internet or by hacking a vulnerable web server. The clients (or users) of the web server involuntarily visit the exploit/landing/distribution site containing the malicious code and download it. Eventually, the attacker collects the client accounts and various other information from the infected server and proceeds to act maliciously.

The proposed system searches/crawls 25 million sites on a continuous basis, detects/blocks the inserted malicious code, and establishes/operates a malicious code blacklist.

4.1. Analysis Results. This is a post hoc study based on the results of the MCC operation over a specific period. Our results support decision making for proactive responses and follow-up measures, enabling biomedical information system security experts or administrators to maximize their operational efficiency.

Figure 4 shows the MRI estimated through the 1st and 2nd order risk analyses after the detection of malicious URLs.

Table 1 lists the detected malicious code exploit/landing/distribution URLs (including both exploit/landing sites and distribution sites). The risk index is a relative value: if it were limited to the range 0-1, the minimum risk would be fixed at 0, but it is hard to set a clear standard for the maximum risk.

In this paper, we therefore use a relative risk index that fixes the minimum risk at 0 and identifies the high-risk core malware sites through prioritization.

4.2. Sensitivity Analysis. The detection rate of actual zero-day attacks can be measured using a sensitivity analysis based on the results given in Table 1. Among the malware sites related to zero-day attacks on biomedical information systems, we analyze distribution sites and exploit/landing sites. Table 2 shows the detection rates measured on actual data produced in a specific time window.

The results in Table 2 focus on the top five high-risk sites. For the multipath malware site group, the number in parentheses is the number of exploit/landing sites actually connected with a distribution site. Each percentage is the average detection rate in a specific time window, and this detection performance is better than in the preanalysis stage.

The average early detection rate of distribution sites and exploit/landing sites is also higher after the analysis than in the preanalysis stage. That is, the proactive elimination of core malicious websites results in an average improvement in zero-day attack detection of more than 20%.
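The "Average early detection rate" row of Table 2 is the arithmetic mean of the five priority rows in each column. The short check below recomputes it from the per-priority rates, transcribed from Table 2 in percent. The parenthesized multipath group sizes are omitted, and the dictionary keys are descriptive labels chosen for this sketch.

```python
from statistics import mean

# Per-priority detection rates (%) transcribed from Table 2, priorities 1 to 5.
TABLE2 = {
    "multipath group": [23.3, 22.6, 14.7, 18.4, 21.2],
    "distribution site, single path": [21.5, 31.6, 22.8, 19.7, 24.2],
    "landing/exploit site, single path": [28.2, 32.4, 18.1, 32.3, 17.6],
}

for column, rates in TABLE2.items():
    print(f"{column}: {mean(rates):.2f}%")
# multipath group: 20.04%
# distribution site, single path: 23.96%
# landing/exploit site, single path: 25.72%
```

This only confirms the internal arithmetic of the table; it does not reproduce the underlying measurements.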

4.3. Visualization of Analysis Results. The risk index calculated for each URL in this paper can be validated by checking whether it agrees with the weak points of the corresponding server.

This section analyzes the actual weak points based on the calculated risk index and uses an error analysis to verify whether the index agrees with the actual prioritization.


Figure 4: Visualization of malware site risk. (Labeled nodes: 1st highest-risk landing (or exploit) site; 1st highest-risk distribution site.)

Table 1: MRI estimation result of exploit/landing/distribution sites.

Node type | URL | MRI | Reliability
Distribution site | http://222.∗∗∗.∗∗∗.∗∗∗/ch.html | 0.3965 | 91%
Distribution site | http://www.∗∗∗∗∗∗∗∗∗.com/New/index.html | 0.3505 | 92%
Distribution site | http://a1∗∗∗∗∗∗∗∗∗.com/1/index.html | 0.3058 | 90%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.3047 | 95%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.3026 | 94%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.3017 | 94%
Distribution site | http://a2∗∗∗∗∗∗∗∗∗.com/2/index.html | 0.3009 | 93%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.re.kr | 0.3003 | 92%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2993 | 92%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2991 | 90%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2983 | 91%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2982 | 90%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.org | 0.2970 | 94%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2969 | 96%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2968 | 95%
Exploit site | http://∗∗∗∗∗∗∗∗∗.co.kr | 0.2967 | 95%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2966 | 96%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2966 | 94%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2962 | 93%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2961 | 93%
Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2960 | 94%

("∗∗∗∗∗∗∗∗∗": masked URL information of the malware site.)


Table 2: Average detection rate of zero-day attacks for a given day.

Priority of risk | Malware site group with multipath | Distribution site with single path | Landing (or exploit) site with single path
1 | 23.3% (15) | 21.5% | 28.2%
2 | 22.6% (9) | 31.6% | 32.4%
3 | 14.7% (8) | 22.8% | 18.1%
4 | 18.4% (10) | 19.7% | 32.3%
5 | 21.2% (12) | 24.2% | 17.6%
Average early detection rate | 20.04% | 23.96% | 25.72%

Figure 4 visualizes the 1st highest-risk distribution site according to the MRI.

By visualizing the 1st highest-risk exploit/landing/distribution sites as shown in Figure 4, high-risk malicious code exploit/landing/distribution sites related to biomedical information systems can be detected and eliminated. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information.

We verify the performance of the proposed model based on static analysis. However, military, government, and similar organizations must dynamically filter out core malware sites using high-performance hardware platforms. For this reason, our method is a good example of a suitable defensive measure against APT attacks.

5. Related Work

Methods for detecting and analyzing websites that contain malicious code can generally be divided into static and dynamic analysis.

5.1. Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.

Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.

Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in the HTML and URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google's phishing blacklist; they used several features obtained from domain information and page contents.

JSAND [20] used a machine learning approach to classify malicious JavaScript.

5.2. Dynamic Analysis. Dynamic analysis examines the server-client connection to detect and classify malicious URLs.

In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitation (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites.

Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, following the full redirection path and monitoring each script execution in real time is usually costly. Moreover, their accuracy depends heavily on how the malicious webpage responds to the vulnerable components present on the client.

Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and iframes pointing to exploit sites.

The main differences between the model proposed in this paper and previous approaches are as follows.

(i) The proposed model applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.

(ii) The proposed model detects and blocks the core-hub node using link data from high-risk malicious websites observed over a specific period of time.

(iii) The proposed model prevents the dissemination of malicious websites at an early stage by blocking the link between the core malicious code distribution site and the exploit/landing site.

6. Conclusion

In this paper, the 1st order risk of malware infection was analyzed using log information collected by an MCC, considering the DCI, BCI, and ECI of the main nodes based on the priority of risk. This provides a quantitative value for the potential risk inherent in the corresponding site (node).

In addition, the risk index of exploit sites and distribution sites was calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the 1st order risk analysis and the risk index of the distribution sites and exploit sites.

In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced by malicious code detection and vulnerability scanning tools.

Because this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that realizes proactive responses beyond the limits of reactive responses that rely only on traditional malicious code detection tools.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. A. Hansen and N. M. Hansen, "A taxonomy of vulnerabilities in implantable medical devices," in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS '10), pp. 13–20, October 2010.

[2] C.-S. Park, "Security mechanism based on hospital authentication server for secure application of implantable medical devices," BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.

[3] E. Hutchins, M. Cloppert, and R. Amin, "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW '11), pp. 113–125, Academic Conferences, March 2011.

[4] N. Moran, "Understanding Advanced Persistent Threats - A Case Study," 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.

[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, "Secure model against APT in m-connected SCADA network," International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.

[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, "All your iframes points to us," in Proceedings of the USENIX Security, 2008.

[7] S. Lee and J. Kim, "WARNINGBIRD: detecting suspicious URLs in twitter stream," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '12), 2012.

[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, "Collecting internet malware based on client-side honeypot," in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS '08), pp. 1493–1498, Hunan, China, November 2008.

[9] Y.-C. Cho and J.-Y. Pan, "Multiple-feature extracting modules based leak mining system design," The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.

[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, "A method for risk measurement of botnet's malicious activities," Information Journal, vol. 17, no. 1, pp. 165–180, 2014.

[11] C. Ni, C. Sugimoto, and J. Jiang, "Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically," in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.

[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, "Clustering networks' heterogeneous data in defining a comprehensive closeness centrality index," Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.

[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, "Betweenness centrality in some classes of graphs," International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.

[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: learning to detect malicious web sites from suspicious URLs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1245–1253, July 2009.

[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Identifying suspicious URLs: an application of large-scale online learning," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 681–688, 2009.

[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, "Zozzle: low-overhead mostly static javascript malware detection," in Proceedings of the USENIX Security Symposium, 2011.

[17] M. Cova, C. Kruegel, and G. Vigna, "Detection and analysis of drive-by-download attacks and malicious JavaScript code," in Proceedings of the 19th International World Wide Web Conference (WWW '10), pp. 281–290, April 2010.

[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: a fast filter for the large-scale detection of malicious web pages," in Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 197–206, 2011.

[19] C. Whittaker, B. Ryner, and M. Nazif, "Large-scale automatic classification of phishing pages," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '10), 2010.

[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, "JSand: complete client-side sandboxing of third-party JavaScript without browser modifications," in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC '12), pp. 1–10, ACM, December 2012.

[21] C. Seifert, I. Welch, and P. Komisarczuk, "Honeyc: the low-interaction client honeypot," in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS '07), Hamilton, New Zealand, 2007.

[22] N. Jose, "PhoneyC: a virtual client honeypot," in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.

[23] Y.-M. Wang, D. Beck, X. Jiang et al., "Automated web patrol with strider honeymonkeys," in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.

[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.

[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, "Blade: an attack-agnostic approach for preventing drive-by malware infections," in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS '10), pp. 440–450, ACM, October 2010.

[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, "All your iFRAMEs point to us," in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 5: Research Article Method for Detecting Core Malware Sites ...downloads.hindawi.com/journals/cmmm/2015/756842.pdf · # of nodes 1, Time complexity: ( ). B Eigenvector Centrality Index

Computational and Mathematical Methods in Medicine 5

(4) Step 4 2nd Order Risk Analysis

A Distribution Site Risk Analysis The risk indexis estimated by considering the weights (over-lapped infection history and survival ratio)based on the 1st order risk analysis The dis-tribution site risk is calculated by the vector ofthe values calculated in Step 3 the overlappedinfection history (119868) of each distribution sitenode and the actual survival ratio (119878)lowast Survival Ratio (119878) is as follows whether treat-ment has been given after infection (based onone yearrsquos information)

Treatment Probability (1198781)

=Survival Cases

Survival Cases + Treatesd Cases

Failure Probability (1198782)

=Treated Cases

Survival Cases + Treatesd Cases

1199032 = 1199031 times 119868 times 1198781 (If the node has been treated)

1199032 = 1199031 times 119868 times 1198782 (If the node has not been treated)

(5)

B ExploitLanding Site Risk Analysis The riskindex is estimated by considering the weights(overlapped infection history and exposure fre-quency) from the 1st order risk analysis Theexploit site risk is calculated by the vectorof values calculated in Step 3 the overlappedinfection history (119868) of each distribution sitenode and the actual exposure frequency in asearch website (119864)

1199033 = 1199031 times (2 times 119868 times 119864

119868 + 119864) or 1199033 = 1199031 times 119868 (6)

(5) Step 5 Malicious URL Risk Index (MRI) The MRI isestimated from the 1st order risk analysis result andthe risk indexThe following formula can be deducedfrom the 1st order risk analysis result calculated inStep 3 and the risk index of each distributionexploitsite calculated in Step 4 considering the characteris-tics of the corresponding node

119903final = radic11990321 + 11990322 + 11990323

(7)

4 Experimental Results

We conducted experiments to examine the performance ofour zero-day detection method based on MCC

For these experiments we processed the detection logacquired by crawling biomedical information system-relatedmalware sites with the developedMCC in the log form statedin Step 1

The estimated risk values are intuitive in our proposedmodel That is our final interpretation is based on the

crawling result Additionally the crawling method uses ablacklist or known patterns Thus our proposed modelexhibits a low false positive rate

The MCC detection method proceeds as follows Theattacker (hacker) inserts malicious code into a specific web-page by operating a malicious code distribution server on theinternet or by hacking a vulnerable web serverThe clients (orusers) of the web server involuntarily use the exploitlandingdistribution site containing themalicious code and downloadthe malicious code Eventually the attacker collects the clientaccounts and various other information from the infectedserver and proceeds to act maliciously

The proposed system searchescrawls 25 million siteson a continuous basis detectsblocks the inserted maliciouscode and establishesoperates a malicious code blacklist

41 Analysis Results As a post hoc study based on the resultsof the MCC operation for a specific period our results sup-port decision making for proactive responses and follow-upmeasures enabling biomedical information system securityexperts or administrators to maximize their operationalefficiency

Figure 4 shows the MRI estimated through the 1st and2nd order risk analysis after the detection of malicious URLs

Table 1 lists the detected malicious code exploitlandingdistribution URLs (including both exploitlanding sites anddistribution sites) The risk index is a relative value If limitedto the range 0-1 the minimum risk would be fixed at 0 but itis hard to set a clear standard for the maximum risk

In this paper we use a relative risk index that fixes theminimum risk to 0 and indicates the high-risk core malwaresites through prioritization

42 Sensitivity Analysis Thedetection rate of actual zero-dayattacks can be measured using a sensitivity analysis based onthe results given in Table 1 Among the malware sites relatedto zero-day attacks occurring to biomedical informationsystems we analyze distribution sites and exploitlandingsites Table 2 shows the detection rate measurements basedon actual data produced in a specific time window

The results in Table 2 focus on the top five high-risk sitesThe multipath malware site group denotes the number ofexploitlanding sites actually connected with a distributionsite The percentage represents the average detection rate ina specific time window and this detection performance isbetter than in the pre-analysis stage

The average early detection rate of distribution sites andexploitlanding sites is also higher in this section than in thepreanalysis stage That is the proactive elimination of coremaliciouswebsites results in an average improvement in zero-day attack detection of more than 20

43 Visualization of Analysis Results The risk index of eachURL calculated in this paper can be analyzed by verifyingwhether the risk index agrees with the weak point of thecorresponding server

This section analyzes the actual weak point based onthe calculated risk index and verifies whether this indexagrees with the actual prioritization using an error analysis

6 Computational and Mathematical Methods in Medicine

1st highest-risk landing (or exploit) site1st highest-risk distribution site

Figure 4 Visualization of malware site risk

Table 1 MRI estimation result of exploitlandingdistribution sites

| Node type | URL | MRI | Reliability |
|---|---|---|---|
| Distribution site | http://222.∗∗∗.∗∗∗.∗∗∗/ch.html | 0.3965 | 91 |
| Distribution site | http://www.∗∗∗∗∗∗∗∗∗.com/New/index.html | 0.3505 | 92 |
| Distribution site | http://a1.∗∗∗∗∗∗∗∗∗.com/1/index.html | 0.3058 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.3047 | 95 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.3026 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.3017 | 94 |
| Distribution site | http://a2.∗∗∗∗∗∗∗∗∗.com/2/index.html | 0.3009 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.re.kr | 0.3003 | 92 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2993 | 92 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2991 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2983 | 91 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2982 | 90 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.org | 0.2970 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.or.kr | 0.2969 | 96 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2968 | 95 |
| Exploit site | http://∗∗∗∗∗∗∗∗∗.co.kr | 0.2967 | 95 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2966 | 96 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2966 | 94 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.kr | 0.2962 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.com | 0.2961 | 93 |
| Exploit site | http://www.∗∗∗∗∗∗∗∗∗.co.kr | 0.2960 | 94 |

(“∗∗∗∗∗∗∗∗∗”: masked URL information of the malware site.)

Computational and Mathematical Methods in Medicine 7

Table 2: Average detection rate of zero-day attacks for a given day.

| Priority of risk | Malware site group with multipath | Distribution site with single path | Landing (or exploit) site with single path |
|---|---|---|---|
| 1 | 23.3 (15) | 21.5 | 28.2 |
| 2 | 22.6 (9) | 31.6 | 32.4 |
| 3 | 14.7 (8) | 22.8 | 18.1 |
| 4 | 18.4 (10) | 19.7 | 32.3 |
| 5 | 21.2 (12) | 24.2 | 17.6 |
| Average early detection rate | 20.04 | 23.96 | 25.72 |
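The last row of Table 2 is the arithmetic mean of the five priority rows. The short Python check below reproduces those averages; it treats the parenthesized figures in the multipath column as annotations and leaves them out, since their meaning is not restated in this section.

```python
# Per-priority detection rates from Table 2 (priorities 1-5).
# The parenthesized figures in the multipath column are left out.
TABLE2 = {
    "multipath_group": [23.3, 22.6, 14.7, 18.4, 21.2],
    "distribution_single_path": [21.5, 31.6, 22.8, 19.7, 24.2],
    "landing_single_path": [28.2, 32.4, 18.1, 32.3, 17.6],
}


def column_averages(table: dict) -> dict:
    """Mean of each column, rounded to two decimals as in the table."""
    return {name: round(sum(vals) / len(vals), 2) for name, vals in table.items()}


if __name__ == "__main__":
    print(column_averages(TABLE2))
```

Running it prints {'multipath_group': 20.04, 'distribution_single_path': 23.96, 'landing_single_path': 25.72}, which matches the "Average early detection rate" row.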

technique. Figure 4 visualizes the highest-risk distribution site according to the MRI.

High-risk malicious code exploit/landing/distribution sites related to biomedical information systems can be detected and eliminated by visualizing the highest-risk exploit/landing/distribution sites, as shown in Figure 4. Thus, our proposed model focuses on estimating the risk presented by target malware sites in the specific field of biomedical information systems.

We verified the performance of the proposed model based on static analysis. However, military, government, and similar organizations must filter out core malware sites dynamically on high-performance hardware platforms. In this respect, our method is an example of a defensive measure suited to APT attacks.

5. Related Work

Methods for detecting and analyzing websites that contain malicious code can generally be divided into static and dynamic analysis.

5.1. Static Analysis. Static analysis mainly uses machine learning and pattern matching to detect and classify malicious URLs.

Ma et al. [14, 15] presented a classification model that detects spam and phishing URLs. This model uses a statistical method to classify URLs by considering the lexical and host-based properties of malicious URLs. Although this method detects both spam and phishing URLs, it cannot distinguish between the two.
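Ma et al. [14, 15] feed lexical and host-based URL properties to a statistical classifier. The Python sketch below shows only the lexical feature-extraction side; the specific features (length counts, dot counts, IP-literal hosts, delimiter tokens) are illustrative assumptions rather than the published feature set, and host-based properties such as WHOIS or DNS data are omitted.

```python
import re
from urllib.parse import urlparse

# Dotted-quad host such as 222.1.2.3 (no range check; illustrative only).
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def lexical_features(url: str) -> dict:
    """Hypothetical lexical URL features; not the feature set of [14, 15]."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse lowercases the hostname
    path = parsed.path or ""
    is_ip = bool(IPV4.fullmatch(host))
    # Bag-of-words style tokens split on common URL delimiters.
    tokens = [t for t in re.split(r"[./?=&_-]", host + path) if t]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "host_dots": host.count("."),
        "path_depth": len([seg for seg in path.split("/") if seg]),
        "is_ip_host": is_ip,
        "has_at_symbol": "@" in url,
        "tld": "" if is_ip or "." not in host else host.rsplit(".", 1)[-1],
        "tokens": tokens,
    }
```

In practice such a dictionary would be vectorized (for example, one-hot tokens plus numeric fields) and passed to a classifier such as logistic regression; that step is not shown.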

Another approach is to analyze the JavaScript code in web pages to find the typical features of malicious code. This is done either statically [16] or dynamically by loading the affected pages in an emulated browser [17]. Systems such as Prophiler [18] consider both JavaScript and other features found in the HTML and URLs of malicious pages. Whittaker et al. [19] proposed a phishing website classifier to automatically update Google’s phishing blacklist. They used several features obtained from domain information and page contents.

JSAND [20] used a machine learning approach to classify malicious JavaScript.
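Static JavaScript analysis of the kind described for [16] and [18] extracts features from script text without executing it. The Python sketch below computes a few simple signals (counts of eval/unescape-style calls, string-literal length and Shannon entropy, percent-escape density). The call list and the feature choice are assumptions for illustration, not the feature sets of Zozzle or Prophiler.

```python
import math
import re
from collections import Counter

# Illustrative calls often seen in obfuscated droppers (assumption).
SUSPICIOUS_CALLS = ("eval(", "unescape(", "String.fromCharCode(", "document.write(")
# Double- or single-quoted string literal, allowing backslash escapes.
STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\'')


def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def js_static_features(script: str) -> dict:
    """Hypothetical static features of a JavaScript snippet."""
    literals = [m.group(0)[1:-1] for m in STRING_LITERAL.finditer(script)]
    return {
        # Plain substring counts; "xeval(" would also match "eval(".
        "call_counts": {call: script.count(call) for call in SUSPICIOUS_CALLS},
        "num_string_literals": len(literals),
        "longest_literal": max((len(s) for s in literals), default=0),
        "max_literal_entropy": max((shannon_entropy(s) for s in literals), default=0.0),
        "percent_escapes": len(re.findall(r"%[0-9a-fA-F]{2}", script)),
    }
```

Long, high-entropy literals combined with eval/unescape calls are a common obfuscation pattern; a real detector would learn how to weight such signals rather than use them directly.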

5.2. Dynamic Analysis. Dynamic analysis analyzes the server–client connection to detect and classify malicious URLs.

In other words, dynamic analysis relies on visiting websites with an instrumented browser (often referred to as a honeyclient) and monitoring the activities of the machine to find the typical signatures of successful exploitation (e.g., the creation of a new process) [21]. PhoneyC [22] uses a signature-based, low-interaction honeypot to detect malicious websites.
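The honeyclient signature mentioned above (an unexpected new process after a page visit) reduces to a set difference between two process snapshots. In this Python sketch the snapshots are plain lists of process names supplied by the caller; how they are collected from the instrumented machine, and what the allow-list contains, are assumptions left outside the sketch.

```python
from typing import Iterable, Set


def new_processes(before: Iterable[str], after: Iterable[str],
                  allowlist: Iterable[str] = ()) -> Set[str]:
    """Processes present after the visit but not before, minus expected ones."""
    return set(after) - set(before) - set(allowlist)


def visit_verdict(before: Iterable[str], after: Iterable[str],
                  allowlist: Iterable[str] = ()) -> str:
    """Label the visited URL 'malicious' if an unexpected process appeared."""
    return "malicious" if new_processes(before, after, allowlist) else "benign"
```

For example, with the hypothetical process names below, visit_verdict(["explorer.exe"], ["explorer.exe", "dropper.exe"]) returns "malicious". Real honeyclients monitor far more than process creation (file, registry, and network activity), so this captures only the single signature named in the text.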

Systems such as [23, 24] execute web content dynamically and capture drive-by downloads based on either signatures or anomaly detection, while Blade [25] leverages user behavior models for drive-by download detection. All of these systems exhibit good detection results. However, it is usually costly to follow the full redirection path and monitor each script execution in real time. Moreover, their accuracy depends heavily on whether the webpage responds maliciously to the vulnerable components present in the monitored client.

Provos et al. [26] analyzed the maliciousness of a large collection of web pages using a machine learning algorithm as a prefilter for VM-based analysis. They adopted content-based features, including the presence of obfuscated JavaScript and iframes pointing to exploit sites.
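A simple content feature in the spirit of the iframe features used by Provos et al. [26] is an iframe rendered invisible, either zero or one pixel in size or hidden by CSS. The Python sketch below uses the standard-library html.parser module to collect the src of such iframes; the size and style rules are illustrative assumptions, not the features used in [26].

```python
from html.parser import HTMLParser


class HiddenIframeFinder(HTMLParser):
    """Collects src URLs of iframes sized 0/1 px or styled as hidden."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden_iframes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "iframe":
            return
        a = {k: (v or "") for k, v in attrs}
        style = a.get("style", "").replace(" ", "").lower()
        tiny = a.get("width", "").strip() in ("0", "1") or a.get("height", "").strip() in ("0", "1")
        hidden = "display:none" in style or "visibility:hidden" in style
        if tiny or hidden:
            self.hidden_iframes.append(a.get("src", ""))


def find_hidden_iframes(html: str) -> list:
    """Return the src of every invisible iframe, in document order."""
    finder = HiddenIframeFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_iframes
```

Such a flag would be one input to a prefilter; pages that trip it can then be forwarded to the more expensive VM-based analysis.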

The main differences between the model proposed in this paper and previous approaches are as follows:

(i) The model proposed in this paper applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.

(ii) The proposed model detects and blocks the core-hub node using link data from high-risk malicious websites observed over a specific period of time.

(iii) The proposed model prevents the dissemination of malicious websites at an early stage by blocking the link between the core malicious code distribution site and the exploit/landing sites.
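Point (iii) can be illustrated on a toy link graph. The Python sketch below assumes directed edges from landing (or exploit) sites toward the sites they redirect to, and counts the landing sites from which some unblocked distribution site is still reachable. Blocking a single distribution hub then cuts every path that passes through it. The graph and all site names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]


def reachable(graph: Graph, start: str, blocked: Set[str]) -> Set[str]:
    """Breadth-first search from start, never entering blocked nodes."""
    if start in blocked:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def still_infectious(graph: Graph, landing: Iterable[str],
                     distribution: Iterable[str],
                     blocked: Iterable[str] = ()) -> Set[str]:
    """Landing sites that can still reach at least one unblocked distribution site."""
    blocked = set(blocked)
    live = set(distribution) - blocked
    return {site for site in landing if reachable(graph, site, blocked) & live}


if __name__ == "__main__":
    # Hypothetical graph: D1 is a hub behind three landing sites.
    toy = {"L1": ["D1"], "L2": ["D1"], "L3": ["X1"], "X1": ["D1"], "L4": ["D2"]}
    landing, dist = ["L1", "L2", "L3", "L4"], ["D1", "D2"]
    print(sorted(still_infectious(toy, landing, dist)))          # ['L1', 'L2', 'L3', 'L4']
    print(sorted(still_infectious(toy, landing, dist, ["D1"])))  # ['L4']
```

Blocking the hub D1 neutralizes three of the four landing sites at once, whereas blocking each landing site individually would need three separate rules.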

6. Conclusion

In this paper, the 1st-order risk of malware infection was analyzed using log information estimated by an MCC that considers the degree (DCI), betweenness (BCI), and eigenvector (ECI) centrality indices of the main nodes, based on the priority of risk. This provides a quantitative value of the potential risk inherent in the corresponding site (node).
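The paper's own weighting and normalization of the DCI, BCI, and ECI are defined earlier in the paper and are not repeated in this section. As a self-contained illustration only, the Python sketch below computes degree centrality, betweenness centrality (Brandes' algorithm), and eigenvector centrality on an undirected site graph, scales each index by its maximum, and combines them with equal weights. The equal weights, the max-scaling, and the undirected treatment are all assumptions, and the graph needs at least three nodes.

```python
import math
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Adj = Dict[str, Set[str]]


def undirected(edges: Iterable[Tuple[str, str]]) -> Adj:
    """Build a symmetric adjacency map from an edge list."""
    adj: Adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj: Adj) -> Dict[str, float]:
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj: Adj) -> Dict[str, float]:
    """Brandes' algorithm for unweighted, undirected graphs, normalized to [0, 1]."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted twice, so divide by (n-1)(n-2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: x * scale for v, x in bc.items()}


def eigenvector_centrality(adj: Adj, iters: int = 1000, tol: float = 1e-10) -> Dict[str, float]:
    """Power iteration on (A + I); the shift avoids oscillation on bipartite graphs."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        if max(abs(y[v] - x[v]) for v in adj) < tol:
            return y
        x = y
    return x


def combined_centrality(adj: Adj, weights=(1 / 3, 1 / 3, 1 / 3)) -> Dict[str, float]:
    """Weighted sum of max-scaled degree, betweenness, and eigenvector indices."""
    combined = dict.fromkeys(adj, 0.0)
    indices = (degree_centrality(adj), betweenness_centrality(adj), eigenvector_centrality(adj))
    for w, index in zip(weights, indices):
        top = max(index.values()) or 1.0
        for v, val in index.items():
            combined[v] += w * val / top
    return combined


if __name__ == "__main__":
    # Hypothetical site graph: H links to four sites, and D also links to E.
    g = undirected([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("D", "E")])
    scores = combined_centrality(g)
    print(max(scores, key=scores.get))  # H
```

The hub H tops all three indices here, so it receives the maximum combined score; on real crawl data the three indices can disagree, which is why the relative weights matter.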

In addition, the risk indices of exploit sites and distribution sites were calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were considered when estimating the risk of exploit sites. Finally, the MRI was estimated using the 1st-order risk analysis and the risk indices of the distribution and exploit sites.
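The final MRI step combines a node's first-order (centrality-based) risk with a type-specific site risk: overlapped infection history plus survival ratio for distribution sites, and overlapped infection history plus exposure frequency for exploit sites. The Python sketch below shows only that data flow. The linear weights (w_overlap, alpha), the function name mri_sketch, and the assumption that every input is pre-normalized to [0, 1] are hypothetical and are not the paper's formulas.

```python
from dataclasses import dataclass


@dataclass
class SiteObservation:
    node_type: str                   # "distribution" or "exploit"
    first_order_risk: float          # e.g., a combined centrality score in [0, 1]
    overlap_history: float           # overlapped infection history, scaled to [0, 1]
    survival_ratio: float = 0.0      # used only for distribution sites
    exposure_frequency: float = 0.0  # used only for exploit sites


def site_risk(obs: SiteObservation, w_overlap: float = 0.5) -> float:
    """Hypothetical weighted risk index of a distribution or exploit site."""
    if obs.node_type == "distribution":
        second = obs.survival_ratio
    elif obs.node_type == "exploit":
        second = obs.exposure_frequency
    else:
        raise ValueError(f"unknown node type: {obs.node_type}")
    return w_overlap * obs.overlap_history + (1.0 - w_overlap) * second


def mri_sketch(obs: SiteObservation, alpha: float = 0.5) -> float:
    """Hypothetical MRI: convex mix of first-order risk and site risk."""
    return alpha * obs.first_order_risk + (1.0 - alpha) * site_risk(obs)
```

Keeping both steps as convex combinations of [0, 1] inputs keeps the output in [0, 1], which makes scores comparable across node types; the paper's actual weighting may differ.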

In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced by malicious code detection and vulnerability scanning tools.

Because this feature model will be used to predict the risk of a specific website, it should contribute to establishing an active malicious code distribution blocking system that enables proactive responses beyond the limits of reactive responses that rely only on traditional malicious code detection tools.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. A. Hansen and N. M. Hansen, “A taxonomy of vulnerabilities in implantable medical devices,” in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS ’10), pp. 13–20, October 2010.

[2] C.-S. Park, “Security mechanism based on hospital authentication server for secure application of implantable medical devices,” BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.

[3] E. Hutchins, M. Cloppert, and R. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW ’11), pp. 113–125, Academic Conferences, March 2011.

[4] N. Moran, “Understanding Advanced Persistent Threats—A Case Study,” 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.

[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, “Secure model against APT in m-connected SCADA network,” International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.

[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, “All your iFRAMEs point to us,” in Proceedings of the USENIX Security Symposium, 2008.

[7] S. Lee and J. Kim, “WARNINGBIRD: detecting suspicious URLs in Twitter stream,” in Proceedings of the Symposium on Network and Distributed System Security (NDSS ’12), 2012.

[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, “Collecting internet malware based on client-side honeypot,” in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS ’08), pp. 1493–1498, Hunan, China, November 2008.

[9] Y.-C. Cho and J.-Y. Pan, “Multiple-feature extracting modules based leak mining system design,” The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.

[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, “A method for risk measurement of botnet’s malicious activities,” Information Journal, vol. 17, no. 1, pp. 165–180, 2014.

[11] C. Ni, C. Sugimoto, and J. Jiang, “Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically,” in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.

[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, “Clustering networks’ heterogeneous data in defining a comprehensive closeness centrality index,” Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.

[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, “Betweenness centrality in some classes of graphs,” International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.

[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Beyond blacklists: learning to detect malicious web sites from suspicious URLs,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’09), pp. 1245–1253, July 2009.

[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Identifying suspicious URLs: an application of large-scale online learning,” in Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09), pp. 681–688, 2009.

[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, “Zozzle: low-overhead mostly static JavaScript malware detection,” in Proceedings of the USENIX Security Symposium, 2011.

[17] M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code,” in Proceedings of the 19th International World Wide Web Conference (WWW ’10), pp. 281–290, April 2010.

[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, “Prophiler: a fast filter for the large-scale detection of malicious web pages,” in Proceedings of the 20th International Conference on World Wide Web (WWW ’11), pp. 197–206, 2011.

[19] C. Whittaker, B. Ryner, and M. Nazif, “Large-scale automatic classification of phishing pages,” in Proceedings of the Symposium on Network and Distributed System Security (NDSS ’10), 2010.

[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, “JSand: complete client-side sandboxing of third-party JavaScript without browser modifications,” in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC ’12), pp. 1–10, ACM, December 2012.

[21] C. Seifert, I. Welch, and P. Komisarczuk, “HoneyC: the low-interaction client honeypot,” in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRSC ’07), Hamilton, New Zealand, 2007.

[22] J. Nazario, “PhoneyC: a virtual client honeypot,” in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.

[23] Y.-M. Wang, D. Beck, X. Jiang et al., “Automated web patrol with Strider HoneyMonkeys,” in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.

[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.

[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, “Blade: an attack-agnostic approach for preventing drive-by malware infections,” in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS ’10), pp. 440–450, ACM, October 2010.

[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, “All your iFRAMEs point to us,” in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 6: Research Article Method for Detecting Core Malware Sites ...downloads.hindawi.com/journals/cmmm/2015/756842.pdf · # of nodes 1, Time complexity: ( ). B Eigenvector Centrality Index

6 Computational and Mathematical Methods in Medicine

1st highest-risk landing (or exploit) site1st highest-risk distribution site

Figure 4 Visualization of malware site risk

Table 1 MRI estimation result of exploitlandingdistribution sites

Node type URL MRI ReliabilityDistribution site http222lowast lowast lowastlowast lowast lowastlowast lowast lowastchhtml 03965 91Distribution site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastcomNewindexhtml 03505 92Distribution site httpa1lowast lowast lowast lowast lowast lowast lowast lowast lowastcom1indexhtml 03058 90Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastorkr 03047 95Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastcokr 03026 94Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastkr 03017 94Distribution site httpa2lowast lowast lowast lowast lowast lowast lowast lowast lowastcom2indexhtml 03009 93Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastrekr 03003 92Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastorkr 02993 92Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastcokr 02991 90Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastcokr 02983 91Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastcokr 02982 90Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastorg 02970 94Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastorkr 02969 96Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastcom 02968 95Exploit site httplowast lowast lowast lowast lowast lowast lowast lowast lowastcokr 02967 95Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastcokr 02966 96Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastkr 02966 94Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastkr 02962 93Exploit site httpwwwlowast lowast lowast lowast lowast lowast lowast lowast lowastcom 02961 93Exploit site httpwwwlowast lowast 
lowast lowast lowast lowast lowast lowast lowastcokr 02960 94(ldquolowast lowast lowast lowast lowast lowast lowast lowast lowastrdquo the URL information of malware site)

Computational and Mathematical Methods in Medicine 7

Table 2 Average detection rate of zero-day attacks for a given day

Priority of risk Malware site groupwith multipath

Distribution site withsingle path

Landing (or exploit) sitewith single path

1 233 (15) 215 2822 226 (9) 316 3243 147 (8) 228 1814 184 (10) 197 3235 212 (12) 242 176Average early detection rate 2004 2396 2572

technique Figure 4 visualizes the 1st highest-risk distributionsite according to the MRI

The detection and elimination of high-risk maliciouscode exploitlandingdistribution sites related to biomedicalinformation systems can be achieved by visualizing the 1sthighest-risk exploitlandingdistribution site as shown inFigure 4Thus our proposedmodel focuses on estimating therisk presented by target malware sites in the specific field ofbiomedical information

We verify the performance of the proposed model basedon static analysis However for military government orsimilar organizations we must dynamically filter out coremalware sites based on high-performance hardware plat-forms For this reason our method is a good example of asuitable defensive measure for APT attacks

5 Related Work

Methods for detecting and analyzingwebsites includingmali-cious code can generally be divided into static and dynamicanalysis

51 Static Analysis Static analysis mainly uses machinelearning and pattern matching to detect and classify mali-cious URLs

Ma et al [14 15] presented a classification model thatdetects spam and phishingURLsThismodel uses a statisticalmethod to classify URLs by considering the lexical and host-based properties of malicious URLs Although this methoddetects both spam and phishing URLs it cannot distinguishbetween the two

Another approach is to analyze the JavaScript code inweb pages to find the typical features of malicious codeThis is done either statically [16] or dynamically by loadingthe affected pages in an emulated browser [17] Systemssuch as Prophiler [18] consider both JavaScript and otherfeatures found in HTML and the URLs of malicious pagesWhittaker et al [19] proposed a phishing website classifier toautomatically update Googlersquos phishing blacklist They usedseveral features obtained from domain information and pagecontents

JSAND [20] used amachine learning approach to classifymalicious JavaScript

52 Dynamic Analysis Dynamic analysis analyzes theserverndashclient connection to detect and classify maliciousURLs

In other words dynamic analysis relies on visiting web-siteswith an instrumented browser (often referred to as a hon-eyclient) and monitoring the activities of the machine to findthe typical signatures of successful exploitations (eg the cre-ation of a new process) [21] PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites

Systems such as [23 24] execute web content dynamicallyand capture drive-by downloads based on either signatures oranomaly detection while Blade [25] leverages user behaviormodels for drive-by download detection All of these systemsexhibit good detection results However it is usually costlyto follow the full redirection path and monitor each scriptexecution in real time Moreover their accuracy is highlydependent on the malicious response of the webpage tovulnerable components

Provos et al [26] analyzed the maliciousness of a largecollection of web pages using a machine learning algorithmas a prefilter for VM-based analysis They adopted content-based features including the presence of obfuscated JavaScriptand exploit site-pointing iframes

The main differences between the models proposed inthis paper and previous approaches are as follows

(i) The model proposed in this paper applies a staticmethod to analyze the connectivity between nodesand detects the core-hub node dynamically based onthe risk index

(ii) The proposed model detects and blocks the core-hubnode using link data from the high-risk maliciouswebsites as observed for a specific period of time

(iii) The proposed model prevents the dissemination ofmalicious websites in the early stages by blocking thelink between the core malicious code distribution siteand the exploitlanding site

6 Conclusion

In this paper the 1st order risk of malware infection wasanalyzed using log information estimated by an MCC thatconsiders the DCI BCI and ECI of the main nodes based onthe priority of risk This provides a quantitative value of thepotential risk inherent in the corresponding site (node)

In addition the risk index of exploit sites and distribu-tion sites was calculated by considering their weights Theoverlapped infection history and survival ratio were used toestimate the risk of distribution sites whereas the overlapped

8 Computational and Mathematical Methods in Medicine

infection history and exposure frequency were consideredwhen estimating the risk of exploit sites Finally the MRI wasestimated using the 1st order risk analysis and the risk indexof the distribution sites and exploit sites

In future work we will develop a feature model thatpredicts the seriousness of website security problems by data-mining the logs produced frommalicious code detection andvulnerability scanning tools

As this feature model will be used to predict the risk ofa specific website it should contribute to establish an activemalicious code distribution blocking system that realizesproactive responses beyond the limit of reactive responsesthat rely only on traditional malicious code detection tools

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] J A Hansen and N M Hansen ldquoA taxonomy of vulnerabilitiesin implantable medical devicesrdquo in Proceedings of the 2ndAnnualWorkshop on Security and Privacy inMedical andHome-Care Systems (SPIMACS rsquo10) pp 13ndash20 October 2010

[2] C-S Park ldquoSecurity mechanism based on hospital authen-tication server for secure application of implantable medicaldevicesrdquo BioMed Research International vol 2014 Article ID543051 12 pages 2014

[3] E Hutchins M Cloppert and R Amin ldquoIntelligence-drivencomputer network defense informed by analysis of adversarycampaigns and intrusion kill chainsrdquo in Proceedings of the 6thInternational Conference on Information Warfare and Security(ICIW rsquo11) pp 113ndash125 Academic Conferences March 2011

[4] N Moran ldquoUnderstanding Advanced Persistent ThreatsmdashACase Studyrdquo 2010 httpswwwusenixorgsystemfilesloginarticles105484-Moranpdf

[5] S-J Kim D-E Cho and S-S Yeo ldquoSecure model againstAPT in m-connected SCADA networkrdquo International Journalof Distributed Sensor Networks vol 2014 Article ID 594652 8pages 2014

[6] N Provos P Mavrommatis M Abu Rajab and F MonroseldquoAll your iframes points to usrdquo in Proceedings of the USENIXSecurity 2008

[7] S Lee and J Kim ldquoWARNINGBIRD detecting suspiciousURLsin twitter streamrdquo in Proceedings of the Symposium on Networkand Distributed System Security (NDSS rsquo12) 2012

[8] X Sun Y Wang J Ren Y Zhu and S Liu ldquoCollectinginternet malware based on client-side honeypotrdquo in Proceedingsof the 9th International Conference for YoungComputer Scientists(ICYCS rsquo08) pp 1493ndash1498 Hunan China November 2008

[9] Y-C Cho and J-Y Pan ldquoMultiple-feature extracting modulesbased leak mining system designrdquoThe Scientific World Journalvol 2013 Article ID 704865 11 pages 2013

[10] D H Kim Y-G Kim H P In and H C Jeong ldquoA method forrisk measurement of botnetrsquos malicious activitiesrdquo InformationJournal vol 17 no 1 pp 165ndash180 2014

[11] C Ni C Sugimoto and J Jiang ldquoDegree closeness andbetweenness application of group centrality measurements toexplore macro-disciplinary evolution diachronicallyrdquo in Pro-ceedings of the ISSI pp 1ndash13 Durban South Africa 2011

[12] F Barzinpour B Hoda Ali-Ahmadi S Alizadeh and S G JalaliNaini ldquoClustering networksrsquo heterogeneous data in defining acomprehensive closeness centrality indexrdquoMathematical Prob-lems in Engineering vol 2014 Article ID 202350 10 pages 2014

[13] S K Raghavan Unnithan B Kannan and M JathavedanldquoBetweenness centrality in Some classes of graphsrdquo Interna-tional Journal of Combinatorics vol 2014 Article ID 241723 12pages 2014

[14] J Ma L K Saul S Savage and G M Voelker ldquoBeyondblacklists learning to detectmaliciousweb sites from suspiciousURLsrdquo in Proceedings of the 15th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining (KDDrsquo09) pp 1245ndash1253 July 2009

[15] J Ma L K Saul S Savage and G M Voelker ldquoIdentifyingsuspicious URLs an application of large-scale online learningrdquoin Proceedings of the 26th Annual International Conference onMachine Learning (ICML rsquo09) pp 681ndash688 2009

[16] C Curtsinger B Livshits B Zorn and C Seifert ldquoZozzlelow-overhead mostly static javascript malware detectionrdquo inProceedings of the USENIX Security Symposium 2011

[17] M Cova C Kruegel and G Vigna ldquoDetection and analysis ofdrive-by-download attacks and malicious JavaScript coderdquo inProceedings of the 19th International World Wide Web Confer-ence (WWW rsquo10) pp 281ndash290 April 2010

[18] D Canali M Cova G Vigna and C Kruegel ldquoProphiler a fastfilter for the large-scale detection of malicious web pagesrdquo inProceedings of the 20th International Conference on World WideWeb (WWW rsquo11) pp 197ndash206 2011

[19] C Whittaker B Ryner and M Nazif ldquoLarge-scale automaticclassification of phishing pagesrdquo in Proceedings of the Sympo-sium on Network and Distributed System Security (NDSS rsquo10)2010

[20] P Agten S van Acker Y Brondsema P H Phung L Desmetand F Piessens ldquoJSand complete client-side sandboxing ofthird-party JavaScript without browser modificationsrdquo in Pro-ceedings of the 28th Annual Computer Security ApplicationsConference (ACSAC rsquo12) pp 1ndash10 ACM December 2012

[21] C Seifert I Welch and P Komisarczuk ldquoHoneyc the low-interaction client honeypotrdquo in Proceedings of the New ZealandComputer Science Research Student Conference (NZCSRCS rsquo07)Hamilton New Zealand 2007

[22] N Jose ldquoPhoneyC a virtual client honeypotrdquo in Proceedingsof the 2nd USENIX Conference on Large-Scale Exploits andEmergentThreats Botnets SpywareWorms andMore USENIXAssociation Berkeley Calif USA April 2009

[23] Y-MWang D Beck X Jiang et al ldquoAutomatedweb patrol withstrider honeymonkeysrdquo in Proceedings of the 2006 Network andDistributed System Security Symposium February 2006

[24] The Honeynet Project Capture-HPC client honeypot 2008httpprojectshoneynetorgcapture-hpc

[25] L Lu V Yegneswaran P Porras and W Lee ldquoBlade an attack-agnostic approach for preventing drive-by malware infectionsrdquoin Proceedings of the 17th ACM Conference on Computer andCommunications Security (CCS rsquo10) pp 440ndash450 ACM Octo-ber 2010

[26] N Provos P Mavrommatis M A Rajab and F Monrose ldquoAllyour iFRAMEs point to usrdquo in Proceedings of the 17th USENIXSecurity Symposium pp 1ndash15 2008

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 7: Research Article Method for Detecting Core Malware Sites ...downloads.hindawi.com/journals/cmmm/2015/756842.pdf · # of nodes 1, Time complexity: ( ). B Eigenvector Centrality Index

Computational and Mathematical Methods in Medicine 7

Table 2 Average detection rate of zero-day attacks for a given day

Priority of risk Malware site groupwith multipath

Distribution site withsingle path

Landing (or exploit) sitewith single path

1 233 (15) 215 2822 226 (9) 316 3243 147 (8) 228 1814 184 (10) 197 3235 212 (12) 242 176Average early detection rate 2004 2396 2572

technique Figure 4 visualizes the 1st highest-risk distributionsite according to the MRI

The detection and elimination of high-risk maliciouscode exploitlandingdistribution sites related to biomedicalinformation systems can be achieved by visualizing the 1sthighest-risk exploitlandingdistribution site as shown inFigure 4Thus our proposedmodel focuses on estimating therisk presented by target malware sites in the specific field ofbiomedical information

We verify the performance of the proposed model basedon static analysis However for military government orsimilar organizations we must dynamically filter out coremalware sites based on high-performance hardware plat-forms For this reason our method is a good example of asuitable defensive measure for APT attacks

5 Related Work

Methods for detecting and analyzingwebsites includingmali-cious code can generally be divided into static and dynamicanalysis

51 Static Analysis Static analysis mainly uses machinelearning and pattern matching to detect and classify mali-cious URLs

Ma et al [14 15] presented a classification model thatdetects spam and phishingURLsThismodel uses a statisticalmethod to classify URLs by considering the lexical and host-based properties of malicious URLs Although this methoddetects both spam and phishing URLs it cannot distinguishbetween the two

Another approach is to analyze the JavaScript code inweb pages to find the typical features of malicious codeThis is done either statically [16] or dynamically by loadingthe affected pages in an emulated browser [17] Systemssuch as Prophiler [18] consider both JavaScript and otherfeatures found in HTML and the URLs of malicious pagesWhittaker et al [19] proposed a phishing website classifier toautomatically update Googlersquos phishing blacklist They usedseveral features obtained from domain information and pagecontents

JSAND [20] used amachine learning approach to classifymalicious JavaScript

52 Dynamic Analysis Dynamic analysis analyzes theserverndashclient connection to detect and classify maliciousURLs

In other words dynamic analysis relies on visiting web-siteswith an instrumented browser (often referred to as a hon-eyclient) and monitoring the activities of the machine to findthe typical signatures of successful exploitations (eg the cre-ation of a new process) [21] PhoneyC [22] uses a signature-based low-interaction honeypot to detect malicious websites

Systems such as [23 24] execute web content dynamicallyand capture drive-by downloads based on either signatures oranomaly detection while Blade [25] leverages user behaviormodels for drive-by download detection All of these systemsexhibit good detection results However it is usually costlyto follow the full redirection path and monitor each scriptexecution in real time Moreover their accuracy is highlydependent on the malicious response of the webpage tovulnerable components

Provos et al [26] analyzed the maliciousness of a largecollection of web pages using a machine learning algorithmas a prefilter for VM-based analysis They adopted content-based features including the presence of obfuscated JavaScriptand exploit site-pointing iframes

The main differences between the model proposed in this paper and previous approaches are as follows:

(i) The model proposed in this paper applies a static method to analyze the connectivity between nodes and detects the core-hub node dynamically based on the risk index.

(ii) The proposed model detects and blocks the core-hub node using link data from high-risk malicious websites observed over a specific period of time.

(iii) The proposed model prevents the dissemination of malicious websites in the early stages by blocking the link between the core malicious code distribution site and the exploit/landing site.
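The core-hub idea can be illustrated on a toy link graph. This is a minimal sketch, not the authors' risk-index computation. It assumes an undirected, unweighted graph of site links and computes degree, betweenness (Brandes' algorithm), and eigenvector centrality in pure Python. It then scales each measure to its maximum and combines the three with equal weights, which are hypothetical. Node names such as `dist` and `land1` are made up.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (site, site) link pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm, unweighted and undirected, unnormalized."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, pred = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends


def eigenvector_centrality(adj, iters=200):
    """Power iteration on (A + I): same eigenvectors as A, but no oscillation on bipartite graphs."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(y.values())
        x = {v: val / top for v, val in y.items()}
    return x


def rank_core_hubs(adj, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Scale each measure by its maximum and combine with (hypothetical) weights."""
    measures = (degree_centrality(adj), betweenness_centrality(adj), eigenvector_centrality(adj))
    scaled = []
    for m in measures:
        top = max(m.values()) or 1.0
        scaled.append({v: val / top for v, val in m.items()})
    score = {v: sum(w * s[v] for w, s in zip(weights, scaled)) for v in adj}
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    links = [("dist", "land1"), ("dist", "land2"), ("dist", "land3"),
             ("land1", "med1"), ("land2", "med2"), ("land3", "med3")]
    for site, score in rank_core_hubs(build_adjacency(links)):
        print(f"{site:6s} {score:.3f}")
```

On this toy graph the hub `dist` ranks first on all three measures. Removing that single node splits the graph into three disconnected landing-site branches. This mirrors the intuition behind blocking the core-hub node instead of each exploit/landing site individually.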

6. Conclusion

In this paper, the first-order risk of malware infection was analyzed using log information from an MCC, considering the DCI, BCI, and ECI of the main nodes in order of risk priority. This provides a quantitative measure of the potential risk inherent in each site (node).

In addition, the risk indices of exploit sites and distribution sites were calculated by considering their weights. The overlapped infection history and survival ratio were used to estimate the risk of distribution sites, whereas the overlapped infection history and exposure frequency were used to estimate the risk of exploit sites. Finally, the MRI was estimated from the first-order risk analysis and the risk indices of the distribution sites and exploit sites.
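The structure of these risk indices can be made concrete with a linear weighted sum. This is a minimal sketch under that assumption. The `SiteStats` fields, the [0, 1] normalization, and every weight value are hypothetical placeholders, not the paper's own weighting or its MRI formula.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site statistics, each already normalized to [0, 1]."""
    overlapped_infections: float
    survival_ratio: float = 0.0      # used for distribution sites
    exposure_frequency: float = 0.0  # used for exploit sites


def distribution_site_risk(s: SiteStats, w_hist: float = 0.5, w_surv: float = 0.5) -> float:
    # Overlapped infection history + survival ratio (placeholder weights).
    return w_hist * s.overlapped_infections + w_surv * s.survival_ratio


def exploit_site_risk(s: SiteStats, w_hist: float = 0.5, w_exp: float = 0.5) -> float:
    # Overlapped infection history + exposure frequency (placeholder weights).
    return w_hist * s.overlapped_infections + w_exp * s.exposure_frequency


def combined_risk_index(first_order: float, dist_risk: float, exploit_risk: float,
                        weights=(0.4, 0.3, 0.3)) -> float:
    # Hypothetical stand-in for the MRI: a weighted sum of the three components.
    a, b, c = weights
    return a * first_order + b * dist_risk + c * exploit_risk


if __name__ == "__main__":
    d = distribution_site_risk(SiteStats(overlapped_infections=0.8, survival_ratio=0.6))
    e = exploit_site_risk(SiteStats(overlapped_infections=0.4, exposure_frequency=0.9))
    print(round(d, 3), round(e, 3), round(combined_risk_index(0.7, d, e), 3))
```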

In future work, we will develop a feature model that predicts the seriousness of website security problems by data-mining the logs produced by malicious code detection and vulnerability scanning tools.

Because this feature model will be used to predict the risk of a specific website, it should contribute to an active system for blocking malicious code distribution, enabling proactive responses that go beyond the limits of reactive responses relying only on traditional malicious code detection tools.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. A. Hansen and N. M. Hansen, "A taxonomy of vulnerabilities in implantable medical devices," in Proceedings of the 2nd Annual Workshop on Security and Privacy in Medical and Home-Care Systems (SPIMACS '10), pp. 13–20, October 2010.

[2] C.-S. Park, "Security mechanism based on hospital authentication server for secure application of implantable medical devices," BioMed Research International, vol. 2014, Article ID 543051, 12 pages, 2014.

[3] E. Hutchins, M. Cloppert, and R. Amin, "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," in Proceedings of the 6th International Conference on Information Warfare and Security (ICIW '11), pp. 113–125, Academic Conferences, March 2011.

[4] N. Moran, "Understanding Advanced Persistent Threats: A Case Study," 2010, https://www.usenix.org/system/files/login/articles/105484-Moran.pdf.

[5] S.-J. Kim, D.-E. Cho, and S.-S. Yeo, "Secure model against APT in m-connected SCADA network," International Journal of Distributed Sensor Networks, vol. 2014, Article ID 594652, 8 pages, 2014.

[6] N. Provos, P. Mavrommatis, M. Abu Rajab, and F. Monrose, "All your iframes point to us," in Proceedings of the USENIX Security Symposium, 2008.

[7] S. Lee and J. Kim, "WARNINGBIRD: detecting suspicious URLs in Twitter stream," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '12), 2012.

[8] X. Sun, Y. Wang, J. Ren, Y. Zhu, and S. Liu, "Collecting internet malware based on client-side honeypot," in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS '08), pp. 1493–1498, Hunan, China, November 2008.

[9] Y.-C. Cho and J.-Y. Pan, "Multiple-feature extracting modules based leak mining system design," The Scientific World Journal, vol. 2013, Article ID 704865, 11 pages, 2013.

[10] D. H. Kim, Y.-G. Kim, H. P. In, and H. C. Jeong, "A method for risk measurement of botnet's malicious activities," Information Journal, vol. 17, no. 1, pp. 165–180, 2014.

[11] C. Ni, C. Sugimoto, and J. Jiang, "Degree, closeness, and betweenness: application of group centrality measurements to explore macro-disciplinary evolution diachronically," in Proceedings of the ISSI, pp. 1–13, Durban, South Africa, 2011.

[12] F. Barzinpour, B. Hoda Ali-Ahmadi, S. Alizadeh, and S. G. Jalali Naini, "Clustering networks' heterogeneous data in defining a comprehensive closeness centrality index," Mathematical Problems in Engineering, vol. 2014, Article ID 202350, 10 pages, 2014.

[13] S. K. Raghavan Unnithan, B. Kannan, and M. Jathavedan, "Betweenness centrality in some classes of graphs," International Journal of Combinatorics, vol. 2014, Article ID 241723, 12 pages, 2014.

[14] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: learning to detect malicious web sites from suspicious URLs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1245–1253, July 2009.

[15] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Identifying suspicious URLs: an application of large-scale online learning," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 681–688, 2009.

[16] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, "Zozzle: low-overhead mostly static JavaScript malware detection," in Proceedings of the USENIX Security Symposium, 2011.

[17] M. Cova, C. Kruegel, and G. Vigna, "Detection and analysis of drive-by-download attacks and malicious JavaScript code," in Proceedings of the 19th International World Wide Web Conference (WWW '10), pp. 281–290, April 2010.

[18] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: a fast filter for the large-scale detection of malicious web pages," in Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 197–206, 2011.

[19] C. Whittaker, B. Ryner, and M. Nazif, "Large-scale automatic classification of phishing pages," in Proceedings of the Symposium on Network and Distributed System Security (NDSS '10), 2010.

[20] P. Agten, S. van Acker, Y. Brondsema, P. H. Phung, L. Desmet, and F. Piessens, "JSand: complete client-side sandboxing of third-party JavaScript without browser modifications," in Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC '12), pp. 1–10, ACM, December 2012.

[21] C. Seifert, I. Welch, and P. Komisarczuk, "HoneyC: the low-interaction client honeypot," in Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRCS '07), Hamilton, New Zealand, 2007.

[22] J. Nazario, "PhoneyC: a virtual client honeypot," in Proceedings of the 2nd USENIX Conference on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, USENIX Association, Berkeley, Calif, USA, April 2009.

[23] Y.-M. Wang, D. Beck, X. Jiang et al., "Automated web patrol with Strider HoneyMonkeys," in Proceedings of the 2006 Network and Distributed System Security Symposium, February 2006.

[24] The Honeynet Project, Capture-HPC client honeypot, 2008, http://projects.honeynet.org/capture-hpc.

[25] L. Lu, V. Yegneswaran, P. Porras, and W. Lee, "Blade: an attack-agnostic approach for preventing drive-by malware infections," in Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS '10), pp. 440–450, ACM, October 2010.

[26] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, "All your iFRAMEs point to us," in Proceedings of the 17th USENIX Security Symposium, pp. 1–15, 2008.


