
Research Article
Detecting Web-Based Botnets Using Bot Communication Traffic Features

Fu-Hau Hsu,1 Chih-Wen Ou,1 Yan-Ling Hwang,2 Ya-Ching Chang,1 and Po-Ching Lin3

1 Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
2 School of Applied Foreign Languages, Chung Shan Medical University, Taichung, Taiwan
3 Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan

Correspondence should be addressed to Chih-Wen Ou chihwenfrankougmailcom

Received 28 March 2017; Revised 18 June 2017; Accepted 25 September 2017; Published 3 December 2017

Academic Editor: Steffen Wendzel

Copyright © 2017 Fu-Hau Hsu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Web-based botnets are popular nowadays. A Web-based botnet is a botnet whose C&C server and bots use the HTTP protocol, the most universal and widely supported network protocol, to communicate with each other. Because attackers can easily hide botnet communication within the relatively massive volume of HTTP traffic, administrators of network equipment such as routers and switches cannot simply block such suspicious traffic, regardless of cost. Based on the composition of the clients of a Web server and the characteristics of the HTTP responses sent from the server to its clients, this paper proposes a traffic inspection solution called Web-based Botnet Detector (WBD). WBD is able to detect suspicious C&C (Command-and-Control) servers of HTTP botnets regardless of whether the botnet commands are encrypted or hidden in normal Web pages. More than 500 GB of real network traces collected from 11 backbone routers are used to evaluate our method. Experimental results show that the false positive rate of WBD is 0.42%.

1. Introduction

A botnet is a group of compromised computers, namely, bots, controlled by one or multiple controllers [1–3]. These botnet controllers, also named bot masters, provide commands to their bots through C&C (Command-and-Control) servers so that the bots can perform actions for their bot masters. There are several criteria to categorize botnets, including the attacking behavior, C&C model, communication channel, rallying mechanism, and evasion technique. In this study, we focus on the centralized C&C model and discuss it in detail.

For a botnet with a centralized C&C model, each bot connects to its C&C server to retrieve commands or to deliver data. Organizing C&C servers and their bots in this way has several advantages over the decentralized and randomized models. The first advantage is the low cost of constructing such a botnet, because bot masters can easily create one from many off-the-shelf open resources and applications. Meanwhile, the centralized model allows a bot master to quickly rally a large number of bots by commanding only a few C&C servers. Such efficiency makes it easier for cybercriminals to use botnets to conduct malicious activities such as DDoS attacks and spamming [4].

According to the communication protocols they use, botnets can be classified into several categories. These categories include the IRC- (Internet Relay Chat-) based botnet, the IM- (Instant Message-) based botnet, and the Web-based botnet. This paper focuses on the Web-based botnet, also named HTTP botnet, whose communication channel between the C&C server and its bot clients is HTTP. A C&C server of a Web-based botnet works like a normal Web server, and the bot clients of a Web-based botnet work like normal Web clients. We call the C&C server of a Web-based botnet a botnet Web server hereafter. Two botnets, Spyeye [5] and Zeus [6], are well-known HTTP-based botnets. According to previous studies on these two botnets, there are several reasons why HTTP is attractive to botnet owners. First, HTTP traffic is the most popular Internet traffic nowadays, so Web-based botnet traffic can easily be disguised as normal HTTP traffic, making these botnets more difficult to discover than those that use less popular protocols. Second, most network firewalls/proxies allow hosts behind them to access the Internet via HTTP. As a result,

Hindawi, Security and Communication Networks, Volume 2017, Article ID 5960307, 11 pages, https://doi.org/10.1155/2017/5960307


Web-based botnets can easily provide stable and qualified client-to-server connectivity. Third, many promising solutions [1, 7–9] have been developed to precisely detect traditional IRC-based botnets rather than Web-based botnets. Therefore, in recent years HTTP has gradually become an ideal alternative protocol for botnet owners to use as the communication channel, and our study focuses on this kind of botnet.

1.1. Web-Based Botnet Detection. Bot clients are trojans, executable programs, or scripts running on compromised hosts. Hence, their behavior differs from human user behavior, and their activity patterns, traffic sizes, and transferred content also differ from those of human users. Because programs generate the communication traffic automatically, Web-based botnet communication has some prominent characteristics. According to our preliminary survey of a botnet taxonomy study [2], a typical bot client of a centralized C&C botnet often needs to synchronize with its botnet Web server to retrieve commands or deliver execution results. Such synchronization is often scheduled while bot clients are effectively controlled by botnet Web servers. Hence, we think that this synchronization can be used as a hint to indicate whether Web clients are controlled by human users or by bot programs. We also found that if the group of Web clients associated with a Web server consists of human users, each of them often has a different access pattern to the Web server. For example, these human clients may visit the Web server at different times of day or a different number of times each day. On the contrary, if these Web clients are bot clients, which run programs or scripts, they may act together and behave similarly. They may therefore contact their botnet Web server repeatedly at a predefined time interval to retrieve commands. Such repeated contact with a certain botnet Web server may continue for several days, which is clearly different from normal human behavior. In addition, the same group of bot clients usually tends to communicate with the same botnet Web server. Based on this long-term repeated contact and the similar access patterns of the clients of a Web server, we use a metric named Total Host Repetition Rate, or THR for short, as one of our criteria to examine whether a Web server is a suspicious botnet C&C server.

In addition to THR, we also found that the payload of the traffic between bot clients and their botnet Web server usually contains short and simple commands. Furthermore, all bot clients commanded by a certain C&C server tend to receive commands at the same time. This similarity of payloads among the bot clients controlled by the same botnet Web server is also described as the command-response pattern by BotProbe [7]. A normal Web server usually hosts many Web pages, and different users access different Web pages. Hence, unless the Web server contains only one Web page, the probability that its users retrieve the same Web page simultaneously is low, and different Web pages usually have different sizes. As a result, during a given period of time, the sizes of the response payloads sent to different Web clients accessing the same Web server are expected to differ, whereas a botnet Web server often dispatches similar commands to its bot clients at the same time. Thus, we use the payload size difference as a metric, called the payload size similarity or PSS for short, to judge whether a Web server is a suspicious botnet C&C server. The formalization of these two metrics is given in Section 2.2.

In our prototype implementation, we designed an automatic mechanism based on the above two metrics and integrated it into our prototype system, named Web-based Botnet Detector (WBD), to perform the inspection. WBD is attached to a network traffic monitoring system, which generates traffic logs from the online network stream and analyzes these logs at the same time. WBD requires only a few arithmetic calculations while performing runtime inspection of the monitored traffic logs, whereas feature calculation brings significant overhead to other similar approaches during traffic inspection.

1.2. Contributions. The solution of this paper has the following characteristics. (1) Compared to mainstream machine learning approaches, which often rely heavily on tens or even hundreds of features, an approach with only a few features can notably reduce overhead. WBD requires only several deterministic features, which are easily extracted and calculated from monitored network traffic. (2) WBD inspects traffic of backbone networks. It does not require any program to be installed on network end-hosts or servers. (3) WBD does not use features based on traffic content mining, nor does it rely on parsing any particular protocol. In summary, the contributions of WBD include the following:

(1) WBD requires only several deterministic calculations, which makes it well suited to working alongside heavily loaded backbone equipment.

(2) We conducted large-scale backbone data inspection for this study. It reveals the IP addresses and timestamps of Web servers that generate suspicious Web-based botnet communications across the global Internet.

(3) Because our features depend little on the traffic content itself, our solution can theoretically target the HTTPS protocol. Also, botnet owners may deliberately embed their botnet commands into normal traffic, such as ordinary Web content, to bypass potential inspection along the traffic path. Our solution can work in this situation because the calculation of THR and PSS requires the source and destination IP addresses of the packets in the traffic, which are not encrypted by most secure protocols.

This paper includes six sections. Section 2 explains how we calculate features based on these criteria from datagram-like network traffic logs. Sections 3 and 4 evaluate our approach and discuss issues including accuracy and comparison with other similar approaches. Section 5 describes previous studies on botnet-related issues. Section 6 summarizes this study.


Figure 1: A communication pattern between 6 Web clients and a Web server during a three-hour monitoring period. Clients A, B, and C contact Web server 1.2.3.4 during 00:00–00:59, clients A, B, and D during 01:00–01:59, and clients D, E, and F during 02:00–02:59.

2. Methodology

In order to develop a traffic inspection approach, several issues have to be considered. These issues include the scale of monitoring involved, the volume of the traffic, and the feasibility of obtaining such traffic. If an approach has to monitor user-side traffic, an appropriate inspecting location may be a gateway, a router, or a proxy (if it is mandatory for each network user) of the target user-side network. If an approach needs to monitor the server-side network, a possible location is the intrusion detection equipment or firewall equipment of the target server-side network. These two deployments are commonly selected by many traffic inspection approaches because of their deployment feasibility and affordable traffic volume. Different from these two categories, our approach aims at monitoring as much of the global Internet as possible. The possible inspecting locations for this kind of approach should include backbone equipment that routes and processes large numbers of IP packets. In our study, we were allowed to obtain traffic from several online backbone routers in Taiwan, so we could develop a solution that is not restricted to user-side or server-side networks.

Even though we were able to obtain logs from actual backbone routers, these routers are critical to our Internet service provider and are always fully loaded. Hence, directly running inspection procedures on them is impractical, so we adopted an offline analyze-after-recording method for our experiments. We collected log samples from these backbone routers several times and analyzed them. Detailed information about this is given in Section 3. Due to security and privacy concerns, all these actions were conducted and completed in an office of an Internet service provider. We could not see the content of IP packet payloads, and we could not take any logs out of the office. Information about the data recorded from the traffic is given in Section 3.1.

2.1. A Case Study. To provide a clear understanding of our approach, consider an input case extracted from our logs and depicted in Figure 1. Six Web clients, named A to F, send requests to a Web server whose IP address is denoted as 1.2.3.4 hereafter. The three-hour monitoring period is separated into three consecutive one-hour time intervals, as shown in Figure 1. Clients A, B, and C make requests to the Web server in the first time interval, so arrows connect them to the Web server in the left time interval, marked 00:00–00:59. Similarly, clients A and B repeat their requests and D makes a request in the second time interval. In the third time interval, client D repeats its request and clients E and F make requests. A graph representation of this case is shown in Figure 2. A Web server is denoted as an S-vertex and a Web client is denoted as a C-vertex. The communication between a Web server and a client is denoted as an undirected edge connecting these two vertices, as in the example shown in Figure 2. There is no edge between two different C-vertices because we do not need to consider the case in which a Web client also runs a Web service; after all, we only focus on centralized botnets. We can also ignore edges between two S-vertices because we only focus on communication initiated by Web clients. Each graph describes the communication pattern between Web clients and a Web server in one time interval. These graphs are not used directly for graph computation; how they are used is described in Section 2.2.
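The grouping described in this case study can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the record layout `(timestamp_seconds, client, server)`, the function name `build_host_groups`, and the sample timestamps are assumptions made for the example; only the one-hour interval length, the server address 1.2.3.4, and the clients A to F come from the text.

```python
from collections import defaultdict


def build_host_groups(records, interval_seconds=3600):
    """Group clients by server and by time interval.

    records: iterable of (timestamp_seconds, client, server) tuples
             (a hypothetical log layout, assumed for illustration).
    Returns {server: {interval_index: set_of_clients}}.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for timestamp, client, server in records:
        interval = int(timestamp // interval_seconds)
        groups[server][interval].add(client)
    return groups


# Requests of Figure 1, with made-up timestamps inside each one-hour interval.
records = [
    (10, "A", "1.2.3.4"), (20, "B", "1.2.3.4"), (30, "C", "1.2.3.4"),
    (3700, "A", "1.2.3.4"), (3800, "B", "1.2.3.4"), (3900, "D", "1.2.3.4"),
    (7300, "D", "1.2.3.4"), (7400, "E", "1.2.3.4"), (7500, "F", "1.2.3.4"),
]
groups = build_host_groups(records)
for interval, clients in sorted(groups["1.2.3.4"].items()):
    print(interval, sorted(clients))
```

Each set of clients produced this way corresponds to the C-vertices connected to the S-vertex in one time interval.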

2.2. Features Formulation. We use the graph representation only for conceptual discussion. For the actual calculation conducted by WBD, equivalent formulas are applied to the traffic logs once they are obtained. Such a design keeps the related calculation lightweight and makes the approach suitable for working with existing Internet backbone equipment. As mentioned in Section 1.1, two metrics are used to determine whether there exists botnet


Figure 2: Graph representations of communication patterns between Web clients and their Web server at different monitoring time intervals (T1, T2, and T3). S1 denotes Web server 1.2.3.4, and CA to CF denote Web clients.

Figure 3: Two host groups (host group 1 and host group 2) related to Web server $S_1$ at two consecutive time intervals $T_1$ and $T_2$.

communication in the monitored HTTP traffic. In this paper, we call the group of Web clients that communicate with a Web server in a time interval a host group; Figure 3 shows two host groups appearing at two different time intervals. If a Web server is a botnet Web server, the associated host groups are called bot groups. According to previous studies [2, 10] and our observation, the hosts of a bot group tend to communicate with the same botnet Web server all the time. Even though the constituent members of a bot group may change due to technical issues or management reasons, such a change does not occur dramatically in a short period of time.

Two host groups that are associated with the same Web server and appear in successive time intervals are called adjacent host groups. We use two scores, the Access (AC) score and the Total Host Repeat (THR) score, to evaluate the THR feature of a Web server. Equation (1) defines these two scores. Score $\mathrm{AC}_s$ represents the total number of hosts communicating with Web server $s$ over the $n$ time intervals, counting a host once for each interval in which it appears. For a given time interval, the HR score is defined as the ratio of the number of hosts appearing in both the current host group and its preceding adjacent host group to the number of hosts in the current host group. The intersection of $\mathrm{hostgroup}_{s,t-1}$ and $\mathrm{hostgroup}_{s,t}$ denotes the set of hosts appearing in both adjacent host groups. Score $\mathrm{THR}_s$ is the average of the $n$ Host Repeat (HR) scores and denotes the similarity of the host groups of Web server $s$ over the $n$ time intervals.

$$
\begin{aligned}
\mathrm{AC}_s &= \sum_{t=1}^{n} \left|\mathrm{hostgroup}_{s,t}\right|, \\
\mathrm{THR}_s &= \frac{1}{n} \sum_{t=1}^{n} \frac{\left|\mathrm{hostgroup}_{s,t-1} \cap \mathrm{hostgroup}_{s,t}\right|}{\left|\mathrm{hostgroup}_{s,t}\right|}.
\end{aligned}
\tag{1}
$$
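As a worked illustration of Equation (1), the following Python sketch computes the AC and THR scores for the three host groups of the case study in Section 2.1. It is not the authors' code. Two points are assumptions of this sketch, since the text does not specify them: the host group preceding the first interval is treated as empty, and an interval with an empty host group contributes an HR score of 0.

```python
def ac_score(host_groups):
    """AC score: sum of the host group sizes over all time intervals."""
    return sum(len(group) for group in host_groups)


def thr_score(host_groups):
    """THR score: average of the per-interval Host Repeat (HR) scores.

    host_groups: list of sets of hosts, one set per time interval, in order.
    """
    n = len(host_groups)
    if n == 0:
        return 0.0
    total = 0.0
    previous = set()  # assumption: no host group before the first interval
    for current in host_groups:
        if current:  # assumption: an empty interval contributes HR = 0
            total += len(previous & current) / len(current)
        previous = current
    return total / n


# Host groups of Web server 1.2.3.4 in the three intervals of Figure 1.
figure1_groups = [{"A", "B", "C"}, {"A", "B", "D"}, {"D", "E", "F"}]
ac = ac_score(figure1_groups)
thr = thr_score(figure1_groups)
print(ac, thr)
```

For this input the AC score is 9 (three hosts in each of three intervals), and the HR scores are 0, 2/3, and 1/3, so the THR score is 1/3 under the stated assumptions.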

Assume that the host group of Web server $s$ in time interval $t$ is $\mathrm{hostgroup}_{s,t}$ and that there are $k_t$ different total responding payload sizes, $\mathrm{PS}_{t,1}, \mathrm{PS}_{t,2}, \ldots, \mathrm{PS}_{t,k_t}$, for the $|\mathrm{hostgroup}_{s,t}|$ hosts. $|\mathrm{PS}_{t,i}|$ represents the number of hosts whose payload size is $\mathrm{PS}_{t,i}$ in time interval $t$. The payload size similarity (PSS) score, defined in (2), is used to evaluate the payload similarity feature of a Web server.

$$
\mathrm{PSS}_s = \frac{\sum_{t=1}^{n} \max_{i=1}^{k_t} \left|\mathrm{PS}_{t,i}\right|}{\sum_{t=1}^{n} \left|\mathrm{hostgroup}_{s,t}\right|} = \frac{\sum_{t=1}^{n} \max_{i=1}^{k_t} \left|\mathrm{PS}_{t,i}\right|}{\mathrm{AC}_s}.
\tag{2}
$$
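Equation (2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the input layout (one dictionary per time interval mapping each host to its total responding payload size) and the byte values in the example are hypothetical, and payload sizes are grouped by exact equality, as the formula suggests.

```python
from collections import Counter


def pss_score(payload_sizes_per_interval):
    """PSS score: hosts in the largest equal-size group per interval,
    summed over all intervals and divided by the AC score.

    payload_sizes_per_interval: list of dicts {host: total_payload_size},
    one dict per time interval (hypothetical layout, assumed here).
    """
    largest_groups = 0
    total_hosts = 0  # equals the AC score of Equation (1)
    for sizes in payload_sizes_per_interval:
        if not sizes:
            continue
        size_counts = Counter(sizes.values())
        largest_groups += max(size_counts.values())
        total_hosts += len(sizes)
    return largest_groups / total_hosts if total_hosts else 0.0


# Made-up payload sizes in bytes for two time intervals.
intervals = [
    {"A": 120, "B": 120, "C": 120},   # all three hosts share one size
    {"A": 120, "B": 120, "D": 4800},  # two of three hosts share one size
]
pss = pss_score(intervals)
print(pss)
```

In this example the largest equal-size groups contain 3 and 2 hosts and the AC score is 6, so the PSS score is 5/6.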

2.3. WBD Classifier. After quantifying the features of each input Web server, we can distinguish between normal Web servers and suspicious botnet Web servers by comparing their THR, AC, and PSS scores to thresholds. In order to find proper thresholds, we extracted the HTTP traffic traces of the Alexa top 20 Websites in Taiwan from 11 backbone routers. The traffic related to these popular Websites is supposed to be nonbotnet traffic. However, botnet traffic can be hidden in the traffic of a social network Website, in which case the social network traffic may contain botnet traffic. To avoid this problem, social-network-related traffic can be filtered out when collecting nonbotnet traffic. In our traces, however, almost all of the top 20 Websites are nonsocial network Websites; the exception is Facebook, which continuously detects and removes fake or malicious accounts. Therefore, we consider the traffic associated with these popular Websites to be benign. We calculated the THR score and AC score of each Web server and then selected the maximum THR score and the maximum AC score as the thresholds, as shown in (3), where $\mathrm{THR}_i$ represents the THR score of Web server $i$ and $\mathrm{AC}_i$ represents the AC score of Web server $i$.

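$$
\begin{aligned}
\mathrm{THR}_{\mathrm{threshold}} &= \max_{i \in \text{top 20 benign servers}} \left(\mathrm{THR}_i\right), \\
\mathrm{AC}_{\mathrm{threshold}} &= \max_{i \in \text{top 20 benign servers}} \left(\mathrm{AC}_i\right).
\end{aligned}
\tag{3}
$$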
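The threshold selection in (3) amounts to taking two maxima over the benign servers. The short Python sketch below illustrates this; the function name `derive_thresholds`, the server labels, and the score values are hypothetical and are not measurements from the paper.

```python
def derive_thresholds(benign_scores):
    """Return (THR threshold, AC threshold) as the maxima over benign servers.

    benign_scores: dict {server: (thr_score, ac_score)} for servers assumed
    to be benign (hypothetical layout, assumed for illustration).
    """
    thr_threshold = max(thr for thr, _ in benign_scores.values())
    ac_threshold = max(ac for _, ac in benign_scores.values())
    return thr_threshold, ac_threshold


# Made-up scores for three benign servers.
benign_scores = {
    "site-a": (0.10, 500),
    "site-b": (0.35, 900),
    "site-c": (0.20, 1200),
}
thr_threshold, ac_threshold = derive_thresholds(benign_scores)
print(thr_threshold, ac_threshold)
```

Note that the two maxima may come from different servers, as they do in this example.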

WBD uses the thresholds to examine the Web servers appearing in the HTTP traffic traces and uses (4) to check whether Web server $i$ exhibits the THR feature, denoted by $\mathrm{HRSusp\_Server}(i)$.

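$$
\mathrm{HRSusp\_Server}(i) =
\begin{cases}
\text{True}, & \mathrm{THR}_i > \mathrm{THR}_{\mathrm{threshold}} \text{ and } \mathrm{AC}_i < \mathrm{AC}_{\mathrm{threshold}}, \\
\text{False}, & \text{otherwise}.
\end{cases}
\tag{4}
$$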

WBD uses (5) to check whether Web server $i$ exhibits the similar payload size feature, denoted by $\mathrm{PSSSusp\_Server}(i)$. $\mathrm{PSS}_{\mathrm{threshold}}$ in (5) is defined as 0.5 because if, on average, the payload sizes of 50% of the hosts in each host group of a Web server during a time interval are similar to each other, the Web server is unlikely to be a normal Web server.

$$
\mathrm{PSSSusp\_Server}(i) =
\begin{cases}
\text{True}, & \mathrm{PSS}_i > \mathrm{PSS}_{\mathrm{threshold}}, \\
\text{False}, & \text{otherwise}.
\end{cases}
\tag{5}
$$

Based on the values determined by (4) and (5) for Web server $i$, WBD uses (6) to determine whether Web server $i$ is a suspicious botnet Web server. All hosts that connect to it more than once are supposed to be its bot clients.

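$$
\mathrm{Susp\_C\&C}(i) =
\begin{cases}
\text{True}, & \mathrm{HRSusp\_Server}(i) == \text{True} \text{ or } \mathrm{PSSSusp\_Server}(i) == \text{True}, \\
\text{False}, & \text{otherwise}.
\end{cases}
\tag{6}
$$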
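The decision rules (4) to (6) can be written as three small predicates. The following Python sketch is an illustration only: the function names mirror the notation of the equations but are not taken from the authors' code, and the default threshold values are the ones reported for the training phase in Section 3.2 (0.521, 12000, and 0.5).

```python
THR_THRESHOLD = 0.521  # from the training phase (Section 3.2)
AC_THRESHOLD = 12000   # from the training phase (Section 3.2)
PSS_THRESHOLD = 0.5    # fixed by definition (Section 2.3)


def hr_susp_server(thr, ac, thr_threshold=THR_THRESHOLD, ac_threshold=AC_THRESHOLD):
    """Equation (4): high host repetition combined with a low access score."""
    return thr > thr_threshold and ac < ac_threshold


def pss_susp_server(pss, pss_threshold=PSS_THRESHOLD):
    """Equation (5): more than half of the hosts receive equal-size payloads."""
    return pss > pss_threshold


def susp_cc(thr, ac, pss):
    """Equation (6): a server is suspicious if either feature is exhibited."""
    return hr_susp_server(thr, ac) or pss_susp_server(pss)


# Made-up scores for four servers.
print(susp_cc(thr=0.80, ac=500, pss=0.10))    # flagged by (4)
print(susp_cc(thr=0.80, ac=20000, pss=0.10))  # AC too high for (4), PSS too low for (5)
print(susp_cc(thr=0.10, ac=20000, pss=0.90))  # flagged by (5)
print(susp_cc(thr=0.20, ac=500, pss=0.30))    # neither rule applies
```

Because (6) combines the two rules with a logical or, a server with a large AC score can still be reported when its PSS score exceeds 0.5.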

WBD is built on the above equations. There are four components in our prototype system. The first collects the raw data. The second is a module that calculates THR and AC. The third is a module that calculates PSS. The last is a combination of a report generator and a classifier that operates on the output of the second and third modules.

3. Evaluation

The evaluation of WBD has two purposes. The first purpose is to discover appropriate thresholds for our solution. The second purpose is to estimate the effectiveness of WBD. In order to evaluate the effectiveness of WBD, we need to know the number of botnet Web servers whose network traffic is recorded in our datasets. However, according to the phishing domain survey reports by McGrath et al. [9] and Aaron et al. [11], attackers usually do not use a compromised host for more than a couple of days. Hence, we are not able to check all the Web servers in our collected datasets before attackers stop using the botnet Web servers hidden among them. Instead of checking all Web servers in the entire datasets, we can only manually check the hosts identified as malicious by WBD to determine whether they are truly malicious, so that we can at least calculate the false positive rate of WBD in our evaluation.

We collected network traces three times from 11 backbone routers in Taiwan. These routers belong to one of the three largest Internet service providers in Taiwan. Each collection lasted for 48 hours and generated one dataset; hence, we obtained 3 datasets. These routers


Table 1: Fields of a NetFlow V5 record.

| Content | Bytes offset | Description |
|---|---|---|
| srcaddr | 0–3 | Source IP address |
| dstaddr | 4–7 | Destination IP address |
| dPkts | 16–19 | Packets in the flow |
| srcport | 32-33 | Source port number |
| dstport | 34-35 | Destination port number |
| prot | 38 | Protocol (6 = TCP, 17 = UDP) |

Table 2: Information of our training phase dataset.

| Time period | Size of the raw file | Number of Web servers |
|---|---|---|
| 2013/03/16 00:00–2013/03/17 23:59 | About 200 GB | 9933 |

Table 3: Information of our testing phase datasets.

| Index | Time period | Size of the raw file | Number of Web servers |
|---|---|---|---|
| 1 | 2013/06/01 00:00–2013/06/02 23:59 | About 140 GB | 156294 |
| 2 | 2013/01/18 00:00–2013/01/19 23:59 | About 160 GB | 170920 |

Table 4: Results of the testing phase.

| Index | False positive | Number of suspicious Web servers |
|---|---|---|
| 1 | 3 (0.28%) | 1047 |
| 2 | 9 (0.83%) | 1085 |
| Total | 12 (0.42%) | 2132 |

are Cisco routers equipped with NetFlow [12]. Therefore, these datasets were recorded in the NetFlow V5 compatible format. Our experiments include two phases. The first is the training phase, which uses the first dataset to determine the thresholds. The second is the testing phase, which uses the other two datasets.

3.1. NetFlow. NetFlow is able to record all traffic passing through a Cisco router. It fetches data from IP packets and generates flow records. Those flow records can be transferred to other devices for further analysis. The source address field srcaddr, destination address field dstaddr, source port field srcport, destination port field dstport, and protocol field prot of a NetFlow V5 record specify a session between a certain source host and a destination host via HTTP, as shown in Table 1. The dPkts field contains the raw packet data, so that we can calculate the payload size of a packet and the total payload size of an HTTP session.
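To make the field layout of Table 1 concrete, the following Python sketch extracts those six fields from a single NetFlow V5 flow record using the standard `struct` module. It is a minimal sketch, not the tooling used in the study: it assumes a 48-byte flow record in network byte order with the export packet header already stripped, and the function name `parse_v5_record` and the sample addresses and ports are made up for the example.

```python
import socket
import struct

RECORD_LENGTH = 48  # size of one NetFlow V5 flow record in bytes


def parse_v5_record(record):
    """Extract the Table 1 fields from one NetFlow V5 flow record."""
    if len(record) < RECORD_LENGTH:
        raise ValueError("a NetFlow V5 flow record is 48 bytes long")
    srcaddr, dstaddr = struct.unpack_from("!4s4s", record, 0)  # bytes 0-7
    (dpkts,) = struct.unpack_from("!I", record, 16)            # bytes 16-19
    srcport, dstport = struct.unpack_from("!HH", record, 32)   # bytes 32-35
    (prot,) = struct.unpack_from("!B", record, 38)             # byte 38
    return {
        "srcaddr": socket.inet_ntoa(srcaddr),
        "dstaddr": socket.inet_ntoa(dstaddr),
        "dPkts": dpkts,
        "srcport": srcport,
        "dstport": dstport,
        "prot": prot,
    }


# Build a synthetic record: 10.0.0.5:51000 -> 1.2.3.4:80 over TCP, 7 packets.
sample = bytearray(RECORD_LENGTH)
struct.pack_into("!4s4s", sample, 0,
                 socket.inet_aton("10.0.0.5"), socket.inet_aton("1.2.3.4"))
struct.pack_into("!I", sample, 16, 7)
struct.pack_into("!HH", sample, 32, 51000, 80)
struct.pack_into("!B", sample, 38, 6)
flow = parse_v5_record(bytes(sample))
print(flow)
```

A record parsed this way provides the source and destination addresses needed to form host groups; selecting HTTP flows, for example by protocol 6 and destination port 80, would be a further assumption not spelled out in the text.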

3.2. Threshold and Training Phase. The first part of our experiments is to calculate the thresholds and perform training. The training data were collected from March 16 to 17, 2013. The number of backbone routers involved in this phase is less than that in the testing phase because we chose only the routers that forward packets to popular Web servers. As shown in Table 2, the total raw data size in this phase is about 200 GB, which covers the IP addresses of 9933 Web servers. The THR scores of the Alexa top 20 popular Websites in Taiwan are all less than 0.521. We also used a browser to manually connect to the 9933 Web servers to check which of them are normal Web servers and which are abnormal. The THR scores of the normal Web servers are almost all less than 0.521. In contrast, the THR scores of the abnormal Web servers are almost all greater than 0.521. Therefore, we set $\mathrm{THR}_{\mathrm{threshold}}$ to 0.521 and $\mathrm{AC}_{\mathrm{threshold}}$ to 12000. Besides, $\mathrm{PSS}_{\mathrm{threshold}}$ is set to 0.5, as described in Section 2.3. We also calculated the average Web page size for these top 20 Websites, which is 47087 bytes.

3.3. Testing Phase. In the testing phase, two datasets were used. Table 3 shows the information of these datasets. More than 300 GB of data were used in our analysis. These two datasets contain network traces of 156294 and 170920 Web servers, respectively. Among these Web servers, WBD found 1047 suspicious botnet Web servers in testing dataset 1 and 1085 suspicious servers in testing dataset 2. For each of these 2132 suspicious botnet Web servers, we used a browser to manually check its content. If a suspicious botnet Web server replies with a normal Web page, we treat this case as a false positive. Besides, bot clients usually retrieve commands from their botnet Web server, and the sizes of the commands are supposed to be smaller than the size of normal Web pages. Therefore, if the size of the data returned from a Web server is greater than 47087 bytes, we deem the Web server a normal one and also treat this case as a false positive. The result of the testing phase is shown in Table 4. To calculate the false negative rate, we need to


Table 5: Features and classifiers used by four similar approaches.

| Approaches | Features | Classifier |
|---|---|---|
| Venkatesh and Nadarajan | ORT, RIO, PT, SYN, FIN, PSH | Neural network |
| Zhao et al. | PX, PPS, NR, APL, FPS, PV, FPH, TBP | Decision tree |
| Cai and Zou | SHH, CC, SCL, BIC, PR, DWS | Multilayer filter |
| WBD | AC, THR, PSS | Decision tree |

Table 6: Comparisons of features used by four approaches.

| Calculations | Venkatesh and Nadarajan | Zhao et al. | Cai and Zou | WBD |
|---|---|---|---|---|
| Counting specific packets | ORT, RIO, PT, SYN, FIN, PSH | PX, PPS, NR | - | - |
| Arithmetic based on packet size | - | APL, FPS, PV | SHH, CC, SCL | PSS |
| Arithmetic based on numbers of hosts | - | FPH | BIC | AC, THR |
| Arithmetic based on interpacket timing | - | TBP | PR | - |
| Host fingerprinting | - | - | DWS | - |

manually check 327214 Web servers to confirm the botnet Web servers among them. However, according to the phishing domain survey reports by McGrath et al. [13] and Aaron et al. [14], attackers usually do not use a compromised host for more than a couple of days. Because we are not able to check all the 327214 Web servers before attackers stop using the botnet Web servers hidden among them, we are currently not able to calculate the false negative rate of WBD. However, when comparing our results with a malicious IP list provided by ICST [15], we found that the majority of the botnet Web servers we found are not in the list, which shows that WBD provides system administrators with a list of previously unknown botnet Web servers.

4. Discussion

Some approaches aiming at detecting HTTP botnets have also been proposed in recent years. These approaches use various features to inspect network traffic. Three such approaches are selected and compared with WBD. Table 5 lists these approaches with their features and classifiers. Venkatesh and Nadarajan [16] proposed a multilayer feedforward neural network solution with six features, including the one-way ratio of TCP packets (ORT), the ratio of incoming to outgoing TCP packets (RIO), the proportion of TCP packets in the flow (PT), and counts of the TCP SYN, FIN, and PSH flags. These features require only counting specific packets, so they add relatively little performance overhead compared to the more complex features used by the other approaches. However, such counting-based features are simple and can be manipulated by the communicating parties, so botnet owners who are aware of them can bypass detection by deliberately changing the form of their communication packets. Zhao et al. [17] proposed a solution with eight features. Three of them are counting-based features: the number of packets exchanged (PX), the number of packets exchanged per second in a short time interval (PPS), and the number of reconnections (NR). Three of them involve arithmetic operations based on the packet payload size: the average payload packet length (APL), the variance of the payload packet length (PV), and the size of the first packet (FPS). Of the two remaining features, one is the ratio of the number of flows from an address to the total number of flows generated per hour (FPH), and the other is the average time interval between two consecutive packets (TBP). This approach has higher accuracy than that of Venkatesh and Nadarajan, but its performance overhead is higher because it involves more complicated features. Cai and Zou [10] proposed a solution with six features. Three of them require arithmetic operations based on the packet payload size: short HTTP header (SHH), constant content (CC), and short content length (SCL). One feature is related to bot IP clustering (BIC), one focuses on periodical requests (PR), and the last one requires host fingerprinting among Web servers to estimate the extent of diversified Web services (DWS). Although this approach discusses the features of HTTP botnets comprehensively and comes up with a set of complicated features suitable for precisely determining the existence of botnet communication, performance overhead remains a significant issue, because many complex features, especially the DWS feature, are involved in traffic inspection.

Based on the above discussion, we found that some kinds of calculations are commonly required by several of these four approaches. Table 6 summarizes them. Both the study of Venkatesh and Nadarajan and the study of Zhao et al. count specific packets. Both the study of Zhao et al. and the study of Cai and Zou have features that require arithmetic operations based on the packet payload size or on interpacket timing. Three approaches, including WBD, have features that require arithmetic operations based on the numbers of hosts and the packet size. However, WBD uses only three features, all of which require arithmetic operations based on the numbers of hosts and the packet payload size. Compared to the study of Venkatesh and Nadarajan and the study of Zhao et al., WBD does not need to count specific packets, so botnet owners have fewer opportunities to bypass WBD. Besides,


Table 7: Comparison among false positive rates of four similar approaches.

| Approaches | False positive rates of various test datasets |
|---|---|
| Venkatesh and Nadarajan | Spyeye-1 (0.97%), Spyeye-2 (0.98%), Zeus-1 (0.99%), Zeus-2 (0.96%) |
| Zhao et al. | BlackEnergy (0%), Weasel (8.2%) |
| Cai and Zou | SJTU1 (1.76%), SJTU2 (2.63%), QingPu (1.36%) |
| WBD | 0.42% |

unlike the study of Cai and Zou, WBD does not apply time-consuming features such as interpacket timing and host fingerprinting, so WBD has limited performance overhead and is able to complete classification in time.

To evaluate their effectiveness, each of these four approaches used its own datasets to obtain false positive rates. Table 7 lists the test datasets and the respective false positive rates. Venkatesh and Nadarajan used four datasets, and all of their false positive rates are under 1%. For Zhao et al., the false positive rate on the BlackEnergy dataset, which is a pure botnet traffic dataset, is 0%. The false positive rate on the Weasel dataset, which contains normal traffic, is 8.2%. The authors analyzed these 8.2% false positives (2902 false alerts) and found that all of the false alerts belong to six applications. They claimed that adopting a whitelist would reduce these false positives. The study of Cai and Zou used three datasets, and the false positive rates range from 13.6% to 26.3%. WBD used logs captured directly from backbone routers, and its false positive rate is 0.42%. Among the other three approaches, WBD has a lower false positive rate than the study of Venkatesh and Nadarajan and the study of Cai and Zou. In addition, WBD does not need a prebuilt whitelist to remove normal applications before detection.

4.1. False Positives. The total false positive rate of our study is 0.42%. This low rate results from the adoption of THR and PSS. In fact, many existing front-end Web applications may contact a Web server repeatedly. A Web-based instant messenger is a typical example, where Web clients contact their Web servers repeatedly. However, as mentioned in the previous paragraph, other related approaches may not distinguish a botnet from a Web server functioning as a Web-based instant messenger.

4.2. False Negatives. For the reasons described in this subsection, we are currently not able to discuss the false negatives of our work. According to the phishing domain surveys by McGrath et al. [13] and Aaron et al. [14], attackers usually do not use a compromised host for more than a couple of days. We are therefore not able to check all the Web servers classified as benign in our datasets before most attackers stop using the botnet Web servers hidden among this large number of Web servers. Most previous similar studies [10, 16, 17] evaluate their solutions on datasets containing specific botnets such as Spyeye [5] and Zeus [6] instead of real live traffic from the Internet. This means that these approaches may be accurate when applied to botnets with characteristics similar to Spyeye and Zeus, but their accuracy is not evaluated for other Web-based botnets. However, most botnet owners keep changing the attributes and characteristics of their botnets to avoid being detected. Another reason why measuring false negatives is sometimes impractical is that Web-based botnets may provide legal online Web services simultaneously. Mostly they act like normal Web services, and it is very difficult, if not impossible, to enumerate all Internet Web servers with such a characteristic. All the issues listed here lead to uncertainty in the discussion of false negatives. We will continue to study this issue in our future work.

4.3. Detection Evasion. Experimental results show that WBD is an effective solution for Web-based botnet detection. However, current Web-based botnets may change their designs to bypass WBD. For example, bot clients may connect to their C&C server at nonadjacent time intervals, or gibberish bytes of various lengths may be added to the response payloads of different bot clients to diversify the response lengths. However, such evasion methods create several drawbacks for the modified botnets. First, they make the design and operation of a botnet much more complicated, because the botnet needs to coordinate the actions of each bot client. Second, gibberish bytes increase network traffic. Besides, if different bot clients use the same URL but get Web pages with different lengths, this may be a sign that the related server is not a normal Web server. C&C servers may also apply the fast-flux domain technique to change their IP addresses frequently within a very short period of time. Botnets with this ability can theoretically bypass WBD, but at the price that all bots need to connect to and disconnect from different hosts frequently, which makes them much more detectable by system and network administrators. In the fast-flux literature, Hsu et al. [18] proposed an approach that detects fast-flux domains from a single host without using router traces. Hence, by integrating both kinds of approaches, we can create an effective method to detect various Web server-based botnets.
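
The observation that padded responses can themselves look abnormal can be sketched as follows. This is a hypothetical check, not part of WBD (which does not inspect URLs): the function name, the input tuples, and the `min_clients` threshold are assumptions, and legitimate dynamic pages would also trigger it, so it is at best a weak hint.

```python
from collections import defaultdict


def urls_with_divergent_lengths(responses, min_clients=3):
    """Return URLs for which every client received a different length.

    responses: iterable of (url, client, response_length) tuples
    (a hypothetical log layout).
    """
    per_url = defaultdict(dict)
    for url, client, length in responses:
        per_url[url][client] = length  # keep the latest length per client
    flagged = []
    for url, by_client in per_url.items():
        distinct_lengths = set(by_client.values())
        # Flag only when enough clients were seen and no two lengths agree.
        if len(by_client) >= min_clients and len(distinct_lengths) == len(by_client):
            flagged.append(url)
    return sorted(flagged)


observed = [
    ("/news", "A", 512), ("/news", "B", 512), ("/news", "C", 512),
    ("/cmd", "A", 130), ("/cmd", "B", 171), ("/cmd", "C", 202),
]
flagged = urls_with_divergent_lengths(observed)
```

In this toy input only `/cmd` is flagged, because its three clients each received a different response length for the same URL.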

The goal of this paper is to find botnet Web servers. However, during the detection we also obtain the hosts, that is, the bot clients, that connect to the botnet Web servers. Hence, in our future work, we will conduct a more detailed survey of the properties of bot clients. Moreover, this study also works for detecting botnet Web servers that communicate with their bot clients via HTTPS, because the detection relies only on unencrypted parts of IP packets instead of inspecting the payload content. The unencrypted parts include the source host, the destination host, and the payload size. Furthermore, the THR feature and the PSS feature are not changed by modifying the content that a botnet Web server sends to its bot clients. Hence, even if a bot master hides its commands inside a normal-looking Web page, WBD is still able to detect it. After detecting a list of C&C servers, we also surveyed the distribution of their locations. Figure 4 shows the locations of the 1085 C&C servers that WBD detected from testing dataset 2. The majority of them are located in the USA and Taiwan. Some of them are located in China, Singapore, the Philippines, and so on.

Figure 4: C&C server locations (number of detected servers, axis from 0 to 1000, by country: HK, DE, CN, IN, JP, KR, PH, SG, TW, US).

5. Related Work

Most previous studies aim at generic botnet detection. Gu et al. proposed several correlation-based detection solutions: BotMiner [19], BotSniffer [3], BotProbe [7], and BotHunter [20]. BotMiner is a well-known network-level, correlation-based, and protocol- and structure-independent solution. It clusters connection behavior (C-plane) and attack behavior (A-plane) and then performs cross-plane correlation to build a model of botnet C&C servers. BotMiner requires some real-world C&C server network traces in the training phase. However, such reliable C&C traces are not always available in practice. BotSniffer is also a correlation-based solution, able to detect C&C servers in a port-independent manner. It is composed of a protocol matcher, an activity/message response detector, and a correlation engine. The correlation engine runs group activity/message response analysis based on the outputs of the protocol matcher and the response detectors, without requiring other prior knowledge of the botnet C&C servers. Only a few packets are needed for training BotSniffer, and it also works well at detecting small botnets. BotProbe is a behavior-based solution that focuses specifically on the command-response pattern of a botnet and its deterministic behavior (for stateless bot clients). BotMiner, BotSniffer, and BotProbe have problems when the botnet attempts to evade detection. Possible evasions include using strong encryption, using atypical responses, and injecting random noise packets. For BotMiner in particular, a botnet can craft a specific evasion to bypass the C-plane and A-plane clustering. BotProbe assumes that the input is perspective and that the chat protocol between bots is available to the detection engine. BotHunter detects botnets based on bot-specific heuristics and IDS dialog-based correlation. The IDS dialogs represent different stages of the botnet life cycle. Such correlation can produce signatures for IDS systems. This IDS-driven strategy has a problem when detecting encrypted botnet communications. Furthermore, this solution shares the weaknesses of an IDS, and the signature generation and update problems must be overcome to reach ideal detection performance.

Yu et al. proposed SBotMiner [21], an approach based on large-scale network traffic filtering that aims at detecting search bots, which often perform suspicious search activities on the Internet. SBotMiner uses PCA (Principal Component Analysis) to separate bot traffic from benign user traffic. This approach suffers from noise queries, because search bots can generate many meaningless search activities to decrease the detection performance considerably. Karasaridis et al. proposed a wide-scale network traffic correlation solution [9]. However, it focuses only on IRC-based botnets and needs many kinds of prior knowledge before performing the correlation. Zand et al. proposed an approach [22] to automatically extract Command-and-Control signatures for detecting botnets. Since the signature generation is based on the extraction of frequent communication patterns, it is also not applicable to encrypted communication. Wang et al. proposed a fuzzy pattern-based filtering algorithm [23]. This algorithm depends on DNS query patterns, so a botnet, especially a Web-based botnet, can easily avoid the filtering by using IP addresses directly to communicate.

Some recent research aims at detecting decentralized peer-to-peer (P2P) botnets. Zhang et al. proposed an approach [24] for detecting P2P botnets. A decentralized architecture greatly increases survivability, because most botnet takedown actions target C&C servers. However, the decentralized architecture also has some critical disadvantages. P2P botnets often have a complex architecture. Hence, maintaining a P2P network demands significant technical effort. In addition, its non-client-server architecture makes it unsuitable for integration into existing Web services. Other early studies discussed and evaluated the scale of botnets and the techniques for taking them down. Both Dagon et al. [2] and Khattak et al. [25] discussed how different kinds of botnets are organized and what activities they may perform. Abu Rajab et al. focused on botnet scale evaluation [1], and Stone-Gross et al. addressed detailed issues of taking down a botnet [26]. Honeypots are often used to collect or observe malicious network traffic in early botnet research. However, honeypots usually do not provide outgoing communication. Therefore, they are not suitable for collecting botnet traffic. Nadji et al. proposed a system for botnet takedowns [27]. Such a takedown solution aims at stopping the DNS servers that function in the botnet communication. However, the C&C servers are able to reorganize rapidly using other DNS servers, since this approach targets deactivating the botnet communication, not removing the botnet C&C servers.

Most approaches mentioned so far may be able to detect Web-based botnets but are not specifically designed to do so. Since the majority of botnet owners seldom use their own hosts as C&C servers, they often use compromised hosts instead. In other words, there must be HTTP-supported malware on those compromised hosts performing C&C server-like operations. Perdisci et al. [28] addressed the concrete relationship between HTTP-supported malware and Web-based botnets. They also proposed an approach to detect HTTP-supported malware by using malicious network traces. This approach performs behavioral clustering of HTTP-supported malware samples by finding structural similarities among their sequences of HTTP requests. The results of behavioral clustering are used for generating signatures for an IDS. Since this approach looks into the sequences of HTTP requests, it cannot be used for HTTPS-based malware. In addition, it still suffers from evasions similar to those mentioned so far, including injecting noise sequences of HTTP requests and issuing HTTP requests in a time-triggered manner.

Many recent botnet studies focus on problems brought by new types of botnets that utilize currently popular Internet applications. For example, some of these papers aim at detecting botnets running on social networks. Kartaltepe et al. studied social network-based botnets [11]. Wang et al. proposed an approach [29] to detect DGA botnets by utilizing social network analysis. Venkatachalam and Anitha studied Stegobot [30], a kind of botnet that uses steganography to hide crucial information in digital images and then transmits the images over social networks. Ferrara et al. addressed the rise of botnets running on social networks in a recent article [31]. Botnets utilizing IoT and mobile devices have also been addressed recently by several conferences and projects. Bertino and Islam addressed issues related to botnets and Internet of Things (IoT) security [32]. The project in [33], conducted in 2017, is also motivated by Bertino and Islam and analyzes DDoS attacks via IoT botnets. Mobile devices suffer from vulnerabilities as well as untrusted firmware and are also vulnerable to botnet owners. Eslahi et al. unveiled MoBots [34], which represent botnets on mobile devices and networks. MoBots may use existing services such as SMS to communicate with their bot masters. This issue is critical to the telecommunication industry. Therefore, there is a related patent [35], filed in 2016, that discloses a method for SMS-based botnet detection. Social network-based botnets, IoT botnets, and mobile device-based botnets are not typical Web-based botnets. Because they must be able to communicate through multiple mechanisms, they are more complicated than traditional Web-based botnets.

6. Conclusion

This study proposes a solution called WBD to detect suspicious Web-based botnets, regardless of whether the botnet communication is encrypted or hidden in normal Web pages. We propose three features, two related to robot-like repeated contact clustering and one related to similar payload size, to detect the existence of botnet Web servers within network communication. Applying our solution to 500 GB of practical network traces, we found that the false positive rate of WBD is only 0.42%.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was funded by the projects of the Ministry of Science and Technology of Taiwan under no. 105-2221-E-008-074-MY3 and no. 106-3114-E-002-005.

References

[1] M. Abu Rajab, J. Zarfoss, F. Monrose, and A. Terzis, "A multifaceted approach to understanding the botnet phenomenon," in Proceedings of the 6th ACM SIGCOMM on Internet Measurement Conference, IMC 2006, pp. 41–52, Brazil, October 2006.

[2] D. Dagon, G. Gu, C. P. Lee, and W. Lee, "A taxonomy of botnet structures," in Proceedings of the 23rd Annual Computer Security Applications Conference, ACSAC 2007, pp. 325–338, Miami Beach, Fla, USA, December 2007.

[3] G. Gu, J. Zhang, and W. Lee, "BotSniffer: Detecting botnet command and control channels in network traffic," in NDSS, The Internet Society, 2008.

[4] FBI, Botnets 101: What They Are and How to Avoid Them, 2013.

[5] A. K. Sood, R. J. Enbody, and R. Bansal, "Dissecting SpyEye - understanding the design of third generation botnets," Computer Networks, vol. 57, no. 2, pp. 436–450, 2013.

[6] H. Binsalleeh, T. Ormerod, A. Boukhtouta, et al., "On the analysis of the Zeus botnet crimeware toolkit," in Proceedings of the 2010 8th International Conference on Privacy, Security and Trust, PST 2010, pp. 31–38, Canada, August 2010.

[7] G. Gu, V. Yegneswaran, P. Porras, J. Stoll, and W. Lee, "Active botnet probing to identify obscure command and control channels," in Proceedings of the 25th Annual Computer Security Applications Conference, ACSAC 2009, pp. 241–253, Honolulu, Hawaii, USA, December 2009.

[8] C. Livadas, R. Walsh, D. Lapsley, and W. T. Strayer, "Using machine learning techniques to identify botnet traffic," in Proceedings of the 31st Annual IEEE Conference on Local Computer Networks (LCN '06), pp. 967–974, Tampa, Fla, USA, November 2006.

[9] A. Karasaridis, B. Rexroad, and D. Hoeflin, "Wide-scale botnet detection and characterization," in Proceedings of the First Conference on First Workshop on Hot Topics in Understanding Botnets, HotBots'07, 7 pages, USENIX Association, Berkeley, Calif, USA, 2007.

[10] T. Cai and F. Zou, "Detecting HTTP botnet with clustering network traffic," in Proceedings of the 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '12), pp. 1–7, September 2012.

[11] E. J. Kartaltepe, J. A. Morales, S. Xu, and R. Sandhu, "Social network-based botnet command-and-control: emerging threats and countermeasures," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6123, pp. 511–528, 2010.

[12] Cisco, NetFlow Services Solutions Guide, 2007.

[13] Behind Phishing: An Examination of Phisher Modi Operandi.

[14] Global Phishing Survey: Trends and Domain Name Use in 2H2009.

[15] Information and Communication Security Technology Center.

[16] G. K. Venkatesh and R. A. Nadarajan, "HTTP botnet detection using adaptive learning rate multilayer feed-forward neural network," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 7322, pp. 38–48, 2012.

[17] D. Zhao, I. Traore, B. Sayed, et al., "Botnet detection based on traffic behavior analysis and flow intervals," Computers & Security, vol. 39, pp. 2–16, 2013.

[18] F.-H. Hsu, C.-S. Wang, C.-H. Hsu, C.-K. Tso, L.-H. Chen, and S.-H. Lin, "Detect fast-flux domains through response time differences," IEEE Journal on Selected Areas in Communications, vol. 32, no. 10, pp. 1947–1956, 2014.

[19] G. Gu, R. Perdisci, J. Zhang, and W. Lee, "BotMiner: Clustering analysis of network traffic for protocol- and structure-independent botnet detection," in Proceedings of the 17th Conference on Security Symposium, SS'08, pp. 139–154, USENIX Association, Berkeley, Calif, USA, 2008.

[20] G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee, "BotHunter: Detecting malware infection through IDS-driven dialog correlation," in Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, SS'07, pp. 12:1–12:16, Berkeley, Calif, USA, 2007.

[21] F. Yu, Y. Xie, and Q. Ke, "SBotMiner: Large scale search bot detection," in Proceedings of the 3rd ACM International Conference on Web Search and Data Mining, WSDM 2010, pp. 421–430, USA, February 2010.

[22] A. Zand, G. Vigna, X. Yan, and C. Kruegel, "Extracting probable command and control signatures for detecting botnets," in Proceedings of the 29th Annual ACM Symposium on Applied Computing, SAC 2014, pp. 1657–1662, Republic of Korea, March 2014.

[23] K. Wang, C. Huang, S. Lin, and Y. Lin, "A fuzzy pattern-based filtering algorithm for botnet detection," Computer Networks, vol. 55, no. 15, pp. 3275–3286, 2011.

[24] J. Zhang, R. Perdisci, W. Lee, X. Luo, and U. Sarfraz, "Building a scalable system for stealthy P2P-botnet detection," IEEE Transactions on Information Forensics and Security, vol. 9, no. 1, pp. 27–38, 2014.

[25] S. Khattak, N. R. Ramay, K. R. Khan, A. A. Syed, and S. A. Khayam, "A taxonomy of botnet behavior, detection, and defense," IEEE Communications Surveys & Tutorials, vol. 16, no. 2, pp. 898–924, 2014.

[26] B. Stone-Gross, M. Cova, L. Cavallaro, et al., "Your botnet is my botnet: Analysis of a botnet takeover," in Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS'09, pp. 635–647, New York, NY, USA, November 2009.

[27] Y. Nadji, M. Antonakakis, R. Perdisci, D. Dagon, and W. Lee, "Beheading hydras: Performing effective botnet takedowns," in Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS 2013, pp. 121–132, Germany, November 2013.

[28] R. Perdisci, W. Lee, and N. Feamster, "Behavioral clustering of HTTP-based malware and signature generation using malicious network traces," in Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI'10, 26 pages, Berkeley, Calif, USA, 2010.

[29] T.-S. Wang, C.-S. Lin, and H.-T. Lin, "DGA botnet detection utilizing social network analysis," in Proceedings of the 2016 IEEE International Symposium on Computer, Consumer and Control, IS3C 2016, pp. 333–336, China, July 2016.

[30] N. Venkatachalam and R. Anitha, "A multi-feature approach to detect Stegobot: a covert multimedia social network botnet," Multimedia Tools and Applications, vol. 76, no. 4, pp. 6079–6096, 2017.

[31] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, "The rise of social bots," Communications of the ACM, vol. 59, no. 7, pp. 96–104, 2016.

[32] E. Bertino and N. Islam, "Botnets and internet of things security," The Computer Journal, vol. 50, no. 2, Article ID 7842850, pp. 76–79, 2017.

[33] R. Hallman, J. Bryan, G. Palavicini, J. Divita, and J. Romero-Mariona, IoDDoS: The internet of distributed denial of service attacks, 2017.

[34] M. Eslahi, R. Salleh, and N. B. Anuar, "MoBots: A new generation of botnets on mobile devices and networks," in Proceedings of the 2012 IEEE Symposium on Computer Applications and Industrial Electronics, ISCAIE 2012, pp. 262–266, Malaysia, December 2012.

[35] C. Adams, SMS botnet detection on mobile devices, May 24, 2016, US Patent 9,351,167.




Web-based botnets can easily provide stable and qualified client-to-server connectivity. Third, many promising solutions [1, 7–9] have been developed to precisely detect traditional IRC-based botnets instead of Web-based botnets. Therefore, HTTP has gradually become an ideal alternative protocol for botnet owners to use as the communication channel in recent years, and our study focuses on this kind of botnet.

1.1. Web-Based Botnet Detection. Bot clients are trojans, executable programs, or scripts running on compromised hosts. Hence, their behavior is different from human user behavior. Besides, their activity patterns, sizes, and transferred content are also different from those of human users. As programs generate the communication traffic automatically, Web-based botnet communication has some prominent characteristics. According to our preliminary survey of a botnet taxonomy study [2], a typical bot client of a centralized C&C botnet often needs to synchronize with its botnet Web server to retrieve commands or deliver execution results. Such synchronization is often scheduled when bot clients are effectively controlled by botnet Web servers. Hence, we think that this synchronization can be used as a hint to indicate whether Web clients are controlled by human users or by bot clients. Besides, we also found that if a group of Web clients associated with a Web server consists of human users, each of them often has a different access pattern to the Web server. For example, these human clients may visit the Web server at different times of day, or they may visit the Web server a different number of times each day. On the contrary, if these Web clients are bot clients that run programs or scripts, they may act together and behave similarly. Therefore, they may contact their botnet Web server repeatedly at a predefined time interval to retrieve commands. Such repeated contact with certain botnet Web servers may continue for several days, which is clearly different from normal human behavior. In addition, the same group of bot clients usually tends to communicate with the same botnet Web server. Based on this long-term repeated contact and the similar access patterns of the clients of a Web server, we use a metric named Total Host Repetition Rate, or THR for short, as one of our criteria to examine whether a Web server is a suspicious botnet C&C server.
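
The repeated-contact intuition can be illustrated with a small Python sketch. It is not the paper's THR formula: it assumes that the clients of one Web server have already been grouped into consecutive time slots and simply measures how many clients of each slot were also present in the previous slot. The function name and the sample host groups are hypothetical.

```python
def repetition_rate(host_groups):
    """Mean fraction of clients in each slot that also appeared in the previous slot.

    host_groups: list of sets of client identifiers, one set per time slot,
    all for the same Web server (an assumed input layout).
    """
    pairs = list(zip(host_groups, host_groups[1:]))
    if not pairs:
        return 0.0
    ratios = [
        len(previous & current) / len(current) if current else 0.0
        for previous, current in pairs
    ]
    return sum(ratios) / len(ratios)


# The same three clients return in every slot: bot-like behavior.
bot_like = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"}]
# Mostly different visitors in each slot: human-like behavior.
human_like = [{"A", "B", "C"}, {"A", "D"}, {"E", "F"}]

bot_rate = repetition_rate(bot_like)
human_rate = repetition_rate(human_like)
```

With these toy inputs the bot-like server scores 1.0 and the human-like server 0.25, which matches the expectation that a stable set of clients contacting a server on a schedule stands out from ordinary browsing.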

In addition to THR, we also found that the payload of the traffic between bot clients and their botnet Web server usually contains short and simple commands. Furthermore, all bot clients commanded by a certain C&C server tend to receive commands at the same time. This similarity of payloads among the bot clients controlled by the same botnet Web server is also described as the command-response pattern by BotProbe [7]. A normal Web server usually contains many Web pages, and different users access different Web pages. Hence, unless the Web server contains only one Web page, the probability that its users retrieve the same Web page from the Web server simultaneously is low, and different Web pages usually have different sizes. As a result, during a period of time, the sizes of the response payloads of different Web clients accessing the same Web server are expected to differ, while a botnet Web server often dispatches similar commands to its bot clients at the same time. Thus, we use the payload size difference as a metric called the payload size similarity, or PSS for short, to judge whether a Web server is a suspicious botnet C&C server. The formalization of these two metrics is discussed later in Section 2.2.
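
A minimal Python sketch of the payload size similarity idea follows. It assumes one plausible reading of the metric: in each time slot, take the largest group of clients that received the same payload size, sum these group sizes over all slots, and divide by the total number of client contacts. The function name, the input layout, and the sample sizes are assumptions for illustration only.

```python
from collections import Counter


def payload_size_similarity(slots):
    """Share of client contacts that fall into the largest same-size group per slot.

    slots: list of dicts, one per time slot, mapping client -> response
    payload size in bytes (an assumed input layout).
    """
    total_contacts = sum(len(slot) for slot in slots)
    if total_contacts == 0:
        return 0.0
    largest_groups = sum(
        max(Counter(slot.values()).values()) for slot in slots if slot
    )
    return largest_groups / total_contacts


# Bots receive the same short command in most slots.
bot_slots = [{"A": 120, "B": 120, "C": 120}, {"A": 120, "B": 120, "C": 64}]
# Human visitors fetch pages of different sizes.
human_slots = [{"A": 5120, "B": 812, "C": 20480}, {"A": 812, "D": 3300}]

pss_bot = payload_size_similarity(bot_slots)
pss_human = payload_size_similarity(human_slots)
```

Only payload sizes are used, so the same computation applies when the payload itself is encrypted.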

In our prototype implementation, we designed an automatic mechanism based on the above two metrics and integrated it into our prototype system, named Web-based Botnet Detector, or WBD for short, to perform the inspection. WBD is attached to a network traffic monitoring system, which generates traffic logs from the online network stream, and analyzes these logs simultaneously. WBD requires only a few arithmetic calculations while performing runtime inspection of the monitored traffic logs. In other similar approaches, such calculations bring significant overhead during traffic inspection.
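
As a sketch of the kind of preprocessing such a log-based detector needs, the following Python function buckets traffic log records by Web server and time slot. The record layout, the one-hour slot length, and the function name are assumptions; the paper's actual log format is not reproduced here.

```python
from collections import defaultdict


def group_by_server_and_slot(records, slot_seconds=3600):
    """Group log records into per-server, per-slot client maps.

    records: iterable of (timestamp_seconds, client_ip, server_ip, payload_size)
    tuples (a hypothetical log layout). Returns a dict mapping each server to a
    list of {client: payload_size} dicts ordered by time slot. Slots without
    traffic are omitted rather than represented as empty dicts.
    """
    groups = defaultdict(lambda: defaultdict(dict))
    for timestamp, client, server, size in records:
        slot = int(timestamp // slot_seconds)
        groups[server][slot][client] = size  # keep the last size seen in the slot
    return {
        server: [slots[key] for key in sorted(slots)]
        for server, slots in groups.items()
    }


log = [
    (10, "c1", "s1", 120),
    (20, "c2", "s1", 120),
    (3700, "c1", "s1", 120),
    (15, "c3", "s2", 900),
]
grouped = group_by_server_and_slot(log)
```

Each record is touched once and only source address, destination address, timestamp, and payload size are needed, which is consistent with the claim that the inspection involves only a few arithmetic operations.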

1.2. Contributions. The solution of this paper has the following characteristics. (1) Compared to mainstream machine learning approaches, which often rely heavily on tens or even hundreds of features, an approach with only a few features can notably reduce overhead. WBD requires only several deterministic calculations, which are easily extracted and calculated from monitored network traffic. (2) WBD inspects the traffic of backbone networks. It does not require any program to be installed on network end-hosts or servers. (3) WBD does not use features based on traffic content mining. It does not rely on particular protocol parsing either. In summary, the contributions of WBD include the following:

(1) WBD requires only several deterministic calculations, which means that it is well suited to working with heavily loaded backbone equipment.

(2) We conducted large-scale backbone data inspection for this study. It reveals the IP addresses and timestamps of Web servers that generate suspicious Web-based botnet communications across the global Internet.

(3) Because our solution depends little on the content itself, it can in theory target the HTTPS protocol. Also, botnet owners may deliberately embed their botnet commands into normal traffic, such as universal Web content, to bypass potential inspection along the traffic path. Our solution works in this situation because the calculation of THR and PSS requires the source and destination IP addresses of the packets in the traffic, which are not encrypted by most secure protocols.

This paper includes six sections. Section 2 explains how we use features calculated from these criteria for the datagram-like network traffic logs. Sections 3 and 4 evaluate our approach and discuss issues including comparison with other similar approaches and accuracy. Section 5 describes previous studies on botnet-related issues. Section 6 summarizes this study.







\[
AC_s = \sum_{t=1}^{n} \left| \mathrm{hostgroup}_{s,t} \right|, \qquad
THR_s = \frac{1}{n} \sum_{t=1}^{n} \ldots \tag{1}
\]

\[
PSS_s = \frac{\sum_{t=1}^{n} \max_{i=1}^{k_t} \left( \left| PS_{t,i} \right| \right)}{\sum_{t=1}^{n} \left| \mathrm{hostgroup}_{s,t} \right|}
      = \frac{\sum_{t=1}^{n} \max_{i=1}^{k_t} \left( \left| PS_{t,i} \right| \right)}{AC_s} \tag{2}
\]

\[
\ldots (THR_i) \ldots (AC_i) \tag{3}
\]

\[
\ldots = \begin{cases} \ldots \\ \text{False}, & \text{otherwise} \end{cases} \tag{4}
\]

\[
\ldots \tag{5}
\]

\[
\mathrm{Susp\ C\&C}(i) = \ldots \tag{6}
\]


3 Evaluation























4 Discussion
















0

250

500

750

1000



5 Related Work









6 Conclusion





Acknowledgments


References




































RoboticsJournal of





Journal of



RotatingMachinery



Journal of

Volume 201


VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation










A B C A B D E F D

Web server 1234

Web server 1234

Web server 1234



2 Methodology






C

Web server 1234

Web client

CACA

CB CBCD CD

CECFCC

S1

S1

S1 S1

T1 T2 T3


Host group 1


Web client

CA

CB

CDCE

CG

CF

C

CH

CC

S1

S1







AC119904 =119899

sum119905=1

10038161003816100381610038161003816hostgroup11990411990510038161003816100381610038161003816

THR119904 =1119899119899

sum119905=1


10038161003816100381610038161003816

(1)


PSS119904 =sum119899119905=1

max119896119905119894=1(10038161003816100381610038161003816PS11990511989410038161003816100381610038161003816)

sum119899119905=1

10038161003816100381610038161003816hostgroup11990411990510038161003816100381610038161003816=sum119899119905=1

max119896119905119894=1(10038161003816100381610038161003816PS11990511989410038161003816100381610038161003816)

AC119904 (2)



(THR119894)


(AC119894) (3)




=


False otherwise

(4)




(5)


Susp CampC (119894)

=


(6)


3 Evaluation























4 Discussion
















0

250

500

750

1000



5 Related Work









6 Conclusion





Acknowledgments


References




































RoboticsJournal of





Journal of



RotatingMachinery



Journal of

Volume 201


VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation










C

Web server 1234

Web client

CACA

CB CBCD CD

CECFCC

S1

S1

S1 S1

T1 T2 T3


Host group 1


Web client

CA

CB

CDCE

CG

CF

C

CH

CC

S1

S1







AC119904 =119899

sum119905=1

10038161003816100381610038161003816hostgroup11990411990510038161003816100381610038161003816

THR119904 =1119899119899

sum119905=1


10038161003816100381610038161003816

(1)


PSS119904 =sum119899119905=1

max119896119905119894=1(10038161003816100381610038161003816PS11990511989410038161003816100381610038161003816)

sum119899119905=1

10038161003816100381610038161003816hostgroup11990411990510038161003816100381610038161003816=sum119899119905=1

max119896119905119894=1(10038161003816100381610038161003816PS11990511989410038161003816100381610038161003816)

AC119904 (2)



(THR119894)


(AC119894) (3)




=


False otherwise

(4)




(5)


Susp CampC (119894)

=


(6)


3 Evaluation























4 Discussion
















0

250

500

750

1000



5 Related Work









6 Conclusion





Acknowledgments


References























































































































