
June 2016
Volume 107 No. 2
www.saiee.org.za

Africa Research Journal
ISSN 1991-1696

Research Journal of the South African Institute of Electrical Engineers
Incorporating the SAIEE Transactions


(SAIEE FOUNDED JUNE 1909 INCORPORATED DECEMBER 1909)
AN OFFICIAL JOURNAL OF THE INSTITUTE

ISSN 1991-1696

Secretary and Head Office
Mrs Gerda Geyer
South African Institute for Electrical Engineers (SAIEE)
PO Box 751253, Gardenview, 2047, South Africa
Tel: (27-11) 487-3003
Fax: (27-11) 487-3002
E-mail: [email protected]

SAIEE AFRICA RESEARCH JOURNAL

ARTICLES SUBMITTED TO THE SAIEE AFRICA RESEARCH JOURNAL ARE FULLY PEER REVIEWED PRIOR TO ACCEPTANCE FOR PUBLICATION

Additional reviewers are approached as necessary.

The following organisations have listed SAIEE Africa Research Journal for abstraction purposes:

INSPEC (The Institution of Electrical Engineers, London); 'The Engineering Index' (Engineering Information Inc.)

Unless otherwise stated on the first page of a published paper, copyright in all materials appearing in this publication vests in the SAIEE. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, magnetic tape, mechanical, photocopying, recording or otherwise, without permission in writing from the SAIEE. Notwithstanding the foregoing, permission is not required to make abstracts on condition that a full reference to the source is shown. Single copies of any material in which the Institute holds copyright may be made for research or private use purposes without reference to the SAIEE.

EDITORS AND REVIEWERS

EDITOR-IN-CHIEF
Prof. B.M. Lacquet, Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, SA, [email protected]

MANAGING EDITOR
Prof. S. Sinha, Faculty of Engineering and the Built Environment, University of Johannesburg, SA, [email protected]

SPECIALIST EDITORS

Communications and Signal Processing:
Prof. L.P. Linde, Dept. of Electrical, Electronic & Computer Engineering, University of Pretoria, SA
Prof. S. Maharaj, Dept. of Electrical, Electronic & Computer Engineering, University of Pretoria, SA
Dr O. Holland, Centre for Telecommunications Research, London, UK
Prof. F. Takawira, School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, SA
Prof. A.J. Han Vinck, University of Duisburg-Essen, Germany
Dr E. Golovins, DCLF Laboratory, National Metrology Institute of South Africa (NMISA), Pretoria, SA

Computer, Information Systems and Software Engineering:
Dr M. Weststrate, Newco Holdings, Pretoria, SA
Prof. A. van der Merwe, Department of Informatics, University of Pretoria, SA
Dr C. van der Walt, Modelling and Digital Science, Council for Scientific and Industrial Research, Pretoria, SA
Prof. B. Dwolatzky, Joburg Centre for Software Engineering, University of the Witwatersrand, Johannesburg, SA

Control and Automation:
Prof. K. Uren, School of Electrical, Electronic and Computer Engineering, North-West University, SA
Dr J.T. Valliarampath, freelancer, SA
Dr B. Yuksel, Advanced Technology R&D Centre, Mitsubishi Electric Corporation, Japan
Prof. T. van Niekerk, Dept. of Mechatronics, Nelson Mandela Metropolitan University, Port Elizabeth, SA

Electromagnetics and Antennas:
Prof. J.H. Cloete, Dept. of Electrical and Electronic Engineering, Stellenbosch University, SA
Prof. T.J.O. Afullo, School of Electrical, Electronic and Computer Engineering, University of KwaZulu-Natal, Durban, SA
Prof. R. Geschke, Dept. of Electrical and Electronic Engineering, University of Cape Town, SA
Dr B. Jokanović, Institute of Physics, Belgrade, Serbia

Electron Devices and Circuits:
Dr M. Božanić, Azoteq (Pty) Ltd, Pretoria, SA
Prof. M. du Plessis, Dept. of Electrical, Electronic & Computer Engineering, University of Pretoria, SA
Dr D. Foty, Gilgamesh Associates, LLC, Vermont, USA

Energy and Power Systems:
Prof. M. Delimar, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia

Engineering and Technology Management:
Prof. J-H. Pretorius, Faculty of Engineering and the Built Environment, University of Johannesburg, SA

Prof. L. Pretorius, Dept. of Engineering and Technology Management, University of Pretoria, SA

Engineering in Medicine and Biology:
Prof. J.J. Hanekom, Dept. of Electrical, Electronic & Computer Engineering, University of Pretoria, SA
Prof. F. Rattay, Vienna University of Technology, Austria
Prof. B. Bonham, University of California, San Francisco, USA

General Topics / Editors-at-large:
Dr P.J. Cilliers, Hermanus Magnetic Observatory, Hermanus, SA
Prof. M.A. van Wyk, School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, SA

INTERNATIONAL PANEL OF REVIEWERS
W. Boeck, Technical University of Munich, Germany
W.A. Brading, New Zealand
Prof. G. De Jager, Dept. of Electrical Engineering, University of Cape Town, SA
Prof. B. Downing, Dept. of Electrical Engineering, University of Cape Town, SA
Dr W. Drury, Control Techniques Ltd, UK
P.D. Evans, Dept. of Electrical, Electronic & Computer Engineering, The University of Birmingham, UK
Prof. J.A. Ferreira, Electrical Power Processing Unit, Delft University of Technology, The Netherlands
O. Flower, University of Warwick, UK
Prof. H.L. Hartnagel, Dept. of Electrical Engineering and Information Technology, Technical University of Darmstadt, Germany
C.F. Landy, Engineering Systems Inc., USA
D.A. Marshall, ALSTOM T&D, France
Dr M.D. McCulloch, Dept. of Engineering Science, Oxford, UK
Prof. D.A. McNamara, University of Ottawa, Canada
M. Milner, Hugh MacMillan Rehabilitation Centre, Canada
Prof. A. Petroianu, Dept. of Electrical Engineering, University of Cape Town, SA
Prof. K.F. Poole, Holcombe Dept. of Electrical and Computer Engineering, Clemson University, USA
Prof. J.P. Reynders, Dept. of Electrical & Information Engineering, University of the Witwatersrand, Johannesburg, SA
I.S. Shaw, University of Johannesburg, SA
H.W. van der Broeck, Philips Forschungslabor Aachen, Germany
Prof. P.W. van der Walt, Stellenbosch University, SA
Prof. J.D. van Wyk, Dept. of Electrical and Computer Engineering, Virginia Tech, USA
R.T. Waters, UK
T.J. Williams, Purdue University, USA

Published by
South African Institute of Electrical Engineers (Pty) Ltd, PO Box 751253, Gardenview, 2047
Tel: (27-11) 487-3003, Fax: (27-11) 487-3002, E-mail: [email protected]

President: Mr André Hoffmann
Deputy President: Mr T.C. Madikane
Senior Vice President: Mr J. Machinjike
Junior Vice President: Dr H. Geldenhuys
Immediate Past President: Dr Pat Naidoo
Honorary Vice President: Mr Max Clarke


VOL 107 No 2
June 2016

SAIEE Africa Research Journal

Characterization and Analysis of NTP Amplifier Traffic ............ 54
L. Rudman and B. Irwin

Detecting Derivative Malware Samples using Deobfuscation-Assisted Similarity Analysis ................................ 65
P. Wrench and B. Irwin

A Management Model for Building a Computer Security Incident Response Capability .................................................................... 78
R.D. Mooi and R.A. Botha

Reference Architecture for Android Applications to Support the Detection of Manipulated Evidence ............................................ 92
H. Pieterse, M.S. Olivier and R.P. Van Heerden

Using a Standard Approach to the Design of Next Generation E-Supply Chain Digital Forensic Readiness Systems ................. 104
D.J.E. Masvosvere and H.S. Venter

SAIEE AFRICA RESEARCH JOURNAL EDITORIAL STAFF ...................... IFC


GUEST EDITORIAL

INFORMATION SECURITY SOUTH AFRICA (ISSA) 2015

This special issue of the SAIEE Africa Research Journal is devoted to selected papers from the Information Security South Africa (ISSA) 2015 Conference, which was held in Johannesburg, South Africa from 12 to 13 August 2015. The aim of the annual ISSA conference is to afford information security practitioners and researchers, from all over the globe, an opportunity to share their knowledge and research results with their peers. The 2015 conference focused on a wide spectrum of aspects in the information security domain, including the functional, business, managerial, human, theoretical and technological aspects of modern-day information security.

With the assistance of the original reviewers, ten conference papers that had received good overall reviews were identified. I attended the presentation of each of these papers, and based on the reviewer reports and the presentations, eight papers were selected for possible publication in this special edition. The authors of these eight papers were asked to rework them by expanding and/or further formalizing the research conducted. Each paper was subsequently reviewed again by a minimum of three reputable international subject specialists, and these reviews formed the basis for a confident decision as to the inclusion of each paper in the special edition.

After the review process was completed, including attending to the reviewers' suggestions, only five papers were selected to be published in this special edition. These five papers cover various aspects of information security, including the manipulation of timestamps on Android smartphones, e-supply chain forensic readiness systems, code hiding and malware. This special edition thus includes five rather diverse papers in the discipline of information security, providing a true reflection of the multidisciplinary nature of this field of study.

I would like to thank the expert reviewers who diligently reviewed these papers. These reviews certainly contributed to the quality of this special edition.

To conclude, I would like to express my appreciation to IEEE Xplore, which originally published the ISSA conference papers, for granting permission for these reworked papers to be published in this special edition.

Prof. Stephen V. Flowerday Guest Editor



CHARACTERIZATION AND ANALYSIS OF NTP AMPLIFIER TRAFFIC

L. Rudman∗ and B. Irwin†

∗ Security and Networks Research Group, Dept. of Computer Science, Rhodes University, Grahamstown 6139, South Africa. E-mail: [email protected]
† Security and Networks Research Group, Dept. of Computer Science, Rhodes University, Grahamstown 6139, South Africa. E-mail: [email protected]

Abstract: Network Time Protocol (NTP) based DDoS attacks saw a lot of popularity throughout 2014. This paper presents the characterization and analysis of two large datasets containing packets from NTP-based DDoS attacks captured in South Africa. Using a series of Python-based tools, the datasets are analysed according to specific parts of the packet headers, including the source IP address and Time-to-Live (TTL) values. The analysis found the top source addresses and examined the TTL values observed for each address. These TTL values can be used to infer the probable operating system or DDoS attack tool used by an attacker. We found that the TTL values seen for an address can indicate the number of hosts attacking the address, or indicate minor routing changes. The Time-to-Live values are then analysed as a whole to find the total number used throughout each attack. The most frequent TTL values are identified, and the majority of them indicate that the attackers are using an initial TTL of 255. This value can indicate the use of a certain DDoS tool that creates packets with that exact initial TTL. The TTL values are then put into groups that can show the number of IP addresses a group of hosts is targeting. The paper discusses our work with two brief case studies correlating observed data to real-world attacks, and the observable impact thereof.

Key words: denial of service, network security, network time protocol.

1. INTRODUCTION

Distributed Reflection Denial of Service (DRDoS) attacks using Network Time Protocol (NTP) servers gained popularity in late 2013 and continued to be a factor in a number of major attacks in the first half of 2014 [1]. The Network Time Protocol is used to distribute accurate time information to networked computers [2]. There are many public NTP servers throughout the Internet that are used by legitimate client systems in order to synchronize system clocks. An NTP server which is exploitable in this type of attack allows the use of the MONLIST command. This command returns up to the last 600 client IP addresses that have queried an NTP server, and has traditionally been used as part of the NTP protocol suite's operational debugging [3]. Vulnerable NTP servers can thus provide a high degree of amplification, as the MONLIST request packet is significantly smaller than the reply packet(s) generated. The MONLIST request UDP packet size is around 64 bytes, and the reply "can be magnified to 100 responses of 482 bytes each" [4], thus providing potentially large amplification both in terms of bytes (Byte Amplification Factor, BAF) and packets (Packet Amplification Factor, PAF).
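To make the amplification scale concrete, the following minimal sketch computes the byte and packet amplification factors implied by the sizes quoted above. The figures are the approximations given in [4], not measurements from the datasets analysed later in this paper:

# Approximate NTP MONLIST amplification, assuming a 64-byte request
# and up to 100 response packets of 482 bytes each (sizes as quoted
# above; actual values vary with the state of the server's monitor list).
REQUEST_BYTES = 64
REPLY_PACKETS = 100
REPLY_BYTES_EACH = 482

def amplification_factors():
    """Return (BAF, PAF) implied by the sizes above."""
    baf = REPLY_PACKETS * REPLY_BYTES_EACH / REQUEST_BYTES
    paf = REPLY_PACKETS  # one request elicits up to 100 replies
    return baf, paf

print(amplification_factors())  # (753.125, 100): roughly 753x bytes, 100x packets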

Combined with the ease of spoofing the source of UDP traffic, this amplification makes NTP servers an ideal resource for DDoS attacks [5]. As seen in Figure 1, the attack is carried out by sending NTP MONLIST requests, with a spoofed source address of the intended target of the attack, to a vulnerable NTP server on port 123/udp [6]. The server then sends the replies to the spoofed IP address (the victim), which is then flooded with large volumes of traffic. This in turn can have further impact on systems beyond just bandwidth exhaustion, as receivers need to process datagrams which are not necessarily of the correct protocol, depending on the spoofed source port used.

Figure 1: Distributed Denial of Service attack using NTP servers as reflectors

In [6], it was stated that from January to February of 2014 the number of NTP amplification attacks had increased considerably, with one of these attacks reaching just below 400 Gbps. It was reported as being the largest attack recorded using NTP. In early 2014 there were more than 430 000 vulnerable NTP servers [7]. By April 2014, Arbor Networks released data that showed that 85% of DDoS attacks above 100 Gbps were using NTP amplification [7]. However, by June 2014 this number decreased to around 17 647 vulnerable servers, largely due to the application of patches and configuration changes by network administrators [4]. A report released by Arbor Networks in October 2014 showed that NTP amplification based attacks are decreasing, with a little over 50% of incidents in excess of 100 Gbps using this protocol [8].

The problem with this class of attacks is that the real address of an attacker is never used, due to the spoofing of the source IP address [9].

Based on: "Characterization and Analysis of NTP Amplification Based DDoS Attacks", by L. Rudman and B. Irwin, which appeared in the Proceedings of Information Security South Africa (ISSA) 2015, Johannesburg, 12 & 13 August 2015. © 2015 IEEE


More detailed analysis of such attacks is therefore important in order to gain information which may be valuable in mitigating or finding the source of an attack. In this paper a number of characteristics of the attacks are examined. The primary focus, however, relates to the observed TTL values within the IP header. The analysis of these has shown that they can be used to provide insight into how many hosts are being used to generate request packets, or where in the world they may be.

The remainder of this paper is structured as follows. Section 2 looks at related research in the area of Denial of Service, and NTP based Denial of Service attacks in particular. The data sources used and the analysis process undertaken are described in Section 3. Results of the analysis of the two datasets are presented in Sections 4 and 5 respectively. Common characteristics of the two datasets and two brief case studies arising out of the analysis of the combined data are presented in Sections 6 and 7. The paper concludes with Section 8, which also considers possible future work.

2. RELATED WORK

Czyz et al. [10] reported on their analysis of NTP DDoS attacks on a global scale. This was achieved by looking at the rise of NTP amplification attacks, how many amplifiers there were, and their amplification scale. The victims of the attacks were found by looking at the source port of the original attack packets. It was found that most targets were related to online gaming, with victims including Minecraft, Runescape and the Microsoft Xbox Live service. The most popular source port was port 80/udp which, as they stated, may have been used to target games that use this port, or websites. When classifying the number of attacks that occurred throughout a 15-week period, while monitoring a number of amplifiers, a simplification was used by classifying each unique targeted IP in a week-long sample as one attack. This simplification does not take into account attacks targeting network blocks, or a single IP hosting multiple sites.

In relation to TTL analysis, it was determined in [10] that most of the attack traffic from a Colorado State University dataset appeared to originate from Windows-based machines, and that these were probably computers in a botnet. This is because the mode of the IPv4 TTL field was observed to be 109, and the default initial TTL set on Microsoft Windows platforms is 128. What these researchers failed to mention was that the attackers could have been using a DDoS tool that set the initial TTL to 128 or slightly above.

3. RESEARCH

At the time of the research being conducted in early 2014, there had not been much in-depth research into the interplay of TTL values and NTP DDoS. The driver behind this research was to investigate a number of packet characteristics relating to the observed TTL values of recorded inbound traffic. In addition, the source IP address, source port, UDP header size and IP datagram size were also analyzed. The purpose of this was to determine the victims of the attacks in a similar manner to that used in [10].

3.1 Data Sources

The analysis carried out and presented in the remainder of this paper was based on two packet captures obtained from systems running a vulnerable version of the NTP software (NTP 4.2.6 or below). Packet captures were recorded using tcpdump. Packets that did not contain a source or destination port of 123/udp were filtered out prior to analysis, as they did not contain a MONLIST request or reply and would not have contacted a vulnerable NTP server for potential amplification. Both datasets were collected within South African IPv4 address space, contained within AS2018 (TENET). An overview of the logical capture setup is shown in Figure 2.

Figure 2: Capture points of the two datasets
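The pre-analysis filtering described above can be reproduced offline. The following is a minimal sketch using the dpkt library (an assumption; the actual toolchain was custom built), keeping only UDP packets with a source or destination port of 123. File names are hypothetical:

import dpkt

def filter_ntp(in_path="capture.pcap", out_path="ntp_only.pcap"):
    """Keep only UDP packets with source or destination port 123."""
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        writer = dpkt.pcap.Writer(fout)
        for ts, buf in dpkt.pcap.Reader(fin):
            ip = dpkt.ethernet.Ethernet(buf).data
            if not isinstance(ip, dpkt.ip.IP):
                continue  # skip non-IPv4 frames
            udp = ip.data
            if not isinstance(udp, dpkt.udp.UDP):
                continue  # skip non-UDP datagrams
            if udp.sport == 123 or udp.dport == 123:
                writer.writepkt(buf, ts)

An equivalent capture-time filter is the tcpdump expression "udp port 123".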

ZA1: The ZA1 dataset consists of data collected between 15 July 2013 and 9 March 2014. The capture files have a combined size of 3.2 GB and contain a total of 32 799 299 packets. The captures show two attacks lasting around two weeks each. These were observed in the periods 23 December 2013 to 7 January 2014 and 10 to 25 February 2014 respectively. This dataset is of interest as the data capture was initiated pre-exploitation, and contains traffic destined for a single IP address.

ZA2: The capture files constituting the ZA2 dataset consisted of 103 060 564 packets in total, amounting to 11.5 GB, which were captured over a period of just over one month, from 12 February 2014 to 10 March 2014. This data was captured after the initially detected attack had been mitigated. It contains both request and reply MONLIST packets, and sees a larger number of packets per hour compared to ZA1. This is partially due to the fact that it was collected by recording traffic for the majority of target IP addresses within a single /27 IPv4 net block. For the purposes of this paper, the analysis across addresses has been merged, and individual activity has not been analysed.

3.2 Analysis Tools

Analysis was performed using a series of custom-developed tools implemented in Python. This toolchain was used to parse and extract data from the raw packet captures, and then to filter and plot time series graphs of information such as packets per hour, unique hosts per hour, IP addresses with a certain TTL, TTL per hour, IP datagram length, UDP datagram length, TTL frequency and others. The tools also output .csv files of the ranked source data, which were used for processing and analysis of the data in other tools.
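As a minimal sketch of the kind of aggregation this toolchain performs (again using dpkt, which is an assumption; the original tools are more extensive and also emit .csv output), the following counts packets per hour and per-TTL packet frequencies from a filtered capture:

import collections
import datetime
import dpkt

def hourly_and_ttl_counts(pcap_path="ntp_only.pcap"):
    """Count packets per hour and per-TTL packet frequencies."""
    per_hour = collections.Counter()
    ttl_freq = collections.Counter()
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            ip = dpkt.ethernet.Ethernet(buf).data
            if not isinstance(ip, dpkt.ip.IP):
                continue
            hour = datetime.datetime.utcfromtimestamp(ts).strftime("%Y-%m-%d %H:00")
            per_hour[hour] += 1
            ttl_freq[ip.ttl] += 1
    return per_hour, ttl_freq

The two counters map directly onto the packets-per-hour plots and TTL frequency tables presented in Sections 4 and 5.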

3.3 Address Disclosure

The IP addresses reported in this research are those observed as source addresses of the datagrams attempting to exploit NTP systems within the two monitored networks. In the vast majority of cases these can reliably be determined to be the spoofed addresses of the attackers' intended targets. These addresses have not been blinded or obscured, as the attacks against these organisations have been well publicized and documented. Current addressing information can be trivially obtained using common search tools and DNS resolution. It is felt that the disclosure of the addresses may help other researchers in future efforts around correlation of data relating to victims of such attacks.

4. ZA1 ANALYSIS

This section presents the results of the analysis conducted on the ZA1 dataset. Two periods of exploitation during the observed period were determined to be 23 December 2013 to 7 January 2014 and 10 to 25 February 2014. These can be seen clearly in Figure 3. Peak packet rates in excess of 500 000 packets/hour were observed in the initial attack. Packet rates subsequently decreased to around 50 000 packets/hour for the remainder of the attack. Significant diurnal trends were observed in the second phase. These patterns can be similarly observed in Figure 4, which plots the number of unique source hosts observed during each hour period. For both attacks, the number of unique hosts started off considerably higher than during the rest of the attack. This is possibly due to attackers priming the NTP servers (by flooding them with generated queries from different source addresses) in order to get the maximum amplification scale for the attack: for the NTP MONLIST command to yield maximum amplification, there must have been over 600 historical connections to the server, which can then be sent in response to the forged packet.

Table 1: Top 10 IP addresses from ZA1

Rank  IP address       Count      %      TTL
1     217.168.137.25   3 896 074  11.88  46 (9.21%), 47 (1.85%), 50 (88.54%)
2     72.46.150.210    320 816    0.98
3     64.37.171.32     257 066    0.78   111
4     85.17.207.236    253 588    0.77   111
5     159.153.228.77   228 753    0.7    111
6     62.67.0.130      204 174    0.62   111
7     192.95.11.54     189 224    0.58   111
8     63.251.20.99     163 830    0.5    111 (98.73%), 232 (0.25%), 234 (1.20%)
9     212.143.95.26    154 368    0.47   111
10    75.126.29.106    150 659    0.46   111
Total                  5 818 552  17.74%

The remainder of the section considers specific attributes of the observed attack traffic.

Figure 3: Packets per hour for ZA1

Figure 4: Unique hosts per hour for ZA1

4.1 Source IP and TTL

An analysis of the observed source addresses shows that a single address (217.168.137.25) was observed at a level in excess of an order of magnitude more than the other top sources, with its packets making up 11.88% of the total packets observed. This target address is located in Poland, and was most likely hosting online gaming services. However, at the time of the analysis the system had been taken offline, and as such it is not possible to confirm this. Table 1 lists the traffic for the top 10 observed sources, which constitute 17.74% of the overall traffic. This indicates that the IP address was targeted from the beginning of the attack, and the varying TTL values observed further show a strong likelihood that more than one host was being used to spoof packets with this source address. The TTL values found in packets using the top IP address could indicate three different attacking hosts, or one host with rerouted packets. As shown in Table 1, there are only two IP addresses for which more than a single TTL value was observed. This is indicative of multiple participants spoofing the address, or minor routing changes having occurred during the attack. Further investigation into the temporal overlap is discussed in future work. Taking a look at the IP address 63.251.20.99, it can be seen that there is a distinct difference between the observed TTL values. Although the majority of the packets have a TTL of 111 (in common with the majority of traffic), there are some with a very high TTL value. The occurrences of the top 10 TTLs are shown in Table 2.

Table 2: Top 10 TTL values for ZA1

Rank  TTL  Packet Count  % of total  Initial TTL
1     111  25 771 844    78.57       128
2     50   3 575 563     10.9        60 or 64
3     236  920 462       2.81        255
4     234  864 950       2.64        255
5     232  434 480       1.32        255
6     46   359 082       1.09        60 or 64
7     237  111 038       0.34        255
8     242  82 377        0.25        255
9     62   78 255        0.24        60 or 64
10    47   71 969        0.22        60 or 64
Total      32 270 020    98.39%

4.2 Time-to-Live values

There were a total of 64 distinct TTL values observed, the most frequent of these being 111. This could mean the attacker was using a Windows operating system (assuming the TTL is not spoofed), as Windows sets the initial TTL to 128. This is a similar result to [10], and may indicate a botnet being used, or one host attacking multiple victims.
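Because most operating systems and tools start from one of a few well-known initial TTLs, the probable initial TTL and hop distance can be inferred from an observed value. The following minimal sketch illustrates this inference; the candidate set (32, 64, 128, 255) is an assumption, and the result only holds if the TTL is not spoofed:

CANDIDATE_INITIAL_TTLS = (32, 64, 128, 255)

def infer_initial_ttl(observed_ttl):
    """Return (probable initial TTL, inferred hop count)."""
    for initial in CANDIDATE_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise ValueError("TTL out of range")

print(infer_initial_ttl(111))  # (128, 17): likely a Windows host 17 hops away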

4.3 TTL Groups

Based on the top 10 TTL values shown in Table 2, the IP addresses for which two or more TTLs were observed were isolated. This resulted in 4 679 of the 56 273 observed IP source addresses being seen with multiple TTL values, which comprises just over 8%. The isolated IP addresses and their observed TTLs were analysed to find which IP addresses shared the same group of TTL values. The top 10 groupings of TTLs are shown in Table 3.
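A minimal sketch of this grouping step follows, assuming (source IP, TTL) pairs have already been extracted from the capture as in the earlier sketches:

import collections

def ttl_groups(packets, top_n=10):
    """Group source IPs by the set of TTL values observed for them.

    `packets` is an iterable of (source_ip, ttl) pairs.
    """
    ttls_per_ip = collections.defaultdict(set)
    for src, ttl in packets:
        ttls_per_ip[src].add(ttl)
    # Count only addresses observed with two or more distinct TTLs.
    groups = collections.Counter(
        frozenset(ttls) for ttls in ttls_per_ip.values() if len(ttls) >= 2
    )
    return groups.most_common(top_n)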

There were 51 different groups of TTLs found, the most frequent being the TTL values of 234 and 236, observed for 3 207 IP addresses. The groupings are shown in Table 3.

Table 3: Top 10 TTL ranges in ZA1

Rank  TTL group      Unique IPs  % of total
1     234, 236       3 207       68.54
2     232, 236       245         5.24
3     111, 236       232         4.96
4     232, 234, 236  201         4.3
5     111, 234       116         3.55
6     111, 234, 236  97          2.07
7     232, 234       72          1.54
8     234, 236, 237  72          1.54
9     236, 237       54          1.15
10    62, 111        45          0.96
Total                4 341       92.78

Although not shown in Table 3, the largest group included six of the top 10 TTLs (62, 111, 232, 234, 236, 237); this grouping was, however, observed for only four IP addresses. The number of distinct TTL values observed within a group (all claiming to be from a single IP source address) is strongly indicative of multiple hosts being involved in generating the attack traffic. While the data shows that multiple hosts are likely involved, it is difficult to determine the exact number with any certainty, given the vagaries of Internet routing.

As such, the first ranked TTL group in Table 3 could indicate that two hosts were possibly attacking 3 207 different IP addresses (the spoofed source address being that of the target of the amplification attack). It could also indicate that the hosts attacking the 3 207 IP addresses were using a DDoS tool that set the default TTL to 255, which means there could be more than two attacking hosts. By extension, given similar routing topologies, there could be two groups of hosts generating the spoofed traffic.

4.4 Full packet size

The packet sizes were analysed because there are a small number of NTP DDoS tools, and they generate two distinct packet sizes; this may be useful in determining the probable tool used and learning more about an attacker.

The ZA1 dataset contained five distinct packet lengths, shown on the x-axis of Figure 5. The most frequent length was 60 bytes, which according to [11] [12] could indicate that the attacker was using a Perl-based DDoS tool∗ or a Python-based tool named ntpdos.py∗∗. The Perl tool is the only one of the DDoS tools mentioned in this paper to explicitly set its TTL value to 255 (instead of relying on the OS default). We can therefore assume with a fairly high degree of certainty that the 60-byte packets whose TTLs were above 230 were generated by this tool. The observed packet payloads for this traffic also match the Perl tool.

∗ http://goo.gl/MpDU97, https://goo.gl/QzLnXC
∗∗ https://goo.gl/9qAauN

The next most frequent packet length observed is 234 bytes, which according to [13] can be generated by a Python program called ntp_MONLIST.py∗∗∗ or through the use of a special query program which is part of the NTP software suite, ntpdc†. This program can be used to create a MONLIST request of 234 bytes, according to [11] [12]. The program's MONLIST command and the resulting query datagram can be used in a script that generates large numbers of requests, as has been done with the Python script, which simply encapsulates the query as generated by ntpdc. As shown in Figure 5, the packets that have a length of 234 bytes also have TTL values of 46, 47 and 50 (the other four values can be ignored, as their observed packet counts are below ten and can be regarded as anomalous).

When a MONLIST request is sent to a server that has a full list of hosts that have connected to it, the reply packets are 482 bytes each [14].

Figure 5: Full packet length per TTL value for ZA1

5. ZA2 ANALYSIS

Figure 6: Packets per hour for ZA2

Figure 7: Unique hosts per hour for ZA2

∗∗∗ https://goo.gl/SufSrS
† http://doc.ntp.org/4.1.2/ntpdc.htm

This section presents results of the analysis conducted on the ZA2 dataset. The dataset, which spans a shorter overall time period than the ZA1 set, contains nearly three times the volume of traffic, and shows sustained attack traffic over the period. As noted in Section 3.1, this merges attack traffic across a number of hosts within a /27 net block. The volume of the observed traffic is shown in Figure 6; however, it lacks the clear break between attacks seen previously in Figure 3. This attack saw a peak rate of around 2 million packets per hour, which is significantly larger than the peak rate of ZA1 (although it is acknowledged that there are a larger number of targets in this sample). Figure 7 shows a similar diurnal pattern to that seen in Figure 4. Another similarity between the two unique-source-hosts-per-hour plots is the considerably high count at the start of the capture.

5.1 Source IP and TTL

As shown in Table 4, which lists the traffic for the top 10 observed sources, and as was found in the ZA1 analysis, there is one IP address (217.168.137.25) that is observed at a level in excess of an order of magnitude more than the other top sources. This address was also observed with a high count on other monitored NTP servers around the world during the same time period [4] [15] [16]. The TTL values observed for packets originating from 217.168.137.25 were 47, 51 and 48, which are similar to the TTL values seen for this IP address in ZA1. This is indicative of multiple participants spoofing the address, minor routing changes occurring during the attack, or the attackers using a similar DDoS tool. Most importantly, it shows the exploitation of multiple servers used as reflectors in amplification attacks against targets, since the same source IPs were observed in both datasets.

Unlike the ZA1 dataset, where only two of the top 10 source addresses were observed with more than one TTL, the ZA2 dataset shows all top 10 source addresses having traffic with at least two, and in most cases more than five, distinct TTL values. This strongly supports the supposition that more than one attack source was used to generate attack traffic to be reflected and amplified towards the intended victim. In addition, the exploited NTP servers were used in the attack of multiple target hosts.

5.2 Time-to-Live values

A total of 248 distinct TTL values were observed, the most frequent being 236, which could mean the attacker was using a tool that sets the initial TTL to the maximum of 255, rather than relying on the operating system defaults. Table 5 shows the top 10 TTLs observed in the ZA2 dataset. Eight of the top 10 TTLs are above 230, which suggests the initial TTL was set to 255. The Perl NTP DDoS tool mentioned in Section 4.4 could have been used here. The TTL of 64, seen in Table 5, is the initial TTL of the NTP servers; this TTL signifies the traffic that has been reflected by the servers.


Table 4: Top 10 IP addresses from ZA2

Rank  IP address       Count       % of total  TTL
1     217.168.137.25   18 565 847  18.01       47 (9.67%), 48 (1.94%), 51 (87.98%)
2     162.218.54.28    3 158 997   3.07        232 (6.18%), 233 (7.93%), 234 (69.82%), 235 (2.29%), 236 (7.37%), 238 (6.41%)
3     192.64.169.29    2 322 018   2.25        234 (8.26%), 235 (9.02%), 236 (73.58%), 237 (0.87%), 243 (0.41%)
4     178.32.140.23    1 820 792   1.77        233 (0.37%), 235 (9.60%), 236 (66.99%), 237 (4.25%), 238 (14.65%), 243 (0.84%)
5     198.50.180.205   1 495 278   1.45        232 (31.15%), 233 (11.37%), 235 (2.51%), 236 (54.86%), 238 (0.10%)
6     37.187.77.125    1 210 377   1.17        235 (77.27%), 236 (8.34%), 237 (2.95%), 238 (11.26%)
7     178.32.137.207   825 674     0.8         232 (0.38%), 233 (6.22%), 235 (19.09%), 236 (65.16%), 237 (4.28%), 243 (1.43%)
8     94.23.19.43      751 176     0.73        233 (3.31%), 235 (35.45%), 236 (19.26%), 237 (7.38%), 243 (23.16%)
9     109.163.224.34   682 706     0.66        64 (0.66%), 236 (57.45%), 237 (7.19%), 238 (34.68%)
10    198.27.74.181    677 972     0.66        232 (2.36%), 233 (14.69%), 234 (19.20%), 235 (20.22%), 236 (43.53%)
Total                  31 510 837  30.58%

Table 5: Top 10 TTL values for ZA2

Rank  TTL  Packet Count  % of total  Initial TTL
1     236  23 509 039    22.81       255
2     235  17 009 527    16.5        255
3     51   16 778 624    16.28       60 or 64
4     237  13 946 335    13.53       255
5     232  4 054 193     3.93        255
6     234  3 893 500     3.78        255
7     238  2 755 487     2.67        255
8     243  2 451 201     2.38        255
9     64   2 232 754     2.17        100/128?
10    233  2 152 300     2.08        255
Total      88 782 960    86.15%

Table 6: Top 10 TTL ranges in ZA2

Rank  TTL group      Unique IPs  % of total
1     235, 237       3 440       38.99
2     233, 237       244         2.77
3     233, 235, 237  167         1.89
4     234, 237       137         1.55
5     236, 237       131         1.48
6     234, 235, 237  93          1.05
7     232, 234       88          1.00
8     237, 64        88          1.00
9     235, 236, 237  69          0.78
10    235, 237, 238  60          0.68
Total                4 517       51.20%

5.3 TTL Groups

From the top 10 TTL values (shown in Table 5), the IP addresses which used two or more TTLs were found. This resulted in 8 823 IP addresses out of the 38 634 IP addresses seen. There were 143 different groups of TTLs found, the top 10 of which are shown in Table 6. The most frequent was 235 and 237, with 3 440 IP addresses observed using only these. Although not shown in Table 6, the largest group included nine of the top 10 TTLs (51, 64, 232, 233, 234, 235, 236, 237, 238); however, only one (spoofed) IP address was observed to be part of this grouping. This again illustrates the strong evidence towards multiple sources for the spoofed traffic.

5.4 Full packet size

The ZA2 dataset showed similar results to ZA1 and contained 12 distinct full packet sizes, shown in Figure 8. The most frequent was 60 bytes, the same result as observed in the ZA1 dataset. As previously stated in Section 4.4, this length could indicate the use of the Perl or Python tool. In Figure 8, it can be observed that eight of the nine TTL values seen with a full packet size of 60 bytes are above 230, which is indicative of the use of the Perl tool, because this tool sets the TTL field to 255 and generates packets of 60 bytes.

Vol.107 (2) June 2016SOUTH AFRICAN INSTITUTE OF ELECTRICAL ENGINEERS60

As in ZA1, and as seen in Figure 8, the second most frequent packet size in ZA2 is 234 bytes. Packets observed with this size were most likely generated by the ntp_MONLIST.py or ntpdc programs. The most frequent TTL seen in packets of 234 bytes is 51, which is the TTL observed for the top IP address. This is a similar result to ZA1, where the most frequent TTLs seen in the 234-byte packets are 46, 47 and 50, which are also the TTLs observed for the top IP address.

Because the ZA2 dataset contains both request and reflected packets, a packet length of 482 bytes is observed. Packets with this length are only observed with a TTL of 64, which is the initial TTL of the ZA2 server. These packets are the reflected packets generated in response to the 60 and 234 byte request packets. In cases where the MONLIST table was not yet full, the 122, 194, 266, 338 and 410 byte packets indicate the reply packet sizes as the NTP monitor list grew longer.
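These reply lengths are consistent with a fixed per-entry cost: successive sizes differ by exactly 72 bytes. The following minimal sketch reproduces the observed sizes; the 50-byte fixed overhead and 72-byte entry size are assumptions inferred from the observed lengths, with each reply datagram carrying up to six monitor entries:

FRAME_OVERHEAD = 50   # fixed headers per reply frame (inferred, not measured)
ENTRY_SIZE = 72       # bytes per monitor-list entry (inferred)

def reply_frame_size(entries):
    """Frame size of a MONLIST reply carrying 1..6 monitor entries."""
    assert 1 <= entries <= 6
    return FRAME_OVERHEAD + ENTRY_SIZE * entries

print([reply_frame_size(n) for n in range(1, 7)])
# [122, 194, 266, 338, 410, 482], matching the observed packet lengths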

Figure 8: Full packet length per TTL value for ZA2

6. COMMON CHARACTERISTICS

This section briefly discusses the similarities between the two datasets. The first similarity, noted in Sections 4 and 5, concerns the unique hosts per hour: there was a large spike of over 600 unique hosts per hour at the start of each attack, as illustrated in Figures 4 and 7. As stated previously, this is possibly due to attackers priming the NTP servers in order to get the maximum amplification scale from the MONLIST command.

The top IP address seen in ZA1 was also the top IP address for ZA2. This fact is explored further in Section 7.1, where it is shown that the address was being attacked by the same attacker. Although not shown in this paper, the top 20 domain names of the source addresses and the top 20 source ports for ZA1 and ZA2 indicate that the majority of the targets are online gaming related. Out of the top 20 source ports, ZA1 and ZA2 had 13 ports in common, with most of them being common online gaming ports.

Taking a look at the top 10 TTLs listed in Tables 2 and 5, it is observed that ZA1 has 5 TTLs and ZA2 has 8 TTLs whose values are above 230. Together with the information from Figures 5 and 8, it was found that the attackers that used the ZA1 and ZA2 servers were using the Perl DDoS tool.

Table 7: TTL values for 217.168.137.25 in the ZA1 dataset

Rank  TTL  Count      %
1     50   3 449 705  88.54
2     46   358 966    9.21
3     47   71 933     1.85
4     51   15 395     0.4
5     54   49         0.0013
6     48   18         0.0005
7     49   5          0.0001
8     45   2          0.00
9     53   1          0.00
Total      3 896 074

Table 8: TTL values for 217.168.137.25 in the ZA2 dataset

Rank  TTL  Count       %
1     51   16 334 079  87.98
2     47   1 794 768   9.67
3     48   359 660     1.94
4     52   76 880      0.41
5     55   295         0.0015
6     49   125         0.0007
7     50   35          0.0002
8     54   5           0.00
Total      18 565 847

7. CASE STUDIES

The following section discusses two case studies. The first is a brief analysis of the top IP address (217.168.137.25) observed in the ZA1 and ZA2 datasets, and the similarities of the results found. The second case study concerns a group who call themselves Derp Trolling, and the evidence showing that they may have used the vulnerable server in ZA1 to carry out DDoS attacks on online gaming related IP addresses.

7.1 Top IP address

The IP address 217.168.137.25 was observed as the top IP address in both the ZA1 and ZA2 datasets. As shown in Tables 7 and 8, the TTL values seen in the ZA1 dataset are one less than the TTLs seen in the ZA2 dataset (an exception being the TTL of 45 in Table 7). This is because the ZA1 data capture point was one more hop away, as shown in Figure 2. Looking at the percentages of each of the TTL values in the tables, it can be seen that the percentages are extremely similar. This makes it likely that both of the datasets contained packets from the same attacker.
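This offset-and-compare step is simple to mechanise. The following minimal sketch checks whether one TTL distribution looks like the other shifted by one hop, using the dominant values from Tables 7 and 8; the 1.0 percentage-point tolerance is an arbitrary assumption:

# TTL -> percentage of packets for 217.168.137.25 (top values only).
za1 = {50: 88.54, 46: 9.21, 47: 1.85, 51: 0.4}
za2 = {51: 87.98, 47: 9.67, 48: 1.94, 52: 0.41}

def offset_match(dist_a, dist_b, offset=1, tolerance=1.0):
    """True if dist_b resembles dist_a shifted `offset` hops closer."""
    return all(
        abs(pct - dist_b.get(ttl + offset, 0.0)) <= tolerance
        for ttl, pct in dist_a.items()
    )

print(offset_match(za1, za2))  # True: consistent with the same attacker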

Since there is one dominant TTL value, this strongly supports the rerouting of packets. Figures 9 and 10 show the packets per hour of the top IP address, separated by coloured TTL values. Because the ZA1 and ZA2 capture points were one hop apart from the attacker, the colours were changed accordingly. This produced two very similar graphs, which show the rerouting of packets during the attack. What must be noted is that although they are the same shape, the ZA2 dataset has a higher packet count, which stayed steady at around 107 000 packets per hour. This is nearly five times as many packets as the ZA1 dataset, which saw a steady 21 400 packets per hour. This difference may be because of bandwidth changes. Because the ZA2 dataset was captured after the attack had started, Figure 10 does not show packets from the beginning of the attack, but they would most likely have the same shape as the start of Figure 9.

The evidence shown throughout the two dataset analyses, and the information given above in this section, leads the authors to the conclusion that there was one coordinated group of attackers using multiple vulnerable NTP servers for amplification. A portion of this attack traffic was captured by the packet loggers, which resulted in the ZA1 and ZA2 datasets.

Figure 9: Packets per hour for 217.168.137.25 in ZA1

Figure 10: Packets per hour for 217.168.137.25 in ZA2

7.2 Derp Trolling

Derp Trolling, or Derp for short, is the name of a hacker group that carried out numerous DDoS attacks from late 2013 and throughout 2014, mainly targeting gaming-related servers [17]. Their Twitter account†† is full of tweets about their previous attacks. The account, however, stopped tweeting about attacks in August 2014, and the hacker claims to have become a white hat hacker. The next few paragraphs show the observations that make it likely that this hacker group used the ZA1 server as a reflector in their attacks. The group was found when looking at the domain names of the top 20 source addresses and noticing that many of the domains were game related. After searching online for DDoS attacks on DC Universe Online, all the search results pointed at Derp Trolling. These posts fall within the time frames of the attack traffic observed in the ZA1 dataset. While it is impossible to prove that the observed traffic was definitively the result of activities by this group, the precise timing and choice of targets have a high correlation with their posted activity reports. After finding their Twitter page, it was clear that many of the other top 20 source addresses were related to their attacks during the period of observation.

†† https://twitter.com/DerpTrolling

DC Universe Online: DC Universe Online‡ is a massively multiplayer online role-playing game (MMORPG). Derp Trolling may have used the ZA1 server along with other vulnerable NTP servers to attack the DC Universe Online game server. The timestamps on the tweets (seen in Figure 11) about DC Universe Online and Planetside 2 (their intended target) range from 2014-01-01 23:16:11 to 2014-01-02 02:48:26. These tweets fit into the time period (seen in Figure 12) of the packets captured in the ZA1 dataset, which is around 2014-01-01 00:00:00 to 2014-01-02 02:00:00. In a forum post [18] on gamefaqs.com, there were complaints of not being able to load the DC Universe Online website.

Figure 11: Tweet by Derp Trolling about DC Universe Online

Figure 12: Packets per hour for DC Universe Online

‡ https://www.dcuniverseonline.com/home

EA Login Servers: The EA login servers were targeted by Derp Trolling, with their tweets (seen in Figure 13) having timestamps of 2014-01-03 03:00:43 to 2014-01-03 05:31:30. The time period of the attack (seen in Figure 14) from the captured packets ranges from 2014-01-03 04:00:00 to 2014-01-03 06:00:00, which is very similar to the Derp Trolling attack tweets (the first packet and first tweet occurred within one hour of each other). In a news article [19], the writer said at 04:15 that they were not able to access the servers.

Figure 13: Tweet by Derp Trolling about the EA login servers

Figure 14: Packets per hour for EA login servers

Runescape: Runescape‡‡ is a fantasy MMORPG. The timestamps of their tweets (seen in Figure 15) range from 2013-12-31 16:19:50 to 2013-12-31 17:51:15. Although the correlation between the captured packets' time span (seen in Figure 16) and the tweets is not as close as in the above two cases, the attack packets and tweets occurred within one hour and forty minutes of each other. This may be a coincidence, but may be evidence that the ZA1 server was not being used until later in the attack. On a subreddit for Runescape [20] there were many complaints about connection and lag issues on 31 December, which commenters blamed on the Derp Trolling DDoS attack.

EVE Online: EVE Online§ is a massively multiplayer online (MMO) game set in a futuristic space universe. The timestamps on the tweets (refer to Figure 17) range from 2013-12-31 20:19:25 to 2013-12-31 21:36:22. The time between the tweet and the first packet received (refer to Figure 18) was one hour and forty minutes (as above). This could also be a coincidence, but may show that for two hours after the attack started around 20:20, the ZA1 server was used to generate extra traffic. In a post on the EVE Online forums¶, one of the game developers posted at 2013-12-31 21:51:24 that they were the targets of a DDoS attack. This was followed by other commenters on the forum confirming that they were not able to access the game.

‡‡ http://www.runescape.com/
§ http://www.eveonline.com/
¶ https://forums.eveonline.com/default.aspx?g=posts&m=4059746

Figure 15: Tweet by Derp Trolling about Runescape

Figure 16: Packets per hour for Runescape

8. CONCLUSION

This paper has presented an initial exploratory analysis of two NTP based DDoS attack datasets. The primary focus of this analysis has been the IPv4 Time-to-Live values observed in recorded network packets. Cases were found where multiple origin systems were generating datagrams with spoofed source addresses in order to attack one victim system. This was achieved through the exploitation of the NTP MONLIST feature, which generated a substantial amplification of the original traffic volumes in response to the spoofed packets. This was determined from the numerous source addresses observed with multiple TTL values. Since many of the TTL values observed are above 230, it could mean that the attacking hosts are using the same initial TTL of 255. This was confirmed by reviewing the source of several common NTP DDoS tools, which explicitly set the TTL field of the generated packets to the maximum value. We were also able to infer the number of attackers (or attackers sharing common routing paths) targeting a certain victim, by finding the TTLs used for each source address. In addition, the results show that these attacks utilise more than one vulnerable NTP server during an attack.

It was found that the main targets of these attacks are gaming-related servers (as seen in the Derp Trolling case study), which is an expected result, as gaming servers are a major target, particularly of DDoS attacks focusing on bandwidth exhaustion.


Figure 17: Tweet by Derp Trolling about Eve Online

Figure 18: Packets per hour for Eve Online

8.1 Future Work

Future work with the datasets described in this paper will include further analysis. A specific area to be explored further is using the IP datagram and UDP datagram lengths to further characterise NTP DDoS attacks, with a view to understanding the popularity of the exploitation tools used. Linked to this, the actual payloads can be analysed and, in conjunction with base TTLs and packet sizes, attributed to certain tools available in the wild. An analysis of the packet contents will also be carried out, as well as an examination of the source ports used by the packets and their uses, which are expected to be game related.

REFERENCES

[1] J. Graham-Cumming. (2014, January) Understanding and mitigating NTP-based DDoS attacks. Blog post. CloudFlare. [Accessed: 25 June 2014]. [Online]. Available: https://blog.cloudflare.com/understanding-and-mitigating-ntp-based-ddos-attacks/

[2] D. L. Mills, "Internet time synchronization: the Network Time Protocol," IEEE Transactions on Communications, vol. 39, no. 10, pp. 1482–1493, 1991.

[3] C. Rossow, "Amplification hell: Revisiting network protocols for DDoS abuse," in Symposium on Network and Distributed System Security (NDSS), 2014.

[4] BG.net. (2014, January) NTP reflection attack - a vulnerability in implementations of NTP. [Accessed: September 2014]. [Online]. Available: http://bg.net.ua/content/ntp-reflection-attack-uyazvimost-v-realizatsiyakh-protokola-ntp

[5] K. J. Higgins. (2013, December) Attackers wage network time protocol-based DDoS attacks. News article. DarkReading. [Online]. Available: http://www.darkreading.com/attacks-breaches/attackers-wage-network-time-protocol-bas/240165063

[6] M. Prince. (2014) Technical details behind a 400Gbps NTP amplification DDoS attack. Online document. CloudFlare. [Online]. Available: http://blog.cloudflare.com/technical-details-behind-a-400gbps-ntp-amplification-ddos-attack

[7] M. Mimoso. (2014, June) Dramatic drop in vulnerable NTP servers used in DDoS attacks. Online article. Threatpost. [Accessed on: 9 October 2015]. [Online]. Available: http://threatpost.com/dramatic-drop-in-vulnerable-ntp-servers-used-in-ddos-attacks/106835

[8] Arbor Networks. (2014, October) Arbor Networks' ATLAS data shows reflection DDoS attacks continue to be significant in Q3 2014. Press release. Arbor Networks. [Accessed on: 9 October 2014]. [Online]. Available: http://www.arbornetworks.com/news-and-events/press-releases/recent-press-releases/5283-arbor-networks-atlas-data-shows-reflection-ddos-attacks-continue-to-be-significant-in-q3-2014

[9] M. Kührer, T. Hupperich, C. Rossow, and T. Holz, "Exit from hell? Reducing the impact of amplification DDoS attacks," in USENIX Security Symposium, 2014.

[10] J. Czyz, M. Kallitsis, M. Gharaibeh, C. Papadopoulos, M. Bailey, and M. Karir, "Taming the 800 pound gorilla: The rise and decline of NTP DDoS attacks," in Proceedings of the 2014 Conference on Internet Measurement Conference, ser. IMC '14. New York, NY, USA: ACM, 2014, pp. 435–448. [Online]. Available: http://doi.acm.org/10.1145/2663716.2663717

[11] G. Huston. (2014, March) NTP for evil. Blog post. APNIC. [Accessed on: 9 January 2016]. [Online]. Available: http://labs.apnic.net/blabs/?p=464

[12] T. Yuzawa. (2014, February) One-liner iptables rule to filter NTP reflection on Linux hypervisor. [Accessed on: 20 October 2014]. [Online]. Available: http://packetpushers.net/one-liner-iptables-rule-to-filter-ntp-reflection-on-linux-hypervisor/

[13] R. Dobbins and J. Braunegg. (2014, February) Micron21 - NTP reflection (amplification) DDoS attack - request packet. Online article. Micron21. [Accessed on: 25 October 2014]. [Online]. Available: http://www.micron21.com/ddos-ntp.php

[14] NSFOCUS. (2014, February) NTP amplification attacks are on the rise? (Part 1). Blog post. NSFOCUS. [Accessed on: 12 October 2014]. [Online]. Available: http://nsfocusblog.com/2014/02/04/ntp-amplification-attacks-are-on-the-rise-part-1/

[15] StPaddy. (2014, February) VMWare ESXi 3.x - high bandwidth. Online forum. SpiceWorks. [Accessed: September 2014]. [Online]. Available: http://community.spiceworks.com/topic/445704-vmware-esxi-3-x-high-bandwidth

[16] Tatung University. IP traffic school. Tatung University. [Accessed: September 2014]. [Online]. Available: http://traffic.ttu.edu.tw/gigaflow/extipflow.php?start=2014-02-16&end=2014-02-16&OrderBy=stin

[17] S. Bogos. (2013, December) Update: Hackers bring down LoL, DoTA 2, Blizzard, EA servers. News article. The Escapist. [Accessed on: 23 July 2014]. [Online]. Available: http://www.escapistmagazine.com/news/view/130941-Update-Hackers-Bring-Down-LoL-DoTA-2-Blizzard-EA-Servers

[18] xrxleader. (2013, January) Error code 1046. Forum post. GameFAQs. [Accessed on: 12 August 2014]. [Online]. Available: http://www.gamefaqs.com/boards/950873-dc-universe-online/68234564

[19] A. Garreffa. (2014, January) DERP takes down EA login, Origin is not working at all. News article. TweakTown. [Accessed on: 23 August 2014]. [Online]. Available: http://www.tweaktown.com/news/34601/derp-takes-down-ea-login-origin-is-not-working-at-all/index.html

[20] AlmostNPC. (2013, December) Connection and lag issues... Forum post. Reddit. [Accessed on: 23 August 2014]. [Online]. Available: https://www.reddit.com/r/runescape/comments/1u3j34/connection_and_lag_issues/


DETECTING DERIVATIVE MALWARE SAMPLES USING DEOBFUSCATION-ASSISTED SIMILARITY ANALYSIS

P. Wrench∗ and B. Irwin†

∗ Department of Computer Science, Rhodes University, Grahamstown, South Africa. E-mail: [email protected]
† Department of Computer Science, Rhodes University, Grahamstown, South Africa. E-mail: [email protected]

Abstract: The abundance of PHP-based Remote Access Trojans (or web shells) found in the wild has led malware researchers to develop systems capable of tracking and analysing these shells. In the past, such shells were ably classified using signature matching, a process that is currently unable to cope with the sheer volume and variety of web-based malware in circulation. Although a large percentage of newly-created web shell software incorporates portions of code derived from seminal shells such as c99 and r57, their authors are able to disguise this by making extensive use of obfuscation techniques intended to frustrate any attempts to dissect or reverse engineer the code. This paper presents an approach to shell classification and analysis (based on similarity to a body of known malware) in an attempt to create a comprehensive taxonomy of PHP-based web shells. Several different measures of similarity were used in conjunction with clustering algorithms and visualisation techniques in order to achieve this. Furthermore, an auxiliary component capable of syntactically deobfuscating PHP code is described. This was employed to reverse idiomatic obfuscation constructs used by software authors. It was found that this deobfuscation dramatically increased the observed levels of similarity by exposing additional code for analysis.

Key words: Similarity Analysis, Code Hiding, PHP Malware, Remote Access Trojan

1. INTRODUCTION

PHP's popularity as a hosting platform [1] has made it the language of choice for developers of Remote Access Trojans (RATs) and other malicious software [2]. This software is typically used to compromise and monetise web platforms, providing the attacker with basic remote access to the system, including file transfer, command execution, network reconnaissance, and database connectivity. Once infected, compromised systems can be used to defraud users by hosting phishing sites, to perform Distributed Denial of Service (DDoS) attacks, or to serve as anonymous platforms for sending spam or other malfeasance [3].

Although many new shells are frequently created, truly unique samples are rare - the vast majority of new threats are at least partially derivative, incorporating large portions of code from more established shells [4]. These subtle differences are often the result of malware authors adding functionality or attempting to make shells more resistant to signature-based matching techniques through the use of obfuscation. By investigating idiomatic deobfuscation techniques and different measures of similarity, this paper presents an alternative approach to malware analysis, with the goal of eventually developing a comprehensive taxonomy of web shells. Reference is made throughout the paper to work already published by the authors in the area of code deobfuscation and normalisation [5, 6].

This paper begins with an outline of a typical web shell and its common capabilities. The concept of code obfuscation is also introduced, with particular emphasis on how it is typically achieved in PHP. Section 2 also describes the ssdeep fuzzy hashing tool and its usefulness as a basis for similarity analysis, and discusses the concept of data visualisation. Section 3 details how the system was designed and implemented, outlining both the deobfuscation process and the construction of similarity matrices and visual representations of sample similarity. The results obtained during system testing are presented in Section 4. Section 5 concludes the paper, before ideas for future work and improvement are presented in Section 6.

2. BACKGROUND AND RELATED WORK

This section begins by detailing research already carried out by the authors into the creation of a module capable of syntactically deobfuscating PHP code [5]. This includes a description of the structure and capabilities of typical web shells and an overview of idiomatic code obfuscation techniques. The latter part of the section introduces the concept of code similarity and the various methods of testing for it, with particular emphasis on context-triggered piecewise hashing (CTPH) algorithms. The section concludes by briefly describing two methods of visualising data similarity.
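As a concrete illustration of CTPH-based similarity scoring, the following minimal sketch uses the Python bindings for ssdeep to compute a pairwise match score between two samples. The use of the ssdeep Python package is an assumption about the environment, and the file names are hypothetical:

import ssdeep

def similarity(path_a, path_b):
    """Return the ssdeep match score (0-100) between two files."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        hash_a = ssdeep.hash(fa.read())
        hash_b = ssdeep.hash(fb.read())
    return ssdeep.compare(hash_a, hash_b)

# Example: score a suspected derivative against a seminal shell.
# print(similarity("c99.php", "suspect_sample.php"))

A score of 0 indicates no meaningful overlap, while higher scores indicate increasing confidence that the two files share content.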

2.1 Web Shells

Remote Access Trojans (or web shells) are small scripts designed to be uploaded onto production servers.

Based on: “Towards a PHP Webshell Taxonomy using Deobfuscation-assisted Similarity Analysis”, by P. Wrench and B. Irwin, which appeared in the Proceedings of Information Security South Africa (ISSA) 2015, Johannesburg, 12 & 13 August 2015. © 2015 IEEE


Once a server is infected, a remote operator is able to control it as if they had physical access to it [6, 7]. Most web shells include features such as access to the local file system, keystroke logging, registry editing, and packet sniffing capabilities [3].

2.2 Code Obfuscation and PHP

Code obfuscation is a program transformation intended to thwart reverse engineering attempts [5]. Collberg et al. [8] define a code obfuscation as a “potent transformation that preserves the observable behaviour of programs”. Although often used to protect proprietary code, code obfuscation is also employed by malware authors to hide their malicious code. Reverse engineering obfuscated malware is non-trivial, as the obfuscation process complicates the instruction sequences, disrupts the control flow and makes the algorithms difficult to understand.

As a procedural language with object-oriented features, PHP can be obfuscated using all of these methods. Of the many built-in functions included in the core distribution of PHP, just two code execution functions account for the majority of code hiding efforts and are specifically marked by the PHP Group as being potentially exploitable [5, 9, 10].

As a result of its ability to execute an arbitrary string as PHP code, the eval() function is widely used as a method of hiding code. The potential for exploitation is so great that the PHP Group includes a warning against its use, advising that it only be used in controlled situations, and that user-supplied data be strictly validated before being passed to the function [11].

The eval() function is often combined with auxiliary string manipulation functions to form the following obfuscation idiom [3]:

eval(gzinflate(base64_decode('GSJ+S...')));

The string containing the malicious code is compressed before being encoded in base64. At runtime, the process is reversed: the argument is decoded and inflated, and the code that is produced is then executed through the use of the eval() function.
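The same reversal can be performed statically, without ever executing the sample. The following is a minimal Python sketch of this idea (the payload below is a hypothetical stand-in, not the elided string above); PHP's gzinflate() operates on raw DEFLATE data, which zlib selects via a negative window size:

import base64
import zlib

def reverse_eval_idiom(encoded: str) -> str:
    """Undo base64_encode() followed by gzdeflate() on a payload string."""
    compressed = base64.b64decode(encoded)
    # wbits=-15 selects raw DEFLATE, matching PHP's gzinflate()
    return zlib.decompress(compressed, -15).decode('utf-8', 'replace')

# Hypothetical round trip: deflate and encode a payload, then reverse it
payload = b"echo 'hidden code';"
deflater = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = deflater.compress(payload) + deflater.flush()
print(reverse_eval_idiom(base64.b64encode(raw).decode()))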

The preg_replace() function is used to perform a regular expression search and replace in PHP [12]. Although this doesn't present a problem in itself, the deprecated '/e' modifier allows the resultant text to be executed as PHP code (in effect causing an eval() function to be applied to the result). An example of the use of the preg_replace() function for hiding code is shown in the following code extract:

preg_replace('/x/e', 'echo($a);', 'y');

The example shows a very simple preg_replace() function that searches for the pattern ‘x’ in the string ‘y’, replaces it with the string ‘echo($a);’ and then evaluates the resulting code. In this case, the text contained in the $a variable would be displayed if the code were executed.

2.3 Lexical Analysis and the Zend Engine

Parsing is defined as the process of analysing a string of symbols to determine whether it conforms to the rules laid out by a formal grammar [13]. In the field of Computer Science, the first step in the parsing process is referred to as lexical analysis, which is the process of converting a string of symbols into a sequence of meaningful tokens [14]. In PHP, lexical analysis is carried out by the Zend Engine, an open source interpreter originally developed by Andi Gutmans and Zeev Suraski [15].

The Tokenizer PHP extension [16] provides an interface to the lexical analyser used by the Zend Engine. Using this interface, it is possible to carry out token-based source code analysis and modification without the need for a custom parser. Of particular interest to this research are the token_get_all() function and the T_FUNCTION token type, which can be used in combination to locate and extract function names and bodies (see Section 3.3 for more detail on these processes).

2.4 Fuzzy Hashing and Ssdeep

Hashing is a technique commonly used in forensic analysis that transforms an input string of arbitrary length into a fixed-length signature [17]. Once generated, these signatures can then be used to efficiently match identical files. Traditional cryptographic hashing algorithms such as MD5 and SHA256 are designed so that changing just one bit in the input file will lead to the generation of a completely different hash signature. This approach, although ideal for matching identical files, makes these algorithms incapable of matching files that are merely similar. For this purpose, it is necessary to use context-triggered piecewise hashing (CTPH) [18]. Also known as fuzzy hashing, this technique combines piecewise hashing and rolling hashes to create a hash that is composed of values that each depend on only part of the input. Piecewise hashing is the process of breaking an input into chunks and hashing these chunks separately, which means that changing part of the input file will only affect part of the resulting hash [19]. Because of this property, CTPH can be used to identify similar files as well as identical files. The rolling hash is used to provide the trigger points for separating the input into chunks by monitoring the context, which in this case is represented by the last n characters in a file [17].
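The trigger mechanism can be illustrated with a deliberately simplified Python sketch (a toy, not ssdeep's actual hash or parameters): the input is split wherever a weak rolling hash over the last n bytes hits a fixed trigger value, so that a local edit only moves nearby chunk boundaries.

from collections import deque

def chunk_boundaries(data: bytes, n: int = 7, modulus: int = 64):
    """Yield the indices at which a rolling hash over the last n bytes triggers."""
    window, rolling = deque(maxlen=n), 0
    for i, byte in enumerate(data):
        if len(window) == n:
            rolling -= window[0]     # drop the byte leaving the window
        window.append(byte)
        rolling += byte              # add the byte entering the window
        if len(window) == n and rolling % modulus == modulus - 1:
            yield i                  # context trigger: end the current chunk here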

Ssdeep is a hashing tool that was developed by Jesse Kornblum in 2006 [17]. It is capable of using CTPH to generate fuzzy hashes that can then be compared to determine the similarity of a set of files. The similarity value that the tool generates represents the edit distance between two fuzzy hashes (i.e. the number of changes that need to be made to convert the one hash into the other). As a result of its combination of both rolling and piecewise hashes, the tool's hashing algorithm is more computationally intensive than other algorithms such as MD5, but it is a far more effective way of identifying code reuse in similar files.
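In scripted form, the comparison is a two-step operation. A short sketch using the python-ssdeep bindings (the bindings and file names here are assumptions of this sketch; the ssdeep command-line tool exposes the same operations):

import ssdeep  # python-ssdeep bindings around the ssdeep library

def similarity(path_a: str, path_b: str) -> int:
    """Return the ssdeep match score (0-100) between two files."""
    return ssdeep.compare(ssdeep.hash_from_file(path_a),
                          ssdeep.hash_from_file(path_b))

print(similarity('c99.txt', 'c99-ud.txt'))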

2.5 Data Visualisation

Data visualisation is the process of representing mundane data (such as numerical values) as visual objects with the aim of increasing accessibility and understanding [20]. Successful visualisation techniques should assist the viewer with analytical tasks such as making comparisons and identifying patterns in data. Although a wide variety of data visualisation structures exist, heatmaps and dendrograms are considered the most adept at highlighting similarity and relationships, and were thus selected as the tools for visualising the results of this research.

Heatmaps: Heatmaps are used to display each value in a given matrix as a colour that represents the magnitude of that value. Because of this property, these structures can be used to easily identify values (or areas of values) that represent a high level of similarity.

Dendrograms: Dendrograms are tree-like structures that can be used to display relationships that result from hierarchical clustering algorithms. The hierarchical nature of the dendrograms produced in this way allows for the identification of derivative sample relationships, as well as the magnitude of such relationships.

3. DESIGN AND IMPLEMENTATION

This section begins by describing the decoder, which was developed and tested in previous research [5, 6] and is responsible for code deobfuscation and normalisation prior to analysis. The script's primary decode() function is also outlined, along with its two auxiliary functions, processEvals() and processPregReplace(), before Viper (the malware analysis framework used in this research) is discussed. Four individual preprocessing modules are then introduced, each of which represents a unique measure of similarity. A brief description of the batch modules and their respective configurations is provided, as well as an overview of the module that is responsible for the creation of similarity matrices. Finally, the visualisation modules that are used to interpret and display these matrices are described in Section 3.6.

3.1 The Decoder

The first of the major components developed for the system was the decoder, which is responsible for performing code deobfuscation and normalisation prior to analysis. Deobfuscation is the process of revealing code that has been deliberately disguised, while code normalisation is the process of altering the format of a script to promote readability and uniformity [21].

The decoder is considered a static deobfuscator in that it manipulates the code without ever executing it. The advantage of this approach is that it suffers from none of the risks associated with malicious software execution, such as the unintentional inclusion of remote files, the overwriting of system files, and the loss of confidential information. Static analysers are, however, unable to access runtime information (such as the value of a variable at a given point in the execution or the current program state) and are thus limited in terms of behavioural analysis.

The purpose of this component is to expose the underlying program logic of an uploaded shell by removing any layers of obfuscation that may have been added by the shell's developer. This process is controlled by the decode() function, which makes use of two core supporting functions, processEvals() and processPregReplace(), and is described below.

Decode: The part of the decode script responsible for removing layers of obfuscation from PHP shells is the decode() function. It scans the code for the two functions most associated with obfuscation, namely eval() and preg_replace(), both of which are capable of arbitrarily executing PHP code. The eval() function interprets its string argument as PHP code, and preg_replace() can be made to perform an eval() on the result of its regular expression search and replace by including the deprecated ‘/e’ modifier. Furthermore, eval() is often used in conjunction with auxiliary string manipulation and compression functions in an attempt to further obfuscate the code.

Once an eval() or preg_replace() is found in the script, either the processEvals() or the processPregReplace() helper function is called to extract the offending construct and replace it with the code that it represents. To deal with nested obfuscation techniques, this process is repeated until neither of the functions is detected in the code. Some code normalisation is then performed to get the output into a readable format before the decoded shell is stored in the database alongside its raw counterpart. The full pseudo-code of this process is presented in Listing 1.

BEGIN
    Format the code
    WHILE there is still an eval or preg_replace
        Increment the obfuscation depth
        Process the eval(s)
        Format the code
        Process the preg_replace(s)
        Format the code
    END WHILE
    Perform normalisation
    Store the decoded shell in the database
END

Listing 1: Pseudo-code for the decode() function

ProcessEvals: The eval() function is able to evaluate an arbitrary string as PHP code, and as such is widely used as a method of obfuscating code. The function is so commonly exploited that the PHP Group includes a warning against its use - it is recommended that it only be used in controlled situations, and that user-supplied data be strictly validated before being passed to the function [9].

As is described in Section 2.2, authors of malicious software often use the eval() function in conjunction with other string manipulation functions in order to further frustrate reverse engineering attempts. These functions typically compress, encode, or otherwise modify the string argument to increase the complexity of the obfuscation and thereby increase its resilience to automated analysis. The processEvals() function is able to detect and perform some of the more common string manipulation functions in an attempt to reveal the obfuscated code. A list of the functions that processEvals() is currently able to detect and process is shown in Table 1.

Function          Description
base64_decode()   Decodes data encoded by base64_encode()
gzinflate()       Inflates a string deflated by gzdeflate()
gzuncompress()    Decompresses compressed data
str_rot13()       Restores a string encoded using str_rot13()
strrev()          Restores a string reversed using strrev()
rawurldecode()    Decodes data encoded using rawurlencode()
stripslashes()    Unescapes an escaped string
trim()            Strips whitespace from the edges of a string

Table 1: Auxiliary string manipulation functions that are handled by processEvals()

The processEvals() function was designed to be extensible. At its core is a switch statement that is used to apply auxiliary functions to the string argument. Adding another function to the list already supported by the system can be achieved by simply adding a case for that function. In future, the system could be extended to try to apply functions that it has not encountered before or been programmed to deal with.
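Expressed in Python rather than PHP (an illustrative sketch, not the decoder's actual implementation), the same extensible design amounts to a dispatch table mapping each PHP function name from Table 1 to a callable that reverses it:

import base64
import codecs
import urllib.parse
import zlib

# Map each auxiliary PHP function to a Python callable that reverses it
INVERSES = {
    'base64_decode': base64.b64decode,
    'gzinflate':     lambda b: zlib.decompress(b, -15),  # raw DEFLATE
    'gzuncompress':  zlib.decompress,                    # zlib format
    'str_rot13':     lambda b: codecs.encode(b.decode('latin-1'), 'rot_13').encode('latin-1'),
    'strrev':        lambda b: b[::-1],
    'rawurldecode':  urllib.parse.unquote_to_bytes,
    'stripslashes':  lambda b: b.replace(b'\\', b''),    # rough approximation
    'trim':          lambda b: b.strip(),
}

def apply_chain(argument: bytes, functions: list[str]) -> bytes:
    """Apply the detected wrapper functions innermost-first, as in Listing 2."""
    for name in reversed(functions):
        argument = INVERSES[name](argument)
    return argument

Adding support for another wrapper function is then a single new dictionary entry, mirroring the "add a case" extensibility described above.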

Listing 2 shows the full pseudo-code of the processEvals() function. To begin with, string processing techniques are used to detect the eval() constructs and any auxiliary string manipulation functions contained within them. The eval() is then removed from the script and its argument is stored as a string variable. Auxiliary functions are detected and stored in an array, which is then reversed, allowing each function to be applied to the argument in the correct order. The result of this process is then re-inserted into the shell in place of the original construct.

BEGIN
    WHILE there is still an eval in the script
        IF the eval contains a string argument
            Find the starting position
            Find the end position
            Remove the eval from the script
            Extract the string argument
            Count the number of auxiliary functions
            Populate the array of functions
            Reverse the array
            FOR every function in the reversed array
                Apply the function to the argument
            END FOR
        END IF
        Insert the resulting code
    END WHILE
END

Listing 2: Pseudo-code for the processEvals() function

ProcessPregReplace: The preg_replace() function is used to perform a regular expression search and replace in PHP [10].

BEGIN
    WHILE there is still a preg_replace
        Find the starting position
        Find the end position
        Remove the preg_replace from the script
        Extract the string arguments
        Remove the '/e' from the first argument to prevent evaluation
        Perform the preg_replace
        Insert the deobfuscated code
    END WHILE
END

Listing 3: Pseudo-code for the processPregReplace() function

Listing 3 shows the full pseudo-code of the processPregReplace() function, which is tasked with detecting preg_replace() calls in a script and replacing them with the code that they were attempting to obfuscate. In much the same way as in the processEvals() function, string processing techniques are used to extract the preg_replace() construct from the script. Its three string arguments are then stored in separate string variables and, if detected, the ‘/e’ modifier is removed from the first argument to prevent the resulting text from being interpreted as PHP code. The preg_replace() can then be safely performed and its result can be inserted back into the script.
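A rough Python analogue of this step (illustrative only; the decoder itself works in PHP string-processing terms) strips the deprecated modifier and then performs the now-harmless substitution:

import re

def defang_preg_replace(pattern: str, replacement: str, subject: str) -> str:
    """Drop a trailing '/e' modifier, then perform the substitution safely."""
    delimiter = pattern[0]
    body, _, modifiers = pattern[1:].rpartition(delimiter)
    modifiers = modifiers.replace('e', '')   # prevent evaluation of the result
    flags = re.IGNORECASE if 'i' in modifiers else 0
    return re.sub(body, replacement, subject, flags=flags)

# The example from Section 2.2: the result is inserted, never executed
print(defang_preg_replace('/x/e', 'echo($a);', 'y'))   # -> 'y' (no match)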


Normalise: Many of the outputs of the feature extraction modules described later in this paper are affected by the layout of the scripts that are passed to them. Furthermore, it was found that the deobfuscation operations performed by the processEvals() and processPregReplace() functions often produced unpredictable and irregularly formatted code. In order to mitigate the effects of arbitrary formatting constructs on the results of the similarity analysis process, the normalise() function was created.

The purpose of the normalise() function is to apply a uniform formatting convention to every shell sample after the deobfuscation process is completed. A useful way of achieving this is to pass the script to a PHP parser, which then creates an abstract syntax tree (AST). All of the original formatting is lost during the parsing process, as the AST only stores the lexical tokens found in the script. These tokens can then be output according to a predefined set of formatting rules, ensuring that every sample conforms to the same formatting scheme.

Although the Zend Engine that is used to interpret PHP can be used to split source code into an array of PHP tokens, it lacks the functionality to construct an AST and output it in a uniform way. For this reason, an open source lexical parser called PHP Parser was used to construct the AST and overwrite the existing sample text.

3.2 The Viper Framework

Viper [22] is a unified framework designed to facilitate the static analysis of arbitrary files. It consists of commands (core functions used to open, close, delete, and tag file samples) and modules, which are dynamically loaded and can be run against either an open file or any number of files from the database. This modular design makes the framework highly extensible - additional functionality can be added by simply creating a new module. It is this extensibility that prompted Viper's use as a basis for this research.

Projects: Malware samples in Viper can be organised into separate projects [23]. Every project maintains its own repository of binary files, and an arbitrary number of projects can be created. All commands and modules in Viper can only be run against samples that form part of the project that is currently open.

Viper projects are particularly useful when dealing with large malware collections, as they allow specific families of samples to be stored and analysed separately. Once it has been determined that a group of samples shares a common feature, it is a simple matter to transfer these samples into a new project for further analysis. Tests run against a smaller selection of samples are more expedient, and the resulting graphs are more concise, allowing for faster and more accurate conclusions to be drawn.

Sessions: Access to a specific malware sample in Viper is achieved by opening a Viper session [24], either by searching for the sample by name or by specifying its MD5 hash. Most of the commands and modules provided in the core Viper framework are designed to be run on a single file and require a session, but any module can access multiple files by retrieving them from the database (see Section 3.2 for information about how this is achieved).

Session objects are used to provide modules with information about the sample that is currently open. A global __sessions__ object provides access to the current session object (__sessions__.current), a list of all open session objects (__sessions__.sessions), and a list containing the results of the last find command that was executed (see Table 3 for more information on commands in Viper). A summary of the information that each session object encapsulates is provided in Table 2.

Session Attribute     Description
current.file.path     The absolute path of the current file
current.file.name     The name of the current file
current.file.size     The size (in bytes) of the current file
current.file.type     The type and encoding of the current file
current.file.mime     The MIME type of the current file
current.file.md5      The MD5 hash of the current file
current.file.sha1     The SHA-1 hash of the current file
current.file.sha256   The SHA-256 hash of the current file
current.file.sha512   The SHA-512 hash of the current file
current.file.crc32    The CRC-32 check value for the current file
current.file.tags     A list of tags attached to the current file

Table 2: Attributes of a __session__ object in Viper

The individual modules developed for this research (and described in Section 3.3) all require that an active session be open on the sample that needs to be processed. This is because these modules rely on the session attributes listed in Table 2 in order to perform their respective tasks. An extract from the Decode.py module, shown in Listing 4, demonstrates how the is_set() function of the global __sessions__ object is used to check for the presence of an open session on line eight.

1  class Decode(Module):
2      cmd = 'decode'
3      description = 'Reveals hidden code'
4
5      def run(self):
6
7          # Check for an open session
8          if not __sessions__.is_set():
9              print_error('No session opened')
10             return
11         ...

Listing 4: Extract from the Decode.py module demonstrating the use of the is_set() function


Database: The Viper sessions discussed in the previous section provide a convenient way to access information about a single sample without resorting to database queries. If a module requires access to multiple samples, it must import and interact with the Database class, which acts as a wrapper for the SQLite database used to organise and store malware samples. Once imported, the Database object can be used to access the local project repository through the use of the find() function, which accepts a key and a value as search parameters.

The batch modules described in Section 3.4 all make use of the Database class's find() function to retrieve and process all samples in a given project. An extract from the Decode_All.py module, shown in Listing 5, details how this is achieved on lines fourteen and fifteen.

1  from viper.core.database import Database
2
3  class Decode_All(Module):
4      cmd = 'decode_all'
5      description = ('Reveals hidden code '
6                     'in all samples')
7
8      def run(self):
9
10         # Get Viper's root path
11         viper_path = __project__.get_path()
12
13         # Retrieve all samples from the database
14         db = Database()
15         samples = db.find(key='all')
16
17         # Decode all samples
18         for sample in samples:
19             ...

Listing 5: Extract from the Decode_All.py module demonstrating the use of the find() function

Commands: Simple sample access and modification operations in Viper are carried out using commands [25]. This set of core operations allows a user to open, close, delete, store, or tag an open binary file, as well as display an overview of its characteristics. Table 3 details all the available Viper commands and their respective uses.

3.3 The Individual Modules

Three preprocessing modules were created to process samples in different ways to prepare them for similarity analysis. Each of these modules was designed to be run against a single shell sample, and requires that a Viper session already exists (see Section 3.2 for more information on sessions in Viper). Decode.py processes samples in their entirety and produces a new file, whereas Functions.py and FunctionBodies.py extract relevant features for analysis.

Command    Description
clear      Clears the console window
close      Closes the current session
delete     Deletes the current file
exit       Terminates the current execution of Viper
export     Saves the current session to a specified file
find       Searches for a file using a name or hash
help       Displays the help dialogue
info       Displays an overview of the current file
new        Creates a new file
notes      Allows file notes to be viewed, edited, or deleted
open       Opens a specified file using either its SHA-1 or MD5 hash
projects   Lists all existing projects
sessions   Lists all open sessions
store      Stores a specified file or folder in the local repository
tags       Allows associated file tags to be modified

Table 3: Viper's core commands

Decode.py: The purpose of the Decode.py module is to remove idiomatic PHP obfuscation constructs from a single sample, thereby exposing more code for analysis and processing by the other individual modules (all of which can be run on either raw or decoded samples for the purposes of comparison). It does this by accessing the Viper session, retrieving the open file, and passing it to the Decode.php script, the details of which are described in Section 3.1. Once the script has run to completion, the resulting code is stored alongside the original script in the Viper repository.

FunctionBodies.py: The purpose of the FunctionBodies.py script is to extract the contents of all user-defined function bodies present in a malware sample for subsequent comparative analysis. The identification and extraction of these bodies required that the samples be separated into tokens, which was more easily achieved using PHP itself. For this reason, the FunctionBodies.py script makes use of an external PHP script, as is the case with Functions.py and its accompanying Functions.php script.

Functions.py: The purpose of the Functions.py script is to extract the names of any user-defined functions in a given sample. To do this, it makes use of PHP's Tokenizer extension to split a sample into tokens before looping through each token in search of the T_FUNCTION token type. Once this token type is found, the next string (representing the name of the function) is stored. Because the Tokenizer is implemented in PHP, an external PHP script called Functions.php was used to perform the name extraction process and return the results to the Functions.py script.
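The division of labour is straightforward to sketch in Python: a wrapper invokes a helper PHP script (here a hypothetical functions.php standing in for the module's actual helper) that tokenizes the sample with token_get_all() and prints one user-defined function name per line.

import subprocess

def extract_function_names(sample_path: str) -> list[str]:
    """Run the PHP helper and collect the printed function names."""
    result = subprocess.run(
        ['php', 'functions.php', sample_path],
        capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line]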


3.4 The Batch Modules

The batch modules contain no feature extraction or sample processing capabilities of their own, but rather apply each of the individual modules to all of the samples in the current project (see Section 3.2 for more information on projects in Viper). The purpose of the batch modules is to prepare an entire collection of samples for comparison by the Matrix.py module. Each of the command line options accepted by that module (apart from a special case involving unprocessed samples) requires that a specific batch module has already been run. A list of the batch modules and a short description of their functionality is shown in Table 4.

Module                 Description
DecodeAll.py           Reveals hidden code for all samples
FunctionBodiesAll.py   Extracts function bodies from all samples
FunctionsAll.py        Creates a list of functions for all samples

Table 4: The batch modules and their descriptions

3.5 The Matrix Module

The purpose of the Matrix.py module is to produce matrices that represent the observed similarity between all samples in a given collection based on a specified measure of similarity. It relies on the feature extraction and sample processing performed by the aforementioned batch modules (which in turn rely on the individual modules to perform their tasks).

Several options can be passed to the matrix module, each representing the measure of similarity that should be used to generate a similarity matrix. If one would like to view the number of user-defined function name matches between raw shells in a project, for example, the command would be ‘matrix -f raw’. To make use of the same measure of similarity (i.e. function name matches) on decoded shells in a project, the command would be ‘matrix -f decoded’. A full list of the available option combinations is shown in Table 5.

Options      Description
-r           Compares raw samples using ssdeep
-d           Compares decoded samples using ssdeep
-b raw       Compares the function bodies of raw samples
-b decoded   Compares the function bodies of decoded samples
-f raw       Compares the function names of raw samples
-f decoded   Compares the function names of decoded samples

Table 5: The possible option combinations for Matrix.py

Each option (or measure of similarity) in the Matrix.py module is associated with a validation function and a comparison function. The validation function ensures that the batch modules needed to create the required files have been run successfully, and the comparison function calculates the observed similarity between two given files. A completed matrix represents the collation of the results returned by the comparison function for every pair of samples in the project.
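The collation step amounts to filling a symmetric matrix with pairwise scores. A minimal Python sketch (assuming compare() returns a 0-100 similarity, as the ssdeep-based measures do):

def build_matrix(samples: list, compare) -> list[list[int]]:
    """Collate pairwise comparison results into a symmetric matrix."""
    n = len(samples)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = compare(samples[i], samples[j])
            matrix[i][j] = matrix[j][i] = score   # similarity is symmetric
    return matrix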

3.6 The Visualisation Modules

The purpose of the visualisation modules is to create a graphical representation of a given similarity matrix. These representations are easier to interpret, and can be studied to discover relationships between samples.

Heatmap.py: The Heatmap.py module is used to display each value in a given matrix as a colour that represents the magnitude of that value. Heatmaps can be generated from matrices created using any of the measures of similarity listed in Table 5. Clusters of dark colours represent areas of greater similarity, while lighter areas indicate a lack of similarity.

Dendrogram.py: Dendrograms are tree-like structures that can be used to display relationships that result from hierarchical clustering algorithms. Dendrogram.py performs this clustering and displays the resulting figure, and can be run on any matrix created using the measures of similarity listed in Table 5. The hierarchical nature of the dendrograms produced in this way allows for the identification of derivative sample relationships, as well as the magnitude of such relationships.
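Both renderings take only a few lines with standard scientific-Python tooling. The sketch below (shapes and labels assumed; not the modules' exact code) draws a heatmap with matplotlib and an average-linkage dendrogram with SciPy, first converting 0-100 similarity scores into distances:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

def visualise(matrix, labels):
    """Render a similarity matrix as a heatmap and as a dendrogram."""
    plt.imshow(matrix, cmap='Greys')   # darker cells = greater similarity
    plt.xticks(range(len(labels)), labels, rotation=90)
    plt.yticks(range(len(labels)), labels)
    plt.colorbar()
    plt.show()

    # SciPy clusters on condensed distances, so convert similarity (0-100)
    distances = squareform([[100 - s for s in row] for row in matrix])
    dendrogram(linkage(distances, method='average'), labels=labels)
    plt.show()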

4. RESULTS

This section begins with a description of the collection of samples that was used for testing purposes. It then goes on to evaluate the effectiveness of the Decode.py module and its attempts to normalise and deobfuscate samples prior to similarity analysis. A case study involving the c99 family of shells is then presented to demonstrate the results of the aforementioned analysis.

4.1 Test Data

Although the malware samples used in this research were obtained from a variety of online sources (see Table 6 for a detailed source breakdown), the vast majority were retrieved from the collection of samples maintained by the VirusTotal online analysis service. VirusTotal allows researchers and commercial clients with access to a private key to download samples that have been submitted by other users. This is achieved by making scripted search and download queries to the service's online API.

The query string was parameterised in such a way as to limit the results to RATs written in PHP.


Source            Number of Shells
VirusTotal.com    1978
Insecurety.net    87
c99shell.gen.tr   21
r57shell.net      7
r57.gen.tr        10
hoco.cc           35

Table 6: Sample Source Breakdown

Additionally, only samples that have been identified as being malicious by at least one antivirus engine are included in the request. The full parameterised query is shown below:

params = urllib.urlencode({
    'apikey': key,
    'query': 'type:php engines:"Backdoor:PHP" positives:1+'
})

File sizes among the 2138 shell samples ranged from 1.1 kB to 546 kB. An MD5 hash was generated for each file and compared to the hashes of every other file to ensure that no two files were identical. This was further reinforced during the comparison of the fuzzy hashes - 100% similarity was only ever observed when a shell was compared against itself.
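Deduplication of this kind is a one-pass operation. A minimal sketch (file paths hypothetical):

import hashlib

def find_duplicates(paths):
    """Yield pairs of byte-identical files, keyed on their MD5 digests."""
    seen = {}
    for path in paths:
        with open(path, 'rb') as handle:
            digest = hashlib.md5(handle.read()).hexdigest()
        if digest in seen:
            yield seen[digest], path
        else:
            seen[digest] = path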

4.2 Decode.py Tests

The decoder is responsible for performing code normalisation and deobfuscation prior to analysis, with the goal of exposing the program logic of a shell. As such, it can be declared a success if it is able to remove all layers of obfuscation from a script (i.e., if it removes all eval() and preg_replace() constructs). The tests for this component progressed from scripts containing simple, single-level eval() and preg_replace() statements to more comprehensive tests involving auxiliary functions and nested obfuscation constructs. The results of these tests are omitted from this paper for the sake of brevity, but can be found in work previously published by the authors [5, 6].

4.3 Similarity Analysis Case Study: The c99 Family of Shells

Given the prohibitive size of the graphs generated when run against the entire collection of shells, it proved more expedient to demonstrate the results produced by the visualisation modules with a smaller subset of samples. The samples used in this case study comprise seven variants of the popular c99 shell, which are listed below:

1. c99.txt

2. c99-bd.txt

3. c99-locus.txt

4. c99-mad1.txt

5. c99-mad2.txt

6. c99-v1.txt

7. c99-ud.txt

For testing purposes, all of the option combinations were passed to the Matrix.py module in order to create all possible similarity matrices. These matrices were then processed by the visualisation modules to produce both heatmaps and dendrograms for every matrix.

Heatmap.py Tests: The measure of similarity that was chosen to demonstrate the output produced by the Heatmap.py module was the user-defined function body matching performed by the FunctionBodies.py module outlined in Section 3.3. The FunctionBodiesAll.py batch module was run against the family of c99 shells described in the previous section in both raw and decoded form, and the Matrix.py module was then used to create two similarity matrices based on the extracted function bodies. The matrix based on raw samples is shown in Figure 1, and the matrix based on decoded samples is shown in Figure 2. After running the Heatmap.py module against both matrices, the heatmaps shown in Figures 3 and 4 were produced. Darker colours represent a higher level of similarity and vice versa.

Figure 1: Similarity matrix based on the function bodies extracted from raw c99 family shells

Figure 3 reveals a relatively sparse distribution of similarity, with high values only occurring as a result of comparing samples against themselves. Of particular interest are the ud.txt and mad1.txt samples, which exhibit no similarity to the other shells in their raw forms. Clustering algorithms using this figure as an input would conclude that these two shells were not part of the c99 family of shells.

The similarity shown in Figure 4 differs slightly from that in Figure 3.


Figure 2: Similarity matrix based on the function bodies extracted from decoded c99 family shells

Figure 3: Similarity heatmap based on the function bodies extracted from raw c99 family shells

In each case the values either increased or remained the same, which is to be expected when a larger portion of code is available for analysis. The decoded ud.txt and mad1.txt samples in particular demonstrated a far greater overall level of similarity to the rest of the collection. Upon examination of both the raw and decoded samples, it was discovered that these two shells were both encapsulated in eval() statements, which explains both their lack of similarity in Figure 3 and the subsequent increase shown in Figure 4.

Dendrogram.py Tests: The same measure of similarity (i.e. the comparison of extracted user-defined function bodies) was used to demonstrate the capabilities of the Dendrogram.py module so as to avoid the inclusion of two new matrices. Reference can therefore be made to the matrices depicted in Figures 1 and 2. The figures that were produced once the Dendrogram.py module had been run against these two matrices are shown in Figures 5 and 6 respectively.

Figure 4: Similarity heatmap based on the function bodies extracted from decoded c99 family shells

Figure 5: Similarity dendrogram based on the function bodies extracted from raw c99 family shells

The height of each cluster in a dendrogram represents the average distance between all inter-cluster pairs, and therefore the level of similarity between the samples that form that cluster. The lower the cluster height, the greater the similarity, and vice versa. As an example, consider the dendrogram shown in Figure 6. The c99.txt sample is more similar to mad1.txt than ud.txt is to locus.txt, because the first cluster is lower than the second. The two most similar samples are mad2.txt and bd.txt, because their cluster is the lowest on the dendrogram. These observations are supported by the values in the matrix shown in Figure 2, as the highest similarity value between two different shells is 65, which occurs between mad2.txt and bd.txt.

The difference between the similarity observed amongst raw and decoded samples is made even more apparent by the change in the shape of the dendrogram from Figure 5 to Figure 6. The only pair of samples with any meaningful level of similarity in Figure 5 was the v1.txt and bd.txt pair.


Figure 6: Similarity dendrogram based on the function bodies extracted from decoded c99 family shells

As was the case with the heatmaps in Section 4.3, all sample relationships either strengthened or remained the same.

4.4 Similarity Analysis Case Study: Cluster Identification

Although the smaller c99 case study is useful for demonstrating the similarity analysis process in a more concise and manageable way, the goal of the system is to identify areas of interest within a larger dataset. Once found, these areas can be subjected to further analysis. In order to demonstrate this process, a random selection of 150 raw shells was used to create a large heatmap from which areas of elevated similarity could be identified. The measure of similarity used for this case study was the percentage of matching function names, which drew on the function lists created by the Functions.py module. One similarity cluster was then identified and expanded upon by running the clustered samples through the decoder and then rendering the cluster again to gauge any differences in observed similarity.

Figure 7 shows the heatmap that was obtained by running the 150 shells through the analysis process. Once this was completed, an area of interest was selected for the purposes of demonstration. An enlarged version of this area is displayed in Figure 8. In order to more accurately determine how similar this collection of samples was, all of the shells were run through the decoder, a new matrix was created, and a new heatmap was rendered, as seen in Figure 9. A comparison of the heatmaps created before and after the deobfuscation process (shown in Figures 8 and 9 respectively) highlights the improvement in similarity due to the increased availability of code for analysis.

5. CONCLUSION

The primary goal of this research was to determine the patterns of similarity within a collection of malware samples. This was achieved by using four different measures of similarity to create representative similarity matrices, and then visualising and interpreting these matrices graphically. Section 4.3 demonstrates the results of this process, and outlines how conclusions relating to sample similarity can be drawn by consulting either the matrices or their graphical representations. In addition to this, it was demonstrated that the deobfuscation process described in Section 3.1 was able to increase the amount of code available for comparison, and thereby increase the accuracy of the similarity analysis process as a whole.

6. FUTURE WORK

The methods of similarity analysis and visualisation developed in this research are intended to be used as tools for creating detailed webshell taxonomies in the future. To this end, alternate methods of comparing shell samples need to be examined, and other research into the evolution of malware needs to be investigated.

6.1 Alternative Shell Comparison Methods

Although the four measures of similarity discussed in Section 3.3 proved useful, they represent only a few of the possible approaches to the detection of code reuse in webshells. In future, a thorough evaluation of alternate classification methods could be carried out to determine which approach (or combination of approaches) is most accurate. The following methods could be considered:

• HTML output matching

• Control graph matching

• Dynamic sandbox analysis

• Line-by-line analysis

• N-gram analysis

• Normalised compression distance (sketched below)
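Of these, normalised compression distance (NCD) is perhaps the simplest to prototype. A minimal Python sketch, using zlib as the compressor (one of several reasonable choices):

import zlib

def ncd(a: bytes, b: bytes) -> float:
    """Normalised compression distance: near 0 = similar, near 1 = unrelated."""
    c_a = len(zlib.compress(a, 9))
    c_b = len(zlib.compress(b, 9))
    c_ab = len(zlib.compress(a + b, 9))
    return (c_ab - min(c_a, c_b)) / max(c_a, c_b)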

6.2 A Webshell Taxonomy

It is envisioned that this work will eventually lead to the construction of a taxonomy tracing the evolution of popular web shells such as c99, r57, b374k and barc0de [26] and their derivatives. This would involve the implementation of several tree-based structures that have the aforementioned shells as their roots and are able to show the mutation of the shells over time. Such a task would build on research into the evolutionary similarity of malware already undertaken by Li et al. [27], and would draw on the deobfuscation and similarity analysis capabilities described in this paper.


Figure 7: Similarity heatmap based on the function names extracted from a random selection of 150 raw shells

Figure 8: Focussed similarity heatmap based on the cluster identified in Figure 7


Figure 9: Similarity heatmap based on the decoded versions of the samples shown in Figure 8

REFERENCES

[1] K. Tatroe, Programming PHP. O’Reilly & Associates Inc, 2005.

[2] N. Cholakov, “On some drawbacks of the PHP platform,” in Proceedings of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing, ser. CompSysTech ’08. New York, NY, USA: ACM, 2008, pp. 12:II.7–12:2. [Online]. Available: http://doi.acm.org/10.1145/1500879.1500894

[3] M. Landesman. (2007, March) Malware Revolution: A Change in Target. Microsoft. Accessed on 1 March 2013. [Online]. Available: http://technet.microsoft.com/en-us/library/cc512596.aspx

[4] M. Doyle, Beginning PHP 5.3. Wiley, 2011. [Online]. Available: http://books.google.co.za/books?id=1TcK2bIJlZIC

[5] A. N. Other, “Towards a sandbox for the deobfuscation and dissection of PHP malware,” in Information Security for South Africa (ISSA), 2014. IEEE, 2014, pp. 1–8.

[6] ——, “A sandbox-based approach to the deobfuscation and dissection of PHP-based malware,” South African Institute of Electrical Engineers, vol. 106, pp. 46–63, 2015.

[7] R. Kazanciyan. (2012, December) Old Web Shells, New Tricks. Mandiant. Accessed on 1 March 2013. [Online]. Available: https://www.owasp.org/images/c/c3/ASDC12-Old_Webshells_New_Tricks_How_Persistent_Threats_have_revived_an_old_idea_and_how_you_can_detect_them.pdf

[8] C. Collberg, C. Thomborson, and D. Low, “A taxonomy of obfuscating transformations,” Department of Computer Science, The University of Auckland, New Zealand, Tech. Rep., 1997.

[9] The PHP Group. (2013, May) Eval. Accessed on 16 October 2013. [Online]. Available: http://php.net/manual/en/function.eval.php

[10] ——. (2013, May) Preg Replace. Accessed on 16 October 2013. [Online]. Available: http://php.net/manual/en/function.preg-replace.php

[11] A. Moser, C. Kruegel, and E. Kirda, “Limits of Static Analysis for Malware Detection,” in Twenty-Third Annual Computer Security Applications Conference, December 2007, pp. 421–430.

[12] M. Christodorescu and S. Jha, “Testing malware detectors,” SIGSOFT Softw. Eng. Notes, vol. 29, no. 4, pp. 34–44, Jul. 2004. [Online]. Available: http://doi.acm.org/10.1145/1013886.1007518

[13] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1986.

[14] P. Terry, Compiling with C# and Java. Pearson Education, 2005.

[15] L. Ullman, PHP for the World Wide Web: Visual QuickStart Guide. Peachpit Press, 2004.

[16] The PHP Group. (2013, May) Tokenizer. Accessed on 16 October 2013. [Online]. Available: http://php.net/manual/en/intro.tokenizer.php

[17] J. Kornblum. (2013, July) Context Triggered Piecewise Hashes. Accessed on 26 October 2013. [Online]. Available: http://ssdeep.sourceforge.net/

[18] ——, “Identifying almost identical files using context triggered piecewise hashing,” Digital Investigation, vol. 3, Supplement, pp. 91–97, 2006, the Proceedings of the 6th Annual Digital Forensic Research Workshop (DFRWS ’06). [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1742287606000764

[19] L. Chen and G. Wang, “An efficient piecewise hashing method for computer forensics,” in Knowledge Discovery and Data Mining, 2008. WKDD 2008. First International Workshop on, Jan 2008, pp. 635–638.

[20] M. Friendly and D. J. Denis, “Milestones in the history of thematic cartography, statistical graphics, and data visualization,” http://www.datavis.ca/milestones, 2001.

[21] M. Preda and R. Giacobazzi, “Semantic-Based Code Obfuscation by Abstract Interpretation,” in Automata, Languages and Programming, ser. Lecture Notes in Computer Science, L. Caires, G. Italiano, L. Monteiro, C. Palamidessi, and M. Yung, Eds. Springer Berlin Heidelberg, 2005, vol. 3580, pp. 1325–1336. [Online]. Available: http://dx.doi.org/10.1007/11523468_107

[22] C. Guarnieri. (2014, March) Viper official documentation. Accessed on 5 August 2015. [Online]. Available: http://viper-framework.readthedocs.org/en/latest/index.html

[23] ——. (2014, March) Viper projects. Accessed on 5 August 2015. [Online]. Available: http://viper-framework.readthedocs.org/en/latest/usage/concepts.html#projects

[24] ——. (2014, March) Viper sessions. Accessed on 5 August 2015. [Online]. Available: http://viper-framework.readthedocs.org/en/latest/usage/concepts.html#sessions

[25] ——. (2014, March) Viper commands. Accessed on 5 August 2015. [Online]. Available: http://viper-framework.readthedocs.org/en/latest/usage/commands.html

[26] T. Moore and R. Clayton, “Evil Searching: Compromise and Recompromise of Internet Hosts for Phishing,” in Financial Cryptography and Data Security, ser. Lecture Notes in Computer Science, R. Dingledine and P. Golle, Eds. Springer Berlin Heidelberg, 2009, vol. 5628, pp. 256–272. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-03549-4_16

[27] J. Li, J. Xu, M. Xu, H. Zhao, and N. Zheng, “Malware obfuscation measuring via evolutionary similarity,” in First International Conference on Future Information Networks, 2009, pp. 197–200.


A MANAGEMENT MODEL FOR BUILDING A COMPUTER SECURITY INCIDENT RESPONSE CAPABILITY

Roderick D. Mooi∗ † and Reinhardt A. Botha †

∗ Meraka Institute, Council for Scientific and Industrial Research, South Africa. Email: [email protected]
† Center for Research in Information and Computer Security, School of ICT, Nelson Mandela Metropolitan University, South Africa. Email: [email protected]

Abstract: Although there are numerous guides available for establishing a computer security incident response capability, there appears to be no underlying management model that brings them all together. This paper aims to address the problem by developing a management model for establishing a Computer Security Incident Response Team (CSIRT). A design science-based approach has been selected for the overall project. However, the current paper reports on the first three activities in design science research: identifying the problem, listing solution objectives, and designing and developing a model. A comprehensive literature review serves two purposes: to confirm the problem and to provide a structured way of revealing the requirement areas. Following the uncovering of the requirement areas, CSIRT business requirements and services are introduced, before the relationships between the areas are explored using argumentation. This culminates in the development of the management model in two parts: a strategic view and a tactical view. The strategic view comprises the business requirements and “higher” level decisions – the environment, constituency and funding considerations – that need to be made when establishing a CSIRT. The tactical view follows by presenting the “how” considerations. Together, these two views provide an holistic model for establishing a CSIRT by parties interested in doing so.

Key words: CSIRT, CERT, establishing requirements, building, incident response team, cyber security team, security operations centre, information security capability, management model.

1. INTRODUCTION

The concept of a CERT® or Computer Security Incident Response Team (CSIRT) is well known in the information security domain. It has recently received renewed interest for novel areas of application like cloud computing (with its unique information security challenges) [1] and in developing countries [2, 3] catching up with the Internet bandwagon.

Popular IT news sites [4, 5], security-related conferences [6, 7] and talk of global cyber-war [8, 9] evidence the reality of the information security landscape today. A year of particular significance was 2014, with Distributed Reflective Denial of Service (DRDoS) attacks on the rise (and ongoing in 2015) [10], followed by the revealing of critical vulnerabilities in core libraries of systems connected to the Internet. Names like “Heartbleed” (the OpenSSL vulnerability) and “Shellshock” (the Bash vulnerability), together with the exploitation of these vulnerabilities [11, 12], show just how fragile this ecosystem really is. In addition, the increasing hype over big data breaches [13, 14] has alerted those without an information security team to the fact that establishing one is inevitable.

So, how do we respond and what steps are required to reach the goal of being a less attractive target? The primary mechanism adopted by the Internet community to recognise and deal with incidents like hacking, malware outbreaks and denial of service attacks is the establishment of an incident response capability [15, p. 1] in the form of a team responsible for incident handling. There are various options for this capability, ranging from an ad-hoc (informal) approach to dedicated CSIRTs [15, 16]. These include transitional or special-purpose models like Warning, Advice and Reporting Points (WARPs) [17], and Community-orientated Security, Advisory and Warning (C-SAW) teams [18]. These models are typically less formal and offer a subset of traditional CSIRT services. Therefore, as the most complete/mature and formal capability, this research will focus on the establishment of a Computer Security Incident Response Team (CSIRT), which is

“an organization or team that provides services and support to a defined constituency for preventing, handling, and responding to computer security incidents” [19, p. 1].

The resulting management model is also useful to other types of teams though, as the broad areas of consideration are still applicable.

2. RESEARCH PROBLEM AND OBJECTIVES

There are a plethora of guidelines and processes to follow when setting out to establish a CSIRT, and where to start is not clear. Plentiful advice is provided regarding which aspects to consider, without a consistent method or process to follow and thereby ensure that the most important of these “requirements” for establishing a CSIRT are catered for. A similar view is shared by [1], which highlights

Based on: “Prerequisites for building a computer security incident response capability”, by R. Mooi and R. Botha, which appeared in the Proceedings of Information Security South Africa (ISSA) 2015, Johannesburg, 12 & 13 August 2015. © 2015 IEEE


the lack of consistency in incident management literature. Therefore, the problem addressed by this paper is the lack of a clear, holistic method to follow when setting out to establish a CSIRT. Solving this problem is important as we then have a means of knowing where and how to start when building a CSIRT. This will ultimately result in a prepared, improved and coordinated response to IT security incidents.

Successfully providing CSIRT services requires an holistic approach. “Policies, procedures, equipment, premises, contacts, and staff should be established before commencing operations” [20, p. 32], although it is likely that many of these items will be missing or inadequate [20]. To ensure that nothing is omitted, a structure based on the IT Infrastructure Library (ITIL) management framework is proposed in the following section.

The objectives of this paper are therefore to

1. analyse existing CSIRT literature, with the purpose of identifying the primary areas of requirements for building a CSIRT, and

2. develop a management model based on the associations revealed between these areas.

To facilitate application and readability, the management model is presented in two views: a strategic view (section 5) and a tactical view (section 6). Together, these views can be used as a framework for the task of establishing a CSIRT.

As the methodology used to achieve these objectives, the next section will explain the Design Science Research (DSR)-based approach and the comprehensive literature survey employed to develop the model.

3. METHODOLOGY

The primary technique used for this study is Design Science Research (DSR). Design science “attempts to create things that serve human purposes” [21, p. 253].

3.1 Design Science Research overview

Design science has been defined as creating and evaluating artefacts “intended to solve identified organisational problems” [22, p. 77]. This emphasises the two basic activities of building and evaluating artefacts [21, p. 254]. As a problem-solving paradigm [22, p. 76], design science is well suited to providing a solution to the problem presented in the previous section. More specifically, this research answers the DSR question: “ ‘What utility does the new artifact provide?’ ” [22, p. 91].

The format of DSR products (constructs, models, methods and instantiations) [22, p. 77–78] suited this research particularly well.

Table 1: Design science research process [24, pp. 52–56]

Activity                     Description                                                  Paper
1 Define the problem         Identify the specific research problem and why a            §2
                             solution is needed.
2 List solution objectives   Specify how a new artefact supports a solution or how       §2
                             an existing one will be an improvement.
3 Design and develop         Create an artefact (construct, model, method or             §4–6
                             instantiation).
4 Demonstrate                Solve the problem in a suitable context.                    N/A
5 Evaluate                   Observe and measure the effectiveness and efficiency       N/A
                             of the artefact.
6 Communicate                Publish the results.                                        + other

CSIRT constructs (as the vocabulary of the domain) are combined to form a model (a representation of the solution space to a problem) describing the task of establishing a CSIRT and expressing the relationships between the constructs [21, pp. 253, 256].

A DSR process was used to create an effective artefact, a model in this instance, through the application of knowledge [21, p. 253]. This knowledge was obtained via a comprehensive study of the relevant literature, utilising a concept matrix for categorisation [23]. This process and the literature study are presented in the following sections.

3.2 A process for DSR

A process for executing and presenting design science research has been developed by [24]. The process consists of the six activities listed and described in table 1.

In section 2 the problem was defined and solution objectives were listed. The rest of this section will present the conceptual framework for the actual model development, describe the literature sources, and finally reveal a concept matrix of the literature. The remaining sections are therefore focused on activity three - the development of the management model - as a DSR artefact.

The demonstration and evaluation steps have been identified as not feasible for this paper due to space and time constraints. The current paper therefore only reports on the first three steps of the DSR process.

3.3 Conceptual framework for the management model

Considering that a CSIRT can be seen as a team providing specialised IT services to a defined constituency [25], the IT Infrastructure Library (ITIL®), as a framework for IT service management, is certainly applicable.

People, Processes, Products and Partners: According to ITIL, IT service management should include preparing and planning “the effective and efficient use of the four Ps” [26, p. 40].


Figure 1: ITIL’s four Ps, the CSIRT and the constituency

The four Ps, based on ITIL’s descriptions and localised to the CSIRT environment, include

• people — the staff and management of the CSIRT;

• processes — CSIRT policies and procedures;

• products — customer-facing services, technologies and tools; and

• partners — internal (those present in the same host organisation/business, e.g. human resources or public relations) and external (vendors, suppliers, other CSIRT teams, media, law enforcement, service providers, etc.).

To complete the picture, the constituency, or customer base [16], of the CSIRT (as the consumer of the services provided) must be included.

How the CSIRT (represented through the four Ps) and the constituency fit together is shown in fig. 1. Partners may form part of the CSIRT depending on whether or not they assist directly with provisioning the CSIRT services (e.g. as expert consultants, help desk staff, etc.).

ITIL’s management framework: ITIL advises that the four Ps need to be aligned with the business and therefore proposes “five areas that need to be considered with regard to the design of a management architecture” [26, p. 61]. These five areas are the

1. business requirements (objectives within the organisation),

2. people (including roles and activities),

3. processes and procedures,

4. management tools, and

5. technology.

The business requirements were noticeably missing from the four Ps as an important initial consideration for establishing a CSIRT.

Figure 2: Framework for this research

Services (as the basis of ITIL’s framework) are also needed. Lastly, to complete the “four Ps”, partners are included.

Therefore, in order to provide structure to the CSIRT establishment model, the framework was adapted to include the CSIRT Partners and Services as shown in fig. 2. These two areas are developed iteratively as we progress through the other requirement areas and therefore they encompass the non-business requirements.

In the following section the approach used for the literature survey is presented, as a means of extracting the relevant information from the knowledge base.

3.4 Primary literature sources

A comprehensive search was performed with the objective of determining the primary sources of literature relevant to CSIRT establishment. The search concepts were therefore selected as

1. CSIRT or CERT (“incident response team”, “computer emergency response/readiness team”) and

2. establish (create, start, found).

Searches were conducted on ScienceDirect, Scopus, SpringerLink and IEEEXplore using these terms. To ensure that no important publications were missed, Google Scholar was searched using the terms CSIRT OR CERT OR “Incident Response Team” OR “Computer Emergency Response Team” establish. To reduce the volume of returned publications, the results were filtered for relevance based on the title followed by the abstract or summary text.


Then, as recommended by [23], the process of going backwards and forwards through citations was utilised to ensure that the final set was as complete as possible. When no significant new concepts emerged, the primary source of each concept was identified and the search terminated [23, p. xvi].

Reading through the material revealed a pattern of authoritative sources from respected authors. It was found that the main institutions affiliated with these sources include the Software Engineering Institute of Carnegie Mellon University (CMU-SEI) [15, 19, 25, 27] (who established the first CERT in 1988∗) and the European Union Agency for Network and Information Security (ENISA) [16, 28]. Other contributors include the National Institute of Standards and Technology (NIST) [29]; the Australian Computer Emergency Response Team (AusCERT) [20]; the University of Auckland (AUCK) with Sun Microsystems (SUN) [30]; and the SysAdmin, Audit, Networking and Security (SANS) Institute [31]. Finally, the Portuguese national (CERT.PT) and academic – Fundação para a Computação Científica Nacional (FCCN) – CSIRTs contributed to the technical requirements [32].

The most relevant publications from these institutions were selected as the primary sources for this study∗∗.

3.5 Summary of the primary sources

This section summarises these primary sources, grouped by institution and ordered by the number of sources from each.

Handbook for Computer Security Incident Response Teams (CSIRTs) [25] (CMU-SEI): Providing guidance on building and running a CSIRT, this handbook has a particular focus on the incident handling service [25, p. xv]. In addition, a basic CSIRT framework is provided covering the mission, constituency, organisational placing and relationships of the CSIRT to other teams. Detailed descriptions of CSIRT services, policies and team operations (including staffing issues) are supplied.

Organizational models for Computer Security Incident Response Teams (CSIRTs) [15] (CMU-SEI): This handbook provides guidance on selecting the correct model for an organisation’s incident response capabilities. The primary focus is on the organisational model and operational structure of the team. Common CSIRT models with their attributes, respective advantages and disadvantages, as well as typical service offerings are discussed.

State of the practice of Computer Security Incident Response Teams (CSIRTs) [27] (CMU-SEI): A comprehensive survey forms the basis of this technical report intended to present the status quo of CSIRTs across the globe [27, p. xii]. A summary of what CSIRTs require in order to be effective is provided. This is complemented by a literature review, which includes a basic framework of areas and factors to consider when developing an incident response capability [27, p. 84]. This information is useful to both new and existing CSIRTs.

∗See http://www.cert.org/about/

∗∗Although some of these references appear dated, recent publications confirm that there have been no relevant updates [33, p. 4] and very few related publications in general since 2009 [1]. This is partly due to the fact that CSIRTs have been evolving since 1988 and are mature in developed countries [34]. The foundations of these publications are still applicable though and provide good academic value, particularly due to the authoritative nature of the sources.

Defining incident management processes for CSIRTs: A work in progress [19] (CMU-SEI): This report takes a process-centric approach to identifying the resources and roles required for incident management. The process definitions are accompanied by workflow diagrams and descriptions. The resulting process maps provide a best-practice model outlining the requirements for a successful incident management capability in terms of the primary functions and tasks [19, p. 8].

A step-by-step approach on how to set up a CSIRT [16] (ENISA): This document, provided by the European Union Agency for Network and Information Security (ENISA), covers business management, processes and technical aspects of CSIRT establishment. It provides a definition of a CSIRT, a description of services that can be provided as well as a process to follow for getting started [16, p. 4].

Good practice guide for incident management [28] (ENISA): Also by ENISA, this more recent guide provides practical information and good practices for managing network and information security incidents. This handbook is especially useful to developing CSIRTs in the establishment phase as it contains guidelines on structuring incident management and, in particular, the incident handling service [28, p. 4].

Computer security incident handling guide [29] (NIST): This guide from the National Institute of Standards and Technology (NIST) provides recommendations for establishing a successful incident response capability. Incident handling in general is also featured, with the primary focus on detecting, analysing, prioritising and handling incidents.

Forming an incident response team [20] (AusCERT): Based on the experience of building the Australian Security Emergency Response Team (SERT), this paper looks at what it takes to form and maintain an incident response team. Topics include the constituency, policies, information, equipment and tools, as well as partner relationships and interactions [20, p. 1].

Expectations for computer security incident response [30] (AUCK, SUN): This best current practice Request for Comments (RFC) provides a general framework of what can reasonably be expected of a CSIRT and the important subjects that are of concern to the community. A template for CSIRTs is provided as an aid for implementing and communicating the recommendations. Although quite dated, it is still used as the basis for defining many CSIRTs, as evidenced by googling cert OR csirt rfc 2350∗∗∗.


Table 2: Literature concept matrix

                        Business       People    Policies &    Services    Tools &         Partners
                        requirements             Processes                 Technologies
ENISA [28]              ✓✓✓            ✓✓        ✓✓✓           ✓✓          ✓✓✓             ✓✓
CMU-SEI [27]            ✓✓             ✓✓✓       ✓✓✓           ✓✓          ✓✓✓             ✓✓
CMU-SEI [25]            ✓✓             ✓✓        ✓✓✓           ✓✓✓         ✓✓✓             ✓✓
CMU-SEI [15]            ✓✓             ✓✓✓       ✓✓            ✓✓✓         ✓✓              ✓✓
ENISA [16]              ✓✓✓            ✓✓✓       ✓✓✓           ✓✓          ✓
CMU-SEI [19]            ✓              ✓✓✓       ✓✓✓           ✓           ✓✓              ✓✓
AusCERT [20]            ✓✓             ✓✓        ✓✓✓           ✓           ✓✓              ✓✓
NIST [29]               ✓✓             ✓✓        ✓✓            ✓           ✓✓              ✓✓
AUCK, SUN [30]          ✓✓             ✓         ✓✓            ✓✓          ✓✓
SANS Institute [31]     ✓✓             ✓✓        ✓✓            ✓✓
CERT.PT, FCCN [32]                                                         ✓✓✓

✓ = minimal information or reference only; ✓✓ = useful information; ✓✓✓ = primary source


Computer security incident handling [31] (SANS Institute): This step-by-step publication presents an “action plan for dealing with intrusions, cyber-theft, and other security-related events” [31, p. i]. It reflects the experience of incident handlers from over 50 commercial, government and educational organisations [31, p. iii] and is specifically intended to provide a starting point for incident handling procedures. An “emergency action card” is provided for organisations that are not prepared for when an IT security incident occurs [31, p. x].

Technical infrastructure of a CSIRT [32] (CERT.PT / FCCN): The goal of this paper is to provide a guide for building the technical infrastructure required by a CSIRT, with an emphasis on the necessary tools, equipment and mechanisms. The technical infrastructure of the Portuguese NREN CSIRT is used as an operational example.

It is important to observe that although there are only a few primary sources here, the volume of material is quite large, as these articles and reports range from 6 [32] to 291 [27] pages, with an average of 123 pages (1351 pages in total). As a result, a method to make the data more coherent and useful was required. The framework presented in section 3.3 was used to organise the literature and produce a concept matrix. This concept matrix is presented, together with a brief summary of the contribution areas for each source, in the next section.

∗∗∗Examples of the use of RFC2350 include: http://www.ren-isac.net/csirt/ and https://www.cert.at/about/rfc2350/rfc2350_en.html

3.6 Concept matrix

The previous two sections were brought together through the use of a concept matrix to synthesise the literature (as recommended by [23]). The concepts (or topics) emerged while reading the literature as natural groupings of CSIRT requirements and matched up well with the ITIL framework. The resulting matrix is shown in table 2.

The number of ticks in the table shows the perceived strength/influence of the resource, that is, how much it has to say on a topic as reflected in the following sections. Primary sources for the respective requirement areas are shown using triple ticks (✓✓✓). More detailed concept matrices were developed for each area but these are excluded as they are not core to this research.
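As a purely illustrative aid (not part of the original sources), the matrix lends itself to a simple programmatic encoding, so that, for instance, the primary sources for a requirement area can be looked up directly; the tick counts below follow table 2, while the Python structure and names are our own:

    import pprint

    # Tick counts from table 2 (1 = minimal, 2 = useful, 3 = primary source).
    # Only a few rows are shown; the remaining sources follow the same pattern.
    CONCEPT_MATRIX = {
        "ENISA [28]":   {"business": 3, "people": 2, "policies": 3, "services": 2, "tools": 3, "partners": 2},
        "CMU-SEI [27]": {"business": 2, "people": 3, "policies": 3, "services": 2, "tools": 3, "partners": 2},
        "CMU-SEI [25]": {"business": 2, "people": 2, "policies": 3, "services": 3, "tools": 3, "partners": 2},
        "CMU-SEI [19]": {"business": 1, "people": 3, "policies": 3, "services": 1, "tools": 2, "partners": 2},
    }

    def primary_sources(area):
        # Return the sources marked with triple ticks for one requirement area.
        return [src for src, scores in CONCEPT_MATRIX.items() if scores.get(area, 0) == 3]

    pprint.pprint(primary_sources("policies"))  # ['ENISA [28]', 'CMU-SEI [27]', ...]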

As seen in the table, [19], with their process-centric focus, emerged as a primary reference for people and processes. The ENISA guides provided the strongest contributions to the business requirements and also provided significant inputs to policies and processes. Respectively, people [16], and tools and technologies [28], also came out strong. The focus on organisational models is apparent with [15] shown as a primary source for the people-related requirements. This is complemented by detailed service descriptions, including applicable and appropriate services for each model, and hence the emergence of [15] as a primary reference for services. The strengths are broader for [27], as can be expected from a state of the practice survey and general discussion on CSIRTs. The handbook from [25] is similarly diverse with strengths in multiple categories. Although broad strengths would be expected from [20] (considering the title and topics of the paper) it appears to be stronger in policies and processes than the other categories. Lastly, [32] is clearly focused on tools and technologies, in line with the goal of the paper.


4. BACKGROUND

In order to commence the management model definition, some constructs needed to be defined. This section therefore introduces the CSIRT business requirements and services.

4.1 Business requirements

As the logical entry point for developing a CSIRT management model, the primary business requirements (or organisational inputs) that need to be considered when establishing a CSIRT are described in this section. These business requirements, i.e. the objectives within the organisation [26, p. 61], are those requirements and decisions needed prior to building a CSIRT.

The following sub-areas of business requirements have been identified [35]:

• environment,

• constituency,

• authority,

• funding, and

• legal considerations.

These are discussed individually in the remainder of this section.

Environment: Determining the environment is the first step in establishing a CSIRT. The CSIRT environment can be defined by the sector which will be served by the CSIRT, the geographic region of operations and the organisational structure of the host institution (if applicable) [29, p. 47]. To determine the environment, it was argued that the following questions need to be answered [35]:

1. What type of CSIRT? The choices for type and/or sector include [15, p. 3]: national, academic, CIP/CIIP, government, military, SME, commercial, internal, vendor and other CSIRTs. It is important to understand where the CSIRT fits into the national hierarchy and to identify the sector early on so that contact can be made with potential partners and coordinators. Answering this question alludes to suitable funding models. A commercial CSIRT typically charges for CSIRT services utilising membership fees and/or paid-for services, while an internal CSIRT is typically funded by the host organisation. In addition, different types of CSIRTs will have different service priorities: e.g. a banking sector CSIRT will be more concerned with protecting credit card information and online banking security, whereas an academic sector CSIRT will be more concerned with protecting student records and preventing intellectual property theft.

2. What geographic area will be covered by the CSIRT?

A global or regional CSIRT has very different implications on the constituency and services, for example, than an internal CSIRT for a single site. These include time-zones (hours of operation), languages, viable services and other issues. In addition, a CSIRT can span single or multiple cities, provinces or even countries. The geographic area has a significant influence on the team model selection.

3. Which organisational model(s) will the CSIRT use?

The following four organisational models were highlighted in [35] as the primary choices:

• independent,

• embedded,

• campus, and

• voluntary.

Answering these questions provides a mission for the CSIRT and subsequently reveals the constituency. The environment additionally scopes the legal considerations by revealing applicable laws and regulations. The answers also partly determine the team model and services provided by the CSIRT. Finally, the mission and goals of the CSIRT are important inputs to policies and processes. More information on the environmental considerations and decisions is available in [35].
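Purely as an illustration of how these environment decisions can be recorded, the sketch below captures the three questions in a small Python structure; the class and field names are hypothetical (ours), while the permitted values are taken from [15] and [35]:

    from dataclasses import dataclass, field

    CSIRT_TYPES = {"national", "academic", "CIP/CIIP", "government",
                   "military", "SME", "commercial", "internal", "vendor", "other"}
    ORG_MODELS = {"independent", "embedded", "campus", "voluntary"}

    @dataclass
    class Environment:
        """Answers to the three environment questions above."""
        csirt_type: str                        # e.g. "academic"
        geographic_area: str                   # e.g. "single site", "country", "region"
        organisational_models: list = field(default_factory=list)

        def __post_init__(self):
            assert self.csirt_type in CSIRT_TYPES
            assert set(self.organisational_models) <= ORG_MODELS

    env = Environment("academic", "country", ["embedded"])
    print(env)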

Constituency: The constituency is the customer base of the CSIRT, that is, the people and organisations receiving CSIRT services. Following the definition of the environment, the constituency is usually defined by IP address range, domain name, autonomous system number(s), and/or free text [28, p. 14].

Together with the environment, the constituency influences the services that will be provided by the CSIRT. Their needs, skills, expertise, etc. all have an effect. The constituency also provides insight into the type of systems and network(s) the CSIRT needs to support as well as possible funding models.

The team model additionally depends on the accessibility of skilled experts from the constituency, and/or partners, who can extend the team. Furthermore, required communication mechanisms (tools and technologies) are determined by the needs of the constituency.

Authority: The nature of the constituency determines the type of authority which the CSIRT may have and exercise. Once the type of authority — full, shared, indirect or none — has been determined, it needs to be communicated back to the constituency. CSIRT management staff must support the authority relationship. In addition, authority affects the services which can be provided by the CSIRT. For example, it is not possible to provision some CSIRT services, e.g. incident tracing and intrusion detection, without some level of authority [20, p. 9].


Table 3: CSIRT services (adapted from [28, p. 26])

Reactive services:
• Alerts and warnings
• Incident handling
• Vulnerability handling
• Artefact handling

Proactive services:
• Announcements
• Technology watch
• Security audits or assessments
• Configuration and maintenance of security tools, applications and infrastructure
• Development of security tools
• Intrusion detection services
• Security-related information dissemination

Security quality management services:
• Risk analysis
• Business continuity and disaster recovery planning
• Security consulting
• Awareness building
• Education and training
• Product evaluation or certification

Figure 3: Relationships between the business requirements

Funding: Equipping a CSIRT requires resources — both people and technical infrastructure. Equipment, salaries and other operational expenses need to be budgeted for when establishing a CSIRT. Revenue can come from existing resources, membership fees, project subsidy and/or per-use charge for services rendered [16, p. 19]. It has been illustrated that a voluntary approach is possible without funding, but this can have serious consequences on the performance and effectiveness of the CSIRT [35].

Clearly, income relies on the environment as well as the nature of the constituency (e.g. whether they would be willing to pay for CSIRT services). The main contributors to CSIRT expenses are the hours of operation and staff salaries [16, p. 18]. Conversely, the available budget determines how many people can be employed (and at which skill level). Thus, the people (team model and staffing) decisions go hand-in-hand with funding, determining the primary costs. Funding also influences the equipment that can be obtained by the CSIRT.

Legal considerations: The CSIRT should be sensitive to relevant local laws and regulations, and should maintain at least an awareness of laws in other countries [28]. The specific laws relevant to the CSIRT will depend on the CSIRT environment and could include statutory and common or case laws [27, p. 112]. A global CSIRT, for example, requires a different understanding of laws and regulations than a CSIRT operating in a single country (though at least an awareness of the most important laws affecting external partners is required). Legal counsel and law enforcement partners can advise on compliance with these laws or regulations. More information on legal issues related to CSIRTs is available in [25, pp. 51–58] and [20, pp. 11–12].

4.2 Relationships between the business requirement areas

The relationships uncovered in this section, constrained to the sub-areas of the business requirements, are depicted in fig. 3. (The remaining relationships are included in sections 5 and 6.)

Only once these business requirements have been adequately addressed should one proceed to the next steps in establishing a CSIRT. These include evaluating the CSIRT services, people requirements and potential partners for the CSIRT.

4.3 Services

CSIRT services are classified in three broad categories, namely reactive, proactive or security quality management services [15]. Reactive services, as the primary CSIRT activity, involve actions taken to resolve or mitigate incidents as they occur [19]. They are triggered by events or requests requiring a “reaction” and thereby initiating the service process [15, p. 13].

Proactive services are aimed at preventing incidents from occurring in the first place through securing systems, providing training and education, monitoring and sharing information [19].


That is, the execution of proactive services is intended to directly reduce the number of future IT security incidents by providing related announcements and information for preparing, protecting and securing systems [15, p. 14].

Security quality management services, meeting wider organisational security needs, only indirectly relate to incident handling and may be provided by the CSIRT or another entity in the organisation (structure-dependent) [15, p. 14].

A commonly accepted list of CSIRT services is provided in table 3. To be effective, services should address a range of reactive and proactive issues [25]. Which of these services will be provided to the constituency, as well as the extent of provision, are important considerations when establishing a CSIRT [15, p. 13]. Most CSIRTs do not provide all of these services but rather a subset based on the type of CSIRT and the needs of the constituency [15, p. 14]. Detailed descriptions of these services are beyond the scope of this paper but can be found in [25] and [15].

In the following sections the relationships between the business requirements, services and four Ps are revealed and combined to form a management model for establishing a CSIRT. For clarity, the model has been divided into two parts, the strategic view and the tactical view.

5. MODEL DEVELOPMENT – STRATEGIC VIEW

The strategic view is centred around the business requirements — those decisions that need to be made at the “higher” level. This view marks the entry point of the management model, highlighting the initial decisions that must be made when establishing a CSIRT. These are

1. determining the environment,

2. deciding who the constituency of the CSIRT will be, and

3. selecting a suitable funding model.

Only once these decisions are made can the rest of the requirements be met. The connections between the business requirements and the relevant Ps are revealed in this section.

5.1 Business requirements and Services relationship

It is evident that the provision of services is dependent on the constituency as well as financial and human resources. Clearly, before services can be defined, an understanding of the needs and expectations of the constituency, as well as the budget, is required. A CSIRT should aim to provide a realistic, high-quality service portfolio by selecting the core services and developing them as the need arises and resources allow [15]. A recommended list of core services for each type of CSIRT, as well as guidelines for extending service offerings, is provided by [15].

5.2 Business requirements and Partners relationship

Partners are required for collaboration in resolving incidents and sharing information. Access to partners varies greatly based on the CSIRT environment. If the CSIRT is hosted within an established parent organisation there may be multiple internal partners that can assist with CSIRT activities (e.g. legal counsel, human resources, marketing and finances). An independent CSIRT, on the other hand, may have to establish many more internal capabilities. This can have a significant effect on staffing (see section 6.4). Required external partners are more dependent on the team model and services provided by the CSIRT, as shown in the following sections. Finally, as mentioned in section 4.1, partners can advise on the relevant legal considerations for the CSIRT.

5.3 Business requirements and People relationship

With respect to people, the two main areas for consideration are the team model and staff. The primary inputs to these areas are the environment and funding. In addition, it will be shown that access to skilled experts from the constituency can affect required staffing.

The team model is closely linked to the organisational structure and can comprise one or more of the following: central, distributed, coordinating or combined [15]. Staffing decisions, working hours and ultimately the number of required staff members are based on the business requirements (environment, funding, constituency), the desired services and even partners (accessible external experts), as shown in the previous sections. The budget and staffing go directly hand-in-hand as staff salaries are a significant expense. Funding also needs to be adaptable to changing staff requirements, e.g. workload and services demand [20].

5.4 Summary of the strategic view of the management model

The strategic view is shown in fig. 4. Authority and legal considerations have been included in this view as they can be directly determined from the definitions of the environment and constituency. They complete the business requirements and also have no direct relationship to the requirements of the four Ps in this model. For clarity, the effects of funding on the other areas of requirements, namely services, people, and tools and technologies, have been included in the strategic view as the material resource for equipping the CSIRT. Besides the influence these other areas have on determining the budget, including them also simplifies the tactical view, which is presented next.

6. MODEL DEVELOPMENT – TACTICAL VIEW

The remaining relationships, those between the four Ps together with the environment, constituency and services, comprise the tactical view of the management model. This presents the “how” part of establishing a CSIRT and the second level of considerations following the strategic view.


Figure 4: Strategic view of the management model

In the following subsections the details of the relationship between the two areas indicated are provided, prior to summarising the connections in the final paragraph of each subsection.

6.1 People and Services relationship

CSIRT staff provide defined services to the constituency (the customers of these services) [30]. The size, available expertise and skills of the team [32] (i.e. human resources) influence the quantity, type, depth and quality of services that can be provided [19].

These services in turn partly determine the technical staff requirements [16], including specialised skills [15, p. 14]. Clearly, without sufficient resources (staff and funding), services cannot be provided [15].

Therefore, People affect the quantity and type of Services that can be provided. The selected Services will, on the other hand, dictate staff requirements.

6.2 People and Policies & Processes relationship

Policies and processes are developed and reviewed from time to time by people [20]. In addition, policies and processes should be published, communicated and understood by both the CSIRT and the constituency [20, 25, 29].

Staff follow processes [28] in order to provide consistent services to the constituency [19], aligned with expectations set forth in policies [16]. This contributes to building trust between the constituency and the CSIRT [20].

Policies and their accompanying processes describe the associated roles, responsibility and accountability of the various participants [19, 25, 29]; covering the “who” aspects of various CSIRT services [36] and thereby formalising the team [15].

Thus, People develop Processes which formalise the team (including roles and responsibilities).

6.3 People and Tools & Technologies relationship

CSIRT staff identify, organise (e.g. configure and maintain) and use technical resources to provide security incident response services [16, 19]. This includes the development and testing of incident handling tools [19] and the receiving of incident reports from the constituency (via phone, fax, email and/or web forms) [16].

Tools and technologies simplify the work done by incident handlers by automating tasks, subsequently reducing the staff load [20] and the risk of human error [25]. These mechanisms can include the pre-processing of information [20] as well as selecting interesting events from logs and security software for human review [29]. Automated tools can help address the vulnerability of CSIRT staff to making mistakes due to the high-stress situations and the associated responsibility of the work they do [25, p. 150]. Furthermore, tracking and ticketing systems facilitate the handover of incident reports to other staff [27].

Hence, People identify and develop Tools and Technologies which in turn influence the required staff numbers.

6.4 People and Partners relationship

People identify and interact with partners in several ways, including escalating to them when applicable/appropriate (e.g. law enforcement, other CSIRTs) [28]. Partners also assist CSIRT staff. For example, human resources can deal with people issues [27] and specialists can provide legal advice [19].


Relationships of trust are required with both internal and external partners [29] to facilitate the exchange of sensitive information [20]. These contacts and relationships should ideally be identified and developed in advance of their possible requirement [25, p. 51].

CSIRT staff may be required to “coordinate response activities with internal departments and externally with other CSIRTs, law enforcement agencies, and security experts” [27, p. 104]. These partners can also assist in times of crisis [25, p. 51] by relieving the workload and/or providing specialist skills [29], thereby complementing existing staff and addressing deficiencies [25, 30].

Supplementary staff can be comprised of people from the host organisation (internal partners), external partners (including service providers) and even the constituency [25, 27]. Internal partners such as technical writers, public affairs officers as well as legal and human resources consultants can support the work of CSIRT staff [15]. Moreover, the organisational help desk can possibly be used as the initial point of contact for incident reporting [31].

In summary, People identify and coordinate Partners who conversely supplement and advise staff.

6.5 Policies & Processes and Services relationship

Policies and processes include an advertised description of services [15]. Policy dictates which services take priority when resources are limited (e.g. incident handling takes precedence over technology watch or awareness building) and can include the relationships of processes to services (e.g. media interaction is related to incident handling) [25].

Additionally, successful services are based on appropriate policies and procedures [25]; workflows, for instance, improve the quality and efficiency of services [16]. Thus processes, by their very nature, improve service response.

Therefore, Services are based on Policies and Processes which describe said Services.†

6.6 Policies & Processes and Tools & Technologies relationship

Policy requires visibility in order to be effective [36]. Technologies, particularly communication mechanisms, make it possible to market and distribute policies and processes to the constituency and other parties the CSIRT may interact with [20]. Tools and technologies are also used to complement processes, for example by automation.

Policies and processes also guide the development of tools and technologies. These include: the design and configuration of automated tools [19], system management and backup strategies [20], cryptographic key management [25], and logging systems [29]. Security policies and best current practices should be followed by CSIRT staff when establishing and managing resources, equipment and infrastructure [19].

†Service selection typically occurs first. That is, services are selected and then the policies and processes are written, providing the service detail and/or description of how it will be provided.

It is observed therefore that Processes guide the development of Tools and Technologies which are subsequently used to communicate the Policies and Processes to the constituency and partners.

6.7 Policies & Processes and Partners relationship

CSIRT preparation should include policies and processes (or procedures) for communicating with outside parties (both internal and external partners) [29]. Policies and processes specify cooperation with external parties and include associated expected levels of service [25]. Information sharing, non-disclosure and reporting agreements may be needed for partner interactions [29]. Peering agreements can be established with partners [30] who can even report events that initiate the incident response process (i.e. trigger services) [16]. Policies and Processes thus define the relationship (cooperation agreements, service levels and reporting requirements) with Partners.

The public affairs office, legal department and management should be consulted by the team when establishing policies and processes for information sharing [29]. Public/media relations staff assist in the development of information disclosure and crisis communication policies and processes [27, 28]. In addition, human resources staff are typically involved in disciplinary action taken on employees and the development of associated policies and processes [19, 27]. Strengthening this link, the legal considerations from the business requirements should be included in relevant policies and processes with the assistance of legal experts [27, p. 113]. Finally, policies should be shared with other CSIRTs for input based on their experiences [20].

Thus, Policies and Processes define the relationship with Partners who in turn are consulted for input when developing Policies and Processes.

6.8 Services and Tools & Technologies relationship

While most of the services described in section 4.3 utilise technologies in some form, some direct associations have been identified. These include the obvious technology-related services (e.g. technology watch, configuration and development of security tools, etc.), as well as those described here.

A ticketing system is the preferred mechanism for tracking incidents and announcements [16, 25]. This system can include email/web integration for automatic reporting [28], information collection and event correlation [25]. A database is additionally useful for storing incident-related data [15, 32].


Incident reports (as a trigger to the incident handling service) are received from the constituency and partners via phone, fax, email and web forms [16]. Similarly, these technologies are used to provide incident response and coordination services [27]. Communication technologies are further required for security-related education, awareness and technology watch services [15, 29].

Hence, Services determine the Tools and Technologies which are required for providing said services.

6.9 Partners and Tools & Technologies relationship

Technologies are used to coordinate and aid interactions with partners (including other teams) and outside parties (including victims) [16]. These include communication mechanisms (web forms, email, etc.) [29], data sharing tools [19] and unique ticket numbers issued by tracking systems [30]. Cooperation and coordination is facilitated by trusted communication and information distribution channels [19, 28].

Partners in turn can assist with providing or recommending CSIRT tools and technologies via mechanisms such as the ENISA Clearinghouse for Incident Handling Tools (CHIHT)††. They can also be approached to assist with analysis and testing if the CSIRT does not have access to the required equipment or technologies [15], thereby addressing CSIRT technical deficiencies.

So we see that Tools and Technologies enable Partner interactions and coordination, while Partners can supplement Tools and Technologies.

6.10 Partners and Services relationship

Partners may participate in or collaborate in providing services. To illustrate, external experts can be called upon for assistance during incident response to provide specialised platform or operating system support [15]. Sharing the analysis process with other expert groups is desirable in order to learn from them (gain from their knowledge and mutually enhance human capital development) [25].

The selected CSIRT services influence required partnerships [25]. Firstly, service functions can be triggered by the CSIRT itself, constituents or other parties (including partners) [25]. Secondly, collaboration with trusted partners can expedite the response process, thereby enhancing service delivery [29]. Finally, incident response coordination involves the facilitation of interactions between involved parties [15].

Therefore, Partners provide skills and expertise to Services, while the choice of Services influences the required Partner relationships.

††https://www.enisa.europa.eu/activities/cert/support/chiht

6.11 Summary of the tactical view of the management model

The tactical view, showing the second level of relationships, can be seen in fig. 5. This view is used once the initial decisions of the strategic view have been considered. Furthermore, it is well aligned with the observation that at a tactical level, security is a combination of people, processes and technologies [37].

As an entry point, the nature of the environment and the constituency is used for inputs to the people, processes, services, and tools and technologies areas. A number of inter-relationships can be observed between the remaining areas, revealing that there is no specific order to the tactical requirements, but rather that they need to be considered iteratively and in parallel. For example, as the team model and staffing (people) decisions are made, their effect on services, policies and processes, tools and technologies, as well as partners needs to be considered. The inverse also applies: e.g. as services are defined and developed, their impact on the staffing requirements needs to be recognised.

Therefore, to read the management model, each area is highlighted and, as the decisions for that area are evaluated, the inputs from the related areas according to the indicated relationships need to be considered. When selecting or developing tools and technologies, for example, the following inputs must be considered (see the sketch after this list):

• relevant policies and processes for development need to be consulted (or established);

• team members will perform the actual identification and development of these tools and technologies;

• the selected services as well as the needs of the constituency will determine which tools and technologies are required; and lastly,

• partners can supplement tools and technologies by negating the need for re-development and/or assisting with customisation.
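As a sketch of this reading procedure only (the relationship labels paraphrase sections 6.1 to 6.10; the encoding itself is ours, not part of the model), the tactical view can be treated as a set of labelled edges and queried for the inputs of any area:

    # Labelled edges of the tactical view (fig. 5), paraphrased from sections 6.1-6.10.
    RELATIONSHIPS = {
        ("People", "Services"): "affect quantity & type / dictate staff requirements",
        ("People", "Policies & Processes"): "develop / formalise the team",
        ("People", "Tools & Technologies"): "identify & develop / influence staff numbers",
        ("People", "Partners"): "identify & coordinate / supplement & advise",
        ("Policies & Processes", "Services"): "describe / are based on",
        ("Policies & Processes", "Tools & Technologies"): "guide development / communicate",
        ("Policies & Processes", "Partners"): "define relationship / give input",
        ("Services", "Tools & Technologies"): "determine what is required",
        ("Partners", "Tools & Technologies"): "supplement / enable interaction",
        ("Partners", "Services"): "skills & expertise / influence required partners",
    }

    def inputs_for(area):
        # Collect the related areas (and relationship labels) to consider for one area.
        return {(b if a == area else a): label
                for (a, b), label in RELATIONSHIPS.items() if area in (a, b)}

    for neighbour, label in inputs_for("Tools & Technologies").items():
        print(neighbour, "->", label)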

When establishing a CSIRT, both the strategic and tactical views must be considered together to complete the model.

Figure 5: Tactical view of the management model

7. CONCLUSION

This paper combined the requirements and relationships uncovered from authoritative CSIRT literature to form a management model for establishing a CSIRT. The initial steps in the design science research process, i.e. defining the problem, listing the solution objectives and designing and developing a solution artefact, were executed for this research.

The problem of having no consistent method or process to follow when establishing a CSIRT was addressed by combining the literature findings to form a coherent model. A comprehensive literature survey and concept matrix facilitated the identification of the areas of CSIRT requirements and relationships between the areas. Progressing through and integrating the associations led to the development of the complete management model aligned with the ITIL management framework.



When applying the model, the business requirements need to be addressed first. These include determining the environment, constituency, authority, funding and legal considerations applicable to the CSIRT. Following that, the tactical view can be followed to implement the CSIRT. This includes services and partner decisions together with the people, policies and processes, and tools and technologies.

The management model is beneficial as a basis for anyone wanting to establish a CSIRT. It addresses the absence of a standard model for CSIRT development, enabling one to know where and how to start when building a CSIRT, and ensuring that all important factors are considered. Utilising the model therefore enables an holistic view, providing direction as well as reassurance that nothing is overlooked during the establishment process.

Areas of future research include demonstration and evaluation of the model (possibly with case studies), as well as further elaboration on the decisions required for each area of the model. This will facilitate further development of the management model into a method or process for CSIRT establishment [21, p. 257].

ACKNOWLEDGEMENT

The authors would like to thank the Department of Science and Technology of the Republic of South Africa for making this research possible through the funding it has provided for the SANReN/NMMU research collaboration.

REFERENCES

[1] N. H. Ab Rahman and K.-K. R. Choo, “A survey of information security incident handling in the cloud,” Computers & Security, vol. 49, pp. 45–69, 2015.

[2] I. Ellefsen and S. von Solms, “Implementing critical information infrastructure protection structures in developing countries,” in Critical Infrastructure Protection VI, ser. IFIP Advances in Information and Communication Technology, pp. 17–29. Springer, 2012.

[3] Y. Wara and D. Singh, “A guide to establishing computer security incident response team (CSIRT) for national research and education network (NREN),” African Journal of Computing & ICT, vol. 8, no. 2, pp. 1–8, 2015.

[4] D. Goodin. (2015, April) Just-released WordPress 0day makes it easy to hijack millions of websites [updated]. [Online]. Available: http://arstechnica.com/security/2015/04/27/just-released-wordpress-0day-makes-it-easy-to-hijack-millions-of-websites/

[5] B. Donohoe. (2015, April) VMware fixes Java information disclosure vulnerability. [Online]. Available: https://threatpost.com/vmware-fixes-information-disclosure-vulnerability/112007

[6] H. Dalziel. (2014, March) Information security conferences of 2015. [Online]. Available: https://www.concise-courses.com/security/conferences-of-2015/

[7] Black Hat. (2015) Welcome to Black Hat USA 2015. [Online]. Available: https://www.blackhat.com/us-15/

[8] S. Ranger. (2014) Inside the secret digital arms race: Facing the threat of global cyberwar. [Online]. Available: http://www.techrepublic.com/article/inside-the-secret-digital-arms-race

[9] Editorial Board. (2015, April) Preparing for warfare in cyberspace. The New York Times. [Online]. Available: http://www.nytimes.com/2015/04/28/opinion/preparing-for-warfare-in-cyberspace.html

[10] US-CERT. (2014, January) Alert (TA14-017A): UDP-based amplification attacks. [Online]. Available: https://www.us-cert.gov/ncas/alerts/TA14-017A

[11] C. Glyer. (2014, April) Attackers exploit the Heartbleed OpenSSL vulnerability to circumvent multi-factor authentication on VPNs. [Online]. Available: https://www.fireeye.com/blog/threat-research/2014/04/attackers-exploit-heartbleed-openssl-vulnerability.html

[12] J. T. Benett. (2014, September) Shellshock in the wild. [Online]. Available: https://www.fireeye.com/blog/threat-research/2014/09/shellshock-in-the-wild.html

[13] S. Ragan. (2015, July) Hacking team hacked, attackers claim 400GB in dumped data. [Online]. Available: http://www.csoonline.com/article/2943968/data-breach/hacking-team-hacked-attackers-claim-400gb-in-dumped-data.html

[14] M. Williams. (2015, July) OPM hackers stole data on 21.5m people, including 1.1m fingerprints. [Online]. Available: http://www.computerworld.com/article/2946031/cybercrime-hacking/opm-hackers-stole-data-on-215m-people-including-11m-fingerprints.html

[15] G. Killcrece, K.-P. Kossakowski, R. Ruefle, and M. Zajicek, “Organizational models for computer security incident response teams (CSIRTs),” Carnegie Mellon Software Engineering Institute, Tech. Rep., December 2003.

[16] ENISA, “A step-by-step approach on how to set up a CSIRT,” ENISA, Tech. Rep., 2006.


[17] T. Proctor, “The development of cyber security warning, advice and report points,” in Secure IT Systems, pp. 61–72. Springer, 2012.

[18] I. Ellefsen and S. von Solms, “The community-orientated computer security, advisory and warning team,” in IST-Africa 2010 Conference Proceedings, 2010.

[19] C. Alberts, A. Dorofee, G. Killcrece, R. Ruefle, and M. Zajicek, “Defining incident management processes for CSIRTs: A work in progress,” Carnegie Mellon University, Tech. Rep., October 2004.

[20] D. Smith, “Forming an incident response team,” in Proceedings of the FIRST Annual Conference. University of Queensland, Brisbane, Australia, 1994.

[21] S. T. March and G. F. Smith, “Design and natural science research on information technology,” Decision Support Systems, vol. 15, no. 4, pp. 251–266, December 1995.

[22] A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, March 2004.

[23] J. Webster and R. T. Watson, “Analyzing the past to prepare for the future: Writing a literature review,” MIS Quarterly, vol. 26, no. 2, pp. xiii–xxiii, June 2002.

[24] K. Peffers, T. Tuunanen, M. A. Rothenberger, and S. Chatterjee, “A design science research methodology for information systems research,” Journal of Management Information Systems, vol. 24, no. 4, pp. 45–77, 2007.

[25] M. J. West-Brown, D. Stikvoort, K.-P. Kossakowski, G. Killcrece, R. Ruefle, and M. Zajicek, Handbook for Computer Security Incident Response Teams (CSIRTs), 2nd ed. Carnegie Mellon Software Engineering Institute, April 2003.

[26] L. Hunnebeck, ITIL® Service Design. London: The Stationery Office (TSO), 2011.

[27] G. Killcrece, K.-P. Kossakowski, R. Ruefle, and M. Zajicek, “State of the practice of computer security incident response teams (CSIRTs),” Carnegie Mellon Software Engineering Institute, Tech. Rep., October 2003.

[28] ENISA, “Good practice guide for incident management,” ENISA, Tech. Rep., 2010.

[29] P. Cichonski, T. Millar, T. Grance, and K. Scarfone, “Computer security incident handling guide,” NIST, Special Publication 800-61, Revision 2, August 2012.

[30] N. Brownlee and E. Guttman, “Expectations for computer security incident response,” RFC 2350 (Best Current Practice), June 1998. [Online]. Available: http://www.ietf.org/rfc/rfc2350.txt

[31] S. Northcutt, Computer Security Incident Handling, version 2 ed. SANS Press, 2003.

[32] D. Penedo, “Technical infrastructure of a CSIRT,” in Proceedings of the International Conference on Internet Surveillance and Protection, ser. ICISP ’06, pp. 27–32. IEEE Computer Society, 2006.

[33] C. Zimmerman, Ten Strategies of a World-Class Cybersecurity Operations Center. MITRE Corporation, 2014.

[34] R. Ruefle, A. Dorofee, D. Mundie, A. D. Householder, M. Murray, and S. J. Perl, “Computer security incident response team development and evolution,” Security & Privacy, IEEE, vol. 12, no. 5, pp. 16–26, 2014.

[35] R. Mooi and R. Botha, “Prerequisites for building a computer security incident response capability,” in Information Security for South Africa (ISSA), 2015, pp. 1–8, August 2015.

[36] B. Guttman and E. A. Roback, An Introduction to Computer Security: The NIST Handbook, no. 800. NIST, 1995.

[37] B. Schneier, “The future of incident response,” Security & Privacy, IEEE, vol. 12, no. 5, pp. 96–96, 2014.


Based on: “Playing Hide-and-Seek: Detecting the Manipulation of Android Timestamps”, by H. Pieterse, M.S. Olivier and R. Van Heerden, which appeared in the Proceedings of Information Security South Africa (ISSA) 2015, Johannesburg, 12 & 13 August 2015. © 2015 IEEE

REFERENCE ARCHITECTURE FOR ANDROID APPLICATIONS TO SUPPORT THE DETECTION OF MANIPULATED EVIDENCE

H. Pieterse,∗† M.S. Olivier† and R.P. van Heerden‡

∗ Defence, Peace, Safety and Security, Council for Scientific and Industrial Research, Pretoria, South Africa. E-mail: [email protected]
† Department of Computer Science, University of Pretoria, Pretoria, South Africa. E-mail: [email protected]
‡ Meraka Institute, Council for Scientific and Industrial Research, Pretoria, South Africa. E-mail: [email protected]

Abstract: Traces found on Android smartphones form a significant part of digital investigations. A key component of these traces is the date and time, often formed as timestamps. These timestamps allow the examiner to relate the traces found on Android smartphones to some real event that took place. This paper performs exploratory experiments that involve the manipulation of timestamps found in SQLite databases on Android smartphones. Based on observations, specific heuristics are identified that may allow for the identification of manipulated timestamps. To overcome the limitations of these heuristics, a new reference architecture for Android applications is also introduced. The reference architecture provides examiners with a better understanding of Android applications as well as the associated digital evidence. The results presented in the paper show that the suggested techniques to establish the authenticity and accuracy of digital evidence are feasible.

Key words: Digital forensics, mobile forensics, smartphones, Android, timestamps, reference architecture.

1. INTRODUCTION

The past decade saw the rapid improvement of smartphone technology, allowing these devices to become very popular across the globe. Their current prominence is directly related to the provided capabilities and functionality, which nowadays closely resemble a personal computer. Bundled with a complete operating system, improved connectivity and communication functions, and the option of adding additional third-party applications, smartphones have become powerful devices. The leading smartphone operating system (OS) of 2014 was Android [1], which has been evolving in a remarkable way and continues to gain widespread popularity. The current prevalence of Android led this paper to focus only on this particular smartphone OS.

The extensive and wide use of Android smartphones allows these devices to become a rich source of trace evidence [2]. All events occurring on Android smartphones generate traces that form an important component of digital investigations, especially when the user of the smartphone is involved in criminal activities. The valuable information (such as contacts, text messages, call lists, websites visited or instant messages) contained in these traces can provide a well-defined snapshot of a user’s actions at a specific time. Besides providing a description of the event, traces found on Android smartphones also often store the time and date component in the form of a timestamp. Timestamps are integral to digital investigations since they provide the examiner the opportunity to relate the traces found on the Android smartphone to some physical event that took place.

To conceal fraudulent activities, smartphone users can use certain techniques to manipulate the timestamps of the traces and change the associated events. These techniques are referred to as Anti-forensics and are primarily used “to compromise the availability or usefulness of evidence” [3]. These techniques are applied by smartphone users in an attempt to either hide or change event logs, which results in the alteration of the timestamps associated with those events.

It is thus important for examiners to be able to verify the authenticity and accuracy of timestamps. Without such verification, the collected timestamps might be incorrect or inaccurate due to tampering and will lead the examiner to make unreliable conclusions. Existing research shows few papers that attempt to offer a solution regarding the verification of the authenticity and accuracy of timestamps. Verma et al. [4] preserve date and time stamps by capturing the system-generated modification, access, change and/or creation date and timestamp (MAC DTS) values and storing them in a secure location such as a cloud server outside of the smartphone. The cloud snapshot of the original MAC DTS values can be used to verify the authenticity of MAC DTS values of questionable files on the smartphone [4]. Govindaraj et al. [5] designed a solution, called iSecureRing, which allows a jailbroken iPhone to be secure and forensic ready by preserving the timestamps. These timestamps are stored outside the device on a secure server or the cloud and can be used during security incidents [5]. Both solutions, however, require the installation of additional functionality on the smartphone prior to seizing the device for investigation. There is, thus, no existing solution (to the best of the authors’ knowledge) that allows for the verification of timestamps collected from seized Android smartphones.



This paper performs exploratory experiments that involve the manipulation of timestamps found in SQLite databases on Android smartphones. While conducting the experiments, the changes occurring on the Android smartphone are observed. Based on those observations, specific heuristics are identified that may indicate the manipulation of timestamps. The identified heuristics can be categorised into two distinct groups. The first group covers specific changes that occur on the Android file system and the second group refers to inconsistencies in individual SQLite databases. These heuristics are, however, susceptible to external factors that can impact their availability. To further establish the accuracy and authenticity of the timestamps as well as other forms of digital evidence, a new reference architecture for Android applications is also introduced. The purpose of the reference architecture is to provide examiners with a better understanding of Android applications and how the associated digital evidence originated. The immediate challenges to address are the following: (a) effective manipulation of timestamps found in SQLite databases on Android smartphones, (b) verifying the authenticity of these timestamps by using the identified heuristics and (c) introducing a newly designed reference architecture for Android applications. The current paper provides preliminary evidence that, in terms of the challenges identified, the suggested approach shows potential.

The remainder of the paper is structured as follows. Section 2 briefly describes the architecture of Android and the internal structure of SQLite databases. Section 3 presents the methodology followed to conduct the exploratory experiments and offers a descriptive summary of the findings. The reference architecture for Android applications is introduced in Section 4. A short discussion and future developments of the research are presented in Section 5. The final conclusions are made in Section 6.

2. BACKGROUND

With the continuous growth in functionality of Android smartphones, an increasing number of people make use of these devices during their daily activities. For the traces collected by Android smartphones to be of use during digital investigations, a comprehensive understanding of the architecture of Android is required. An evaluation of SQLite is also required, since most of the traces found on Android smartphones are stored in SQLite databases. This section, therefore, provides a short introduction to the Android architecture and presents the internal structure of SQLite databases.

2.1 Android’s Architecture

Android is a popular open source software architecture provided by the Open Handset Alliance [8] that is currently targeting mobile devices, such as smartphones and tablet computers.

Figure 1: Architecture of the Android Operating Systems [6, 7]

The Android software architecture (see Figure 1) is divided into five layers: Applications, Application Framework, Libraries, Android Runtime and the Linux Kernel [9]. The uppermost layer, Applications, provides access to a set of core applications. The Application Framework layer implements a software framework that reassembles functions used by existing applications. All available libraries are written in C/C++ and called through a Java interface. The Android runtime consists of a set of core libraries and a Dalvik virtual machine. The bottommost layer is the Linux kernel, which allows for interaction between the upper layers by means of device drivers [9, 10].

Until Android version 2.2 (Froyo), most Android smartphones used Yet Another Flash File System 2 (YAFFS2) [11]. YAFFS2 was developed in 2004 in response to larger-sized NAND (Not-AND) flash devices [12]. With the release of Android version 2.3 (Gingerbread), the file system for Android devices switched from YAFFS2 to the Fourth Extended (EXT4) file system [11]. YAFFS2 was developed with a single-threaded design, which may cause bottlenecks in devices released with a multi-core chipset. The EXT4 file system, which is one of the most used file systems in Linux, does not have this limitation and can run smoothly on multi-core devices. The disk space of the EXT4 file system is divided into logical blocks, which reduces management overhead and improves throughput [13]. The key features of the EXT4 file system promote the development of advanced applications and functionalities.

The architecture of Android improves regularly to support more advanced applications. It is therefore necessary to continuously evaluate Android’s architecture and remain up to date with the current changes.


Table 1: User Data Stored in SQLite Databases

User Data            SQLite Database Location                                                          Table
Call History         /data/data/com.sec.android.provider.logsprovider/databases/log.db                logs
Messages (SMS/MMS)   /data/data/com.android.providers.telephony/databases/mmssms.db                   sms
E-mails (Gmail)      /data/data/com.google.android.gm/databases/mailstore.<[email protected]>.db   messages
Google Hangouts      /data/data/com.google.android.talk/databases/babel1.db                           messages
WhatsApp Messenger   /data/data/com.whatsapp/databases/msgstore.db                                     messages

2.2 SQLite Databases

SQLite is an open source software library that implements a lightweight Structured Query Language (SQL) database engine for embedded use [14, 15]. The lightweight design of SQLite does not require a separate server and thus allows for the quick processing of stored data by reading and writing directly to a disk file [16]. The main database file, <database name>.db or <database name>.db3, consists of a complete SQL structure that includes tables, indices, triggers, and views [16]. To support the SQL structure, the main database file is divided into one or more pages and each page shares the same size [17]. The first page of the main database file is called the header page and is composed of the database header and the schema table. The database header stores structural information and the schema table contains the table information of the database. The pages following the header page are structured as B-trees and store the actual data [18].
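
As an illustration of the structural information kept in the database header, the following minimal Python sketch reads the first 100 bytes of a main database file (the file name is a placeholder); the offsets follow the published SQLite file format [17]:

    import struct

    # Read the 100-byte database header on the first page of a
    # SQLite main database file (file name is a placeholder).
    with open("mmssms.db", "rb") as f:
        header = f.read(100)

    magic = header[:16]                                 # b"SQLite format 3\x00"
    page_size = struct.unpack(">H", header[16:18])[0]   # page size, big-endian, offset 16
    page_count = struct.unpack(">I", header[28:32])[0]  # in-header database size in pages

    print(magic, page_size, page_count)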

During transactions, SQLite stores additional information in a second file called either a rollback journal or write-ahead log (WAL) file [17]. The rollback journal is the default method of SQLite to implement an atomic commit and rollback. Beginning with SQLite version 3.7.0, the new WAL approach was introduced and allowed for improved speed and concurrent execution. The WAL approach preserves the original content in the main database file and appends changes to a separate WAL file (<database name>.db-wal), which contains a header and zero or more WAL frames. Transferring the transactions from the WAL file to the main database file is called a “checkpoint”. When a checkpoint occurs, the updated or new pages in the WAL file are written to the main database file. The checkpoint operation leaves the WAL file untouched, allowing the WAL file to be reused rather than deleted [19]. SQLite performs a checkpoint automatically when the WAL file reaches a size of 1000 pages [20].
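
This behaviour can be reproduced with a few lines of Python, shown below as a minimal sketch (file and table names are placeholders): after switching a database to WAL mode, committed rows accumulate in the -wal file until a checkpoint transfers them to the main database file.

    import os
    import sqlite3

    con = sqlite3.connect("example.db")
    con.execute("PRAGMA journal_mode=WAL;")  # enable write-ahead logging
    con.execute("CREATE TABLE IF NOT EXISTS sms (_id INTEGER PRIMARY KEY, date INTEGER)")
    con.execute("INSERT INTO sms (date) VALUES (1430900000000)")
    con.commit()

    # The committed row now resides in example.db-wal, not yet in example.db.
    print("WAL size before checkpoint:", os.path.getsize("example.db-wal"))

    con.execute("PRAGMA wal_checkpoint(FULL);")  # force a checkpoint
    con.close()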

SQLite databases are a popular choice for data storage in Android applications [14]. An Android application which uses SQLite includes the SQLite databases separately, which allows for reduced external dependencies and minimised latency [21]. Many events taking place on an Android smartphone generate valuable traces, for example: call history, SMS/MMS messages, e-mails (Gmail) and instant messages generated by Google Hangouts (previously Google Talk) as well as WhatsApp Messenger. A summary of the SQLite databases used to store these traces, as well as the location of these databases

on an Android smartphone, is provided in Table 1. The examples used throughout the remainder of this paper focus on the SQLite database of the Messaging application.

3. DETECTION OF MANIPULATED TIMESTAMPS

Timestamps of traces found on Android smartphones are integral to digital investigations, especially if the owner of the smartphone participates in criminal activities. Collected timestamps allow the examiner to relate the traces to some physical event and, more importantly, establish a timeline depicting the chronological order of events. Due to the importance of timestamps in digital investigations, smartphone users, or even malicious applications, can alter timestamps to compromise the integrity of traces as evidence.

In order to detect manipulated timestamps, it is necessary to understand the processes involved in the creation of the timestamps. Understanding these processes provides the required insight to manipulate timestamps found in SQLite databases. The exploratory experiments conducted throughout this paper can be categorised into two groups. The first group of experiments observes the normal operation of SQLite databases while the second group focuses on detecting changes occurring due to the manipulation of timestamps found in the SQLite databases. All of the experiments and observations were performed on a Samsung Galaxy S2, running Android version 4.1.2 (Jelly Bean). The experiments conducted are not limited to the Samsung Galaxy S2 and can be performed on any other Android smartphone.

3.1 Observing SQLite Databases

To understand the underlying structure of SQLite databases and comprehend the operations involved in the creation of timestamps, it is necessary to observe these databases under normal conditions. Applications using SQLite databases to store user-related data are located in the /data/data/<application package name>/databases/ directories on an Android smartphone [22]. Observing the SQLite databases located in the /data directory is not permitted by default; the directory is only accessible by rooting the Android smartphone. The term rooting, which is similar to the jailbreaking of an iPhone, is often perceived as a negative action [12]. Rooting an Android smartphone merely means to escalate the current rights to root access rights. Root access rights allow any user access to the root directory (/) and provide the necessary permissions to

Vol.107 (2) June 2016 SOUTH AFRICAN INSTITUTE OF ELECTRICAL ENGINEERS 95

take root actions [12]. The technical process of rooting an Android smartphone is, however, beyond the scope of this paper. The Samsung Galaxy S2, which is used to observe the results of interacting with the SQLite databases, is already rooted and therefore provides access to the required databases.

The purpose of observing SQLite databases is to identify how these databases react and function under normal operations. The observations are made by monitoring the directory containing the SQLite database while simultaneously interacting with the database. Interactions with the SQLite databases occur by sending messages, such as text and instant messages, and making phone calls. After conducting a set of 10 experiments using the rooted Samsung Galaxy S2 smartphone, the observations led to the identification of several changes occurring as a direct result of the interaction with the SQLite databases. Firstly, from the observations it is possible to infer that all of the data received during the interactions are stored in the <database name>.db-wal file. The data are only transferred from the <database name>.db-wal file to the <database name>.db file when reaching the limit of 1000 pages. Secondly, the timestamps associated with the data are added as new entries, with a unique record identifier, at the end of the database table.

In summary, observing the SQLite databases identified the files that are altered during normal operations and the location where the timestamps are stored in the tables. The insight gained by observing the SQLite databases will assist with the manipulation of timestamps.

3.2 Manipulation of Individual Timestamps in SQLite Databases

Manipulation of timestamps found in SQLite databases requires access to the correct files. Observations from the previous exploratory experiments identified the correct files to be the <database name>.db and <database name>.db-wal files. Access to these files requires the enabling of the Universal Serial Bus (USB) debugging functionality [12]. Although the default setting for USB debugging is “disabled”, going to Settings, selecting Developer options and touching the checkbox next to USB debugging will turn on this feature. Once USB debugging is enabled, interaction with the root directory can occur using the Android Debug Bridge (adb), a versatile command-line tool that communicates with a connected Android smartphone [23]. To allow for complete access to the root directory, adb is restarted with the command, adb root, to gain root permission. Full access, with the necessary permissions, has now been established and it is possible to manipulate the timestamps in SQLite databases.
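
Where this step needs to be scripted, the adb invocations can be driven from Python. The following minimal sketch (assuming adb is on the PATH and the connected device is rooted) restarts adbd with root rights and confirms that the device is visible:

    import subprocess

    # Restart adbd with root permissions, then list attached devices.
    subprocess.run(["adb", "root"], check=True)
    subprocess.run(["adb", "devices"], check=True)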

Manipulation of timestamps proceeds through three individual phases: retrieve, manipulate, and return. The first phase retrieves the required SQLite database

files from the Android smartphone. Since the sqlite3 command utility, which is required to interact with the SQLite databases, does not come pre-installed on Android smartphones [24], the SQLite databases must be transferred to the local computer. The command, adb pull <remote> <local>, copies the specified file from the Android smartphone to the local computer [23]. It is necessary to repeat this command for both the main database file and the associated WAL file, which cannot be edited directly. Retrieving both the main database file and the associated WAL file ensures that all the latest records are present. The list of commands to retrieve these files is shown below:

• adb pull /data/data/<application package name>/databases/<database name>.db C:\<local folder>

• adb pull /data/data/<application package name>/databases/<database name>.db-wal C:\<local folder>

Manipulating the timestamps found in the copied SQLite database files is performed during the second phase. A script is created to act as a malicious application and randomly manipulate timestamps within the SQLite database. During the execution of the script the main database file is opened, allowing for a checkpoint to occur. Once the execution of the script is completed, only the main database file (<database name>.db) remains and must be returned to the Android smartphone.
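
The paper does not list the script itself; the following minimal Python sketch illustrates what such a manipulation script could look like, assuming the sms table layout of the Messaging database (columns _id and date, with dates stored in milliseconds since the epoch):

    import random
    import sqlite3

    ONE_DAY_MS = 24 * 60 * 60 * 1000

    # Opening the pulled main database file allows the automatic
    # checkpoint described above to occur; afterwards a handful of
    # timestamps are shifted at random (column names are assumptions).
    con = sqlite3.connect("mmssms.db")
    rows = con.execute("SELECT _id, date FROM sms").fetchall()
    for _id, date in random.sample(rows, k=min(5, len(rows))):
        shifted = date - random.randint(1, 30) * ONE_DAY_MS  # up to 30 days back
        con.execute("UPDATE sms SET date = ? WHERE _id = ?", (shifted, _id))
    con.commit()
    con.close()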

The final phase returns the modified SQLite database to the Android smartphone. The command, adb push <local> <remote>, copies the specified file to the connected Android smartphone [23]. To prevent the changes from being overwritten by the existing data in the <database name>.db-wal file, that file must be removed. The first step is to start an interactive shell using adb shell, followed by su, which provides root permissions within the shell. The next command, cd /data/data/<application package name>/databases/, changes the current working directory to the directory containing the main database and associated WAL file. The command rm <database name>.db-wal then deletes the WAL file from the directory. For the changes to reflect on the Android smartphone, it is necessary to reboot the device.

Using the Messaging application as an example, Figure 2 provides a snapshot of the messages before and after the manipulation of the timestamps. The comparison shows significant changes to the dates of specific messages and an adjustment of the order of the messages. These changes occur as a direct result of manipulating the timestamps. Close observation during the manipulation allows for the identification of the altered database files (<database name>.db and <database name>.db-wal). Examining the main database file shows that the individual records are ordered incorrectly. These findings are further discussed in the following section.



Figure 2: Messaging application showing messages with the original (a) and manipulated (b) timestamps

3.3 Discussion of Findings and Identification of Heuristics

The findings and observations of the exploratory experiments, conducted to identify changes occurring due to the manipulation of timestamps, can be categorised into two groups. The first group contains a collection of heuristics that identifies the presence of certain changes in the Android file system, which are indicators of the manipulation of the SQLite database. The second group subsequently focuses on the individual SQLite databases and the identification of inconsistencies in these databases. The presence of specific file system changes as well as inconsistencies in the associated SQLite database indicates that the authenticity of the timestamps might be compromised.

Android File System Changes: Android File System Flags (AFS-Flags) are indicators of the potential tampering of SQLite databases on Android smartphones. Each AFS-Flag represents a change that occurs in the Android file system due to the modification or removal of a SQLite database or any other associated database files. The presence of any of these AFS-Flags is not in itself an indication of the manipulation of timestamps but merely that the SQLite databases have been tampered with. The following four individual AFS-Flags offer guidance regarding the tampering of SQLite databases (a small automation sketch follows the list):

• File permissions: associated with the SQLite database files in the directory of a specific application are set to give only read/write access to the file owner and the group members. For each application, the current file owner and group members are only the individual application. Any modification or removal of a file within this directory will change the existing file permissions of the modified file from -rw-rw---- to -rw-rw-rw-. The following changes to the file permissions are therefore an indication of the possible manipulation of timestamps in the SQLite databases:

– File permission of the <database name>.db file changed from -rw-rw---- to -rw-rw-rw-.

– File permission of the <database name>.db-wal file changed from -rw-rw---- to -rw-rw-rw-.

• Ownership: of the SQLite databases is given to the specific application using the database. The file owner and group members are thus set to the user ID (UID) of the application, which is unique and specific to the application. The UID remains constant for the duration of the application on the particular Android smartphone. Modifications to any SQLite database files will result in a change of ownership and subsequently change the UID of both the file owner and group members. The following change to the ownership of the main database file is a possible indication of the manipulation of the databases:

– The current ownership for the file owner and group members of the <database name>.db file changed from the current UID to root.

• File Size: of the main database file is expected to be smaller than the size of the associated WAL file, since all new transactions are appended to the WAL file. The size of the main database file is only expected to grow after a checkpoint, when all the transactions from the WAL file are transferred to the main database file. A checkpoint, however, occurs only after the WAL file has accumulated 1000 entries (leading to a file size of approximately 4 MB), and thus the size of the main database remains relatively small. An automatic checkpoint occurs when the main database file is opened to allow for the manipulation of the timestamps. The WAL file must be deleted to prevent the changes made in the main database file from being overwritten by the existing content located in the WAL file. Once the Android smartphone is rebooted to reflect the changes, a new WAL file is automatically generated.


Figure 3: Comparison of changes in the directory containing the SQLite database for the Messaging application: (a) before and (b) after manipulation of the mmssms.db

This new WAL file contains limited data and thus has a file size that is smaller than the size of the main database file. A WAL file with a file size smaller than that of the main database file is therefore an indication of the possible manipulation of that database.

• System Reboot: is required for the changes made to the SQLite databases to reflect on the Android smartphone. The system reboot must occur after making the changes to the SQLite database. Therefore the timestamps of the files associated with a system reboot will follow after the timestamp that shows when the main database file was last modified. Multiple experiments revealed that the following files are indicators of a system reboot:

– rtc.log file located in the /data/log/ directory.

– powerreset_info.txt file located in the /data/log/ directory.

– SYSTEM_BOOT@[timestamp].txt file generated in the /data/system/dropbox/ directory.

– event_log@[timestamp].txt file generated in the /data/system/dropbox/ directory.

Android log data are written to certain files in the /data/log/ directory [25]. Two files, rtc.log and powerreset_info.txt, are existing files in this directory. These files are updated with a new entry after every system reboot and every entry shows the boot time of the Android smartphone. The files SYSTEM_BOOT@[timestamp].txt and event_log@[timestamp].txt are located in the directory /data/system/dropbox/ [26, 27]. This folder is used by a service known as DropBoxManager (unrelated to the Dropbox cloud storage service) and persistently stores chunks of data from various sources such as application crashes and kernel log records [26]. The SYSTEM_BOOT@[timestamp].txt file is generated consistently at boot time, with the timestamp forming part of the file name showing when the Android smartphone was booted. The other file, event_log@[timestamp].txt, is generated at 30-minute intervals and also indicates the time when the Android smartphone was rebooted. A system reboot occurring closely after the modification date of a SQLite database provides a possible indication of the modification of timestamps. A system reboot can, however, occur at any time after pushing the modified

Figure 4: The /data/log/ directory containing the rtc.log and powerreset_info.txt files

main database file onto the Android smartphone and it is thus necessary to establish a time frame in which this particular AFS-Flag will be deemed reliable.
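
The four AFS-Flags lend themselves to automation. The following minimal Python sketch, intended to run against a mounted file system image or a pulled directory tree (all paths are placeholders), reports the first three flags and compares the database modification time against one of the reboot logs:

    import os
    import stat

    def afs_flag_report(db_path, reboot_log="/data/log/rtc.log"):
        """Report AFS-Flags for a main database file and its WAL file."""
        flags = []
        st = os.stat(db_path)

        # Flag 1: file permissions changed to -rw-rw-rw- (world read/write).
        if st.st_mode & (stat.S_IROTH | stat.S_IWOTH):
            flags.append("permissions changed to -rw-rw-rw-")

        # Flag 2: ownership changed from the application's UID to root.
        if st.st_uid == 0 or st.st_gid == 0:
            flags.append("owner/group changed to root")

        # Flag 3: WAL file missing or smaller than the main database file.
        wal_path = db_path + "-wal"
        if not os.path.exists(wal_path) or os.path.getsize(wal_path) < st.st_size:
            flags.append("WAL file missing or smaller than main database file")

        # Flag 4: a reboot logged after the last database modification.
        if os.path.exists(reboot_log) and os.path.getmtime(reboot_log) > st.st_mtime:
            flags.append("system reboot recorded after last database modification")

        return flags

    print(afs_flag_report("/data/data/com.android.providers.telephony/databases/mmssms.db"))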

Figure 3 provides a comparison of the changes that occurred in the directory containing the SQLite database that stores the SMS/MMS messages. Figure 3 (b) indicates the existence of three AFS-Flags. The first AFS-Flag is the file permissions of both the mmssms.db and the mmssms.db-wal, which changed from -rw-rw---- to -rw-rw-rw-. The second AFS-Flag is the ownership for both the file owner and group members of the main database file, which changed from radio to root. The final AFS-Flag is the file size of the mmssms.db-wal, which is smaller than the size of the main database file, indicating that the mmssms.db-wal file was possibly deleted.

Figure 4 shows that the rtc.log and powerreset_info.txt files were last modified at 15:28 on 6 May and Figure 5 presents the contents of a SYSTEM_BOOT@[timestamp].txt file, which indicates that a reboot occurred at 15:29 on 6 May. All three files illustrate that a reboot occurred at approximately 15:28 on 6 May, which follows after the last modified date of the main database file (15:27 on 6 May). The existence of these files verifies that a system reboot occurred after pushing the modified main database file onto the Android smartphone.

The presence of all four AFS-Flags indicates the possible manipulation of timestamps within the SQLite database storing the SMS/MMS messages. Identification of these manipulated timestamps requires the analysis of the SQLite database for potential inconsistencies.

SQLite Database Inconsistencies: SQLite databases are the prominent choice for data storage in the Android OS. The association of one or more AFS-Flags


Figure 5: The SYSTEM_BOOT@[timestamp].txt file is generated after a reboot

with a specific SQLite database indicates the potential manipulation of the stored timestamps. Detection of the manipulated timestamps requires the further analysis of the SQLite database for any potential inconsistencies. An inconsistency in a SQLite database is described as a record that is listed incorrectly when ordered according to the following fields: the primary key and a field containing dates or timestamps. The identification of inconsistencies in the tables of SQLite databases requires the evaluation of the above-mentioned fields.

The tool selected to perform the evaluation is SQL. SQL is a powerful query language, allowing for the formulation of queries that can be of forensic use [28]. The evaluation of the tables available in the SQLite databases proceeds through three steps and uses the following SQL statements: CREATE TABLE, INSERT INTO, and SELECT. To preserve the integrity of the data stored in the original table, a new temporary table is created using the CREATE TABLE statement. The purpose of the CREATE TABLE statement is to define the physical structure of the new temporary table [29]. The temporary table contains a primary key, which is an integer value that auto-increments, and all the fields that are necessary to identify the inconsistencies. The query to create the temporary table is as follows:

CREATE TABLE temp (new_id INTEGER PRIMARY KEY AUTOINCREMENT, original_id INTEGER, timestamps INTEGER);

Following the creation of the new temporary table is the population of this table with all the records currently located in the original table, which is being investigated. To perform this action, a combination of the INSERT INTO and SELECT statements is used. The SELECT statement selects all the records from the table currently under investigation while the INSERT INTO statement inserts these selected records into the temporary table. Continuing with the SMS/MMS SQLite database of the Messaging application as an example, the SQL query required to copy the records from the sms table into the temporary table is as follows:

INSERT INTO temp (original_id, timestamps) SELECT _id, date FROM sms;

To locate any inconsistencies in the records collected in the temporary table, it is necessary to compare the values in the timestamps field of subsequent records. Since all the values in the timestamps field are expected to follow one another (each new record is appended at the end of

the table), the difference between two subsequent values in the timestamps field must be smaller than or equal to zero. A positive difference is an indication of a timestamp that is out of order and the cause of this inconsistency is the manipulation of the timestamp. The SQL query used to detect the records that are inconsistent is as follows:

SELECT T1.original_id, T1.timestamps, (T1.timestamps - T2.timestamps) AS difference FROM temp T1, temp T2 WHERE T2.new_id = T1.new_id + 1 AND difference > 0;

Applying this SQL query to the records in the temporary table leads to the identification of multiple inconsistencies in the SMS/MMS SQLite database. The existence of these inconsistent records in the SMS/MMS SQLite database invalidates the authenticity of the database. The examiner must thus decide whether to exclude the manipulated records from the investigation or to focus the investigation on the manipulated records.
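
The three-step evaluation can be wrapped in a short script. The following minimal Python sketch runs the queries above against a pulled copy of the Messaging database (file and column names as assumed earlier); the subtraction is repeated in the WHERE clause rather than relying on the column alias:

    import sqlite3

    con = sqlite3.connect("mmssms.db")
    con.executescript("""
        DROP TABLE IF EXISTS temp;
        CREATE TABLE temp (new_id INTEGER PRIMARY KEY AUTOINCREMENT,
                           original_id INTEGER, timestamps INTEGER);
        INSERT INTO temp (original_id, timestamps) SELECT _id, date FROM sms;
    """)

    inconsistencies = con.execute("""
        SELECT T1.original_id, T1.timestamps,
               (T1.timestamps - T2.timestamps) AS difference
        FROM temp T1, temp T2
        WHERE T2.new_id = T1.new_id + 1
          AND (T1.timestamps - T2.timestamps) > 0;
    """).fetchall()

    for original_id, ts, diff in inconsistencies:
        print(f"record {original_id}: timestamp {ts} is {diff} ms ahead of its successor")
    con.close()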

4. REFERENCE ARCHITECTURE FOR ANDROID APPLICATIONS

The exploratory experiments conducted during this research identified two categories of heuristics that can be used to establish the authenticity and accuracy of timestamps in SQLite databases. The successful application of these heuristics depends, however, on the following two external factors. Firstly, the skills of the smartphone user or the sophistication of the malicious application performing the manipulation can influence the availability of the heuristics. Smartphone users or the malicious application may be aware of the changes that occur due to the manipulation of timestamps in SQLite databases. To prevent detection, these changes can be removed or altered in an attempt to thwart the examiner performing the investigation. Secondly, the timeframe between the manipulation of the timestamps and the seizing of the smartphone can also influence the availability of certain AFS-Flags. An extended timeframe can cause specific AFS-Flags (such as File size and System reboot) to be deleted or overwritten by the Android OS. Furthermore, the aim of these heuristics is to assist with the identification of manipulated timestamps and not other forms of digital evidence.

Due to the limitations of the identified heuristics, it becomes necessary to explore other techniques that can also be used to establish the authenticity and accuracy of digital evidence. Identifying such techniques requires a deeper understanding of the Android application, which is responsible for creating the evidence. It is possible to obtain such an understanding by designing a reference architecture for Android applications. A reference architecture captures the common architectural elements as well as the relationships between these elements for a specific domain [30]. Using a reference architecture to model Android applications allows the examiner to comprehend how the digital evidence originated and whether the evidence is authentic and accurate. While reference architectures exist for many domains such as


Figure 6: Structured Analysis and Design Technique notation

web browsers [31] and web servers [32], this is the first reference architecture, to the best of the authors’ knowledge, that allows for the modelling of Android applications.

The design of the reference architecture is accomplished by using the Structured Analysis and Design Technique (SADT). SADT is a graphical language for describing systems by using a set of diagrams consisting primarily of boxes, which are interconnected by arrows. The boxes represent a specific function or system activity and the interconnected arrows provide external interfaces to the defined function or system activity. The external interfaces are the following [33, 34]:

• Input represents data or other consumables required for functioning.

• Output represents the results produced by the function or system activity.

• Control influences or regulates the execution but is not consumed.

• Mechanism is a component used to accomplish the function or system activity.

The notation of SADT, as defined above, is used in this paper to illustrate the reference architecture for Android applications (see Figure 6).

Designing a reference architecture for Android applications requires the examination of a wide variety of existing Android applications. This paper focused closely on the Messaging application, which allowed for the identification of certain architectural elements as well as the relationships between these elements. In order to confirm whether the identified architectural elements and the relationships between these elements are prevalent among other applications, additional Android applications were also thoroughly examined. The examination of these Android applications led to the discovery of two common architectural elements: the application activity and SQLite databases.

The application activity is responsible for launching the graphical user interface and initialising the logic of the Android application. The graphical user interface, structured according to a specified layout and styled following a certain theme, accepts as input an action, along with optional data, from the end user. The graphical user interface is therefore the space where interactions between end users and the Android application occur and accepts specific sets of input, which lead to expected results. The application logic contains the work-flow logic of the Android application and executes the received input to produce results accordingly.

The data involved in the requested action is transferred to and retained in a SQLite database. This includes the original data supplied by the end user during the admission of the action and a timestamp, indicating when the action was performed. The retention of the data proceeds according to the policies or set guidelines of the SQLite library [35], which describe what data will be stored, how such data is stored and for how long it will be kept. The SQLite library receives the incoming data and transforms the data according to the rules and requirements of both the SQLite library and the SQLite database. Once transformed, the data is retained in the <database name>.db-wal file until the WAL file reaches the limit of 1000 pages. Once the limit is reached, the SQLite library transfers the data to the <database name>.db file.

The reference architecture for Android applications consists of two core components: the application activity and the SQLite database. The final reference architecture for Android applications is shown in Figure 7. The purpose and ultimate goal of the reference architecture is to organise conventional Android applications according to a standardised model. From this standardisation, an examiner conducting a digital forensics investigation can establish certain particulars regarding the Android applications currently under investigation. Using the newly designed reference architecture, a collection of general characteristics regarding Android applications can be identified:

• The Android application must first receive an action from the end user.

• Only after receiving the action can changes be made to data retained in the SQLite database.

• Each individual application accepts a limited selection of inputs.

• Each input contains an action and possibly optional data.

• Every input, along with the action and optional data, leads to expected results.

• The action to be executed by the Android application is provided by the end user, who is either a human operator or another application.


Figure 7: Reference Architecture for Android Applications

Figure 8: Modelling of the Messaging application according to the Reference Architecture

• Data is expected to flow from the application activity to the SQLite database.

• The data is transformed according to the SQLite or database rules and inserted into the WAL file.

Any findings by the examiner that contradict the above


general characteristics are possible indications of the inaccuracy and unreliability of digital evidence produced by the Android application.

Continuing with the Messaging application as an example, the newly designed reference architecture can now be used to model the application (see Figure 8). The mapping of the Messaging application onto the reference architecture allows for the identification of the core components of the application as well as the flow of data between these components. The modelled architecture of the application can now be used to evaluate the authenticity of the stored data by determining if any of the general characteristics are violated. Firstly, the data is only available in the main database file (mmssms.db) and not, as expected, in the WAL file (mmssms.db-wal). The omission of the data from the WAL file shows that the data flow between the application activity and the SQLite database has been violated. Secondly, reviewing the usage log of the Messaging application shows that the input was not provided by the end user. These findings thus confirm the inaccuracy of the digital evidence associated with the Messaging application. It is therefore possible to conclude that the data stored by the Messaging application was possibly changed or altered.
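
The first of these checks can be automated in a few lines. The following minimal Python sketch, assuming pulled copies of mmssms.db and its WAL file, flags a violation of the expected application-activity-to-SQLite data flow when recent records are absent from the WAL file:

    import os

    def wal_data_flow_violated(db_path):
        """Under normal operation recent records reside in the -wal file;
        a missing or empty WAL file next to a populated main database
        suggests the expected data flow was bypassed."""
        wal_path = db_path + "-wal"
        if not os.path.exists(wal_path):
            return True
        return os.path.getsize(wal_path) == 0

    print(wal_data_flow_violated("mmssms.db"))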

The modelling of Android applications according to the reference architecture thus provides support to examiners involved in ongoing investigations. Using the reference architecture, the examiner is capable of establishing the authenticity and accuracy of any digital evidence related to the modelled Android application. The insight offered by the reference architecture will lead the examiner to the correct conclusions regarding the investigation, since potentially incorrect digital evidence, such as manipulated timestamps, can be eliminated before concluding the final report. The reference architecture is, however, limited to Android applications only and cannot be used to model other software applications.

5. DISCUSSION AND FUTURE WORK

The exploratory experiments performed while composing this paper showed that the timestamps found in SQLite databases can be manipulated by following the technique described in Section 3.2. This technique is currently the most plausible technique to manipulate timestamps in SQLite databases. Although other techniques can be designed, the inability to directly alter the data in the WAL file and the unavailability of the sqlite3 command utility on an Android smartphone will limit the capabilities and impact the efficiency of other techniques.

To establish the authenticity of timestamps found in SQLite databases and detect potentially manipulated timestamps, this paper introduced two categories of heuristics. The first group of heuristics determines the authenticity of timestamps by evaluating the file system for specific changes and the second group identifies inconsistencies in the SQLite databases. These heuristics are independent of any particular Android smartphone and do not

require any prerequisites to be installed on the Android smartphones prior to the investigation. The purpose of the heuristics is to give examiners an indication of whether the timestamps collected in SQLite databases were tampered with. The results presented by the heuristics allow the examiner to establish the authenticity of the timestamps. Based on the authenticity of the timestamps, the examiner can decide to either include or disregard the evidence associated with the evaluated timestamps in the investigation. The heuristics are therefore capable of saving crucial time during the investigation and allow the examiner to arrive at correct and accurate conclusions. The experiments provided throughout this paper showed that all of the identified heuristics are capable of providing the examiner with the necessary support to establish the authenticity of timestamps in SQLite databases.

The current collection of heuristics only focuses on detecting the manipulation of timestamps found in SQLite databases. Manipulation of timestamps can, however, occur at multiple locations. The time zone settings of an Android smartphone, which can be set incorrectly or be changed (intentionally or unintentionally), can influence the accuracy of the timestamps found in SQLite databases. Besides the time zone settings, the actual time of the Android smartphone can also be manually adjusted by disabling the automatic time synchronisation feature. Manual adjustments of the time will impact the timestamps that are generated for the traces stored in the SQLite databases when certain events occur on the Android smartphone. It is therefore necessary to also incorporate the evaluation of the time zone and time settings of an Android smartphone.

The availability of the identified heuristics can, however, be influenced by external factors and therefore this paper also introduced a reference architecture for Android applications. The newly introduced reference architecture serves as a template to model a diverse collection of Android applications. To allow for such diversity, the reference architecture captures only the essential components found in modern implementations of Android applications and identifies the relationships between these components. The simplistic design of the reference architecture clearly and concisely describes the role of each component, allowing for easy comprehension of the modelled Android application. The design of the reference architecture is flexible, providing the ability to model Android applications at different levels of complexity. Depending on the architecture of the Android applications, the components of the reference architecture can be combined or expanded to more closely represent the modelling of a specific Android application. The reference architecture therefore serves as a valuable template for examiners to gain a better understanding of Android applications. Modelling any Android application according to this reference architecture allows examiners to easily comprehend the related digital evidence and provides the necessary insight to determine the evidence’s authenticity and accuracy.


Future work will therefore continue to expand this research, exploring different avenues to further establish the authenticity and accuracy of digital evidence. Firstly, the existing collection of heuristics will be extended by identifying additional heuristics that can establish the authenticity and accuracy of timestamps. Such heuristics will include the evaluation of the time zone and time settings of a seized Android smartphone. Secondly, the current focus is only on determining the authenticity and accuracy of timestamps. It is thus necessary to broaden the focus and also include other forms of digital evidence. Lastly, the existing reference architecture will be extended to include the modelling of other forms of application software. The reference architecture will also be validated using mathematical notation to ensure the architecture applies to more complex scenarios.

6. CONCLUSION

Evidence found, in the form of traces, on smartphones forms an important asset of digital investigations. The timestamps associated with the traces allow the examiner to construct a timeline of events. Such a timeline often forms the basis for further investigation and has the ability to provide answers to certain questions. Due to the importance of timestamps, it is necessary for examiners to be able to verify their authenticity. Collected timestamps might be incorrect due to tampering and, without additional verification, the timestamps will lead the examiner to make unreliable conclusions. To verify the authenticity of timestamps found in SQLite databases, this paper introduced a collection of heuristics that can be categorised into two distinct groups. The first group of heuristics identifies the presence of certain changes in the Android file system, which are indicators of the manipulation of the SQLite databases. The second group of heuristics subsequently focuses on the individual SQLite databases and the identification of inconsistencies in these databases. The availability of these heuristics is, however, susceptible to external factors and therefore a reference architecture for Android applications was also introduced to further establish the authenticity and accuracy of digital evidence. The challenges addressed in the paper were to show that (a) timestamps can be manipulated in SQLite databases, (b) the compromised authenticity of these timestamps can be identified and (c) the limitations of the techniques used to identify the compromised timestamps can be overcome. Challenge (a) was addressed by showing the process that must be followed to successfully manipulate timestamps in SQLite databases, challenge (b) was addressed by using the identified heuristics and challenge (c) was addressed by introducing a newly designed reference architecture for Android applications. The current paper provides preliminary evidence that the suggested approach shows potential and future work will focus on expanding this research.

REFERENCES

[1] L. Goasduff and J. Rivera: “Gartner says smartphone sales surpassed one billion units in 2014,” 2015. [Online]. Available: http://www.gartner.com/newsroom/id/2996817. (Accessed: Apr. 7, 2015).

[2] S. Radicati: “Mobile statistics report, 2014-2018,” The Radicati Group Inc., Tech. Rep., 2014.

[3] R. Harris: “Arriving at an anti-forensics consensus: examining how to define and control the anti-forensics problem,” Digital Investigation, Vol. 3, pp. 44-49, 2006.

[4] R. Verma, J. Govindaraj and G. Gupta: “Preserving dates and timestamps for incident handling in Android smartphones,” Advances in Digital Forensics X, Vol. 433, pp. 209-225, 2014.

[5] J. Govindaraj, R. Verma, R. Mata and G. Gupta: “Poster: iSecureRing: Forensic ready secure iOS apps for jailbroken iPhones,” Proceedings: 35th IEEE Symposium on Security and Privacy, 2014.

[6] M. Song, W. Xiang and X. Fu: “Research on architecture of multimedia and its design based on Android,” Proceedings: 2010 International Conference on Internet Technology and Applications, pp. 1-4, 2010.

[7] M. Faheem, N.-A. Le-Khac and T. Kechadi: “Smartphones Forensic Analysis: A Case Study for Obtaining Root Access of an Android Samsung S3 Device and Analyse the Image without an Expensive Commercial Tool,” Journal of Information Security, Vol. 5, pp. 83-90, 2014.

[8] Open Handset Alliance: “Android,” 2015. [Online]. Available: http://www.openhandsetalliance.com/android_overview.html. (Accessed: Apr. 9, 2015).

[9] C. Maia, L.M. Nogueira and L.M. Pinho: “Evaluating Android OS for embedded real-time systems,” Proceedings: 6th International Workshop on Operating Systems Platforms for Embedded Real-Time Applications, pp. 63-70, 2010.

[10] B. Speckmann: “The Android mobile platform,” Ph.D. dissertation, Department of Computer Science, Eastern Michigan University, Michigan, 2008.

[11] C. Zimmermann, M. Spreitzenbarth, S. Schmitt and F.C. Freiling: “Forensic analysis of YAFFS2,” Sicherheit, pp. 59-69, 2012.

[12] J. Lessard and G. Kessler: “Android forensics: simplifying cell phone examinations,” Small Scale Digital Device Forensics Journal, Vol. 4, No. 1, pp. 1-12, 2010.


[13] H.J. Kim and J.S. Kim: “Tuning the ext4 filesystem performance for Android-based smartphones,” Frontiers in Computer Education, Springer Berlin Heidelberg, pp. 745-752, 2012.

[14] F. Freiling, M. Spreitzenbarth and S. Schmitt: “Forensic analysis of smartphones: The Android Data Extractor Lite (ADEL),” Proceedings of the Conference on Digital Forensics, Security and Law, pp. 151-160, 2011.

[15] S. Jeon, J. Bang, K. Byun and S. Lee: “A recovery method of deleted record for SQLite database,” Personal and Ubiquitous Computing, Vol. 16, No. 6, pp. 707-715, 2012.

[16] SQLite: “About SQLite,” 2015. [Online]. Available: https://www.sqlite.org/about.html. (Accessed: Apr. 14, 2015).

[17] SQLite: “The SQLite Database File Format,” 2015. [Online]. Available: https://www.sqlite.org/fileformat2.html. (Accessed: Sept. 15, 2015).

[18] P. Patodi: “Database recovery mechanism for Android devices,” Ph.D. dissertation, Indian Institute of Technology, Bombay, 2012.

[19] A. Caithness: “The forensic implications of SQLite’s write ahead log,” 2012. [Online]. Available: http://www.cclgroupltd.com/the-forensic-implications-of-sqlites-write-ahead-log/. (Accessed: Sept. 15, 2015).

[20] SQLite: “Write-Ahead Logging,” 2015. [Online]. Available: https://www.sqlite.org/wal.html. (Accessed: Sept. 15, 2015).

[21] A.S. Sharma, M.S. Malhi, M. Singh and R. Singh: “CSLA - An application using Android Capstone project,” Proceedings: 2014 IEEE International Conference on MOOC, Innovation and Technology in Education, pp. 382-385, 2014.

[22] A.D. Schmidt, H.G. Schmidt, L. Batyuk, J.H. Clausen, S.A. Camtepe, S. Albayrak and C. Yildizli: “Smartphone malware evolution revisited: Android next target?,” Proceedings: 4th International Conference on Malicious and Unwanted Software (MALWARE), pp. 1-7, 2009.

[23] Android Developers: “Android Debug Bridge,” 2015. [Online]. Available: http://developer.android.com/tools/help/adb.html. (Accessed: Sept. 22, 2015).

[24] Stackoverflow: “How to use ADB in Android Studio to view an SQLite DB,” 2013. [Online]. Available: http://stackoverflow.com/questions/18370219/how-to-use-adb-in-android-studio-to-view-an-sqlite-db. (Accessed: Sept. 22, 2015).

[25] R. Johnson and A. Stavrou: “Resurrecting the READ_LOGS permission on Samsung devices,” Proceedings: Black Hat Asia 2015, pp. 1-25, 2015.

[26] Android Developers: “DropBoxManager,” 2015. [Online]. Available: http://developer.android.com/reference/android/os/DropBoxManager.html. (Accessed: Sept. 22, 2015).

[27] M. Kaart and S. Laraghy: “Android forensics: Interpretation of timestamps,” Digital Investigation, Vol. 11, No. 3, pp. 234-248, 2014.

[28] H. Pieterse and M. Olivier: “Smartphones as distributed witnesses for digital forensics,” Advances in Digital Forensics X, Vol. 433, pp. 237-251, 2014.

[29] E. Ugboma: Learn Database Programming using SQL of MS Access 2007, BookSurge Publishing, 2009.

[30] W. Eixelsberger, M. Ogris, H. Gall and B. Bellay: “Software architecture recovery of a program family,” Proceedings: 1998 International Conference on Software Engineering, pp. 508-511, 1998.

[31] A. Grosskurth and M.W. Godfrey: “A reference architecture for web browsers,” Proceedings: 21st IEEE International Conference on Software Maintenance, pp. 661-664, 2005.

[32] A.E. Hassan and R.C. Holt: “A reference architecture for web servers,” Proceedings: Seventh Working Conference on Reverse Engineering, pp. 150-159, 2000.

[33] D.A. Marca and C.L. McGowan: SADT: Structured Analysis and Design Technique, McGraw-Hill, Inc., 1987.

[34] M.E. Dickover, C.L. McGowan and D.T. Ross: “Software design using: SADT,” Proceedings: 1977 Annual Conference, pp. 125-133, 1977.

[35] S. Lee: “Creating and Using Databases for Android Applications,” International Journal of Database Theory and Application, Vol. 5, No. 2, pp. 99-106, 2012.


Using a standard approach to the design of next generation e-Supply Chain Digital Forensic Readiness systems

D.J.E. Masvosvere* and H.S. Venter**

* ICSA Research Group, Department of Computer Science, Corner of University Road and Lynnwood Road, University of Pretoria, Pretoria 0002, South Africa. E-mail: [email protected]
** Department of Computer Science, Corner of University Road and Lynnwood Road, University of Pretoria, Pretoria 0002, South Africa. E-mail: [email protected]

Abstract: The internet has had a major impact on how information is shared within supply chains, and in commerce in general. This has resulted in the establishment of information systems such as e-supply chains (eSCs), amongst others, which integrate the internet and other information and communications technology (ICT) with traditional business processes for the swift transmission of information between trading partners. Many organisations have reaped the benefits that come from adopting the eSC model, but have also faced the challenges with which it comes. One such major challenge is information security. With the current state of cybercrime, system developers are challenged with the task of developing cutting-edge digital forensic readiness (DFR) systems that can keep up with current technological advancements, such as eSCs. Hence, the problem addressed in this paper is the lack of a well-formulated DFR approach that can assist system developers in the development of e-supply chain digital forensic readiness systems. The main objective of such a system is that it must be able to provide law enforcement/digital forensic investigators (DFIs) with forensically sound and readily available potential digital evidence that can expedite and support digital forensics incident response processes. This approach, if implemented, can also prepare trading partners for security incidents that might take place, if not prevent them from occurring. Therefore, the work presented in this paper is aimed at providing a procedural approach that is based on digital forensics principles. This paper discusses the limitations of current system monitoring tools in relation to the kind of specialised DFR systems that are needed in the eSC environment and proposes an eSC-DFR process model and architectural design model that can lead to the development of next-generation eSC-DFR systems. It is the view of the authors that the conclusions drawn from this paper can spearhead the development of cutting-edge next-generation digital forensic readiness systems, and bring attention to some of the shortcomings of current system monitoring tools.

Keywords: Network forensics, e-Supply Chains (eSCs), Digital forensic readiness (DFR), Cyber-crime, e-supply chain digital forensic (eSC-DFR) system, digital forensic data analysis tools, Forensics domains, Digital Forensic Investigation (DFI).

1. INTRODUCTION

In this digital age, collaborative commerce is the key to running a successful business. Organisations have come to realise that it is important to establish and manage relationships that are mutually beneficial, as this is central to their survival and growth [1]. In the recent past, organisations have become heavily dependent on their computers and networks. Needless to say, the comprehensive use of computers and networks for the exchange of information and services has had a major impact on the escalation of crime through their use [1]. As a result, monitoring such networks has become a mission-critical task. E-Supply Chains (eSCs) are becoming an increasingly adopted model for organisations to conduct business. This model encourages organisations to share information and resources in order to achieve improved customer service, speed up business operations and reduce costs. Despite the many benefits that eSCs provide, they also create new avenues for fraudsters. Ayers [2] indicates that current digital forensics (DF) systems, a category under which digital forensic readiness (DFR)

systems fall, are not keeping up with the increased complexity and data volumes of modern investigations, and insists that the existing architecture of first-generation computer forensics tools is rapidly becoming outdated. DF systems generally implement reactive processes that assist in the collection, preservation, analysis and reporting of digital evidence. DFR systems, on the other hand, are systems that implement proactive processes such as potential digital evidence (PDE) gathering and data pre-analysis. Reddy and Venter [10] indicate that many digital forensic investigations take a long time to conclude due to a lack of sufficient forensically sound digital evidence. Therefore, developments in today’s networks, which support both internal and external business processes, call for cutting-edge DFR systems that can assist in the collection, storage and retrieval of PDE in a forensically sound manner. The problem pursued in this paper is that there are no DFR systems designed specifically for the eSC environment and no standardised approach is followed in the design and development of such tools. With all the technological advancements that have occurred over the years in eSCs, there has been very little focus on

Based on: “A Model for the Design of Next Generation e-supply Chain Digital Forensic Readiness Tools”, by D. Masvosvere and H. Venter, which appeared in the Proceedings of Information Security South Africa (ISSA) 2015, Johannesburg, 12 & 13 August 2015. © 2015 IEEE


the implementation of digital forensic readiness (DFR) within this environment. By definition, digital forensics is the use of specialised techniques for the extraction, preservation, identification, authentication, examination, analysis and documentation of digital information from any environment [3]. This procedure is often called upon in response to the occurrence of an incident and not as a proactive process that is incorporated in the design of eSC systems. Industry-standard tools such as the EnCase forensic tool and the Forensic Tool Kit (FTK) application do not incorporate DFR, which is a proactive forensic process, in their specifications. Therefore, in this paper, the authors present an eSC-DFR process model that can be viewed as the methodology for achieving DFR in an eSC environment and a system-design model that is to be used as a blueprint for the design of next-generation eSC-DFR systems. The remainder of this paper is structured as follows: Section 2 provides background on the eSC environment and digital forensic readiness. Section 3 discusses the limitations of current digital forensic readiness systems, leading to the proposed methodology for achieving DFR in the eSC environment in Section 4. Through the proposed method, eSC-DFR system requirements are identified and discussed in Section 5. In Sections 6 and 7 the design model for eSC-DFR systems is presented, showing the dynamic aspect of the system through the use of a use-case diagram and an activity diagram. Sections 8 and 9 present the generic architectural model of a next-generation eSC-DFR system and its components to illustrate how the requirements set out in previous sections may be implemented. In Section 10 some architectural aspects regarding the proposed model are elaborated on for greater clarity, followed by the last two sections that conclude the paper and provide a critical evaluation of the proposed eSC-DFR model.

2. BACKGROUND

This section provides the background on the e-Supply Chain environment and digital forensic readiness. The authors present a brief background discussion on e-supply chains (eSCs) because the approach proposed in this paper is created to serve the eSC environment. The approach employs a digital forensics process, digital forensic readiness (DFR), justifying its importance in this section.

2.1 e-Supply Chains

Pathak, Dilts and Biswas mention that a conventional supply chain is a system that comprises firms, activities, people, information and resources that work together to facilitate the movement of goods and services from supplier to customer [4]. The internet overcomes the gap that has existed for business systems to be connected, providing a means to connect businesses all

over the world. An eSC is an advancement of a conventional supply chain, meaning it has additional building blocks, such as web technologies, that contribute to an improved and integrated supply-chain relationship [5]. This relationship is facilitated by web technology solutions that effect information exchange between trading partners and consumers over a distributed network environment. In the next section a more detailed description of the components that make up the eSC environment is given.

2.2 e-Supply Chain Architecture

E-supply chains are built on hardware, middleware and software components that work together to facilitate the smooth operation of business processes between trading partners; the key components being software and middleware.

Software components: such as supply-chain management (SCM) systems, provide both internal and external services to trading partners and an integrated view of core business processes [1]. These software components, in conjunction with the internet and web services, provide an entry point for an enterprise to access information from other trading partners. All SCM software applications are ready-made applications usually designed to deal with specific tasks, e.g. online inventory management processes between suppliers and clients. These ready-made software applications are mass-customised for specific markets and industries. From a data management point of view, e-supply chain software can be organised into two categories: transactional and analytical software applications [6]. Transactional software applications are applications that provide services concerned with acquiring, processing and communicating raw data about a trading partner’s supply chain network interactions with other partners. Analytical software applications are applications used for evaluating and disseminating decisions based on e-supply chain decision databases. Examples would be forecast systems or production scheduling systems, just to mention a few.

Middleware components: such as application servers and content management systems, are computer software that support enterprise application integration (EAI). Middleware can be defined as programs that provide messaging services, which include enterprise-application integration, data integration, and links between database systems and webservers in the eSC network. This is systems software that resides between applications and the operating system, network protocol stacks and hardware [7]. The role of middleware software is to bridge the gap between applications and the lower-level hardware and software infrastructure in order to coordinate how applications are connected and how they interact. Such

Vol.107 (2) June 2016SOUTH AFRICAN INSTITUTE OF ELECTRICAL ENGINEERS106

middleware components if implemented properly can help to shield software developers from low-level and error-prone platform details and assist in providing developer oriented services such as logging and security services that are necessary in a network environment [8]. Hardware components: create a communication link between each trading partner in the eSC for the transmission and processing of data. Examples of hardware components include PCs, mobile computers, routers, switchboards and servers just to mention a few, all of which are vulnerable to IT-specific threats. From Figure 1, the different components that make up an eSC environment are illustrated at a high level. The figure illustrates the structure of an eSC and how the internal infrastructure of a trading partner (TP) interacts with the information hub that facilitates interactions with other trading partners’ internal systems via internet-based protocols [9].

Figure 1: eSC Structure

The eSC network environment is full of potential evidence data that can be used when an incident occurs, provided that the data is collected in a forensically sound manner. It is therefore the authors' view that a digital forensic readiness system can provide such critical data.

2.3 Digital Forensic Readiness

Owing to the above-mentioned security issues and problems, there is a need for ways to gather digital evidence in a forensically sound manner. DFR provides various techniques that can be used to address such issues [10]. Rodney McKemmish [11] defines "forensically sound" as a term used in the digital forensics community to qualify and justify the use of a particular forensic method or technology. Very often digital forensics is called upon in response to an information-security incident or computer-related crime. Although this is the most common case, there are many situations where DFR may benefit an organisation before an incident occurs, by providing the ability to gather and preserve potential digital evidence [12]. By definition, DFR is the capability of a system to efficiently collect valid digital evidence that can be used in a court of law [13]. It is important for organisations to understand the crucial role that DFR plays as a proactive process in digital forensics and the impact a DFR system could have in a digital forensic investigation (DFI). Rowlingson [14] mentions a number of goals that are essential to DFR. These include gathering admissible evidence legally without interfering with business processes, targeting the potential crimes and disputes that may adversely impact an organisation, and minimising interruption to the business from any investigation. The role of a DFR tool in an eSC environment would therefore be to gather such evidence from the eSC network environment and store it in a forensically sound manner, allowing a forensic investigator to access the collected potential evidence in the event that an incident occurs.

2.4 ISO/IEC 27043

ISO/IEC 27043 (2015), an International Standard, outlines a three-step procedure to fully implement DFR [28]. The processes in this standard deal with setting up an organisation in such a way that, if a digital investigation needs to be carried out, the organisation is able to maximise its potential to use digital evidence whilst minimising the time and costs incurred in an investigation. This standard has been tested and applied to numerous real-world scenarios by different researchers, validating its importance scientifically [15-19]. According to ISO/IEC 27043 [25], the three groups of processes that make up DFR are planning, implementation and implementation assessment, as shown in Figure 2.

Figure 2: Readiness processes groups

Figure 2 illustrates the order in which DFR processes take place, starting with the planning group, which is concerned with planning activities, followed by the implementation group, which includes readiness processes concerned with implementing the processes planned in the planning group. Lastly, the assessment group defines readiness processes concerned with assessing the success of the implementation process group. These process groups run concurrently with other processes defined in ISO/IEC 27043, such as a DFI; that is, as DFR takes place, investigative processes may be taking place as well [25]. The data collected from the implementation of DFR in the e-supply chain can therefore be used as input to other processes in the ISO/IEC 27043 standard, such as a DFI. Unfortunately, tools used in the eSC do not incorporate forensic readiness processes that maximise an eSC's ability to provide digital forensic evidence, let alone use the ISO/IEC 27043 standard in their design. In the next section the authors discuss the limitations of current monitoring tools in relation to what is required of eSC-DFR systems.

3. LIMITATIONS OF CURRENT TOOLS

A considerable amount of research has been conducted on the adoption of DFR processes in different network environments [3, 10, 20, 21]. Unfortunately, adequate attention has not been given to the development of eSC-DFR systems designed specifically for eSC environments. The eSC environment is a distributed web-based network and hence requires a specialised DFR strategy that is able to capture PDE from different parts of the network and ensure its integrity.

The authors identified a number of DFR limitations that stem from the lack of a standard approach to the design and implementation of monitoring systems, which are in most cases used as DFR tools and sources of PDE. These limitations include:

• Limited throughput of data-capturing devices.

• Poor usability.

• Compromised privacy and limited filtering of packets.

• No technical support.

• No centralised storage for data collected from a distributed network environment.

• Software errors.

Each of these limitations is elaborated upon in the sections that follow.

3.1 Limited throughput of data-capturing devices

Due to the tremendous increase in network traffic over the years, current monitoring systems struggle to keep pace with network traffic speeds. These tools cannot capture 100 per cent of network traffic data at higher speeds [5]. For an investigation to be successful, especially in the DFR arena, as much data as possible needs to be captured. Considering that the practicality of capturing all network traffic data is questionable, other strategic methods that derive from implementing the ISO/IEC 27043 standard must be considered; these are discussed in this paper.

3.2 Poor usability

Most monitoring systems do not provide a user-friendly interface that allows end users to quickly scan through a visual timeline of an event, deeply interrogate the activity, and understand the context associated with each object [22]. Large amounts of unfiltered data are collected from different network access points and represented in a form that is too sophisticated for an ordinary person to understand, creating a need to improve the GUI, data search and filtering capabilities of such systems so that DFR processes and functionality can be executed efficiently. Considering that an eSC is a distributed system, there is a need for DFR systems that can capture potential digital evidence at different parts of the supply chain and store it in a central place from which the collected data can be retrieved by digital forensic investigators or law enforcement, so that it is readily available in the case of an enquiry. Through careful planning, which comes from adopting the planning processes of ISO/IEC 27043, eSC-DFR systems can be designed to provide users with the best possible user experience and system functionality.

3.3 Compromised privacy and limited filtering of packets

Packet sniffing and filtering has its drawbacks [23]. Firstly, only limited filtering is carried out on received packets, resulting in massive post-processing. Secondly, no filtering is done based on the packet payload content (the critical data carried within a packet or other transmission unit). Lastly, as all the data is dumped into a central database, the privacy of innocent individuals who may be communicating during the time of monitoring may be violated. Consequently, access to captured eSC data is not restricted to relevant potential evidence and relevant parties. ISO/IEC 27043 provides processes that can be incorporated in DFR systems for PDE preservation purposes.

3.4 No technical support

Commercial digital forensics tools that offer technical support are generally costly, making it difficult for small to medium-sized enterprises (SMEs) to purchase them [2]. On the other hand, open-source network monitoring tools are often difficult to use, as they provide neither technical support nor the ability to gain insight into their inner workings [2]. The validity and trustworthiness of digital evidence is an essential part of digital forensics; this calls for the validation of DFR tools to verify that they meet the requirements of a digital forensics tool.

3.5 Software errors

Software errors continue to pose a challenge for tool developers. Analysts and other digital forensic tool users are often faced with unexplained crashes that lead to disruption and often to loss of data [2]. These seem to be caused by a combination of factors, such as design errors in tools and a lack of high-integrity software development practices. Software crashes therefore remain a significant concern for analysts, and improvements to the robustness of forensic tools are crucial for this reason alone. This issue can be addressed through the assessment process group in ISO/IEC 27043, which provides assessment tests to ensure that such errors are eradicated [24].

Therefore, in the next section, the authors introduce the adoption of the ISO/IEC 27043 DFR model as a method for achieving forensic readiness in the eSC environment and countering the identified limitations. This standard provides a standard approach to the design, implementation and assessment of DFR systems.

4. DIGITAL FORENSIC READINESS IN E-SUPPLY CHAINS

This section discusses how DFR can be implemented in the eSC environment by adopting the ISO/IEC 27043 DFR process model and identifying DFR processes.

4.1 Proposed methodology for achieving DFR in the eSC environment

The ISO/IEC 27043 standard DFR model illustrated in Figure 2 provides three DFR process groups, which are adopted in this paper and are critical to the achievement of DFR in the eSC environment. This grouping of processes helps to identify processes that are critical for achieving DFR in the eSC environment and clearly defines the order in which events should take place. It is also important to mention that the proposed methodology goes beyond a mere adoption of the ISO/IEC 27043 standard DFR process: some of the identified processes in each DFR process group are not necessarily mentioned in the ISO/IEC 27043 standard. It is the authors' opinion that the three ISO/IEC 27043 DFR process groups adopted in the eSC-DFR process model must result in the development of cutting-edge eSC-DFR systems that have five main objectives:

• To capture PDE from the eSC network environment.

• To protect PDE from unauthorised parties and ensure its integrity is not compromised, through the PDE protection process.

• To provide secure centralised storage for PDE.

• To present collected PDE in a useful manner.

• To ensure that authenticated users can access only information that is relevant to a specific case and incident, through the controlled access to PDE process.

Figure 3 shows the authors' adoption of the standard DFR model to develop the eSC-DFR process model, which constitutes the processes that are essential to the eSC-DFR process.

Figure 3: eSC-DFR process model

The processes shown in the figure are to be utilised in three stages, as indicated by the grouping of processes. The first is the planning process group, which involves the planning and design of DFR solutions for this environment through the identification of policies that need to be implemented. The second is the implementation process group, which focuses on the implementation of solutions/systems to achieve forensic readiness in the eSC environment. The last is the assessment of implementation process group, which focuses on the assessment/evaluation of the effectiveness of the implemented solutions/systems in order to determine and make the adjustments necessary to achieve an effective DFR strategy. Each process group is discussed in greater detail in the following sections, starting with the planning processes group.

4.2 Planning processes group for DFR in e-supply chains

Planning may be defined as a process of brainstorming and organising the activities required to achieve a desired outcome [28]. In the eSC-DFR process model, the planning processes group presents the critical processes required to achieve DFR in the eSC environment. The following sub-sections elaborate on each identified eSC-DFR planning process.

Identify sources of PDE in the e-supply chain: Identifying sources of potential evidence is a crucial step in the DFR process. Rowlingson [9] mentions that the purpose of this process is to identify what evidence is available across an entire system or application for collection. For the purposes of this research, the role of process no. 1 is to identify the different types of potential digital evidence that may be available across an e-supply chain network and where it may be located. Examples of data sources in an eSC system include servers, firewalls, application software and general logs [25]. Examples of digital evidence include e-mails, transaction logs, audio files, system logs and video files.

Planning data collection: After identifying PDE sources, it is up to the eSC service provider to decide which of the identified sources of PDE are worth pursuing for collection and which methods will be used to gather this evidence. A number of issues should be considered during this process, such as how to acquire digital evidence without interfering with business processes, the legality of collecting the data, the size of the collected data, and the costs involved [9]. All of these affect the effectiveness of the DFR process in an eSC.

Plan PDE storage: Upon collecting PDE, the next concern is the storage of the collected PDE. It is the authors' opinion that a number of issues arise when considering the storage of gathered digital evidence, such as security and the size of storage. An eSC handles sensitive company data such as client records, business transactions and other sensitive trading partner information; hence it is important to ensure that gathered information is kept secure from unauthorised parties. In addition, it is important to consider efficient ways, such as compression, of storing the large amounts of PDE captured across the eSC.

Plan PDE protection: The main focus of this process is ensuring that the integrity of captured PDE is not compromised. The eSC is a web-based system, hence there are many ways for intruders to try to access and either steal or corrupt PDE. It is therefore important to consider methods of protecting captured data from potential threats through the deployment of security measures such as encryption and password protection. It is necessary to ensure that once data is collected or stored in a data repository, its integrity is maintained and it can be used in a useful way. This also involves considering measures to assess the authenticity of captured data, so that at all times there is proof that the PDE has not been tampered with, e.g. through hashing. Content management policies and systems therefore have to be examined to identify specific policy measures that can be implemented in the eSC to ensure that captured data is secure and can be used in a useful manner.

Plan PDE pre-analysis: Once data is collected and stored in a secure database, consideration must be given to what can be done with the collected data, such as presenting it in a manner that makes it easy for law enforcement and forensic investigators to trace events. Therefore, within the design of eSC-DFR systems that operate in the eSC environment, developers should consider all the scenarios in which collected data could be useful and design systems that can perform certain pre-analysis functions, such as categorising different types of PDE.

Defining system architecture: With all the above-mentioned factors considered, process no. 7 has to do with designing a system that incorporates all the planned DFR processes, from the security aspect to the usability aspect of an eSC-DFR system. This involves defining the architecture and behaviour of a system that implements the DFR solutions arising from all the above-mentioned planning processes [26].

4.3 Implementation processes group for DFR in e-supply chains

In the implementation processes group, the defined system architecture is implemented. This involves the incorporation of new DFR infrastructure, that is, software, middleware and hardware (eSC-DFR systems). It is therefore the responsibility of each e-supply chain service provider to ensure that such architecture is implemented across the e-supply chain. It is in the implementation processes group that next-generation eSC-DFR system architectures are realised and implemented.

E-supply chain service providers are to develop and implement systems that support data collection and storage, as illustrated in Figure 3. The following sub-sections elaborate on each eSC-DFR implementation process.

Implement PDE collection: For the identified sources of potential digital evidence in an eSC network, this process deploys data-capturing methods such as logging and network sniffing to capture data (PDE) at specified critical points in the e-supply chain. As mentioned in Section 4.2, examples of data sources in an eSC system include servers, firewalls, application software, general system logs and network logs.

Implement centralised storage: Upon collecting PDE, the next concern is the storage of the collected PDE. The centralisation of collected data is critical in a distributed network environment, let alone a business platform such as an eSC network [27], because it reduces the chances of data redundancy and replication, makes it easier to manage the collected data and allows closer control over data protection.

Implement PDE protection: Implementing PDE protection focuses on implementing security measures across the e-supply chain, making sure that from the time data is collected and transported across the eSC network to storage, until it is accessed by authorised parties, it is not compromised. Hence, in this process, security measures such as encryption, hashing, firewalls and intrusion detection systems may be deployed to protect the integrity and privacy of captured PDE.

Implement PDE pre-analysis: Data collected from the eSC environment must be insightful and presented in a manner that is useful to its users. Therefore, during this process the planned pre-analysis methods should be implemented, to provide law enforcement agents and forensic investigators with a user-friendly yet multifaceted DFR solution for the eSC environment.

Implement access control: Access control concerns controlling access to the PDE. Considering that captured data is sensitive information, it is necessary to ensure that only users who need the PDE to solve an investigation are granted access to it. Implementing an access control strategy therefore focuses on ensuring that only authenticated users can view and use the PDE, e.g. by using usernames and strong passwords to access an eSC-DFR system.

Once DFR is fully implemented across the e-supply chain, there has to be a method to assess the effectiveness of the DFR process. This calls for assessment processes, which are discussed in the next section.

4.4 Assessment of implementation processes group

An assessment is a set of processes that evaluate or estimate the nature, ability, or effectiveness of a method [28]. It is critical to be able to assess the effectiveness of a DFR approach implemented in an information system such as an eSC, for the simple reason that certain adjustments to infrastructure and policy need to be made over time to keep up with advancements in information and communications technology. An assessment of implementation must come after the implementation of a DFR solution in the eSC has taken place. Figure 3 shows the three processes identified as part of the assessment processes group, namely the implementation testing process, the result documentation process and the result evaluation process. Each process is discussed in the following sections.

Implementation testing: As mentioned in ISO/IEC 27043 (2015), the assessment of implementation process focuses on assessing the effectiveness of an implemented DFR strategy, to determine whether it meets the aims of achieving digital investigation readiness. Therefore, as illustrated in the eSC-DFR process model, the implementation testing process checks whether the implemented DFR techniques, controls and architectures are cost-effective and meet DFR fundamentals. Another important aspect to consider is the legal one: ISO/IEC 27043 suggests that it is during this process that a legal review should be carried out to determine whether the implemented processes conform to legal regulations and digital forensics principles. This is to ensure that all collected PDE is forensically sound and can be used in a court of law.

Result documentation: Documenting the testing process is an essential part of an assessment. It is a way to keep track of all the elements that are assessed in the implementation testing process and the observations made during testing, giving an authentic account of the testing process. Documentation of assessment results also supports the evaluation process, which comes next in the assessment of implementation process group, by ensuring that an accurate evaluation of results can take place.

Result evaluation: An evaluation is a process of analysing, summarising and making informed decisions based on the results obtained during the result documentation process [29]. During this process, recommendations are made on changes that might need to be made to the implemented processes. An evaluation of the implemented DFR process in the e-supply chain environment enables service providers to modify the DFR process, making adjustments to the implemented tools. Here trading partners decide whether to return to the planning process or the implementation process, depending on the conclusions of the assessment.
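Part of the implementation testing process lends itself to automation. The toy check below (in Python, with invented record fields and a deliberately simulated tampering step) illustrates how an assessor might re-verify stored integrity digests to confirm that the implemented PDE protection measures actually detect modification; it is a sketch of one possible test, not a prescribed ISO/IEC 27043 procedure.

    import copy
    import hashlib

    def digest(rec):
        """Recompute the integrity digest over a record's content fields."""
        return hashlib.sha256(
            f"{rec['source']}|{rec['event']}|{rec['captured_at']}".encode("utf-8")
        ).hexdigest()

    def is_untampered(rec):
        return rec["digest"] == digest(rec)

    def test_tampering_is_detected():
        rec = {"source": "tp-x", "event": "order created",
               "captured_at": "2016-01-01T00:00:00+00:00"}
        rec["digest"] = digest(rec)            # as stored by the DFR system
        assert is_untampered(rec)              # an untouched record passes
        tampered = copy.deepcopy(rec)
        tampered["event"] = "order deleted"    # simulate malicious change
        assert not is_untampered(tampered)     # the modification is detected

    test_tampering_is_detected()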

It is the authors' opinion that the three ISO/IEC 27043 DFR process groups adopted in the eSC-DFR process model should result in the design and development of cutting-edge eSC-DFR solutions with three main goals: to capture PDE from the eSC network environment, to ensure its integrity, and to store it in a centralised data repository for retrieval by authorised users.

In the next section, the system requirements that next-generation eSC-DFR systems must satisfy are listed and elaborated on.

5. REQUIREMENTS FOR NEXT-GENERATION eSC-DFR SYSTEMS

The ability of an organisation to gather potential digital evidence from its network environment before an incident occurs is the focus of digital forensic readiness. The functional requirements of an eSC-DFR tool therefore define the services that such a tool must provide, namely:


• Monitor and capture all network traffic from the eSC.

• Ensure confidentiality of captured data.

• Provide exceptional usability and availability.

• Provide accessibility to the system.

• Ensure access control to the system.

The proposed requirements are elaborated on in the sub-sections that follow.

5.1 Monitor and Capture Data from e-supply Chain

The main function of a DFR tool is to provide forensically sound records of events before an incident occurs [24]. An eSC-DFR tool should therefore give the user a holistic view of the events transpiring in the eSC. The use of probes and other data-capturing techniques ensures that all events that take place within an eSC are recorded in a forensically sound manner and that incidents are identified. An eSC-DFR tool must therefore have a logging component that is able to monitor and capture all the events that take place across the IT infrastructure of an eSC communication network. Once the system captures data, it should ensure the safekeeping of this data so that its integrity is not compromised. A sketch of what such a forensically sound event record might look like is given below.
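As an illustration only, the following Python sketch shows one possible shape for such a record: each captured event carries a UTC timestamp and a SHA-256 digest so that later tampering can be detected. The names PdeRecord, make_record and verify_record are hypothetical, not part of any existing tool or standard.

    import hashlib
    import json
    from dataclasses import asdict, dataclass
    from datetime import datetime, timezone

    @dataclass(frozen=True)
    class PdeRecord:
        """One unit of potential digital evidence captured from the eSC."""
        source: str       # e.g. host name or probe identifier
        event: str        # raw event description or log line
        captured_at: str  # ISO-8601 UTC timestamp
        digest: str       # SHA-256 over source|event|captured_at

    def make_record(source, event):
        captured_at = datetime.now(timezone.utc).isoformat()
        payload = f"{source}|{event}|{captured_at}".encode("utf-8")
        return PdeRecord(source, event, captured_at,
                         hashlib.sha256(payload).hexdigest())

    def verify_record(rec):
        payload = f"{rec.source}|{rec.event}|{rec.captured_at}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest() == rec.digest

    rec = make_record("tp-x-webserver", "login failed for user 'admin'")
    print(json.dumps(asdict(rec), indent=2))
    assert verify_record(rec)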

5.2 Confidentiality, integrity and privacy of collected data

One of the biggest concerns of many organisations is the privacy of their users' sensitive data. An eSC-DFR system should ensure that users' privacy is not compromised. The authors stress that logging facilities and log information, which refers to data captured from different parts of the eSC, should be protected against tampering and unauthorised access. An eSC is a highly targeted environment; it is therefore safe to assume that an eSC-DFR system will also be a target for hackers and criminals [24, 30]. For that reason it is critical that such a system provide as much security as possible by employing measures that ensure confidentiality, integrity, access control and privacy.

5.3 Improved usability and availability

The usability of an eSC-DFR tool is of utmost importance. Most monitoring tools are not easy to navigate, making it difficult for users to identify incidents when they occur or even merely to monitor traffic [2]. It is crucial that an eSC-DFR system be user-friendly, displaying data to trading partners and law enforcement in a manner from which it is easy to deduce and trace recorded events. The graphical user interface (GUI) of such a tool should provide users with enough flexibility to view, download, search categorically and filter captured data; a sketch of such filtering follows below. A digital forensic investigator should be able to sign up, log in and navigate through captured data effortlessly. Hence, the availability aspect of such a tool is crucial in its design: a DFR tool should be able to perform all its designated functions, including providing forensically sound captured data to users on demand. It is therefore crucial that usability and availability tests be conducted to ensure that the system meets its intended functions.
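The search and filtering capability argued for above could, in its simplest form, look like the following sketch, which filters an in-memory list of captured events by time window and keyword. A production eSC-DFR system would push such predicates down to the evidence database; the field names reuse the hypothetical record layout sketched in Section 5.1.

    from datetime import datetime

    def filter_events(events, start=None, end=None, keyword=None):
        """Return events inside [start, end] whose text contains keyword."""
        selected = []
        for ev in events:
            ts = datetime.fromisoformat(ev["captured_at"])
            if start is not None and ts < start:
                continue
            if end is not None and ts > end:
                continue
            if keyword is not None and keyword not in ev["event"]:
                continue
            selected.append(ev)
        return selected

    events = [
        {"captured_at": "2016-03-01T10:22:07", "event": "TP-X login failure"},
        {"captured_at": "2016-03-02T08:00:00", "event": "TP-Y stock update"},
    ]
    print(filter_events(events, start=datetime(2016, 3, 1),
                        end=datetime(2016, 3, 1, 23, 59), keyword="login"))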

5.4 Having an accessible system

Since an eSC is a web-based system, an eSC-DFR system should also be web-based, providing services to law enforcement agents and digital forensic investigators from this platform. Supply chain network developers must integrate the eSC-DFR system with the eSC system, giving the tool access to the systems in the e-supply chain network (trading partner systems) for data-capturing purposes. The system should direct all captured data to a central eSC-DFR system repository server, where it is securely stored. Any system errors or alarms raised by a trading partner's internal system should also be captured by the eSC-DFR system and stored in the repository server, from which records can be retrieved once a user logs in to the eSC-DFR system.

5.5 Access control of data retrieval

Considering that eSC-DFR systems have to be web-based, strict authentication and access control measures should be implemented. Different entities should be allocated different roles within a DFR tool. It is therefore proposed that an eSC-DFR system limit the access rights of different users as a privacy and confidentiality measure, in order to ensure that users access only relevant potential digital evidence from the eSC-DFR system; a sketch of such a check follows below. This requires that the system be able to store metadata about its different users, who include the system administrator, trading partners, digital forensic investigators and law enforcement agents.
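A minimal sketch of the proposed role- and case-based restriction follows. The role names, permissions and case-assignment table are invented placeholders; a real deployment would hold this metadata in the system's user store.

    # Role permissions and case assignments are invented placeholders.
    ROLE_PERMISSIONS = {
        "administrator": {"manage_users", "configure_system"},
        "investigator": {"view_pde", "download_pde"},
        "law_enforcement": {"view_pde"},
    }

    # Which user may inspect which case (privacy/confidentiality rule).
    CASE_ASSIGNMENTS = {("alice", "case-042"), ("bob", "case-007")}

    def may_access(user, role, action, case_id):
        """Grant access only if the role allows the action AND the user
        is assigned to the specific case."""
        if action not in ROLE_PERMISSIONS.get(role, set()):
            return False
        return (user, case_id) in CASE_ASSIGNMENTS

    print(may_access("alice", "investigator", "view_pde", "case-042"))  # True
    print(may_access("alice", "investigator", "view_pde", "case-007"))  # False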

In the next section the authors present a high-level use-case diagram to give an outside view of the proposed eSC-DFR system and show how such a system interacts with its users and other software.

6. eSC-DFR SYSTEM USE-CASE DIAGRAM

A use-case diagram is widely used to capture the dynamic aspect of a system, displaying the steps a user needs to follow to reach a goal as well as how the various components interact with the user. In this section the authors use a use-case diagram to give a high-level view of an eSC-DFR system and the interactions between the actors of the system and the system itself. The authors identified three main actors, i.e. eSC trading partner systems, the system administrator and law enforcement agents/forensic investigators, depicted in the use-case diagram in Figure 4. Each actor is discussed in the sections that follow, and the roles that each user executes are also depicted in the figure.

For the system to work effectively, certain conditions should be met: each user should have an account with the system as a system administrator, law enforcement agent or digital forensic investigator, and the eSC network should incorporate the eSC-DFR system.


Figure 4: eSC-DFR system use-case diagram

In the sections that follow, each actor is defined, illustrating the role that each user of the system executes.

6.1 System administrator

The system administrator (actor number 1 in Figure 4) represents the person responsible for maintaining the eSC-DFR system. This user must have full access rights to the administrative aspects of the system, ensuring that the system is configured correctly. It is the role of the system administrator to manage user accounts, manage user privileges and maintain the system, as well as to implement any updates to the system that add new features and resolve bugs. It is important to note that all other users of the system are dependent on the system administrator, as illustrated in the use-case diagram in Figure 4.

6.2 Law enforcement agent/digital forensic investigator

Actor number 2 represents a law enforcement agent or digital forensic investigator, responsible for downloading, analysing and validating collected potential digital evidence (PDE) from the eSC-DFR system. This actor is granted access to the system to view, download and validate the potential digital evidence captured from the eSC. The regulation of access to captured data is critical within an eSC business environment, as organisations might want to maintain a level of privacy concerning their business operations. Therefore, strict authentication measures ensure that a user is validated and granted access to relevant data only.

6.3 Trading partners' systems

An eSC is a distributed business network environment, made up of multiple web-based trading partner systems that interact with each other through an information hub, sharing information and services [11]. Therefore, a DFR tool that operates in this environment has to be integrated with the information hub and trading partners' web-based systems (actor 3) to capture data coming in and out of these systems and upload it to the eSC-DFR system. Captured data might take the form of information requests and responses sent between trading partners through the information hub, eSC system modifications on trading partner systems, or other system data such as alarm data. The eSC-DFR system may use an internet browser as the means for users to access the system, considering that it is a web-based application. Furthermore, multiple web servers may be involved, performing different functions such as secure storage, running applications and so forth. In the next section the design of a next-generation eSC-DFR system is presented, showing the system's components and how the system operates.

7. DESIGN OF PROPOSED eSC-DFR SYSTEM

In this section the authors propose a model for the design of an eSC-DFR system. The model is illustrated in two significant views: a high-level structure in Figure 7 and a more detailed logical view of the design in Figure 8. In Figure 6 an activity diagram illustrates the services provided by the system to its users (digital forensic investigators).

Below a hypothetical scenario is provided to illustrate how the eSC-DFR system could benefit both trading partners and digital forensic investigators.

Figure 5: A small eSC network

In the provided scenario, e-hub is a service provider (information hub) that connects suppliers, retailers and consumers in real time. E-hub allows retailers to sell supplier products that they do not keep in stock on their webstores, connecting product catalogue data, using selling and fulfilment tools, and making use of transaction processing. X is a webstore connected to the e-hub network, selling Y's and J's products. Both Y and J are suppliers running massive warehouses. B is a retailer just like X, selling Y's and J's products. A malicious employee R, who works for X, decides to install malicious code on X's web server that infiltrates the e-hub network, attacking the web-based systems of the other trading partners Y, J and B. After J, Y and B realise that their systems are being attacked, they decide to call upon a digital forensic investigator to assist them with the investigation. The e-hub network is integrated with the eSC-DFR system, which extracts PDE and log information from each trading partner's web system. The forensic investigator should thus be able to retrieve readily available digital evidence pertaining to the incident. The evidence captured could lead directly to trading partner X's webstore, showing the installation of the malicious code, the changes made by the malicious code on trading partner X's web-based system, the time of events and perhaps who was logged in at the time of the incident. Through the user-friendly eSC-DFR system, the investigator should be able to narrow down all the captured data to the specific events related to the incident.

Figure 6 illustrates the behaviour of the system when a forensic investigator logs into the eSC-DFR system, showing the processes that must take place in different parts of the system.

Figure 6: Digital forensic investigator and law enforcement interacting with the eSC-DFR system

In the next section, the authors present and discuss the high-level eSC-DFR system architecture.

8. HIGH-LEVEL eSC-DFR SYSTEM ARCHITECTURE

There are two essential elements to the discussion of the proposed eSC-DFR system, namely the eSC network and the eSC-DFR component. Combined, these elements provide a platform for DFR to be achieved across the eSC. The eSC network is an important aspect of the architectural design of the eSC-DFR system because it is the environment from which PDE is extracted, with infrastructural components that are critical to the implementation of DFR in the eSC. Some of these components are discussed in the following sections.

The eSC-DFR component provides DFR services to DFIs and law enforcement. Such services include eSC PDE capturing, PDE storage, eSC incident prevention and eSC PDE retrieval. The DFR components that are integrated with the eSC network infrastructure enable data capturing in the eSC network. Hence, communication between the eSC-DFR component and the eSC network through web protocols and IT infrastructure is a key part of the eSC-DFR system architecture, as illustrated in Figure 7. This allows PDE to be captured in the eSC network and securely stored in the CDR.

Figure 7 illustrates the integration between the eSC network and the eSC-DFR component, showing the transporting of captured data to the CDR (1) and the requesting/retrieval of PDE at the eSC-DFR component (2).

Figure 7: High-level architecture of eSC-DFR system

In the next section a more detailed model of the eSC-DFR system is presented and some critical components of the system are discussed.

9. DETAILED MODEL OF eSC-DFR SYSTEM ARCHITECTURE

Figure 8 illustrates a more detailed model of the eSC-DFR system and its key elements. It should be noted that with further research, more components might be added to the proposed model.

As mentioned previously, there are two key components in the proposed architecture: one is the eSC network and the second is the eSC-DFR component. Both components are to utilise secure protocols, such as the SSL protocol, to transmit data over the web from the eSC network to the eSC-DFR component. Some elements are critical to both the eSC-DFR component and the eSC network; these include the eSC host servers and the deployed logging probes, which are located in the eSC host machines, as shown in Figure 8.

The eSC-DFR component contains three key servers, namely the CDR server, the eSC-DFR application server and the eSC-DFR web server, together with a log daemon that interacts with the database located in the CDR, as shown in Figure 8 below.


Figure 8: Architecture of the eSC-DFR system

As mentioned in Section 8, PDE is captured in the eSC network by the deployed probes and sent through to the CDR, where it is processed and stored. In the event that an incident occurs, digital forensic investigators and law enforcement agents can retrieve captured PDE from their web browsers through the eSC-DFR web server that connects them to the CDR. In the sub-sections that follow, the authors take a deeper look at the role that each element illustrated in Figure 8 performs in the eSC-DFR system.

9.1 eSC Network

The eSC network is the environment being monitored and hence the source of PDE. It comprises trading partner (TP) host machines and other eSC system infrastructure. In this network, instances of the application are run by the user in communication with the eSC information hub, which is the heart of the eSC, allowing users to interact. The two DFR components proposed for data collection are the logging module and the logging probe.

Logging module: Integrating data-capturing functionality into the eSC system is the most important aspect of an eSC-DFR system. Considering that the eSC network and the eSC-DFR system are integrated, the logging module has to be incorporated in the code of the eSC system application. Once the eSC system is installed on a trading partner's host machine, the logging module should start capturing system activity and initiate communication with the eSC-DFR logging probe. As trading partners perform business processes using the eSC system, the eSC applications should, through the logging module, invisibly build a vault of useful event information (log entries) for forensic investigators. A logging module incorporated in the code of an application is designed to let a program produce messages of interest to other processes. The ability to obtain useful records of events taking place on each instance of the distributed eSC system is one of the main functional requirements of the eSC-DFR system; a sound logging strategy is therefore a critical factor. A minimal sketch of such a module is given below.
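The sketch below uses Python's standard logging package. The probe endpoint (loopback port 5140) and the event fields are assumptions for illustration; the module both ships each record to the local log file probe and keeps a local copy.

    import logging
    import logging.handlers

    # Assumed endpoint of the local log file probe (illustrative only).
    PROBE_HOST, PROBE_PORT = "127.0.0.1", 5140

    logger = logging.getLogger("esc.app")
    logger.setLevel(logging.INFO)
    # Ship each event record to the log file probe process...
    logger.addHandler(logging.handlers.SocketHandler(PROBE_HOST, PROBE_PORT))
    # ...and keep a local copy so evidence survives a network outage.
    logger.addHandler(logging.FileHandler("esc_app_events.log"))

    def record_business_event(partner_id, action, detail):
        """Called from eSC business-process code; invisible to the end user."""
        logger.info("partner=%s action=%s detail=%s", partner_id, action, detail)

    record_business_event("TP-X", "inventory_update", "SKU 1042 -> qty 35")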

Log file probe: A log file probe is a program that runs as a background process, acquiring PDE in the form of logged events from the eSC system through the logging module, providing common formatting/filtering of log data and forwarding logs to the designated storage. Remote probes generally offer a number of different functions for different scenarios. In this scenario, the main function of such probes is to extract critical information about the eSC network from the host machines, compute digital signatures and initiate the transmission of captured data from the eSC network across the web to the eSC-DFR component. PDE might include firewall data, system log files, erased files, temp files and sniffed packets, depending on the configuration of the probes. The incorporation of event logging within an instance of the eSC application is critical to the implementation of the eSC-DFR system. The distributed eSC system and other integrated eSC processes must direct log data to the log file probe using the logging module, allowing the probes to process the log data according to the log file probe's configuration. Collectively, the log file probes should be able to record the entire procedure leading up to an incident. They should be able to identify where requests and responses in the eSC network come from, the times at which requests are sent and received, the protocols being used and the type of data being transmitted between entities in the eSC.

For improved performance, a number of remote probes can be deployed; this number is based on the number of eSC hosts being monitored and the eSC network traffic throughput. Upon completing the processing of the log data, the log file probe on the TP host machine should compile a log file containing the approved log events and forward it over the internet, through a secure communication channel, to the central database repository server, as sketched below.
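The following condensed sketch illustrates the forwarding step just described: the probe batches approved log events, attaches a SHA-256 digest standing in for the digital signature mentioned above (a production probe would use a keyed signature such as an HMAC or an asymmetric key), and ships the batch over TLS. The host name, port, probe identifier and wire format are invented for illustration.

    import hashlib
    import json
    import socket
    import ssl

    # Assumed CDR endpoint; both values are invented for this sketch.
    CDR_HOST, CDR_PORT = "cdr.esc-dfr.example", 6514

    def forward_batch(events):
        """Digest a batch of approved log events and send it over TLS."""
        body = json.dumps(events).encode("utf-8")
        batch = {
            "probe_id": "tp-x-probe-01",                # illustrative probe id
            "sha256": hashlib.sha256(body).hexdigest(), # integrity digest
            "events": events,
        }
        context = ssl.create_default_context()          # verifies the CDR cert
        with socket.create_connection((CDR_HOST, CDR_PORT)) as raw:
            with context.wrap_socket(raw, server_hostname=CDR_HOST) as tls:
                tls.sendall(json.dumps(batch).encode("utf-8") + b"\n")

    # Example call; requires a reachable CDR endpoint:
    # forward_batch(["2016-03-01T10:22:07Z tp-x GET /orders 200"])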

9.2 eSC-DFR Component

The eSC-DFR component provides a number of services that include system security, system maintenance, database management, content management and user management. The component ensures that all the DFR processes are systematically executed in the eSC, while also providing authenticated users with access to the system's functions. For log files to be transmitted to the eSC-DFR component, it is important to establish a connection between the eSC network and the eSC-CDR server. This might require that all the necessary ports in the eSC-DFR component's firewall be opened. To ensure the security of transmitted data, the log file probes should send the captured data through secure channels such as the SSL protocol. This is to ensure that data sent back and forth between different parts of the eSC-DFR system is not visible to intruders. Once captured data is received at the eSC-CDR server, it is processed by a log daemon. In the following sections the eSC-DFR component's sub-components are discussed.

Log daemon in the eSC-DFR repository server: A log daemon is a server process that provides a message-logging facility for application and system processes. The log daemon receives data on an appropriate port from the eSC hosts and processes the received data as specified by its configuration file, before sending it for storage in the central database repository, where logged events are stored.

CDR in the eSC-DFR repository server: The central database repository (CDR) is where data captured from different parts of the eSC network is stored, including eSC-DFR system files, metadata and user profiles. The CDR can be defined as a central place where data is stored and maintained, or a place from which data is obtained for distribution across a network. When information is transmitted across the eSC or actions are executed on trading partner systems, the deployed eSC-DFR system infrastructure captures as much data pertaining to those events as possible and sends it to the CDR through the log daemon, which processes the PDE received from the eSC network. It is the authors' view that an eSC-DFR system might require large volumes of storage, depending on the size of the e-supply chain and considering the amount of data collected from different parts of the eSC network; issues of big data might therefore arise, but these are not discussed in this paper. The database management module handles the structuring of PDE and the retrieval of stored data. With the help of the different modules at the eSC-DFR application web server, which handles the logic and presentation aspects of the eSC-DFR system, users can access the eSC-DFR system content with relative ease.

eSC-DFR application server: An application server, by definition, provides the business logic for a web-based system, running different processes in the middleware tier [30]. Hence an eSC-DFR application server executes a number of operations, which are represented in Figure 8. A number of modules execute diverse critical functions, starting with the user manager.

The user manager handles the administrative functions of the eSC-DFR system that include system maintenance, managing user profiles, user authentication and validation.

The data access agent is the module that processes user requests to access PDE; with the help of a pre-analysis module, the system can provide meaningful data to law enforcement agents and digital forensic investigators.

The personalisation manager module of the eSC-DFR system handles the customisation aspect of the system, providing users with a user-friendly experience. Personalisation attempts to satisfy the usability requirements of the eSC-DFR system, making it an interactive system.

eSC-DFR web server: The eSC-DFR web server processes user requests via HTTP/S. This server attends to requests by authenticated users to access the eSC-DFR system. For example, a forensic investigator may request to log in to the eSC-DFR system through a user agent such as a web browser. The web browser initiates communication with the web server by making a request for confirmation from the eSC-DFR application server, and the web server will respond with either a successful login response or an error message.
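To make the log daemon's role in this flow concrete, the sketch below listens on an assumed port, verifies each received batch's digest before accepting it, and appends accepted events to an SQLite table standing in for the CDR database. TLS termination and the full schema are omitted for brevity; the wire format pairs with the hypothetical probe sketch in Section 9.1.

    import hashlib
    import json
    import socketserver
    import sqlite3

    DB = sqlite3.connect("cdr.db")  # stands in for the CDR database
    DB.execute("CREATE TABLE IF NOT EXISTS pde (probe_id TEXT, event TEXT)")

    class LogDaemon(socketserver.StreamRequestHandler):
        """Receives one JSON batch per line from a log file probe."""

        def handle(self):
            for line in self.rfile:
                batch = json.loads(line)
                body = json.dumps(batch["events"]).encode("utf-8")
                # Reject any batch whose integrity digest does not match.
                if hashlib.sha256(body).hexdigest() != batch["sha256"]:
                    continue
                DB.executemany(
                    "INSERT INTO pde VALUES (?, ?)",
                    [(batch["probe_id"], ev) for ev in batch["events"]],
                )
                DB.commit()

    if __name__ == "__main__":
        with socketserver.TCPServer(("0.0.0.0", 6514), LogDaemon) as srv:
            srv.serve_forever()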

10. ARCHITECTURAL ASPECTS

Considering that the key functions of an eSC-DFR system are to capture PDE and to securely store the captured PDE for retrieval, it is safe to assume that the most critical elements of such a system are data capturing, secure storage and system reliability. Therefore, in this section the authors elaborate on the design of the remote probes indicated in Figure 8 and on key factors to consider for system reliability and secure storage.

10.1 Design of probes

A remote probe in general can be seen as an object used for data extraction. Such data includes system log files, intrusion detection system log files, system configuration files, temp files and network packets [11]. Within an eSC environment, this capturing module would be installed on each trading partner's host machine, where it can capture data concerning the eSC system and send the captured data to the CDR, where all captured data is stored [31]. Figure 9 shows the adapted architecture of the remote probes.

Figure 9: Remote probes

In an eSC environment, information is shared and transmitted at a fast rate, and many transactions are conducted by different trading partners. It is for that reason that a probe capturing PDE in this environment needs to be able to handle the rate at which data is transmitted.

As stated in Section 3, current limitations of DFR systems include limited throughput. Hence, as a measure to ensure that the eSC-DFR system is able to cope with high-speed traffic, the authors propose the use of a kernel-level multi-processor traffic probe that captures and analyses network traffic in high-speed networks [31]. This solution is based on execution threads designed to take advantage of multiprocessor architectures; a simplified sketch is given at the end of this section. The network interface cards (NICs) within the eSC host machines direct the network traffic to the probes, where it is captured by the capturing engine. In the eSC-DFR system, the probes illustrated in Figure 8 are responsible for capturing eSC network traffic, filtering the captured traffic (based on protocol or IP address), capturing system data related to the eSC network on host machines, and processing/analysing captured traffic before it is sent to the CDR for storage.

10.2 Evidence storage and system reliability

It is no secret that digital forensic workloads are characterised by large volumes of data, and the need for high data throughput is real. It is therefore the authors' opinion that improvements to data-capturing rates and data-transfer rates will improve the performance of an eSC-DFR system. A suggested solution would be to use clustered or parallel file systems, in which a user reading data from the eSC-DFR system is actually receiving data from multiple physical servers at once. This means that a user's read rate can exceed the maximum network I/O bandwidth of a single server. This supports Ayers' observation that the performance of clustered file systems is greatly increased when servers and clients use teamed network adapters to increase bandwidth [2]. The eSC-DFR system will incorporate a module for managing the capturing of potential evidence and maintaining a detailed record of all tasks executed by users of the system. Making sure that the system is reliable is also of utmost importance, especially considering that this system should provide services to businesses of all sizes. Hence, the proposed system has to be carefully designed and implemented to ensure that it is highly robust. The use of modern software engineering techniques has to be considered to ensure that the system is secure, robust and versatile, able to handle unforeseen software errors while minimising the risk of data loss.
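Returning to the capture probe of Section 10.1, the simplified sketch below illustrates the division of labour behind the multi-processor design: one feeder loop stands in for the capture engine fed by the NIC, while a pool of workers filters and digests packets in parallel. Threads and an in-memory queue are used for brevity; a real high-speed probe would rely on kernel-level ring buffers, and CPU-bound filtering would use separate processes rather than threads.

    import hashlib
    import queue
    import threading

    packet_queue = queue.Queue(maxsize=10_000)

    def capture_loop(packets):
        """Stand-in for the capture engine fed by the NIC."""
        for pkt in packets:
            packet_queue.put(pkt)
        packet_queue.put(None)  # poison pill: capture finished

    def worker(worker_id):
        """Filter and digest packets in parallel with other workers."""
        while True:
            pkt = packet_queue.get()
            if pkt is None:
                packet_queue.put(None)  # let the next worker stop too
                break
            if b"esc-hub" not in pkt:   # toy payload filter
                continue
            print(f"worker {worker_id} kept "
                  f"{hashlib.sha256(pkt).hexdigest()[:12]}")

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in workers:
        t.start()
    capture_loop([b"GET /orders host=esc-hub", b"noise",
                  b"POST /stock host=esc-hub"])
    for t in workers:
        t.join()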

While there is room for more thorough optimisation of the eSC-DFR system, it is the authors' opinion that the core elements included in the proposed design validate this approach.

11. DISCUSSION OF THE eSC-DFR PROCESS MODEL AND eSC-DFR SYSTEM ARCHITECTURE MODEL

In this section, the authors discuss the relevance of the proposed next-generation eSC-DFR system design, which is based on the eSC-DFR process model. The eSC-DFR process model and system architecture are a new contribution that focuses on forensic planning and on preparing the eSC environment for a digital forensic investigation process.

It is the authors' viewpoint that, due to the ever-increasing collaboration between businesses and the incorporation of the internet in business processes, there is a need to shift from old ways of incorporating digital forensics. As suggested by the problem statement, digital forensics is often called upon in response to cyber incidents rather than adopted as a proactive process; this paper addresses the resulting lack of cutting-edge DFR systems, let alone a well-formulated method, for proactively collecting PDE in the eSC environment.

With the proposed method (the eSC-DFR process model), there are clear procedures and processes to follow, in line with the ISO/IEC 27043 standard, for designing and developing cutting-edge eSC-DFR systems: tools that proactively collect PDE, store it in a central data repository and maintain its integrity, granting access to such data only to authenticated users, e.g. law enforcement agents and DFIs.

The proposed system architecture in Figure 8 shows that, by using the eSC-DFR process model alone, cutting-edge eSC-DFR systems can be developed. The primary objective of this research was to design a high-level architecture of an eSC-DFR system that can provide useful data to digital forensic investigators and law enforcement agents to aid in digital forensic investigations and other processes that might require such data. Based on the limitations identified in current DFR tools, the proposed eSC-DFR process model was created to assist in identifying the processes that should be followed in the design and implementation of eSC-DFR systems. The proposed eSC-DFR system architecture is designed to cater specifically to the security needs of an eSC environment, ensuring that the eSC is forensically ready. It comprises a secure eSC-DFR system web server, remote logging probes that are strategically deployed in the eSC network, a central repository database for the storage of PDE, and a user component that provides users with controlled access to the system. The authors identified the need for next-generation DFR systems that:

• Can handle high throughput that passes through eSC information networks.

• Are robust and can meet DFR tool requirements.

• Can present captured PDE in a comprehensible manner.

• Are able to maintain a level of privacy for trading partners.

• Provide users with uncompromised forensically sound data.

• Collect data on supplier-to-supplier relationships.

Considering that an eSC environment may comprise many entities, the proposed architecture was designed to handle large amounts of potential evidence data. It is the authors' opinion that software developers should ensure that the capturing and storage components of such a system can capture data at high speeds and accommodate large volumes of data. In addition, it should be borne in mind that an eSC is a distributed network environment connecting retailers to suppliers and suppliers to further suppliers. The architecture of the eSC-DFR system is designed to cater to those kinds of relationships. The deployment of remote probes as data-capturing modules on host systems in the eSC network ensures that potential digital evidence is captured across the eSC.

As mentioned earlier in the paper, a major requirement of a DFR tool is to ensure that the integrity of captured PDE is not compromised, thereby meeting digital forensics standards. Therefore, the use of secure communication protocols and remote probes is incorporated in the proposed architecture. The use of encryption and digital signatures, among other methods, is suggested to maintain the integrity of captured data. In addition, a specialised probe design was incorporated to ensure that the capture of high-speed traffic is accomplished [11].

The authors have strongly emphasised that the ease of use a DFR system provides, and its ability to present captured data in a comprehensible manner, are of utmost importance (usability). Therefore, this paper stresses the importance of paying close attention to the design of a usable graphical user interface: by making the eSC-DFR system easy to navigate, the time taken to retrieve captured evidence and trace events is greatly reduced. Developers should hence ensure that much attention is paid to the usability aspect of the system.

The strict authentication of users through the security component in the eSC-DFR system architecture ensures that a level of privacy for sensitive data is maintained. It is important to stress that the eSC-DFR system is designed to serve law enforcement and digital forensic investigators in solving cases and monitoring the eSC. The right of access should therefore be strictly monitored to ensure that only validated users are granted access to the system, and measures that control access while maintaining a level of flexibility for users should be considered in the development of such a system.

It is evident that the use of IT comes with numerous challenges that can cost organisations large sums of money. Although the effectiveness of a DFR system can only be fully assessed through an evaluation of the system, the authors believe that the proposed eSC-DFR system can help organisations to avoid incidents in the eSC. Such a system can also assist law enforcement agents and digital forensic investigators by providing readily available digital evidence that can be used in investigative processes.

Considering that collected PDE is for prosecution and law enforcement purposes, strict measures should be enforced to ensure that only authenticated users have access to collected data, as there might be serious legal consequences if captured information ends up in the wrong hands. It is also important to mention that the laws of different jurisdictions make provision for information that is captured to facilitate prosecution in a judicial system [3, 32, 33].

12. CONCLUSION

Existing general-purpose DFR systems are rapidly becoming inadequate for modern commercial network systems (eSCs). The outdated architecture of such tools limits their ability to scale and to adopt current and future eSC forensic readiness processes. In the recent past, researchers have cited the need for more capable DFR systems that can support digital forensic investigations in the event that an incident occurs. As much as these are steps in the right direction, implementing security policies and processes alone does not ensure that the eSC environment is fully forensically ready. This paper proposes a process model that can be used as a blueprint for the design of next-generation eSC-DFR systems that can fully cater to the DFR requirements of such an environment. The eSC-DFR system is a useful tool for collecting data and monitoring the eSC environment. The design of the proposed system, illustrated in Figure 8, is built around improving the user experience and providing adequate forensically sound data to trading partners and law enforcement agents about all trading partner interactions. The use of kernel-level multiprocessor network probes helps to ensure that no data is lost during the packet-capturing process. The authors also acknowledge that an eSC handles large amounts of data transmitted upstream and downstream of the supply chain; hence the eSC-DFR system provides clustered storage to increase the performance, capacity and reliability of the system.

In this paper the authors discussed the limitations of current DFR systems and proposed requirements for next-generation eSC-DFR systems. A detailed eSC-DFR process model was proposed, and a generic architectural design for a practical next-generation eSC-DFR system was presented. The design and implementation of such a system is ongoing. The system incorporates strategies for optimising and managing potential evidence data collected from different parts of an eSC. Future work includes an implementation of the eSC-DFR system and an examination of its performance; this would assist in verifying whether the proposed system accomplishes what it is intended to accomplish.

13. REFERENCES

[1] H. Cheng, "An integration framework of ERM, SCM, CRM," in Management and Service Science, 2009. MASS'09. International Conference on, 2009, pp. 1-4.

[2] D. Ayers, "A second generation computer forensic analysis system," Digital Investigation, vol. 6, pp. S34-S42, 2009.

[3] V. R. Kebande and H. S. Venter, "A Cloud Forensic Readiness Model Using a Botnet as a Service," in The International Conference on Digital Security and Forensics (DigitalSec2014), 2014, pp. 23-32.


[4] S. D. Pathak, D. M. Dilts, and G. Biswas, "Next generation modeling III-agents: a multi-paradigm simulator for simulating complex adaptive supply chain networks," in Proceedings of the 35th conference on Winter simulation: driving innovation, 2003, pp. 808-816.

[5] L. Pulevska-Ivanovska and N. Kaleshovska, "Implementation of e-Supply Chain Management."

[6] P. Helo and B. Szekely, "Logistics information systems: an analysis of software solutions for supply chain co-ordination," Industrial Management & Data Systems, vol. 105, pp. 5-18, 2005.

[7] R. E. Schantz and D. C. Schmidt, "Middleware for distributed systems: Evolving the common structure for network-centric applications," Encyclopedia of Software Engineering, vol. 1, 2002.

[8] R. B. Handfield and E. L. Nichols, Supply chain redesign: Transforming supply chains into integrated value systems: FT Press, 2002.

[9] N. Thomas, "Multi-state and multi-sensor incident detection systems for arterial streets," Transportation Research Part C: Emerging Technologies, vol. 6, pp. 337-357, 1998.

[10] K. Reddy and H. S. Venter, "The architecture of a digital forensic readiness management system," Computers & Security, vol. 32, pp. 73-89, 2013.

[11] R. McKemmish, When is digital evidence forensically sound?: Springer, 2008.

[12] G. G. Richard III and V. Roussev, "Digital forensics tools: the next generation," Digital crime and forensic science in cyberspace, pp. 76-91, 2006.

[13] C. Lankshear and M. Knobel, Digital literacies: Concepts, policies and practices vol. 30: Peter Lang, 2008.

[14] R. Rowlingson, "A ten-step process for forensic readiness," International Journal of Digital Evidence, vol. 2, pp. 1-28, 2004.

[15] V. Kebande and H. Venter, "A Functional Architecture for Cloud Forensic Readiness Large-scale Potential Digital Evidence Analysis," in Proceedings of the 14th European Conference on Cyber Warfare and Security 2015: ECCWS 2015, 2015, p. 373.

[16] S. Omeleze and H. S. Venter, "Testing the harmonised digital forensic investigation process model-using an Android mobile phone," in Information Security for South Africa, 2013, 2013, pp. 1-8.

[17] M. Grobler, "The Need for Digital Evidence Standardisation," Emerging Digital Forensics Applications for Crime Detection, Prevention, and Security, p. 234, 2013.

[18] M. Ingels, A. Valjarevic, and H. S. Venter, "Evaluation and analysis of a software prototype for guidance and implementation of a standardized digital forensic investigation process," in Information Security for South Africa (ISSA), 2015, 2015, pp. 1-8.

[19] A. Ajijola, P. Zavarsky, and R. Ruhl, "A review and comparative evaluation of forensics guidelines of NIST SP 800-101 Rev. 1: 2014 and ISO/IEC 27037: 2012," in Internet Security (WorldCIS), 2014 World Congress on, 2014, pp. 66-73.

[20] D. Barske, A. Stander, and J. Jordaan, "A Digital Forensic Readiness framework for South African SME's," in Information Security for South Africa (ISSA), 2010, 2010, pp. 1-6.

[21] P. M. Trenwith and H. S. Venter, "Digital forensic readiness in the cloud," in Information Security for South Africa, 2013, 2013, pp. 1-5.

[22] R. G. Clegg, M. S. Withall, A. W. Moore, I. W. Phillips, D. J. Parish, M. Rio, et al., "Challenges in the capture and dissemination of measurements from high-speed networks," arXiv preprint arXiv:1303.6908, 2013.

[23] B. Pande, D. Gupta, D. Sanghi, and S. K. Jain, "The Network Monitoring Tool-PickPacket," in Information Technology and Applications, 2005. ICITA 2005. Third International Conference on, 2005, pp. 191-196.

[24] ISO/IEC 27043, "Information technology - Security techniques - Assurance for digital evidence investigation process," 2015.

[25] E. Casey, Digital evidence and computer crime: Forensic science, computers, and the internet: Academic press, 2011.

[26] H. Jaakkola and B. Thalheim, "Architecture-Driven Modelling Methodologies," in EJC, 2010, pp. 97-116.

[27] A. Turner, C. Bullok, K. Irvin, J. Hayre, and K. Markham, "Method and system for acquisition and centralized storage of event logs from disparate systems," ed: Google Patents, 2005.

[28] V. R. Prybutok, L. A. Kappelman, and B. L. Myers, "A comprehensive model for assessing the quality and productivity of the information systems function: toward a theory for information systems assessment," Information Resources Management Journal, vol. 10, pp. 6-26, 1997.

[29] G. P. Wiggins, Educative assessment: Designing assessments to inform and improve student performance vol. 1: Jossey-Bass San Francisco, CA, 1998.

[30] E. J. Sinz, B. Knobloch, and S. Mantel, "Web-Application-Server," Wirtschaftsinformatik, vol. 42, pp. 550-552, 2000.

[31] L. Zabala, A. Ferro, A. Pineda, and A. Muñoz, "Modelling a Network Traffic Probe Over a Multiprocessor Architecture," Edited by Jesús Hamilton Ortiz, p. 303, 2012.

[32] J. Rajamaki and J. Knuuttila, "Law Enforcement Authorities' Legal Digital Evidence Gathering: Legal, Integrity and Chain-of-Custody Requirement," in Intelligence and Security Informatics Conference (EISIC), 2013 European, 2013, pp. 198-203.

[33] D. J. Ryan and G. Shpantzer, "Legal aspects of digital forensics," in Proceedings: Forensics Workshop, 2002.


SAIEE AFRICA RESEARCH JOURNAL – NOTES FOR AUTHORS

This journal publishes research, survey and expository contributions in the field of electrical, electronics, computer, information and communications engineering. Articles may be of a theoretical or applied nature, must be novel and must not have been published elsewhere.

Nature of Articles
Two types of articles may be submitted:

• Papers: Presentation of significant research and development and/or novel applications in electrical, electronic, computer, information or communications engineering.

• Research and Development Notes: Brief technical contributions, technical comments on published papers or on electrical engineering topics.

All contributions are reviewed with the aid of appropriate reviewers. A slightly simplified review procedure is used in the case of Research and Development Notes, to minimize publication delays. No maximum length for a paper is prescribed. However, authors should keep in mind that a significant factor in the review of the manuscript will be its length relative to its content and clarity of writing. Membership of the SAIEE is not required.

Process for initial submission of manuscript
Preferred submission is by e-mail in electronic MS Word and PDF formats. PDF format files should be ‘press optimised’ and include all embedded fonts, diagrams etc. All diagrams are to be in black and white (not colour). For printed submissions contact the Managing Editor. Submissions should be made to:

The Managing Editor, SAIEE Africa Research Journal, PO Box 751253, Gardenview 2047, South Africa.
E-mail: [email protected]

These submissions will be used in the review process. Receipt will be acknowledged by the Editor-in-Chief and subsequently by the assigned Specialist Editor, who will further handle the paper and all correspondence pertaining to it. Once accepted for publication, you will be notified of acceptance and of any alterations necessary. You will then be requested to prepare and submit the final script. The initial paper should be structured as follows:

• TITLE in capitals, not underlined.
• Author name(s): First name(s) or initials, surname (without academic title or preposition ‘by’).
• Abstract, in single spacing, not exceeding 20 lines.
• List of references (references to published literature should be cited in the text using Arabic numerals in square brackets and arranged in numerical order in the List of References).
• Author(s) affiliation and postal address(es), and email address(es).
• Footnotes, if unavoidable, should be typed in single spacing.
• Authors must refer to the website: http://www.saiee.org.za/arj where detailed guidelines, including templates, are provided.

Format of the final manuscript
The final manuscript will be produced in a ‘direct to plate’ process. The assigned Specialist Editor will provide you with instructions for preparation of the final manuscript and required format, to be submitted directly to:

The Managing Editor, SAIEE Africa Research Journal, PO Box 751253, Gardenview 2047, South Africa.
E-mail: [email protected]

Page charges
A page charge of R200 per page will be charged to offset some of the expenses incurred in publishing the work. Detailed instructions will be sent to you once your manuscript has been accepted for publication.

Additional copies
An additional copy of the issue in which articles appear will be provided free of charge to authors. If the page charge is honoured, the authors will also receive 10 free reprints without covers.

Copyright
Unless otherwise stated on the first page of a published paper, copyright in all contributions accepted for publication is vested in the SAIEE, from whom permission should be obtained for the publication of whole or part of such material.
