Institutionen för datavetenskap
Department of Computer and Information Science

Final thesis

Behavior-based malware detection system for the Android platform

by Iker Burguera Hidalgo

LIU-IDA/ERASMUS-A--11/002--SE
2011-09-27

Linköpings universitet, SE-581 83 Linköping, Sweden




Linköping University, Department of Computer and Information Science

Final thesis (Examensarbete)

Supervisor: Dr. Urko Zurutuza
Examiner: Dr. Simin Nadjm-Tehrani


Linköping University Electronic Press


Copyright

The publishers will keep this document online on the Internet (or its possible replacement) from the date of publication, barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/. © Iker Burguera Hidalgo.


Abstract

Malware in smartphones is growing at a significant rate. There are currently more than 250 million smartphone users in the world and this number is expected to grow in coming years [44].

In the past few years, smartphones have evolved from simple mobile phones into sophisticated computers. This evolution has enabled smartphone users to access and browse the Internet, to receive and send emails, SMS and MMS messages and to connect devices in order to exchange information. All of these features make the smartphone a useful tool in our daily lives, but at the same time they render it more vulnerable to attacks by malicious applications.

Given that most users store sensitive information on their mobile phones, such as phone numbers, SMS messages, emails, pictures and videos, smartphones are a very appealing target for attackers and malware developers.

The need to maintain security and data confidentiality on the Android platform makes the analysis of malware on this platform an urgent issue.

We have based this report on previous approaches to the dynamic analysis of application behavior, and have adapted one approach in order to detect malware on the Android platform. The detector is embedded in a framework to collect traces from a number of real users and is based on crowdsourcing. Our framework has been tested by analyzing data collected at the central server using two types of data sets: data from artificial malware created for test purposes and data from real malware found in the wild. The method used is shown to be an effective means of isolating malware and alerting users of downloaded malware, which suggests that it has great potential for helping to stop the spread of detected malware to a larger community.

Finally, the report will give a complete review of results for self-written and real Android malware applications that have been tested with the system.

This thesis project shows that it is feasible to create an Android malware detection system with satisfactory results.


Acknowledgments

First of all, I would like to thank Prof. Simin Nadjm-Tehrani and Dr. Urko Zurutuza for their support, guidance and patience over the course of this Master's thesis project.

I would also like to thank all members of the Real-Time Systems Laboratory (RTSLab), my corridor mates from Ryds Allé 9 and Alsättersgatan 9 and friends from Legazpi for all the support and fantastic moments we shared in 2010-2011.

Finally, I would like to thank my wonderful and fantastic family, which in addition to providing me with economic and moral support also wrote part of my acknowledgment notes.


Contents

1 Introduction
  1.1 Background and Motivation
  1.2 Goal
  1.3 Project Assumptions
  1.4 Intended audience
  1.5 Related work
  1.6 Thesis structure

2 Background
  2.1 Android Operating System
    2.1.1 Platform architecture
    2.1.2 The Dalvik Virtual Machine
    2.1.3 The Android Security Model
    2.1.4 Android applications
  2.2 Intrusion Detection System
    2.2.1 Definition
    2.2.2 Detection types
  2.3 System calls and Vectors
  2.4 Data Mining
    2.4.1 Data collection in KDD process
  2.5 K-means Clustering algorithm
  2.6 Crowdsourcing

3 Behavior-Based malware detection system for Android Applications
  3.1 Overview
  3.2 Android Data mining: Crowdsourcing and Self-written applications
    3.2.1 Android Data collector script
    3.2.2 Android Crowdsourcing and data mining application
  3.3 Behavior-Based malware detection system
    3.3.1 Design of the Behavior-Based malware detection system

4 Results and Evaluation
  4.1 Data Set
  4.2 Devices and Programs
  4.3 Malware detection system Results
    4.3.1 Self-written Malware
    4.3.2 Real Malware

5 Conclusions, Contributions and Future Work
  5.1 Conclusions
  5.2 Future Directions


List of Figures

1 Number of Applications available at smartphone App Stores [40]
2 Android platform architecture [5]
3 Android Linux Kernel and Init process
4 Android boot sequence
5 Dex file creation process
6 Application request process
7 Android APK file
8 Android APK file generation process
9 Misuse detection versus Anomaly detection
10 Linux User and Kernel space
11 Knowledge Discovery in Databases (KDD) process [46]
12 Taxonomy clustering methods
13 Hierarchical method: Agglomerative vs Divisive
14 K-means applied as a detection system for Android system calls
15 Android malware detection system scheme
16 Data acquisition process
17 Data collector script user interface
18 Data collector script process
19 Android Crowdsourcing application
20 Static and Dynamic Analysis
21 Android Malware Detection process
22 Steamy Window application
23 Interaction with Steamy Window application
24 Steamy Window Interactions bar plot


List of Tables

1 Worldwide mobile device Operating System Market Shares and 2010-2014 Growth [36]
2 Related work State-of-the-Art Summary (i)
3 Related work State-of-the-Art Summary (ii)
4 K-means Clustering algorithm process
5 Static and Dynamic Malware analysis advantages and Disadvantages
6 Matlab Clustering code for Android Malware Detection
7 Clustering algorithm metrics
8 Vector comparison matrix
9 Example vector clustering results
10 Test Devices
11 Programs used in the project
12 Crowdsourcing application result - Android Device Information
13 Crowdsourcing application result - Installed applications
14 Self Written Application report - Calculator Good Application
15 Self Written Application report - Calculator Malicious Application
16 Self written Android applications description
17 Self written Android Malware result
18 Steamy Window system call vectors comparison matrix table
19 Steamy Window clustering result


Chapter 1

1 Introduction

This paper describes the results of a Master's thesis project (30 ECTS) towards the fulfillment of a degree in Telecommunications Engineering at Mondragon Unibertsitatea. The project was carried out at the Department of Computer and Information Science at Linköping University while studying as a visiting student from Mondragon Unibertsitatea.

The following paragraphs will detail the background, motivation, related work and goals of the Master's thesis. Details on how the project was carried out and on the results obtained will be presented in the following chapters.

1.1 Background and Motivation

Communications and technology are rapidly growing industries that are changing every day. The constant evolution of technology necessitates adaptation to new concepts and awareness of new developments. In the following section we briefly cover the trends in the evolution of the smartphone market that make the subject matter of this thesis relevant.

According to the International Data Corporation (IDC) [23], smartphone vendors will ship more than 450 million smartphones in 2011, compared to the 303.4 million units shipped in 2010 [21]. Moreover, the smartphone market is expected to grow four times faster than the traditional mobile phone market, so the demand for smartphones will rise considerably. Eventually, customers will reach the point where they will replace their old mobile phones with smartphones.

The sales growth of mobile phone companies such as Samsung and HTC between 2009 and 2010 has revolutionized the smartphone market. In light of this, the IDC predicts that the Android OS will surpass Nokia's Symbian OS in terms of sales in 2011, and will continue to lead the smartphone OS market in the coming years [36]. Furthermore, it predicts that the market shares of the Android OS and Windows Mobile will grow by almost 50% between 2010 and 2014, with a high probability of becoming the leading smartphone operating systems in the future. See Table 1.


| Operating System | 2010 Market predicted Share | 2014 Market predicted Share | 2014/2010 Change |
|---|---|---|---|
| Symbian | 40.1% | 32.9% | -18.0% |
| BlackBerry OS | 17.9% | 17.3% | -3.5% |
| Android | 16.3% | 24.6% | 51.2% |
| iOS | 14.7% | 10.9% | -25.8% |
| Windows Mobile | 6.8% | 9.8% | 43.3% |
| Others | 4.2% | 4.5% | 8.3% |
| Total | 100% | 100% | |

Table 1: Worldwide mobile device Operating System Market Shares and 2010-2014 Growth [36]

The IDC predicts that the total number of smartphone applications will grow at the same rate as smartphone sales. There are currently more than 350,000 applications in Apple's iPhone market and 250,000 applications in the Android market, according to Silicon Alley Insider [37]. This is depicted in Figure 1.

Figure 1: Number of Applications available at smartphone App Stores[40]


The official Google Android market nearly doubled in size in 2010 and 2011, surpassing 250,000 applications in March 2011. Figure 1 shows the interest of software developers in the Android platform, and we can assume that as Android developers continue to create applications for the Android OS, malware developers will continue to create malware for the system as well.

Malware¹ has been a threat for PCs for many years [30], and in light of the rapid increase of smartphone sales over the last few years [38], it was only a matter of time before malware developers became interested in staging their attacks on the smartphone platform. In particular, 2010 and 2011 saw a growing interest among malware developers in waging attacks on the Android OS [28].

Malware usually destroys valuable and sensitive information in infected systems. Malware is also commonly used to exploit infected devices and obtain profits from them. In the same way as malware harms computers, it can also perform attacks on smartphones, given that they have similar operating features. This observation makes it clear that it is necessary to enhance protection of smartphone devices in the same way as we did with computers some years ago.

The Android market is an open market system. This means that Android developers can upload their applications, also called third-party applications, to Android's official market without them being filtered by any certification authority that would check the trustworthiness of the applications. On the one hand, this increases the odds that the Android market will have a greater variety of applications and content, but on the other hand it facilitates infection by malware applications, as applications are not analyzed by any certification authority.

In conclusion, considering the growth of smartphones running the Android OS² and the increasing number of applications available for the Android OS, improving the security (i.e. the integrity, confidentiality and privacy) of the Android platform is the main objective of this project. In order to achieve that objective, we will develop a behavior-based malware detection system for the Android platform.

¹ Malicious (mal) software (ware)
² Samsung and HTC smartphone vendors [38]


1.2 Goal

The goal of this Master's thesis was to design and implement a behavior-based malware detection system for the Android platform.

More specifically, the work was divided into the following sub-goals:

  • Create a malware detection system for the Android platform.

  • Create data collector applications to monitor Android OS activity.

  • Design and implement the Android application behavior database.

The proposed solution was expected to detect malicious applications from Android official and non-official markets or repositories.

1.3 Project Assumptions

Some assumptions were made at the beginning of the project:

  • Applications available on the official Android market would be used to establish the normality model for the applications, and the equivalent programs in non-official repositories would be used to test the system.

  • Even if malware did exist in the Android market, we first needed clean (benign) applications with the same name or purpose in order to test the malware detection system.

  • We assumed that downloaded third-party applications were not trusted applications and must be analyzed and monitored with the crowdsourcing application or data collector script.

  • The Android community would collaborate on this project by installing the crowdsourcing application on their devices. The crowdsourcing application would send recorded files to the malware detection system server for post-analysis.

1.4 Intended audience

This thesis is useful to anyone who is involved in mobile security, and is especially intended for Android smartphone users and developers. It is also targeted at anyone interested in crowdsourcing and data mining techniques as they apply to mobile phones.

The document does not require any prior knowledge in the area of security. Chapter 2 will provide all the basic theory for the concepts explained in the paper.


1.5 Related work

Malware has been a threat for computers for many years [30] and continues to cause irreparable damage to infected systems [29]. The first attempts to identify and analyze malware on smartphones started by adapting existing PC security solutions and applying them to mobile phones. This was not a feasible solution in light of the high demand placed on resources by antivirus techniques and the power and memory constraints of mobile devices. Since malware and intrusion detection systems have already been the subject of massive research, we will give just a brief review of the evolution of malware and malware detection techniques on mobile phones.

Nwokedi et al. compiled a summary of the most commonly used malware detection techniques [60]. Their report examined 45 different malware detection techniques in the fields of anomaly-based detection, specification-based detection and signature-based detection. The techniques explained in that report provide useful background for understanding the first approaches to malware detection, which can also be used in smartphones.

Iseclab [25], the International Secure Systems Laboratory, explored different approaches and techniques based on dynamic analysis for detecting malicious or infected applications [55]. The paper provides useful information about malware detection techniques and tools used in dynamic analysis of malware.


Jacoby et al. introduced battery-based intrusion detection, a host-based intrusion detection system [61]. This technique monitors anomalous behavior of smartphone batteries and writes a report on the device listing the causes of high power consumption.

Some years later, Buennemeyer et al. evaluated the power consumption of devices with a client application installed on a smartphone using the Symbian OS [50]. The application monitored power consumption data and sent a report to a remote server to analyze and detect anomalies in the system. Due to the lack of smartphone malware patterns at that time, most of the anomaly detection techniques used battery power consumption as the main source of detection data. These techniques were based on monitoring a mobile phone's power consumption and comparing it to the normal power consumption pattern in order to detect anomalies.
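The idea of comparing measured power consumption against a normal pattern can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the z-score rule, the threshold of 3.0, the function name find_power_anomalies and all readings are hypothetical and are not taken from [50] or [61].

```python
from statistics import mean, stdev


def find_power_anomalies(baseline_mw, observed_mw, threshold=3.0):
    """Return the indices of observed samples that deviate from the baseline.

    A sample is flagged when it lies more than `threshold` standard
    deviations away from the baseline mean (a simple z-score rule,
    chosen here only for illustration).
    """
    mu = mean(baseline_mw)
    sigma = stdev(baseline_mw)
    if sigma == 0:
        # Perfectly flat baseline: any different reading counts as anomalous.
        return [i for i, x in enumerate(observed_mw) if x != mu]
    return [i for i, x in enumerate(observed_mw) if abs(x - mu) / sigma > threshold]


# Hypothetical power readings in milliwatts.
baseline = [310, 305, 298, 312, 300, 307, 303, 309]  # normal usage pattern
observed = [304, 311, 620, 299, 645]                 # readings under test

anomalies = find_power_anomalies(baseline, observed)
print(anomalies)
```

In a client/server arrangement like the one described above, the device would only need to record and send the readings, while a check of this kind could run on the remote server.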

Cheng et al. introduced SmartSiren, a collaborative virus detection application for Windows Mobile 5 [52]. It collects the communication activity from smartphones and performs system log file analysis to detect anomalous behavior in the system. The system uses a proxy-based architecture that interacts with a client installed on devices in order to avoid a heavy processing load.

Schmidt et al. showed how to extract smartphone features from Symbian OS and Windows Mobile phones in order to perform anomaly detection in the systems [68]. They used several APIs provided by Windows and Symbian to monitor applications and extract device features, such as free RAM, user inactivity, process count, CPU usage, sent SMS messages, etc. The aim of monitoring the applications' performance is to obtain data that makes it possible to differentiate between normal and malicious use of a device.


Schmidt et al. presented a novel approach to static malware detection in resource-limited mobile environments [67]. Their approach consisted of detecting malware by extracting function calls from binaries in order to apply a clustering algorithm to the data. This technique was used for detecting Symbian OS malware, taking into account mobile phone constraints such as device efficiency, speed and limited resource usage.

In 2006 Symbian was the most widely used smartphone OS and many malware detection techniques were developed for this platform. Given the rapid growth of smartphones running the Android OS, malware researchers decided to shift their malware detection techniques and security mechanisms to this platform [38].

Schmidt et al. presented the first serious research on malicious applications for the Android OS [69]. They proposed a solution based on monitoring events occurring at the Linux kernel level. They used a monitoring application to extract features such as executed system calls, modified files, etc. from the Linux kernel. These features were used to create the smartphone normality pattern.
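One simple way to turn executed system calls into a feature is to count how often each call occurs in a trace. The Python sketch below shows this under stated assumptions: the list of tracked calls, the function name syscall_vector and the example trace are hypothetical, and this is not the feature extraction used in [69].

```python
from collections import Counter

# Hypothetical, fixed ordering of the system calls being tracked.
TRACKED_CALLS = ("open", "read", "write", "close", "ioctl", "sendto")


def syscall_vector(trace, tracked=TRACKED_CALLS):
    """Count how often each tracked system call occurs in a trace.

    `trace` is a sequence of system call names in execution order.
    The result is a list of counts in the same order as `tracked`,
    so vectors from different runs can be compared position by position.
    """
    counts = Counter(trace)
    return [counts[name] for name in tracked]


# Hypothetical trace recorded while an application was running.
trace = ["open", "read", "read", "write", "close", "ioctl", "read"]

vector = syscall_vector(trace)
print(vector)
```

Vectors collected from known-good runs of an application could then serve as a reference against which later runs are compared.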

The same group proposed static analysis in 2009 [66] and an Android application sandbox system in 2010 [48]. The first report presented a collaborative scenario in which different devices could perform static analysis of malware directly on the phone. The second method used an Android application sandbox, an isolated environment, to perform static and dynamic analysis. Static analysis disassembled Android APK files to detect malware patterns. During dynamic analysis, all of the events occurring on the device (opened files, accessed files, battery consumption, etc.) were monitored. This sandbox provided a secure environment where malware applications could be executed without any risk of infection.


Enck et al. proposed real-time monitoring and analysis of sensitive data with dynamic taint tracking [56], implemented as an extension to the Android mobile phone platform that tracks the flow of privacy-sensitive data through third-party applications. This technique taints data from privacy-sensitive sources and applies labels as sensitive data propagates through program variables, files, and inter-process messages. When tainted data leaves the system, the application scans for suspicious outgoing data.

Bose et al., Shabtai et al. and Shafiri et al. proposed another solution for malware detection on smartphones based on Support Vector Machines (SVM) and learning machines [49, 71, 72]. Their proposal consists of monitoring smartphone devices to determine their normal behavior and using the collected data to train a learning machine. This learning machine learns the normality model of the smartphone and its applications and alerts the user every time it detects a suspicious action.

Portolakidis et al. proposed a system that performs a complete malware analysis of the phone in a virtual environment on a remote server [64] [63]. In both reports, they explain how to create replicas of Android devices and apply malware detection techniques to them. The replicas are an equivalent version of the real mobile devices and are sent to the remote server for malware analysis. The mobile phone replicas run in a secure virtual environment where different malware detection techniques are applied.


Our purpose in this project is to improve on and contribute to malware detection strategies for the Android OS by offering up new ideas. Our work has its foundation in many of the works mentioned above [48, 69, 28, 68, 64, 63, 66].

Our approach is based on detecting Android malware applications using Linux system calls and clustering algorithms. Like Portolakidis et al. [63], and taking into account the limited battery life of smartphones, we use a remote server machine to perform malware detection.
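To make the combination of system call vectors and clustering more concrete, the following Python sketch runs a plain k-means on a handful of vectors. It is a minimal illustration, not the thesis implementation: the choice of k = 2, the seeding with the first k vectors, the function name kmeans and all numbers are assumptions made for this example.

```python
import math


def kmeans(vectors, k=2, iterations=20):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance.

    Centroids are seeded with the first k vectors to keep the sketch
    deterministic; practical implementations usually seed randomly.
    Returns the cluster label of each vector and the final centroids.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
    return labels, centroids


# Hypothetical system call count vectors (open, read, write, sendto),
# one per recorded run of the same application.
vectors = [
    [12, 30, 8, 0],    # run 1
    [40, 95, 60, 22],  # run 2, noticeably more I/O and network calls
    [11, 28, 9, 0],    # run 3
    [13, 31, 7, 1],    # run 4
    [38, 90, 58, 25],  # run 5, similar to run 2
]

labels, centroids = kmeans(vectors, k=2)
print(labels)
```

With two clusters, runs whose system call profile differs strongly from the majority end up in their own group, which is the kind of separation a detection system would then inspect further.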

Antivirus software techniques are inadequate for use on smartphones, as they consume a great deal of CPU and memory resources and can drastically shorten battery life. On the other hand, we consider it dangerous to send phone replicas to a remote server, since the replicas contain important and confidential information (contact numbers, messages, pictures, etc.) and may compromise user confidentiality. Rather than sending the whole replica, we propose sending only log files to the remote server for malware analysis. These log files are collected by a lightweight data collector application installed on Android devices and contain the device's most important information.

A lightweight data collector application³ installed on the device will be responsible for collecting the system calls generated by Android applications on the device and storing device information files on the SD card. This application has similar features to the one proposed by Buennemeyer et al. [50], i.e. the sending of all monitored files to a remote server. They, however, made very few attempts with mobile phones, and we aim to extend use of the application as much as possible. To do so we will ask Android community users to use a lightweight script application (crowdsourcing application) in order to collect as much data as possible from different Android devices.

A. Doan, R. Ramakrishnan and A. Halevy analyzed the impact of crowdsourcing on the WWW (World-Wide Web) [54]. Their article explains how in the future crowdsourcing will become one of the most influential techniques used to collect information and create databases faster and more efficiently.

³ Crowdsourcing application [59]


The following text gives an overview of some recent attacks targeting An-droid and of malware that has appeared on the Android platform.

Android malware has increased by 400% since 2010[31], and will continue togrow. In light of this, several malware attacks were carried out on the AndroidOS in 2010 and 2011, [65] [11].

Hong Tou Tou, Angry Birds Bonus Level, Tip Calculator, Tap Snake, Mon-key Jump and Steamy Window are the most famous malicious applications todate on the Android platform. Furthermore, more than 50 infected applicationswere found on Google's Android market in March 2011, all of them infectedwith the DroidDream Trojan application[1].

Another attack targeting the Android platform was carried out by J. Ober-heide. He developed the Angry Birds Bonus Level for the Android OS[11]. Thisapplication was a proof-of-concept malware application to showcase the weaksecurity of the Android marketplace. The Angry Birds Bonus Level malwarepurports to be an additional bonus level for the famous game Angry Birds.The malicious application downloads and installs three additional applications4

on the user's device in order to steal sensitive information. These applicationswere available in Android's o�cial marketplace for over �ve months, but wereremoved after they were discovered to be stealing sensitive information frommobile phone devices. J. Oberheide argues that he could collect con�dentialinformation from a great number of Android devices in only a few days' time.

NetQin Inc. [34], a mobile security service provider, discovered a spyware application called Tip Calculator in the Android market. The spyware sent all incoming and outgoing SMS messages in the system to a designated email address. Another piece of spyware with similar characteristics, discovered in non-official Android repositories, was Steamy Window [43]. A Trojan horse called Android Pjapps modifies the original version of this application and wages an attack by subscribing to a premium SMS service.

Due to its appeal as the latest malware discovered for the Android OS, and since both the clean and malicious instances of the application were available, we decided to analyze this spyware with our proposed malware detection system.

4 Fake Contact Stealer, Fake Location Tracker and Fake Toll Fraud


| Author | Approach | Detection method | Platform | Description |
|---|---|---|---|---|
| Jacoby et al. (2004) [61] | HIDS | Signature-Based Detection | Symbian OS | Monitors the device's normal power consumption against its actual power consumption to detect anomalies in the system. |
| Cheng et al. (2007) [52] | HIDS, NIDS | Anomaly Detection | Symbian OS | Performs system log file analysis and collects communication activity from the device in order to detect any anomalous behavior in the system. |
| Buennemeyer et al. (2008) [50] | HIDS, NIDS | Anomaly Detection | Symbian OS | A lightweight application monitors the power consumption and sends the report to a remote server to be analyzed for anomalies. |
| Bose et al. (2008) [49] | HIDS | Signature-Based Detection | Symbian OS | Detects malicious applications by training a classifier based on Support Vector Machines (SVM) and constructs signatures from monitored events and API calls in Symbian OS. |
| Schmidt et al. (2008) [68] | HIDS | Anomaly Detection | Symbian OS / Windows Mobile | Uses a remote learning-based machine for anomaly detection. A Symbian OS or Windows Mobile client application sends extracted device features to a remote server in a vector format. The vectors are processed by a machine learning component for further analysis. |
| Schmidt et al. (2008) [69] | HIDS, NIDS | Anomaly Detection | Android OS | Analyzes the security of Android smartphones from the Linux-kernel view. Uses network traffic, kernel system calls, file system logs and event detection modules to detect anomalies in the system. |
| Shabtai et al. (2009) [71] | HIDS | Signature-Based Detection | All | Uses static features extracted from executables to classify malicious applications using machine learning methods. The detection techniques described can be applied in any smartphone OS. |

Table 2: Related work, State-of-the-Art Summary (i)


| Author | Approach | Detection method | Platform | Description |
|---|---|---|---|---|
| Schmidt et al. (2009) [66] | HIDS | Signature-Based Detection | Android OS | Performs static analysis on executables to extract function calls in Android OS using the readelf command. The function calls are compared with those of malware executables for classification. |
| Schmidt et al. (2009) [67] | HIDS | Anomaly Detection | Symbian OS | Extracts function calls from binaries in order to apply clustering mechanisms in Symbian OS. |
| Bläsing et al. (2010) [48] | HIDS | Signature-Based Detection | Android OS | Uses an Android Application Sandbox (AASandbox) to perform static and dynamic analysis on Android applications. Static analysis scans Android source code to detect malware patterns. Dynamic analysis executes and monitors Android applications in a totally secure environment. |
| Shafiri et al. (2010) [72] | NIDS | Anomaly Detection | Symbian OS | Presents a distributed SVM algorithm to detect malware on a mobile device network. A lightweight Symbian application monitors network traffic in a distributed way. |
| Enck et al. (2010) [56] | HIDS, NIDS | Anomaly Detection | Android OS | TaintDroid is a real-time monitoring system for Android OS. It monitors Android applications and alerts the user whenever the user's sensitive data is compromised. Uses "taint tracking" analysis to monitor privacy-sensitive information. |
| Portokalidis et al. (2010) [64, 63] | HIDS, NIDS | Anomaly Detection | Android OS | A remote security server in the cloud performs the malware detection analysis. Virtual environments are used to analyze Android mobile phone replicas. |

Table 3: Related work, State-of-the-Art Summary (ii)


1.6 Thesis structure

This section summarizes the main topics discussed throughout the thesis, giving a short overview of each chapter.

Chapter 2 describes the basic theory of the Android platform, intrusion detection systems, Linux system calls, data mining and clustering algorithms. The aim of this chapter is to enable the reader to understand the basic concepts of the project.

Chapter 3 describes the behavior-based malware detection system for the Android platform that was designed in this project.

Chapter 4 describes the testing and evaluation methods applied to the behavior-based malware detection system for the Android platform.

Chapter 5 presents the final conclusions and defines the future work of the project.


Chapter 2

2 Background

This chapter gives a brief description of some of the fundamental concepts and terminology relating to the Android OS, intrusion detection systems, Linux system calls, data mining and clustering algorithms. The clustering algorithm section is illustrated with reference to the way in which we have applied these known techniques in order to group Android system calls.

2.1 Android Operating System

The Android OS is a Linux-based open source operating system for mobile devices. It was originally developed by Android Inc., which was bought by Google in 2005.

The operating system is based on a modified version of the Linux 2.6 kernel [9], optimized for embedded systems and specially adapted for smartphones and tablets. This optimization improves data processing and reduces battery consumption, extending battery life.

The following pages will provide detailed information about the Android OS.

2.1.1 Platform architecture

Architecture

The Android platform was created for devices with limited processing power, memory and storage space, commonly called embedded systems. It was created with the objective of implementing an operating system in environments requiring a low memory footprint and processing load, such as smartphones or tablets.


Figure 2: Android platform architecture[5]


The Android OS is composed of several software components that can be divided into three main groups: Operating System (OS), Middleware and Applications.

• Operating system: This group consists of the Linux kernel, the core and most important component of the Android architecture. As mentioned above, Android is based on the Linux 2.6 kernel, which provides the platform with basic services such as security, memory management and process management. The kernel can be considered an abstraction layer between the software and hardware layers, responsible for managing and processing requests received from higher layers for interaction with hardware resources.

• Middleware: This group consists of the Android Runtime and Libraries. Android Libraries are written in the C/C++ programming language and Android developers can use them through the Application Framework. Libraries provide easier access to system resources, such as the camera, Wi-Fi, flash memory, etc. The Dalvik Virtual Machine, or Dalvik VM [16], is also one of the most important parts of the Android architecture. Dalvik VM is a Java Virtual Machine specially designed and modified to optimize memory and energy consumption in embedded systems. Dalvik VM was designed to run multiple virtual machines without placing additional processing load on the processor. It is also responsible for executing optimized Java code and Dex files (files in the Dalvik executable format). Dalvik VM and Dex file internals will be explained in greater detail in Section 2.1.2.

• Application: This group consists of the Application Framework and Applications. By default, the Android OS includes basic applications like a web browser, an email client and maps. This layer can also run third-party applications from the Android market or other repositories. Applications in this layer are written in the Java programming language. The application framework provides useful components for Android developers. This layer consists of views, a resource manager, content providers and the notification manager, providing aid to applications using standard libraries.

As Android OS is an open-source project, the kernel is available to download on the internet [9], and it is possible to modify it and create new versions adapted to suit different purposes.


Start-up

Another essential part of the Android OS is the startup process. Like any other Linux system, Android has a boot sequence which prepares the services necessary to start the device's operating system.

Figure 3 shows the first stage in the boot sequence on Android OS.

Figure 3: Android Linux Kernel and Init process

The first stage in the boot sequence is running the bootstrapper application. The bootstrapper is the program which starts the device's operating system and initializes and tests the basic requirements of the hardware, peripherals and external memory devices. GRUB and LILO for Linux and NTLDR for Windows are some of the best-known bootstrapper applications. The bootstrapper application loads the kernel image into RAM, and then the kernel starts the init process. Figures 3 and 4 illustrate the Android OS init process and boot sequence.

The init process initializes system daemons for handling low-level hardware interfaces, such as USB and the Android debugger (the Android Debug Bridge daemon). The init process also starts the basic runtime processes, such as the Runtime service, Service manager, Media server and the Zygote.


Figure 4: Android boot sequence

Figure 4 shows the Android OS boot sequence in greater detail. As mentioned above, the init process initializes several daemons and services in the system. At the same time, the init process starts the Zygote process. We will describe this process in greater detail on the following pages.

2.1.2 The Dalvik Virtual Machine

The Dalvik VM [16] is a Java virtual machine specially designed and modified to optimize memory and energy consumption in embedded systems like smartphones, tablets and netbooks. It was designed and created by Dan Bornstein, with collaboration and contributions from other Google engineers. The virtual machine is optimized for low memory usage and enables multiple virtual machine instances to run simultaneously with little additional load on the processor.

The Dalvik VM uses a register-based architecture [45], which is faster and more efficient than the stack-based architecture used in most other virtual machines.

Every Android application runs in its own process, with its own instance of the Dalvik VM, inside a secure environment called a sandbox. The Dalvik VM executes files in the Dalvik VM executable format (Dex format), an optimized Java code format for systems with constrained memory and processor speeds.


The Dex file format

Android Java source code is still compiled into class files. As mentioned earlier, the Dalvik VM is a modified version of a Java virtual machine optimized for embedded systems, and therefore code must be optimal to achieve the best performance. Since it is not possible to run class files on the Dalvik VM, they are optimized and converted into the Dex file format. Dex files are optimized class files ready to be executed on the Dalvik VM. Figure 5 shows the process of compilation from Java source code files to optimized Dex files.

Figure 5: Dex file creation process

The Zygote

As detailed above, every Android application runs in its own instance of the Dalvik VM, and each instance must start quickly when a new application is launched in the application layer. Android uses a concept called Zygote to provide the fast start-up time needed to run the Dalvik VM every time a new application is executed. When the Zygote process starts during the boot sequence, it initializes the original instance of the Dalvik VM, loads and initializes the core library classes, and then waits for new requests from the Runtime process. Every time Zygote receives a new application request from the Runtime process, it forks a new Dalvik VM instance from the original Dalvik VM that was loaded during the boot sequence. Creating an instance of the Dalvik VM from an existing one minimizes the startup time of the application in the secure environment. This process is repeated every time the user requests an application.


Register-based Architecture

Virtual machine developers have traditionally favored implementing virtual machines with a stack-based architecture [42] rather than a register-based architecture [45], because a stack-based architecture is simpler to implement. This simplicity comes with a performance cost. Executables for a stack-based architecture are smaller than executables for a register-based architecture, but they need more virtual machine instructions to perform the same work, which degrades the performance of the virtual machine. A register-based architecture requires an average of 48% fewer executed virtual machine instructions than a stack-based architecture, which considerably improves the performance of the device. On the other hand, the code used by a register-based architecture is larger than stack-based code. Even so, the processing load generated by a register-based architecture is still lower than that of a stack-based architecture. Taking into account the fact that the Dalvik VM runs on embedded devices with constrained memory and processing power, the use of a register-based architecture is the most appropriate choice.
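The difference in executed instruction counts can be illustrated with a toy example. The sketch below is not Dalvik or JVM bytecode; the instruction names (`push`, `add`, `store`) and both interpreters are invented here purely for illustration. It evaluates `c = a + b` once in a stack style and once in a register style and counts the instructions executed.

```python
# Toy illustration only: these are NOT Dalvik or JVM instructions.

def run_stack(program, variables):
    """Execute a stack-style program; return (variables, instructions executed)."""
    stack = []
    executed = 0
    for op, *args in program:
        executed += 1
        if op == "push":        # copy a variable onto the operand stack
            stack.append(variables[args[0]])
        elif op == "add":       # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "store":     # pop the result into a variable
            variables[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown op: {op}")
    return variables, executed


def run_register(program, registers):
    """Execute a register-style program; return (registers, instructions executed)."""
    executed = 0
    for op, dst, src1, src2 in program:
        executed += 1
        if op == "add":         # operands are named directly in the instruction
            registers[dst] = registers[src1] + registers[src2]
        else:
            raise ValueError(f"unknown op: {op}")
    return registers, executed


# c = a + b expressed in both styles
stack_program = [("push", "a"), ("push", "b"), ("add",), ("store", "c")]
register_program = [("add", "c", "a", "b")]

stack_vars, stack_count = run_stack(stack_program, {"a": 2, "b": 3})
reg_vars, reg_count = run_register(register_program, {"a": 2, "b": 3})

print(stack_vars["c"], stack_count)   # 5 4
print(reg_vars["c"], reg_count)       # 5 1
```

The single register instruction carries three operand names, whereas each stack instruction carries at most one. This mirrors the trade-off described above: fewer instructions are executed, but each instruction is larger.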

2.1.3 The Android Security Model

Android's security architecture is designed so that no application in the system can damage other applications or the operating system. Each application runs in an independent instance of the Dalvik VM, with its corresponding PID. This means that applications are isolated from each other. This technique of running applications in a secure environment is called sandboxing [39]. A sandbox is a security mechanism often used to execute potentially unsafe code or applications from third-party developers. The Android OS uses a file called AndroidManifest.xml to declare the permissions that enable an application to interact with other applications and system resources in the device. These permissions are declared before Android's installation APK file is generated, and therefore before the application is installed, and they cannot be modified after the app is installed on the device.

In Linux, a user ID identifies a user. On Android, the Android ID identifies an application running on a Dalvik VM instance. This Android ID is assigned and stored in the device's system after installation and is released when the application is removed from the device.

Android uses permissions in the sandbox environment to grant access to system resources such as files, SD card memory, network, sensors and APIs in general. Figure 6 shows the process of executing applications in the Android OS.
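As a minimal sketch of how declared permissions can be read, the following example parses a source-form manifest with Python's standard XML library. The `uses-permission` element and the `android:name` attribute are standard manifest syntax; the package name and the particular permissions are hypothetical and chosen only for illustration. Note that the manifest inside a built APK is stored in a compiled binary form, so this sketch applies to the plain-text source file only.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical manifest, written here only for illustration.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-sdk android:minSdkVersion="8" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_SMS" />
    <application android:label="Demo" />
</manifest>
"""


def declared_permissions(manifest_xml):
    """Return the permission names declared in a plain-text manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"   # namespaced attribute android:name
    return [elem.get(name_attr) for elem in root.iter("uses-permission")]


permissions = declared_permissions(MANIFEST)
print(permissions)
```

An application that requests both network access and SMS access, as in this made-up manifest, is the kind of combination an analyst might want to inspect more closely.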


Figure 6: Application request process

Every time an application is launched in the Android OS application layer, the System Manager collects the request and sends it to the runtime process. The runtime process catches the request and notifies Zygote of the execution of a new Android application. Zygote creates a new Dalvik VM instance for every new application request, and the requested application runs in that Dalvik VM instance. Every Dalvik VM instance runs only one application in order to provide a secure environment.


2.1.4 Android applications

Android applications are written in the Java programming language. Developers use the Android Software Development Kit (SDK) [10] and Java programming environments, such as Eclipse [19] or NetBeans [33], to compile Java code and create an Android application installation (APK) file. These APK files can be installed on Android devices using the Android Debug Bridge tool (adb) or by downloading them from Android's Official Market. Figure 7 shows the basic structure of an APK file.

Figure 7: Android APK file

An APK file is composed of three main groups: AndroidManifest.xml, Classes.dex and Resources, which are packaged into a single file.

• AndroidManifest.xml: The Android manifest file describes the Android application's essential information. It describes application features such as the application and package name, permissions used by the application and the minimum version of Android required to run the application.

• Classes.dex: This file is the result of the compilation of Android Java source code. It contains optimized Dex bytecode for the Android application and will run on the Dalvik VM.

• Resources: This group contains pictures, libraries and layout files used by the application.
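Since an APK is a ZIP archive, the three groups listed above can be inspected with any ZIP library. The sketch below builds a tiny in-memory archive laid out like an APK and sorts its entries into the three groups. The entry names under `res/` and all file contents are placeholders invented for this example; a real APK contains further entries (for instance signing metadata), which this simplified grouping would lump together with the resources.

```python
import io
import zipfile


def build_demo_apk():
    """Build a tiny in-memory archive laid out like an APK (placeholder contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as apk:
        apk.writestr("AndroidManifest.xml", b"placeholder manifest")
        apk.writestr("classes.dex", b"placeholder bytecode")
        apk.writestr("res/layout/main.xml", b"placeholder layout")
        apk.writestr("res/drawable/icon.png", b"placeholder image")
    buffer.seek(0)
    return buffer


def group_entries(apk_file):
    """Sort archive entries into the three groups described in the text."""
    groups = {"manifest": [], "dex": [], "resources": []}
    with zipfile.ZipFile(apk_file) as apk:
        for name in apk.namelist():
            if name == "AndroidManifest.xml":
                groups["manifest"].append(name)
            elif name.endswith(".dex"):
                groups["dex"].append(name)
            else:
                groups["resources"].append(name)
    return groups


groups = group_entries(build_demo_apk())
print(groups)
```

Passing the path of a real `.apk` file to `group_entries` instead of the in-memory buffer should work in the same way, since `zipfile.ZipFile` accepts both paths and file-like objects.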

Figure 8 shows the compilation process in the creation of an Android APK file.


Figure 8: Android APK file generation process

One of the most important elements of creating an APK file is the compilation of Java source code. The process of generating the APK file is described in Figure 8. The files undergo a series of transformations during the process of creating the Android APK file. These transformations comprise the compilation process required to generate APK files that will run on Android devices.

The first step in the process of creating an Android application is to create an Android project, in which Java source code, Android manifest and resource files will be generated by Eclipse or NetBeans.

The next step is to write and configure the code to suit the application's purpose and to compile the project. The Java compiler in the SDK programming environment generates class files from the Java source code, and the aapt⁵ transforms the AndroidManifest.xml and resource files into a format that can be interpreted by the Dalvik VM. The generated class files cannot be interpreted by the Dalvik VM, so the Android SDK provides a tool called dx, which converts class files into the Dex format. Once all the files are compiled, the aapt is tasked with compiling and generating the Android APK file.

5 Android Asset Packaging Tool


2.2 Intrusion Detection System

2.2.1 Definition

An Intrusion Detection System, also known as an IDS [24], is a device or software application which monitors a network or system for malicious activities [58].

There are many different types of IDS. The aim of an IDS is to identify and detect anomalies in the system or device that is being monitored. Some classes of IDS are described below.

• Network-Based

The Network-Based Intrusion Detection System (NIDS) is an intrusion detection system that analyzes network traffic, makes decisions about the purpose of the traffic and scans the network for suspicious activity.

- Wireless

The Wireless Intrusion Detection System (WIDS) is similar to the NIDS. Instead of analyzing wired network traffic, it analyzes wireless traffic to detect suspicious activity.

• Host-Based

Host-Based Intrusion Detection Systems (HIDS) monitor all activity that occurs on the host (the platform comprising the computer hardware and the operating system) being monitored. This system is capable of monitoring features of the system such as power consumption, opened files, system call logs, etc.

This project will use a Host-Based Intrusion Detection System to monitor events on Android devices. Section 3 will describe this approach in further detail.


2.2.2 Detection types

IDS detection can be divided into two types: Signature-Based or Misuse detection, and Anomaly-Based detection.

• Misuse detection

The technique of Misuse detection searches for specific indications or patterns of attacks, identifying raw byte sequences, protocol types, port numbers, etc. The aim of this type of detection is to find patterns in raw data. Signatures are created by a group of experts who analyze the code, behavior and manifestation of the malware. Most antivirus companies still use this technique to create malware signatures and patterns.

One of the disadvantages of this detection type is that the system must be familiar with all malware patterns and signatures in advance, which limits its ability to detect new malware.

The process of finding and identifying new types of attacks and malware manually takes experts a great deal of time. Antivirus companies are trying to come up with alternatives that avoid this problem through the use of automated processes. Figure 9 shows the differences between the techniques of Misuse detection and Anomaly detection.

Figure 9: Misuse detection versus Anomaly detection


• Anomaly-Based detection

Anomaly-Based Intrusion Detection Systems use a prior training phase to establish a model of normal system activity. The detector is first trained on the normal behavior of the system or application to be monitored. Using this model of normal behavior, it is possible to detect anomalous activities occurring in the system by searching for behavior that deviates from the model. This technique is more complex and requires more resources than Misuse detection. Despite this, it has the advantage of being able to detect new attacks.

Typically, Misuse detection tries to identify or classify a new object by consulting known malware or malicious behavior patterns stored in a signature database. Unknown objects are compared with database objects, and if a match is found between the unknown object being analyzed and a database object, the unknown object is considered suspicious or malware. If there is no match, it is classified as unknown.

Anomaly-Based detection, on the other hand, creates a pattern of normal behavior based on the system's model of normality. New objects are compared with the normal behavior pattern, and any object that shows abnormal activity compared to that pattern is considered a malicious application.
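The contrast between the two detection types can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the detection system designed in this project: the signature database is reduced to a set of SHA-256 hashes, the normal profile is a plain average of feature vectors, and the payloads, feature values and threshold are all hypothetical.

```python
import hashlib
import math

# Hypothetical signature database: hashes of known-malicious payloads.
KNOWN_BAD = {hashlib.sha256(b"malicious payload v1").hexdigest()}


def misuse_detect(payload):
    """Return 'malware' on a signature match, otherwise 'unknown'."""
    digest = hashlib.sha256(payload).hexdigest()
    return "malware" if digest in KNOWN_BAD else "unknown"


def train_normal_profile(samples):
    """Average the feature vectors observed during the clean training phase."""
    size = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(size)]


def anomaly_detect(profile, observed, threshold):
    """Return 'anomalous' when observed lies farther than threshold from profile."""
    distance = math.sqrt(sum((p - o) ** 2 for p, o in zip(profile, observed)))
    return "anomalous" if distance > threshold else "normal"


profile = train_normal_profile([[10, 2, 0], [12, 2, 0], [11, 2, 0]])

print(misuse_detect(b"malicious payload v1"))                 # malware
print(misuse_detect(b"malicious payload v2"))                 # unknown
print(anomaly_detect(profile, [11, 2, 0], threshold=5.0))     # normal
print(anomaly_detect(profile, [11, 2, 40], threshold=5.0))    # anomalous
```

The second payload differs from the known one by a single character and is therefore missed by the signature lookup, whereas the anomaly detector flags the last observation without ever having seen it before. This is the advantage, and the training cost, described above.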


2.3 System calls and Vectors

In Linux, a system call is the way in which a program requests a service from the operating system's kernel. The Linux kernel has roughly 190 system calls, and each system call is identified by a unique number that is found in the kernel's system call table [27].

A system call is invoked by an application using glibc library functions. Functions like getpid(), open(), read() and socket() are some of the functions that glibc provides to enable applications to invoke a system call.

Every time an application from user space makes a request of the OS, the request passes through the glibc library, the system call interface and the kernel, and finally reaches the hardware. The glibc library interprets the request and the CPU switches to kernel mode. The system call interface gets the request from the glibc library and executes the appropriate kernel function by consulting the system call table. The kernel must interpret the request from the system call interface and make the request of the hardware platform. Afterwards, the information requested by the application is returned to the user by following the inverse process. Figure 10 describes the Linux user and kernel spaces and the process by which an application sends requests to the hardware platform.

Figure 10: Linux User and Kernel space

The Linux kernel is executed in the lowest layer of the Android architecture. This means that all requests made from the upper layers pass through the kernel using the system call interface before they are executed in the hardware.
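The path from a library function through the system call interface to a kernel function can be modelled with a small lookup table. The sketch below is a toy model in user space, not kernel code: the table, the handler functions and the numbers 1 and 2 are invented for illustration and do not match the system call numbers of any real architecture.

```python
import os


# Toy stand-ins for kernel functions.
def sys_getpid():
    return os.getpid()


def sys_write(text):
    return len(text)   # pretend every character was written


# Toy system call table: number -> kernel function (numbers are illustrative).
SYSCALL_TABLE = {1: sys_getpid, 2: sys_write}


def syscall_interface(number, *args):
    """Look up the handler by number and run it, as a system call interface would."""
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        raise OSError(f"invalid system call number: {number}")
    return handler(*args)


# 'Library' wrappers play the role of the library functions named in the text.
def getpid():
    return syscall_interface(1)


def write(text):
    return syscall_interface(2, text)


print(getpid() == os.getpid())   # True
print(write("hello"))            # 5
```

Because every request funnels through `syscall_interface`, a single hook at that point sees all calls an application makes. This is the property that makes the system call interface a convenient place for monitoring.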


Analyzing all of the system calls that pass through the system call interface will give us an accurate picture of the behavior of the application. The aim of hijacking⁶ these system calls is to create an output file containing all of the events generated by the Android application. This file will provide useful information, such as opened and accessed files, execution timestamps and the number of system calls executed by the application. We will use the number of system call executions performed by the application to represent behavior. Section 2.5 will provide insight into this technique.

This project will use the lists of system calls to create an anomaly detection system, first creating the normality model for the Android application using clean Android applications (applications free of malicious code). As stated above, by extracting the number of system call executions generated by the Android application it is possible to create a behavioral vector representation for Android applications. These vectors will be used to create the normality model or pattern of normal behavior for the application. Here is an example of an Android application behavior system call vector:

0, 0, 0, 25, 47, 4, 34, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0,
0, 260, 9, 0, 0, 0, 0, 1649, 0, 0, 0, 0, 0, 0, 0, 10, 0,
0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0, 0, 0,
3466, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
12, 0, 0, 0, 0, 132, 0, 0, 0, 0, 0, 0, 40, 41, 0, 0, 0,
0, 0, 76, 0, 0, 0, 0, 0, 0, 4, 0, 87, 17, 0, ...

Each position in the comma-separated vector corresponds to one system call, and the number at that position is the number of requests/executions of that system call made by the Android application during the monitoring process. For instance, the system call open() is used 25 times and kill() 47 times. This means that the monitored application used the open() system call 25 times to open files or libraries from the system, and the kill() system call 47 times to kill processes.
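A vector of this kind can be built by counting system call names in a trace. The following is a minimal sketch under stated assumptions: the trace lines use a hypothetical strace-like `name(args) = result` form, and the six-entry system call list and its ordering are invented for the example. The real list is far longer and its ordering is not defined by this sketch.

```python
from collections import Counter

# Hypothetical, shortened system call list used as the vector layout.
SYSCALLS = ["read", "write", "open", "close", "kill", "socket"]

# Hypothetical trace lines in an strace-like "name(args) = result" form.
TRACE = [
    'open("/data/app.db", O_RDONLY) = 3',
    "read(3, ..., 4096) = 4096",
    "read(3, ..., 4096) = 120",
    "close(3) = 0",
    "socket(AF_INET, SOCK_STREAM, 0) = 4",
]


def behavior_vector(trace_lines, syscalls=SYSCALLS):
    """Count each system call name and lay the counts out in a fixed order."""
    counts = Counter(line.split("(", 1)[0].strip() for line in trace_lines)
    return [counts.get(name, 0) for name in syscalls]


vector = behavior_vector(TRACE)
print(vector)   # [2, 0, 1, 1, 0, 1]
```

Keeping the order of `SYSCALLS` fixed matters: two vectors can only be compared position by position if the same position always refers to the same system call.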

The list of Android system calls is too large to show here, but it can be found in the bionic folder⁷ of the Android Linux kernel source [9] or in Section 2 of the Linux kernel manual pages [26].

6 Hijacking refers to any illegal action by which an attacker takes control of a system or steals information.
7 bionic/libc/SYSCALLS.TXT


2.4 Data Mining

Data mining is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence in order to obtain useful information. Data mining is also considered to be the set of techniques and technologies used for exploring large databases in order to find repetitive patterns, trends or rules to explain the behavior of a given data set.

Figure 11 shows the sequence of the knowledge discovery in databases (KDD) process [46], which is used to obtain useful information or knowledge from a raw data set. The KDD process refers to the overall process of discovering useful knowledge. Data mining refers to a particular step in that process.

Figure 11: Knowledge Discovery in Databases (KDD) process[46]

2.4.1 Data collection in KDD process

1. Selection of raw data: This is the first phase of the KDD process. When we are given a raw data set, the first step is to select information in order to obtain relevant data. This project will use a crowdsourcing application installed on several Android devices and an information collector script to obtain the data set of the behavior of the Android application.

2. Data preprocessing: In order to avoid misleading or inappropriate rules or patterns, it is necessary to filter out irrelevant data. Collecting inappropriate data results in poor interpretation and evaluation of the system, renders the system unreliable and produces undesired results.

3. Data transformation: This phase transforms the relevant data collected in previous phases into a readable and organized structure. This data will determine the outcome of the analysis and will create the data set for the data mining algorithm.

4. Data mining algorithm: This process uses a data mining algorithm to detect rules or patterns in the previously generated data set.

5. Interpretation and evaluation: In this phase a report is generated and the obtained results are evaluated.
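The five phases above can be chained as a simple pipeline. The sketch below is illustrative only: the record layout (`app`, `calls`), the sample data and all function names are hypothetical, and the data mining step is replaced by a plain per-feature average as a stand-in for a real algorithm.

```python
def select(raw_records, app_name):
    """1. Selection: keep only the records of the application under study."""
    return [r for r in raw_records if r.get("app") == app_name]


def preprocess(records):
    """2. Preprocessing: filter out records that carry no system call data."""
    return [r for r in records if r.get("calls")]


def transform(records, syscalls):
    """3. Transformation: turn each record into a vector of call counts."""
    return [[r["calls"].count(name) for name in syscalls] for r in records]


def mine(vectors):
    """4. Data mining: stand-in step that averages each feature."""
    if not vectors:
        return []
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def evaluate(pattern, syscalls):
    """5. Interpretation: label each averaged feature for the report."""
    return dict(zip(syscalls, pattern))


SYSCALLS = ["open", "read", "close"]
RAW = [
    {"app": "demo", "calls": ["open", "read", "read"]},
    {"app": "demo", "calls": []},                 # empty trace, filtered out
    {"app": "other", "calls": ["kill"]},          # different app, not selected
    {"app": "demo", "calls": ["open", "read", "close"]},
]

vectors = transform(preprocess(select(RAW, "demo")), SYSCALLS)
report = evaluate(mine(vectors), SYSCALLS)
print(report)   # {'open': 1.0, 'read': 1.5, 'close': 0.5}
```

Each function consumes only the output of the previous phase, so the mining step can later be swapped for a clustering algorithm without touching the collection and transformation code.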


Data mining techniques can be separated into many categories or groups, but this report will analyze classification and clustering techniques, since these are the most appropriate and relevant for the project.

Classification

This is a technique used in data mining to classify data into different fields or groups. One of the main characteristics of this technique is that the classification of data is based on groups or patterns that are already known. This means that all the information on groups in the system is already defined, and new data will be compared with these groups in order to classify the data.

Clustering

The technique of clustering involves grouping a set of physical or abstract objects into clusters of similar objects. In data mining, a cluster is a collection or group of data that are similar to each other. One of the main differences compared to the classification method is that the clustering method uses raw data to create the groups to be used later in order to make a decision. These are created without any predefined group. The given data set will be responsible for creating the groups or clusters, and afterwards a decision will be made on which cluster the data belongs to. At the beginning there will be no cluster or group created to which to assign the data, so the clustering algorithm will create a random cluster in any position.

One of the easiest ways to decide to which group a data point belongs is to measure the Euclidean distance between the point and each of the formed groups. The Euclidean distance is the straight-line distance between two points; here it is measured between a data point and the center of each cluster. The data point is then assigned to the closest cluster.
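The assignment rule can be illustrated with a minimal sketch. It is written in Python purely for illustration (the project itself relies on Matlab), and the cluster centers and the data point are made-up values, not project data.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_cluster(point, centers):
    """Return the index of the cluster center closest to `point`."""
    distances = [euclidean(point, c) for c in centers]
    return distances.index(min(distances))

# Hypothetical cluster centers and observation (illustrative values only).
centers = [(1.0, 1.0), (8.0, 9.0)]
point = (2.0, 3.0)
assigned = nearest_cluster(point, centers)
print("point", point, "is assigned to cluster", assigned)
```

Ties are resolved in favor of the first center in the list, which is an arbitrary choice of this sketch.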


2.5 K-means Clustering algorithm

Clustering is a common technique used for statistical data analysis in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics [47].

This project will use an unsupervised learning or clustering technique to form groups or cluster patterns in order to find the hidden structure or similarities within the data set. Due to the lack of data sets available for the Android platform, we decided to design an Android application behavior database from scratch, where all the Android app behavior data will be stored.

In order to get satisfactory results in the interpretation and evaluation phase, we must know which clustering method is the most suitable for detecting malicious applications in the Android platform, as well as which can provide the best and the most useful information on the collected data.

This part of the document will describe two different categories of clustering methods: Hierarchical methods and Non-Hierarchical or partitioning methods [74]. Figure 12 shows the taxonomy of clustering methods.

Figure 12: Taxonomy of clustering methods

Hierarchical clustering methods create a hierarchy or tree of clusters from a given data set. The root of the tree contains all data observations in a single cluster. The tree creates sub-clusters from the root.

Algorithms used in Hierarchical clustering methods are generally agglomerative or divisive. Agglomerative algorithms start at the leaves, with many small clusters, and merge them into bigger clusters. Divisive algorithms start at the root cluster and recursively split the clusters into smaller ones. Figure 13 shows the graphical representation of agglomerative and divisive methods.


Figure 13: Hierarchical method: Agglomerative vs Divisive
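To make the agglomerative direction concrete, the following minimal Python sketch starts with one cluster per point and repeatedly merges the two closest clusters. It is an illustration only: the single-linkage merge criterion (distance between the closest pair of points) and the sample points are assumptions of this sketch, not choices made in the project.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2):
    """Cluster distance = distance between the closest pair of points."""
    return min(euclidean(p, q) for p in c1 for q in c2)

def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # leaves: one cluster per point
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Made-up two-dimensional points forming two visible groups.
points = [(1, 1), (1, 2), (8, 8), (9, 8), (2, 1)]
result = agglomerative(points, 2)
print(result)
```

A divisive method would run in the opposite direction, starting from a single cluster that holds all points and splitting it recursively.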

Another method of clustering is the partitioning method. This method fixes a number k of clusters as the objective, and the data set is split into those clusters. The partitioning method aims to discover clusters by iteration and relocation of points in the data set.

In unsupervised learning, the pattern classification system is built from a set of training patterns whose class labels are not yet known. This is the case when labeling each individual sample is almost impossible. Algorithms of this type include k-means, as well as some neural network and nearest-neighbor based methods.

Bearing in mind that the objective in this project is to cluster system call behavior vectors into two different clusters, i.e. Good and Malicious application behaviors, it is appropriate to apply the partitioning method using the k-means clustering algorithm.


K-means Clustering algorithm

Every Android application has its own behavior data, and this data will be placed in one of two possible clusters: the Good and Malicious behavior clusters (k = 2). The Good application cluster will describe the proper behavior of Android applications, and data clustered into the Malicious group or cluster will be considered to be malicious or dangerous applications.

The k-means clustering algorithm [62] is a clustering method which aims to create k clusters, given a data set of n observations.

The k-means clustering algorithm minimizes the following objective function:

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$$

where $\left\| x_i^{(j)} - c_j \right\|^2$ is the distance measured between a data point $x_i^{(j)}$ and the cluster center $c_j$. The objective $J$ indicates the distance of the $n$ data points from their respective cluster centers.
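As a small worked illustration of this objective, the Python sketch below sums the squared distances of each point to the center of the cluster it is assigned to. The points, the assignment and the centers are hypothetical values chosen so that the sum is easy to verify by hand.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def objective(clusters, centers):
    """J: sum over clusters j of the squared distances of their points to c_j.

    clusters[j] is the list of points assigned to centers[j].
    """
    return sum(
        squared_distance(x, c)
        for members, c in zip(clusters, centers)
        for x in members
    )

# Hypothetical assignment: two clusters with two points each.
clusters = [[(1, 1), (1, 3)], [(8, 8), (10, 8)]]
centers = [(1, 2), (9, 8)]
J = objective(clusters, centers)
print("J =", J)  # each of the four points lies at squared distance 1 from its center
```

A lower value of J means that the points lie closer to their cluster centers, which is what the iterative procedure tries to achieve.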

Table 4 shows the steps of the k-means clustering algorithm:

1. Randomly place K cluster points into the space represented by the n objects. These points will represent the initial centroids of the clusters.

2. Assign every object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the K centroids.

4. Repeat the 2nd and 3rd steps until the centroids stop moving. This produces a separation of the objects into groups.

Table 4: K-means Clustering algorithm process
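The four steps of Table 4 can be sketched as follows. This is a plain Python illustration under stated assumptions, not the Matlab routine used in the project: the initial centroids are drawn from the data points with a fixed seed, an empty cluster simply keeps its previous centroid, and the sample points are invented.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: choose K initial centroids (here: K distinct data points).
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every object to the group with the closest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 3: recalculate the position of each centroid.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean(members) if members else centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Invented points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, 2)
print(labels)
```

The numeric label given to each group depends on the initial centroids, so only the grouping itself is meaningful, not whether a group is called 0 or 1.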

We suppose that we are given a data set, $P$, of $n$ observations, with a typical entry being $p_i$, where each $p_i$ is a vector of $D$ numbers.

We can think of each $p_i$ as a point in a $D$-dimensional space. Every vector $p_i$ in the data set will represent a system call vector produced by the user.


Figure 14: K-means applied as a detection system for Android system calls

The $n$ observations will be the set of system call vectors collected by monitoring the Android applications, and each data point $x_i^{(j)}$ will be one such system call vector. Applying the k-means algorithm to the Android application vector data set will create two clusters (k = 2), into which the good and malicious Android applications are classified, as described below.

The speed of the algorithm and the results obtained in training and test evaluation are the main reasons we chose to use the k-means algorithm in this project. Another reason why we chose k-means was the simplicity of implementation in Matlab.

One of the most important tasks of the clustering algorithm is the selection of the distance measure. This measurement will determine the cluster to which the data belongs. The calculation of this distance may vary depending on which mathematical formula is used in the process. Euclidean, Manhattan, Mahalanobis and Hamming distances are some of the most commonly used functions to measure such distances.
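Three of these measures are simple enough to sketch in a few lines of Python. The example is illustrative only: the two vectors are made-up, the Hamming distance is taken in its normalized form (the fraction of coordinates that differ), and the Mahalanobis distance is left out because it additionally requires the covariance matrix of the data.

```python
import math

def euclidean(p, q):
    """Straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def hamming(p, q):
    """Fraction of coordinates in which the two vectors differ."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

# Two made-up count vectors of equal length.
p = [4, 5, 6, 7, 8]
q = [1, 2, 3, 9, 9]
for name, fn in [("euclidean", euclidean), ("manhattan", manhattan), ("hamming", hamming)]:
    print(name, fn(p, q))
```

The same pair of vectors can look close under one measure and far apart under another, which is why the choice of measure affects the resulting clusters.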

2.6 Crowdsourcing

Jeff Howe defined "crowdsourcing" [59] as the act of exporting tasks traditionally performed by one or more employees to an indefinite group of persons or a community through an open call.

Using the crowdsourcing technique, we divided the responsibility of creating the Android application data set between the users of the Android Community. Considering that there are more than 8 million Android users in the world, using this technique to collect information from many different Android devices is a very appealing option.


Chapter 3

3 Behavior-Based malware detection system for Android Applications

3.1 Overview

The implementation of malware detection systems in mobile devices is a fairly new concept that is gaining a lot of attention. Applying the security tools and mechanisms used in computers to smartphones is not a feasible choice due to excessive resource and energy consumption. Because of this, we decided to perform the entire analysis process on a dedicated remote server. This server will be dedicated exclusively to detecting malicious and suspicious applications on the Android platform.

Figure 15 describes the general scheme of the behavior-based malware detection system for Android applications.

Figure 15: Android malware detection system scheme


As the Android market is an open-market system, users can download their applications from sources other than the official Android market. As a result, many users end up making heavy use of non-official Android repositories where a lack of supervision and control can result in their downloading third party applications that may contain malicious code. The aim of the server is to perform dynamic analysis of Android applications to detect anomalies which may be dangerous for the user.

Using information collectors such as the crowdsourcing application and the data collector script, we can obtain the necessary information from Android applications and perform malware analysis on the system.

Using the crowdsourcing application installed on Android devices, community users will have a chance to contribute to the project by sending recorded log files of the behavior of Android applications to our malware detection server.

All collected log data files result from use of the Strace Linux tool with Android applications⁸. This tool is assumed to be installed on each user device. Strace will collect information on the system calls executed by the application. Monitored system call logs and device information files will be stored in the SD Card memory and will be sent to the malware detection system using an FTP client in the crowdsourcing application. The FTP Server will be responsible for collecting the information sent by the crowdsourcing application and the information collector script. The data collector script will process and parse the data collected from Android users' applications and create the system call vectors. Afterwards, Matlab and the k-means clustering algorithm will use these system call vectors to detect anomalies in the applications.

⁸ Strace tool output file (*.out)


3.2 Android Data mining: Crowdsourcing and Self-written applications

In order to collect Android application data, we will use two data collector applications. The first one is a crowdsourcing application developed for Android devices and the second one is a script running on the Android Emulator.

The first attempt we made to collect data was carried out by a script using the thirty most downloaded applications from the Android market in 2010. The purpose of the script was to monitor Android emulator activity and generate reports based on the analysis.

The second data mining trial was carried out by the crowdsourcing application for Android devices. The aim of the application was the same as that of the previous script, but this time the Android user community was used.

Both applications were able to collect essential information from Android devices, such as installed applications, device information and, most importantly, the system call log files (see Figure 16). The system call log files contain the system call sequence generated by Android applications. Parsing these log files with a script will produce the system call vectors that will be used in the Android malware detection system.

Figure 16: Data acquisition process
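The step from a system call log to a system call vector can be sketched as follows. The project's own parser is written in Perl; this Python version is only an illustration. It assumes the default Strace line layout (`name(arguments) = result`, optionally prefixed by a process id), and the sample log lines and the ordered list of system calls are invented for the example.

```python
import re
from collections import Counter

# System call name at the start of a line, optionally preceded by a PID prefix.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")

def count_syscalls(lines):
    """Count how often each system call name occurs in the log lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal notices and other non-call lines are skipped
            counts[match.group(1)] += 1
    return counts

def to_vector(counts, syscall_order):
    """Fixed-length vector: one count per system call, in a fixed order."""
    return [counts.get(name, 0) for name in syscall_order]

# Invented log excerpt and a hypothetical (very short) list of tracked calls.
sample_log = [
    'open("/data/app/example.apk", O_RDONLY) = 3',
    'read(3, "PK\\3\\4"..., 4096) = 4096',
    'read(3, ""..., 4096) = 0',
    'close(3) = 0',
    '--- SIGCHLD (Child exited) ---',
]
SYSCALLS = ["open", "read", "write", "close"]
vector = to_vector(count_syscalls(sample_log), SYSCALLS)
print(vector)
```

Keeping the order of the tracked system calls fixed is what makes vectors from different applications comparable position by position.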

The aim of the crowdsourcing application and the data collector script is to collect as much information as possible from the Android devices and applications.


3.2.1 Android Data collector script

As described above, the first data mining trial used a script to collect information from Android applications.

The purpose of the script was to:

• Use Android APK applications for training or testing the system.

• Install/Uninstall applications on the emulator or real Android device.

• Collect Linux system calls using the Linux tool Strace.

• Parse the collected data to create system call vectors, device information files and a list of other actions performed by Android applications, such as opened files or accessed directories, execution timestamp, etc.

• Compile the report for the analyzed applications.

The data collector script is written in Perl. This gives us the opportunity to run the script on several operating systems without changing it in any way. Figure 17 shows the User Interface (UI) of the script.

Figure 17: Data collector script user interface

Figure 18 describes the data collector script in greater detail.


The data collector script allows us to choose between installing applications on the Android emulator or the real device. Training Data and Test Data folders contain Good and Malicious Android applications. In order to create the good behavior pattern for Android applications, we will use applications from the Training Data folder as a training phase.

The script will install applications from the training data folder and users will start to interact with the installed application. The script will start monitoring and recording all system calls executed by an application. Afterwards, the script will remove the application from the device and create a new, clean instance of the system or emulator. This procedure ensures that every monitored application has the same initial system condition and configuration. Applications in the Test Data folder will undergo the same procedure as the training data applications.

Finally, the script will create a folder with all monitored/recorded applications. Steps 4, 5 and 6 on the UI (Figure 17) will obtain the Android device information file and installed application file and create the system call vector file.

Figure 18: Data collector script process


The script was designed to automate most of the data mining process and interaction within the system. At first we decided to use a pseudo-random action event tool called ADB Monkey [2] for interacting with and collecting information from Android applications. Taking into account the fact that there are more than 250,000 applications available in the Android Market, it was natural to conclude that we needed to use an automatic process to record and interact with the applications. After several attempts, we realized that ADB Monkey was generating flawed pseudo-random events in Android applications. Considering this, data generated by this application was unsuitable for processing and for using with the system if we intended to have good results.

Our next approach was to teach ADB Monkey to behave and interact with Android applications in the same way as humans. We realized, however, that this technique required artificial intelligence knowledge and generated too much work with processing data, so we decided to use a normal user to create the data. The complexity of writing a program to behave like a human was the main reason we decided to use a normal user for data creation. Even so, this technique has a clear disadvantage: a single user has to create the data set for more than 250,000 Android applications. Spending just 5 minutes per application on monitoring and recording application system calls and the Android device information would require the user to spend more than two years of uninterrupted work collecting all of the information for the Android market apps.
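The size of this effort can be checked with a short calculation, shown here as a small Python sketch. It assumes continuous, round-the-clock recording with no breaks between applications, which is of course an idealization.

```python
APPS = 250_000          # applications in the Android Market (figure from the text)
MINUTES_PER_APP = 5     # recording time per application (figure from the text)

total_minutes = APPS * MINUTES_PER_APP
total_days = total_minutes / (60 * 24)   # minutes -> days of continuous work
total_years = total_days / 365

print(f"{total_minutes} minutes = {total_days:.0f} days = {total_years:.1f} years")
```

This comes to roughly 868 days of uninterrupted recording, i.e. a little over two years, before any realistic working hours are taken into account.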

We realized that even if we decided to use this technique for the most important 30 applications available on the Android market in January 2011, testing 30 applications would not be sufficient to determine and create a Malware pattern for Android applications.

This brings us to the need for a crowdsourcing approach.


3.2.2 Android Crowdsourcing and data mining application

The next solution is based on using Android community users to collect data through a lightweight application installed on their Android devices.

Jeff Howe defined "crowdsourcing" as the act of exporting tasks traditionally performed by one or more employees to an indefinite group of persons or community through an open call [59].

Using the crowdsourcing technique we shared the responsibility of creating the Android application data set among the Android Community users. Considering that there are more than 8 million Android users in the world, it is a very attractive opportunity to use this technique.

The crowdsourcing application is an Android application written in Java for the Android OS platform. The Android SDK and the Java programming environment will provide the tools necessary to compile the Java source code and generate the APK file that will run on the devices.

The crowdsourcing application has the same features as the data collector script mentioned in Section 3.2.1, but includes an FTP client to send collected files to the Android malware detection system. Android Community users only need to download the application and let it run in the background in order for it to start monitoring and collecting information from the applications running on the device.

Figure 19: Android Crowdsourcing application

The user interface (UI) of the crowdsourcing application, shown in Figure 19, contains two buttons, Start and Stop. If the user presses the Start button a monitoring service will start running in the background, and the application will stop when the user presses the Stop button. Android users can also start the application and let it run in the background as a system service. The user can interact with other applications while the application runs as a background process and collects data. Files recorded by the crowdsourcing application will be stored in the SD Card memory and will later be sent as data to the behavior-based Android malware detection system server via FTP.
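The upload step can be illustrated with a short sketch. The crowdsourcing application itself is written in Java; the Python version below, based on the standard `ftplib` module, only illustrates the idea. The function names, the remote directory and the host and credentials passed by the caller are hypothetical.

```python
import os
from ftplib import FTP

def upload_logs(ftp, paths, remote_dir="."):
    """Upload each local file over an already connected FTP session."""
    ftp.cwd(remote_dir)
    uploaded = []
    for path in paths:
        name = os.path.basename(path)
        with open(path, "rb") as handle:
            ftp.storbinary("STOR " + name, handle)  # binary transfer of one file
        uploaded.append(name)
    return uploaded

def send_to_server(host, user, password, paths):
    """Connect, log in and upload; host and credentials are supplied by the caller."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return upload_logs(ftp, paths)
```

Separating the upload loop from the connection handling keeps the loop usable with any object that offers `cwd` and `storbinary`, which also makes it easy to exercise without a real server.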


3.3 Behavior-Based malware detection system

3.3.1 Design of the Behavior-Based malware detection system

The behavior-based malware detection system is composed of several applications, which together provide the resources and mechanisms needed to detect malware on the Android platform. Each program has its own specific functionality and purpose in the system and the combination of all of them creates the Behavior-Based malware detection system. The Android data mining scripts and applications mentioned in Section 3.2 are responsible for collecting data from Android applications, and the script running on the server is responsible for parsing and storing all collected data. Furthermore, the script will be responsible for creating the system call vectors for the k-means clustering algorithm.

Figure 20: Static and Dynamic Analysis

The methods of analysis of the behavior-based malware detection system developed in this project can be divided into two main groups: Static Analysis and Dynamic Analysis.

Static Analysis is responsible for analyzing Android source code files in order to find malicious code patterns or signatures. This form of analysis will decompress, disassemble and search for patterns in the APK files. The method is fast and does not generate a high processing load.

Dynamic Analysis analyzes the behavior of Android applications by monitoring system calls with the Strace tool. All input traces generated by the Android smartphone user will be collected using the data collector applications described in Section 3.2, i.e. the crowdsourcing application and the data collector script. In Dynamic Analysis the user will install, execute and generate input data for the Android applications in order to obtain an application behavior output log file.

Table 5 shows the advantages and disadvantages of Static and Dynamic analysis.
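The "decompress and search for patterns" idea can be sketched in Python as follows. An APK file is a ZIP archive, so its entries can be read with the standard `zipfile` module. The sketch is deliberately simplified: it only looks for raw byte patterns and does not disassemble anything, and both the stand-in archive and the signature are made up for illustration.

```python
import io
import zipfile

def scan_apk(apk_bytes, signatures):
    """Return {entry name: [names of matched signatures]} for an APK given as bytes."""
    hits = {}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:  # an APK is a ZIP archive
        for name in apk.namelist():
            data = apk.read(name)
            matched = [label for label, pattern in signatures.items() if pattern in data]
            if matched:
                hits[name] = matched
    return hits

# Build a tiny stand-in archive in memory; contents and signature are invented.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("classes.dex", b"...sendTextMessage...")
    archive.writestr("res/layout/main.xml", b"<LinearLayout/>")

SIGNATURES = {"sms-send": b"sendTextMessage"}
hits = scan_apk(buffer.getvalue(), SIGNATURES)
print(hits)
```

As the text notes, this style of analysis only finds what is already described by a known pattern or signature.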


| | Advantages | Disadvantages |
|---|---|---|
| Static analysis | Cheap and fast. Not very resource consuming | Malware patterns or signatures have to be known in advance |
| Dynamic analysis | Detection of unknown attacks | Highly resource consuming, not feasible for battery-powered devices |

Table 5: Static and Dynamic Malware analysis advantages and Disadvantages


Figure 21 describes the complete process of Android malware detection carried out by the system.

Figure 21: Android Malware Detection process

The malware detection process is divided into three main activities:

• Data acquisition: This activity allows application data to be obtained from users via the crowdsourcing application or the data collector script.

• Data processing and manipulation: This activity consists of managing and parsing all of the information collected from Android users. The data analyzer scripts will collect, extract and analyze all of the parameters from the strace output files (from the applications tested). One of the most important pieces of data that can be obtained from the strace output file is the number of system calls executed by an Android application. Another feature that can be extracted from the output file is the set of files and libraries used during the monitoring process.

• Malware analysis and detection: This activity consists of analyzing and clustering the vectors obtained in the previous phase in order to create the normality model and subsequently be able to detect anomalous behavior of Android applications. Matlab will be responsible for clustering the different vectors into different groups using the k-means algorithm. This algorithm will create two clusters, a normality model and a malicious behavior or anomaly model. See Figure 9. All good application vectors will be clustered into the normality model, and malicious behavior vectors into the malicious behavior model cluster.
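As an illustration of the second activity, the following Python sketch extracts the files and libraries that an application tried to open from strace-style log lines. It is not the project's Perl analyzer: it assumes the default Strace layout for `open` and `openat` calls, counts failed attempts as well as successful ones, and uses invented log lines.

```python
import re

# Path argument of open("...") or openat(dirfd, "...") in an strace-style line.
OPEN_RE = re.compile(r'\bopen(?:at)?\((?:[^,"]*,\s*)?"([^"]+)"')

def opened_paths(lines):
    """Return the distinct paths passed to open/openat, in order of first use."""
    paths = []
    for line in lines:
        match = OPEN_RE.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths

# Invented log excerpt.
sample_log = [
    'open("/system/lib/libc.so", O_RDONLY) = 3',
    'read(3, "\\177ELF"..., 512) = 512',
    'openat(AT_FDCWD, "/sdcard/log.out", O_WRONLY|O_CREAT, 0666) = 4',
    'open("/system/lib/libc.so", O_RDONLY) = 5',
]
paths = opened_paths(sample_log)
print(paths)
```

Together with the per-call counts, such a list gives a compact description of what an application touched while it was being monitored.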


The following example shows five vectors created by an Android application. This example is just a proof-of-concept to illustrate how the system works.

A = [4, 5, 6, 7, 8]; % Good
B = [4, 5, 6, 6, 8]; % Good
C = [1, 2, 3, 9, 9]; % Malware
D = [4, 5, 6, 7, 7]; % Good
E = [1, 3, 3, 9, 8]; % Malware

Each vector represents an interaction with an Android application installed on the emulator or the real device. Numbers separated by commas represent the number of times that a system call has been executed. For instance, in the A interaction the first system call was executed four times, the second one five times, the third one six times and so on.

| Program code | Comments |
|---|---|
| clear all; | % Clears all variables in the workspace |
| vectors_variable = load('Application_file_Vector.txt'); | % Loads the 5 vectors from the *.txt file into vectors_variable |
| vectors_distance = pdist(vectors_variable, 'euclidean'); | % The pdist function computes the Euclidean distance between pairs of objects in the m-by-n data matrix X |
| matrix_vectors = squareform(vectors_distance); | % Arranges the pairwise comparison between vectors in matrix format |
| max_value = max(matrix_vectors(:)); | % Optional. Gets the maximum value of the matrix |
| clusters = kmeans(matrix_vectors, 2); | % The k-means algorithm creates two clusters from the input values. By default kmeans uses the squared Euclidean distance. |

Table 6: Matlab Clustering code for Android Malware Detection
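For readers without access to Matlab, the same sequence of steps (pairwise Euclidean distances, square matrix form, two-cluster k-means on the matrix rows) can be approximated with the Python sketch below. It is an illustration, not a port of the Matlab functions: the helper names are hypothetical, and the k-means routine is seeded with the rows of A and C instead of Matlab's own initialization so that the result is reproducible.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(vectors):
    """Square, symmetric matrix of pairwise distances (cf. squareform(pdist(X)))."""
    return [[euclidean(p, q) for q in vectors] for p in vectors]

def kmeans_rows(rows, init, max_iter=100):
    """Cluster the matrix rows; `init` lists the row indices used as start centroids."""
    centroids = [list(rows[i]) for i in init]
    k = len(centroids)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(r, centroids[j])))
            for r in rows
        ]
        updated = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # assumption: keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels

vectors = {
    "A": [4, 5, 6, 7, 8],
    "B": [4, 5, 6, 6, 8],
    "C": [1, 2, 3, 9, 9],
    "D": [4, 5, 6, 7, 7],
    "E": [1, 3, 3, 9, 8],
}
names = list(vectors)
matrix = distance_matrix([vectors[n] for n in names])
labels = kmeans_rows(matrix, init=[0, 2])  # start from the rows of A and C (assumption)
clusters = {name: label + 1 for name, label in zip(names, labels)}
print(clusters)
```

With the vector D exactly as listed above, the A-D distance evaluates to 1 rather than the 2.2361 given in Table 8, so the D entries of this matrix differ from the table; the split into the two groups A, B, D and C, E is the same.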


We used several functions provided by Matlab to perform the analysis of Android applications. The pdist function, used in the malware vector clustering Matlab code (Table 6), offers many ways to measure the distances between the vectors. It includes several distance metrics, such as the Euclidean, standardized Euclidean (Seuclidean), City-block, Minkowski, Chebyshev, Mahalanobis, Spearman, Hamming and Jaccard metrics.

To determine which metric was best suited for our purposes we performed several tests using different distance metrics on the previous vector example code. Knowing which vectors are good and which ones are malicious, it is easy to select the metric that will hopefully produce the best results in the malware detection system. Vectors A, B and D belong to Cluster 1 (Good) and vectors C and E to Cluster 2 (Malicious).

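| Metric | A | B | C | D | E |
|---|---|---|---|---|---|
| Euclidean | 1 | 1 | 2 | 1 | 2 |
| Seuclidean | 1 | 1 | 2 | 1 | 2 |
| City-Block | 1 | 2 | 2 | 1 | 2 |
| Minkowski | 2 | 2 | 1 | 1 | 1 |
| Cosine | 2 | 2 | 2 | 1 | 2 |
| Mahalanobis | 1 | 2 | 2 | 2 | 2 |
| Spearman | 1 | 1 | 1 | 2 | 1 |
| Hamming | 1 | 1 | 2 | 1 | 2 |
| Jaccard | 2 | 2 | 1 | 1 | 2 |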

Table 7: Clustering algorithm metrics

Out of all the tested metrics, the Euclidean, standardized Euclidean (Seuclidean) and Hamming metrics showed the best results, as shown by Table 7.
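The comparison behind Table 7 can be expressed as a small check, sketched here in Python. Because the numbers that k-means gives to its clusters are arbitrary, the check treats two label rows as equal when they describe the same grouping under a renaming of the labels. The function name is hypothetical; the label rows are copied from Table 7.

```python
def same_partition(predicted, expected):
    """True if `predicted` groups the items exactly like `expected`, up to label names."""
    forward, backward = {}, {}
    for p, e in zip(predicted, expected):
        # Each predicted label must map to exactly one expected label and vice versa.
        if forward.setdefault(p, e) != e or backward.setdefault(e, p) != p:
            return False
    return True

EXPECTED = [1, 1, 2, 1, 2]  # A, B, D good (cluster 1); C, E malicious (cluster 2)

TABLE_7 = {
    "Euclidean":   [1, 1, 2, 1, 2],
    "Seuclidean":  [1, 1, 2, 1, 2],
    "City-Block":  [1, 2, 2, 1, 2],
    "Minkowski":   [2, 2, 1, 1, 1],
    "Cosine":      [2, 2, 2, 1, 2],
    "Mahalanobis": [1, 2, 2, 2, 2],
    "Spearman":    [1, 1, 1, 2, 1],
    "Hamming":     [1, 1, 2, 1, 2],
    "Jaccard":     [2, 2, 1, 1, 2],
}

matching = [metric for metric, row in TABLE_7.items() if same_partition(row, EXPECTED)]
print(matching)
```

Only the metrics whose rows reproduce the known grouping are returned, which is the criterion used to single out the best-performing metrics.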


Table 8 shows the similarities between vectors after applying the pdist function with the Euclidean distance metric and the squareform function. The pdist function computes the Euclidean distance between all pairs of vectors and the squareform function transforms the pdist result into matrix form. This table shows only the Euclidean distance results, but similar results were obtained using the standardized Euclidean and Hamming distance metrics. The result is a pairwise comparison of the system call vectors of an Android application. Entries close to 0 indicate similar or equal vectors, and entries far from 0 indicate dissimilar vectors.

| | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 0 | 1 | 5.6569 | 2.2361 | 5.0990 |
| B | 1 | 0 | 6.0828 | 2.4495 | 5.5678 |
| C | 5.6569 | 6.0828 | 0 | 7.1414 | 1.4142 |
| D | 2.2361 | 2.4495 | 7.1414 | 0 | 6.2450 |
| E | 5.0990 | 5.5678 | 1.4142 | 6.2450 | 0 |

Table 8: Vector comparison matrix

The objective of the project is to distinguish between good and malicious use of Android applications using the system call vectors generated by Android applications. It is known in advance that vectors A, B and D are benign and vectors C and E are malicious.

Table 9 shows the cluster results obtained using the k-means clustering algorithm with the Euclidean distance metric on the results obtained from the squareform function.

| | A | B | C | D | E |
|---|---|---|---|---|---|
| Cluster | 1 | 1 | 2 | 1 | 2 |

Table 9: Example vector clustering results

According to the previous table, interactions A, B and D belong to the normality model and interactions C and E belong to the malicious behavior or anomaly model.

In conclusion, the system was able to distinguish malicious vectors from normal ones in this example, which suggests that the k-means clustering algorithm with the Euclidean distance metric is a suitable technique for malware detection. Other tests carried out using the standardized Euclidean and Hamming distance metrics showed very similar results to the Euclidean distance, and hence we decided not to include them in the report.


Chapter 4

4 Results and Evaluation

This chapter is divided into three sections. Section 4.1 describes the data set used in the project. Section 4.2 shows the devices and applications used in the system. A complete analysis of created and real Malware is described in Section 4.3.

Our framework has been tested through analysis of the data collected on the central server, with two types of data sets: data from artificial malware created for test purposes (Table 17), and data from real malware found in the wild (Table 22). The method is shown to be an effective means of isolating malware and alerting users of downloaded malware, highlighting its potential for helping to stop the spread of detected malware to a larger community.

4.1 Data Set

The data set used in this project is that collected by several data collector applications, as described in Section 3.2. This data, described in Figure 16, contains device info, installed applications info and the system call vector log files, and will be used as the data set or input data in the behavior-based malware detection system.

4.2 Devices and Programs

Tables 10 and 11 describe the tools and applications used during implementation of the project.

| Devices | Description |
|---|---|
| Android G1 | First mobile phone with Android OS, version 1.6. It was used to run self-written Malware and Android applications. |
| Samsung Galaxy S | One of the latest mobile phones, Android version 2.2. It was used to run self-written Malware and Android applications. |

Table 10: Test Devices


| Program | Description |
|---|---|
| Ubuntu OS | Ubuntu is a Debian-based Linux distribution. It was used as the main operating system (OS) in this project. |
| Matlab | Matlab is mathematical software used for manipulation of matrices, representation of data and functions, implementation of algorithms and vector analysis. We used Matlab as a system call vector analyzer and clustering method, in order to cluster Good and Malicious Android applications. |
| Eclipse | Eclipse is a platform for programming, development and compilation of Java, C++ and many other programming languages. We used Eclipse integrated with the Android SDK to develop Android applications and self-written Malware in Java. |
| Android emulator | The Android SDK includes a virtual mobile device that can run on the computer. The emulator allows us to develop and test Android applications without using a physical device. It was used to run self-written Malware and Android applications. |
| vsftpd | Very Secure FTP Daemon is an FTP server for the Linux OS. We used vsftpd to collect the Android application system call log files for the different applications, as sent in by the users. |
| Perl scripts | Perl is a high-level, general-purpose, interpreted, dynamic programming language, useful for data manipulation. It was used to create an automatic Android data mining script on Ubuntu using an Android emulator. The scripts for the system call vector generator, the device info collector, etc. were written in Perl. |
| Android Crowdsourcing app | Crowdsourcing is the act of outsourcing tasks to an undefined, large group of people or a community. We developed an Android application to collect information about the applications from users' devices. The application info contains system call logs, system device info, opened files, ... |
| PostgreSQL Database | PostgreSQL is an open-source relational database management system. We designed an ERM architecture to store Android devices and applications info. |

Table 11: Programs used in the project


4.3 Malware detection system Results

This section is divided into two subsections. Subsection 4.3.1 describes the evaluation process and the results obtained with our own self-written Android malware. Next, the Steamy Window malware is analyzed in subsection 4.3.2.

4.3.1 Self-written Malware

Because infected applications are removed quickly from application markets, finding real malware is a difficult task. Antivirus companies can provide these applications, but access to antivirus company databases is often restricted.

Due to these limitations, we decided to create our own Android programs and corresponding malware as a proof-of-concept in order to test the behavior-based malware detection system until new real malware is released for the Android platform. These programs simulate programs available in different Android markets. On the one hand, the benign Android applications simulate applications available on the official Android market; on the other hand, the equivalent applications containing malicious code simulate applications from non-official Android repositories.

Using this technique, it is easy to establish the normality model for Android applications. Vectors collected from good and malicious applications form the data set for the k-means clustering algorithm. The clustering algorithm then determines whether each vector belongs to the normality model cluster or the malicious model cluster. Every version of an application has a normality model and a malicious model.

In order to create the application normality model we will use the following three applications:

• Calculator_G

• Countdown_G

• MoneyConverter_G

The good application pattern obtained will be compared against the incoming data in order to decide whether it belongs to the normality model or not.

All developed Android applications were tested using the Android emulator and the Samsung Galaxy S mobile phone (Table 10). All of these applications were tested under equal conditions for a fixed period of time (five minutes), with different user interactions. The following pages describe some of the results obtained for our malware with the behavior-based Android malware detection system. We also provide some information on the data files collected by the data collector script and the crowdsourcing application.


Android Device Information

Table 12 shows the Android device information collected by the crowdsourcing application. The application collects information about the device on which the Android applications are running, in order to understand the behavior of the applications on different devices.

    ------------------------ DEVICE INFO --------------------
    ANDROID NAME : FROYO
    ANDROID VERSION : 2.2
    IMEI : 354795046233372
    BOARD : GT-I9000
    BOARDLOADER : unknown
    BRAND : samsung
    CPU_ABI : armeabi-v7a
    CPU_ABI2 : armeabi
    DEVICE : GT-I9000
    DISPLAY : FROYO
    FINGERPRINT : samsung/GT-I9000/GT-I9000/GT-I9000:2.2/FROYO/XWJPA:user/release-keys
    HARDWARE : smdkc110
    HOST : SE-S608
    MANUFACTURER : samsung
    MODEL : GT-I9000
    PRODUCT : GT-I9000
    RADIO : GT-I9000
    TAGS : release-keys
    TYPE : user
    USER : root
    ---------------------------------------------------------

Table 12: Crowdsourcing application result - Android Device Information


Installed Android applications on the device

Table 13 shows the installed applications list for the device, collected by the crowdsourcing application.

    --------------- INSTALLED PACKAGES INFO ------------
    VersionCode-package : 15
    Version : 0.1.5
    Installed Application : SharkReader
    Process Name : lv.n3o.sharkreader
    PERMISSION :
    android.permission.INTERNET
    android.permission.ACCESS_NETWORK_STATE
    android.permission.GET_TASKS
    android.permission.READ_PHONE_STATE
    ---------------------------------------------------------
    VersionCode-package : 8
    Version : 2.2.1
    Installed Application : Network Location
    Process Name : com.google.android.location
    PERMISSION :
    android.permission.RECEIVE_BOOT_COMPLETED
    android.permission.INSTALL_LOCATION_PROVIDER
    android.permission.ACCESS_WIFI_STATE
    android.permission.CHANGE_WIFI_STATE
    android.permission.READ_PHONE_STATE
    android.permission.ACCESS_COARSE_LOCATION
    android.permission.INTERNET
    android.permission.WRITE_SECURE_SETTINGS
    ---------------------------------------------------------
    VersionCode-package : 1
    Version : 1.0
    Installed Application : Camera Firmware
    Process Name : com.sec.android.app.camerafirmware
    PERMISSION :
    android.permission.WRITE_SETTINGS
    android.permission.VIBRATE
    android.permission.READ_PHONE_STATE
    android.permission.MODIFY_PHONE_STATE
    android.permission.CAMERA
    android.permission.ACCESS_FINE_LOCATION
    android.permission.WAKE_LOCK
    android.permission.SET_WALLPAPER
    ---------------------------------------------------------

Table 13: Crowdsourcing application result - Installed applications


Self written Malware results

Some results obtained from the data collector for the self-written applications are shown in Tables 14 and 15. The applications themselves are described in Table 16.

CALCULATOR_G REPORT - Calculator malware-free application

    ------------------------------------------------------------
    ANDROID_APPLICATION_REPORT
    ------------------------------------------------------------
    Autor : Iker Burguera Hidalgo
    Date : Tue Feb 22 15:47:22 2011
    Email : ikerburguera (at) gmail (dot) com
    Application_Name : STRACE-com.mu.rtslab.iker.calculatorG.apk.out_Report.txt
    ------------------------------------------------------------
    ------------------ system call STATISTIC -------------------
    ------------------------------------------------------------
    system call Name    Number of Executions
    ------------------------------------------------------------
    fork                2
    read                202
    write               266
    open                235
    close               243
    time                6712
    lseek               90
    getpid              4737
    ptrace              7944
    access              84
    kill                66
    brk                 173
    setgid              1
    ioctl               15930
    gettimeofday        84
    writev              191
    mmap2               3
    vfork               1
    ------------------------------------------------------------
    ------------------ REPORT OF USED FILES --------------------
    ------------------------------------------------------------
    ----------------------- OPEN FILES -------------------------
    ------------------------------------------------------------
    File: /system/usr/keychars/qwerty.kcm.bin
    File: /proc/922/cmdline
    ------------------------------------------------------------

Table 14: Self Written Application report - Calculator Good Application
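The Perl script that produced the report in Table 14 is not reproduced in this section. As a rough illustration only, the following Python sketch shows one way the "system call STATISTIC" part of such a report could be derived from strace output. The sample lines are hypothetical, and the sketch assumes plain strace lines without timestamps, optionally prefixed with a process ID.

```python
import re
from collections import Counter

# Optional "[pid N] " or bare "N " prefix, then the syscall name before "(".
# Assumption: no timestamp columns (strace run without -t/-r style options).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Return a Counter mapping system call name -> number of executions."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal notices and other non-call lines are skipped
            counts[match.group(1)] += 1
    return counts


# Hypothetical trace excerpt, not taken from the thesis log files.
sample = [
    'open("/proc/922/cmdline", O_RDONLY) = 3',
    'read(3, "com.mu.rtslab"..., 64) = 13',
    "close(3) = 0",
    "[pid   922] ioctl(8, 0xc0186201, 0xbe9f2a30) = 0",
    "--- SIGCHLD (Child exited) ---",
    "ioctl(8, 0xc0186201, 0xbe9f2a30) = 0",
]

counts = count_syscalls(sample)
for name, executions in counts.most_common():
    print(f"{name:<20}{executions}")
```

Counting by name keeps the sketch independent of the architecture-specific system call numbers, which only matter once the counts are turned into a vector.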


CALCULATOR_B REPORT - Calculator application with malicious code attached.

    ------------------------------------------------------------
    ANDROID_APPLICATION_REPORT
    ------------------------------------------------------------
    Autor : Iker Burguera Hidalgo
    Date : Tue Feb 22 15:48:38 2011
    Email : ikerburguera (at) gmail (dot) com
    Application_Name : STRACE-com.mu.rtslab.iker.calculatorB.apk.out_Report.txt
    ------------------------------------------------------------
    ------------------ system call STATISTIC -------------------
    ------------------------------------------------------------
    system call Name    Number of Executions
    ------------------------------------------------------------
    fork                2
    read                235
    write               696
    open                807
    close               812
    time                7194
    lseek               101
    getpid              5457
    setuid              1
    ptrace              9354
    access              179
    kill                73
    dup                 2
    times               1
    brk                 188
    setgid              1
    signal              1
    ioctl               18792
    gettimeofday        184
    mmap                455
    munmap              499
    getpriority         113
    stat                134
    fstat               133
    recv                17901
    mprotect            514
    sigprocmask         1236
    msgget              109901
    syscall             6
    writev              445
    mmap2               455
    vfork               1
    ------------------------------------------------------------
    ------------------ REPORT OF USED FILES --------------------
    ------------------------------------------------------------
    ----------------------- OPEN FILES -------------------------
    ------------------------------------------------------------
    File: /proc/300/cmdline
    File: /proc/300/cmdline
    File: /system/usr/share/zoneinfo/zoneinfo.dat
    File: /sdcard/Calculator_B/TrashInfo-0.8663092891957762----2011-2-20---10-54-25-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.0878473753052298----2011-2-20---10-54-27-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.6006282967641784----2011-2-20---10-54-29-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.8340689635440677----2011-2-20---10-54-30-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.1437738552877451----2011-2-20---10-54-31-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.7376069611353528----2011-2-20---10-54-33-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.4984244802612797----2011-2-20---10-54-44-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.5530206720597484----2011-2-20---10-54-46-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.2674243132841187----2011-2-20---10-54-47-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.2960847053244705----2011-2-20---10-54-49-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.2947512088951718----2011-2-20---10-54-51-.txt
    File: /sdcard/Calculator_B/TrashInfo-0.5186420445867813----2011-2-20---10-54-56-.txt
    ------------------------------------------------------------
    ---------------------- ACCESS FILES ------------------------
    ------------------------------------------------------------
    File: /mnt/sdcard/Calculator_B
    File: /system/usr/share/zoneinfo/Europe/Stockholm
    File: /mnt/sdcard/Calculator_B
    File: /mnt/sdcard/Calculator_B
    File: /mnt/sdcard/Calculator_B
    File: /mnt/sdcard/Calculator_B
    File: /mnt/sdcard/Calculator_B
    File: /mnt/sdcard/Calculator_B
    ------------------------------------------------------------

Table 15: Self Written Application report - Calculator Malicious Application


| Name | Type | Description | Objective |
| --- | --- | --- | --- |
| Calculator_G | Normal | Given two numbers, performs basic operations: addition, subtraction, multiplication and division. | Number calculation |
| CountDown_G | Normal | Given an input number, counts down every second until the value is 0. | Second countdown |
| MoneyConverter_G | Normal | Given an input number, converts the value from Euros to Swedish KR and vice versa. | Money conversion |
| Calculator_B | Malware | Similar behavior to Calculator_G but with malicious code attached. If the result of the operation is higher than 100, the program writes useless information to a text file. | Fill SD card memory |
| CountDown_B | Malware | Similar behavior to CountDown_G but with malicious code attached. Every time the reset button is pressed, the application gets all of the user's contact information and sends it to a server. | Send user contacts to a particular server |
| MoneyConverter_B | Malware | Similar behavior to MoneyConverter_G but with malicious code attached. When the "Swedish -> Euro" button is pressed, it starts running the GPS service in the background and writes the user's location to the SD card. | Get the GPS position and store it on the SD card |

Table 16: Self-written Android applications description


Self Written Android application Malware report

Table 17 shows the results of using the behavior-based Android malware detection system on the Android malware and apps we created.

In order to test the system we performed 60 interactions for each type of application: 50 with the good application, producing 50 good traces, and 10 with the malicious application, producing 10 malicious traces.

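| Application | Interactions: Good | Interactions: Malware | Clustering result: Good clustered | Clustering result: Malware clustered | Detection rate |
| --- | --- | --- | --- | --- | --- |
| Calculator | 50 | 10 | 50 | 10 | 100% |
| Countdown | 50 | 10 | 50 | 10 | 100% |
| MoneyConverter | 50 | 10 | 50 | 10 | 100% |

Table 17: Self written Android Malware result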

As detailed above, we developed three different Android applications with corresponding malware. Every application was executed 60 times: 50 interactions were performed with the good Android app and another 10 with the malicious Android application. The 50 good interactions represent the normality model of the application.

Another data collector script collects all output files generated from every interaction and creates three vector files:

• Calculator_Vector.txt

• Countdown_Vector.txt

• MoneyConverter_Vector.txt

Each file contains 60 system call interaction vectors, including good and bad application interaction vectors.
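The format of a system call interaction vector can be illustrated with a short sketch. It assumes that position i of a vector holds the number of executions of the system call numbered i in the 32-bit Linux numbering (read = 3, write = 4, and so on); this section does not state the convention explicitly, so treat the mapping as an assumption. The name-to-number table is a small illustrative subset, the vector length of 64 is chosen only for the example, and the counts are taken from the Calculator_G report in Table 14.

```python
# Illustrative subset of the 32-bit Linux (x86/ARM) system call numbers.
SYSCALL_NUMBERS = {
    "fork": 2, "read": 3, "write": 4, "open": 5, "close": 6,
    "getpid": 20, "ptrace": 26, "ioctl": 54,
}
VECTOR_LENGTH = 64  # assumed for this example; the thesis vectors are longer


def to_vector(counts, length=VECTOR_LENGTH):
    """Place each execution count at the index given by its syscall number."""
    vector = [0] * length
    for name, executions in counts.items():
        number = SYSCALL_NUMBERS.get(name)
        if number is not None and number < length:
            vector[number] = executions
    return vector


# Counts copied from the Calculator_G report (Table 14), subset only.
calculator_g = {
    "fork": 2, "read": 202, "write": 266, "open": 235, "close": 243,
    "getpid": 4737, "ptrace": 7944, "ioctl": 15930,
}

vector = to_vector(calculator_g)
# One comma-separated line per interaction, as in the vector files.
print(",".join(str(value) for value in vector))
```

A fixed-length layout like this makes vectors from different interactions directly comparable, because the same position always refers to the same system call.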

The next step was to test the system using real Android malware applications.


4.3.2 Real Malware

In the previous subsection we analyzed self-written Android apps, developed as a proof-of-concept to ensure that the behavior-based Android malware detection system was working properly and could detect malicious Android applications.

We understood that detecting our own Android malware was not as interesting as detecting real Android malware. Therefore, we decided to look at different Android markets and repositories in order to find real malicious applications. Our first approach was to contact several antivirus companies, such as Hispasec and Panda Antivirus, in order to obtain real malicious Android applications.

Hispasec, a Spanish security company, was very interested in the project and decided to share some real Android malware with us. It also provided us with access to the VirusTotal service and to its malware database.

As long as antivirus companies can provide us with real malware and we can find the original applications on the Android market, we can test the behavior-based Android malware detection system.

Steamy Window

We performed several tests using the only Android malware that we had at the time, Steamy Window.

Figure 22: Steamy Window application

Steamy Window, shown in Figure 22, was the first malware to be tested in the system. Steamy Window is a harmless application that can be found in the official Android market for free. However, the same application can be found on non-official Android repositories with malicious code attached. The first step was to perform dynamic analysis.


Dynamic analysis on Steamy Window

We installed the Steamy Window application in the Android emulator and recorded the performance and user interactions of the application using the data collector script and the crowdsourcing application.

Six interactions were performed in total with the malicious and non-malicious Steamy Window applications. These vectors were collected using the crowdsourcing application installed on six different devices with six different users. Some of the users installed the official Steamy Window application, and others downloaded the application from an unofficial Android market. Every interaction with the application produces a unique system call vector, which is analyzed by the behavior-based Android malware detection system.

Figure 23: Interaction with Steamy window application

Interaction_A= 0,0,0,3,7,7,7,0,0,1,1,0,0,11,0,1,0,0,0,3,438,0,0,0,0,0,2405,0,0,0,0,0,0,0,5,0,0,0,1,1,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,5164,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,12,7,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,8,1,3,4065,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,2,2,0,0,0,0,0,14011,0,0,0,0,0,648,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

Interaction_B=0,0,0,34,43,45,87,0,0,5,5,0,0,47,0,5,0,0,0,31,2695,0,0,0,4,0,8468,0,0,0,0,0,0,0,22,0,0,0,5,5,0,0,27,0,0,0,46,0,0,0,0,0,0,0,0,20324,48,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,0,0,0,0,0,0,0,0,0,0,0,132,88,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,60,5,27,13717,0,0,0,0,0,0,0,0,0,0,0,16,0,0,0,0,68,262,0,0,0,0,0,49976,0,0,0,0,0,2328,0,0,0,0,0,0,0,0,38,0,0,0,0,0,0,132,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

Interaction_C=0,0,0,19,12,28,29,0,0,1,1,0,0,22,0,1,0,0,0,19,1718,0,0,0,0,0,6632,0,0,0,0,0,0,0,11,0,0,0,3,1,0,0,4,0,0,0,8,0,0,0,0,0,0,0,0,15089,36,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,41,21,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,24,1,19,10580,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,27,15,0,0,0,0,0,37324,0,0,0,0,0,1855,0,0,0,0,0,0,0,0,11,0,0,0,0,0,0,41,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

Interaction_D=0,0,0,16,12,27,28,0,0,1,1,0,0,19,0,1,0,0,0,16,1214,0,0,0,0,0,5663,0,0,0,0,0,0,0,8,0,0,0,2,1,0,0,4,0,0,0,7,0,0,0,0,0,0,0,0,12376,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,40,20,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,21,1,16,8597,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,27,15,0,0,0,0,0,29712,0,0,0,0,0,1549,0,0,0,0,0,0,0,0,11,0,0,0,0,0,0,40,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,


Interaction_E=0,0,0,48,73,67,139,0,0,8,8,0,0,56,0,8,0,0,0,38,2964,0,0,0,8,0,8803,0,0,0,0,0,0,0,28,0,0,0,6,8,0,0,45,0,0,0,78,0,0,0,0,0,0,0,0,20937,48,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,24,0,0,0,0,0,0,0,0,0,0,0,210,151,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,93,8,37,14230,0,0,0,0,0,0,0,0,0,0,0,21,0,0,0,0,108,501,0,0,0,0,0,52168,0,0,0,0,0,2328,0,0,0,0,0,0,0,0,65,0,0,0,0,0,0,210,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

Interaction_F=0,0,0,22,13,29,30,0,0,1,1,0,0,32,0,1,0,0,0,22,2512,0,0,0,0,0,8253,0,0,0,0,0,0,0,14,0,0,0,4,1,0,0,4,0,0,0,12,0,0,0,0,0,0,0,0,19940,48,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,44,22,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,27,1,22,13363,0,0,0,0,0,0,0,0,0,0,0,7,0,0,0,0,28,15,0,0,0,0,0,48565,0,0,0,0,0,2328,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,44,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

In order to detect malicious behavior in those interactions, we applied the Matlab program described in Table 6.


Table 18 shows the similarities between the Steamy Window system call vectors after applying the pdist function with Euclidean distance as the metric. The pdist function computes the Euclidean distance between all pairs of vectors, and the squareform function transforms the result of pdist into matrix form.

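| Interaction | A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0 | 0.1818 | 0.1414 | 0.1414 | 0.1818 | 0.1414 |
| B | 0.1818 | 0 | 0.1768 | 0.1768 | 0.1616 | 0.1667 |
| C | 0.1414 | 0.1768 | 0 | 0.1010 | 0.1818 | 0.1212 |
| D | 0.1414 | 0.1768 | 0.1010 | 0 | 0.1818 | 0.1212 |
| E | 0.1818 | 0.1616 | 0.1818 | 0.1818 | 0 | 0.1717 |
| F | 0.1414 | 0.1667 | 0.1212 | 0.1212 | 0.1717 | 0 |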

Table 18: Steamy Window system call vectors comparison matrix table

Vectors with a distance close to 0 are equal or similar; vectors with a distance far from 0 are dissimilar. For instance, vectors F and C are very similar, with a distance of 0.1212 out of a maximum of 0.1818. Furthermore, the distance between F and E (0.1717) is greater than that between F and C (0.1212). This means that the clustering algorithm considers F and E dissimilar vectors and F and C similar vectors.
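The pairwise comparison can be sketched in plain Python as a stand-in for the Matlab pdist and squareform calls. To keep the example short, each vector is cut down to the four components at positions 4 to 7 of the corresponding interaction vector listed above. This truncation is an assumption made purely for illustration, and the raw distances printed here are not expected to match the values in Table 18, whose scaling is not described in this section.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_matrix(vectors):
    """Symmetric matrix of pairwise distances (pdist followed by squareform)."""
    size = len(vectors)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            matrix[i][j] = matrix[j][i] = euclidean(vectors[i], vectors[j])
    return matrix


# Four-component slices of Interaction_A ... Interaction_F (illustrative only).
names = ["A", "B", "C", "D", "E", "F"]
vectors = [
    [3, 7, 7, 7],
    [34, 43, 45, 87],
    [19, 12, 28, 29],
    [16, 12, 27, 28],
    [48, 73, 67, 139],
    [22, 13, 29, 30],
]

matrix = pairwise_matrix(vectors)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{value:8.2f}" for value in row))
```

The matrix is symmetric with a zero diagonal, which is the same structure as Table 18: each row or column can be read as the distances from one interaction to all the others.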

The last step was to group the previous results into two clusters. To do that we used the k-means clustering algorithm, defining two clusters (k = 2) and using the Euclidean distance metric. Compared to the other metrics, this one gave the best outcome in the analysis and testing when distinguishing different vectors.

| Interaction | A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- | --- |
| Cluster | 1 | 2 | 1 | 1 | 2 | 1 |

Table 19: Steamy window clustering result

The final outcome was a vector with the results of the k-means clustering algorithm (see Table 19). The system identifies two malicious system call vectors, B and E. Thus we can confirm that the behavior-based Android malware detection system can detect interactions performed by the malicious Steamy Window application.
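The clustering step can be illustrated with a minimal k-means (Lloyd's algorithm) sketch for k = 2. This is not the Matlab program used in the thesis: the vectors are shortened to the four components at positions 4 to 7 of each interaction vector, and the starting centroids are chosen deterministically (the first point and the point farthest from it) so that the run is reproducible. Both choices are assumptions made for this example; Matlab's kmeans has its own initialization options.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def two_means(points, max_iter=100):
    """Lloyd's k-means with k = 2; returns one label (0 or 1) per point."""
    # Deterministic start (assumption): first point and its farthest point.
    first = points[0]
    farthest = max(points, key=lambda p: euclidean(p, first))
    centroids = [list(first), list(farthest)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            0 if euclidean(p, centroids[0]) <= euclidean(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # converged, assignments no longer change
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for cluster in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == cluster]
            if members:
                centroids[cluster] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Four-component slices of Interaction_A ... Interaction_F (illustrative only).
interactions = {
    "A": [3, 7, 7, 7],
    "B": [34, 43, 45, 87],
    "C": [19, 12, 28, 29],
    "D": [16, 12, 27, 28],
    "E": [48, 73, 67, 139],
    "F": [22, 13, 29, 30],
}

names = list(interactions)
labels = two_means([interactions[name] for name in names])
for name, label in zip(names, labels):
    print(f"Interaction {name}: cluster {label + 1}")
```

On this four-component slice the sketch places B and E in one cluster and A, C, D and F in the other, the same grouping as Table 19. K-means can converge to different groupings from different starting centroids, so the result on a slice says nothing definitive about the full vectors.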


Another way of representing the results obtained using the behavior-based Android malware detection system is with bar graphs. These graphs (see Figure 24) depict the executed system call vectors of Android apps.

As stated in Section 2.5, we have n observations in a data set P, made up of system call vectors p_i. We can assume that each p_i is a point in a D-dimensional space. Since it is not possible to graphically represent points in more than three dimensions, we used bar graphs.

The blue bars represent the normal behavior of the Steamy Window application and the red bars represent the behavior of the malicious version of the Steamy Window application.

Every system call has its own number. The X axis shows the number of the executed system call, and the Y axis shows the number of times that system call was executed.

Upon studying Figure 24, we can note some distinct differences between good and malicious interactions. Given that the blue bars represent the normal behavior of the Steamy Window application, we can clearly see that the malicious version of the Steamy Window application executes additional system calls; open(), read(), access(), kill(), chmod() and chown() are some of these. Taking into account that both applications have the same version number, we can assume that the Steamy Window application downloaded from non-official Android repositories (interactions B and E) is a suspicious or harmful Android application.


Steamy Window Android market Report

The following pages show some of the reports generated by the data collector script when run on the system call log files collected from the good and malicious versions of the Steamy Window application. The files contain information such as the executed system calls, the count of system call executions, and the opened and accessed files.

    ------------------------------------------------------------
    ANDROID_APPLICATION_REPORT
    ------------------------------------------------------------
    Autor : Iker Burguera Hidalgo
    Date : Thu Mar 3 16:35:57 2011
    Email : ikerburguera (at) gmail (dot) com
    Application_Name : STRACE-com.appspot.swisscodemonkeys.steam.apk.out_Report.txt
    ------------------------------------------------------------
    ------------------ system call STATISTIC -------------------
    ------------------------------------------------------------
    system call Name    Number of Executions
    ------------------------------------------------------------
    read                1219
    write               263
    open                1192
    close               1311
    link                21
    unlink              21
    time                81
    chmod               12
    lseek               280
    getpid              19762
    getuid              15
    ptrace              49675
    access              112
    sync                48
    kill                25
    rename              9
    mkdir               1
    ioctl               123934
    fcntl               489
    gettimeofday        29
    mmap                406
    munmap              270
    fstat               1188
    recv                68683
    mprotect            2319
    sigprocmask         1128
    msgget              475966
    syscall             7277
    writev              106
    mmap2               406
    sched_yield         11
    ------------------------------------------------------------
    ------------------ REPORT OF USED FILES --------------------
    ------------------------------------------------------------
    ----------------------- OPEN FILES -------------------------
    ------------------------------------------------------------
    File: /dev/ashmem
    ------------------------------------------------------------
    ---------------------- ACCESS FILES ------------------------
    ------------------------------------------------------------
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/google_analytics.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/google_analytics.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/google_analytics.db-journal
    ------------------------------------------------------------


Figure 24: Steamy Window interactions bar plot


Steamy Window Android non-official repository application Report

    ------------------------------------------------------------
    ANDROID_APPLICATION_REPORT
    ------------------------------------------------------------
    Autor : Iker Burguera Hidalgo
    Date : Thu Mar 3 16:21:06 2011
    Email : ikerburguera (at) gmail (dot) com
    Application_Name : STRACE-com.appspot.swisscodemonkeys.steam.apk-348.out_Report.txt
    ------------------------------------------------------------
    ------------------ system call STATISTIC -------------------
    ------------------------------------------------------------
    system call Name    Number of Executions
    ------------------------------------------------------------
    read                530
    write               163
    open                530
    close               591
    link                13
    unlink              13
    time                31
    chmod               7
    lseek               162
    getpid              5441
    getuid              6
    ptrace              12378
    access              44
    sync                36
    kill                6
    rename              4
    mkdir               1
    dup                 35
    brk                 110
    ioctl               29213
    fcntl               230
    gettimeofday        15
    mmap                176
    munmap              124
    getpriority         3
    stat                550
    lstat               4
    fstat               514
    recv                16310
    fsync               36
    clone               22
    mprotect            1009
    sigprocmask         637
    msgget              112187
    syscall             2047
    writev              53
    mmap2               176
    sched_yield         3
    ------------------------------------------------------------
    ------------------ REPORT OF USED FILES --------------------
    ------------------------------------------------------------
    ----------------------- OPEN FILES -------------------------
    ------------------------------------------------------------
    File: /proc/348/cmdline
    File: /system/usr/keychars/qwerty.kcm.bin
    File: /proc/348/cmdline
    File: /data/data/com.appspot.swisscodemonkeys.steam/shared_prefs/com.appspot.swisscodemonkeys.steam_preferences.xml
    File: /proc/348/cmdline
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases
    File: /dev/urandom
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases
    File: /dev/ashmem
    File: /system/usr/keychars/qwerty.kcm.bin
    File: /proc/348/cmdline
    File: /dev/ashmem
    File: /dev/ashmem
    File: /dev/ashmem
    File: /dev/ashmem
    File: /dev/ashmem
    ------------------------------------------------------------
    ---------------------- ACCESS FILES ------------------------
    ------------------------------------------------------------
    File: /data/data/com.appspot.swisscodemonkeys.steam/shared_prefs/com.appspot.swisscodemonkeys.steam_preferences.xml
    File: /data/data/com.appspot.swisscodemonkeys.steam/shared_prefs/com.appspot.swisscodemonkeys.steam_preferences.xml.bak
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webview.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    File: /data/data/com.appspot.swisscodemonkeys.steam/databases/webviewCache.db-journal
    ------------------------------------------------------------

After several tests of the Steamy Window application on our behavior-based Android malware detection system, we can conclude that the Steamy Window application downloaded from the unofficial Android repository is potentially dangerous Android malware.

Chapter 5

5 Conclusions, Contributions and Future Work

This chapter summarizes the results of the work described in this project in two sections. Section 5.1 summarizes the work carried out over the course of this Master's thesis. Section 5.2 suggests new ideas that can be pursued based on this project.

5.1 Conclusions

All market indicators forecast a massive increase in the number of smartphones purchased over the next 5 years. This will pave the way for a potentially massive increase in malware creation, in particular for the leading OS on the market, Android.

In this report we have proposed a new framework for obtaining and analyzing smartphone application activity. In collaboration with the Android user community, it will be capable of distinguishing between benign and malicious applications with the same name and version by detecting anomalous behavior for known applications. In addition, by deploying our platform on a number of test smartphones, we have created a proof-of-concept for this mechanism as a means of analyzing emerging threats.

We have indicated that monitoring system calls is a feasible way of detecting malware. From a brief survey of related work, we have seen that there are many different approaches designed to detect malware. We reasoned that monitoring system calls is one of the most accurate ways to determine the behavior of Android applications, since system calls provide detailed low-level information. We realize that API call analysis, information flow tracking and network monitoring techniques can contribute to a deeper analysis of malware, providing more useful information about malware behavior and more accurate results. On the other hand, more monitoring capability places a higher demand on the amount of resources consumed on the device.

We have seen that open(), read(), access(), kill(), chmod() and chown() are the system calls most commonly used by malware. A benign application could make moderate or heavy use of those system calls, thus triggering false positives. Even when dealing with slightly modified Trojans, the system would still classify them correctly. We have seen that Trojanized applications made more system call executions and invoked different system calls to the kernel in comparison with the original applications.
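To make the idea of a per-application system call profile concrete, the following is a minimal sketch of how a trace could be reduced to a vector of call counts. It is not the implementation used in this project: the strace-style line format, the function name syscall_vector, the ordering of the vector and the sample trace (including its example paths) are all assumptions made for illustration.

```python
import re
from collections import Counter

# System calls highlighted in the text; the order of the vector is an assumption.
MONITORED_CALLS = ["open", "read", "access", "kill", "chmod", "chown"]

# Matches the call name at the start of an strace-style line,
# e.g. 'open("/dev/ashmem", O_RDWR) = 3'. An optional leading PID column is tolerated.
CALL_PATTERN = re.compile(r"^\s*(?:\d+\s+)?([a-z_0-9]+)\(")


def syscall_vector(trace_lines, calls=MONITORED_CALLS):
    """Return the number of invocations of each monitored call, in order."""
    counts = Counter()
    for line in trace_lines:
        match = CALL_PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    # Counter returns 0 for calls that never appeared in the trace.
    return [counts[name] for name in calls]


# Hypothetical trace lines, written by hand for this example.
sample_trace = [
    'open("/dev/ashmem", O_RDWR) = 3',
    'read(3, "", 4096) = 128',
    'read(3, "", 4096) = 0',
    '348 access("/data/data/example/databases", F_OK) = 0',
    'chmod("/data/data/example/file", 0777) = 0',
    'close(3) = 0',
]

print(syscall_vector(sample_trace))  # [1, 2, 1, 0, 1, 0]
```

Calls outside the monitored list, such as close() in the sample, are parsed but do not contribute to the vector. A fuller profile would probably count every call observed rather than a fixed subset.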

The most important contribution of this project is the mechanism we propose for obtaining real traces of application behavior. In previously published works, we have seen that it is possible to obtain information on behavior using artificially created user actions or creating replicas of smartphones, but crowdsourcing helps the community to obtain real application traces from hundreds or even thousands of applications.

A paper based on this report has been published at the ACM CCS Workshop on Security and Privacy in Smartphones and Mobile Devices 2011 (SPSM 2011). The paper summarizes the essential details of the framework and contains further tests performed using the framework on the latest Android malware.

5.2 Future Directions

The next step is to deploy the Crowdroid lightweight client on Google's Android market and distribute it to as many users as possible. Users running our application will be able to see their own smartphone behavior. We could even alert the users when one of their applications shows an abnormal trace. The system can also act as an early warning system, capable of detecting malicious or abnormally behaving applications in the early stages of propagation.

By implementing a set of tools, we have demonstrated that one can obtain behavior-based information and have it processed and clustered on a central server. Clustering results have been flawless for self-written malware, and promising with real malware. Whether the performance of a single central server would suffice for large-scale deployment is an interesting topic for further study. A configuration with multiple cooperating servers, each with a lower load and faster response, is an avenue to explore.

We have chosen a simple 2-means clustering algorithm to distinguish between benign applications and their corresponding malware versions. The results have been encouraging, although some issues remain unresolved. First, the system would always separate the system call data vectors into two clusters even if there was no malware present. The cluster mapping would change drastically whenever a malicious execution vector entered the dataset. This issue requires some manual checks or further automatic analysis. Second, one could intentionally submit incorrect data into the system, thus leaving the dataset corrupted. One of the next steps is to authenticate the submitting application in order to ensure that nobody is deliberately sending incorrect data to the system. The communication between the Crowdroid client and our server is carried out over FTP in this first version, and thus does not protect the privacy of the transferred data. If an attacker sniffs and manipulates the traffic during communication, this can lead to misclassification errors. In order to avoid this, we are introducing encryption mechanisms to preserve the integrity of the data and the authenticity of the sender. We have to take into account that applying this technique on the mobile device might add overhead in the processing stage, resulting in higher energy consumption.
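The first issue above can be reproduced with a minimal 2-means sketch. This is an illustration under stated assumptions, not the clustering code used in this project: the Euclidean distance, the deterministic initialisation, the function names and the six-dimensional example vectors are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def two_means(vectors, max_iter=100):
    """Partition vectors into two clusters; returns (labels, centroids)."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    # Assumed initialisation: the first vector and the vector farthest from it.
    c0 = list(vectors[0])
    c1 = list(max(vectors, key=lambda v: euclidean(v, c0)))
    centroids = [c0, c1]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            0 if euclidean(v, centroids[0]) <= euclidean(v, centroids[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    return labels, centroids


# Hypothetical system call count vectors: three benign runs and one Trojanized run.
benign = [[10, 40, 5, 0, 1, 0], [11, 42, 5, 0, 1, 0], [9, 38, 6, 0, 1, 0]]
trojanized = [25, 90, 12, 3, 4, 2]

labels, _ = two_means(benign + [trojanized])
print(labels)  # [0, 0, 0, 1]

# With no malware in the dataset, the benign runs are still split in two.
benign_labels, _ = two_means(benign)
print(benign_labels)  # [0, 0, 1]
```

The second call shows why a fixed k = 2 needs an additional check: the algorithm has no notion of "no malware present", so some measure of how far apart the two clusters are would be needed before treating the smaller one as suspicious.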

Finally, we have the challenge of convincing the Android user community to install the Crowdroid application. We need to manage the perception of a loss of privacy associated with supplying personal behavior information to the research community, weighing this against the benefit of having access to up-to-date behavior-based statistics on detected malware.

References

[1] 50 Malware applications found on Android O�cial Market. Access date:25 Nov 2010.http://m.guardian.co.uk/technology/blog/2011/mar/02/

android-market-apps-malware?cat=technology&type=article.

[2] Adb Monkey UI- Application exerciser. Access date: 12 Nov 2010.http://developer.android.com/guide/developing/tools/monkey.

html.

[3] Android apk format. Access date: 25 Jan 2011.http://en.ophonesdn.com/article/show/354.

[4] Android application. Access date: 1 Feb 2011.http://www.androidenea.com/2009/06/android-boot-process-from-power-on.

html.

[5] Android Arquitecture. Access date: 4 Nov 2010. [Online]. Available from:http://developer.android.com/guide/basics/what-is-android.

html.

[6] Android boot. Access date: 13 Jan 2011.http://reminisce06.springnote.com/pages/7407623?print=1.

[7] Android build process. Access date: 26 Jan 2011.http://www.alittlemadness.com/2010/06/07/

understanding-the-android-build-process/.

[8] Android init. Access date: 13 Jan 2011.http://bootloader.wikidot.com/linux:boot:android.

[9] Android Kernel. Access date: 20 Apr 2011. [Online]. Available from: http://android.git.kernel.org/.

[10] Android SDK. Access date: 29 Oct 2010.http://developer.android.com/sdk/index.html.

[11] Angry Birds Bonus Level. J. Oberheide. Access date: 27 Dec 2010.http://m.guardian.co.uk/technology/blog/2011/mar/02/

android-market-apps-malware?cat=technology&type=article.

[12] Apk �le generation. Access date: 27 Jan 2011.http://facinatingandroid.blogspot.com/2011/09/

android-apk-file.html.

[13] Baksmali. Access date: 9 Jan 2011. [Online]. Available from: http://

code.google.com/p/smali/.

[14] Cabir Malware variants.Access date: 26 Nov 2010.http://www.f-secure.com/weblog/archives/00000414.html.

69

Page 78: Institutionen för datavetenskap - DiVA portal475428/FULLTEXT01.pdf · Symbian 40.1% 32.9% -18.0% BlackBerry OS 17.9% 17.3% -3.5% Android 16.3% 24.6% 51.2% iOS 14.7% 10.9% -25.8%

[15] Cabir, Smartphone Malware. Access date: 26 Nov 2010.http://www.f-secure.com/v-descs/cabir.shtml.

[16] Dalvik Virtual Machine. Access date: 15 Dec 2010.http://www.dalvikvm.com/.

[17] Dex �le compilation. Access date: 14 Feb 2011.http://en.ophonesdn.com/article/show/354.

[18] Distance metric. Access date: 4 Mar 2011.http://ai.stanford.edu/~ang/papers/nips02-metric.pdf.

[19] Eclipse. Access date: 23 Nov 2010. [Online]. Available from: http://www.eclipse.org/.

[20] Hispasec Security Company. Access date: 25 Mar 2011.http://www.hispasec.com/.

[21] IDC Forecast 2010 2015. Access date: 12 Feb 2011.http://www.idc.com/getdoc.jsp?containerId=227360.

[22] IDC Forecast 2011-2015. Access date: 2 Dec 2010.http://www.idc.com/getdoc.jsp?containerId=prUS22762811.

[23] International Data Corporation, IDCweb. Access date: 4 Dec 2010.http://www.idc.com.

[24] Intrusion Detection System, IDS. Access date: 24 Dec 2010.http://www.sans.org/reading_room/whitepapers/detection/

intrusion-detection-systems-definition-challenges_343.

[25] Iseclab. International Secure Systems Laboratory. Access date: 14 Jan2011.http://www.iseclab.org/.

[26] Linux Kernel manual pages. Access date: 14 Mar 2011.http://www.kernel.org/doc/man-pages/online/dir_section_2.html.

[27] Linux Kernel system call list table. Access date: 24 Mar 2011.http://bluemaster.iu.hio.no/edu/dark/lin-asm/syscalls.html.

[28] Malware developers attacks. Access date: 29 Nov 2010.http://adtmag.com/articles/2011/03/03/

android-attacks-on-rise.aspx.

[29] Malware economic damage in 2007. Access date: 23 Oct 2010.http://www.computereconomics.com/page.cfm?name=Malware%

20Repor.

[30] Malware evolution. Access date: 22 Nov 2010.http://pages.cs.wisc.edu/~pb/comsnets09.pdf.

70

Page 79: Institutionen för datavetenskap - DiVA portal475428/FULLTEXT01.pdf · Symbian 40.1% 32.9% -18.0% BlackBerry OS 17.9% 17.3% -3.5% Android 16.3% 24.6% 51.2% iOS 14.7% 10.9% -25.8%

[31] Malware increase. Access date: 22 Dec 2010.http://www.topnews.in/android-malware-increase-400-report-2328121.

[32] Malware types. Access date: 2 Dec 2010.http://www.spamlaws.com/malware-types.html.

[33] Netbeans. Access date: 23 Nov 2010. [Online]. Available from: http:

//netbeans.org/.

[34] Netqin, mobile security service provider.Access date: 13 Nov 2010.http://www.netqin.com/en/.

[35] Nokia con�rms Microsoft partnership. Access date: 11 Feb 2010.http://techcrunch.com/2011/02/10/nokia-confirms-microsoft-partnership-new-leadership-team/.

[36] Operating Systems in Smartphones. Access date: 14 Dec 2010.http://www.idc.com/getdoc.jsp?containerId=prUS22486010.

[37] SAI - Business Insider. Access date: 17 Dec 2010.http://www.businessinsider.com/sai.

[38] Samsung HTC Smartphone vendor companies market share. Access date:3 Jan 2011.http://www.eweek.com/c/a/Mobile-and-Wireless/

Android-Helps-Samsung-HTC-Double-Market-Share-IDC-792965.

[39] Sandbox. Access date: 29 Oct 2010.http://www.cs.bgu.ac.il/~dsec022/papers/j9a.pdf.

[40] Smartphone applications evolution. Access date: 21 Dec 2011.http://www.businessinsider.com/chart-of-the-day-smartphone-apps-2011-3.

[41] Smartphones vendors sales 2011. Access date: 14 Feb 2011.http://www.idc.com/about/viewpressrelease.jsp?containerId=

prUS22689111.

[42] Stack-Based architecture. Access date: 18 Nov 2010.http://en.wikipedia.org/wiki/Stack_machine.

[43] Steamy Window Malware. Access date: 25 Feb 2011.http://www.netqin.com/en/.

[44] Worldwide Smartphone Users. Access date: 2 Nov 2010.http://www.parksassociates.com//blog/article/

number-of-smartphone-users-to-quadruple--exceeding-1-billion-worldwide-by-2014-4.

[45] Jan A Bergstra and Alban Ponse. Register-machine based processes. Jour-nal of the ACM, 48(6):1207�1241, 2001.

[46] P Berkhin. Survey of clustering data mining techniques. Techniques, 10:1�56, 2002.

71

Page 80: Institutionen för datavetenskap - DiVA portal475428/FULLTEXT01.pdf · Symbian 40.1% 32.9% -18.0% BlackBerry OS 17.9% 17.3% -3.5% Android 16.3% 24.6% 51.2% iOS 14.7% 10.9% -25.8%

[47] P Berkhin. Survey of clustering data mining techniques. Techniques, 10:1�56, 2002.

[48] Thomas Bl, Leonid Batyuk, Aubrey-Derrick Schmidt, Seyit AhmetCamtepe, Sahin Albayrak, and Technische Universit. An android appli-cation sandbox system for suspicious software detection. Techniques, pages55�62, 2010.

[49] Abhijit Bose, Xin Hu, Kang G. Shin, and Taejoon Park. Behavioral de-tection of malware on mobile handsets. In Proceeding of the 6th interna-tional conference on Mobile systems, applications, and services, MobiSys'08, pages 225�238, New York, NY, USA, 2008. ACM.

[50] Timothy K. Buennemeyer, Theresa M. Nelson, Lee M. Clagett, John P.Dunning, Randy C. Marchany, and Joseph G. Tront. Mobile device pro-�ling and intrusion detection using smart batteries. In Proceedings of theProceedings of the 41st Annual Hawaii International Conference on Sys-tem Sciences, HICSS '08, pages 296�, Washington, DC, USA, 2008. IEEEComputer Society.

[51] Iker Burguera, Urko Zurutuza, and Simin Nadjm-Tehrani. Crowdroid:Behavior-based malware detection system for android. In Workshop onSecurity and Privacy in Smartphones and Mobile Devic es 2011 - SPSM2011. ACM, October 2011.

[52] Jerry Cheng, Starsky H Y Wong, Hao Yang, and Songwu Lu. SmartSiren:virus detection and alert for smartphones, pages 258�271. ACM, 2007.

[53] David Dagon, Tom Martin, and Thad Starner. Mobile phones as computingdevices: The viruses are coming! IEEE Pervasive Computing, 3:11�15,October 2004.

[54] Anhai Doan, Raghu Ramakrishnan, and Alon Y Halevy. Crowdsourcingsystems on the world-wide web. Communications of the ACM, 54(4):86,2011.

[55] Manuel Egele. A survey on automated dynamic malware analysis tech-niques and tools vienna university of technology. Computing, V:1�49, 2011.

[56] William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, JaeyeonJung, Patrick McDaniel, and Anmol N. Sheth. Taintdroid: an information-�ow tracking system for realtime privacy monitoring on smartphones. InProceedings of the 9th USENIX conference on Operating systems design andimplementation, OSDI'10, pages 1�6, Berkeley, CA, USA, 2010. USENIXAssociation.

[57] Usama M. Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. Ad-vances in knowledge discovery and data mining. chapter From data miningto knowledge discovery: an overview, pages 1�34. American Association forArti�cial Intelligence, Menlo Park, CA, USA, 1996.

72

Page 81: Institutionen för datavetenskap - DiVA portal475428/FULLTEXT01.pdf · Symbian 40.1% 32.9% -18.0% BlackBerry OS 17.9% 17.3% -3.5% Android 16.3% 24.6% 51.2% iOS 14.7% 10.9% -25.8%

[58] By Fengmin Gong, Chief Scientist, Mcafee Network, and Security Technolo-gies. Deciphering detection techniques : Part ii anomaly-based intrusiondetection. Network, (March), 2003.

[59] Je� Howe. Crowdsourcing: Why the Power of the Crowd Is Driving theFuture of Business. Crown Publishing Group, New York, NY, USA, 1edition, 2008.

[60] Nwokedi Idika and Aditya P Mathur. A survey of malware detection tech-niques. Purdue University, page 48, 2007.

[61] G A Jacoby and Nathaniel J Davis Iv. Battery-based intrusion detection.Design, page 224, 2005.

[62] J Macqueen. Some methods for classi�cation and analysis, volume 233,pages 281�297. 1967.

[63] Georgios Portokalidis, Philip Homburg, Kostas Anagnostakis, and HerbertBos. Paranoid android: versatile protection for smartphones. In Proceedingsof the 26th Annual Computer Security Applications Conference, ACSAC'10, pages 347�356, New York, NY, USA, 2010. ACM.

[64] Georgios Portokalidis, Philip Homburg, Kostas Anagnostakis, Herbert Bos,and Universiteit Amsterdam. Paranoid android : Zero-day protection forsmartphones using the. csvunl, pages 1�20, 2010.

[65] J.Blasco P.Rincon. Hong toutou malware analysis. wTF is happeninginside my android phone. Access date: 24 Dec 2011. Technical report.http://www.slideshare.net/JaimeBlasco/

wtf-is-happeninginsidemyandroidphonepublic.

[66] Aubrey-Derrick Schmidt, Rainer Bye, Hans-Gunther Schmidt, Jan Clausen,Osman Kiraz, Kamer A. Yüksel, Seyit A. Camtepe, and Sahin Albayrak.Static analysis of executables for collaborative malware detection on an-droid. In Proceedings of the 2009 IEEE international conference on Com-munications, ICC'09, pages 631�635, Piscataway, NJ, USA, 2009. IEEEPress.

[67] Aubrey-Derrick Schmidt, Jan Hendrik Clausen, Ahmet Camtepe, andSahin Albayrak. Detecting symbian os malware through static function callanalysis. 2009 4th International Conference on Malicious and UnwantedSoftware MALWARE, (March 2006):15�22, 2009.

[68] Aubrey-Derrick Schmidt, Frank Peters, Florian Lamour, Christian Scheel,Seyit Ahmet Çamtepe, and Sahin Albayrak. Monitoring smartphones foranomaly detection. Mob. Netw. Appl., 14:92�106, February 2009.

[69] Aubrey-Derrick Schmidt, Hans-Gunther Schmidt, Jan Clausen, AhmetCamtepe, and Sahin Albayrak. Enhancing security of linux-based androiddevices. Image Rochester NY, 2008.

73

Page 82: Institutionen för datavetenskap - DiVA portal475428/FULLTEXT01.pdf · Symbian 40.1% 32.9% -18.0% BlackBerry OS 17.9% 17.3% -3.5% Android 16.3% 24.6% 51.2% iOS 14.7% 10.9% -25.8%

[70] Asaf Shabtai, Uri Kanonov, and Yuval Elovici. Intrusion detection formobile devices using the knowledge-based, temporal abstraction method.J. Syst. Softw., 83:1524�1537, August 2010.

[71] Asaf Shabtai, Robert Moskovitch, Yuval Elovici, and Chanan Glezer. De-tection of malicious code by applying machine learning classi�ers on staticfeatures: A state-of-the-art survey. Inf. Secur. Tech. Rep., 14:16�29, Febru-ary 2009.

[72] Ashkan Shari� Shamili, Christian Bauckhage, and Tansu Alpcan. Malwaredetection on mobile devices using distributed machine learning. In Pro-ceedings of the 2010 20th International Conference on Pattern Recognition,ICPR '10, pages 4348�4351, Washington, DC, USA, 2010. IEEE ComputerSociety.

[73] Symantec. Trojanized android application, steamy window. Access date:12 Apr 2011. Technical report.http://www.techeye.net/security/androids-steamy-window-trojan-sends-sms-to-premium-numbers.

[74] Urko Zurutuza, Roberto Uribeetxeberria, and Diego Zamboni. A datamining approach for analysis of worm activity through automatic signa-ture generation. In Proceedings of the 1st ACM workshop on Workshop onAISec, AISec '08, pages 61�70, New York, NY, USA, 2008. ACM.

74