Risk-Aware Management of Virtual Resources in Access ...web.ics.purdue.edu/~msarfraz/assets/IEEE-TCC.pdf · information: DOI 10.1109/TCC.2015.2453981, IEEE Transactions on Cloud Computing

2168-7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Seehttp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citationinformation: DOI 10.1109/TCC.2015.2453981, IEEE Transactions on Cloud Computing

1

Risk-Aware Management of Virtual Resources inAccess Controlled Service-oriented Cloud

DatacentersAbdulrahman Almutairi, Muhammad Ihsanulhaq Sarfraz and Arif Ghafoor

Abstract—For economic benefits and efficient management of resources, organizations are increasingly moving towards the paradigmof “cloud computing” by which they are allowed on-demand delivery of hardware, software and data as services. However, there aremany security challenges which are particularly exacerbated by the multitenancy and virtualization features of cloud computing thatallow sharing of resources among potentially untrusted tenants in access controlled cloud datacenters. This can result in increased riskof data leakage. To address this risk vulnerability, we propose an efficient risk-aware virtual resource assignment mechanism for cloudsmultitenant environment. In particular, we introduce the notion of sensitivity in datacenters and the objective is to minimize the risk ofdata leakage. In addition, the risk should not exceed in high sensitivity datacenters in comparison to low sensitivity datacenters. Wepresent three assignment heuristics and compare their relative performance.

Index Terms—Cloud services, access control, risk assessment, vulnerability

F

1 INTRODUCTION

C LOUDS allow on-demand delivery of software, hard-ware, and data as services. Services are made available

to users on demand from a cloud providers servers andhence organizations save the cost of building and maintain-ing their own on-premise servers and pay only for the ser-vices they actually use. Cloud services can be dynamicallyscaled to meet the needs of its users and are designed toprovide easy and scalable access to applications. However,the security challenges in cloud computing present a majorobstacle for adopting the cloud computing paradigm [1].The number of cloud security related incidents are on therise [2]. Cloud Security Alliance (CSA)1 identified the top se-curity threats in cloud computing for 2013 as data breachesresulting from vulnerability of shared virtual resources.An example of data breach is the vulnerability of GoogleDocs where many documents are exposed to unauthorizedusers [3]. The cloud providers usually implement resourceisolation mechanisms to counter the risk of data leakage andto increase the resource utilization at the same time. How-ever, resource sharing remains the main security concernof cloud customers [4]. Therefore, security-aware resourcescheduling is needed in order to minimize the risk to whichcloud customers application services are exposed in a multi-tenant cloud environment. In this paper, we propose anefficient risk-aware virtual resource assignment mechanismfor multitenant cloud environment.

• A. Almutairi, M. I. Sarfraz and A. Ghafoor are with the School ofElectrical and Computer Engineering and Purdue’s Center for Educationand Research in Information Assurance and Security (CERIAS), PurdueUniversity, West Lafayette,IN,

1. https://cloudsecurityalliance.org/research/top-threats/

1.1 Multi-tenant Cloud System

We take the perspective of Software as a Service (SaaS)provider who has its application services hosted by theInfrastructure as a Service (IaaS) providers. A SaaS providermay choose different IaaS providers to increase its servicereliability by avoiding single IaaS failure. In Figure 1a, weprovide a system view of the cloud architecture. The consistsof a Virtual Resource Manager (VRM), a datacenter, and anAccess Control Module (ACM). We can have PaaS providersinstead of IaaS providers but this option can restrict theSaaS provider to host only those applications that can bedeployed in PaaS. In contrast, IaaS is more flexible as itallows the selection of different types of VM images [1].

Since cloud is a multi-tenant environment, it requiressecurity isolation among its tenants. Generally, four typesof security isolation mechanisms are used for cloud archi-tecture as depicted in Figure 1b. The first type of isolation isphysical isolation where services/applications are hosted onseparate physical resources. The SaaS provider achieves thisisolation either by using different IaaS providers or differentprivate clouds for a given IaaS provider [5]. The secondtype of isolation is at the hypervisor level where by a SaaSprovider guarantees separate virtual machines (VMs) forevery tenant. However, VMs may be collocated in the samephysical server. The third type of isolation uses a container-based approach at the OS level where tenant applicationsrun as separate containers sharing a common OS [6]. Thefourth type of isolation is the shared service isolation wheretwo or more tenants require a common application whichis isolated among the tenants. The shared service isolationtype is enforced by the application running on top of one ormore VMs. In this paper, we don’t consider container-basedisolation because we assume that a VM can host only oneservice at a time as mentioned in [7]. The cloud architecturein Figure 1a depicts the other three types of isolations. In this



2

(a) Virtual resource management. (b) Multi-tenant resource isolation types

Fig. 1: Virtual resource management architecture.

paper, we do not address the problem of VM isolation. Butrather, given the VM isolation techniques implemented byIaaS providers which can vary in their isolation guarantees[5], our objective is to manage the virtual resource, suchthat, if a vulnerability weakens the isolation among VMs,then the total risk of data leakage is minimized. If the VMisolation becomes stronger, the risk decreases but it does notvanish.

As noted in Figure 1a, the SaaS provider uses a servicedirectory to control the service subscription where a useraccesses a service only if he has the subscription to thatservice [8]. In addition, the SaaS provider controls its dataoperations using a Role-Based Access Control (RBAC) pol-icy. Under this policy, a user of a service can perform anaction in the datastore if the user is assigned a role andthe role has the authorization to perform that action. Asshown Figure 2a, the RBAC policy is enforced by the SaaSprovider to protect the backend datastore. We assume thatthe roles of the RBAC policy can share data thus constitutinga multi-tenant datastore environment. A role can accessdata within an enterprise or across enterprises under someService Level Agreement (SLA) [9]. Collectively, the intra-and inter-enterprise data accesses by various roles can begoverned by a federated RBAC policy. The SaaS providerimplements the authorization mechanisms based on thisfederated policy. A leading practical example of using RBACin cloud is Microsoft Azure, which uses RBAC to authorizeaccess to its resources through the Azure Active Directory.In addition, Azure controls its data operations using RBACfor performing create/read/update/delete operations in aSQL DB [8]. Also, Oracle Sales Cloud has proposed usingRBAC to secure access to its functionalities and data [10].We only consider RBAC policy to authorize the accesses tothe datastore and not the services as shown in Figure 2a.

For a datacenter residing with a SaaS provider, it isimperative that users are allowed access to multiple servicesbased on service subscription. Such services are deployed oncompatible VMs provided by IaaS providers. Due to the het-erogeneity of SaaS services such as computational intensiveservices or data intensive services, SaaS requires differenttypes of VMs. For such environment, the compatibilitybetween the services and the VMs is needed. In summary,these requirements can be stated as ”A role requires multiple

services and each service requires compatible VMs” as depictedin Figure 2a. However, sharing of data among potentiallyuntrusted multiple tenants, or multiple roles in our case, cancause the risk of data leakage due to cloud vulnerabilities.We define a bi-tuple/pair of a service and its compatibleVM as Integrated Service-aware Virtual Machine (ISVM).We assume data can be leaked across VMs collocated in thesame physical server. Such type of data leakage is due tointer-ISVMs vulnerabilities. In addition, the data leakagecan happen in the SaaS provider applications shared bydifferent tenants and such data leakage is due to intra-ISVMvulnerabilities. In Section 6, we consider two vulnerabilitymodels to quantify the probability of data leakage associ-ated with both intra- and inter-ISVM vulnerabilities.

In [9], we have proposed a distributed access control ar-chitecture for which a key component is the VRM. The VRMallocates the virtual resource to services to satisfy some SLArequirements for each cloud user and to minimize the costof provisioning for the SaaS providers. As illustrated inFigure 1a, VRM consists of workload estimation, resourcevulnerability estimation, service compatibility and vulner-ability estimation, and resource assignment components.The workload estimation component estimates the sharingof data among different roles of the RBAC policy. Theresource vulnerability estimation component of VRM usessecurity analysis tools [11] to estimate the virtual resourcevulnerability. Subsequently, this component can be usedto characterize a virtual resource with respect to differentvulnerability security measurements e.g. a highly secureor an insecure virtual resource. The service compatibilityand vulnerability estimation component maintains a vendorsupplied service-VM compatibility matrix and estimates theISVM vulnerabilities as discussed in Subsection 1.2. Theresource assignment component uses the workload andvulnerability estimations to assign virtual resources to cloudusers application services with a goal to minimize the totalrisk of data leakage.

1.2 Cloud Vulnerability and Risk

The security vulnerabilities of a cloud system are consideredcloud specific if they result from core cloud computingtechnologies. Among others, VM escape and SQL injection



3

(a) Generic model of an access controlled datacenter (b) An example for a manufacturing enterprise hostingmultiple services in a multi-tenant cloud datacenter.

Fig. 2: Access control roles and data centric services mapping.

vulnerabilities in virtualization and web application tech-nologies, respectively, are considered cloud vulnerabilities[12]. The VM escape vulnerability is a weakness of securityisolation (Type 2) in the virtualization layer that might allowan attacker to escape from a local VM to a collocated ten-ant’s VM. Since IaaS offering is based on the virtualizationtechnology, the SaaS or PaaS providers, built on top of IaaS,are exposed to the same type of vulnerabilities. The SQLinjection vulnerability is a weakness of security isolation(Type 4) in SaaS applications that can allow attackers tohave unauthorized access to datastore. Physical isolationamong tenants can reduce the data leakage risk. However,this isolation mechanism increases the SaaS deployment costresulting in underutilization of its resources [13].

According to NIST [14], the security risk can be de-scribed as the loss of integrity, loss of availability, and loss ofconfidentiality. The loss of confidentiality can result in lossof public confidence, embarrassment, or legal action againstthe organization. In this paper, we focus on the risk of lossof confidentiality which can range from jeopardizing thenational security to disclosure of the Privacy Act data. Toquantify this risk which is due to data leakage attack acrosscloud tenants, we adopt the definition of risk by ISO 27005as: ”The potential that a given threat will exploit vulnerabilityof an asset or group of assets and thereby cause harm to theorganization”. The quantitative risk is the product of thelikelihood of the attack and its consequences or impact [15].The likelihood of an attack depends on the vulnerabilities ofthe system and the attacker threat. The consequence of dataleakage attack is measured in terms of the size of tenant’sdata asset which is leaked. Accordingly, we can formulatethe risk due to data leakage in SaaS datacenter as:

Risk = Assets× V ulnerability × Threatwhere Threat = 1

(1)

We consider the cloud tenant’s assets to be its datastored in the cloud datacenter. The assets are similar forboth cloud computing and conventional systems. How-ever, the main difference between the risk in cloud andin conventional systems is due to cloud specific securityvulnerabilities as mentioned above. The vulnerability inEquation 1 is the overall probability of data leakage as aresult of cloud system software (i.e. application and vir-tualization) vulnerabilities. There are many vulnerabilitiesin the cloud system software that can cause data leakage

across tenants e.g. VM escape, SQL injection, and cross-sitescripting. To find the overall probability of data leakageacross tenants, we first identify the software vulnerabili-ties that can cause data leakage. Such vulnerabilities havedifferent levels of criticality which are calculated based onCommon Vulnerability Scoring System (CVSS) scores [16].Subsequently, the probabilities of finding each of these vul-nerabilities are estimated using statistical methods such asVulnerability Discovery Models (VDMs) [17]. Consequently,each vulnerability probability can be weighted based on itscriticality and used to find the overall probability of dataleakage (vulnerability in Equation 1). Various aggregationapproaches can be used to capture the overall probabilityof data leakage such as taking the maximum (Weakest LinkModel) and the semantic-based aggregation (DependenceModel) as discussed in Section 6. The level of threat of anattacker in the risk equation depends on his/her experience,motivation, and time of access to the asset. There is noquantifiable measurement for the attacker experience ormotivation and therefore, we assume all attackers pose thesame threat.

Motivating Scenario: In a manufacturing company, var-ious departments such as finance, engineering, human re-sources, and marketing as well as external contractors out-source their data to a SaaS provider and collaborate on acertain project. This project may require sharing of multiplesensitive documents among these various departments andthe contractors as well as among the contractors themselves.The data sharing is controlled by a federated RBAC policy[9]. Users from various departments and from externalcontractors are assigned roles according to the RBAC policy.These roles access project data and process the data throughproject software packages provided by the SaaS provideras shown in Figure 2b. Contractors can be competitorsand hence can be viewed as untrusted tenants. We assumethe data from the parent company and its contractors isstatic and preloaded in a cloud datacenter. This scenarioconstitutes a RBAC multi-tenant cloud environment. In thisenvironment, intra- and inter-ISVM vulnerabilities pose arisk of data leakage within the parent company and amongits contracts.

In this paper, we propose the notion of the sensitivity ofmulti-tenant datacenters in terms of degree of data sharingamong tenants. For example, limited sharing implies a highsensitivity datacenter and high sharing of data means lowsensitivity datacenter. The goal of this paper is to present



4

virtual resource management methodologies such that Risk(Equation 1) is minimized. In addition, it is desirable thatfor a given size of a datacenter, Risk should not exceed ifthe datacenter has high sensitivity as compared to the caseif the datacenter has low sensitivity.

The paper is organized as follows. Section 2 discussesrelated work. Section 3 provides an overview of the RBACmodel and defines RBAC policy and spectral model. Sec-tion 4 proposes a workload approximation model basedon a given RBAC policy. Section 5 characterizes the clouddatacenter by introducing the notion of sensitivity. Section 6discusses the key models from literature to quantify ISVMvulnerabilities. Section 7 presents the risk-aware schedulingproblem and the cost functions. Section 8 presents assign-ment heuristics for ISVM allocation. Section 9 compares theperformance of the heuristics. Finally, Section 10 outlines theconclusion. Proofs of all theorems and lemmas are given inthe Appendix.

2 RELATED WORK

Security-aware Scheduling: Security-aware scheduling hasbeen mostly reported in the literature in the context oftask scheduling in untrusted computer grid systems. In[18] the authors have proposed an availability-aware se-cure scheduling algorithm for multi-classes applicationswith varying availability and performance requirements.The proposed algorithm provides high availability with thecost of increasing the response time. In [19], the authorshave proposed a secure scheduling algorithm based on thetrustworthiness measurements of the computing node incomputer grid system. The application security overheadand risk probability are computed and incorporated in thescheduling decision. In addition, a risk-resilient schedulingalgorithm is proposed in [20] which ensures secure grid jobexecution by requiring grid node security index to be higherthan user job trust requirement. In [21], the authors havepresented a security-aware resource allocation mechanismfor real-time applications in the case of homogeneous andheterogeneous platform. The above works do not considerthe data confidentiality of application or the access controlpolicy, which restricts permissions for a given task. A keyfeature that differentiates our scheduling algorithms fromother approaches is that the privilege set for each task is themain parameter considered in the scheduling decision.

Access Control in Cloud Datacenter: Several re-searchers have addressed access control issues for cloudcomputing. Nurmi et al. [22] have provided an authoriza-tion system to control the execution of VMs to ensure thatonly administrators and owners could access them. Bergeret al. [23] have proposed an authorization model based onboth RBAC and security labels to control access to shareddata, VMs, and network resources. Calero et al. [24] havepresented a centralized authorization system that providesa federated path-based access control mechanism. Whatdistinguishes our work is that we address the problemsof virtual resource vulnerabilities in the presence of multi-tenancy and virtualization.

Vulnerability Models: VDMs are used to estimate theprobability of software vulnerability. The authors in [25]have proposed an attack surface metric that measures the

security of software by analyzing its source code and findthe potential flows. In [17] the vulnerability discovery ismodeled based on the empirical analysis of history of actualvulnerability dataset and followed by predicting the vulner-ability discovery. The source code quality and complexityare incorporated in the VDM to generate better estimation[26]. CVSS [27] metrics have been extensively used by bothindustry and academia to score the vulnerabilities of thesystems and estimate risks. The US National VulnerabilityDatabase (NVD) provides a catalog of known vulnerabilitieswith their CVSS scores.

3 ROLE BASED ACCESS CONTROL MODEL

The RBAC policy defines permissions on objects based onroles in an organization. The policy is composed of a set ofUsers (U), a set of Roles (R), and a set of Permissions (P).UA is a user-to-role assignment relation, and PA is a role-to-permission assignment relation. An RBAC access controlpolicy for a cloud datacenter defines permissions to accessdata objects for its roles [28]. Such assignment is formallydefined as follow.Definition 3.1. Given an RBAC policy P for a datacenter

where R is the set of roles and given O being the setof data objects, the role-to-permission assignment PAcan be represented as a directed bipartite graph G(V,E),where V = R∪O s.t. R∩O = φ. The edges erioj ∈ E inG represents the existence of role-to-permission assign-ment (ri×oj) ∈ PA in the RBAC policy P , where ri ∈ Rand oj ∈ O.

The out-degree of a role vertex represents the cardinalityof the role and the in-degree of a data object vertex repre-sents the degree of sharing of that object among roles. InFigure 3a, an RBAC policy with |R| = 4 and |O| = 10 isrepresented as bipartite graph model. We can notice that thecardinality of role r1 is out-degree(r1) = 6. Also, the degreeof sharing of data object o10 is in-degree(o10) = 4.

The resource assignment component of VRM requiresthe cardinality of shared data objects among roles, as dis-cussed in Section 7. For datacenters, determining thesecardinalities from the bipartite graph is a computationalintensive task. We propose an alternative representation ofRBAC by clustering all data objects that are accessed by thedifferent roles into a set of non-overlapping partitions. Theset of cardinalities of these partitions, called W , is termedas spectral model for RBAC policy. This model is formallydefined as follows.Definition 3.2. (RBAC spectral model). Given a bipartite

graph representation of RBAC policy G(V,E), let P (R)be the power set of R excluding the null set φ. Thespectral representation of RBAC is the vectorW , indexedby P (R) and lexicographically ordered. Formally, letp ∈ P (R) be a set of roles, then wp ∈ W is definedas follows:

wp = |{ok : ok ∈ O|∀ri ∈ p ∃eriok ∈ E}|and |W| = 2n − 1.

(2)

The spectral model provides two advantages over thebipartite graph model. First, it can be used to characterizean RBAC policy in terms of sensitivity of a datacenter using



5

o1 o2 o3 o4 o5 o6 o7 o8 o9 o10

r1 r2 r3 r4

Objects

Roles Per

mis

sion

Ass

ignm

ent

(a) Example of RBAC permission assignment.

w1,2,3,4= 1

w1= 1

w2= 2

w3= 1

w4= 0

w1,2= 0

w1,3= 0

w1,4= 2

w2,3= 0

w2,4= 0

w3 ,4= 1

w1,2,3= 1

w1,2,4= 1

w1,3,4= 0

w2,3,4= 0

(b) Spectral lattice representation of RBAC.

Fig. 3: RBAC policy representation.

a single parameter. Based on the degree of sharing amongroles, the datacenter sensitivity can be high, medium, or lowas elaborated later in Section 5. In addition, the spectral modelallows resource assignment based on a given percentageof datacenter. Varying this percentage can lead to variablecomplexity of an assignment algorithm.

The set W can be generated from the bipartite graphmodel of RBAC. The members wp ∈ W are non overlappingand can be viewed as vertices of a lattice (i.e. binary n-cube)with n levels, where n is the number of roles. For examplenodes at level 1 of the lattice represent the cardinalities ofpartitions corresponding to unshared data objects belongingto individual roles. The nodes at level 2 represent the car-dinalities of data partitions that are accessed by two roles.Similarly, the nodes at level n contain data objects shared byall the roles. The nodes ofW can be indexed using the roleIDs associated with the partition as shown in Figure 3b. Itcan be observed that indices ofW vector are subsets of P (R)and its elements are the cardinalities of partitions that can beaccessed by all the roles in these subsets. Note,

∑∀wp∈W wp

equals to the total size of datacenter. The following exampleillustrates the spectral model.Example 3.1. In Figure 3a, an access control policy with|R| = 4 and |O| = 10 is shown as a bipartite graph.The spectral model is shown as lattice in Figure 3b.Notice that w{1,4} = |{o5, o6}| = 2 because o5 and o6are accessed by both r1 and r4.

We now define the Attackability (AT) of a spectral par-tition. Attackability measures the maximum amount datathat the adversaries (roles) can potentially access in anunauthorized manner. Formally, this parameter is definedas follows.Definition 3.3. Given spectral representation of access con-

trolW , the Attackability of wp ∈ W is:AT (wp) = wp × (n− |p|)

Example 3.2. In Figure 3b, the Attackability for w{2} andw{1,2,4} is as follows:AT (w{2}) = 2× 3 = 6; AT (w{1,2,4}) = 1× 1 = 1

We can notice that as more roles have access permissionsof a partition, its Attackability decreases. Note, that in thespectral representation of RBAC, the tuples with highest ATbelong to a partition in level n

2 .

4 WORKLOAD ESTIMATION FOR RBAC POLICY

For a datacenter, specifying the exact value of wp’s can becomputationally intensive. One practical approach is to use

cardinality estimation techniques [29]. For example, for atransactional workload, the selectivity estimation of queryprocessing for a datacenter can be used for query size esti-mation [29], whereby query (or a collection of queries) cancorrespond to a role. In case role mining is used to designa RBAC policy, the role mining technique such as multi-assignment clustering [30] can provide an estimate ofW . Inthis paper, we assume the access of data objects in a data-center follows a Zipfian distribution, an assumption that issupported by Yahoo! Cloud Serving Benchmark (YCSB) [31].YCSB uses five types of workload to benchmark differentcloud data serving systems. For four of these workloads, itis assumed that data access follows the Zipfian distribution.Only one of the workloads uses the Lates distribution whichis similar to the Zipfian distribution except that the mostrecently inserted records are in the head of the distribution.

Motivated by this real world benchmark, our experimentworkload follows Zipfian distribution. According to thisdistribution, some objects are shared by a large number ofroles (queries) while most of the objects are shared among afewer number of roles (queries). Hence, this distribution canprovide a heterogeneous workload for RBAC. The Zipfiandistribution is given as follows:

f(α; s,N) =α−s∑Ni=1 i

−s(3)

Where, N : maximum rank, α :selected rank, s : sensitivityparameter.

In Equation 3, if parameter s = 1, then the probabilitythat a data object is assigned to a single role (belongs to rankα = 1) doubles the probability that the same data object isassigned to two roles (belongs to rank α = 2). As value ofs increases, the number of data objects exclusively assignedto an individual role becomes larger as shown in Figure 4.Note, the value of s should be greater than or equal to 1.

The Zipfian distribution can be used to generate hetero-geneous RBAC-based workload in two steps as proposedin Algorithm 1. In the first step, data objects are classifiedinto n buckets where each bucket represents the numberof total data objects assigned to a lattice level of Figure3b. For example, data objects in bucket 1 are exclusivelyaccessed by only one role. On the other hand, data objectsin bucket n are shared by all roles. The number of dataobjects in each bucket follows Zipfian distribution. In thesecond step, we assign data objects of bucket i to randomlyselected partitions at level i of the lattice. Note, the numberof partitions at level i is

(ni

).



6

Algorithm 1: Workload generation algorithmInput: Number of data objects |O|, number of roles n,

constant s.Output: spectral representation of RBACW .

1. Let B = {B1, . . . , Bn} bucket array;2. foreach i = 1, . . . , |O| do3. α = zipf(n,s);4. Bα = Bα + 1;

5. foreach i = 1, . . . , n do6. foreach j = 1, . . . , Bi do7. α = zipf(

(ni

),s);

8. map α to random partition in level i call it p;9. wp = wp +1;

10. add wp toW11. returnW

5 SENSITIVITY OF DATACENTER USING SPECTRALMODEL AND RISK MANAGEMENT

Based on the statistical property of access control workload,we propose the notion of data sensitivity based classificationof cloud datacenters. The sensitivity classification is depen-dent on the level of sharing of data objects among the roles.In particular, we define the sensitivity of a datacenter asthe average degree of sharing among its data objects. Incase, the average degree of sharing is low, we term thedatacenter to have a high sensitivity. On the contrary, ifextensive sharing of data objects among roles is present, weterm this datacenter to have a low sensitivity. The mediumsensitivity class falls in the middle. The goal is to minimizethe total risk in a highly sensitive datacenter environment.It can be noticed that the sharing of data objects and theclassification of a datacenter can be modeled using theZipfian distribution. The key parameter to characterize thesensitivity of datacenter is the scalar parameter s of theZipfian density function shown in Equation 3. As shownin Figure 4, for a smaller value of s, more data objects areuniformly distributed in theW vector of the spectral modelof RBAC. In the following example, we illustrate how theZipfian parameter s can be used to classify the sensitivity ofa datacenter.

Example 5.1. For a datacenter with 0.5 × 106 data objects,suppose we have three RBAC policies (P1, P2, P3) eachwith n = 30 roles. Figure 4 shows a histogram ofdata objects across the spectral lattice. Depending onthe Zipfian distribution, the three classes of datacenters,namely; High Sensitive Datacenter (HSD), Medium Sen-sitive Datacenter (MSD), and Low Sensitive Datacenter(LSD), can be identified with respect to policies P1,P2, and P3. For example, HSD has a large value of s(s ≥ 2) since the sharing of data objects among P1 rolesis very small. On other hand, LSD has a small value ofs (1.5 > s ≥ 1) depicting the case of extensive sharingof data objects among roles of policy P3. The MSD has avalue of s which falls in the middle (2 > s ≥ 1.5 ).

Fig. 4: A statistical characterization of sensitivity of clouddatacenters

6 INTEGRATED SERVICE-VM VULNERABILITYMODEL

In order to quantify the probability of data leakage associ-ated with intra- and inter-ISVM vulnerabilities, we use twomodels namely, Weakest Link model [32] and Dependencymodel [16] based on Common Vulnerability Scoring System(CVSS) [27].

6.1 Weakest Link ModelThe Weakest Link model is to identify the vulnerability thatrequires the least amount of effort to exploit in order toobtain the desired level of privilege resulting in data leakageto the attacker. Selecting the Weakest Link involves choosingthe single most critical vulnerability (as designated by theCVSS Base Score) of every system [32]. If there are severalvulnerabilities with the same Base Score but different ImpactScore or Exploitability Score, an expert should be presentto judge their severity. It is not known whether the easeof exploitation (CVSS Exploitability Score), the effect of anattack (CVSS Impact Score), or a combination of the two(CVSS Base Score) is the most valuable to an attacker.

6.2 Dependency ModelThe Weakest Link approach ignores casual and dependencyrelationships smog vulnerabilities. Such an approach maycause useful semantics of individual scores to be lost. Thedependency relationship of vulnerabilities being ignored orhandled in an arbitrary way brings doubts to the metric re-sults and prevents their adoption. Moreover, only the attackprobability is considered for aggregating CVSS scores in theWeakest Link approach which may limit the scope of theapplication and may give misleading results since differentaspects demand different algebra for aggregating the scores.This issue can be addressed by interpreting and aggregatingCVSS scores from three aspects namely, probability, effort,and skill [16].Definition 6.1. (The Cloud ISVM Model). Given the set of

l services S = {s1, . . . , sl} hosted in the cloud andV = {v1, v2, ..., vm} as a set of m virtual machinesavailable to VRM, let di,j,k,q represent the probabilityof data leakage among ISVM bi-tuple pairs (si,vj) and(sk,vq). This probability is estimated by the resourcevulnerability estimation component as shown in Figure1a. The di,j,k,q can be computed based on Weakest Linkmodel or dependency model. Accordingly, in order to



7

compute all inter- and intra-ISVM data leakage proba-bilities due to inter- and intra-ISVM vulnerabilities, theISVM leakage probabilities can be modeled as a fullyconnected undirected graph G(Q,E) where vertices inQ = {(si, vj) : si ∈ S and vj ∈ V } represent ISVM andthe weights on the edges represent di,j,k,q .

v1,s1 v2,s2

v3,s1

d1311 d1322

w12 w12

Risk(w12) = w12* max(d1311,d1322)

r1 r2

r3

(a) Max model

v1,s1 v2,s2

v3,s1

d1311 d1322

w12 w12

Risk(w12) = w12* (1- ((1-d1311),(1-d1322)))

r1 r2

r3(b) Parallel model

Fig. 5: Examples of risk probability in max and parallelmodels.

7 RESOURCE ASSIGNMENT

Based on G(Q,E) graph, two models namely, max andparallel, can be used to estimate the risk of data leakageas illustrated in Figure 5a and 5b. Accordingly, we proposetwo cost functions namely RASP-MAX and RASP-PAR. Weuse both of these models and do not prefer one over theother as they provide independent cost functions.

7.1 Risk-aware Scheduling Problem - MAX (RASP-MAX)

The Risk-aware Scheduling Problem (RASP-MAX) is con-structed with the following information given:• spectral representationW of a RBAC policy• a role to service requirement matrix Γri,sj depicting

that role ri requires service sj• the compatibility matrix C , where Csi,vj indicates

which service si is compatible with VM vj• and the vulnerability matrix represented as G(Q,E)

providing the probabilities of data leakage in terms ofdi,j,k,q

Then RASP-MAX is to minimize the total risk Risk ofdata leakage by allocating the virtual resource to all the

required services assuming all roles are active such that eachvirtual machine can host only one compatible service at atime. Formally, the optimization problem is:

Minimize Risk =∑

wp∈WRisk(wp)

=∑

wp∈Wwp ×

∑ra∈R

Threat(p, ra)×maxA

B

Where A : ∀si, sk ∈ S, ∀vj , vq ∈ V,∀rb ∈ pB : di,j,k,q × Ira,si,vj × Irb,sk,vq

(4)

Such that:

1) Ira,si,vj × Irb,sk,vj = 0, when si 6= sk∀ra, rb ∈ R, ∀si, sk ∈ S, ∀vj ∈ V2)∑

vj∈V Ira,si,vj = 1, when Γra,si = 1∀ra ∈ R, ∀si ∈ S3) Ira,si,vj = 0, when Csi,vj

= 0 ∀ra ∈ R

Where:R,V ,S, and d as given in Table1 in Appendixwp: is number of tuples accessed by roles in set p as definedin Equation 2

Ira,si,vj =

{1 if ra is assigned to ISVM (si,vj)0 Otherwise

Γra,si =

{1 if ra requires service si0 Otherwise

Csi,vj =

{1 if service si can run on virtual machine vj0 Otherwise

Threat(p, ra) =

{1 if ra /∈ p0 Otherwise (see Equation 1)

Theorem 7.1. RASP-MAX problem is NP-complete.

7.2 Risk-aware Scheduling Problem - PAR (RASP-PAR)

The Risk-aware Scheduling Problem (RASP-PAR) is con-structed as follows. Given spectral representation W , therequirement matrix Γ, the compatibility matrix C , and thevulnerability matrix represented as G(Q,E), then RASP-PAR is to minimize the total risk Risk of data leakage byallocating the virtual resource to all the required services.

Minimize Risk =∑

wp∈WRisk(wp)

=∑

wp∈Wwp ×

∑ra∈R

Threat(p, ra)× (1−∏A

B)

A : ∀si, sk ∈ S, ∀vj , vq ∈ V,∀rb ∈ pB : (1− di,j,k,q)× Ira,si,vj

× Irb,sk,vq(5)

All the constraints of Equation 4 are also applied here. Itcan be noted that the notion of attackability is also capturedin the two summations shown in Equation 5.

Theorem 7.2. RASP-PAR problem is NP-complete.



8

8 ASSIGNMENT HEURISTICS

In this section, we propose three heuristics for solving theRASP which can be deployed by the resource assignmentcomponent of VRM as shown in Figure 1a. The first heuristicis called the Best Fit Heuristic (BFH) and uses the best fitstrategy. In BFH, each role is assigned to the best availablevirtual machine such that its probability of leakage and anyincrease in the total risk is kept to minimum. The secondheuristic is called Single Move Heuristic (SMH) which re-cursively improves the assignment solution based on InitialAssignment (IA) till no further solution is possible (may notbe optimal). For further refinement to SMH, we proposeMulti Move Heuristic (MMH).

Since all the three heuristics use the spectral model andas |W| = 2n − 1, in order to reduce the complexity of BFH,SMH and MMH, we propose an approximation strategy forworkload characterization. The strategy is based on consid-ering a smaller percentage of the total size of datacenter. Letsuch percentage be denoted as D. In particular, D identifiesthe cutoff level k in the lattice of P (R) which can be used byBFH, SMH and MMH. The following lemma describes theprocess of determining this cutoff k.

Lemma 8.1. Given W (spectral representation of RBAC), s(the scalar parameter of the Zipfian distribution), and D(the percentage of data considered), the cutoff level kin the lattice P (R) is defined as k = min0≤k≤nHk,s ≥D100 × Hn,s where Hk,s =

∑ki=1 i

−s is kth generalizedHarmonic number.

For a given value D of a datacenter, the spectral vectorW needs to be truncated accordingly as detailed in Lemma8.1. The truncated vectorW ′ is defined as follows.

Definition 8.1. We define W ′ to consist of all the partitionsof W starting from level one up to and including thecutoff level k in the lattice of Figure 3b. The vector W ′is formally defined as: W ′ = {wp|(wp ∈ W) ∧ |p| ≤ k}where the size of W ′ can be approximated in term ofcutoff k, as giving by the following lemma.

Lemma 8.2. |W ′| ≤ (n+ 1)k

Note, different sensitivity classes of datacenter yield dif-ferent cutoff levels for the same percentage D. Accordingly,the size ofW ′ varies. The example below illustrates how Dand the sensitivity classes can affect the value of k.

Example 8.1. For W of Example 5.1, when D = 70%; thecutoff (k) is 2 for HSD and is 8 for LSD. For D = 95%the cutoff for MSD is 18 as compared to 9 and 24 forHSD and LSD, respectively.

In following section, we describe BFH, SMH and MMHusing a two-phase approach. In phase one, a heuristic usesa greedy approach to generate an assignment based onpartitions in W ′. In phase two, all the remaining partitions(W −W ′) are assigned automatically with their associatedroles. In Section 9, we provide a trade off between theaccuracy of risk assessment based on the total workloadrepresented by W vs. approximate workload representedbyW ′.

Algorithm 2: Best Fit HeuristicInput: Spectral representation of RBACW ′, vulnerability

graph G(Q,E), set of roles R, set of services S,requirement matrix Γ, compatibility matrix C, andinitial assignment matrix IA.

Output: A new assignment matrix I of roles to serviceand VM pairs

1. I = IA;2. Let A = {(ra, si)} ∀ra ∈ R, ∀si ∈ S where Γra,si = 1 ;3. while A 6= φ do4. Randomly select (ra, si) pair from A;5. Let vList = {vq} where Csi,vq = 1 for all vq ∈ V ;6. foreach vq ∈ vList do7. if ∀rb ∈ R, Irb,sj ,vq = 1 ∧ sj 6= si then8. remove vq from vList;

9. Compute function RISK(ra, si, vq) for all vq ∈ vList;10. Let vq be the VM with minimum risk;11. if Ira,si,vq == 0 then12. Ira,si,vq = 1;

13. A = A− (ra, si);

14. return I;15. Function RISK(ra, si, vq)16. let t =0;17. foreach rb ∈ R do18. construct vul[ra][rb] based on assignment of ra to

(si, vq)

19. foreach wp ∈ W ′ do20. foreach rb /∈ p do21. let max = 0;22. foreach rm ∈ p do23. if vul[rm][rb] ≥ max then24. max = vul[rm][rb];

25. let t = t+ wp× max

26. return t;

8.1 Initial Assignment (IA)Initial assignment is a random assignment of services toVMs in order to find a feasible solution. IA is generated asfollows: Given compatibility matrix C , IA represents C as abipartite graph with two sets of vertices namely S (services)and V (VM’s) such that there is and edge from service si tovirtual machine vj when Csi,vj = 1. IA finds a maximumbipartite matching. The problem has a feasible solution if themaximum bipartite matching cardinality equals |S|. Fromrequirement matrix Γ, if a role ra is requiring service si thenIAra,si,vj

= 1 such that (si, vj) pair is the one produced bythe maximum bipartite matching. This assignment guaran-tees a feasible solution however it may incur high risk.

8.2 Best Fit Heuristic (BFH)The algorithm for BFH is presented in Algorithm 2. Giventhe initial assignment IA, BFH randomly selects a role andservice pair (ra, si) from the requirement matrix Γ and findsa new assignment with maximum reduction in resultingrisk. We now explain the operation of the Algorithm asfollows. The initial assignment matrix IA is assigned to thenew assignment matrix I (Line 1). In Line 2, the algorithmconstructs a set A which represents elements of Γ for whichΓra,si = 1. A random (ra, si) pair is selected from set A(Line 4). A list of VMs, named vList, is then constructed



9

Algorithm 3: Single Move Heuristic

Input: Spectral representation of RBACW ′, vulnerabilitygraph G(Q,E) , set of roles R, set of services S,requirement matrix T , compatibility matrix C, andinitial assignment matrix IA.


1. flag = 1;2. I = IA;3. while flag = 1 do4. Let A = {(ra, si)} ∀ra ∈ R, ∀si ∈ S where

Γra,si = 1 ;5. flag =0;6. while A 6= φ and flag =0 do7. Randomly select (ra, si) pair from A;8. Let vList = {vq} where Csi,vq = 1 for all vq ∈ V ;9. foreach vq ∈ vList do

10. if ∀rb ∈ R, Irb,sj ,vq = 1 ∧ sj 6= si then11. remove vq from vList;

12. Compute function RISK(ra, si, vq) for allvq ∈ vList;

13. Let vq be the VM with minimum risk;14. if Ira,si,vq == 0 then15. flag =1;16. Ira,si,vq = 1;17. else18. A = A− (ra, si);

19. return I;

with all VMs that are compatible with the service si ex-cluding the VMs that have already been assigned to adifferent service (Lines 5-8). For each VM in the list, the riskis computed (Line 9). The VM with maximum reductionin risk, vq , is selected (Line 10). If the vq is not in thecurrent assignment, it is added to the current assignment.The selected pair (ra, si) is removed from set A in Line 13and the loop in Lines 3-13 iterates until the set A is empty.

In order to find the maximum reduction in risk, BFHcalls function RISK(ra, si, vq) which calculates the totalrisk based on the assignment Ira,si,vq = 1. In Lines 17 and18 the function generates pairwise vulnerability betweenrole ra and all other roles. The function then iterates throughall wp ∈ W ′ and for each wp computes the risk based onthe maximum vulnerability (in case of RASP-MAX) betweenra ∈ p and rb /∈ p in Lines 19 -25. It returns the summationof product of wp and the maximum vulnerabilities based onassignment Ira,si,vq = 1.Lemma 8.3. The complexity of BFH is O(m× l× n3 × |W ′|)

Note, due to its linear complexity in terms of numberof services and number of VMs, BFH provides a scalable 2

solution for both RASP-MAX and RASP-PAR problems.

8.3 Single Move Heuristic (SMH)The algorithm for SMH is presented in Algorithm 3. Wenow explain the operation of this algorithm as follows.The initial assignment matrix IA is assigned to the newassignment matrix I (Line 2). In Line 4, the initial membersofA represent elements of Γ for which Γra,si = 1. A random

2. http://en.wikipedia.org/wiki/Scalability

(ra, si) pair is selected from set A (Line 7). A list of VMsis then constructed with all VMs that are compatible withthe service si excluding the VMs that have already been as-signed to a different service (Lines 8-11). For each VM in thelist, the risk is computed (Line 12). The VM with minimumrisk is selected (Line 13). If the selected VM is not in thecurrent assignment, it is added to the current assignmentand the algorithm returns to Line 4. Otherwise, the randompair (ra, si) is removed from A and algorithm returns toLine 7. The algorithm iterates in the inner loop Lines 6-18until either the it finds a better assignment (flag =1) or theset A is empty which means no further improvement can beachieved for any (ra, si) in A.Lemma 8.4. The complexity of SMH is O(m× l×n4×|W ′|).

8.4 Multi Move Heuristic (MMH)The algorithm for MMH is presented in Algorithm 4. Wenow explain the operation of the Algorithm as follows.The initial assignment matrix IA is assigned to the newassignment matrix I (Line 2). IA is constructed as explainedearlier. The f(vq) function returns the service si hosted byvq based on the current assignment matrix (Line 5). Thealgorithm iterates for each possible pair of VMs (vi, vj). Ifboth vi and vj host the same service (f(vi) = f(vj)), thenall roles assigned to these both VMs are determined (Line 9).The risk is then computed for each role belonging to the twoVMs (Line 10). For a given role, if the risk is less for vi, thenthe role is assigned to vi. Otherwise, the role is assignedto vj (Lines 12-16). If the new assignment different thanthe current assignment, it implies that the algorithm findsan assignment that reduces the risk and updates the newassignment to become the current assignment. Otherwise,the current assignment does not change (Lines 17-19). Iff(vi) 6= f(vj), the algorithm swaps the services if each VMis compatible with other VM’s service (Lines 21). The swaproutine generates a new assignment I and the algorithmupdates the current assignment to the new assignment onlyif there is a reduction in the risk (Lines 23-34). The aboveprocedure is repeated for all possible pairs of VMs. Thealgorithm iterates outer loop (Lines 2 -24) until no furtherrisk improvement can be achieved in the current assignmentfor all possible pairs of VMs.Lemma 8.5. The complexity of MMH isO(m2×l×n4×|W ′|).

8.5 Performance Metrics for Resource AssignmentWe introduce two sets of metrics to compare the perfor-mance of the proposed BFH, SMH and MMH. The first setis concerned with the overall risk exhibited by all roles. Thesecond set of metrics focuses on the risk incurred per role.The formal definitions of these metrics are presented in thissubsection and the experiment results are given in the nextsection.

The data leakage risk is the main metric that needs to beminimized. The risk metric has two sub-metrics which weuse to evaluate the proposed heuristics. The first risk metricis computed from the total workload which represents allthe partitions in W . This risk metric represents the totalrisk and is denoted as T . In other words T representsthe potential risk for the entire datacenter. The second



10

Algorithm 4: Multi Move HeuristicInput: Spectral representation of RBACW ′, vulnerability

graph G(Q,E) , set of roles R, set of services S,requirement matrix Γ , compatibility matrix C,and initial assignment IA.


1. flag = 1;2. I = IA;3. while flag = 1 do4. flag=0;5. Let f(vq) = si when Ira,si,vq = 1;6. foreach vi ∈ V do7. foreach vj ∈ V do8. if f(vi) = f(vj) then9. Let F = {ra} when Ira,f(vi),vq = 1; where

vq ∈ {vi, vj};10. Compute gvqra = RISK(ra, f(vi), vq) for all

ra ∈ F ;11. Let I = I;12. foreach ra ∈ F do13. if gvira ≤ g

vjra then

14. Ira,f(vi),vi = 115. else16. Ira,f(vi),vj = 1

17. if I ! = I then18. flag = 1;19. I = I;

20. else21. if Cf(vi),vj == Cf(vj),vi then22. Let I = I;23. if swap(vi, vj) reduces total risk then24. I = I;25. flag =1;

26. return I;

sub-metric of risk is based on all the partitions of W ′which is used to study the performance of the heuristicsand compare their effectiveness. This sub-metric is calledpartial risk P and corresponds to the risk resulting from theworkload approximation by W ′. Formally, the risk metricsare given as:

T =∑

wp∈WRisk(wp) (6)

P =∑

wp∈W′

Risk(wp) (7)

Note, the difference between total risk and partial riskrepresents the relative risk error introduced by the heuristic.This error is the second performance metric and it is a resultof the heuristic’s lack of knowledge of all the partitions dueto workload approximation (Section 4). We define this erroras:

E =(T − P )

P= (

T

P− 1) (8)

As mentioned in Definition 3.2, the Attackability ATmeasures the maximum amount of data leakage risk for agiven policy. The difference between the Attackability of a

policy and the resulting partial risk represents the quality ofrisk reduction (δ). Formally, we define δ as:

δ = AT − P, whereAT =∑

wp∈W′

AT (wp) (9)

Note, δ provides a quality measures of a heuristic in terms ofits partial risk P as compared to the maximum incurred risk(Attackability). A high value of δ implies that the resultingP is far from the maximum risk, thus the heuristic performswell.

The second set of metrics are concerned with perfor-mance of heuristics at the role level. One such metric is theDiscriminator Index (DI) which is an indirect indicator ofperformance of heuristic in term of risk reduction per role.Semantically, DI not only implies the role quality of reduc-tion but also disproportionality in terms of risk managementi.e. a heterogeneous bias in managing the risk. Formally, itis defined as following:

Definition 8.2. Given a scheduling heuristic, DI for a heuris-tic can be formally defined by extending the definitionof generic discriminator index in [33]. For this purpose,given a role ri we define the role attackability (ATi), therole partial risk (Pi), and the role quality of risk reductionδi. Let:ATi =

∑wp∈W′∧ri∈pAT (wp)

Pi =∑

wp∈W′∧ri∈pRisk(wp)δi = ATi − Pi

Accordingly, DI is defined as

DI = 1−(∑n

i=1 δi)2

n×∑n

i=1(δi)2(10)

It can be noticed that wp’s are ”non linearly” dependenton parameter s as given by Equation 3 and Algorithm1 (Line 7-9). Accordingly both the cost functions givenin Equations 4 and 5 are non-linearly dependent on s.Therefore any heuristic designed to solve these problemsshould reduce the overall risk ”non-linearly” with the threeclasses of datacenters i.e. HSD, MSD and LSD. As men-tioned earlier, the Attackability (AT) parameter of thesethree classes namely HSD, MSD and LSD decreases intheir respective order. Using this order, we can evaluate thequality of the solution of these heuristics based on their riskreduction and how sensitive is such reduction in terms ofs,D, n and m. In particular, the sensitivity analysis focuseson the number of roles and number of VMs.

9 PERFORMANCE EVALUATION

In this section, we present a detailed evaluation of resultsin terms of s (the sensitivity parameter of the Zipfiandistribution), D and the size of the problem i.e. number ofVMs (m), and roles (n). These are independent parameterswhich are critical to the risk management of the RASP-MAXand RASP-PAR problems. The results are organized inthree categories based on classification of cloud datacentersi.e. HSD, MSD, and LSD as mentioned in Section 5. Foreach category, we present the performance results of threeheuristic algorithms for the metrics P , E and DI . Theseresults are presented for varying percentage D.



11

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)

% Size of Database (D)

AT IA BFH SMH MMH

(a) LSD

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)


AT IA BFH SMH MMH

(b) MSD

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)


AT IA BFH SMH MMH

(c) HSDFig. 6: Partial Risk P with a problem size of 70 roles and 50 VMs for RASP-MAX.

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(a) LSD

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(b) MSD

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(c) HSDFig. 7: Error E with a problem size of 70 roles and 50 VMs for RASP-MAX.

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(a) D=70%

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(b) D=90%

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(c) D=95%Fig. 8: Discrimination Index DI with a problem size of 70 roles and 50 VMs for RASP-MAX.

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)


AT IA BFH SMH MMH

(a) LSD

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)


AT IA BFH SMH MMH

(b) MSD

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)


AT IA BFH SMH MMH

(c) HSDFig. 9: Partial Risk P with a problem size of 120 roles and 200 VMs for RASP-MAX.

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(a) LSD

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(b) MSD

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(c) HSDFig. 10: Error E with a problem size of 120 roles and 200 VMs for RASP-MAX.



12

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(a) D=70%

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(b) D=90%

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(c) D=95%Fig. 11: Discrimination Index DI with a problem size of 120 roles and 200 VMs for RASP-MAX.

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)


AT IA BFH SMH MMH

(a) LSD

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)


AT IA BFH SMH MMH

(b) MSD

0 2e+007 4e+007 6e+007 8e+007 1e+008

1.2e+008 1.4e+008 1.6e+008 1.8e+008

65 70 75 80 85 90 95 100

Part

ial R

isk

(P)


AT IA BFH SMH MMH

(c) HSDFig. 12: Partial Risk P with a problem size of 120 roles and 200 VMs for RASP-PAR.

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(a) LSD

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(b) MSD

0

0.2

0.4

0.6

0.8

1

65 70 75 80 85 90 95 100

Erro

r (E

)


BFH SMH MMH

(c) HSDFig. 13: Error E with a problem size of 120 roles and 200 VMs for RASP-PAR.

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(a) D=70%

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(b) D=90%

0

0.2

0.4

0.6

0.8

1

1 1.2 1.4 1.6 1.8 2

Dis

crim

inat

ion

Inde

x (D

I)

Sensitivity (s)

BFH SMH MMH

(c) D=95%Fig. 14: Discrimination Index DI with a problem size of 120 roles and 200 VMs for RASP-PAR.

9.1 Experimental Setup

In our evaluation, we use three types of cloud datacentersi.e. HSD, MSD, and LSD as mentioned in Section 5. In ad-dition, we vary the percentage of datacenter D from 70% to95% where the complexity of the problem increases with D.To generate ISVM matrix values, we assume that probabilityof data leakage among VMs belonging to different IaaSproviders is negligible. We generate four classes of inter-ISVM vulnerabilities representing four IaaS providers secu-rity control [5]. In addition, the intra-ISVM vulnerabilitiesare randomly generated from four security groups, namedhighly secure, medium secure, low secure, and insecure.

All intra-ISVM vulnerabilities are higher than inter-ISVMvulnerabilities due to the fact that in general, virtualizationisolation is more secure than application/VM isolation. Auniformly random IA used in terms of bipartite graph rep-resenting the VM-services compatibility matrix is generatedsuch that each service is compatible with at least one VMand each VM is compatible with at least one service. Also,the role-service requirement is randomly generated usinga uniform distribution such that each role requires at leastone service and each service is required by one or moreroles. In this experiment, we have two different problem sizesettings in terms of number of roles and number of VMs.The first problem size assumes 70 roles and 50 VMs and the



13

second problem size, which is the larger scale problem size,assumes 120 roles and 200 VMs. The number of services is20 in both cases.

9.2 Results for RASP-MAXIn this section, we discuss the aforementioned performancemetrics i.e. Partial Risk (P ), Relative Risk Error (E), andDiscrimination index (DI).

9.2.1 Partial Risk (P )For the initial assignment (IA), we observe P increases ass increases from a low value to a high value correspondingto LSD and HSD cases respectively as shown in Figure 6.The reason being that Attackability (AT ) as per Definition3.4 and Algorithm 1 (Line 7-9) increases with s for a givenD. Since the initial assignment is a random assignment andit only ensures feasibility, it follows the attackability curve.However, among the three heuristics in general, SMH out-performs BFH since it iterates several times until obtaininga local minimum in comparison to BFH that iterates onlyonce. MMH always outperforms other heuristics since itcan change multiple assignments simultaneously for eachiteration in comparison to SMH which changes only oneassignment and hence as expected, MMH performs better.We observe that P is higher for LSD than HSD. Also,P converges with increasing s and D. Furthermore, thefollowing observations of P and AT from Figure 6 areexplained in terms of the quality of reduction (δ) as definedin Equation 9:• For a given heuristic, δ increases with D. The reason be-

ing that with the increase inD, the number of partitionsi.e. wis considered by the heuristics increases resultingin an improved assignment decision.

• For a given D, δ increases with s. The reason being thatwith increasing value of s, heterogeneity of partitions ismore pronounced resulting in an improved value of δ.

9.2.2 Relative Risk Error (E)The metric E elucidates the effect on assignment decisionswhen (W − W ′) partitions of data are not considered bythe proposed heuristics. We observe from Figure 7 that Edecreases as D increases. The reason being that as morepartitions are considered for assignment decision, the dif-ference between T and P is less thus reducing E. In otherwords, excluding (W−W ′) partitions of data result in worseassignment decisions and hence resulting in a high value ofE. In addition, we can notice for a given D, E increaseswith increasing s. As mentioned above, AT is higher forHSD as compared to LSD. Hence, if (W − W ′) partitionsare not considered by the heuristic for the case of HSD, theresulting E is higher in comparison to LSD for the samepartitions.

9.2.3 Discrimination Index (DI)The effect of D and s on heuristics is further illustratedusing the DI metric. We observe from Figure 8 that DIincreases as s increases from low to high corresponding toLSD and HSD cases respectively. The reason being that DIas mentioned earlier can be viewed as the degree of hetero-geneity among the quality of risk reduction (δis) associated

with role ri. For the case of HSD, the heterogeneity of rolesattackabilitiy (ATi) is higher than the case for LSD due tohigh value of s for HSD. We expect δi heterogeneity to behigher in HSD than LSD. According to Equation 10, thisleads to a higher value of DI for HSD compared to LSD. Wealso observe that DI decreases as D increases from low tohigh i.e. from 70% to 95%. Note, for different percentage D,the cutoff parameter k in Lemma 8.1 is less in lower D ascompared to higher D. Therefore, as k increases, the hetero-geneity of the RBAC policy in term of wis decreases. SinceDI represents such heterogeneity, DI for a low percentageD is higher as compared to high percentage D. For MSD(s = 1.5), an important observation (as shown in Figure 8) isthat there is a rapid shift in DI parameter form D = 70% toD = 95%. The figure displays the interplay of parameterDIandD for a medium value of s. In essence, for small value ofD, the heterogeneity of the RBAC policy is similar betweenMSD and HSD. As D increases, the MSD heterogeneity is inthe middle of LSD and HSD.

Same observations as mentioned above with respect toP , DI and E parameters can been made while increasingthe problem size in terms of number of roles and VMs.For completeness purpose, we provide Figures 9, 10, and11 that depict the results for large scale problems. Forlarge problems, we observe that the overall difference in Pappears to diminish and the performance of heuristics con-verges. This is because while increasing the number of VMs,there are more options for assignment and hence all theheuristics tend to perform almost equally good. However,for a small number of VMs i.e. for a smaller problem size,the heuristic with a higher intelligence and high complexityyields an improved performance as is the case with MMH.Hence, we conclude that for a large problem size, BFH is apreferable choice since it is; scalable (Lemma 8.3), performsequally good as other heuristics in terms of P (Figure 9), andits resulting E for a low value of D is also low (Figure 10).

9.3 Results for RASP-PAR

Regarding the performance of the three heuristics, we ob-serve similar phenomenons with RASP-PAR as observedwith RASP-MAX with respect to P , DI and E parameters.For example MMH consistently outperforms other heuris-tics. Note, the value of E for BFH case is lower than thevalues of the other two heuristics. The performance resultsfor RASP-PAR are shown in Figures 12, 13, and 14. It canbe noted that although no cross comparison between RASP-MAX and RASP-PAR is intended since they represent costfunctions of two different models of ISVM vulnerability,RASP-PAR produces higher risk than RASP-MAX strictlyby definition of ISVM and not by comparison. By definition,due to highly parallel terms to compute vulnerability, itresults in a high value of P as depicted in Figure 12. Forlarger problems, even for RASP-PAR, BFH is a better choicefor the three reasons given for the case of RASP-MAX.

10 CONCLUSION

The security of datacenters is becoming a primary issue indeploying cloud cloud computing . In particular, the vir-tualization and multitenancy features of cloud computing



14

that allow sharing of resources among potentially untrustedtenants exacerbate the security challenges in terms of in-creased risk of data leakage. In this paper we have presenteda formal model for RBAC policy which was casted for for-mulating a risk aware VM assignment problem. We proposethe notion of sensitivity in datacenters with the objectiveof minimizing the risk of data leakage. We present three as-signment heuristics and compare their relative performance.

Acknowledgement: This work was partially supportedby NSF grant IIS-0964639.

REFERENCES

[1] F. Hu, M. Qiu, J. Li, T. Grant, D. Tylor, S. McCaleb, L. Butler, andR. Hamner, “A review on cloud computing: Design challengesin architecture and security,” Journal of Computing and InformationTechnology, vol. 19, no. 1, pp. 25–55, 2011.

[2] C. V. W. Group, “Cloud computing vulnerability incidents: Astatistical overview,” Technical Report, March, 2013.

[3] J. Kincaid, “Google privacy blunder shares your docs withoutpermission,” TechCrunch, March, 2009.

[4] H. Takabi, J. B. Joshi, and G.-J. Ahn, “Security and privacychallenges in cloud computing environments,” Security & Privacy,IEEE, vol. 8, no. 6, pp. 24–31, 2010.

[5] D. Gonzales, J. Kaplan, E. Saltzman, Z. Winkelman, and D. Woods,“Cloud-trust - a security assessment model for infrastructure asa service (iaas) clouds,” Cloud Computing, IEEE Transactions on,vol. PP, no. 99, pp. 1–1, 2015.

[6] S. Soltesz, H. Potzl, M. E. Fiuczynski, A. Bavier, and L. Peter-son, “Container-based operating system virtualization: a scalable,high-performance alternative to hypervisors,” in ACM SIGOPSOperating Systems Review, vol. 41, no. 3, 2007, pp. 275–287.

[7] D. Ardagna, B. Panicucci, and M. Passacantando, “Generalizednash equilibria for the service provisioning problem in cloudsystems,” Services Computing, IEEE Transactions on, vol. 6, no. 4,pp. 429–442, 2013.

[8] J. Hall. (2015) Role-based access control in the microsoft azureportal. [Online]. Available: https://azure.microsoft.com/en-us/documentation/articles/role-based-access-control-configure/

[9] A. Almutairi, M. Sarfraz, S. Basalamah, W. Aref, and A. Ghafoor,“A distributed access control architecture for cloud computing,”Software, IEEE, vol. 29, no. 2, pp. 36–44, 2012.

[10] S. D. S. K. Shannon Connaire, Jiri Weiss. (2015) Oraclesales cloud securing oracle sales cloud. [Online]. Available:http://docs.oracle.com/cloud/latest/salescs gs/OSCUS.pdf

[11] M. Balduzzi, J. Zaddach, D. Balzarotti, E. Kirda, and S. Loureiro,“A security analysis of amazon’s elastic compute cloud service,”in Proceedings of the 27th Annual ACM Symposium on AppliedComputing. ACM, 2012, pp. 1427–1434.

[12] B. Grobauer, T. Walloschek, and E. Stocker, “Understanding cloudcomputing vulnerabilities,” Security & privacy, IEEE, vol. 9, no. 2,pp. 50–57, 2011.

[13] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, “Hey, you,get off of my cloud: exploring information leakage in third-partycompute clouds,” in Proceedings of the 16th ACM conference onComputer and communications security. ACM, 2009, pp. 199–212.

[14] G. Stoneburner, A. Goguen, and A. Feringa, “Risk managementguide for information technology systems,” Nist special publication,vol. 800, no. 30, pp. 800–30, 2002.

[15] K. Djemame, D. Armstrong, J. Guitart, and M. Macias, “A riskassessment framework for cloud computing,” Cloud Computing,IEEE Transactions on, vol. PP, no. 99, pp. 1–1, 2014.

[16] P. Cheng, L. Wang, S. Jajodia, and A. Singhal, “Aggregating cvssbase scores for semantics-rich network security metrics.” in SRDS,2012, pp. 31–40.

[17] O. H. Alhazmi, Y. K. Malaiya, and I. Ray, “Measuring, analyzingand predicting security vulnerabilities in software systems,” Com-puters & Security, vol. 26, no. 3, pp. 219–228, 2007.

[18] X. Qin and T. Xie, “An availability-aware task scheduling strat-egy for heterogeneous systems,” Computers, IEEE Transactions on,vol. 57, no. 2, pp. 188–199, 2008.

[19] T. Xiaoyong, K. Li, Z. Zeng, and B. Veeravalli, “A novel security-driven scheduling algorithm for precedence-constrained tasks inheterogeneous distributed systems,” Computers, IEEE Transactionson, vol. 60, no. 7, pp. 1017–1029, 2011.

[20] S. Song, K. Hwang, and Y.-K. Kwok, “Risk-resilient heuristicsand genetic algorithms for security-assured grid job scheduling,”Computers, IEEE Transactions on, vol. 55, no. 6, pp. 703–719, 2006.

[21] T. Xie and X. Qin, “Security-aware resource allocation for real-time parallel jobs on homogeneous and heterogeneous clusters,”Parallel and Distributed Systems, IEEE Transactions on, vol. 19, no. 5,pp. 682–697, 2008.

[22] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman,L. Youseff, and D. Zagorodnov, “The eucalyptus open-sourcecloud-computing system,” in Cluster Computing and the Grid, 2009.9th IEEE/ACM International Symposium on, 2009, pp. 124–131.

[23] S. Berger, R. Caceres, K. Goldman, D. Pendarakis, R. Perez, J. R.Rao, E. Rom, R. Sailer, W. Schildhauer, D. Srinivasan et al., “Se-curity for the cloud infrastructure: Trusted virtual data centerimplementation,” IBM Journal of Research and Development, vol. 53,no. 4, pp. 6–1, 2009.

[24] J. Alcaraz Calero, N. Edwards, J. Kirschnick, L. Wilcock, andM. Wray, “Toward a multi-tenancy authorization system for cloudservices,” Security Privacy, IEEE, vol. 8, no. 6, pp. 48–55, 2010.

[25] P. K. Manadhata and J. M. Wing, “An attack surface metric,”Software Engineering, IEEE Transactions on, vol. 37, no. 3, pp. 371–386, 2011.

[26] S. Rahimi and M. Zargham, “Vulnerability scrying method forsoftware vulnerability discovery prediction without a vulnerabil-ity database,” Reliability, IEEE Transactions on, vol. 62, no. 2, pp.395–407, 2013.

[27] P. Mell, K. Scarfone, and S. Romanosky, “A complete guide to thecommon vulnerability scoring system version 2.0,” in Published byFIRST-Forum of Incident Response and Security Teams, 2007, pp. 1–23.

[28] D. F. Ferraiolo, R. Sandhu, S. Gavrila, D. R. Kuhn, and R. Chan-dramouli, “Proposed nist standard for role-based access control,”ACM Transactions on Information and System Security (TISSEC),vol. 4, no. 3, pp. 224–274, 2001.

[29] H. Zhang, I. F. Ilyas, and K. Salem, “Psalm: Cardinality estimationin the presence of fine-grained access controls,” in Data Engineer-ing, 2009. ICDE’09. IEEE 25th International Conference on. IEEE,2009, pp. 505–516.

[30] M. Frank, A. P. Streich, D. Basin, and J. M. Buhmann, “Multi-assignment clustering for boolean data,” The Journal of MachineLearning Research, vol. 13, no. 1, pp. 459–489, 2012.

[31] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears,“Benchmarking cloud serving systems with ycsb,” in Proceedingsof the 1st ACM symposium on Cloud computing, 2010, pp. 143–154.

[32] C. Wang and W. A. Wulf, “Towards a framework for security mea-surement,” in 20th National Information Systems Security Conference,Baltimore, MD, 1997, pp. 522–533.

[33] R. Jain, D.-M. Chiu, and W. R. Hawe, “A quantitative measureof fairness and discrimination for resource allocation in sharedcomputer system,” Digital Equipment Corp., Littleton, Mass.,Tech. Rep. TR-301, Sept 1984.

Abdulrahman Almutairi is a PhD candidate inthe School of Electrical and Computer Engineer-ing at Purdue University. He received his MSdegree from same department in 2008. His re-search interests include information security andprivacy, and cloud computing systems.

Muhammad Ihsanulhaq Sarfraz received theB.Sc. degree in computer science and the M.Sc.degree in computer engineering in 2009 and2012, respectively. He is currently pursuing hisPhD in Electrical and Computer Engineering atPurdue University. His research interests includesecure systems and applied cryptography. He isa student member of IEEE.

Arif Ghafoor is currently a Professor with theSchool of Electrical and Computer Engineer-ing at Purdue University. He received his PhDin 1984 from Columbia University. His areasof interest include information security and dis-tributed multimedia systems. He is a Fellow ofthe IEEE. In 2001, he received an IEEE Com-puter Society Technical Achievement Award.

Documents

Risk-Aware Management of Virtual Resources in Access ...web.ics.purdue.edu/~msarfraz/assets/IEEE-TCC.pdf · information: DOI 10.1109/TCC.2015.2453981, IEEE Transactions on Cloud Computing