Click here to load reader

Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center Improving the Scalability of Data Center Networks with Traffic-aware Virtual

Embed Size (px)

Citation preview

  • Slide 1
  • Slide 2
  • Xiaoqiao Meng, Vasileios Pappas, Li Zhang IBM T.J. Watson Research Center Improving the Scalability of Data Center Networks with Traffic-aware Virtual Machine Placement IEEE INFOCOM 2010
  • Slide 3
  • INTRODUCTION The scalability of modern data centers has become a practical concern and has attracted significant attention in recent years. existing solutions that require changes in the network architecture and the routing protocols to balance traffic load With an increasing trend towards more communication intensive applications in data centers, the bandwidth usage between virtual machines (VMs) is rapidly growing.
  • Slide 4
  • INTRODUCTION This paper proposes using traffic-aware virtual machine (VM) placement to improve the network scalability Many VM placement solutions seek to consolidate VMs for CPU, physical memory and power consumption savings, yet without considering consumption of network resources This paper tackling the scalability issue by optimizing the placement of VMs on host machines.
  • Slide 5
  • INTRODUCTION e.g. VMs with large mutual bandwidth usage are assigned to host machines in close proximity design a two-tier approximate algorithm that efficiently solves the VM placement problem for very large problem sizes
  • Slide 6
  • INTRODUCTION Contributions 1.We address the scalability issue of data center networks with network-aware VM placement. We formulate it as an optimization problem, prove its hardness and propose a novel two-tier algorithm.
  • Slide 7
  • INTRODUCTION Contributions 2.We analyze the impact of data center network architectures and traffic patterns on the scalability gains attained by network-aware VM placement.
  • Slide 8
  • INTRODUCTION Contributions 3.We measure traffic patterns in production data center environments, and use the data to evaluate the proposed algorithm as well as the impact analysis.
  • Slide 9
  • INTRODUCTION Problem definition: the Traffic-aware VM Placement Problem (TVMPP) as an optimization problem. Input: traffic matrix, cost matrix Output: where VMs should be placed Goal: minimize the total cost
  • Slide 10
  • Data Center Traffic Patterns Examine traces from two data-center-like systems: 1. a data warehouse hosted by IBM Global Services the incoming and outgoing traffic rates for 17 thousand VMs 2. the incoming and outgoing TCP connections for 68 VMs 10 days measurement
  • Slide 11
  • Data Center Traffic Patterns Uneven distribution of traffic volumes from VMs
  • Slide 12
  • Data Center Traffic Patterns Stable per-VM traffic at large timescale:
  • Slide 13
  • Data Center Traffic Patterns Weak correlation between traffic rate and latency:
  • Slide 14
  • Data Center Traffic Patterns The potential benefit : increased network scalability and reduced average traffic latency. The observed traffic stability over large timescale suggests that it is feasible to find good placements based on past traffic statistics
  • Slide 15
  • Data Center Network Architectures
  • Slide 16
  • Slide 17
  • Problem Formulation Cij: the communication cost from slot i to j Dij: traffic rate from VM i to j ei: external traffic rate for VMi gi:communication cost between VMi and the gateway
  • Slide 18
  • Problem Formulation The above objective function is equivalent to X: permutation matrix
  • Slide 19
  • Problem Formulation we define Cij as the number of switches on the routing path from VM i to j. Accordingly, optimizing TVMPP is equivalent to minimizing average traffic latency caused by network infrastructure. Offline scenario & Online scenario
  • Slide 20
  • Complexity Analysis TVMPP falls into the category of Quadratic Assignment Problem (QAP) QAP: n things are put into n location, with distance & flows NP-hard, finding the optimality of QAP problems with size > 15 is practically impossible
  • Slide 21
  • Complexity Analysis Theorem 1: For a TVMPP problem defined on a data center that takes one of the topology in Figure 4, finding the TVMPP optimality is NP-hard
  • Slide 22
  • Theorem 1 This can be proved by a reduction from the Balanced Minimum K-cut Problem (BMKP) BMKP: G = (V,E) undirected, weighted graph with n vertices A k-cut on G is defined as a subset of E that partition G into k components BMKP is NP-hard
  • Slide 23
  • Theorem 1 Now considering a data center network, regardless of which topology being used, we can always create a network topology that satisfy: n slots that are partitioned into k slot-clusters of equal size n/k Every two slots have a connection with certain cost within the same cluster : ci across clusters cost: co co > ci
  • Slide 24
  • Theorem 1 Suppose there are n VMs with traffic matrix D. By assigning these n VMs to the n slots, we obtain a TVMPP problem if we define a graph with the n VMs as nodes and D as edge weights, we obtain a BMKP problem It can be shown that when the TVMPP is optimal, the associated BMKP is also optimal. And vise versa
  • Slide 25
  • Theorem 1 when the TVMPP is optimal, if we swap any two VMs i, j that have been assigned to two slot-cluster r1, r2 respectively, the k-cut weight will increase. Let s1 denote the set of VMs assigned to r1 Let s2 denote the set of VMs assigned to r2
  • Slide 26
  • the TVMPP objective value increases: The amount of change for the k-cut weight is:
  • Slide 27
  • ALGORITHMS Proposition 1: Suppose 0 a1 a2... an and 0 b1 b2... bn, the following inequalities hold for any permutation on [1,..., n]
  • Slide 28
  • ALGORITHMS First, according to Proposition 1, solving TVMPP is intuitively equivalent to finding a mapping of VMs to slots such that VM pairs with heavy mutual traffic be assigned to slot pairs with low-cost connections.
  • Slide 29
  • ALGORITHMS The second design principle is divide-and-conquer: we partition VMs into VM-clusters and partition slots into slot clusters. VM-clusters are obtained via classical min-cut graph algorithm which ensures that VM pairs with high mutual traffic rate are within the same VM cluster Slot-clusters are obtained via standard clustering techniques which ensures slot pairs with low-cost connections belong to the same slot-cluster
  • Slide 30
  • ALGORITHMS SlotClustering: Minimum k-clustering,NP-hard. ( an approximation ratio 2 ) O(nk) VMMinKcut: minimum k-cut algorithm O(n4) Assign VMs to slots Recursive call
  • Slide 31
  • ALGORITHMS
  • Slide 32
  • IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS Global Traffic Model: each VM sends traffic to every other VM at equal and constant rate For any permutation, matrix X, holds This simplifies the TVMPP problem to the following:
  • Slide 33
  • IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS which is the classical Linear Sum Assignment Problem (LSAP). The complexity for LSAP is O(n3) Random placement:
  • Slide 34
  • IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS
  • Slide 35
  • Partitioned Traffic Model: Under the partitioned traffic model, each VM belongs to a group of VMs and it sends traffic only to other VMs in the same group The GLB is a lower bound for the optimal objective value of a QAP problem
  • Slide 36
  • IMPACT OF NETWORK ARCHITECTURES AND TRAFFIC PATTERNS
  • Slide 37
  • Slide 38
  • observation
  • Slide 39
  • EVALUATION
  • Slide 40
  • DISCUSSION Combining VM migration with dynamic routing protocols VM placement by joint network and server resource optimization:
  • Slide 41
  • Amos Brocco, Apostolos Malatras, Ye Huang, Beat Hirsbrunner Department of Informatics University of Fribourg, Switzerland ARiA: A Protocol for Dynamic Fully Distributed Grid Meta-Scheduling ICDCS 2010
  • Slide 42
  • INTRODUCTION An advantage of grid systems is their ability to guarantee efficient meta-scheduling (optimal allocation of jobs across a pool of sites with diverse local scheduling policies ) The centralized nature of current meta-scheduling solutions is not well suited for the envisioned increasing scale and dynamicity
  • Slide 43
  • INTRODUCTION This paper focuses on grid task meta-scheduling, by presenting a fully distributed protocol named ARiA to achieve efficient global dynamic scheduling across multiple sites The meta-scheduling process is performed online, and takes into account the availability of new resources as well as changes in actual allocation policies
  • Slide 44
  • ARiA PROTOCOL Job Submission Phase Job Acceptance Phase Dynamic Rescheduling Phase
  • Slide 45
  • Job Submission Phase Jobs are assigned a universal unique identifier (UUID) Nodes receiving job submissions are referred to as initiators for these jobs Initiators issue resource discovery queries across the grid peer-to-peer overlay by broadcasting REQUEST messages
  • Slide 46
  • Job Acceptance Phase If the request cannot be satisfied, the message is further forwarded on the peer-to-peer overlay otherwise a cost value for the job based on actual resources and current scheduling is computed and sent back to the jobs initiator by means of an ACCEPT message The initiator evaluates incoming ACCEPT responses, and selects the best qualified node, and sends an ASSIGN message
  • Slide 47
  • Dynamic Rescheduling Phase the assignee attempts to find candidates for rescheduling of jobs in its queue while their execution has not yet started by the INFORM messagges The structure of INFORM messages relates to that of REQUEST messages
  • Slide 48
  • EVALATION For the evaluation of ARiA, an overlay of 500 nodes with a target average path length of 9 hops was deployed in a custom simulator The average nodes degree attained during simulations was 4, resulting in about 2000 overlay links
  • Slide 49
  • EVALATION In all scenarios a total of 1000 jobs is submitted to random nodes on the grid. Unless otherwise specified, jobs are submitted at 10 seconds intervals when dynamic rescheduling is enabled, INFORM messages are sent for at most 2 scheduled jobs every 5 minutes
  • Slide 50
  • EVALATION REQUEST messages are forwarded on the overlay for at most 9 hops; at each step, at most 4 random neighbors of the current node are contacted INFORM messages a more lightweight approach is followed, with at most 8 hops and up to 2 neighbors
  • Slide 51
  • EVALATION
  • Slide 52
  • Slide 53
  • Slide 54
  • Slide 55
  • Slide 56
  • Slide 57
  • Slide 58
  • Slide 59
  • Slide 60
  • Slide 61
  • Slide 62
  • Amir Epstein Dean H. Lorenz Ezra Silvera Inbar Shapira Virtualization Technologies, System Technologies & Services IBM Haifa Research Lab, Haifa, Israel Virtual Appliance Content Distribution for a Global Infrastructure Cloud Service IEEE INFOCOM 2010
  • Slide 63
  • INTRODUCTION An emerging cloud service is a virtual server shop, that allows cloud customers to order virtual appliances to be delivered virtually on the cloud Global cloud providers need to create customized virtual-server disk images and deliver them on time to meet the customer reservations and service level In order to reduce provisioning time and meet reservation deadlines, one approach is to stage images on storage near the customer
  • Slide 64
  • INTRODUCTION This introduces an optimization problem of finding an optimal staging schedule, according to network bandwidth, pending reservations schedule, and customer value Continuous model vs integral model
  • Slide 65
  • Problem Definition a staging storage space with capacity C n appliance deployment requests Each request i is for an appliance that consume Ci staging capacity and has a desired due date di propagation time pi (assume pi = kCi)
  • Slide 66
  • Continuous model
  • Slide 67
  • Lemma 4.1: Any feasible schedule S can be turned into right-tight feasible schedule by right-shifting job execution intervals. Lemma 4.2: There exists an optimal schedule in which the jobs are processed in EDD (Earliest Due Date) order
  • Slide 68
  • Continuous model Algorithm 1 is a dynamic program that finds an optimal schedule that is right-tight and in EDD order O(nW ) time and space
  • Slide 69
  • THE INTEGRAL MODEL p1=3, d1=5 p2=3, d2=8 P3=2, d3=9 Capacity = 5 s1=2, s2=5, s3=0 But can not be an EDD order
  • Slide 70
  • SOLUTION Our algorithms for the integral model have two steps In the first step, we solve the problem for the continuous model In the second step, we discard jobs from this schedule, without losing too much weight, to obtain a feasible schedule for the integral model
  • Slide 71
  • THE INTEGRAL MODEL Lemma 5.1: For unweighted jobs, any feasible schedule S for the continuous model can be transformed to a feasible schedule S for the integral model with w(S)>=1/2w(S). Lemma 5.2: For weighted jobs, any feasible schedule S for the continuous model can be transformed to a feasible schedule S* for the integral model with w(S*)>=1/2w(S)
  • Slide 72
  • SOLUTION
  • Slide 73
  • IMPLEMENTATION AND SIMULATION RESULTS