Upload
yin
View
61
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Public Clouds (EC2, Azure, Rackspace, …). Multi-tenancy Different customers’ virtual machines (VMs) share same server. VM. VM. VM. VM. VM. VM. VM. Tenant: Why Cloud? Pay-as-you-go Infinite Resources Cheaper Resources. Provider: Why multi-tenancy? Improved resource utilization - PowerPoint PPT Presentation
Citation preview
1
Public Clouds (EC2, Azure, Rackspace, …)
VM
Multi-tenancyDifferent customers’ virtual machines (VMs) share same server
Provider: Why multi-tenancy?• Improved resource utilization• Benefits of economies of scale
VM
VM
VM
VM
VM
VM
Tenant: Why Cloud?• Pay-as-you-go• Infinite Resources• Cheaper Resources
Implications of Multi-tenancy
• VMs share many resources– CPU, cache, memory, disk, network, etc.
• Virtual Machine Managers (VMM) – Goal: Provide Isolation
• Deployed VMMs don’t perfectly isolate VMs– Side-channels [Ristenpart et al. ’09, Zhang et al. ’12]
2
VM
VM
VMM
3
Lies to by the Cloud
• Infinite resources
• All VMs are created equally
• Perfect isolation
This Talk
Taking control of where your instances run• Are all VMs created equally?• How much variation exists and why?• Can we take advantage of the variation to improve
performance?
Gaining performance at any cost• Can users impact each other’s performance?• Is there a way to maliciously steal another user’s resource?• Is tehre
5
Heterogeneity in EC2
• Cause of heterogeneity:– Contention for resources: you are sharing!– CPU Variation:
• Upgrades over time• Replacement of failed machined
– Network Variation: • Different path lengths• Different levels of oversubscription
6
Are All VMs Created Equally?• Inter-architecture:
– Is there differences between architectures– Can this be used to predict perform aprior?
• Intra-architecture:– Within an architecture– If large, then you can’t predict performance
• Temporal– On the same VM over time?– There is no hope!
7
Benchmark Suite & Methodoloy
• Methodology:– 6 Workloads– 20 VMs (small instances) for 1 week– Each run micro-benchmarks every hour
8
Inter-Architecture
9
Intra-Architecture
CPU is predictable – les than 15%Storage is unpredictable --- as high as 250%
10
Temporal
11
Overall
CPU type can only be used to predict CPU performance
For Mem/IO bound jobs need to empirically learn how good an instance is
12
What Can We Do about it?
• Goal: Run VM on best instances
• Constraints:– Can control placement – can’t control which instance
the cloud gives us– Can’t migrate
• Placement gaming:– Try and find the best instances simply by starting and
stopping VMs
Measurement Methodology
• Deploy on Amazon EC2– A=10 instances– 12 hours
• Compare against no strategy: – Run initial machines with no strategy• Baseline varies for each run
– Re-use machines for strategy
EC2 results
1 2 30
20
40
60
80
100Baseline Strategy
Apache Runs
MB/
sec
1 2 38
9
10
11
12Baseline Strategy
NER Runs
Reco
rds/
sec
16 migrations
15
Placement Gaming
• Approach:– Start a bunch of extra instances– Rank them based on performance– Kill the under performing instances
• Performing poorer than average– Start new instances.
• Interesting Questions:– How many instances should be killed in each round?– How frequently should you evaluate performance of
instances.
16
Resource-Freeing Attacks:Improve Your Cloud Performance
(at Your Neighbor's Expense)
(Venkat)anathan Varadarajan, Thawan Kooburat,
Benjamin Farley, Thomas Ristenpart,
and Michael Swift
DEPARTMENT OF COMPUTER SCIENCES
17
Contention in Xen
• Same Core– Same core & same L1 Cache & Same memory
• Same Package– Diff core but share L1 Cache and memory
• Different Package– Diff core & diff Cache but share Memory
18
I/O contends with self
• VMs contend for the same resource– Network with Network:
• More VMs Fair share is smaller– Disk I/O with Disk I/O:
• More disk access longer seek times
• Xen does N/W batching to give better performances– BUT: this adds jitter and delay– ALSO: you can get more than your fairshare because
of the batch
19
I/O contends with self
• VMs contend for the same resource– Network with Network:
• More VMs Fair share is smaller– Disk I/O with Disk I/O:
• More disk access longer seek times
• Xen does N/W batching to give better performances– BUT: this adds jitter and delay– ALSO: you can get more than your fairshare because
of the batch
20
Everyone Contends with Cache
• No contention on same core– VMs run in serial so access to cache is serial
• No contention on diff package– VMs use different cache
• Lots of contention when same package– VMs run in parallel but share same cache
21
CPU Net Disk Cache0
100
200
300
400
500
600
Perfo
rman
ce D
egra
datio
n (%
)
Contention in Xen
Local Xen TestbedMachine Intel Xeon E5430,
2.66 GhzCPU 2 packages each
with 2 coresCache Size 6MB per package
VM
VM
Non-work-conserving CPU scheduling
Work-conservingscheduling
3x-6x Performance loss Higher cost
This work: Greedy customer can recover performance by interfering with other tenants
Resource-Freeing Attack
What can a tenant do?
22
Pack up VM and move(See our SOCC 2012 paper)… but, not all workloads cheap to move
VM
VM
Ask provider for better isolation… requires overhaul of the cloud
23
Resource-freeing attacks (RFAs)
• What is an RFA? • RFA case studies
1. Two highly loaded web server VMs2. Last Level Cache (LLC) bound VM and
highly loaded webserver VM• Demonstration on Amazon EC2
24
The Setting
Victim:– One or more VMs– Public interface (eg, http)
Beneficiary:– VM whose performance we want
to improve
Helper:– Mounts the attack
Beneficiary and victim fighting over a target resource
Helper
VM
VM
Victim
Beneficiary
Example: Network Contention
• Beneficiary & Victim– Apache webservers hosting static and
dynamic (CGI) web pages.• Target Resource: Network Bandwidth• Work-conserving scheduler– network bandwidth
25
Net
Clients
What can you do?
VictimBeneficiary
Loca
l Xen
Test
be
d
26
Recipe for a Successful RFAShift resource away from the target resource towards the bottleneck resource
Shift resource usage via public interface
Proportion of Network usage
CPU intensive dynamic pages
Static pagesProp
ortio
n of
CPU
usa
ge
Push
tow
ards
CPU
bott
lene
ck
Reduce target resource usage
Limits
29
An RFA in Our Example
Net
Helper
CGI R
eque
st
CPU Utilization
Clients
Result in our testbed:Increases beneficiary’s share of bandwidth
No RFA: 1800 page requests/secW/ RFA: 3026 page requests/sec
50% 85%share of
bandwidth
30
Shared CPU Cache:– Ubiquitous: Almost all workloads need cache– Hardware controlled: Not easily isolated via
software– Performance Sensitive: High performance cost!
Resource-freeing attacks 1) Send targeted requests to victim 2) Shift resources use from target to a bottleneck
Can we mount RFAs when targetresource is CPU cache?
31
Cache Contention
1000 2000 30000
50
100
150
200
250
Webserver Request Rate
Cach
e Pe
rform
ance
Deg
rada
tion
(%)
RFA Goal
Case Study: Cache vs. Network
• Victim : Apache webserver hosting static and dynamic (CGI) web pages
• Beneficiary: Synthetic cache bound workload (LLCProbe)
• Target Resource: Cache• No cache isolation:– ~3x slower when sharing
cache with webserver
32
Net
Cache
$$$ Clients
Loca
l Xen
Test
be
d
VictimBeneficiary
CoreCore
33
Net
Cache vs. NetworkVictim webserver frequently interrupts, pollutes the cache– Reason: Xen gives higher
priority to VM consuming less CPU time
Cache
Clients$$$
Cache state time line
Beneficiary starts to run
Core Core
decreased cache efficiency
Webserver receives a
request
Heavily loaded web server
cache state
34
Net
Cache vs. Network w/ RFARFA helps in two ways:1. Webserver loses its
priority.2. Reducing the capacity
of webserver.Cache
Clients$$$
Cache state time line
Core Core
HelperHeavily loaded webserver requests under RFA
CGI R
eque
stBeneficiary starts to run
Webserver receives a
request
Heavily loaded web server
cache state
35
RFA: Performance ImprovementRFA intensities – time in ms per second
196% slowdown
86% slowdown
60%Performance Improvement
36
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
37
Limitations
• Experiments setup:– Only 1 VMs in each experiment– Don’t vary the number of each type of job
38
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
39
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
40
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
41
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
42
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
43
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
44
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
45
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM