1
SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION
ACCELA ZHAO, LAYNE PENG
2
WHO ARE THOSE GUYS …
Accela Zhao, Technologist at EMC OCTO, active OpenStack community contributor, experienced in cloud scheduling and container technologies.
Mail: [email protected]
Layne Peng, Principal Technologist at EMC OCTO, experienced cloud architect, one of the earliest contributors to Cloud Foundry in China, holder of 9 patents, and a book author.
Mail: [email protected] Twitter: @layne_peng
3
WHAT IS RESOURCE UTILIZATION?
This is what we buy
This is what we use
A gap of $$$ wasted
4
ENERGY AND RESOURCE UTILIZATION
Energy-related costs are 42% of the total (including buying new machines)
An idle server consumes as much as 70% of the energy it uses at full speed
Low resource utilization is energy inefficient: wasted energy, wasted money
Real world resource utilization is usually low: around 20% or less
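The two figures above (an idle server drawing about 70% of full-speed power, and typical utilization around 20%) can be combined into a rough estimate of the energy spent per unit of useful work. This is only a back-of-the-envelope sketch: it assumes power grows linearly from idle to peak, which the slides do not claim, and the function name is made up for illustration.

```python
def relative_energy_per_work(utilization: float, idle_fraction: float = 0.70) -> float:
    """Energy per unit of work, relative to a fully utilized server.

    Assumption (not from the slides): power rises linearly from the idle
    level to the peak level as utilization goes from 0 to 1.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Power draw as a fraction of peak power under the linear assumption.
    power = idle_fraction + (1.0 - idle_fraction) * utilization
    # Useful work is proportional to utilization, so divide to get energy per work.
    return power / utilization


for u in (0.2, 0.5, 1.0):
    print(f"utilization {u:.0%}: {relative_energy_per_work(u):.2f}x energy per unit of work")
```

Under that assumption, a server at 20% utilization spends several times more energy per unit of work than a fully utilized one, which is the gap consolidation tries to close.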
5
A CLOSER LOOK AT CLOUD
The key advantage - cloud consolidation
Fewer machines, more apps. Energy-efficient and saves money.
Improved resource utilization
6
RESOURCE UTILIZATION ON CLOUD
• Scheduling - choose the best resource placement when the app starts
– Examples: Green Cloud, Paragon, and the schedulers in OpenStack, Kubernetes, Mesos, …
• Migration - continuously optimize the resource placement while the app is running
– Examples: OpenStack Watcher, VMware DRS
• Soft Container - elastic; dynamically adjusts resource constraints in response to co-located apps
– Related: Google Heracles
Soft Container
7
RESOURCE UTILIZATION ON CLOUD
• Scheduler - manages resource utilization at app kick-off
• Migration - manages resource utilization across hosts while apps are running
• Soft Container - manages resource utilization at fine granularity inside each host
8
RESOURCE UTILIZATION ON CLOUD
A battle between packing more apps onto each host
and guaranteeing app SLAs
The key problem: resource interference
9
THE KEY PROBLEM: RESOURCE INTERFERENCE
• What is resource interference?
– Apps co-located on one host share resources like CPU, cache, memory, …
– They interfere with each other, resulting in poor performance compared to running standalone
– Resource interference makes SLAs unenforceable
• Related readings
– Google Heracles: an analysis of resource interference
– Paragon: resource interference-aware scheduling
– Bubble-Up: a way to measure resource interference
10
RESOURCE INTERFERENCE: HOW DOES IT LOOK?
MySQL standalone running vs co-located with a CPU & disk hungry task
11
RESOURCE INTERFERENCE: HOW TO MEASURE?
• Bubble-Up
– The setup
  • Run the app co-located with resource benchmarks; each benchmark stresses one type of resource.
– Resource interference the app tolerates
  • Slowly increase the benchmark stress until the app fails its SLA.
  • The critical point shows how much resource interference the app can tolerate.
– Resource interference the app causes
  • Run the app at the load its SLA requires.
  • The stress it puts on each type of resource is the interference the app causes.
• Where to use it?
– Better resource utilization management
– Scheduling, Migration, Soft Container, …
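The "tolerated interference" measurement described on this slide can be sketched as a simple ramp: raise the benchmark stress step by step and remember the last level at which the app still met its SLA. The code below is an illustration only. The `meets_sla` callback stands in for a real benchmark run, and the synthetic latency model and all numbers are assumptions, not measurements from the talk.

```python
from typing import Callable


def tolerated_interference(
    meets_sla: Callable[[float], bool],
    max_stress: float = 1.0,
    step: float = 0.05,
) -> float:
    """Return the highest stress level at which the app still meets its SLA."""
    tolerated = 0.0
    steps = int(round(max_stress / step))
    for i in range(1, steps + 1):
        stress = i * step
        if not meets_sla(stress):
            break  # critical point reached: the app fails its SLA here
        tolerated = stress
    return tolerated


def fake_app_meets_sla(cpu_stress: float) -> bool:
    """Synthetic stand-in for a real run: latency grows with CPU stress."""
    latency_ms = 40.0 + 150.0 * cpu_stress  # made-up latency model
    return latency_ms <= 95.0               # made-up SLA of 95 ms


print(f"tolerated CPU stress: {tolerated_interference(fake_app_meets_sla):.2f}")
```

Repeating the ramp once per resource type (CPU, disk, cache, …) would give one tolerance value per resource, which is the kind of profile a scheduler or Soft Container could consume.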
12
RESOURCE INTERFERENCE: HOW TO MEASURE?
MySQL standalone running, vs co-located with CPU stress, vs disk stress. In my case, MySQL is much more sensitive to CPU interference.
13
INTRODUCING SOFT CONTAINER
• Motivations
– Increase resource utilization by co-locating more apps
  • E.g. a business service is critical but may not use all resources on the host. Add low-priority Hadoop batch tasks to fill what is left.
– Respond to the dynamic nature of time-varying workloads
  • E.g. the business service may become more idle at lunch time; Hadoop tasks can then expand their resource bubble and use the leftover.
– Guarantee the SLA of critical apps
  • E.g. when the business service suddenly requires more resources for processing, Hadoop tasks shrink instantly to give resources back.
• Challenges
– Resource control and isolation of interference
– Responding to dynamic workload changes
14
INTRODUCING SOFT CONTAINER
• What does “Soft” mean?
– The container's resources vary based on its neighbors and SLAs. (The container becomes elastic)
– “Expanding” (bubbling up) resources when idle resources exist
– Shrinking the resources of a specific container when another critical app demands more
(Figure: container resource bubble; axes: Time, Resource)
15
THE FEEDBACK CONTROL LOOP
(Diagram: Soft Container feedback loop with Controller, Watcher, Limiter, and Containers)
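One way to picture a single pass of this loop is shown below. It is a minimal sketch, not the project's actual code: the Watcher is modeled as a function that returns the critical app's latency, the Limiter as a function that applies a decision, and the Controller policy (halve the low-priority quota on an SLA violation, otherwise grow it by a fixed step) is an assumption chosen for illustration. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    container: str    # hypothetical name of the low-priority container
    cpu_quota: float  # fraction of host CPU that container may use


def control_step(
    watch: Callable[[], float],         # Watcher: latency of the critical app, in ms
    limit: Callable[[Decision], None],  # Limiter: applies the new constraint
    quota: float,
    sla_ms: float = 100.0,
    step: float = 0.1,
) -> float:
    """One Controller iteration: read the metric, decide, apply."""
    latency_ms = watch()
    if latency_ms > sla_ms:
        quota = max(0.05, quota / 2)    # shrink quickly to protect the SLA
    else:
        quota = min(1.0, quota + step)  # expand slowly into idle resources
    limit(Decision("batch", quota))
    return quota


# Drive the loop with canned latency readings instead of a real Watcher.
readings = iter([50.0, 60.0, 150.0, 70.0])
applied: List[Decision] = []
quota = 0.2
for _ in range(4):
    quota = control_step(lambda: next(readings), applied.append, quota)
print([round(d.cpu_quota, 2) for d in applied])
```

The asymmetry (fast shrink, slow expand) mirrors the stated goal that low-priority tasks give resources back instantly when the critical app needs them.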
16
RESOURCES TO LIMIT
• CPU
– Core
– Time Quota
– …
• Memory
– Size
– Bandwidth
– …
• Disk I/O
– IOPS
– Throughput
– …
17
RESOURCES TO LIMIT - MISSING
• CPU
– Core
– Time Quota
– …
• Cache
– LLC
– …
• Memory
– Size
– Bandwidth*
– …
• GPU
– …
• Device*
– …
• Network
– Ulimit
– Bandwidth
– …
• Disk I/O
– IOPS
– Throughput
– …
…
As of kernel 3.6, support for most of these can be found in the community…
18
ISOLATING THE RESOURCES - NAMESPACE
/proc/<pid>/ns:
• lrwxrwxrwx 1 root root 0 Jun 21 18:38 ipc -> ipc:[4026532509]
• lrwxrwxrwx 1 root root 0 Jun 21 18:38 mnt -> mnt:[4026532507]
• lrwxrwxrwx 1 root root 0 Jun 16 18:24 net -> net:[4026532512]
• lrwxrwxrwx 1 root root 0 Jun 21 18:38 pid -> pid:[4026532510]
• lrwxrwxrwx 1 root root 0 Jun 21 18:38 user -> user:[4026531837]
• lrwxrwxrwx 1 root root 0 Jun 21 18:38 uts -> uts:[4026532508]
System calls:
• clone(): creates a new process attached to a new namespace
• unshare(): creates a new namespace and moves the calling process into it
• setns(): moves the calling process into an existing namespace
Namespaces not available yet:
• security namespace
• security keys namespace
• device namespace
• time namespace
We are still waiting…
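The listing above can be reproduced programmatically by reading the symlinks under /proc/<pid>/ns. The sketch below does that for the current process. It assumes a Linux host with /proc mounted and returns an empty result elsewhere; the helper name is made up for this example.

```python
import os


def list_namespaces(pid: str = "self") -> dict:
    """Map namespace name to its identifier, e.g. 'net' -> 'net:[4026532512]'."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):  # not Linux, or /proc is not mounted
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target encodes type and inode.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry vanished or permission denied
    return result


for name, ident in list_namespaces().items():
    print(f"{name} -> {ident}")
```

Two processes are in the same namespace of a given type when these identifiers match, which is a quick way to check whether a container really is isolated from the host for that resource.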
19
LIMITING THE RESOURCES - CGROUP
• Task, Control Group & Hierarchy
• Subsystem – control options: blkio, cpu, cpuacct, cpuset, devices, freezer, memory, net_cls, net_prio, ns
Usage: create a cgroup under a subsystem, then change the limit (here 500 MiB of memory)…
# mkdir /sys/fs/cgroup/memory/foo
# echo 524288000 > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
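The same operation can be scripted, which is roughly what a Limiter has to do. The sketch below assumes the cgroup v1 layout used on the slide (a memory hierarchy mounted at /sys/fs/cgroup/memory and a `memory.limit_in_bytes` control file). The `root` parameter is an addition for illustration so the function can be tried against a scratch directory without root privileges.

```python
import os
import tempfile


def set_memory_limit(cgroup: str, limit_bytes: int,
                     root: str = "/sys/fs/cgroup/memory") -> str:
    """Create the cgroup directory if needed and write its memory limit."""
    path = os.path.join(root, cgroup)
    # On a real cgroup v1 mount, creating the directory creates the cgroup.
    os.makedirs(path, exist_ok=True)
    limit_file = os.path.join(path, "memory.limit_in_bytes")
    with open(limit_file, "w") as f:
        f.write(str(limit_bytes))
    return limit_file


# Dry run against a temporary directory instead of the real cgroup filesystem.
scratch = tempfile.mkdtemp()
written = set_memory_limit("foo", 500 * 1024 * 1024, root=scratch)
with open(written) as f:
    print(written, "=", f.read())
```

Pointing `root` at the real mount requires root privileges and a host that still uses the v1 hierarchy.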
20
MISSING - NETWORK
Isolation does not mean resource control
Suppose two containers on one machine share 100 Gbps of bandwidth in total
(Figure: bandwidth bars labeled 10 and 80 against the 100 Gbps total)
(Figure: bandwidth bar labeled 95 against the 100 Gbps total)
If the GREEN container consumes the majority of the bandwidth, it may have a negative impact on the BLUE one… How can we prevent this from happening?
22
MISSING - NETWORK
Community attempts: based on Traffic Control (tc)
A nightmare for PaaS providers…
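To give a feel for what a tc-based approach involves, the sketch below only builds (and prints) the commands for capping one interface with an HTB queueing discipline; it does not run them. The slides do not show the actual commands the community attempts used, so the choice of HTB, the interface name `veth0`, and the rates are all assumptions for illustration.

```python
def htb_limit_commands(dev: str, rate: str, ceil: str) -> list:
    """Return tc command lines that cap traffic leaving `dev` using HTB."""
    return [
        # Attach an HTB root qdisc; unclassified traffic goes to class 1:10.
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:",
         "htb", "default", "10"],
        # Guarantee `rate` and allow borrowing up to `ceil`.
        ["tc", "class", "add", "dev", dev, "parent", "1:", "classid", "1:10",
         "htb", "rate", rate, "ceil", ceil],
    ]


# "veth0" is a hypothetical host-side interface of one container.
for cmd in htb_limit_commands("veth0", "10gbit", "20gbit"):
    print(" ".join(cmd))
```

Even this small case needs one qdisc and one class per interface, and the rules must be kept in sync as containers come and go, which hints at why the slide calls it a nightmare for PaaS providers.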
24
MISSING - GPU
NVIDIA’s efforts:
a. GPUs exposed as separate, normal devices in /dev
Ref: https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation
b. devices cgroup:
• Allow/Deny/List
• Access: i. R (read) ii. W (write) iii. M (mknod)
Usable, but insufficient…
1. Launch multiple jobs in parallel, each one using a subset of the available GPUs;
2. What about sharing a GPU between jobs with proper isolation? Can we share a GPU like we can a CPU?
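For context on the devices cgroup mentioned above: in cgroup v1, access is granted or revoked by writing entries of the form `type major:minor access` to the cgroup's `devices.allow` or `devices.deny` file. The sketch below only formats such an entry. The major number 195 for NVIDIA character devices is my assumption about a typical driver setup, not something stated in the slides, and the helper name is hypothetical.

```python
def device_rule(major: int, minor=None, access: str = "rwm", dev_type: str = "c") -> str:
    """Format a cgroup v1 devices entry such as 'c 195:0 rwm'."""
    if dev_type not in ("a", "b", "c"):
        raise ValueError("dev_type must be 'a' (all), 'b' (block) or 'c' (char)")
    if not access or set(access) - set("rwm"):
        raise ValueError("access may only contain r (read), w (write), m (mknod)")
    minor_part = "*" if minor is None else str(minor)
    return f"{dev_type} {major}:{minor_part} {access}"


# Assumed major number for NVIDIA devices; check `ls -l /dev/nvidia*` on a real host.
print(device_rule(195, 0))          # entry for a single GPU device
print(device_rule(195, None, "r"))  # read-only entry for all minors
```

Rules like these decide which job sees which GPU, but they are all-or-nothing per device, which is exactly the limitation the slide points out: they do not let two jobs share one GPU with isolation.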
26
MISSING - CACHE
Intel’s efforts:
Cache Monitoring Technology (CMT)
• Lets an OS or VMM assign a software-defined ID to each application or VM scheduled to run on a core. This ID is called the Resource Monitoring ID (RMID).
• Monitors cache occupancy on a per-RMID basis.
• Lets an OS or VMM read the LLC occupancy for a given RMID at any time.
Cache Allocation Technology (CAT)
• The ability to enumerate the CAT capability and the associated LLC allocation support via CPUID.
• Interfaces for the OS/hypervisor to group applications into classes of service (CLOS) and indicate the amount of last-level cache available to each CLOS. These interfaces are based on MSRs (Model-Specific Registers).
Code and Data Prioritization (CDP)
• Extension to CAT
• A new CPUID feature flag is added within the CAT sub-leaves at CPUID.0x10.[ResID=1]:ECX[bit 2] to indicate support.
MISSING – MEMORY BANDWIDTH
Monitor
Memory Bandwidth Monitoring (MBM)
• Mechanisms in hardware to monitor cache occupancy and bandwidth statistics, as applicable to a given product generation, on a per-software-ID basis.
• Mechanisms for the OS or hypervisor to read back the collected metrics, such as L3 occupancy or memory bandwidth, for a given software ID at any point during runtime.
Control
Ref: “Memory Bandwidth Management for Efficient Performance Isolation in Multi-core Platforms”: http://pertsserver.cs.uiuc.edu/~mcaccamo/papers/private/IEEE_TC_journal_submitted_C.pdf
Code: https://github.com/heechul/memguard
29
WATCH THE WORKLOAD CHANGE
• Latencies
– App request latency
– Disk I/O await
– Network response time
• Queue length
– CPU load average
– Disk request queue size
– Network queue length
• Utilization
– CPU util rate
– Disk util rate
– Network util rate
• Bandwidth
– DRAM bandwidth
– CPU bandwidth
– Disk bandwidth
• Request count
– App request count
– Disk IOPS / req/s
– Network IOPS / req/s
• Granularity
– Global level
– Per-container level
THE FEEDBACK CONTROL LOOP
(Diagram: Soft Container feedback loop with Controller, Watcher, Limiter, and Containers)
Immediate response
How to immediately resize the containers?
33
HOW DO WE LOOK AT RESIZE?
a. Create a new container;
b. Live migrate the contents to the new container:
1. Transfer the existing data to the new container;
2. Transfer the live data to the new container.
c. Stop the old container
d. Start the new container
e. Route the traffic to the new container
34
IN THE CONTAINER’S WORLD…
Process: 9527 /usr/sbin/httpd
Control Groups (cgroup), before:
• CPU time: 20
• System memory: 1G
• Disk bandwidth: 2000
• Network bandwidth: 100 Mb/s
Control Groups (cgroup), after:
• CPU time: 70
• System memory: 5G
• Disk bandwidth: 8000
• Network bandwidth: 1 Gb/s
a. Move the process to a new cgroup, or change the values of its current cgroup
b. Done!
We need to take a fresh look at resource management from the container’s perspective.
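The "move to a new cgroup" path of the resize can be sketched as a single write: in cgroup v1, writing a PID into a cgroup's `cgroup.procs` file moves that process into the cgroup, and it immediately runs under that cgroup's limits. The example below exercises the helper against a scratch directory so it runs without root; the directory name `big` is hypothetical, and 9527 is the httpd PID from the slide.

```python
import os
import tempfile


def move_process(pid: int, cgroup_dir: str) -> None:
    """Attach a process to the cgroup rooted at `cgroup_dir` (cgroup v1 style)."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))


# Dry run: a scratch directory stands in for a pre-configured, larger cgroup.
big_cgroup = os.path.join(tempfile.mkdtemp(), "big")
os.makedirs(big_cgroup)
move_process(9527, big_cgroup)
with open(os.path.join(big_cgroup, "cgroup.procs")) as f:
    print("attached pid:", f.read())
```

Compared with the create, migrate, stop, start, reroute sequence of the previous slide, no data is copied and no traffic is rerouted, which is why the resize can be close to immediate.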
36
SOFT CONTAINER: IMPLEMENTATION
• Controller: algorithm “expand”, algorithm “pin_idle”, algorithm plugin N
• Watcher: CPU plugin, Disk plugin, watcher plugin N
• Limiter: RunC plugin, Docker plugin, limiter plugin N
• Metrics Store: CPU statistics, Disk …, More …
• Container Repo: RunC plugin, Docker plugin, container type N
• Containers: auto discovery
37
SOFT CONTAINER: CURRENT STATUS
• Early version
• Supports RunC and Docker containers
• A few effective controller algorithms
• Can be extended with more plugins
Completely runnable!
39
BENCHMARK RESULTS: BEFORE
If uncontrolled, the MySQL workload suffers severe interference from the co-located low-priority task
40
BENCHMARK RESULTS: BEFORE
CPU utilization is far from saturation while the workload varies over time (although in my case, disk I/O is highly utilized)
41
BENCHMARK RESULTS: SOFT CONTAINER
With Soft Container (green line), the latency impact is kept under control. (We can improve the algorithm to cope better with peak workloads.)
42
BENCHMARK RESULTS: SOFT CONTAINER
Soft Container helps improve CPU utilization by co-locating new tasks with MySQL
43
BENCHMARK RESULTS: SOFT CONTAINER
CPU utilization looks close to saturation, after adding in iowait time
44
HOW DOES SOFT CONTAINER DO THIS?
• Soft Container monitors app resource needs and overall resource utilization in real time
• Soft Container issues resource controls in real time, to guard app SLAs and balance resource utilization
45
BENCHMARK RESULTS: SOFT CONTAINER
How the resource bubble floats under the control of Soft Container. (The vibration thresholds are set to be very sensitive to workload changes.)
Q&A