
Google Confidential and Proprietary

Victor Marmol (vmarmol@google.com)
Rohit Jnagal (jnagal@google.com)

Let Me Contain That For You

Linux Plumbers Conference (LPC) 2013

Containers @ Google

● Early uses: scaling process management and isolation.
● What: Linux cgroups plus user-space policies and monitoring.
● Everywhere: SaaS, PaaS, IaaS; private and public clouds.
● Containerizing shared machines:
  ○ Asymmetric workloads: latency, bandwidth, and priority
  ○ Asymmetric isolation
  ○ High churn
● Goals:
  ○ Performance guarantees.
  ○ High utilization across resources.
  ○ Shared resources.
  ○ Overcommitment: invisible workloads running on reclaimed resources.
  ○ Near-zero overhead.
● Other use cases: ChromeOS et al.


A Shared Google Machine

[Figure: I/O, CPU and memory allocation on a shared machine running a sensitive front-end job, a back-end job, system daemons, and background tasks (batch and soaker workloads).]


Resource Isolation

● Quality of service
  ○ Bandwidth: fair share, progress guarantees, availability.
  ○ Latency: wakeup, allocation, and access times.
  ○ Priority: order of importance.
  ○ Performance: microarchitectural interference (CPI2); locality.
● Solution:
  ○ Scheduling a good mix of workloads.
  ○ Hierarchical resource management for effective sharing.
  ○ Maximizing utilization across all dimensions.
  ○ Cgroup-aware tasks:
    ■ User subcontainers (e.g., query management)
    ■ User schedulers
    ■ Self-correcting tasks: notifications
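The slide lists notifications as one way a task can correct itself. The talk does not say which notification mechanism is used. A minimal sketch, assuming a cgroup v1 memory hierarchy: a task writes "<eventfd> <fd of memory.usage_in_bytes> <threshold>" to cgroup.event_control, and the kernel then signals the eventfd when memory usage crosses the threshold. The helper names and the cgroup path are hypothetical.

```python
import os
from pathlib import Path


def threshold_registration(event_fd: int, usage_fd: int, threshold_bytes: int) -> str:
    """Build the line cgroup v1 expects in cgroup.event_control to arm
    a memory.usage_in_bytes threshold notification."""
    if threshold_bytes <= 0:
        raise ValueError("threshold must be positive")
    return f"{event_fd} {usage_fd} {threshold_bytes}"


def watch_memory_threshold(cgroup_dir: str, threshold_bytes: int) -> int:
    """Arm a threshold and block until the kernel signals it.

    Needs Linux, Python 3.10+ (os.eventfd) and a cgroup v1 memory
    hierarchy; cgroup_dir is a hypothetical path such as
    /dev/cgroup/mem/A1.
    """
    cg = Path(cgroup_dir)
    efd = os.eventfd(0)
    usage_fd = os.open(cg / "memory.usage_in_bytes", os.O_RDONLY)
    try:
        (cg / "cgroup.event_control").write_text(
            threshold_registration(efd, usage_fd, threshold_bytes))
        # Blocks until usage crosses the threshold; returns the event count.
        return os.eventfd_read(efd)
    finally:
        os.close(usage_fd)
        os.close(efd)
```

Once woken, a self-correcting task might shrink its caches or shed load before it reaches its limit. Those reactions are examples only and are not taken from the talk.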



Scalability

● Churn
  ○ 1 creation/deletion per 10 seconds
● Per container
  ○ Read: O(10) cgroup-based stats per second
  ○ Write: O(1) cgroup-based parameter per second
● Per machine
  ○ O(100) containers
  ○ Expected to grow dramatically
● Overall
  ○ Read: 1000s per second
  ○ Write: 100s per second
● Users can do a lot more.
● Precise accounting for chargeback
● Monitoring built in at multiple layers
● Extremely low overhead
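The "Overall" figures come from multiplying the per-container rates by the number of containers per machine. A small sketch of that arithmetic follows. Its inputs are the slide's orders of magnitude, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Load:
    containers: int               # containers per machine
    reads_per_container: float    # cgroup stats read per second
    writes_per_container: float   # cgroup params written per second


def machine_rates(load: Load) -> tuple[float, float]:
    """Aggregate cgroup read and write operations per second on one machine."""
    return (load.containers * load.reads_per_container,
            load.containers * load.writes_per_container)


# O(100) containers, O(10) reads and O(1) writes per container per second.
reads, writes = machine_rates(Load(100, 10, 1))
print(f"{reads:.0f} reads/s, {writes:.0f} writes/s")  # 1000 reads/s, 100 writes/s
```

Both totals grow linearly with the container count, which is why the expected growth in containers per machine matters.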


Let Me Contain That For You (lmctfy)

● Revised container management
  ○ Separates the cgroup abstraction from policies.
  ○ Configures cgroups from an intent-based resource specification.
● Built for scalability and parallel access.
● Also includes extra kernel patches for:
  ○ Improving resource isolation.
  ○ Providing tighter performance guarantees.
  ○ Precise accounting in the face of sharing.
  ○ Caps on global resources.
● Allows users to create subcontainers with restrictions.
● Open source: sharing use cases, problems, and benchmarks.
● Implements policies in a higher layer:
  ○ Continuous monitoring and fine-tuning.
  ○ No critical loops [Remember LPC 2011?]
  ○ Machine-level utilization and isolation management.
  ○ Isolated from system APIs.
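In an intent-based specification, the user states what a container needs and a lower layer decides how to express that in cgroup files. The sketch below illustrates that split under stated assumptions. ContainerSpec is a hypothetical stand-in and not lmctfy's actual API. The millicore-to-shares scaling (1000 millicores = 1024 shares) is also an assumption. The floor of 2 is the kernel's minimum for cpu.shares.

```python
from dataclasses import dataclass


@dataclass
class ContainerSpec:
    """Hypothetical intent-based spec: what the task needs, not how
    the kernel enforces it."""
    cpu_millicores: int       # 1000 == one CPU's worth of weight (assumed)
    memory_limit_bytes: int


def to_cgroup_files(spec: ContainerSpec) -> dict[str, str]:
    """Translate the spec into cgroup v1 file writes, keyed by
    controller/file relative to the container's directory."""
    shares = max(2, spec.cpu_millicores * 1024 // 1000)  # kernel minimum is 2
    return {
        "cpu/cpu.shares": str(shares),
        "memory/memory.limit_in_bytes": str(spec.memory_limit_bytes),
    }


print(to_cgroup_files(ContainerSpec(cpu_millicores=2000,
                                    memory_limit_bytes=4 << 30)))
# {'cpu/cpu.shares': '2048', 'memory/memory.limit_in_bytes': '4294967296'}
```

These inputs reproduce the [2048] CPU and [4G] memory values shown for allocation A1 in the hierarchical-sharing figure. Policies can change the spec without knowing the file layout underneath.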


Hierarchical Sharing

An allocation A1 with two tasks, T1 and T2. Each task runs in the allocation and shares its resources with co-located siblings.

/dev/cgroup/cpu/A1[2048]
  T1[1536]
  T2[512]
/dev/cgroup/mem/A1[4G]
  T1[2G]
  T2[3G]
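The figure's numbers can be read as follows. CPU shares are weights relative to siblings only, so when both tasks are busy T1 and T2 split A1's CPU 3:1. Together the child memory limits (2G + 3G) exceed A1's 4G, which lets the siblings overcommit while the parent limit still caps their combined usage. A small sketch of that arithmetic, using the figure's values:

```python
def share_fractions(children: dict[str, int]) -> dict[str, float]:
    """Fraction of the parent's CPU each child gets when all children
    are busy: cpu.shares are weights relative to siblings only."""
    total = sum(children.values())
    return {name: s / total for name, s in children.items()}


def overcommit_ratio(parent_limit: int, child_limits: dict[str, int]) -> float:
    """Sum of child memory limits over the parent's limit. Above 1 means
    siblings overcommit; the parent limit still caps their total usage."""
    return sum(child_limits.values()) / parent_limit


G = 1 << 30
print(share_fractions({"T1": 1536, "T2": 512}))              # {'T1': 0.75, 'T2': 0.25}
print(overcommit_ratio(4 * G, {"T1": 2 * G, "T2": 3 * G}))   # 1.25
```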


Managing priority across resources

Cgroups for low-priority batch tasks (T1, T3) and for a latency-sensitive task (T2), per resource:
● Block I/O: Default[0.1], T1[0.1], T3[0.1], T2[0.8]
● CPU: Default[2], T1[512], T3[256], T2[1024]
● Memory: T1[1G], T3[1G], T2[2G]
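The block I/O values in this figure are relative priorities given as fractions. In cgroup v1 the proportional-weight knob is blkio.weight, which accepts values from 10 to 1000 under the CFQ scheduler. The talk does not specify how fractions map to weights. The sketch below assumes a simple linear mapping to convert the figure's I/O values.

```python
BLKIO_MIN, BLKIO_MAX = 10, 1000  # cgroup v1 blkio.weight range (CFQ)


def blkio_weight(fraction: float) -> int:
    """Map a relative I/O priority to a blkio.weight value.
    Linear scaling to 1000 is an assumption, not the talk's rule."""
    return max(BLKIO_MIN, min(BLKIO_MAX, round(fraction * BLKIO_MAX)))


# Block I/O values from the figure above.
io = {"Default": 0.1, "T1": 0.1, "T3": 0.1, "T2": 0.8}
print({name: blkio_weight(f) for name, f in io.items()})
# {'Default': 100, 'T1': 100, 'T3': 100, 'T2': 800}
```

Clamping keeps out-of-range inputs valid for the kernel, in the same way that the CPU default of 2 sits at the kernel's minimum for cpu.shares.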


Managing priority across resources (continued)

Cgroups for a latency-sensitive task with high I/O priority (T1) and for a low-priority task (T2):
● Block I/O: Default[0.1], T2[0.1], T1[0.8], with T1 split into subcontainers T1[PRIO][0.5] and T1[0.3]
● CPU: Default[2], T2[1024], T1[2048]
● Memory: T2[2G], T1[4G]

A task may need multiple containers for the same resource to balance its workload priorities. The I/O server T1 uses two subcontainers to differentiate incoming I/O requests and moves each thread into the appropriate subcontainer.
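In cgroup v1, individual threads of one process can sit in different cgroups. Writing a thread ID to a group's "tasks" file moves only that thread. A minimal sketch of an I/O server routing a request-handling thread follows. The subcontainer names "prio" and "normal", the classifier, and the /dev/cgroup/io/T1 layout that the temporary directory stands in for are all hypothetical.

```python
import tempfile
import threading
from pathlib import Path


def move_current_thread(cgroup_dir: Path) -> int:
    """Move only the calling thread (not the whole process) into a
    cgroup v1 group by writing its kernel thread ID to 'tasks'."""
    tid = threading.get_native_id()
    with open(cgroup_dir / "tasks", "w") as f:
        f.write(str(tid))
    return tid


def pick_subcontainer(io_root: Path, high_priority: bool) -> Path:
    """Hypothetical classifier: send a request to T1's priority
    subcontainer or to its regular one."""
    return io_root / ("prio" if high_priority else "normal")


if __name__ == "__main__":
    # A temporary directory simulates /dev/cgroup/io/T1.
    root = Path(tempfile.mkdtemp())
    for name in ("prio", "normal"):
        (root / name).mkdir()
    target = pick_subcontainer(root, high_priority=True)
    tid = move_current_thread(target)
    print(target.name, (target / "tasks").read_text() == str(tid))  # prio True
```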


Splitting hierarchies for performance

Cgroups for a CPU-, memory- and I/O-sensitive task (T1), a CPU- and memory-sensitive task with low I/O priority (T2), and a low-priority batch task (T3):
● Block I/O: Default[0.1], T2[0.1], T3[0.1], T1[0.8], with T1 split into T1[0.5|P] and T1[0.3]
● CPU: Default[2], T2[1024], T3[1024], T1[2048]
● Memory: T1[4G], T2[2G], T3[2G]

Splitting hierarchies reduces stranded resources and improves performance for highly sensitive tasks.
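With separate hierarchies per resource, a task can sit at a different place in each tree. On Linux, /proc/<pid>/cgroup shows this directly: it has one "hierarchy-ID:controller-list:path" line per hierarchy. The sketch below parses that format. The sample placement, which puts T2 under a hypothetical /low_io group for block I/O only, is invented for illustration.

```python
def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Map each controller to the task's cgroup path, using the
    'hierarchy-ID:controller-list:path' lines of /proc/<pid>/cgroup."""
    placement: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _, controllers, path = line.split(":", 2)
        # An empty controller list is the cgroup v2 unified hierarchy.
        for ctrl in controllers.split(","):
            placement[ctrl or "(unified)"] = path
    return placement


# Hypothetical placement: T2 shares CPU/memory trees but sits in a low-I/O group.
sample = "4:blkio:/low_io/T2\n3:memory:/T2\n2:cpu,cpuacct:/T2\n"
print(parse_proc_cgroup(sample))
# {'blkio': '/low_io/T2', 'memory': '/T2', 'cpu': '/T2', 'cpuacct': '/T2'}
```

On a live system, the same function can be applied to open("/proc/self/cgroup").read().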


User Subcontainers

[Figure: an App Engine task containing a protected server app and server instances (Instance1, Instance2, Instance3), each in a subcontainer with a tailored spec and priority; the diagram also marks an OOM.]

App Engine uses on-demand container creation: fair sharing, notifications, and isolation of misbehaving apps.
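The following minimal sketch shows on-demand subcontainer creation with a restriction, assuming a cgroup v1 memory hierarchy. The restriction policy is an assumption chosen to show "subcontainers with restrictions": a child may not be given a higher memory limit than its parent. The kernel does not enforce this rule itself, although the parent's limit still caps the children's total usage. Function names, instance names and limits are hypothetical. A temporary directory stands in for the App Engine task's cgroup.

```python
import tempfile
from pathlib import Path


def read_limit(cgroup_dir: Path) -> int:
    """Read a cgroup v1 memory limit in bytes."""
    return int((cgroup_dir / "memory.limit_in_bytes").read_text())


def create_subcontainer(parent: Path, name: str, limit_bytes: int) -> Path:
    """Create a child memory cgroup on demand, refusing limits above the
    parent's (a policy assumption, not kernel behavior)."""
    if limit_bytes > read_limit(parent):
        raise ValueError(f"{name}: limit exceeds parent's")
    child = parent / name
    child.mkdir()
    (child / "memory.limit_in_bytes").write_text(str(limit_bytes))
    return child


if __name__ == "__main__":
    task = Path(tempfile.mkdtemp())
    (task / "memory.limit_in_bytes").write_text(str(4 << 30))
    for i in (1, 2, 3):
        create_subcontainer(task, f"instance{i}", 1 << 30)
    print(sorted(p.name for p in task.iterdir() if p.is_dir()))
    # ['instance1', 'instance2', 'instance3']
```

Because each instance has its own limit, one instance that exceeds its memory hits OOM inside its own subcontainer, and its siblings and the protected server app are unaffected.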


Takeaways

● Cgroups support goes beyond containerized VMs.
● Sharing and overcommitment are key to higher utilization.
● Managing each resource separately helps fine-tune utilization and performance.
● More power to users means better flexibility and scalability.

Come find us for chats, discussions, a BoF, and drinks. Or reach us virtually: jnagal@google.com, vmarmol@google.com
