Let Me Contain That For You
Victor Marmol (vmarmol@google.com), Rohit Jnagal (jnagal@google.com)
Linux Plumbers Conf · Google Confidential and Proprietary




Google Confidential and Proprietary · LPC 2013

Containers @ Google

● Early users: Scaling process management and isolation.
● What: Linux cgroups + user-space policies and monitoring.
● Everywhere: SaaS, PaaS, IaaS; Private and Public clouds.
● Containerizing shared machines
  ○ Asymmetric workloads: Latency, bandwidth, and priority
  ○ Asymmetric isolation
  ○ High churn
● Goals:
  ○ Performance guarantees.
  ○ High utilization across resources.
  ○ Shared resources.
  ○ Overcommitment: Invisible workload from reclaimed resources.
  ○ Near zero overhead.
● Other use cases: ChromeOS et al.


A Shared Google Machine

[Figure labels: I/O, CPU, Mem; Sensitive Front End Job; Back End Job; Allocation; Background tasks: System Daemons, Batch workload, Soaker workload]


Resource Isolation

● Quality of service
  ○ Bandwidth: Fair share, progress guarantees, availability.
  ○ Latency: Wakeup, allocation, access times.
  ○ Priority: Order of importance.
  ○ Performance: Microarchitecture interference (CPI²); Locality.
● Solution:
  ○ Scheduling a good mix.
  ○ Hierarchical resource management for effective sharing.
  ○ Maximize utilization across all dimensions.
  ○ Cgroup-aware tasks:
    ■ User subcontainers [e.g. Query management]
    ■ User schedulers.
    ■ Self-correcting tasks: Notifications



Scalability

● Churn
  ○ 1 Creation/Deletion per 10 seconds
● Per Container
  ○ Read: O(10) cgroup-based stats per second
  ○ Write: O(1) cgroup-based param per second
● Per Machine
  ○ O(100) containers
  ○ Looks to grow dramatically
● Overall
  ○ Read: 1000’s per second
  ○ Write: 100’s per second
● Users can do a lot more.

● Precise accounting for chargeback
● Monitoring built in at multiple layers
● Extremely low overhead
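The overall rates follow from the per-container and per-machine figures. A back-of-the-envelope sketch, taking each O(·) figure at face value as an illustrative constant (the slide gives orders of magnitude only, so the exact numbers below are assumptions):

```python
# Illustrative constants: the slide states orders of magnitude only.
CONTAINERS_PER_MACHINE = 100       # O(100) containers
READS_PER_CONTAINER_PER_SEC = 10   # O(10) cgroup-based stats read per second
WRITES_PER_CONTAINER_PER_SEC = 1   # O(1) cgroup-based param written per second


def machine_rates(containers, reads_each, writes_each):
    """Return (reads/sec, writes/sec) aggregated over one machine."""
    return containers * reads_each, containers * writes_each


reads, writes = machine_rates(
    CONTAINERS_PER_MACHINE,
    READS_PER_CONTAINER_PER_SEC,
    WRITES_PER_CONTAINER_PER_SEC,
)
print(f"reads/sec  ~ {reads}")   # on the order of 1000's per second
print(f"writes/sec ~ {writes}")  # on the order of 100's per second
```

If the container count per machine grows as the slide expects, both rates scale linearly with it, which is why the management layer has to be cheap per operation.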


Let Me Contain That For You

● Revised container management
  ○ Separate cgroup abstraction from policies.
  ○ Configuring cgroups with an intent-based resource specification.
● Built for scalability and parallel access.
● Also includes extra kernel patches for:
  ○ Improving resource isolation.
  ○ Providing tighter performance guarantees.
  ○ Precise accounting in face of sharing.
  ○ Cap for global resources.
● Allow users to create subcontainers with restrictions.
● Open-source: Sharing use-cases, problems, and benchmarks.
● Implement policies in a higher layer:
  ○ Continuous monitoring and fine-tuning.
  ○ No critical loops [Remember LPC2011?]
  ○ Machine-level utilization and isolation management.
  ○ Isolated from system APIs.
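One way to picture the split between an intent-based specification and the cgroup layer: the caller states what it needs, and a separate translation step decides which cgroup files to set. The sketch below is hypothetical throughout. The `ResourceSpec` fields, the `plan_cgroup_writes` helper, and the milli-CPU-to-shares mapping are assumptions made for illustration, not the project's actual API. Only the cgroup v1 file names `cpu.shares` and `memory.limit_in_bytes` are real kernel interfaces, and the `/dev/cgroup/<resource>/` layout is taken from the paths shown in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical intent: what the container needs, not how to configure it."""
    cpu_milli: int     # requested CPU in thousandths of a core (assumed unit)
    memory_bytes: int  # requested memory limit in bytes


def plan_cgroup_writes(name, spec, root="/dev/cgroup"):
    """Translate a spec into {cgroup file path: value} without touching the system.

    The mapping 1000 milli-CPU -> 1024 shares is an assumption for this sketch.
    """
    # Never plan fewer than 2 shares, the smallest value shown in the slides.
    shares = max(2, spec.cpu_milli * 1024 // 1000)
    return {
        f"{root}/cpu/{name}/cpu.shares": str(shares),
        f"{root}/mem/{name}/memory.limit_in_bytes": str(spec.memory_bytes),
    }


plan = plan_cgroup_writes("A1", ResourceSpec(cpu_milli=2000, memory_bytes=4 << 30))
for path, value in plan.items():
    print(path, "=", value)
```

Keeping the translation a pure function means a policy layer can inspect or adjust the plan before anything is written, which matches the slide's point about separating the cgroup abstraction from policies.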


Hierarchical Sharing

An allocation A1 with two tasks T1 and T2. Task running in an allocation sharing resources with co-located siblings.

[Figure: cgroup hierarchies for allocation A1]
/dev/cgroup/cpu/A1[2048]
    T1[1536]
    T2[512]
/dev/cgroup/mem/A1[4G]
    T1[2G]
    T2[3G]
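Assuming the bracketed CPU numbers are relative weights in the style of cgroup `cpu.shares` and the memory numbers are limits (the slide does not label them, so this reading is an assumption), the arithmetic of hierarchical sharing can be sketched as follows. The helper names are hypothetical.

```python
GIB = 1 << 30

# Values as read from the figure: child tasks of allocation A1.
cpu_children = {"T1": 1536, "T2": 512}
mem_parent = 4 * GIB
mem_children = {"T1": 2 * GIB, "T2": 3 * GIB}


def cpu_fractions(children):
    """Fraction of the parent's CPU each sibling gets when all are busy."""
    total = sum(children.values())
    return {name: weight / total for name, weight in children.items()}


def effective_mem_limits(parent_limit, children):
    """A child can never use more than its own limit or its parent's."""
    return {name: min(limit, parent_limit) for name, limit in children.items()}


fractions = cpu_fractions(cpu_children)
limits = effective_mem_limits(mem_parent, mem_children)
overcommit = sum(mem_children.values()) - mem_parent

print(fractions)          # {'T1': 0.75, 'T2': 0.25}
print(overcommit // GIB)  # 1
```

Under this reading, T1 gets three quarters of whatever CPU A1 receives when both tasks are busy, and the children's memory limits add up to 1G more than A1's 4G, so the siblings share the allocation rather than partition it.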


Managing priority across resources

[Figure: per-resource cgroups for three tasks]
Block I/O: Default[0.1], T1[0.1], T2[0.8], T3[0.1]
Cpu: Default[2], T1[512], T2[1024], T3[256]
Memory: T1[1G], T2[2G], T3[1G]
Legend: cgroups for low-priority batch tasks; cgroups for a latency sensitive task.


Managing priority across resources

A task may require multiple containers for the same resource to balance its workload priorities. I/O server T1 uses two subcontainers to differentiate incoming I/O requests and moves threads to the right subcontainer.

[Figure: per-resource cgroups for two tasks]
Block I/O: Default[0.1], T2[0.1], T1[0.8] with subcontainers T1[0.3] and T1[PRIO][0.5]
Cpu: Default[2], T2[1024], T1[2048]
Memory: T2[2G], T1[4G]
Legend: cgroups for a high I/O priority latency sensitive task; cgroups for a low priority task.
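The subcontainer pattern can be sketched as a routing step: classify each incoming request, then place the serving thread in the matching block I/O cgroup. In cgroup v1 a single thread is moved by writing its thread ID to the target cgroup's `tasks` file. Everything else below is an assumption for illustration: the subcontainer names `T1/prio` and `T1/normal`, the `/dev/cgroup/io` mount point, and the `urgent` flag used to classify requests.

```python
# Hypothetical layout: two block I/O subcontainers under task T1.
IO_ROOT = "/dev/cgroup/io"  # assumed mount point
SUBCONTAINERS = {True: "T1/prio", False: "T1/normal"}  # assumed names


def tasks_file_for(urgent, root=IO_ROOT):
    """Return the cgroup v1 `tasks` file of the subcontainer for a request class."""
    return f"{root}/{SUBCONTAINERS[urgent]}/tasks"


def move_thread(tid, urgent, root=IO_ROOT):
    """Move thread `tid` by writing its ID to the chosen `tasks` file.

    Needs a mounted cgroup v1 hierarchy and sufficient privileges.
    """
    with open(tasks_file_for(urgent, root), "w") as f:
        f.write(str(tid))


print(tasks_file_for(True))   # /dev/cgroup/io/T1/prio/tasks
print(tasks_file_for(False))  # /dev/cgroup/io/T1/normal/tasks
```

A server would call something like `move_thread` with the worker's thread ID before that worker starts issuing I/O for a request, so the request is accounted against the intended subcontainer.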


Splitting hierarchies for performance

Splitting hierarchies reduces stranded resources and improves performance for highly sensitive tasks.

[Figure: per-resource cgroups for three tasks]
Block I/O: Default[0.1], T1[0.8] with subcontainers T1[0.5|P] and T1[0.3], T2[0.1], T3[0.1]
Cpu: Default[2], T1[2048], T2[1024], T3[1024]
Memory: T1[4G], T2[2G], T3[2G]
Legend: Cpu, Memory and I/O sensitive task; Cpu & Memory sensitive task with low I/O priority; Low priority batch task.


User Subcontainers

App Engine uses on-demand container creation: fair sharing, notifications, and isolation of misbehaving apps.

[Figure labels: App Engine Task; Server Instances: Instance1, Instance2, Instance3; Protected server app; Subcontainers with tailored spec and priority; OOM]


Takeaways

Come find us for chat, discussions, BoF, and drinks. Or virtually: vmarmol@google.com, jnagal@google.com

● Cgroups support goes beyond containerized VMs.
● Sharing and overcommitment are key to higher utilization.
● Managing each resource separately helps fine-tune utilization and performance.
● More power to users means better flexibility and scalability.