Linux Containers (LXC) IEEE P2302 Working Group Brief Boden Russell ([email protected] / [email protected])

Linux Container Brief for IEEE WG P2302


DESCRIPTION

A brief intro to Linux Containers presented to IEEE working group P2302 (InterCloud standards and portability). This deck covers:
- Definitions and motivations for containers
- Container technology stack
- Containers vs Hypervisor VMs
- Cgroups
- Namespaces
- Pivot root vs chroot
- Linux Container image basics
- Linux Container security topics
- Overview of Linux Container tooling functionality
- Thoughts on container portability and runtime configuration
- Container tooling in the industry
- Container gaps
- Sample use cases for traditional VMs

Overall, the bulk of this deck is covered in other material I have posted here. However, there are a few new slides in this deck, most notably some thoughts on container portability and runtime config.


Page 1: Linux Container Brief for IEEE WG P2302

Linux Containers (LXC): IEEE P2302 Working Group Brief

Boden Russell ([email protected] / [email protected])

Page 2: Linux Container Brief for IEEE WG P2302

04/11/2023 2

Definitions

Linux Containers (LXC: LinuX Containers)
– Lightweight virtualization
– Realized using features provided by a modern Linux kernel
– VMs without the hypervisor (kind of)

Containerization of
– (Linux) Operating Systems (less the kernel)
– Single or multiple applications

LXC as a technology ≠ LXC “tools”

Page 3: Linux Container Brief for IEEE WG P2302

Why LXC

Provision in seconds / milliseconds
Near bare metal runtime performance
VM-like agility – it’s still “virtualization”
Flexibility
– Containerize a “system”
– Containerize “application(s)”
Lightweight
– Just enough Operating System (JeOS)
– Minimal per container penalty
Open source – free – lower TCO
Supported with OOTB modern Linux kernel
Growing in popularity

“Linux Containers are poised as the next VM in our modern Cloud era…”

Provision time: Manual – days; VM – minutes; LXC – seconds / ms

[Figure: linpack performance @ 45000; x-axis: vcpus; y-axis: GFlops]

[Figures: Google Trends for “LXC” and for “docker”]

Page 4: Linux Container Brief for IEEE WG P2302

LXC Technology Stack

[Figure: layered stack, top to bottom]
User space
– Orchestration & Management
– Linux Container Commoditization
– Linux Container Tooling (lxc)
– GLIBC / Pseudo FS / User Space Tools & Libs
Kernel space
– System Call Interface
– Kernel: cgroups, namespaces, chroots, LSM
– Architecture Dependent Kernel Code
Hardware

Page 5: Linux Container Brief for IEEE WG P2302

Hypervisors vs. Linux Containers

[Figure: three stacks compared, bottom to top]
Type 1 Hypervisor: Hardware → Hypervisor → Virtual Machines (each with Operating System, Bins / libs, Apps)
Type 2 Hypervisor: Hardware → Operating System → Hypervisor → Virtual Machines (each with Operating System, Bins / libs, Apps)
Linux Containers: Hardware → Operating System → Containers (each with Bins / libs, Apps)

Containers share the OS kernel of the host and thus are lightweight. However, each container must have the same OS kernel.

Containers are isolated, but share OS and, where appropriate, libs / bins.

Page 6: Linux Container Brief for IEEE WG P2302

Linux cgroups

History
– Work started in 2006 by Google engineers
– Merged into upstream 2.6.24 kernel due to wider spread LXC usage
– A number of features still a WIP

Functionality
– Access; which devices can be used per cgroup
– Resource limiting; memory, CPU, device accessibility, block I/O, etc.
– Prioritization; who gets more of the CPU, memory, etc.
– Accounting; resource usage per cgroup
– Control; freezing & checkpointing
– Injection; packet tagging

Usage
– cgroup functionality exposed as “resource controllers” (aka “subsystems”)
– Subsystems mounted on FS
– Top-level subsystem mount is the root cgroup; all procs on host
– Directories under top-level mounts created per cgroup
– Procs put in tasks file for group assignment
– Interface via read / write pseudo files in group

Cgroups provide throttling, prioritization, control and metrics for a group of processes.
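The usage steps above (a directory per cgroup, a tasks file for assignment, pseudo files for tunables) can be sketched in a few lines of Python. This is only an illustration, assuming the cgroup v1 layout described in this deck: the helper names are hypothetical, and the demo points at a temporary directory rather than a real subsystem mount, so it runs without root. On a real host the root would be a mounted subsystem and the kernel would create the pseudo files itself.

```python
import os
import tempfile


def create_cgroup(subsystem_root, name):
    """Create a child cgroup: just a directory under the subsystem mount."""
    path = os.path.join(subsystem_root, name)
    os.makedirs(path, exist_ok=True)
    return path


def set_param(cgroup_path, pseudo_file, value):
    """Set a tunable by writing to a pseudo file inside the group."""
    with open(os.path.join(cgroup_path, pseudo_file), "w") as f:
        f.write(str(value))


def add_task(cgroup_path, pid):
    """Assign a process to the group by writing its PID to the tasks file."""
    with open(os.path.join(cgroup_path, "tasks"), "a") as f:
        f.write(f"{pid}\n")


def read_param(cgroup_path, pseudo_file):
    """Read a pseudo file back (metrics and settings use the same interface)."""
    with open(os.path.join(cgroup_path, pseudo_file)) as f:
        return f.read().strip()


if __name__ == "__main__":
    # Stand-in for a real mount point; an assumption made so the demo
    # does not need root or a cgroup-enabled kernel.
    fake_root = tempfile.mkdtemp()
    group = create_cgroup(fake_root, "demo")
    set_param(group, "memory.limit_in_bytes", 256 * 1024 * 1024)
    add_task(group, os.getpid())
    print(read_param(group, "memory.limit_in_bytes"))
```

The point of the sketch is that the whole interface is file I/O, which is why shell scripts and higher-level tooling can drive cgroups equally well.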

Page 7: Linux Container Brief for IEEE WG P2302

Linux cgroups Subsystems

cgroups provided via kernel modules
– Not always loaded / provided by default
– Locate and load with modprobe
Some features tied to kernel version
See: https://www.kernel.org/doc/Documentation/cgroups/

Subsystem: Tunable Parameters

blkio
- Weighted proportional block I/O access. Group wide or per device.
- Per device hard limits on block I/O read/write, specified as bytes per second or I/O operations per second (IOPS).

cpu
- Time period (microseconds per second) a group should have CPU access.
- Group wide upper limit on CPU time per second.
- Weighted proportional value of relative CPU time for a group.

cpuset
- CPUs (cores) the group can access.
- Memory nodes the group can access and migration ability.
- Memory hardwall, pressure, spread, etc.

devices
- Define which devices and access type a group can use.

freezer
- Suspend/resume group tasks.

memory
- Max memory limits for the group (in bytes).
- Memory swappiness, OOM control, hierarchy, etc.

hugetlb
- Limit HugeTLB size usage.
- Per cgroup HugeTLB metrics.

net_cls
- Tag network packets with a class ID.
- Use tc to prioritize tagged packets.

net_prio
- Weighted proportional priority on egress traffic (per interface).
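To make the cpu subsystem's two styles of control concrete, here is a small arithmetic sketch. It assumes the cgroup v1 CFS tunables cpu.cfs_period_us, cpu.cfs_quota_us, and cpu.shares; the function names are hypothetical. A hard cap is a quota of CPU microseconds per period, while shares only express a relative weight between groups that compete for the CPU.

```python
def cfs_quota_us(cpus, period_us=100_000):
    """Quota (CPU microseconds per period) that caps a group at `cpus` CPUs.

    The result is the value one would write to cpu.cfs_quota_us, with
    `period_us` written to cpu.cfs_period_us.
    """
    if cpus <= 0 or period_us <= 0:
        raise ValueError("cpus and period_us must be positive")
    return int(cpus * period_us)


def relative_share(shares):
    """Fraction of CPU each group receives when every group is busy.

    `shares` maps a group name to its cpu.shares weight. The weights only
    matter under contention; an idle host lets any group use spare CPU.
    """
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}


if __name__ == "__main__":
    print(cfs_quota_us(0.5))                              # half a CPU
    print(relative_share({"web": 1024, "batch": 512}))    # 2:1 weighting
```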

Page 8: Linux Container Brief for IEEE WG P2302

Linux namespaces

History
– Initial kernel patches in 2.4.19
– Recent 3.8 patches for user namespace support
– A number of features still a WIP

Functionality
– Provide process level isolation of global resources
  • MNT (mount points, file systems, etc.)
  • PID (process)
  • NET (NICs, routing, etc.)
  • IPC (System V IPC resources)
  • UTS (host & domain name)
  • USER (UID + GID)
– Process(es) in namespace have illusion they are the only processes on the system
– Generally constructs exist to permit “connectivity” with parent namespace

Usage
– Construct namespace(s) of desired type
– Create process(es) in namespace (typically done when creating namespace)
– If necessary, initialize “connectivity” to parent namespace
– Process(es) in namespace internally function as if they are the only proc(s) on system

Namespaces provide isolation of Linux resources for a group of processes.
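One way to see which namespaces a process belongs to is to read the symlinks under /proc/<pid>/ns, whose targets look like pid:[4026531836]. The sketch below is a hedged illustration: the function names are hypothetical, and it assumes a kernel recent enough to expose these entries. Two processes share a namespace of a given type when the numbers in brackets match.

```python
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (type, id)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self", proc_root="/proc"):
    """Map each entry under <proc_root>/<pid>/ns to its namespace id."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, ns_id = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = ns_id
    return result


def same_namespace(pid_a, pid_b, entry, proc_root="/proc"):
    """True when both processes share the namespace named by `entry`."""
    a = namespaces_of(pid_a, proc_root)
    b = namespaces_of(pid_b, proc_root)
    return a[entry] == b[entry]


if __name__ == "__main__":
    for name, ns_id in namespaces_of().items():
        print(name, ns_id)
```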

Page 9: Linux Container Brief for IEEE WG P2302

Linux namespaces: Conceptual Overview

[Figure: three namespaces side by side, each with its own view of every resource type]

global (i.e. root) namespace
– MNT NS: /, /proc, /mnt/fsrd, /mnt/fsrw, /mnt/cdrom, /run2
– UTS NS: globalhost, rootns.com
– PID NS (PID COMMAND): 1 /sbin/init, 2 [kthreadd], 3 [ksoftirqd], 4 [cpuset], 5 /sbin/udevd, 6 /bin/sh, 7 /bin/bash
– IPC NS
  • SHMID OWNER: 32452 root, 43321 boden
  • SEMID OWNER: 0 root, 1 Boden
  • MSQID OWNER: (none)
– NET NS: lo: UNKNOWN…, eth0: UP…, eth1: UP…, br0: UP…
  • app1 IP:5000, app2 IP:6000, app3 IP:7000
– USER NS: root 0:0, ntp 104:109, mysql 105:110, boden 106:111

purple namespace
– MNT NS: /, /proc, /mnt/purplenfs, /mnt/fsrw, /mnt/cdrom
– UTS NS: purplehost, purplens.com
– PID NS (PID COMMAND): 1 /bin/bash, 2 /bin/vim
– IPC NS
  • SHMID OWNER: (none)
  • SEMID OWNER: 0 root
  • MSQID OWNER: (none)
– NET NS: lo: UNKNOWN…, eth0: UP…
  • app1 IP:1000, app2 IP:7000
– USER NS: root 0:0, app 106:111

blue namespace
– MNT NS: /, /proc, /mnt/cdrom, /bluens
– UTS NS: bluehost, bluens.com
– PID NS (PID COMMAND): 1 /bin/bash, 2 python, 3 node
– IPC NS
  • SHMID OWNER: (none)
  • SEMID OWNER: (none)
  • MSQID OWNER: (none)
– NET NS: lo: UNKNOWN…, eth0: DOWN…, eth1: UP
  • app1 IP:7000, app2 IP:9000
– USER NS: root 0:0, app 104:109

Page 10: Linux Container Brief for IEEE WG P2302

Linux namespaces: Common Idioms

It’s not required to use all namespaces
– Pick & choose; if your toolset allows it

Constructs exist to permit “connectivity” between parent / child namespace

Various Linux user space tools have namespace support
Linux sys API supports flexible namespace creation
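The “pick & choose” idiom maps onto the system call interface as a bitmask: clone(2) and unshare(2) accept one CLONE_NEW* flag per namespace type, and the caller ORs together only the ones it wants. The sketch below only builds that mask and does not invoke any system call, since creating most namespaces needs privileges. The flag values are the ones defined in the kernel's sched.h; the short names used as dictionary keys are an assumption made for this example.

```python
# CLONE_NEW* flag values as defined in the Linux kernel headers (sched.h).
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def namespace_flags(selected):
    """OR together the clone flags for the chosen namespace types."""
    flags = 0
    for name in selected:
        key = name.lower()
        if key not in CLONE_FLAGS:
            raise ValueError(f"unknown namespace type: {name!r}")
        flags |= CLONE_FLAGS[key]
    return flags


if __name__ == "__main__":
    # A container that isolates hostname and networking only.
    print(hex(namespace_flags(["uts", "net"])))
```

A tool that exposes this choice lets an application container share, say, the host network while still getting its own mount and PID view.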

Page 11: Linux Container Brief for IEEE WG P2302

Linux namespaces & cgroups: Availability

Note: user namespace support is in upstream kernel 3.8+, but distributions are rolling out phased support:
- Map LXC UID/GID between container and host
- Non-root LXC creation

Page 12: Linux Container Brief for IEEE WG P2302

Linux chroot & pivot_root

Using pivot_root with MNT namespace addresses escaping chroot concerns
The pivot_root target directory becomes the “new root FS”

Page 13: Linux Container Brief for IEEE WG P2302

LXC Images

LXC images provide a flexible means to deliver only what you need – lightweight and minimal footprint.

Basic constraints
– Same architecture & endianness
– Linux’ish Operating System; you can run different Linux distros on same host
Image types
– System; virtualize Operating System(s) – standard distro root FS less the kernel
– Application; virtualize application(s) – only package apps + dependencies (aka JeOS – Just enough Operating System)
Bind mount host libs / bins into LXC to share host resources
Container image init process
– Container init command provided on invocation – can be an application or a full-fledged init process
– Init script customized for image – skinny SysVinit, upstart, etc.
– Reduces overhead of lxc start-up and runtime footprint
Various tools to build images
– SuSE Kiwi
– Debootstrap
– Etc.

LXC tooling options often include numerous image templates

Page 14: Linux Container Brief for IEEE WG P2302

Linux Security Modules & MAC

Linux Security Modules (LSM) – kernel modules which provide a framework for Mandatory Access Control (MAC) security implementations
MAC vs DAC
– In MAC, admin (user or process) assigns access controls to subject / initiator
– In DAC, resource owner (user) assigns access controls to individual resources

Existing LSM implementations include: AppArmor, SELinux, GRSEC, etc.

Page 15: Linux Container Brief for IEEE WG P2302


Linux Capabilities

Per process privileges which define sys call access

Can be assigned to LXC process(es)
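Capabilities are stored per process as bitmasks, visible for example as the hexadecimal CapEff field in /proc/<pid>/status. The sketch below decodes such a mask and shows how dropping a capability amounts to clearing its bit. It is an illustration only: the bit numbers come from the kernel's capability.h, but the table is deliberately a small subset and the function names are hypothetical.

```python
# Bit positions from the Linux kernel's capability.h (subset only).
CAP_BITS = {
    "CAP_CHOWN": 0,
    "CAP_KILL": 5,
    "CAP_SETGID": 6,
    "CAP_SETUID": 7,
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_CHROOT": 18,
    "CAP_SYS_ADMIN": 21,
}


def decode_caps(mask_hex):
    """Names (from the subset above) whose bits are set in a hex mask.

    Bits outside the subset table are ignored rather than reported.
    """
    mask = int(mask_hex, 16)
    return sorted(name for name, bit in CAP_BITS.items() if mask & (1 << bit))


def drop_caps(mask, names):
    """Return `mask` with the named capabilities cleared."""
    for name in names:
        mask &= ~(1 << CAP_BITS[name])
    return mask


if __name__ == "__main__":
    print(decode_caps("0000000000201000"))
    print(hex(drop_caps(0x201000, ["CAP_SYS_ADMIN"])))
```

Container tooling typically starts from a reduced set like this, so that root inside the container cannot, for instance, load kernel modules.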

Page 16: Linux Container Brief for IEEE WG P2302

Other Security Measures

Reduce shared FS access using RO bind mounts
Linux seccomp
– Confine system calls
Keep Linux kernel up to date
User namespaces in 3.8+ kernel; mitigates container root escape concerns
– Launching containers as non-root user
– Mapping UID / GID into container

Page 17: Linux Container Brief for IEEE WG P2302

LXC Security Triangle

Private environments (trusted code)
– App packaging / deployment / management / etc., DevOps, Cloud, etc.
No additional worries about security
Public environments
– Single tenant
  • Same restrictions as private envs; tenant trusted code
– Multi-tenant

[Figure: LXC Security Triangle relating “Privileges, Multitenancy, Untrusted Code” to “Security Measures” (LSM, capabilities, seccomp, RO bind mounts, GRSEC, etc.)]

NOTE: User namespaces will mitigate root escape concerns

Page 18: Linux Container Brief for IEEE WG P2302


So You Want To Build A Container?

Page 19: Linux Container Brief for IEEE WG P2302

LXC Engine: A “Hypervisor” for Containers

[Figure: three stacks compared, bottom to top]
Type 1 Hypervisor: Hardware → Hypervisor → Virtual Machines (each with Operating System, Bins / libs, Apps)
Linux Containers: Hardware → Operating System → Containers (each with Bins / libs, Apps)
LXC Engine / Tooling: Hardware → Operating System → LXC Engine / Tooling → Containers (each with Bins / libs, Apps)

Conceptual Mapping
VM ↔ Container
Hypervisor ↔ LXC Engine

Page 20: Linux Container Brief for IEEE WG P2302

Docker Container Engine

[Figure: a single Docker image running on PC, Mac, virtual machine, and bare metal]

Develop container image once – run anywhere the docker engine is supported.

Page 21: Linux Container Brief for IEEE WG P2302

Typical Container Engine / Tooling Functionality

Sample concepts below; not a complete list.

Standard container operations to realize / manage LXCs
– Run, start, stop, delete, attach, snapshot, etc.
APIs
– CLI, Web API (RESTful or other), Automation files (e.g. Dockerfile)
– Security – who can use the APIs
Images
– Format (typically RAW), required dev files, etc.
– Image repository or catalog
– Container FS layers (union FS style)
– Import / export images and containers
Metrics
– CPU, memory, block IO, etc.
Eventing
– Push events on container changes
Logs
– Inspect / manage container logs
Checkpointing
– Freeze a container’s state

Page 22: Linux Container Brief for IEEE WG P2302

LXC Host / Engine Portability Considerations

Sample concepts below; not a complete list. Not well solidified in current open source community.

Linux Kernel dependencies
– Non-standard kernel modules needed by app(s) on image
– LXC related kernel features tied to specific versions of the Kernel
LSM profile settings; security
– Different profile config per LSM (e.g. AppArmor, SELinux, etc.)
Container file system / storage type & capability
– FS type
– FS capabilities such as direct I/O
External storage
– FS vols bind mounted into container
Network options
– Linux bridging, OVS, etc.
Shared host libs / bins; dependencies from host OS

Page 23: Linux Container Brief for IEEE WG P2302

LXC Runtime Configuration Considerations

Sample concepts below; not a complete list. Community consolidating around docker’s libcontainer project.

Linux capabilities
– Applied to container process for Linux security
– Sysctl settings
Cgroups settings
– Allowed devices, limits, UID / GID mappings, etc.
Environment variables
– Exposed to shell / procs in container
Namespaces
– Which to use for container procs
Networking
– Type, IP, gateway, hostname, etc.
Init
– Which cmd / bin to run on container init

Example: http://tinyurl.com/nj4u3le
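To show how the runtime considerations on this slide could travel together as one portable document, here is a minimal sketch of a container runtime configuration serialized to JSON. Every field name and default is hypothetical and chosen for illustration; this is not libcontainer's format or any tool's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field

KNOWN_NAMESPACES = {"mnt", "pid", "net", "ipc", "uts", "user"}


@dataclass
class ContainerConfig:
    """Hypothetical runtime config covering the categories listed above."""
    init: list                                          # cmd / bin run on container init
    hostname: str = "container"
    capabilities: list = field(default_factory=list)    # capabilities to keep
    namespaces: list = field(
        default_factory=lambda: ["mnt", "pid", "net", "ipc", "uts"])
    cgroups: dict = field(default_factory=dict)         # pseudo file -> value
    env: dict = field(default_factory=dict)             # environment variables
    network: dict = field(default_factory=dict)         # type, IP, gateway, etc.

    def validate(self):
        """Reject configs with no init command or unknown namespace types."""
        if not self.init:
            raise ValueError("an init command is required")
        unknown = set(self.namespaces) - KNOWN_NAMESPACES
        if unknown:
            raise ValueError(f"unknown namespaces: {sorted(unknown)}")

    def to_json(self):
        """Validate, then serialize to a stable, human-readable document."""
        self.validate()
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    cfg = ContainerConfig(
        init=["/bin/bash"],
        capabilities=["CAP_NET_BIND_SERVICE"],
        cgroups={"memory.limit_in_bytes": 256 * 1024 * 1024},
        env={"TERM": "xterm"},
        network={"type": "veth", "ip": "10.0.3.10/24", "gateway": "10.0.3.1"},
    )
    print(cfg.to_json())
```

Keeping the configuration declarative like this is what would allow different engines to interpret the same container definition, which is the portability question the slide raises.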

Page 24: Linux Container Brief for IEEE WG P2302

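LXC Industry Tooling

Parallels
– Summary: Commercial product using OpenVZ under the hood
– Part of upstream Kernel? No
– License: Commercial
– APIs / Bindings: CLI, API
– Management plane / Dashboard: Parallels Cloud Server

OpenVZ
– Summary: Custom Kernel (pre GNU 3.13) providing well seasoned LXC support.
– Part of upstream Kernel? No
– License: GNU GPL v2
– APIs / Bindings: CLI, C
– Management plane / Dashboard: Parallels + others

Linux VServer
– Summary: A set of kernel patches providing LXC. Not based on cgroups or namespaces.
– Part of upstream Kernel? Partial
– License: GNU GPL v2

Libvirt-lxc
– Summary: Libvirt support for LXC via cgroups and namespaces.
– Part of upstream Kernel? Yes
– License: GNU LGPL
– APIs / Bindings: CLI, C, Python, Java, C#, PHP
– Management plane / Dashboard: OpenStack, Archipel, Virt-Manager

Lxc (tools)
– Summary: Lib + set of user space tools / bindings for LXC.
– Part of upstream Kernel? Yes
– License: GNU LGPL
– APIs / Bindings: Python, Lua, GO, CLI
– Management plane / Dashboard: LXC web panel, Lexy

Warden
– Summary: LXC management tooling used by Cloud Foundry (CF).
– Part of upstream Kernel? Yes
– License: Apache v2

lmctfy
– Summary: Similar to LXC, but provides more intent based focus.
– Part of upstream Kernel? Yes, but additional patches needed for specific features.
– License: Apache v2

Docker
– Summary: Commoditization of LXC adding support for images, build files, etc.
– Part of upstream Kernel? Yes
– License: Apache v2
– APIs / Bindings: GO, REST, CLI, Python, other 3rd party libs
– Management plane / Dashboard: OpenStack, Shipyard, Docker UI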

Page 25: Linux Container Brief for IEEE WG P2302

LXC Orchestration & Management

Docker & libvirt-lxc in OpenStack
– Manage containers heterogeneously with traditional VMs… but not w/the level of support & features we might like
CoreOS
– Zero-touch admin Linux distro with docker images as the unit of operation
– Centralized key/value store to coordinate distributed environment
Various other 3rd party apps
– Maestro for docker
– Shipyard for docker
– Fleet for CoreOS
– Etc.
LXC migration
– Container migration via criu
But…
– Still no great way to tie all virtual resources together with LXC – e.g. storage + networking
  • IMO; an area which needs focus for LXC to become more generally applicable

Page 26: Linux Container Brief for IEEE WG P2302

LXC Gaps

There are gaps…

Lack of industry tooling / support
Live migration still a WIP
Full orchestration across resources (compute / storage / networking)
Fears of security
Not a well known technology… yet
Integration with existing virtualization and Cloud tooling
Few, if any, industry standards
Missing skillset
Slower upstream support due to kernel dev process
Etc.

Page 27: Linux Container Brief for IEEE WG P2302

LXC: Use Cases For Traditional VMs

There are still use cases where traditional VMs are warranted.

Virtualization of non Linux based OSs
– Windows
– AIX
– Etc.
LXC not supported on host
VM requires unique kernel setup which is not applicable to other VMs on the host (i.e. per VM kernel config)
Etc.

Page 28: Linux Container Brief for IEEE WG P2302


Questions?

Page 29: Linux Container Brief for IEEE WG P2302

Links & Resources

http://www.slideshare.net/BodenRussell/realizing-linux-containerslxc
http://www.slideshare.net/BodenRussell/kvm-and-docker-lxc-benchmarking-with-openstack
https://github.com/dotcloud/docker/tree/master/vendor/src/github.com/docker/libcontainer
https://github.com/docker/libchan
https://github.com/docker/libswarm
https://help.ubuntu.com/lts/serverguide/lxc.html
http://www.docker.com/
http://www.parallels.com/
http://openvz.org/Main_Page
http://criu.org/LXC
https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
https://lwn.net/Articles/531114/
http://www.hansenpartnership.com/OpenStackContainers2014/#/begin