Upload
boden-russell
View
1.197
Download
7
Embed Size (px)
DESCRIPTION
A brief into to Linux Containers presented to IEEE working group P2302 (InterCloud standards and portability). This deck covers: - Definitions and motivations for containers - Container technology stack - Containers vs Hypervisor VMs - Cgroups - Namespaces - Pivot root vs chroot - Linux Container image basics - Linux Container security topics - Overview of Linux Container tooling functionality - Thoughts on container portability and runtime configuration - Container tooling in the industry - Container gaps - Sample use cases for traditional VMs Overall, a bulk of this deck is covered in other material I have posted here. However there are a few new slides in this deck, most notability some thoughts on container portability and runtime config.
Citation preview
Linux Containers (LXC)IEEE P2302 Working Group Brief
Boden Russell ([email protected] / [email protected])
04/11/2023 2
Definitions
Linux Containers (LXC LinuX Containers)– Lightweight virtualization– Realized using features provided by a modern Linux kernel – VMs without the hypervisor (kind of)
Containerization of– (Linux) Operating Systems (less the kernel)– Single or multiple applications
LXC as a technology ≠ LXC “tools”
04/11/2023 3
Why LXC
Provision in seconds / milliseconds Near bare metal runtime performance VM-like agility – it’s still “virtualization” Flexibility– Containerize a “system”– Containerize “application(s)”
Lightweight– Just enough Operating System (JeOS)– Minimal per container penalty
Open source – free – lower TCO Supported with OOTB modern Linux kernel Growing in popularity
“Linux Containers are poised as the next VM in our modern Cloud era…”
Manual VM LXC
Provision Time
Days
Minutes
Seconds / ms
linpack performance @ 45000
0
50
100
150
200
250
vcpus
GF
lop
s
Google trends - LXC Google trends - docker
04/11/2023 4
LXC Technology Stack
Use
r Spa
ceKe
rnel
Spa
ce
Kernel
System Call Interface
Architecture Dependent Kernel Code
GLIBC / Pseudo FS / User Space Tools & Libs
Linux Container Tooling
Linux Container Commoditization
Orchestration & Management
Hardware
cgro
ups
nam
espa
ces
chro
ots
LSM
lxc
04/11/2023 5
Hypervisors vs. Linux Containers
Hardware
Operating System
Hypervisor
Virtual Machine
Operating System
Bins / libs
App App
Virtual Machine
Operating System
Bins / libs
App App
Hardware
Hypervisor
Virtual Machine
Operating System
Bins / libs
App App
Virtual Machine
Operating System
Bins / libs
App App
Hardware
Operating System
Container
Bins / libs
App App
Container
Bins / libs
App App
Type 1 Hypervisor Type 2 Hypervisor Linux Containers
Containers share the OS kernel of the host and thus are lightweight.However, each container must have the same OS kernel.
Containers are isolated, but share OS and, where appropriate, libs / bins.
04/11/2023 6
Linux cgroups History– Work started in 2006 by google engineers– Merged into upstream 2.6.24 kernel due to wider spread LXC usage– A number of features still a WIP
Functionality– Access; which devices can be used per cgroup– Resource limiting; memory, CPU, device accessibility, block I/O, etc.– Prioritization; who gets more of the CPU, memory, etc.– Accounting; resource usage per cgroup– Control; freezing & check pointing– Injection; packet tagging
Usage– cgroup functionality exposed as “resource controllers” (aka “subsystems”)– Subsystems mounted on FS– Top-level subsystem mount is the root cgroup; all procs on host– Directories under top-level mounts created per cgroup– Procs put in tasks file for group assignment– Interface via read / write pseudo files in group
Cgroups provide throttling, prioritization, control and metrics for a group of processes.
04/11/2023 7
Linux cgroups Subsystems cgroups provided via kernel modules– Not always loaded / provided by default– Locate and load with modprobe
Some features tied to kernel version See: https://www.kernel.org/doc/Documentation/cgroups/
Subsystem Tunable Parameters
blkio - Weighted proportional block I/O access. Group wide or per device.- Per device hard limits on block I/O read/write specified as bytes per second or IOPS per second.
cpu - Time period (microseconds per second) a group should have CPU access.- Group wide upper limit on CPU time per second.- Weighted proportional value of relative CPU time for a group.
cpuset - CPUs (cores) the group can access.- Memory nodes the group can access and migrate ability.- Memory hardwall, pressure, spread, etc.
devices - Define which devices and access type a group can use.
freezer - Suspend/resume group tasks.
memory - Max memory limits for the group (in bytes).- Memory swappiness, OOM control, hierarchy, etc..
hugetlb - Limit HugeTLB size usage.- Per cgroup HugeTLB metrics.
net_cls - Tag network packets with a class ID.- Use tc to prioritize tagged packets.
net_prio - Weighted proportional priority on egress traffic (per interface).
04/11/2023 8
Linux namespaces History– Initial kernel patches in 2.4.19– Recent 3.8 patches for user namespace support – A number of features still a WIP
Functionality– Provide process level isolation of global resources• MNT (mount points, file systems, etc.)• PID (process)• NET (NICs, routing, etc.)• IPC (System V IPC resources)• UTS (host & domain name)• USER (UID + GID)
– Process(es) in namespace have illusion they are the only processes on the system– Generally constructs exist to permit “connectivity” with parent namespace
Usage– Construct namespace(s) of desired type– Create process(es) in namespace (typically done when creating namespace)– If necessary, initialize “connectivity” to parent namespace– Process(es) in name space internally function as if they are only proc(s) on system
Namespaces provide isolation of Linux resources for a group of processes.
04/11/2023 9
Linux namespaces: Conceptual Overview
global (i.e. root) namespace
MNT NS//proc/mnt/fsrd/mnt/fsrw/mnt/cdrom/run2
UTS NSglobalhostrootns.com
PID NSPID COMMAND1 /sbin/init2 [kthreadd]3 [ksoftirqd]4 [cpuset]5 /sbin/udevd6 /bin/sh7 /bin/bash
IPC NSSHMID OWNER32452 root43321 boden
SEMID OWNER0 root1 Boden
MSQID OWNER
NET NSlo: UNKNOWN…eth0: UP…eth1: UP…br0: UP…
app1 IP:5000app2 IP:6000app3 IP:7000
USER NSroot 0:0ntp 104:109mysql 105:110boden 106:111
purple namespace
MNT NS//proc/mnt/purplenfs/mnt/fsrw/mnt/cdrom
UTS NSpurplehostpurplens.com
PID NSPID COMMAND1 /bin/bash2 /bin/vim
IPC NSSHMID OWNER
SEMID OWNER0 root
MSQID OWNER
NET NSlo: UNKNOWN…eth0: UP…
app1 IP:1000app2 IP:7000
USER NSroot 0:0app 106:111
blue namespace
MNT NS//proc/mnt/cdrom/bluens
UTS NSbluehostbluens.com
PID NSPID COMMAND1 /bin/bash2 python3 node
IPC NSSHMID OWNER
SEMID OWNER
MSQID OWNER
NET NSlo: UNKNOWN…eth0: DOWN…eth1: UP
app1 IP:7000app2 IP:9000
USER NSroot 0:0app 104:109
04/11/2023 10
Linux namespaces: Common Idioms
It’s not required to use all namespaces – Pick & choose; if your toolset allows it
Constructs exist to permit “connectivity” between parent / child namespace
Various linux user space tools have namespace support Linux sys API supports flexible namespace creation
04/11/2023 11
Linux namespaces & cgroups: Availability
Note: user namespace support in upstream kernel 3.8+, but distributions rolling out phased support:- Map LXC UID/GID between
container and host- Non-root LXC creation
04/11/2023 12
Linux chroot & pivot_root Using pivot_root with MNT namespace addresses escaping chroot
concerns The pivot_root target directory becomes the “new root FS”
04/11/2023 13
LXC ImagesLXC images provide a flexible means to deliver only what you need – lightweight and minimal footprint.
Basic constraints– Same architecture & endian– Linux’ish Operating System; you can run different Linux distros on same host
Image types– System; virtualize Operating System(s) – standard distro root FS less the kernel– Application; virtualize application(s) – only package apps + dependencies (aka JeOS – Just
enough Operating System) Bind mount host libs / bins into LXC to share host resources Container image init process
– Container init command provided on invocation – can be an application or a full fledged init process
– Init script customized for image – skinny SysVinit, upstart, etc.– Reduces overhead of lxc start-up and runtime foot print
Various tools to build images– SuSE Kiwi– Debootstrap– Etc.
LXC tooling options often include numerous image templates
04/11/2023 14
Linux Security Modules & MAC Linux Security Modules (LSM) – kernel modules which provide a
framework for Mandatory Access Control (MAC) security implementations MAC vs DAC– In MAC, admin (user or process) assigns access controls to subject / initiator– In DAC, resource owner (user) assigns access controls to individual resources
Existing LSM implementations include: AppArmor, SELinux, GRSEC, etc.
04/11/2023 15
Linux Capabilities
Per process privileges which define sys call access
Can be assigned to LXC process(es)
04/11/2023 16
Other Security Measures
Reduce shared FS access using RO bind mounts Linux seccomp– Confine system calls
Keep Linux kernel up to date User namespaces in 3.8+ kernel; mitigates container root
escape concerns– Launching containers as non-root user– Mapping UID / GID into container
04/11/2023 17
LXC Security Triangle Private environments (trusted code)– App packaging / deployment / management / etc, devOps, Cloud, etc…
No additional worries about security Public environments– Single tenant • Same restrictions as private envs; tenant trusted code
– Multi-tenant
Privileges, Multitenancy, Untrusted Code
Secu
rity
Mea
sure
s
LSM, capabilities,seccomp, RO bind mounts,
GRSEC, etc.
LXC Security Triangle
NOTE: User namespaces will mitigate root escape concerns
04/11/2023 18
So You Want To Build A Container?
04/11/2023 19
LXC Engine: A “Hypervisor” for Containers
Hardware
HypervisorVirtual Machine
Operating System
Bins / libs
App App
Virtual Machine
Operating System
Bins / libs
App App
Hardware
Operating System
Container
Bins / libs
App App
Container
Bins / libs
App App
Hardware
Operating System
Container
Bins / libs
App App
Container
Bins / libs
App App
LXC Engine / Tooling
Type 1 Hypervisor Linux Containers LXC Engine / Tooling
Conceptual Mapping
VM ContainerHypervisor LXC Engine
04/11/2023 20
Docker Container Engine
PC Mac Virtual Machine Bare Metal
DockerImage
Develop container image once – run anywhere the docker engine is supported.
04/11/2023 21
Typical Container Engine / Tooling Functionality
Standard container operations to realize / manage LXCs– Run, start, stop, delete, attach, snapshot, etc.
APIs– CLI, Web API (RESTful or other), Automation files (e.g. dockerfile)– Security – who can use the APIs
Images– Format (typically RAW), required dev files, etc..– Image repository or catalog– Container FS layers (union FS style)– Import / export images and containers
Metrics– CPU, memory, block IO, etc..
Eventing– Push events on container changes
Logs– Inspect / manage container logs
Checkpointing– Freeze a container’s state
Sample concepts below; not a complete list.
04/11/2023 22
LXC Host / Engine Portability Considerations
Linux Kernel dependencies– Non-standard kernel modules needed by app(s) on image– LXC related kernel features tied to specific versions of the Kernel
LSM profile settings; security– Different profile config per LSM (e.g. AppArmor, SELinux, etc.)
Container file system / storage type & capability– FS type– FS capabilities such as direct I/O
External storage– FS vols bind mounted into container
Network options– Linux bridging, OVS, etc…
Shared host libs / bins; dependencies from host OS
Sample concepts below; not a complete list.Not well solidified in current open source community.
04/11/2023 23
LXC Runtime Configuration Considerations
Linux capabilities– Applied to container process for Linux security– Sysctl settings
Cgroups settings– Allowed devices, limits, UID / GID mappings, etc.
Environment variables– Exposed to shell / procs in container
Namespaces– Which to use for container procs
Networking– Type, IP, gateway, hostname, etc..
Init– Which cmd / bin to run on container init
Example: http://tinyurl.com/nj4u3le
Sample concepts below; not a complete list.Community consolidating around docker’s libcontainer project.
04/11/2023 24
LXC Industry ToolingParallels OpenVZ Linux
VServerLibvirt-lxc Lxc (tools) Warden lmctfy Docker
Summary Commercial product using OpenVZ under the hood
Custom Kernel (pre GNU 3.13) providing well seasoned LXC support.
A set of kernel patches providing LXC. Not based on cgroups or namespaces.
Libvirt support for LXC via cgroups and namespaces.
Lib + set of user spaces tools /bindings for LXC.
LXC management tooling used by CF.
Similar to LXC, but provides more intent based focus.
Commoditization of LXC adding support for images, build files, etc.
Part of upstream Kernel?
No No Partial Yes Yes Yes Yes, but additional patches needed for specific features.
Yes
License Commercial GNU GPL v2 GNU GPL v2 GNU LGPL GNU LGPL Apache v2 Apache v2 Apache v2
APIs / Bindings
- CLI- API
- CLI- C
- CLI- C- Python- Java- C#- PHP
- Python- Lua- GO- CLI
- GO- REST- CLI- Python- Other 3rd
party libs
Management plane/ Dashboard
Parallels Cloud Server
Parallels + others
- OpenStack- Archipel- Virt-
Manager
- LXC web panel
- Lexy
- OpenStack- Shipyard- Docker UI
04/11/2023 25
LXC Orchestration & Management Docker & libvirt-lxc in OpenStack– Manage containers heterogeneously with traditional VMs… but not w/the level
of support & features we might like CoreOS– Zero-touch admin Linux distro with docker images as the unit of operation– Centralized key/value store to coordinate distributed environment
Various other 3rd party apps– Maestro for docker– Shipyard for docker– Fleet for CoreOS– Etc.
LXC migration– Container migration via criu
But…– Still no great way to tie all virtual resources together with LXC – e.g. storage +
networking• IMO; an area which needs focus for LXC to become more generally applicable
04/11/2023 26
LXC Gaps
There are gaps…
Lack of industry tooling / support Live migration still a WIP Full orchestration across resources (compute / storage / networking) Fears of security Not a well known technology… yet Integration with existing virtualization and Cloud tooling Not much / any industry standards Missing skillset Slower upstream support due to kernel dev process Etc.
04/11/2023 27
LXC: Use Cases For Traditional VMs
There are still use cases where traditional VMs are warranted.
Virtualization of non Linux based OSs– Windows– AIX– Etc.
LXC not supported on host VM requires unique kernel setup which is not applicable to other VMs on the host
(i.e. per VM kernel config) Etc.
04/11/2023 28
Questions?
04/11/2023 29
Links & Resources http://www.slideshare.net/BodenRussell/realizing-linux-containerslxc http://www.slideshare.net/BodenRussell/kvm-and-docker-lxc-benchmarking-with-
openstack https://
github.com/dotcloud/docker/tree/master/vendor/src/github.com/docker/libcontainer
https://github.com/docker/libchan https://github.com/docker/libswarm https://help.ubuntu.com/lts/serverguide/lxc.html http://www.docker.com/ http://www.parallels.com/ http://openvz.org/Main_Page http://criu.org/LXC https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt https://lwn.net/Articles/531114/ http://www.hansenpartnership.com/OpenStackContainers2014/#/begin