How to swim with a whale
Łukasz Siudut, DevOpsKRK #9
2016-04-14
What really makes Docker swim?
CGroups - hardware resource management
- CPU
  - CPU pinning
  - CPU accounting
- Memory limits
- Disk I/O access priority
- Devices access limitations
- net_cls - marking packets belonging to cgroup
- freezer - suspends/resumes tasks in cgroup
Namespaces - system resource management
- UTS - UNIX Time-sharing System - hostname isolation
- IPC - Interprocess Communication isolation
  - System V IPC objects
  - POSIX message queues
- PID - processes and processes number isolation
- Network - resources associated with networking
- Mount - mountpoints isolation
- User - user and group ID number spaces (UID mapping)
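Namespace membership can be read from procfs: every entry under /proc/PID/ns is a symlink whose target names the namespace type and its ID. A minimal sketch, assuming a Linux host with procfs mounted; the function name list_namespaces and its optional second argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Print "<entry> <type:[id]>" for every namespace a process belongs to.
# The second argument (default /proc) exists only so the function can be
# pointed at a fake tree; it is an assumption of this sketch.
list_namespaces() {
    local pid="$1" proc_root="${2:-/proc}" link
    for link in "$proc_root/$pid"/ns/*; do
        # Skip the unexpanded pattern when the directory is missing or empty.
        [ -L "$link" ] || continue
        printf '%s %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}

# Usage (not run here):
#   list_namespaces $$      # namespaces of the current shell
```

Two processes that print the same ID for an entry share that namespace. Cgroup membership of a process is listed in /proc/PID/cgroup in the same spirit.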
What can possibly go wrong?
- Exceeding open files limits (sic!)
- Exceeding amount of interfaces in bridge
- Low performance storage engine
- Lack of inter-boxes routing
- Hanging docker master process
- Image building takes ages
- Logs stored in json? Srsly, docker?
- Random kernel panics
- Orphaned processes
- If a remote TCP syslog server is down, docker does not start
- Docker daemon can send application/json type with invalid text payload
- Docker inspect gave default log options when the option is empty
- Networking or DNS not available immediately after container start
- EXPOSE and publish need to behave similarly with IPv4 and IPv6
- Container start fails, but volume remains mounted
- Cannot remove docker volume
- When you create a directory while bind-mounting it, it's owned by root
- error removing container (1.10, 1.11/master) with AUFS
- Cannot start container with low probability of repetition
- Need better handling/error path from tar/untar failures (especially in userns)
- Image layer contents are unstable
- Docker pushing too much
- Docker Volume not deleted after machine suspension
- Docker network in internal mode will not use IPv4 by default and is very slow.
- Streaming logs, json driver "Error streaming logs: unexpected EOF"
- Docker running on high CPU and crashes in the end
- Migration 1.9.1 to 1.10.3 failure: "Failed to register container XYZ: name is reserved" on RHEL7
AND MOAR
Exceeding open files limit
# lsof -p `pidof docker` | wc -l
35
# docker run -tid alpine:3.3 cat -
# lsof -p `pidof docker` | wc -l
43
# docker run -tid alpine:3.3 cat -
# lsof -p `pidof docker` | wc -l
51
# docker run -tid alpine:3.3 cat -
# lsof -p `pidof docker` | wc -l
59
# docker run -tid alpine:3.3 cat -
# lsof -p `pidof docker` | wc -l
67
…
# docker run -tid alpine:3.3 cat -
# lsof -p `pidof docker` | wc -l
2016/04/12 14:20:21 http: Accept error: accept unix /var/run/docker.sock: accept4: too many open files; retrying in 5ms
2016/04/12 14:20:21 http: Accept error: accept unix /var/run/docker.sock: accept4: too many open files; retrying in 10ms
2016/04/12 14:20:21 http: Accept error: accept unix /var/run/docker.sock: accept4: too many open files; retrying in 5ms
Each started container leaves 8 more files open in the daemon process:
1) /var/lib/docker/network/files/CONTAINER_ID.sock type=STREAM
2) net
3) /home/docker/containers/CONTAINER_ID/CONTAINER_ID.log
4) /dev/ptmx
5) /sys/fs/cgroup/memory/docker/CONTAINER_ID/memory.oom_control
6) [eventfd]
7) /sys/fs/cgroup/memory/docker/CONTAINER_ID/memory.oom_control
8) [eventfd]
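Since the descriptor count grows by a fixed step per container, the headroom can be estimated with integer arithmetic. A rough sketch: the function name is hypothetical, and the limit of 1024 used in the usage line is only an assumption (a common default soft limit); the real value for a running daemon is in the "Max open files" row of /proc/PID/limits.

```shell
#!/usr/bin/env bash
# Estimate how many containers fit before the daemon runs out of file
# descriptors.
#   $1 = descriptors already open with no containers running
#   $2 = descriptors added per container
#   $3 = open files limit of the daemon process
containers_until_fd_limit() {
    local base="$1" per_container="$2" limit="$3"
    echo $(( (limit - base) / per_container ))
}

# Usage with the numbers measured above and an assumed limit of 1024:
#   containers_until_fd_limit 35 8 1024
```

With a baseline of 35 and 8 descriptors per container, an assumed limit of 1024 leaves room for 123 containers.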
Exceeding amount of interfaces in bridge
WARN[0300] failed to cleanup ipc mounts:failed to umount /mnt/docker/containers/778954ce1188fd2deb1ac6bfbf82515e8a4d97284a8f100d7f9c07d6ab720472/shm: no such file or directory
ERRO[0300] Handler for POST /v1.22/containers/778954ce1188fd2deb1ac6bfbf82515e8a4d97284a8f100d7f9c07d6ab720472/start returned error: failed to create endpoint adoring_euler on network bridge: adding interface veth1dd191f to bridge docker0 failed: exchange full
WARN[0300] failed to cleanup ipc mounts:failed to umount /mnt/docker/containers/26b5ddbb4b0fe464768b74218699465c90e0c2292af540cfa77451b311c93514/shm: no such file or directory
ERRO[0300] Handler for POST /v1.22/containers/26b5ddbb4b0fe464768b74218699465c90e0c2292af540cfa77451b311c93514/start returned error: failed to create endpoint kickass_goldwasser6 on network bridge: adding interface veth2738bff to bridge docker0 failed: exchange full
WARN[0300] failed to cleanup ipc mounts:failed to umount /mnt/docker/containers/aad9becaa4823e7711aaea6144dc0278569a18f1285f040125c22713c1ee5045/shm: no such file or directory
ERRO[0300] Handler for POST /v1.22/containers/aad9becaa4823e7711aaea6144dc0278569a18f1285f040125c22713c1ee5045/start returned error: failed to create endpoint suspicious_gates on network bridge: adding interface vethf893b1c to bridge docker0 failed: exchange full
DO YOU KNOW THE LIMIT?
1024
#define BR_PORT_BITS 10
#define BR_MAX_PORTS (1<<BR_PORT_BITS)
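The two defines work out to 1 << 10 = 1024 ports per bridge. A small sketch of checking how close a bridge is to that ceiling, assuming the usual sysfs layout where /sys/class/net/BRIDGE/brif holds one entry per attached interface. The function names and the optional sysfs root argument are invented for this example.

```shell
#!/usr/bin/env bash
# Same arithmetic as the kernel defines quoted above.
BR_PORT_BITS=10
BR_MAX_PORTS=$(( 1 << BR_PORT_BITS ))

# Count interfaces attached to a bridge by listing its brif directory.
# The second argument (default /sys) is only there to allow a fake tree.
bridge_ports_used() {
    local bridge="$1" sysfs_root="${2:-/sys}" count=0 entry
    for entry in "$sysfs_root/class/net/$bridge/brif"/*; do
        # An unmatched glob stays literal; skip it.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        count=$(( count + 1 ))
    done
    echo "$count"
}

# Remaining ports before the bridge is full.
bridge_ports_left() {
    echo $(( BR_MAX_PORTS - $(bridge_ports_used "$@") ))
}

# Usage (not run here):
#   bridge_ports_left docker0
```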
- Put multiple containers in same network namespace
- OpenVSwitch
- Multiple networks for container groups
- Run containers without interface (yeah, sure)
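One way to read "multiple networks for container groups" is to spread containers over several user-defined bridge networks so that no single bridge gets near the port limit. A sketch only: the grpN naming scheme, the function name and the default of 1000 containers per network are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Map the Nth container (counting from 0) to a network name so that each
# network receives at most PER_NETWORK containers.
network_for_index() {
    local index="$1" per_network="${2:-1000}"
    echo "grp$(( index / per_network ))"
}

# Usage (not run here; needs a Docker daemon):
#   net=$(network_for_index 2500)
#   docker network create "$net"
#   docker run -tid --net="$net" alpine:3.3 cat -
#
# Sharing one network namespace between containers instead:
#   docker run -tid --net=container:SOME_CONTAINER alpine:3.3 cat -
```

SOME_CONTAINER is a placeholder for the name or ID of an already running container.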
Low performance disk engine
- Devicemapper
  - Sparse files (slow)
  - Raw disk devices (faster, but still slow)
- AUFS (requires custom kernel patch, unstable)
- OverlayFS (since kernel 3.18, immature)
- Btrfs - “production ready”
- VFS*
Docker storage drivers - devicemapper
- Developed by Red Hat and Docker to fulfill demands of customers
- Utilises in-kernel capabilities (device mapper + thin provisioning)
- Works on block device level, thus doesn’t support page cache*
- Block size equals 64k, so small writes impact performance significantly
- Removing file in container does not affect space on host
- It’s memory hungry
- Writing new data is accomplished by allocate-on-demand
- Writing existing file uses copy-on-write operation on modified blocks
Docker storage drivers - AUFS
- Implements a union mount for Linux filesystems
- Idea is based on UnionFS
- First storage driver in use with Docker
- Works on the top of other filesystem…
- … so adding multiple layers may consume a lot of inodes
- It also has to track hierarchy tree, so it is memory consuming
- Works on file level - all AUFS CoW operations copy entire files…
- … however it happens only once
- It works on filesystem level, so page caching is utilised efficiently
- Deleted files are marked by whiteout file (but they still exist in lower layers)
- I managed to crash Docker master process when using AUFS
mount("none", "/home/docker/aufs/mnt/1d5f54eef0ac66b82a007ed55141c377a4d947f3c4bb0d3f660b0ca10b184bb9", "aufs", 0, "br:/home/docker/aufs/diff/1d5f54eef0ac66b82a007ed55141c377a4d947f3c4bb0d3f660b0ca10b184bb9=rw:/home/docker/aufs/diff/1d5f54eef0ac66b82a007ed55141c377a4d947f3c4bb0d3f660b0ca10b184bb9-init=ro+wh:/home/docker/aufs/diff/4500e1e0e0f482e19f4f4626165c16a7107ccb3a2a54fbeeb52ce675e01d9841=ro+wh:/home/docker/aufs/diff/23450d8936d7685169bb1f17c49b523f759e569bc9b3e9a8b667634e233cf359=ro+wh:/home/docker/aufs/diff/984a940d62c9b682f0518fbe7412730d05a0d665814ed5253f93d1153adea37c=ro+wh:/home/docker/aufs/diff/eafbd7ec3933e09cca3625d1d48efc646509490ed15022b4c55917e700ee9a91=ro+wh:/home/docker/aufs/diff/1f9f739c84c72c69d5a48f5715e5eb4e4bb55ec7832d886bc71d4997662f0e20=ro+wh:/home/docker/aufs/diff/c028a82aa3ff93427df1327b50621a9c741f5d73f031bf3a49e21a29bb92b55c=ro+wh:/home/docker/aufs/diff/e79c2f7c809e4608a3712b3f6404d036ed294d45577d9770a7bfe2b7891eea34=ro+wh:/home/docker/aufs/diff/0ff5998bd3fa6c2e039fb6f9154d8faa0a3924c67e9b67052475f1685eb00a06=ro+wh:/home/docker/aufs/diff/2fdb0e1a146e2edf035fbee3030854d4911eea52158d5fcbcc57fd22c41503c3=ro+wh:/home/docker/aufs/diff/3f709244513aa38fbddb423d4e1b0bd1d51daa5d1448a3d3a52be9b90782886d=ro+wh:/home/docker/aufs/diff/301eb2b701883975c12282094e5ff4235d19ae924558c38de93d31051c01e064=ro+wh:/home/docker/aufs/diff/a4b54186305ced5a457467c44a7d3cb50f7bce45b13a718c23a4555d715c8d4d=ro+wh:/home/docker/aufs/diff/5234f32576bb6522c18676c5e8d274ea466f22a96c440e284ed927addbdc6a28=ro+wh:/home/docker/aufs/diff/b71baf15872ca55884daa5d17057fd40802240f6693e6ef79c8c2219d0e7c18e=ro+wh:/home/docker/aufs/diff/20272dd081b1ec8984438f29154342a9a8d89ba4683bc6c8029def6cbce5dd1b=ro+wh:/home/docker/aufs/diff/ad686c634be4f7bf1656db8208c1197d562136fd01182b2afbaaf9b42c99b524=ro+wh:/home/docker/aufs/diff/f0a30f2605bbeb8c3574393d1c215fab1d5512262931e686da5bca6323fb2fc9=ro+wh:/home/docker/aufs/diff/6283fbfe4d98f095eed8f039665b3bd0c72cd3c785cddfc96528a9e0b1f69a4c=ro+wh:/home/docker/aufs/diff/acb08f019cfcbc302a4f9df6f2fa30540285fa30e397cbfd1d1b1322c6aa36c5=ro+wh:/home/docker/aufs/diff/c5f1142781b6f053301bb11880c1cec448aef29bd5ddf4b8145189cb397e348a=ro+wh:/home/docker/aufs/diff/47eea3aa543710082284622dfcb3c3c19f8c2df80bef6fa15988d9e865f89c9e=ro+wh:/home/docker/aufs/diff/fe1231ddd1407a216250ae2a814a370bd660940db6046b7bea845b05e845775c=ro+wh:/home/docker/aufs/diff/0af574eab2d6f8e8b428a9d5d2e3f94cf383703d6194408c4067f22bc1d4b361=ro+wh,dio,xino=/dev/shm/aufs.xino,dirperm1" <unfinished…>
ERRO[0028] Error saving dying container to disk: open /mnt/docker/containers/00878fe1ad98a55c79295d6fbe152425873eb8714ded6c7ac443c36677bfb20c/config.v2.json: no such file or directory
ERRO[0028] Error removing mounted layer 00878fe1ad98a55c79295d6fbe152425873eb8714ded6c7ac443c36677bfb20c: rename /mnt/docker/aufs/mnt/ad9829a80251e071ac2130f979a9e1253ae5a9a8b6ff0e09ff7ee31a111a31ca /mnt/docker/aufs/mnt/ad9829a80251e071ac2130f979a9e1253ae5a9a8b6ff0e09ff7ee31a111a31ca-removing: device or resource busy
ERRO[0028] Handler for DELETE /v1.22/containers/00 returned error: Driver aufs failed to remove root filesystem 00878fe1ad98a55c79295d6fbe152425873eb8714ded6c7ac443c36677bfb20c: rename /mnt/docker/aufs/mnt/ad9829a80251e071ac2130f979a9e1253ae5a9a8b6ff0e09ff7ee31a111a31ca /mnt/docker/aufs/mnt/ad9829a80251e071ac2130f979a9e1253ae5a9a8b6ff0e09ff7ee31a111a31ca-removing: device or resource busy
# rm -rf /mnt/docker/*
rm: cannot remove '/mnt/docker/aufs/mnt/9f9f2c192645858a0ab1db63d4b29dc2ec4bbdfc130e218395e8d6cb4ff36d24': Device or resource busy
rm: cannot remove '/mnt/docker/aufs/mnt/ad9829a80251e071ac2130f979a9e1253ae5a9a8b6ff0e09ff7ee31a111a31ca': Device or resource busy
# rm -rf /mnt/docker/*
rm: cannot remove '/mnt/docker/aufs/mnt/9f9f2c192645858a0ab1db63d4b29dc2ec4bbdfc130e218395e8d6cb4ff36d24': Is a directory
rm: cannot remove '/mnt/docker/aufs/mnt/ad9829a80251e071ac2130f979a9e1253ae5a9a8b6ff0e09ff7ee31a111a31ca': Is a directory
# ls -l /mnt/docker/aufs/mnt/
ls: cannot access '/mnt/docker/aufs/mnt/9f9f2c192645858a0ab1db63d4b29dc2ec4bbdfc130e218395e8d6cb4ff36d24': Stale file handle
ls: cannot access '/mnt/docker/aufs/mnt/ad9829a80251e071ac2130f979a9e1253ae5a9a8b6ff0e09ff7ee31a111a31ca': Stale file handle
total 0
d????????? ? ? ? ? ? 9f9f2c192645858a0ab1db63d4b29dc2ec4bbdfc130e218395e8d6cb4ff36d24
d????????? ? ? ? ? ? ad9829a80251e071ac2130f979a9e1253ae5a9a8b6ff0e09ff7ee31a111a31ca
# reboot
Docker storage drivers - OverlayFS
- Included in mainline kernel since 3.18
- Works similarly to AUFS (layers, page cache)
- Its design is simpler than AUFS (lowerdir, upperdir)
- Writing to files from lower layers issues copy_up operation (similar to AUFS)
- Deleting file creates whiteout file in upperdir
- Deleting directory creates opaque directory in upperdir (?)
- Implements only a subset of the POSIX standards
- I managed to crash Docker master process when using OverlayFS too :)
root:/etc# cat timezone
Etc/UTC
root:/etc# exec 3< timezone
root:/etc# exec 4<> timezone
root:/etc# read -n 4 <&4
root:/etc# echo -n . >&4
root:/etc# exec 4>&-
root:/etc# cat <&3
Etc/UTC
root:/etc# exec 3>&-
root:/etc# cat timezone
Etc/.TC

root:/etc# exec 3< timezone
root:/etc# exec 4<> timezone
root:/etc# read -n 5 <&4
root:/etc# echo -n . >&4
root:/etc# exec 4>&-
root:/etc# cat <&3
Etc/..C
root:/etc# cat timezone
Etc/..C
Docker storage drivers - Btrfs
- New generation CoW filesystem
- Present in kernel for some time already, considered “production ready”
- Native support for “layers” - subvolumes and snapshots
- Writing to new file invokes allocate-on-demand to allocate new data block
- Updating existing file invokes copy-on-write for new blocks -> little overhead
- Btrfs is prone to fragmentation due to CoW behavior
- Lot of small writes may
- Does not utilise page caching :(
- df is not reliable - use btrfs filesystem df instead
Docker storage drivers
# vmtouch data
Files: 1
Directories: 0
Resident Pages: 0/26214400 0/100G 0%
Elapsed: 0.38062 seconds
# for _ in {1..10}; do docker run -id alpine:3.3 cat -; done
# vmtouch data
Files: 1
Directories: 0
Resident Pages: 12877/26214400 50M/100G 0.0491%
Elapsed: 0.41442 seconds
Network drivers
- CNM / libnetwork
- Bridge
  - Standard driver, limited by the number of interfaces that can be bound under one bridge
- Overlay
  - OVS based interface, great performance, requires external k/v store service
- Others
  - Weave
  - Custom central routing
Building docker images
Building container images for large applications is still a challenge. If we are to rely on container images for testing, CI, and emergency deploys, we need to have an image ready in less than a minute.
Dockerfiles make this almost impossible for large applications.
- Microservices are the way to go
- You don’t have to build docker image every time!
- Prebuild containers with runtime environment
- Build code in a separate container with build environment
  - docker run -ti -v "$(pwd)/app_source":/build_dir mybuilder:latest
- Put code into prebuilt containers
- Use minimalistic base images - Alpine Linux ~5MB
- Use less reliable storage driver to speed up builds?
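The build-then-package flow from the list above could be scripted roughly as follows. This is a sketch under assumptions: the function names are invented, mybuilder:latest is taken from the slide and presumed to compile whatever is mounted at /build_dir, and the DOCKER variable exists only so the commands can be previewed with DOCKER=echo.

```shell
#!/usr/bin/env bash
# Command used to talk to Docker; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Step 1: compile inside the builder image. The source directory must be
# given as an absolute path; build artifacts are expected to land in it.
build_in_container() {
    local src_dir="$1" builder_image="$2"
    "$DOCKER" run --rm -v "$src_dir:/build_dir" "$builder_image"
}

# Step 2: build the application image from a directory that holds the
# artifacts and a Dockerfile based on the prebuilt runtime image.
package_runtime_image() {
    local context_dir="$1" tag="$2"
    "$DOCKER" build -t "$tag" "$context_dir"
}

# Usage (not run here):
#   build_in_container "$(pwd)/app_source" mybuilder:latest
#   package_runtime_image "$(pwd)/app_source" myapp:latest
```

Keeping compilers out of the runtime image is what makes the second step quick: it only copies files on top of an image that already exists.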
Logs
Logging drivers
- Dealing with standard logger and json format is painful
- Available logging drivers
- json-file
- syslog
- journald
- gelf
- fluentd
- awslogs
- splunk
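A driver is chosen per container with --log-driver, and driver settings are passed with --log-opt. A small sketch: the wrapper function is hypothetical, the DOCKER variable is only there to allow a dry run with DOCKER=echo, and the addresses in the usage lines are placeholders.

```shell
#!/usr/bin/env bash
# Command used to talk to Docker; set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Start a detached container with the given logging driver.
# Remaining arguments (log options, image, command) are passed through.
run_with_log_driver() {
    local driver="$1"
    shift
    "$DOCKER" run -d --log-driver="$driver" "$@"
}

# Usage (not run here):
#   run_with_log_driver json-file --log-opt max-size=10m --log-opt max-file=3 myapp:latest
#   run_with_log_driver syslog --log-opt syslog-address=tcp://192.0.2.10:514 myapp:latest
#   run_with_log_driver fluentd --log-opt fluentd-address=localhost:24224 myapp:latest
```

The max-size and max-file options at least keep json-file logs from growing without bound when that driver has to stay.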
[Diagram: container logs flow through fluentd instances to AWS S3 and Elasticsearch]
Designing Docker friendly application
- Docker loves cloud
- Cloud is not reliable - instances will always go down unexpectedly…
- … therefore infrastructure must be disposable!
- So are the applications
- Replication factor ~3-4
- Logs go to standard output or standard error only
- Do not write to the filesystem (mount ro); if unavoidable, use volumes
  - No need to be worried about POSIX compliance or low performance!
- Run containers in non-privileged mode as non-root
- Assume that application MAY be killed unexpectedly
- Make containers configurable - ENTRYPOINT + startup script
  - docker run -d -e KEY1=value1 -e KEY2=value2 myapp:latest
- Handle signals
- Avoid forking! If unavoidable, use some kind of container-init
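The configuration and signal points above meet in the startup script. A minimal sketch of such a script, assuming KEY1 is required and KEY2 is optional; the function name and the default value2 are placeholders made up for this example.

```shell
#!/bin/sh
# entrypoint.sh: hypothetical startup script used as the image ENTRYPOINT.

start_app() {
    # Fail fast when a required setting is missing.
    : "${KEY1:?KEY1 must be set, e.g. docker run -e KEY1=value1}"
    # Optional setting with an illustrative default.
    export KEY2="${KEY2:-value2}"
    # exec replaces the shell with the application, so the application
    # becomes PID 1 and receives the SIGTERM from "docker stop" directly.
    exec "$@"
}

# In a real entrypoint the last line would be:
#   start_app "$@"
```

Without the exec, the shell stays PID 1 and the application underneath it may never see the termination signal before the container is killed.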
Questions? Answers?
Thank you!