Shifter: Containers in HPC environments
HPC Advisory Council Switzerland
Miguel Gila, CSCS
March 21, 2016


Docker


What is Docker and how does it work?

§ “Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.” [*]

[Figure: Virtual Machines vs. Containers [*]]

[*] https://www.docker.com/what-docker


§ A Docker image is a file that contains a filesystem. It can be a full filesystem (e.g. CentOS) or a partial one (e.g. python:3.4-slim). Images are stored either on a public registry (Docker Hub) or on a private one

§ Docker leverages the namespaces feature of the kernel to isolate processes

§ In its simplest form, Docker basically:
  1. Pulls an image to the local system
  2. Creates a chrooted environment with the image (= container)
  3. Runs our application in the container (’isolated’ from the host thanks to kernel namespaces)

§ However, it can also do other things:
  § Isolate the network by creating NAT or bridge devices
  § Use a nice GUI


Docker in HPC environments

§ Docker is a nice tool, but it’s not built for HPC environments, because:
  § It does not integrate well with workload managers
  § It does not isolate users on shared filesystems
  § It requires running a daemon on all nodes
  § It is not designed to run on diskless clients
  § The network is, by default, ‘NATed’
  § Building Docker is done within a Docker container. It can be done outside, but it is a complex task (Go language, seriously??)

§ But after all, a sysadmin can make anything work on a cluster, right?
  § We can create (and hopefully maintain) monstrous wrappers to run Docker containers…


Shifter


What is Shifter and how does it work?

§ Shifter is a container-based solution designed from the ground up for HPC environments

§ It leverages the current Docker environment and can use Docker images to create containers

§ Shifter basically:
  1. Pulls an image to a shared location (/scratch)
  2. Creates a loop device with the image (= container)
  3. Creates a chrooted environment on the loop device
  4. Runs our application in the chrooted environment

§ Designed to work on HPC clusters and, particularly, on Cray systems

§ It is possible to choose which filesystems to expose to the containers


Architecture of Shifter

§ Shifter consists of two parts:
  § udiRoot is responsible for creating the loop devices, doing the chroot and cleaning up after the binary execution is done. Workload manager plugins are available. Written in C
  § imageGateway is responsible for fetching a Docker image, converting it to a squashfs file and transferring it to a shared location on a filesystem. It also keeps track of the images and tells udiRoot which ones are available. Written in Python

[Figure: Shifter architecture. udiRoot on the compute node queries imageGateway on the external node through a RESTful API. imageGateway keeps a db, fetches images from Docker Hub or a private registry, and stores them on scratch.]

Workflow

1. User creates an image on his/her computer and pushes it to Docker Hub

docker build

    $ docker build .
    Uploading context 10240 bytes
    Step 1 : FROM busybox
    Pulling repository busybox
    ---> e9aa60c60128MB/2.284 MB (100%) endpoint: https://cdn-registry-1.docker.io/v1/
    Step 2 : RUN ls -lh /
    ---> Running in 9c9e81692ae9
    total 24
    drwxr-xr-x 2 root root 4.0K Mar 12 2013 bin
    drwxr-xr-x 5 root root 4.0K Oct 19 00:19 dev
    drwxr-xr-x 2 root root 4.0K Oct 19 00:19 etc
    drwxr-xr-x 2 root root 4.0K Nov 15 23:34 lib
    lrwxrwxrwx 1 root root 3 Mar 12 2013 lib64 -> lib
    dr-xr-xr-x 116 root root 0 Nov 15 23:34 proc
    lrwxrwxrwx 1 root root 3 Mar 12 2013 sbin -> bin
    dr-xr-xr-x 13 root root 0 Nov 15 23:34 sys
    drwxr-xr-x 2 root root 4.0K Mar 12 2013 tmp
    drwxr-xr-x 2 root root 4.0K Nov 15 23:34 usr
    ---> b35f4035db3f
    Step 3 : CMD echo Hello world
    ---> Running in 02071fceb21b
    ---> f52f38b7823e
    Successfully built f52f38b7823e
    Removing intermediate container 9c9e81692ae9
    Removing intermediate container 02071fceb21b

docker push

    $ docker push miguelgila/wlcg_wn:20161212

[Figure: the Docker image is built locally (1) and pushed (2) to Docker Hub or a private registry.]


2. User tells imageGateway to pull his/her image and make it available


    $ ssh santis01
    $ module load shifter
    $ shifterimg pull docker:ubuntu:14.04
    $ sleep 300 # :-)
    $ shifterimg images
    santis docker READY 2934337e50 2015-12-10T09:03:38 python:2.7-slim
    santis docker READY a4f04f71be 2015-12-10T09:47:17 python:3.5-slim
    santis docker READY b0197931ac 2015-12-10T09:53:54 python:3.2-slim
    santis docker READY 0bb6b50d84 2015-12-10T10:34:24 python:3.3-slim
    santis docker READY cd86406b32 2015-12-10T10:36:55 python:3.5.1
    santis docker READY 48bc52cc28 2015-12-10T10:47:35 python:3.4.3
    santis docker READY 0b92f1735f 2015-12-10T11:17:26 python:3.4-slim
    santis docker READY 3fba104814 2015-12-10T11:31:23 centos:6.7
    santis docker READY 12c9d795d8 2015-12-10T11:37:35 centos:6.6
    santis docker READY 3e0c71ada2 2015-12-10T15:19:24 ubuntu:15.10
    santis docker READY e751d10964 2015-12-10T13:38:05 miguelgila/wlcg_wn:20151125v2
    santis docker READY d55e68e6cc 2015-12-10T15:52:09 ubuntu:14.04
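The `shifterimg images` listing is line-oriented, so it is easy to post-process, for example to check whether an image is READY before submitting a job. A minimal sketch in Python, assuming each line carries the six whitespace-separated fields seen in the listing; the field names and the `parse_images` helper are illustrative and not part of Shifter:

```python
from collections import namedtuple
from datetime import datetime

# Field names are assumptions inferred from the listing, not Shifter terminology.
ImageEntry = namedtuple("ImageEntry", "system image_type status image_id pulled tag")


def parse_images(text):
    """Parse `shifterimg images`-style output into ImageEntry records."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        system, image_type, status, image_id, stamp, tag = fields
        pulled = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
        entries.append(ImageEntry(system, image_type, status, image_id, pulled, tag))
    return entries


# Three lines copied from the listing in the slides.
sample = """\
santis docker READY 2934337e50 2015-12-10T09:03:38 python:2.7-slim
santis docker READY 3fba104814 2015-12-10T11:31:23 centos:6.7
santis docker READY d55e68e6cc 2015-12-10T15:52:09 ubuntu:14.04
"""

entries = parse_images(sample)
ready_tags = [e.tag for e in entries if e.status == "READY"]
print(ready_tags)
```

Since pulling has no visual feedback, a check like this could replace the `sleep 300` in a script, but how the status column behaves while a pull is in progress is not shown in the slides.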

[Figure: shifterimg pull. imageGateway fetches the Docker image from Docker Hub or a private registry, converts it to a Shifter image and stores it on /scratch/santis (steps 1 to 4).]

3. User submits a job with sbatch and prepends shifter to his/her executable

    #!/bin/bash -l
    #SBATCH --job-name="shifter_osversion"
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=00:30:00
    #SBATCH --exclusive
    #SBATCH --output=/users/miguelgi/jobs/out/shifter_hostname.stdout.log.%j
    #SBATCH --error=/users/miguelgi/jobs/out/shifter_hostname.stdout.log.%j
    #SBATCH --image=docker:miguelgila/wlcg_wn:20151218
    #SBATCH --imagevolume=/users/miguelgi:/users/miguelgi
    #SBATCH --imagevolume=/apps:/apps
    #SBATCH --imagevolume=/scratch/santis:/scratch/santis

    module load slurm
    module load shifter/15.12.0

    srun shifter --volume=/scratch/santis:/scratch/santis --volume=/users:/users cat /etc/redhat-release
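When many jobs differ only in image, exposed filesystems and command, the batch script can be generated instead of edited by hand. The following is a minimal sketch in Python; `render_job_script` is a hypothetical helper, it emits only directives and flags that appear in the script from the slides, and it assumes the same volume list is used for both `--imagevolume` and `--volume` (the slide's script uses slightly different lists for each):

```python
def render_job_script(image, volumes, command, nodes=1, time_limit="00:30:00"):
    """Return the text of an sbatch script that runs `command` through shifter.

    `volumes` is a list of (host_path, container_path) pairs.
    """
    lines = [
        "#!/bin/bash -l",
        "#SBATCH --nodes={}".format(nodes),
        "#SBATCH --time={}".format(time_limit),
        "#SBATCH --image=docker:{}".format(image),
    ]
    # One --imagevolume directive per exposed filesystem.
    for host_path, container_path in volumes:
        lines.append("#SBATCH --imagevolume={}:{}".format(host_path, container_path))
    lines += ["", "module load slurm", "module load shifter/15.12.0", ""]
    # Prepend shifter (and its --volume flags) to the user's command.
    parts = ["srun", "shifter"]
    parts += ["--volume={}:{}".format(h, c) for h, c in volumes]
    parts.append(command)
    lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


script = render_job_script(
    image="miguelgila/wlcg_wn:20151218",
    volumes=[("/scratch/santis", "/scratch/santis"), ("/users", "/users")],
    command="cat /etc/redhat-release",
)
print(script)
```

The returned text can be written to a file and handed to sbatch; the module version and paths are the ones from the slides and will differ on other systems.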

[Figure: sbatch workflow on the compute node. Through SRUN/SPANK, udiRoot creates a loop device (/dev/loopY) from the Shifter image on /scratch/santis, and the shifter binary runs the executable in the loop device, with /users exposed (steps 1 to 3).]

Use cases


Full OS containers

§ Cray compute nodes run CLE (a version of SLES 11)

§ With Shifter it is possible to run applications built for a specific OS:

CentOS 6 – non-interactive session

    [miguelgi@santis01]-[11:34:45]-[~/examples]:-($ salloc -t 00:15:00 -N1 --image=docker:centos:6.7
    salloc: Granted job allocation 16313
    salloc: Waiting for resource configuration
    salloc: Nodes nid00012 are ready for job

    [miguelgi@santis01]-[11:34:50]-[~/examples]:-)$ srun shifter cat /etc/redhat-release
    CentOS release 6.7 (Final)

    [miguelgi@santis01]-[11:35:03]-[~/examples]:-)$ srun shifter yum --version | head -4
    3.2.29
    Installed: rpm-4.8.0-47.el6.x86_64 at 2015-08-19 18:25
    Built : CentOS BuildSystem <http://bugs.centos.org> at 2015-07-24 11:28
    Committed: Lubos Kardos <[email protected]> at 2015-06-15

Debian 7.9 – interactive session

    [miguelgi@santis01]-[10:52:40]-[~/examples]:-)$ salloc -t 00:15:00 -N1 --image=docker:debian:7.9
    salloc: Granted job allocation 16294
    salloc: Waiting for resource configuration
    salloc: Nodes nid00012 are ready for job

    [miguelgi@santis01]-[10:58:29]-[~/examples]:-($ srun --pty shifter /bin/bash

    [miguelgi@nid00012]-[10:58:31]-[~/examples]:-)$ cat /etc/debian_version
    7.9

    [miguelgi@nid00012]-[10:58:33]-[~/examples]:-)$ uname -a
    Linux nid00012 3.0.101-0.46.1_1.0502.8871-cray_ari_c #1 SMP Tue Aug 25 21:41:26 UTC 2015 x86_64 GNU/Linux

    [miguelgi@nid00012]-[10:59:46]-[~/examples]:-)$ apt-get --version | head -n1
    apt 0.9.7.9 for amd64 compiled on Oct 17 2014 09:15:56


Application containers: Python/Ruby

§ Can run application-specific containers:

Python 3.5.1 – interactive session

    [miguelgi@santis01]-[09:57:23]-[~]:-)$ salloc -t 01:00:00 -n1 --image=docker:python:3.5.1
    salloc: Granted job allocation 16274
    salloc: Waiting for resource configuration
    salloc: Nodes nid00012 are ready for job

    [miguelgi@santis01]-[10:00:51]-[~]:-)$ srun --pty shifter /bin/bash

    [miguelgi@nid00012]-[09:01:04]-[~]:-)$ python -V
    Python 3.5.1

    [miguelgi@nid00012]-[09:01:08]-[~]:-)$ which python
    /usr/local/bin/python

Python 3.2 – non-interactive session

    [miguelgi@santis01]-[10:16:35]-[~]:-)$ salloc -t 01:00:00 -n1 --image=docker:python:3.2-slim
    salloc: Granted job allocation 16278
    salloc: Waiting for resource configuration
    salloc: Nodes nid00012 are ready for job

    [miguelgi@santis01]-[10:19:34]-[~/tmp]:-)$ srun shifter ./myscript.py
    3.2.6

myscript.py

    $ cat myscript.py
    #! /usr/bin/env python
    import platform
    print(platform.python_version())

Ruby 2.1.8 – interactive session

    [miguelgi@santis01]-[09:57:23]-[~]:-)$ salloc -t 01:00:00 -n1 --image=docker:ruby:2.1.8
    salloc: Granted job allocation 16280
    salloc: Waiting for resource configuration
    salloc: Nodes nid00012 are ready for job

    [miguelgi@santis01]-[10:22:14]-[~]:-)$ srun --pty shifter /bin/bash

    [miguelgi@nid00012]-[09:22:28]-[~]:-)$ ruby -v
    ruby 2.1.8p440 (2015-12-16 revision 53160) [x86_64-linux

Ruby 2.1.8 – non-interactive session

    [miguelgi@santis01]-[10:22:05]-[~]:-)$ salloc -t 01:00:00 -n1 --image=docker:ruby:2.1.8
    salloc: Granted job allocation 16280
    salloc: Waiting for resource configuration
    salloc: Nodes nid00012 are ready for job

    [miguelgi@santis01]-[10:22:08]-[~/tmp]:-)$ srun shifter ./myscript.rb
    2.1.8

myscript.rb

    $ cat myscript.rb
    #!/usr/bin/env ruby
    puts RUBY_VERSION

Multi-node containers

§ It is possible to run the same container across multiple nodes:

hostname.py

    #! /usr/bin/env python
    import platform
    import socket
    print(platform.python_version())
    print(socket.gethostname())

Python 3.2 – non-interactive session

    [miguelgi@santis01]-[10:39:43]-[~/examples]:-)$ salloc -t 00:15:00 -N2 -w nid00013,nid00014 --image=docker:python:3.2-slim
    salloc: Granted job allocation 16285
    salloc: Waiting for resource configuration
    salloc: Nodes nid000[13-14] are ready for job

    [miguelgi@santis01]-[10:41:21]-[~/examples]:-)$ srun shifter ./hostname.py
    3.2.6
    nid00013
    3.2.6
    nid00014

§ Working on getting MPI across nodes to function

§ Working on getting GPUs to be accessible to containers with good results so far (native performance!)

GPU

    lucasbe@santis01 ~/shifter-gpu> sbatch ./nvidia-docker/samples/cuda-stream/benchmark.sbatch
    Submitted batch job 496

    lucasbe@santis01 /scratch/santis/lucasbe/jobs> cat shifter-gpu.out.log
    Launching GPU stream benchmark on nid00012 ...
    STREAM Benchmark implementation in CUDA
    Array size (double precision) = 1073.74 MB
    using 192 threads per block, 699051 blocks
    Function  Rate (GB/s)  Avg time(s)  Min time(s)  Max time(s)
    Copy:     184.3169     0.01167758   0.01165104   0.01170397
    Scale:    183.1849     0.01175387   0.01172304   0.01178598
    Add:      180.3075     0.01790012   0.01786518   0.01792288
    Triad:    180.1056     0.01790700   0.01788521   0.01794291
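The rates in the GPU STREAM output can be cross-checked from the other numbers in the same log. The sketch below, in Python, rests on assumptions that are not stated in the slides: that each array holds 2^27 doubles (which gives the reported 1073.74 MB), that Copy and Scale touch two arrays while Add and Triad touch three (the usual STREAM accounting), and that the rate is decimal GB/s computed from the minimum time:

```python
import math

ARRAY_ELEMENTS = 2 ** 27           # assumption: 2^27 doubles = 1073.74 MB
BYTES_PER_ARRAY = ARRAY_ELEMENTS * 8
THREADS_PER_BLOCK = 192            # from the benchmark output

# kernel name -> (arrays touched per iteration, reported min time in seconds)
KERNELS = {
    "Copy": (2, 0.01165104),
    "Scale": (2, 0.01172304),
    "Add": (3, 0.01786518),
    "Triad": (3, 0.01788521),
}


def rate_gb_per_s(arrays_touched, min_time):
    """Best-case bandwidth in decimal GB/s: bytes moved / fastest time."""
    return arrays_touched * BYTES_PER_ARRAY / min_time / 1e9


# One thread per element, rounded up to whole blocks.
blocks = math.ceil(ARRAY_ELEMENTS / THREADS_PER_BLOCK)
rates = {name: rate_gb_per_s(*spec) for name, spec in KERNELS.items()}

print("array size: {:.2f} MB, blocks: {}".format(BYTES_PER_ARRAY / 1e6, blocks))
for name, rate in rates.items():
    print("{:<6} {:.4f} GB/s".format(name, rate))
```

Under these assumptions the block count matches the log and the computed rates agree with the reported Rate column to within about 0.001 GB/s, which suggests the figures in the log are internally consistent.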


Practical use case: WLCG Swiss Tier-2

§ CSCS operates the cluster Phoenix on behalf of CHIPP, the Swiss Institute of Particle Physics

§ Phoenix runs Tier-2 jobs for ATLAS, CMS and LHCb, 3 experiments of the LHC at CERN and part of WLCG (Worldwide LHC Computing Grid)

§ WLCG jobs need and expect a RHEL-compatible OS. All software is precompiled and exposed in a cvmfs[*] filesystem

§ But Cray XC compute nodes run CLE, a modified version of SLES 11 SP3

§ So, how do we get these jobs to run on a Cray?

[*] https://cernvm.cern.ch/portal/filesystem


§ Using Shifter, we are able to run unmodified ATLAS, CMS and LHCb production jobs on a Cray XC40 TDS

§ Jobs see standard CentOS 6 containers

§ Nodes are shared: multiple single-core and multi-core jobs, from different experiments, can run on the same compute node

§ Job efficiency is comparable on both systems (Phoenix and the Cray)

    JOBID  USER      ACCOUNT  NAME             NODELIST  ST  REASON  START_TIME  END_TIME      TIME_LEFT   NODES  CPU
    82471  atlasprd  atlas    a53eb5f8_34f0_   nid00043  R   None    15:03:33    Thu 15:03     1-23:54:18  1      8
    82476  cms04     cms      gridjob          nid00043  R   None    15:08:39    Tomorr 03:08  11:59:24    1      2
    82451  lhcbplt   lhcb     gridjob          nid00043  R   None    15:00:10    Tomorr 03:00  11:50:55    1      2
    82447  lhcbplt   lhcb     gridjob          nid00043  R   None    14:59:31    Tomorr 02:59  11:50:16    1      2
    82448  lhcbplt   lhcb     gridjob          nid00043  R   None    14:59:31    Tomorr 02:59  11:50:16    1      2
    82449  lhcbplt   lhcb     gridjob          nid00043  R   None    14:59:31    Tomorr 02:59  11:50:16    1      2
    82450  lhcbplt   lhcb     gridjob          nid00043  R   None    14:59:31    Tomorr 02:59  11:50:16    1      2
    82446  lhcbplt   lhcb     gridjob          nid00043  R   None    14:49:01    Tomorr 02:49  11:39:46    1      2
    82444  lhcbplt   lhcb     gridjob          nid00043  R   None    14:48:01    Tomorr 02:48  11:38:46    1      2
    82445  lhcbplt   lhcb     gridjob          nid00043  R   None    14:48:01    Tomorr 02:48  11:38:46    1      2
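To make the node sharing concrete, the CPU column of the job listing can be summed per experiment. A small sketch in Python, with the job IDs, accounts and CPU counts copied by hand from the listing for nid00043 (reading the last column as a per-job CPU count is an assumption based on its header):

```python
from collections import Counter

# (job id, account, CPUs) copied from the job listing for nid00043.
JOBS = [
    (82471, "atlas", 8),
    (82476, "cms", 2),
    (82451, "lhcb", 2),
    (82447, "lhcb", 2),
    (82448, "lhcb", 2),
    (82449, "lhcb", 2),
    (82450, "lhcb", 2),
    (82446, "lhcb", 2),
    (82444, "lhcb", 2),
    (82445, "lhcb", 2),
]

# Sum the CPUs used by each experiment on the shared node.
cpus_per_account = Counter()
for _job_id, account, cpus in JOBS:
    cpus_per_account[account] += cpus

total_cpus = sum(cpus_per_account.values())
print(dict(cpus_per_account), total_cpus)
```

Ten jobs from three experiments share the single node, with one 8-CPU ATLAS job running next to 2-CPU CMS and LHCb jobs.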


Wrap-up

Docker vs. Shifter

Docker

§ Can isolate network by creating NAT or bridge devices. What about IB?

§ Users can write as root on exposed RW filesystems

§ Needs a local daemon running

§ Isn’t SLURM-friendly

§ Can run on multiple nodes with own tool (Swarm)

§ Can use GPUs?

§ MPI?

§ Can run images on private registry

Shifter

§ It shows all /dev, /sys and /proc to the container environment. Easy

§ Users can write as their $USER on any exposed RW filesystem

§ Does not need a local daemon on CN

§ Is SLURM-friendly (SPANK plugin)

§ Can run on multiple nodes with WLM integration

§ Can use GPUs. Working on it!

§ MPI on its way!

§ Can run images on private registry

Conclusion

§ Shifter works very well in our HPC environment

§ It’s being constantly developed and new features are appearing on a weekly basis

§ It’s open source and developed by the HPC community

§ It needs some additional work to cover basic HPC use cases (MPI)

§ Interacting with some parts of Shifter is not very user-friendly:
  § The process of pulling images is easy, but has no visual feedback
  § At times, error messages are difficult to understand

§ No ACLs yet


Reference links

§ NERSC info: http://www.nersc.gov/research-and-development/user-defined-images/

§ The code: https://github.com/NERSC/shifter


Questions?


Thank you for your attention.