ben-hall
• In the next 25/30 minutes
• Learning to Learn
• Creating Kubernetes Experiment Playground
• Running TensorFlow on Kubernetes
• Keeping up to date with the community
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: inception-deployment
  labels:
    k8s-app: inception-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      k8s-app: inception-deployment
  template:
    metadata:
      labels:
        k8s-app: inception-deployment
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: inception-container
        image: katacoda/tensorflow_serving
        imagePullPolicy: Never
        command:
        - /bin/sh
        - -c
        args:
        - /serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
          --port=9000 --model_name=inception --model_base_path=/serving/inception-export
        ports:
        - containerPort: 9000
apiVersion: batch/v1
kind: Job
metadata:
  name: inception-client
spec:
  template:
    metadata:
      name: inception-client
    spec:
      containers:
      - name: inception-client
        image: katacoda/tensorflow_serving
        imagePullPolicy: Never
        command:
        - /bin/bash
        - -c
        args:
        - /serving/bazel-bin/tensorflow_serving/example/inception_client
          --server=inception-deployment:9000 --image=/data/cat.jpg
        volumeMounts:
        - name: inception-persistent-storage
          mountPath: /data
      volumes:
      - name: inception-persistent-storage
        hostPath:
          path: /root
      restartPolicy: Never
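The client Job reaches the model server as inception-deployment:9000, which only resolves if a Service with that name exists in the same namespace. The slides do not show one, so the following is a minimal sketch under that assumption: it builds a Service manifest whose selector matches the Deployment's Pod label (k8s-app: inception-deployment) and prints it as JSON, which kubectl accepts alongside YAML. The function name inception_service is hypothetical.

```python
import json


def inception_service(name="inception-deployment", port=9000):
    """Build a Service manifest that fronts the TensorFlow Serving Pods.

    Assumption: the Service is named after the Deployment so that the
    client's --server=inception-deployment:9000 flag resolves via cluster DNS.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Must match the labels in the Deployment's Pod template.
            "selector": {"k8s-app": "inception-deployment"},
            "ports": [{"protocol": "TCP", "port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(inception_service(), indent=2))
```

If saved as service.py (a hypothetical file name), the output could be piped straight into the cluster with `python service.py | kubectl apply -f -`.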
# On ps0.example.com:
$ python trainer.py \
--ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
--worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
--job_name=ps --task_index=0
# On ps1.example.com:
$ python trainer.py \
--ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
--worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
--job_name=ps --task_index=1
# On worker0.example.com:
$ python trainer.py \
--ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
--worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
--job_name=worker --task_index=0
# On worker1.example.com:
$ python trainer.py \
--ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
--worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
--job_name=worker --task_index=1
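All four invocations pass the same host lists and differ only in --job_name and --task_index. The slides do not show trainer.py itself, so the following is a hedged sketch of how such a script might turn those flags into a cluster description. The helper name parse_cluster is hypothetical; the resulting dictionary has the job-name-to-address-list shape that tf.train.ClusterSpec accepts.

```python
import argparse


def parse_cluster(argv):
    """Parse the trainer flags and return (cluster, job_name, task_index)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", required=True,
                        help="comma-separated host:port list of parameter servers")
    parser.add_argument("--worker_hosts", required=True,
                        help="comma-separated host:port list of workers")
    parser.add_argument("--job_name", required=True, choices=["ps", "worker"])
    parser.add_argument("--task_index", type=int, default=0)
    args = parser.parse_args(argv)

    # Every task sees the full cluster; only its own role and index differ.
    cluster = {
        "ps": args.ps_hosts.split(","),
        "worker": args.worker_hosts.split(","),
    }
    if not 0 <= args.task_index < len(cluster[args.job_name]):
        parser.error("task_index is out of range for job " + args.job_name)
    return cluster, args.job_name, args.task_index


if __name__ == "__main__":
    cluster, job, index = parse_cluster([
        "--ps_hosts=ps0.example.com:2222,ps1.example.com:2222",
        "--worker_hosts=worker0.example.com:2222,worker1.example.com:2222",
        "--job_name=worker",
        "--task_index=1",
    ])
    print(job, index, cluster[job][index])
```

Keeping the cluster description identical on every host means the only per-machine configuration is the role and index, which is what makes the setup easy to template.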
[Diagram: two Kubernetes namespaces. The first runs on Server 1, Server 2 and Server 3 and contains three Kubernetes Pods, each running a containerized TF worker, plus a Kubernetes Deployment running a containerized TF PS. The second runs on Server 4 and Server 5 and contains the same worker Pods and PS Deployment, plus storage.]
[Diagram: the same two Kubernetes namespaces, now with GPU1, GPU2 and GPU3 shown on Server 1 to Server 3. Each namespace contains three Kubernetes Pods running containerized TF workers and a Kubernetes Deployment running a containerized TF PS; the namespace on Server 4 and Server 5 also has storage.]
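The layout above (three containerized TF workers and one TF PS) needs the same trainer.py flags as the bare-metal example, with Kubernetes DNS names in place of fixed hostnames. A minimal sketch, assuming each task is reachable under a hypothetical name of the form tf-<job>-<index> on port 2222; neither the naming scheme nor the helper task_commands comes from the slides.

```python
def task_commands(num_ps, num_workers, port=2222):
    """Return {task_name: argv} for every PS and worker task in the cluster."""
    # Hypothetical naming scheme: tf-ps-0, tf-worker-0, tf-worker-1, ...
    ps_hosts = ",".join(f"tf-ps-{i}:{port}" for i in range(num_ps))
    worker_hosts = ",".join(f"tf-worker-{i}:{port}" for i in range(num_workers))

    commands = {}
    for job, count in (("ps", num_ps), ("worker", num_workers)):
        for index in range(count):
            commands[f"tf-{job}-{index}"] = [
                "python", "trainer.py",
                f"--ps_hosts={ps_hosts}",
                f"--worker_hosts={worker_hosts}",
                f"--job_name={job}",
                f"--task_index={index}",
            ]
    return commands


if __name__ == "__main__":
    for name, argv in task_commands(num_ps=1, num_workers=3).items():
        print(name, " ".join(argv))
```

Each argv list could then be used as the container command of the matching Pod, so scaling the job means regenerating the lists with a different worker count.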
Docker Container and GPU
$ docker run -it \
--device /dev/nvidia0:/dev/nvidia0 \
--device /dev/nvidia1:/dev/nvidia1 \
--device /dev/nvidiactl:/dev/nvidiactl \
--device /dev/nvidia-uvm:/dev/nvidia-uvm \
tf-cuda:v1.1beta /bin/bash
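The command above passes one --device flag per GPU, plus the shared /dev/nvidiactl and /dev/nvidia-uvm control devices. As a sketch of how that list grows with the number of GPUs, the hypothetical helper below builds the same argument list for any GPU count; it assumes the GPUs appear as /dev/nvidia0, /dev/nvidia1 and so on, as they do in the command shown.

```python
import shlex


def gpu_docker_command(image, gpu_count, command="/bin/bash"):
    """Build a `docker run` argv that exposes the first gpu_count NVIDIA GPUs."""
    devices = [f"/dev/nvidia{i}" for i in range(gpu_count)]
    # Control devices shared by all GPUs on the host.
    devices += ["/dev/nvidiactl", "/dev/nvidia-uvm"]

    argv = ["docker", "run", "-it"]
    for device in devices:
        # Map each host device to the same path inside the container.
        argv += ["--device", f"{device}:{device}"]
    argv += [image, command]
    return argv


if __name__ == "__main__":
    print(shlex.join(gpu_docker_command("tf-cuda:v1.1beta", gpu_count=2)))
```

Returning a list rather than a string keeps the result safe to pass to subprocess.run without shell quoting concerns.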
Summary
• Kubernetes is designed for running distributed systems at scale
• TensorFlow's model of parameter servers and workers fits cleanly into Kubernetes Deployments and Pods
• As TensorFlow usage increases, Kubernetes can scale to meet demand
Call To Action
• Interested in sharing your Kubernetes or TensorFlow experience? Write your own scenarios and teach interactively!
• Teaching teams internally? Private Katacoda environments