Experimenting and Learning Kubernetes and Tensorflow @Ben_Hall [email protected] Katacoda.com


Experimenting and Learning

Kubernetes and Tensorflow

@Ben_Hall

[email protected]

Katacoda.com


@Ben_Hall / Blog.BenHall.me.uk

WHO AM I?

Learn via Interactive Browser-Based Labs: Katacoda.com

• In the next 25-30 minutes:

• Learning to Learn

• Creating Kubernetes Experiment Playground

• Running Tensorflow on Kubernetes

• Keeping up to date with the community

Learn By Doing

Goals are clear

Feedback is immediate

Demo Time!

Minikube

Tensorflow Playground

Kubeadm

Tensorflow on Kubernetes

Create Kubernetes Cluster

Minikube, Kubeadm

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: inception-deployment
  labels:
    k8s-app: inception-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      k8s-app: inception-deployment
  template:
    metadata:
      labels:
        k8s-app: inception-deployment
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: inception-container
        image: katacoda/tensorflow_serving
        imagePullPolicy: Never
        command:
        - /bin/sh
        - -c
        args:
        - /serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
          --port=9000 --model_name=inception --model_base_path=/serving/inception-export
        ports:
        - containerPort: 9000
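The client Job below connects to `inception-deployment:9000`, a name that only resolves inside the cluster if a Service with that name exists. The deck doesn't show one; a hedged sketch, with the selector and port inferred from the Deployment above, would look like:

```yaml
# Hypothetical Service fronting the TF Serving pods; the name matches the
# --server=inception-deployment:9000 flag used by the client Job.
apiVersion: v1
kind: Service
metadata:
  name: inception-deployment
spec:
  selector:
    k8s-app: inception-deployment
  ports:
  - port: 9000
    targetPort: 9000
```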

apiVersion: batch/v1
kind: Job
metadata:
  name: inception-client
spec:
  template:
    metadata:
      name: inception-client
    spec:
      containers:
      - name: inception-client
        image: katacoda/tensorflow_serving
        imagePullPolicy: Never
        command:
        - /bin/bash
        - -c
        args:
        - /serving/bazel-bin/tensorflow_serving/example/inception_client
          --server=inception-deployment:9000 --image=/data/cat.jpg
        volumeMounts:
        - name: inception-persistent-storage
          mountPath: /data
      volumes:
      - name: inception-persistent-storage
        hostPath:
          path: /root
      restartPolicy: Never

Kubernetes and Tensorflow at scale?

https://www.tensorflow.org/deploy/distributed

https://www.youtube.com/watch?v=yFXNASK0cPk

[Diagram: four Workers exchanging parameters with two Parameter Servers]

# On ps0.example.com:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=ps --task_index=0

# On ps1.example.com:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=ps --task_index=1

# On worker0.example.com:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=worker --task_index=0

# On worker1.example.com:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=worker --task_index=1
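trainer.py itself isn't shown in the deck. As a sketch, the flags above are typically parsed into a cluster description like this; the helper name `parse_cluster` is made up, and real code would feed the result to `tf.train.ClusterSpec` and `tf.train.Server`:

```python
import argparse

def parse_cluster(argv):
    """Parse the --ps_hosts/--worker_hosts flags used above into a
    cluster dict plus this process's role (hypothetical helper)."""
    p = argparse.ArgumentParser()
    p.add_argument("--ps_hosts", required=True)      # comma-separated host:port list
    p.add_argument("--worker_hosts", required=True)
    p.add_argument("--job_name", choices=["ps", "worker"], required=True)
    p.add_argument("--task_index", type=int, required=True)
    a = p.parse_args(argv)
    cluster = {"ps": a.ps_hosts.split(","), "worker": a.worker_hosts.split(",")}
    # The task this process should run, e.g. ("worker", 1) -> worker1.example.com:2222
    me = cluster[a.job_name][a.task_index]
    return cluster, a.job_name, a.task_index, me

cluster, job, idx, me = parse_cluster([
    "--ps_hosts=ps0.example.com:2222,ps1.example.com:2222",
    "--worker_hosts=worker0.example.com:2222,worker1.example.com:2222",
    "--job_name=worker", "--task_index=1",
])
print(me)  # worker1.example.com:2222
```

Every process receives the same cluster description; only `--job_name`/`--task_index` differ, which is exactly what makes the pattern easy to stamp out as Kubernetes Pods.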

[Diagram: a Kubernetes namespace containing a Deployment of three Pods, each running a containerized TF Worker, plus a containerized TF Parameter Server, scheduled across Server 1, Server 2, and Server 3]

[Diagram: the same namespace with the TF Workers and Parameter Server backed by shared Storage, now spanning Server 4 and Server 5]

[Diagram: the same namespace with each TF Worker pinned to a GPU (GPU1-GPU3) on Server 1, Server 2, and Server 3]

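The parameter-server pattern in the diagrams can be sketched in plain Python. This is a toy, synchronous version with a made-up helper name; real TensorFlow places variables on PS tasks with `tf.train.replica_device_setter` and usually applies updates asynchronously:

```python
def ps_round(weights, worker_grads, lr=0.1):
    """One synchronous step: the parameter server averages the gradients
    pushed by each worker and applies them to the shared weights."""
    n = len(worker_grads)
    avg = [sum(g[i] for g in worker_grads) / n for i in range(len(weights))]
    return [w - lr * a for w, a in zip(weights, avg)]

# Three workers (as in the diagram) each push a gradient for two weights.
weights = [1.0, 2.0]
grads = [[0.3, 0.6], [0.3, 0.6], [0.3, 0.6]]
print(ps_round(weights, grads))
```

The key property is that workers never talk to each other, only to the parameter server(s), so adding workers is just scaling a Deployment.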

Docker Container and GPU

$ docker run -it \
    --device /dev/nvidia0:/dev/nvidia0 \
    --device /dev/nvidia1:/dev/nvidia1 \
    --device /dev/nvidiactl:/dev/nvidiactl \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm \
    tf-cuda:v1.1beta /bin/bash

Summary

• Kubernetes is designed for running distributed systems at scale
• The model of Tensorflow fits cleanly into Kubernetes
• As Tensorflow usage increases, Kubernetes can scale to meet demands

www.katacoda.com

Call To Action

• Interested in sharing your Kubernetes or Tensorflow experience? Write your own scenarios and teach interactively!
• Teaching teams internally? Private Katacoda environments are available.

Thank you!

@Ben_Hall

[email protected]

Blog.BenHall.me.uk

www.Katacoda.com