© 2017 IBM Corporation · 이보란, IBM Cognitive Systems · Introduction to GPU Deep Learning with Containers

Session 4: GPU · 이보란 · v0.4 (Final) · 2017-11-16


Page 1:

이보란, IBM Cognitive Systems

Introduction to GPU Deep Learning with Containers

Page 2:

Agenda

• Introduction to Cognitive workloads and POWER
• Requirements for GPUs in the Data Center
• Docker on POWER systems (with GPU)
• Orchestration for containers with GPU resources

Page 3:

Welcome to the GPU World!

Introduction to Cognitive workloads and POWER

Page 4:

HPC/HPDA Platform

– Accelerated computing

GPUs, FPGAs, Intel Xeon Phi

– Open Software / Open Standards / Open Stacks

– Data-centric computing

– Low latency Interconnect

– Intel OmniPath, InfiniBand, RoCE

– Software Defined Infrastructure

Introduction to Cognitive workloads and POWER

• S822LC for HPC, the "Minsky" server
• Four NVIDIA Tesla P100 GPUs (with POWER8)
• The industry's only CPU-GPU NVLink architecture
• NVLink 1.0 provides 2.5x the bandwidth of PCIe
• Includes the PowerAI package

• Next generation: POWER9 + Volta GPU (coming soon)

OpenPOWER Systems with NVLink

[Diagram: POWER9 connected to GPUs over NVLink 2.0 at 75+75 GB/s]

Next-generation POWER9 + Volta GPU server (NVLink 2.0 vs. NVLink 1.0)

• 25 GB/s per NVLink 2.0 link
• Up to 6 links per GPU (150 GB/s aggregate)
• Cache coherence support
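The aggregate figure follows directly from the per-link rate; a quick sanity check (the 25 GB/s and 6-link numbers are from the slide, the arithmetic is ours):

```shell
per_link_gbs=25     # NVLink 2.0 bandwidth per link, GB/s (from the slide)
links=6             # maximum links per GPU (from the slide)
aggregate=$(( per_link_gbs * links ))
echo "${aggregate} GB/s aggregate"    # 150 GB/s
```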

Page 5:

Preparation stages for deep learning

Introduction to Cognitive workloads and POWER

Data preparation
- Data labeling
- Data preprocessing

Development
- Model development
- Framework testing

Training
- Running batch jobs

Page 6:

The training stage demands massive computation

Introduction to Cognitive workloads and POWER

• Training feeds labeled data into a deep neural network (e.g., a CNN) to find the balance of weights that maximizes accuracy on the input → the goal is a trained CNN model with high recognition accuracy on the learned content

• Key factors in training
  • More data (e.g., images, video) = a higher probability of obtaining a more accurate trained model
  • Enormous compute resources (e.g., a Chinese speech-recognition research model with 4 TB of training data requires roughly 20 exaFLOPs of computation)

Input: labeled data → Goal/result: trained model
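To put the 20-exaFLOP figure in perspective, here is a back-of-the-envelope estimate for a single 4-GPU node; the ~21 TFLOP/s FP16 throughput per Tesla P100 and the assumption of perfect utilization are ours, not from the slide:

```shell
total_tflop=20000000        # 20 exaFLOPs of work, expressed in teraFLOPs (20e18 / 1e12)
gpu_tflops=21               # assumed FP16 throughput per Tesla P100, TFLOP/s
node_tflops=$(( 4 * gpu_tflops ))         # one Minsky node: 4 GPUs
hours=$(( total_tflop / node_tflops / 3600 ))
echo "~${hours} hours on one node at ideal utilization"   # ~66 hours
```

Real utilization is far below ideal, which is exactly why scheduling and sharing the GPU pool (the topic of the following slides) matters.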

Page 7:

Infrastructure for deep learning

Introduction to Cognitive workloads and POWER

Login servers (for user access)

High-performance I/O interconnect

Data-centric storage (shared data repository)

Development servers
- Build development/training environment images
- Further framework development

Training servers (compute nodes)

Page 8:

How can we use GPUs efficiently?

Requirements for GPUs in the Data Center

"I want to test with the latest open-source release, but it differs from the version other developers use."

"We secured a large number of GPUs, but in practice overall utilization is far too low."

"I have to grab more GPUs than the other developers before I can even run my tests."

"Our team never has enough GPUs. We really need at least one GPU per person..."

Job Scheduler?

Container?

Page 9:

Dispatching many users' jobs to the right GPU servers

Requirements for GPUs in the Data Center

[Diagram: training server pool showing available vs. in-use GPU resources]

1. A training batch job is submitted
2. The job waits in a queue
3. The job scheduler (Spectrum LSF) monitors GPU resources on the training servers
4. The batch job is dispatched to a server with available GPU resources
5. The job runs

Users
- Submit jobs immediately, without having to check GPU resource status first
- Submit multiple jobs so they run in sequence according to their dependencies

Administrators
- Can limit resource usage per user or group
- Can set job execution priorities

Page 10:

GPU awareness in Spectrum LSF 10.1

Requirements for GPUs in the Data Center

• Supports CUDA 7 and later

• POWER8 little-endian (LE) support

• Reports GPU and shared-GPU utilization

• Allocation in shared mode

• CPU-GPU affinity

- Currently "best effort"

- Forced allocation (planned)

• Stable resource allocation via control groups (cgroups)

• Turbo-mode control for jobs using multiple GPUs

• GPU power-save mode when no jobs are running

• MPS integration support

• DCGM integration support

Page 11:

GPU awareness in Spectrum LSF 10.1

Requirements for GPUs in the Data Center

Page 12:

GPU awareness in Spectrum LSF 10.1

Requirements for GPUs in the Data Center

lsf.cluster.<name>

Shared mode:
    bsub -R "select[ngpus>0] rusage[ngpus_shared=2]" gpu_app

Exclusive process mode (MPI job):
    bsub -n 4 -R "select[ngpus>0] rusage[ngpus_excl_p=2]" mpirun -lsf gpu_app

Exclusive thread mode (MPI job):
    bsub -n 2 -R "select[ngpus>0] rusage[ngpus_excl_t=2]" mpirun -lsf gpu_app

Page 13:

Job scheduling in Spectrum LSF

Requirements for GPUs in the Data Center

Page 14:

Job scheduling in Spectrum LSF

Requirements for GPUs in the Data Center

Page 15:

GPU allocation in Spectrum LSF

Requirements for GPUs in the Data Center

login1:~/lab1 $ bsub -R "select[ngpus>0] rusage[ngpus_excl_p=2]" \
    -q P100_pool nvidia-docker run lab1/caffe0.15:ppc64le \
    ./caffe train -gpu 0,1 --solver=/gpfs/solver.prototxt

[Diagram: Login server #1 runs the LSF master; MyJob is dispatched to one of the LSF slave servers in the P100_pool queue, each with GPUs #0-#3, alongside Jobs #1-#3 already running]

Page 16:

Providing researchers with independent software development environments

Requirements for GPUs in the Data Center

Page 17:

A development system for building and storing images

Requirements for GPUs in the Data Center

Build → Store → Run

[Diagram: a dedicated build system pushes images to an internal repository (Private Hub); the training GPU pool pulls and runs them. Each node is an OS / server / HW stack]

Page 18:

Docker + Scheduler use case

Docker on POWER systems (with GPU): functions, servers, switches, storage, and software

Login servers #1-#2: IBM Power S822LC running IBM Spectrum LSF

Development server #1: IBM S822LC for HPC (4x Tesla P100 GPUs, PowerAI on Linux); builds images and hosts the Docker Private Hub (local repository)

Training servers #1 …… #n: IBM S822LC for HPC (4x Tesla P100 GPUs each, PowerAI on Linux); pull images from the hub and run containers as batch jobs, with multiple containers possible per server

Data-centric storage: NSD servers #1-#2 (IBM Power S812L running IBM Spectrum Scale) in front of the storage; shared storage at /docker_repository/user_workplace1…./user_workplaceX

Interconnect: Mellanox 100Gb InfiniBand switch and an Ethernet switch

Page 19:

Bare-metal vs Container for GPU

Docker on POWER system (with GPU)

Now, nvidia-docker2:
    # docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --rm nvidia/cuda nvidia-smi
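For comparison, the older nvidia-docker (v1) wrapper reached the same result by injecting the driver volumes itself; the sketch below contrasts the two invocations (v1 selects devices via NV_GPU, v2 via NVIDIA_VISIBLE_DEVICES; this comparison is based on the nvidia-docker documentation, not this slide):

```shell
# nvidia-docker v1: the wrapper mounts the CUDA driver volumes automatically;
# NV_GPU picks which devices the container sees
NV_GPU=0 nvidia-docker run --rm nvidia/cuda nvidia-smi

# nvidia-docker2: plain docker with the nvidia runtime; device selection moves
# to an environment variable read by the runtime hook
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 --rm nvidia/cuda nvidia-smi
```

Both commands require a GPU host with the NVIDIA driver and the corresponding Docker integration installed.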

Page 20:

Considerations for a growing GPU infrastructure environment
– GPU Allocation

– GPU Isolation

– GPU Discovery

– GPU Priority placement

– GPU Liveness check

– Volume Manager (Injection)

– Visibility, control, and sharing (to be done)

Docker on POWER system (with GPU)

Multi-GPU Support on Kubernetes 1.6 -- Upstream

Page 21:

Orchestration and GPU scheduling with open source

Orchestration for container with GPU resource

GPU server pool

[Diagram: kubectl talks to the k8s master (API Server, Scheduler, Replication Controller, etcd); the master schedules pods onto nodes in the GPU server pool, each running kubelet, cAdvisor, and proxy]

IBM Cloud Private - GUI Dashboard

Learn more and register on our community page: http://ibm.biz/ConductorForContainers

Page 22:

GPU Auto-discovery

Orchestration for container with GPU resource


Page 25:

GPU Allocation & Monitoring

Page 26:

Example yaml2:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: gpu-demo3
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            run: gpu-demo
          annotations:
            scheduler.alpha.kubernetes.io/nvidiaGPU: "{\n \"AllocationPriority\": \"Dense\"\n}\n"
        spec:
          containers:
          - name: gpu-demo3
            image: nvidia/cuda-ppc64le:8.0-cudnn6-runtime-ubuntu16.04
            command:
            - "/bin/sh"
            - "-c"
            args:
            - nvidia-smi && tail -f /dev/null
            resources:
              limits:
                alpha.kubernetes.io/nvidia-gpu: 4
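Assuming the manifest above is saved as gpu-demo3.yaml (the file name is ours), it could be deployed and verified roughly like this on a Kubernetes 1.6-era cluster:

```shell
# Create the deployment and watch the pod come up
kubectl create -f gpu-demo3.yaml
kubectl get pods -l run=gpu-demo

# Confirm the pod was granted the 4 requested GPUs
kubectl describe pods -l run=gpu-demo | grep -i nvidia-gpu
```

These commands assume kubectl is configured against the cluster; the resource name alpha.kubernetes.io/nvidia-gpu matches the manifest's limits section.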

Page 27:

root@firestone:~# docker ps | grep gpu-demo*
f16724f8615e   nvidia/cuda-ppc64le        "/bin/sh -c 'nvidi..."   About an hour ago   Up About an hour   k8s_gpu-demo_gpu-demo-3432291976-c152n_default_32ca2ca5-bd81-11e7-b8ab-98be9467eb68_0
17dec7d500b0   ibmcom/pause-ppc64le:3.0   "/pause"                 About an hour ago   Up About an hour   k8s_POD_gpu-demo-3432291976-c152n_default_32ca2ca5-bd81-11e7-b8ab-98be9467eb68_0

Page 28:

Developers, visit now! developer.ibm.com/kr

Thank you