TRAINING NEURAL NETWORKS ON THE EDGE
Navjot Kukreja, Alena Shilova

Also: Olivier Beaumont, Jan Huckelheim, Nicola Ferrier, Paul Hovland, Gerard Gorman

BACKGROUND

Typical data flow pattern for adjoint problems

Memory consumption during an adjoint problem

Checkpointing (Revolve)

[Animation: Revolve checkpointing schedule. Legend: Setup; Forward step; Executing forward step; Saved forward step; Reverse step; Executing reverse step; Reverse step completed]

Where else do we see the same data-access pattern?

VGGNet

ARRAY OF THINGS

WAGGLE PAYLOAD COMPUTER

  • ODROID XU4 based on the Samsung Exynos5422 CPU
  • four A15 cores, four A7 cores
  • Mali-T628 MP6 GPU that supports OpenCL
  • 2GB LPDDR3 RAM
  • attached flash storage

VIEWPOINT PROBLEM

STUDENT-TEACHER MODEL

CHALLENGES

  • Network (not a challenge)
  • Storage (not a challenge)
  • Computation (not necessarily a challenge)
  • Memory!

MEMORY REQUIRED TO TRAIN RESNET

Memory required (MB) for image size $224 \times 224$

Memory required (MB) for batch size 1

Memory required (GB) for batch size 8

CHECKPOINTING

PyTorch

  • fast-evolving Python package widely applied in deep learning
  • uses Tensors as a basic class
  • Tensors are similar to NumPy arrays, and can also be used on a GPU
  • dynamically defines the computational graph of the model
  • designed to be memory efficient: a checkpointing strategy is available
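The checkpointing strategy mentioned here trades recomputation for memory. In PyTorch it is exposed through `torch.utils.checkpoint` (for example `checkpoint_sequential`). The sketch below is not PyTorch code and not the authors' implementation: it is a plain-Python toy, assuming a chain of scalar layers `tanh(w * x)`, that only illustrates the idea of keeping segment inputs and recomputing the rest during the backward pass. All function names are hypothetical.

```python
import math


def forward_layer(w, x):
    # Toy layer (an assumption for illustration): y = tanh(w * x).
    return math.tanh(w * x)


def backward_layer(w, x, grad_out):
    # Chain rule for y = tanh(w * x): dy/dx = w * (1 - y^2).
    y = math.tanh(w * x)
    return grad_out * w * (1.0 - y * y)


def grad_store_all(weights, x0):
    # Baseline: keep the input of every layer for the backward pass.
    inputs, x = [], x0
    for w in weights:
        inputs.append(x)
        x = forward_layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad = backward_layer(w, x_in, grad)
    return grad, len(inputs)


def grad_checkpointed(weights, x0, segments):
    # Split the chain into `segments` pieces; the last one takes the remainder.
    n = len(weights)
    size = n // segments
    bounds = [k * size for k in range(segments)] + [n]

    # Forward pass: keep only the input of each segment (the checkpoints).
    checkpoints, x = [], x0
    for k in range(segments):
        checkpoints.append(x)
        for w in weights[bounds[k]:bounds[k + 1]]:
            x = forward_layer(w, x)

    # Backward pass: recompute one segment at a time from its checkpoint.
    grad, peak = 1.0, 0
    for k in reversed(range(segments)):
        seg = weights[bounds[k]:bounds[k + 1]]
        inputs, x = [], checkpoints[k]
        for w in seg:
            inputs.append(x)
            x = forward_layer(w, x)
        # inputs[0] is the checkpoint itself, so count it only once.
        peak = max(peak, len(checkpoints) + len(inputs) - 1)
        for w, x_in in zip(reversed(seg), reversed(inputs)):
            grad = backward_layer(w, x_in, grad)
    return grad, peak


weights = [0.9, 1.1, 0.8, 1.2, 0.7, 1.3, 1.0, 0.95]
g_full, stored_full = grad_store_all(weights, 0.5)
g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.5, segments=2)
```

Both routines return the same gradient; the checkpointed one holds fewer values at once and pays for it with one extra forward sweep per segment. The way stored values are counted here is specific to this toy.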

Checkpoint sequential: number of segments = 2

$$ \mbox{Memory} = s - 1 + \bigl(l - \left\lfloor l/s \right\rfloor (s - 1) \bigr). $$
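To make the formula concrete, the sketch below simply evaluates it. It assumes that $l$ is the number of layers in the chain, $s$ the number of segments, and that memory is counted in stored activations; the slide does not spell these out, and the function names are hypothetical.

```python
def checkpoint_sequential_memory(l: int, s: int) -> int:
    """Evaluate Memory = s - 1 + (l - floor(l/s) * (s - 1)).

    Assumption: l = number of layers, s = number of segments,
    result = number of stored activations.
    """
    if not 1 <= s <= l:
        raise ValueError("need 1 <= s <= l")
    return (s - 1) + (l - (l // s) * (s - 1))


def best_segments(l: int) -> int:
    # Brute-force search for the segment count with the smallest memory.
    return min(range(1, l + 1), key=lambda s: checkpoint_sequential_memory(l, s))


# Memory for a 100-layer chain at a few segment counts.
table = {s: checkpoint_sequential_memory(100, s) for s in (1, 2, 10, 50)}
```

With a single segment nothing is saved and the formula returns $l$. Under these assumptions the minimum sits near $s \approx \sqrt{l}$, since the expression is bounded below by $s - 1 + l/s$.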

Revolve: dynamic programming

$$ \small{\mbox{Opt}[\ell,1] = \frac{\ell (\ell +1)}{2} u_f+ (\ell+1 ) u_b}$$

$$\small{\mbox{Opt}[1, c] = u_f +2 u_b}$$

$$\small{\mbox{Opt}[\ell, c] = \min_{1 \leq i \leq \ell-1} (i u_f +\mbox{Opt}[\ell - i, c -1] + \mbox{Opt}[i-1, c]) }$$
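The recurrence above can be evaluated directly with memoization. The sketch below is a minimal illustration rather than the Revolve implementation: it assumes $\ell$ is the chain length, $c$ the number of checkpoint slots, and $u_f$, $u_b$ the costs of one forward and one backward step. The slide does not define $\mbox{Opt}[0, c]$, which the recurrence needs when $i = 1$; the value $u_b$ used here is an assumption, obtained by setting $\ell = 0$ in the first formula. The function name is hypothetical.

```python
from functools import lru_cache


def revolve_cost(length, checkpoints, u_f=1, u_b=1):
    """Evaluate Opt[length, checkpoints] from the three formulas above."""
    if length < 1 or checkpoints < 1:
        raise ValueError("length and checkpoints must be >= 1")

    @lru_cache(maxsize=None)
    def opt(l, c):
        if l == 0:
            # ASSUMPTION: not given on the slide; matches Opt[l, 1] at l = 0.
            return u_b
        if l == 1:
            # Opt[1, c] = u_f + 2 u_b
            return u_f + 2 * u_b
        if c == 1:
            # Opt[l, 1] = l (l + 1) / 2 * u_f + (l + 1) * u_b
            return l * (l + 1) // 2 * u_f + (l + 1) * u_b
        # Opt[l, c] = min over 1 <= i <= l - 1 of
        #   i * u_f + Opt[l - i, c - 1] + Opt[i - 1, c]
        return min(
            i * u_f + opt(l - i, c - 1) + opt(i - 1, c)
            for i in range(1, l)
        )

    return opt(length, checkpoints)


# Cost of a 10-step chain as the number of checkpoint slots grows.
costs = [revolve_cost(10, c) for c in (1, 2, 3)]
```

Each extra checkpoint slot can only lower (or keep) the cost, so the list `costs` is non-increasing. The table has on the order of $\ell \cdot c$ entries and each takes up to $\ell$ steps to fill, which is cheap for chains the size of a typical network.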

Comparison of Checkpoint sequential and Revolve

Batch Size: $1$, Image Size: $224 \times 224$

Batch Size: $8$, Image Size: $224 \times 224$

Batch Size: $1$, Image Size: $500 \times 500$

Batch Size: $8$, Image Size: $500 \times 500$

PRACTICAL IMPLEMENTATION AND CONCLUDING REMARKS

THANK YOU