TensorLayer 2.0
Hao Dong
Peking University
Background
• History of Deep Learning Tools
• History of TensorLayer
• Future of TensorLayer

How to Use
• Static vs. Dynamic Models
• Switching Train/Test Modes
• Reuse Weights
• Model Information
• Customize Layer without Weights
• Customize Layer with Weights

How to Use Better
• Dataflow
• Distributed Training
History of Deep Learning Tools
• Automatic Differentiation
Key reasons for TensorFlow:
• Largest user base
• Widest production adoption
• Well-maintained documentation
• Battle-tested quality
• TPU support
History of Deep Learning Tools
• Beyond Automatic Differentiation
High-level elements of deep learning: neural networks, layers, and tensors (http://www.asimovinstitute.org/neural-network-zoo/)
Low-level interface: dataflow graph, placeholders, sessions, queue runners, devices … (https://www.tensorflow.org/extend/architecture)
The abstraction gap between the two is bridged by wrappers: TensorLayer, Keras, and TFLearn.
History of Deep Learning Tools
[Timeline figure] Engines: Theano (2010) → TensorFlow (2015); Torch → PyTorch (Sep 2016); Caffe (2014) → Caffe2 (2017); CNTK (2016). Wrappers: Keras, TFLearn, TensorLayer (2014–2016).
History of Deep Learning Tools
[Timeline figure] User migration between frameworks: Theano (2010) transferred to TensorFlow (2015); Torch (2002) transferred to PyTorch (2017); Caffe (2014) transferred to Caffe2 (2017), which merged into PyTorch (2018); CNTK (2016); TensorFlow 2.0 (2019).
History of TensorLayer
[Timeline figure] Development from 2015/2016 to 2019.
History of TensorLayer
• TensorLayer 2.0
TensorLayer: A Versatile Library for Efficient Deep Learning Development. H. Dong, A. Supratak et al. ACM MM 2017.
History of TensorLayer
GitHub and documentation (English)
Future of TensorLayer
• Platform-agnostic
• Research-oriented
Hardware: GPU, TPU, Huawei Ascend (ShengTeng), …
Frameworks: TensorFlow, MindSpore, …
Future of TensorLayer
Contributors
Static vs. Dynamic Models
Static model (Lasagne fashion)
Static vs. Dynamic Models
Dynamic model (Chainer fashion)
Static vs. Dynamic Models
Static model (Lasagne fashion) vs. dynamic model (Chainer fashion)
Switching Train/Test Modes
Reuse Weights
Reuse weights in a static model
Reuse Weights
Reuse weights in a dynamic model
Reuse Weights
Reuse weights in a static model vs. a dynamic model
Model Information
Print model architecture:
[{'args': {'dtype': tf.float32, 'layer_type': 'normal', 'name': '_inputlayer_1', 'shape': [None, 784]},
  'class': '_InputLayer', 'prev_layer': None},
 {'args': {'keep': 0.8, 'layer_type': 'normal', 'name': 'dropout_1'},
  'class': 'Dropout', 'prev_layer': ['_inputlayer_1_node_0']},
 {'args': {'act': 'relu', 'layer_type': 'normal', 'n_units': 800, 'name': 'dense_1'},
  'class': 'Dense', 'prev_layer': ['dropout_1_node_0']},
 {'args': {'keep': 0.8, 'layer_type': 'normal', 'name': 'dropout_2'},
  'class': 'Dropout', 'prev_layer': ['dense_1_node_0']},
 {'args': {'act': 'relu', 'layer_type': 'normal', 'n_units': 800, 'name': 'dense_2'},
  'class': 'Dense', 'prev_layer': ['dropout_2_node_0']},
 {'args': {'keep': 0.8, 'layer_type': 'normal', 'name': 'dropout_3'},
  'class': 'Dropout', 'prev_layer': ['dense_2_node_0']},
 {'args': {'act': 'relu', 'layer_type': 'normal', 'n_units': 10, 'name': 'dense_3'},
  'class': 'Dense', 'prev_layer': ['dropout_3_node_0']}]
Model Information
• Print model information
• Get specific weights: .trainable_weights / .nontrainable_weights
• Save weights only
• Save weights + architecture
Customize Layer without Weights
Customize Layer with Weights
Dataflow
• Training without dataflow: in each iteration the CPU first prepares a batch of training data while the GPU sits idle, then the GPU trains on that batch. GPU utilization stays low (50–70% in the figure).
Dataflow
• Training with dataflow: data processing (CPU) runs in parallel with model training (GPU), so the next batch is ready the moment the current iteration finishes. GPU utilization rises (75–100% in the figure).
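This overlap is what `tf.data` provides: `map(..., num_parallel_calls=...)` parallelizes CPU preprocessing, and `prefetch` keeps the next batch ready while the GPU trains on the current one. A self-contained sketch with fake data standing in for MNIST:

```python
import numpy as np
import tensorflow as tf

# Fake dataset standing in for MNIST
X = np.random.rand(256, 784).astype(np.float32)
y = np.random.randint(0, 10, size=256).astype(np.int64)

ds = tf.data.Dataset.from_tensor_slices((X, y))
ds = ds.shuffle(buffer_size=256)
# map() runs preprocessing on CPU threads, in parallel with training
ds = ds.map(lambda a, b: (a * 2.0 - 1.0, b),
            num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds = ds.batch(32)
# prefetch() prepares the next batch while the GPU trains on the current one
ds = ds.prefetch(tf.data.experimental.AUTOTUNE)

n_batches = sum(1 for _ in ds)   # 256 / 32 = 8
```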
Distributed Training
• Model parallelism: the model is split into parts, each placed on a different worker (model part 1 on worker 1, part 2 on worker 2, part 3 on worker 3).
• Data parallelism: every worker (workers 1–4) holds the entire model and trains on a different shard of the data.
Distributed Training
• Data parallelism with a parameter server: workers 1–3 push gradients to a central parameter server, which averages them and returns updated parameters. With multiple parameter servers, each one averages a different part of the gradients.
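One parameter-server round can be illustrated with a toy NumPy sketch (class and method names are made up; real systems handle the networking and sharding):

```python
import numpy as np

class ParameterServer:
    """Toy parameter server: workers push gradients, the server averages
    them and applies one SGD step, workers pull the new parameters."""

    def __init__(self, params, lr=0.1):
        self.params = params.astype(float)
        self.lr = lr

    def push_and_average(self, worker_grads):
        avg_grad = np.mean(worker_grads, axis=0)   # average pushed gradients
        self.params -= self.lr * avg_grad          # apply the update
        return self.params                         # workers pull

ps = ParameterServer(np.zeros(4))
grads = [np.ones(4) * w for w in (1.0, 2.0, 3.0)]  # from workers 1-3
new_params = ps.push_and_average(grads)            # average gradient is 2.0
```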
Distributed Training
• Data parallelism with Horovod (ring-allreduce): workers 1–3 exchange gradient chunks with their ring neighbours over successive steps (steps 1–3 in the figure).
Distributed Training
• Data parallelism with Horovod (ring-allreduce): each worker computes gradients on its own data shard; ring-allreduce then leaves every worker (workers 1–3) holding the same averaged gradients.
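The averaging itself can be simulated in NumPy: split each worker's gradient into one chunk per worker, reduce-scatter so worker i ends up with the sum of chunk i, then all-gather the reduced chunks. A toy sketch (the real algorithm performs these phases in 2(n−1) ring steps over the network):

```python
import numpy as np

def ring_allreduce_sum(worker_grads):
    """Simulate ring-allreduce: return the element-wise sum of all
    workers' gradients, computed chunk by chunk."""
    n = len(worker_grads)
    chunks = [np.array_split(g.astype(float), n) for g in worker_grads]
    # reduce-scatter phase: worker i accumulates the full sum of chunk i
    reduced = [sum(chunks[w][i] for w in range(n)) for i in range(n)]
    # all-gather phase: every worker receives every reduced chunk
    return np.concatenate(reduced)

grads = [np.full(6, i + 1.0) for i in range(3)]   # workers 1-3
avg = ring_allreduce_sum(grads) / len(grads)      # averaged gradients
```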
Please install TensorFlow and TensorLayer, and make sure this example runs:
https://github.com/tensorlayer/tensorlayer/blob/master/examples/basic_tutorials/tutorial_mnist_mlp_static.py

Thanks