Parallelizations within Neural Networks, by James Mnatzaganian and Jim Mazza

Source: meseec.ce.rit.edu/756-projects/spring2014/1-2.pdf


Page 1:

Parallelizations within Neural Networks

James Mnatzaganian
Jim Mazza

Page 2:

Outline

• Artificial Neural Networks (ANNs)

• Types of Parallelism

• Task Partitioning and Grain Size

• Cluster vs. GPU Implementations

• Network Considerations

• Final Remarks

5/13/2014 J. Mnatzaganian & J. Mazza 2

Page 3:

Artificial Neural Networks (ANNs)

Page 4:

Outline

• ANN Overview

• Multilayer Perceptron (MLP)

• Restricted Boltzmann Machine (RBM)

• Deep Belief Network (DBN)

Page 5:

ANN – Overview

• “Biologically-inspired”

  – Neurons (sum of products)

  – Synapses (weight + activation function)

• Non-linear, universal function approximators

[Figure: a pre-neuron's output (IN) passes through a synapse carrying a weight to a post-neuron (OUT).]
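The sum-of-products view of a neuron can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the sigmoid activation and the example weights are assumptions:

```python
import math

def neuron_output(inputs, weights, bias):
    """A neuron as a sum of products: weighted inputs plus a bias,
    passed through an activation function (here, a sigmoid)."""
    total = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-total))

# Two inputs (IN) feeding one post-neuron (OUT) through weighted synapses.
out = neuron_output([1.0, 0.5], [0.4, -0.2], bias=0.1)
```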

Page 6:

Multilayer Perceptron (1)

Notation:

• I_i ≡ input i

• O_i ≡ output i

• S_i ≡ synapse i

• N_{l,i} ≡ neuron i at layer l

• b_l ≡ bias at layer l

• n ≡ (number of inputs, neurons, or outputs at that layer) − 1

• L ≡ (number of hidden layers) − 1

[Figure: the input layer. Inputs I_0 … I_n and bias b_0 connect through synapses to every first-layer neuron N_{0,0} … N_{0,n}; bias b_1 feeds the next layer.]

Page 7:

Multilayer Perceptron (2)

[Figure: hidden layer l. Neurons N_{l,0} … N_{l,n} and bias b_l connect through synapses to every neuron of the following layer; bias b_{l+1} feeds the layer after it.]

Page 8:

Multilayer Perceptron (3)

[Figure: the output layer. Neurons N_{L,0} … N_{L,n} and bias b_L connect through synapses to the network's outputs.]

Page 9:

Multilayer Perceptron (4)

MLP ≡ Input Layer + Hidden Layer_0 … Hidden Layer_L + Output Layer
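The composition of input layer, hidden layers, and output layer amounts to repeatedly applying one layer's sum-of-products step. A minimal sketch in Python; the 2-3-1 shape and all weights are hypothetical:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each neuron is a sum of products plus bias."""
    return [sigmoid(b + sum(x * w for x, w in zip(inputs, row)))
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """layers is a list of (weight_matrix, bias_vector) pairs:
    hidden layers first, output layer last."""
    for W, b in layers:
        x = layer_forward(x, W, b)
    return x

# Tiny 2-3-1 network with made-up weights.
hidden = ([[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]], [0.0, 0.1, -0.1])
output = ([[0.2, -0.4, 0.6]], [0.05])
y = mlp_forward([1.0, -1.0], [hidden, output])
```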

Page 10:

Restricted Boltzmann Machine

Notation:

• h_i ≡ hidden neuron i

• v_i ≡ visible neuron i

• b ≡ bias

[Figure: RBM ≡ a visible layer v_0 … v_n and a hidden layer h_0 … h_n, fully connected to each other through synapses, with no connections within a layer.]

Page 11:

Deep Belief Network

• DBN training:

  – Unsupervised pre-training with stacked RBMs

  – Supervised fine-tuning

• The weights the RBMs learned initialize the weights of a deep MLP

• RBM_0 trains on the input; RBM_1 through RBM_n train on either the mean activations or the samples of the layer below:

Mean Activations_i = p(h_{i+1} = 1 | h_i)

Samples_i ∼ p(h_{i+1} | h_i)
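The layer-by-layer rule above (each RBM trains on the mean activations of the one below) can be sketched as an upward pass. This assumes sigmoid units and already-trained weights; the example weights are hypothetical:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_activations(v, W, b):
    """p(h_j = 1 | v) for each hidden unit j of one RBM with sigmoid units."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def dbn_up_pass(v, rbms):
    """Each RBM's mean activations become the training input for the
    next RBM in the stack (RBM_1 ... RBM_n)."""
    activations = [v]
    for W, b in rbms:
        v = mean_activations(v, W, b)
        activations.append(v)
    return activations

# One visible layer of two units feeding a single hidden unit.
acts = dbn_up_pass([1.0, 0.0], [([[0.5], [-0.5]], [0.0])])
```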

Page 12:

Types of Parallelism

Page 13:

Outline

• Data Parallelism

• Functional Parallelism

Page 14:

Data Parallelism

• The first type of parallelism lies in the synapse-update step

• Data parallelism typically scales with problem size

• Each synapse is independent of the others, so all can be updated in parallel

• Easy to load balance

• The computation is the same for every neuron

[Image source: http://en.wikipedia.org/wiki/Artificial_neural_network]
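Because each synapse update is independent, the update step can simply be mapped over a worker pool. A sketch using Python threads as a stand-in for real parallel hardware; the plain gradient-step update rule and the learning rate are assumptions, not from the slides:

```python
from concurrent.futures import ThreadPoolExecutor

LEARNING_RATE = 0.01  # assumed hyperparameter

def update_synapse(pair):
    """One synapse update depends only on its own weight and gradient,
    so every synapse can be updated in parallel."""
    weight, gradient = pair
    return weight - LEARNING_RATE * gradient

def parallel_synapse_update(weights, gradients, workers=4):
    # Map the independent updates across a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update_synapse, zip(weights, gradients)))

new_w = parallel_synapse_update([1.0, 2.0, 3.0], [10.0, -10.0, 0.0])
```

Note that CPython threads will not speed up pure-Python arithmetic because of the GIL; processes or vectorized kernels would, but the structure of the mapping is the same.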

Page 15:

Functional Parallelism

• Functional parallelism appears in the multiply-accumulate stages of the network

• Refers to sets of tasks that can be done in parallel

• Not perfectly load balanced: in networks with cyclic connections, the number of synapses summed in each stage may not be constant

• Could be used to pipeline synapse updates or the neurons' multiply-accumulate functions

[Image source: http://www.alanturing.net/]

Page 16:

Task Partitioning and Grain Size

Page 17:

Outline

• Task Partitioning

• Task Grain Size

Page 18:

Task Partitioning

• The size of the network is known at the time of execution

• The network can therefore be statically partitioned into sections

• Execution time can be reasonably estimated ahead of time

• Static assignment determines the number of synapses and nodes per task

• Dynamic partitioning is unnecessary and would incur additional overhead without justification
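Because the network size is known up front, the static partition can be computed once before execution. A minimal sketch; the even-split policy with remainder spreading is an assumption:

```python
def partition_nodes(num_nodes, num_tasks):
    """Statically split nodes across tasks; the network size is known at
    execution time, so each task's share is fixed in advance."""
    base, extra = divmod(num_nodes, num_tasks)
    parts, start = [], 0
    for t in range(num_tasks):
        size = base + (1 if t < extra else 0)  # spread the remainder
        parts.append(list(range(start, start + size)))
        start += size
    return parts

# 10 neurons over 3 tasks: each task gets a contiguous, fixed slice.
parts = partition_nodes(10, 3)
```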

Page 19:

Task Grain Size

• Many nodes could be grouped together

• If the task size is too small, the communication-to-computation (c-to-c) ratio increases

• If the task size is too large, the load balance could become more uneven

• Layers could be pipelined to increase performance, exploiting the inherent functional parallelism

• Grain size determines the size of each pipeline stage and the number of nodes included in each stage

[Image source: http://www.neurosolutions.com/]

Page 20:

Cluster vs. GPU Implementations

Page 21:

Outline

• Cluster Implementation

• GPU Utilization

• Cluster vs. GPU

• Cluster Speedup

• GPU Speedup

Page 22:

Cluster Implementation

• Neural networks can be implemented on clusters using the OpenMP and MPI programming models

• A master-slave model is used

• The master takes care of seed generation, updating the weights, and determining execution

• The slaves perform the highly parallel computational steps

[Figure: workflow: Parameter Initialization → Weight Generation → Send Data (MPI) → Synapse Update → Neuron Update]

Page 23:

GPU Utilization

• Neural networks have also been implemented using CUDA for NVIDIA GPUs

• The neuron-update step is essentially a vector dot product with a column of the weight matrix

• Each step is extremely data parallel

• Vector operations are well suited to GPU architectures

Casas [3]
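The neuron-update step described above is just a matrix-vector product. NumPy stands in here for the GPU kernel; this is a sketch with hypothetical values, and the sigmoid activation is an assumption:

```python
import numpy as np

def neuron_update(activations, W, bias):
    """Each output neuron's net input is the dot product of the incoming
    activation vector with one column of the weight matrix; the whole
    layer is one matrix-vector product, exactly the kind of operation
    a GPU executes efficiently."""
    net = activations @ W + bias       # shape: (n_out,)
    return 1.0 / (1.0 + np.exp(-net))  # sigmoid activation

x = np.array([1.0, 0.0])
W = np.array([[0.0, 2.0],
              [3.0, 4.0]])
out = neuron_update(x, W, np.zeros(2))
```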

Page 24:

Cluster vs. GPU

• Cluster advantages

  – The master-slave architecture works well for the learning portion of the algorithm

  – Slaves can compute synapse values and neuron functions in parallel

• Cluster disadvantages

  – May not scale well, depending on the network

  – Extreme contention at the master node if the grain size is too small

• GPU advantages

  – Vector operations are extremely well suited to GPUs

  – Scales well with increasing problem size

• GPU disadvantages

  – Some portions of the algorithm are very inefficient on GPU architectures

  – Power consumption of a GPU is very high compared to a CPU approach

Page 25:

Cluster Speedup

• Cellular neural network speedup: 16 nodes with Pentium 4 @ 3.06 GHz (Weishaupl [5])

• ANN used in time-forecasting applications: Intel Xeon E5504 @ 2.0 GHz (Gonzalez [4])

Page 26:

GPU Speedup

[Figure: GPU speedup results for convolutional neural networks, Strigl [6]]

Page 27:

Network Considerations

Page 28:

Outline

• Mapping Neural Networks

• Mapping Scheme Comparison

• Mapping Conclusions

Page 29:

Mapping Neural Networks

• Neural networks can be mapped to any number of network topologies, but hypercubes and meshes are typically used

• The forward and learning stages include large amounts of communication between nodes

• In cluster implementations, large amounts of contention exist at the master node

Page 30:

Mapping Scheme Comparisons

Mapping Scheme | Architecture   | Communications | Multiplications | Additions
Kung           | Systolic Ring  | O(N)           | O(N)            | O(N)
Shams          | SIMD 2D Mesh   | O(N)           | O(N)            | O(N)
Lin            | SIMD 2D Mesh   | O(N)           | O(1)            | O(N)
Shams          | SIMD 2D Mesh   | O(N)           | O(N)            | O(N)
Kim            | SIMD MPAA      | O(1)           | O(N)            | O(N)
Malluhi        | SIMD Hypercube | O(log N)       | O(1)            | O(log N)
Ayoubi         | SIMD 2D Mesh   | O(log N)       | O(1)            | O(log N)

Ayoubi [7]

Page 31:

Final Remarks

Page 32:

Outline

• Parameter Optimizations

• Applications

• Conclusion

• References

Page 33:

Parameter Optimizations

• A DBN consists of 10 + L parameters

• The model for the ideal parameters is non-linear

• Perform the “search” in parallel:

  – The master utilizes a genetic algorithm (or similar)

  – A dynamic queue distributes the work

  – Slaves compute a specific parameter set, either as a sequential operation OR with parallelization within each slave’s parameter set
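The master/slave search with a dynamic queue can be sketched with Python threads. The candidate list and the objective function are hypothetical stand-ins: a real master would generate candidates with a genetic algorithm, and each slave would train a DBN with its parameter set:

```python
import queue
import threading

def evaluate(params):
    """Stand-in for training a DBN with one parameter set and returning
    its validation error (hypothetical objective)."""
    return sum((p - 0.5) ** 2 for p in params)

def parameter_search(candidates, workers=4):
    """Master fills a dynamic queue of parameter sets; each slave pulls
    a set, evaluates it, and reports the score back."""
    tasks, results = queue.Queue(), []
    for c in candidates:
        tasks.put(c)
    lock = threading.Lock()

    def slave():
        while True:
            try:
                params = tasks.get_nowait()
            except queue.Empty:
                return
            score = evaluate(params)
            with lock:
                results.append((score, params))

    threads = [threading.Thread(target=slave) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return min(results)  # best (lowest-error) parameter set

best = parameter_search([[0.1, 0.9], [0.5, 0.5], [0.4, 0.6]])
```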


Page 34:

Applications

• Computer Vision

– Optical character recognition

– Image recognition

• Natural Language Processing

– Document recognition

– Sentiment analysis

• Automatic Speech Recognition

– Call routing

– Speech to text


Page 35:

Conclusion

• ANNs provide a powerful modeling framework

• ANNs are highly parallelizable, with both data and functional parallelism

• Both clusters and GPUs can be targeted for implementation

• ANNs are best suited to solving non-traditional problems

Page 36:

References

[1] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, “Theano: A CPU and GPU Math Expression Compiler,” Proceedings of the Python for Scientific Computing Conference (SciPy), Austin, TX, June 30 – July 3, 2010.

[2] G.E. Hinton and R.R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,” Science, vol. 313, no. 5786, pp. 504–507, 28 July 2006.

[3] C.A. Casas, “Parallelization of artificial neural network training algorithms: A financial forecasting application,” 2012 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), pp. 1–6, 29–30 March 2012.

[4] B.P. Gonzalez, J.P. Donate, P. Cortez, G.G. Sanchez, and A.S. de Miguel, “Parallelization of an evolving Artificial Neural Networks system to Forecast Time Series using OPENMP and MPI,” 2012 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), pp. 186–191, 17–18 May 2012.

[5] T. Weishaupl and E. Schikuta, “Parallelization of cellular neural networks for image processing on cluster architectures,” Proceedings of the 2003 International Conference on Parallel Processing Workshops, pp. 191–196, 6–9 Oct. 2003.

[6] D. Strigl, K. Kofler, and S. Podlipnig, “Performance and Scalability of GPU-Based Convolutional Neural Networks,” 2010 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 317–324, 17–19 Feb. 2010.

[7] R.A. Ayoubi and M.A. Bayoumi, “Efficient mapping algorithm of multilayer neural network on torus architecture,” IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 9, pp. 932–943, Sept. 2003.

Page 37:

Questions?

Page 38:

Backup

Page 39:

Virtual Layers in Mapping

• Example of mapping a 5×6 weight matrix onto a 4×4 processor array

• Needs 4 virtual layers (A, B, C, D)

Ayoubi [7]