
Efficient BP Algorithms for General Feedforward Neural Networks


Page 1: Efficient BP Algorithms for General Feedforward Neural Networks

Efficient BP Algorithms for General Feedforward Neural Networks¹

S. España Boquera, M.J. Castro Bleda, F. Zamora Martínez, J. Gorbe Moya

Dep. Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain

18-21 Jun 2007, Murcia, Spain

¹ Work partially supported under contracts TIN2006-12767 and GVA06/302.

Page 2: Efficient BP Algorithms for General Feedforward Neural Networks

Index

1 Introduction and motivation

2 Preprocessing the ANN: The Consecutive Retrieval Problem

3 Tests of efficiency
   Efficiency with general feedforward topologies

4 Additional Features of the BP implementation
   Data description and manipulation facilities
   Example of use of the application

5 Conclusions and future work


Page 3: Efficient BP Algorithms for General Feedforward Neural Networks

Introduction and motivation

The backpropagation (BP) algorithm is one of the most widely used supervised learning techniques for training feedforward Artificial Neural Networks (ANNs).
There are many variations of the BP algorithm.
What is usually required:

Specifying general topologies.
Good data description facilities: it is not always practical, or even possible, to provide a plain set of input-output pairs.
A fast implementation: since training requires a lot of CPU time, a relative improvement (speed-up) makes a real difference.



Page 5: Efficient BP Algorithms for General Feedforward Neural Networks

Introduction and motivation

Specifying general topologies. Example: n-gram language model estimation.



Page 7: Efficient BP Algorithms for General Feedforward Neural Networks

Introduction and motivation

Good data description facilities. Example: a neural convolution kernel filter.



Page 9: Efficient BP Algorithms for General Feedforward Neural Networks

Introduction and motivation

Fast implementation

BP algorithm implementations face a tradeoff between efficiency and flexibility:

Specialized topologies (e.g., layered): connection weights are stored in matrices, giving better data locality and simplified data access.

General feedforward topologies: a list of neurons in topological order, giving increased flexibility at the expense of data locality.
(Both approaches are sketched below.)

→ Our main aim was to obtain a BP implementation that supports arbitrary feedforward topologies while being as fast as the specialized BP algorithms.
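To make the tradeoff concrete, the following Lua sketch contrasts the two approaches (an illustration only; all names are hypothetical, and April's actual implementation is in C++):

local function sigmoid(x) return 1 / (1 + math.exp(-x)) end

-- (a) Layered topology: weights held in one matrix per layer, so the
--     forward pass is a series of matrix-vector products with
--     contiguous, cache-friendly access.
local function forward_layered(W, input)
  local act = input
  for l = 1, #W do
    local nxt = {}
    for j = 1, #W[l] do
      local s = 0
      for i = 1, #act do s = s + W[l][j][i] * act[i] end
      nxt[j] = sigmoid(s)
    end
    act = nxt
  end
  return act
end

-- (b) General feedforward topology: neurons visited in topological
--     order, each reading an explicit predecessor list; flexible, but
--     memory accesses are scattered.
local function forward_general(neurons, act)
  for _, n in ipairs(neurons) do  -- neurons sorted in topological order
    local s = 0
    for i, p in ipairs(n.pred) do s = s + n.w[i] * act[p] end
    act[n.id] = sigmoid(s)
  end
  return act
end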


Page 10: Efficient BP Algorithms for General Feedforward Neural Networks

Introduction and motivation

In this work, we present an efficient implementation of the BP algorithm to train general feedforward ANNs with the following features:

Incremental mode → faster training than batch mode.
Momentum: adding a momentum term allows a network to respond not only to the local gradient but also to recent trends in the error surface.
Weight decay: an effect similar to a pruning algorithm (see the update-rule sketch after this list).
Softmax, sigmoid/logistic, tanh, linear, ... activation functions.
Tied and constant weights.
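For reference, the textbook incremental update rule with momentum and weight decay reads dw(t) = -eta * dE/dw + mu * dw(t-1) - lambda * w; a minimal Lua sketch (not April's internal C++ code):

-- w: current weights; grad: error gradient dE/dw for one pattern;
-- dw_prev: previous updates; eta, mu, lambda: learning rate,
-- momentum factor and weight decay factor.
local function update_weights(w, grad, dw_prev, eta, mu, lambda)
  for i = 1, #w do
    local dw = -eta * grad[i] + mu * dw_prev[i] - lambda * w[i]
    w[i] = w[i] + dw      -- apply the update
    dw_prev[i] = dw       -- remember it for the momentum term
  end
end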

This BP implementation is written in C++ and is part of a toolkit named April (A Pattern Recognizer in Lua), which also provides many data description facilities and other functionality (HMMs, DTW, clustering, etc.).



Page 12: Efficient BP Algorithms for General Feedforward Neural Networks

Preprocessing the ANN

The bottleneck when simulating a big ANN is the dot product of the input vector x and the weight vector w of each neuron during the forward pass, and the traversal of connections during backpropagation.
This motivates a preprocessing of the network topology.

Given a neuron, the activation values of its predecessors form a consecutive subvector which can be traversed efficiently.
Therefore, the product x · w of each neuron, as well as the backpropagation of the error, are improved (cheaper iteration and better locality).
This property cannot be guaranteed for general topologies → some neuron values may need to be duplicated.
The feedforward and backpropagation operations are "compiled" into a sequence of actions to be performed (sketched below).

→ Consecutive retrieval problem.
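A minimal sketch of what such a "compiled" forward pass could look like (field names are hypothetical, and the real implementation is in C++): after preprocessing, each neuron only needs an offset and a length into one contiguous activation vector.

local function forward_compiled(actions, act)
  for _, a in ipairs(actions) do             -- one precomputed action per neuron
    local s = a.bias
    for k = 0, a.len - 1 do                  -- dot product over a consecutive
      s = s + a.w[k + 1] * act[a.first + k]  -- subvector: tight loop, good locality
    end
    act[a.out] = 1 / (1 + math.exp(-s))      -- logistic activation
  end
end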


Page 13: Efficient BP Algorithms for General Feedforward Neural Networks

Preprocessing the ANN

Consecutive retrieval problem
Let X be a set with P subsets C1, C2, ..., CP. The goal is to obtain a sequence A = a1, ..., ak of elements of X so that every Ci appears in A as a contiguous subsequence, while keeping the length of A as small as possible.
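A small worked example (not from the paper): with X = {1, 2, 3} and subsets C1 = {1, 2}, C2 = {2, 3}, C3 = {1, 3}, no length-3 sequence holds all three subsets contiguously, but A = 2, 1, 3, 2 (length 4, with element 2 duplicated) does; this is exactly why some neuron values may need to be duplicated. The hypothetical Lua helper below verifies such a covering:

-- Does some window of #C consecutive positions of A contain every
-- element of the set C?
local function covers(A, C)
  for first = 1, #A - #C + 1 do
    local window = {}
    for i = first, first + #C - 1 do window[A[i]] = true end
    local ok = true
    for _, x in ipairs(C) do if not window[x] then ok = false end end
    if ok then return true end
  end
  return false
end

local A = {2, 1, 3, 2}
for _, C in ipairs({{1, 2}, {2, 3}, {1, 3}}) do
  print(covers(A, C))  -- prints true three times
end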



Page 17: Efficient BP Algorithms for General Feedforward Neural Networks

Preprocessing the ANN


The problem is proven to be NP-complete.
A greedy algorithm has been developed which achieves packing rates far superior to a naive consecutive arrangement.
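The slides do not detail the algorithm; one plausible greedy strategy along these lines (an illustration only, not necessarily the authors' heuristic) processes the subsets in order, reuses the longest suffix of the sequence built so far whose elements all belong to the next subset, and appends only the missing elements:

local function greedy_crp(subsets)
  local A = {}
  for _, C in ipairs(subsets) do
    local member = {}
    for _, x in ipairs(C) do member[x] = true end
    -- find the longest suffix of A made of distinct elements of C;
    -- those positions are reused instead of duplicated
    local seen = {}
    for i = #A, 1, -1 do
      if member[A[i]] and not seen[A[i]] then seen[A[i]] = true
      else break end
    end
    for _, x in ipairs(C) do       -- append only what is still missing
      if not seen[x] then A[#A + 1] = x end
    end
  end
  return A
end

-- greedy_crp({{1,2}, {2,3}, {1,3}}) returns {1, 2, 3, 1}: length 4,
-- versus length 6 for naively concatenating the three subsets.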



Page 19: Efficient BP Algorithms for General Feedforward Neural Networks

Tests of efficiency

Comparison with the Stuttgart Neural Network Simulator (SNNS).
Task: handwritten digit classification (16 × 16 B/W pixels).
The corpus consists of 1 000 digits and is stored in a single PNG image.
256 input neurons (the size of an image), 10 output neurons.



Page 20: Efficient BP Algorithms for General Feedforward Neural Networks

Experiment set-up

Layered feedforward networks with one and two hidden layers containing between 10 and 200 neurons.

Sigmoid activation functions.
10 training epochs.

AMD Athlon (1 333 MHz) with 384 MB of RAM under Linux.


Page 21: Efficient BP Algorithms for General Feedforward Neural Networks

Analysis of the efficiency results

Figure: Temporal cost per epoch (sec./epoch) of April and SNNS, plotted against the number of weights W (0 to 100 000).


Page 22: Efficient BP Algorithms for General Feedforward Neural Networks

Analysis of the efficiency results

Figure: Ratio between the temporal costs of SNNS and April (sec./epoch of SNNS divided by sec./epoch of April), plotted against the number of weights W (0 to 100 000).


Page 23: Efficient BP Algorithms for General Feedforward Neural Networks

Analysis of the efficiency results

The analysis tool valgrind has been used to analyse the number of cache misses when training networks with different numbers of weights W.

                       April                                 SNNS
W          # Accesses   L1 misses   L1&L2 misses   # Accesses   L1 misses   L1&L2 misses
 2 790     3.10×10^8      0.14%        0.10%       4.45×10^8      4.46%        0.05%
25 120     1.70×10^9      1.71%        0.03%       2.43×10^9      7.39%        4.81%
62 710     4.25×10^9      1.85%        1.11%       6.00×10^9      7.18%        5.63%

"L1 misses" is the percentage of data accesses which result in L1 cache misses.
"L1&L2 misses" is the percentage of data accesses which miss both the L1 (fastest) and L2 (slightly slower) caches, resulting in a slow access to main memory.
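For reference, such measurements can be obtained with valgrind's cache simulator; a typical invocation (the program name is a placeholder) is:

valgrind --tool=cachegrind ./train_mlp
cg_annotate cachegrind.out.<pid>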


Page 24: Efficient BP Algorithms for General Feedforward Neural Networks

Efficiency with general feedforward topologies

Figure: Feedforward network with shortcuts between the layers: a neuron is connected to neurons of all the previous layers.


Page 25: Efficient BP Algorithms for General Feedforward Neural Networks

Efficiency with general feedforward topologies

Figure: Feedforward network with segmented input: the whole 16 × 16 image is divided into four 8 × 8 fragments, forming four groups of hidden neurons.


Page 26: Efficient BP Algorithms for General Feedforward Neural Networks

Efficiency with general feedforward topologies

Other topologies tested in this work:
Each layer connected to all previous layers.
Segmented 16 × 16 input in four 8 × 8 pixel fragments.

Training these topologies with April and SNNS gives time results analogous to the previous ones.
April is able to train general feedforward topologies as efficiently as specialized feedforward topologies.



Page 28: Efficient BP Algorithms for General Feedforward Neural Networks

Additional Features of the BP implementation

Softmax activation function:
its numerical instability problems must be dealt with (a numerically stable formulation is sketched after this list);
output neurons can be grouped so that the softmax calculations are performed independently in each group.

Tied and constant weights.
Weight decay.
Fixed-point version of the feedforward pass for embedded systems.
Reproducibility of experiments.

Ability to stop and resume experiments: useful for process migration and grid computing.
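The usual remedy for the softmax instability mentioned above is to subtract the maximum activation before exponentiating; a minimal Lua sketch (a hypothetical helper, not April's actual code):

local function softmax(z)
  local m = z[1]                        -- find the maximum activation
  for i = 2, #z do if z[i] > m then m = z[i] end end
  local y, s = {}, 0
  for i = 1, #z do
    y[i] = math.exp(z[i] - m)           -- exp(z - max) cannot overflow
    s = s + y[i]
  end
  for i = 1, #z do y[i] = y[i] / s end  -- normalize to sum 1
  return y
end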


Page 29: Efficient BP Algorithms for General Feedforward Neural Networks

Data description and manipulation facilities

Lua is an extensible, procedural, embeddable programming language especially designed for extending and customizing applications, with powerful data description facilities.
Besides the Lua description facilities, April adds the matrix and dataset classes, which allow possibly huge sets of samples to be defined and manipulated in a way easier and more flexible than simply enumerating the pairs of inputs and outputs.

Iterators over n-dimensional matrices.
Combination of datasets: slicing, union, indexing, etc.


Page 30: Efficient BP Algorithms for General Feedforward Neural Networks

Example

First, the image is loaded into a matrix; then, a dataset containing the 10 × 100 samples of 16 × 16 pixel values is generated from it.

samples = matrix.loadImage("digits.png")
input_data = dataset.matrix(samples, {
  patternSize = {16,16},  -- sample size
  offset      = {0,0},    -- initial window position
  numSteps    = {100,10}, -- #steps in each direction
  stepSize    = {16,16},  -- step size
  orderStep   = {1,0}     -- step direction
})


Page 31: Efficient BP Algorithms for General Feedforward Neural Networks

Example

Next, the matrix [1 0 0 0 0 0 0 0 0 0] is traversed circularly in order to obtain the dataset of associated desired outputs.

m2 = matrix(10, {1,0,0,0,0,0,0,0,0,0})
output_data = dataset.matrix(m2, {
  patternSize = {10},
  offset      = {0},
  numSteps    = {input_data:numPatterns()},
  circular    = {true}, -- circular dataset
  stepSize    = {-1}
})


Page 32: Efficient BP Algorithms for General Feedforward Neural Networks

Example

The corresponding training, validation and test input and output datasets are obtained by slicing the former datasets.

train_input       = dataset.slice(input_data,    1,  600)
train_output      = dataset.slice(output_data,   1,  600)
validation_input  = dataset.slice(input_data,  601,  800)
validation_output = dataset.slice(output_data, 601,  800)
test_input        = dataset.slice(input_data,  801, 1000)
test_output       = dataset.slice(output_data, 801, 1000)


Page 33: Efficient BP Algorithms for General Feedforward Neural Networks

Example

Although more complex ANNs can be described, for layered ANNs it is possible to give a simple description like "256 inputs 30 logistic 10 linear".

rnd = random(1234) -- pseudo-random generator
the_net = mlp.generate("256 inputs " ..
                       "30 logistic " ..
                       "10 linear",
                       rnd, -0.7, 0.7)


Page 34: Efficient BP Algorithms for General Feedforward Neural Networks

Example

for i = 1, 100 do
  mse_train = the_net:train {
    learning_rate  = 0.2,
    momentum       = 0.2,
    input_dataset  = train_input,
    output_dataset = train_output,
    shuffle        = rnd
  }
  mse_val = the_net:validate {
    input_dataset  = validation_input,
    output_dataset = validation_output
  }
  printf("Cycle %3d MSE %f %f\n", i, mse_train, mse_val)
end
mse_test = the_net:validate {
  input_dataset  = test_input,
  output_dataset = test_output
}
printf("MSE of the test set: %f\n", mse_test)



Page 36: Efficient BP Algorithms for General Feedforward Neural Networks

Conclusions

The April toolkit is up to 16 times faster than SNNS. Moreover, its capacity to train general feedforward networks does not decrease its efficiency, thanks to the use of data structures with great memory locality instead of linked lists (as in SNNS).
An approximation algorithm for the NP-complete consecutive retrieval problem has been designed in order to maintain this efficiency.


Page 37: Efficient BP Algorithms for General Feedforward Neural Networks

Future work

Nearly finished:
SSE implementation → Streaming SIMD Extensions for the Intel Pentium architecture.

Considered extensions:
GPU implementation → general-purpose GPU computing.
Recurrent networks: this type of network has proven very useful in diverse fields, such as Natural Language Processing.
Graphical interface: adding a graphical interface could orient the application towards didactic use.
