

Parallelization and optimization of the neuromorphic simulation code. Application on the MNIST problem

Raphaël Couturier, Michel Salomon

FEMTO-ST - DISC Department - AND Team

November 2 & 3, 2015 / Besançon
Dynamical Systems and Brain-inspired Information Processing Workshop

Introduction

Background
• Emergence of hardware RC implementations

• Analogue electronic; optoelectronic; fully optical
Larger et al. - Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Opt. Express 20, 3241-3249 (2012)

• Matlab simulation code
• Study processing conditions
• Tuning parameters
• Pre- and post-processing by computer

Motivation
• Study the concept of Reservoir Computing
• Design a faster simulation code
• Apply it to new problems

FEMTO-ST Institute 2 / 16

Outline

1. Neuromorphic processing

2. Parallelization and optimization

3. Performances on the MNIST problem

4. Conclusion and perspectives

FEMTO-ST Institute 3 / 16

Delay Dynamics as a Reservoir

Spatio-temporal viewpoint of a DDE (Larger et al. - Opt. Express 20:3 2012)

• δτ → temporal spacing; τD → time delay
• f(x) → nonlinear transformation; h(t) → impulse response

Computer simulation with an Ikeda type NLDDE

τ dx/dt (t) = −x(t) + β sin²[α x(t − τD) + ρ u_in(t − τD) + Φ0]

α → feedback scaling; β → gain; ρ → amplification; Φ0 → offset

FEMTO-ST Institute 4 / 16
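This transient can be simulated directly from the equation above. Below is a minimal sketch in C++, assuming an explicit Euler scheme that reads the delayed terms from the already-computed trajectory; the actual code uses a Runge-Kutta routine, and every parameter value here is an arbitrary illustrative choice.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative Euler integration of the Ikeda-type NLDDE
//   tau * dx/dt = -x(t) + beta * sin^2( alpha*x(t - tauD) + rho*u(t - tauD) + Phi0 )
std::vector<double> integrate(const std::vector<double>& u,
                              double tau, double tauD, double dt,
                              double alpha, double beta, double rho, double phi0) {
    const int delay = static_cast<int>(tauD / dt);   // delay expressed in time steps
    std::vector<double> x(u.size(), 0.0);            // zero initial history
    for (std::size_t n = 1; n < u.size(); ++n) {
        const int d = static_cast<int>(n) - 1 - delay;
        const double xd = (d >= 0) ? x[d] : 0.0;     // x(t - tauD)
        const double ud = (d >= 0) ? u[d] : 0.0;     // u_in(t - tauD)
        const double s  = std::sin(alpha * xd + rho * ud + phi0);
        x[n] = x[n - 1] + dt * (-x[n - 1] + beta * s * s) / tau;  // Euler step
    }
    return x;
}
```

Since β sin² is bounded in [0, β], a trajectory started from zero stays in [0, β], which gives a quick sanity check for the integration.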

Spoken Digits Recognition

Input (pre-processing)
• Lyon ear model transformation of each speech sample
→ 60 samples × 86 frequency channels

• Channels connected to the reservoir (400 neurons)
→ sparse and random connectivity

Reservoir transient response
Time series recorded for Read-Out processing

FEMTO-ST Institute 5 / 16

Spoken Digits Recognition

Output (post-processing)
• Training of the Read-Out
→ optimize the W_R matrix for the digits of the training set

• Regression problem for A × W_R ≈ B

W_R,opt = (AᵀA + λI)⁻¹ AᵀB

• A = concatenated reservoir transient responses for each digit
• B = concatenated target matrices
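This read-out training reduces to solving a small linear system. Below is a self-contained sketch that builds the normal equations and solves (AᵀA + λI) W = AᵀB with plain Gaussian elimination; the actual code delegates this step to a linear algebra library, and the matrix sizes in the test are toy examples.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Ridge-regression read-out: W = (A^T A + lambda I)^(-1) A^T B,
// solved by Gauss-Jordan elimination with partial pivoting.
Mat ridge(const Mat& A, const Mat& B, double lambda) {
    const std::size_t n = A.size(), p = A[0].size(), m = B[0].size();
    Mat M(p, std::vector<double>(p, 0.0));          // A^T A + lambda I  (p x p)
    Mat W(p, std::vector<double>(m, 0.0));          // starts as A^T B   (p x m)
    for (std::size_t i = 0; i < p; ++i)
        for (std::size_t j = 0; j < p; ++j) {
            for (std::size_t k = 0; k < n; ++k) M[i][j] += A[k][i] * A[k][j];
            if (i == j) M[i][j] += lambda;
        }
    for (std::size_t i = 0; i < p; ++i)
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t k = 0; k < n; ++k) W[i][j] += A[k][i] * B[k][j];
    for (std::size_t col = 0; col < p; ++col) {
        std::size_t piv = col;                       // partial pivoting
        for (std::size_t r = col + 1; r < p; ++r)
            if (std::fabs(M[r][col]) > std::fabs(M[piv][col])) piv = r;
        std::swap(M[col], M[piv]); std::swap(W[col], W[piv]);
        for (std::size_t r = 0; r < p; ++r) {        // eliminate above and below
            if (r == col) continue;
            const double f = M[r][col] / M[col][col];
            for (std::size_t c = col; c < p; ++c) M[r][c] -= f * M[col][c];
            for (std::size_t c = 0; c < m; ++c) W[r][c] -= f * W[col][c];
        }
    }
    for (std::size_t r = 0; r < p; ++r)              // normalize the diagonal
        for (std::size_t c = 0; c < m; ++c) W[r][c] /= M[r][r];
    return W;
}
```

For λ = 0 this reduces to ordinary least squares; a positive λ shrinks the weights, which is what stabilizes the regression on the reservoir responses.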

Testing
• Dataset of 500 speech samples → 5 female speakers
• 20-fold cross-validation → 20 × 25 test samples
• Performance evaluation → Word Error Rate
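The 20-fold protocol above amounts to partitioning the 500 sample indices into 20 disjoint test folds of 25; a minimal sketch (shuffling and the per-fold train/evaluate loop are omitted):

```cpp
#include <cassert>
#include <vector>

// Partition nSamples indices into nFolds disjoint test folds
// (assumes nSamples is a multiple of nFolds, as for 500 / 20).
std::vector<std::vector<int>> makeFolds(int nSamples, int nFolds) {
    const int foldSize = nSamples / nFolds;
    std::vector<std::vector<int>> folds(nFolds);
    for (int k = 0; k < nFolds; ++k)
        for (int i = 0; i < foldSize; ++i)
            folds[k].push_back(k * foldSize + i);  // fold k holds foldSize consecutive indices
    return folds;
}
```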

FEMTO-ST Institute 6 / 16

Matlab Simulation Code

Main steps
1. Pre-processing
• Input data formatting (1D vector; sampling period → δτ)
• W_I initialization (random; normalized)
2. Concatenation of 1D vectors → batch processing
3. Nonlinear transient computation
• Numerical integration using a Runge-Kutta C routine
• Computation of matrices A and B
4. Read-out training → Moore-Penrose matrix inversion
5. Testing of the solution (cross-validation)

Computation time
12 min for 306 “neurons” on a quad-core i7 1.8 GHz (2013)

FEMTO-ST Institute 7 / 16

Parallelization Scheme

Guidelines
• Reservoir responses are independent, whatever the data
→ computation of matrices A and B can be parallelized
• Different regression tests are also independent

In practice
• Simulation code rewritten in C++
• Eigen C++ library for linear algebra operations
• Inter-process communication → Message Passing Interface (MPI)
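The slides distribute this work with MPI across processes; as a minimal single-machine illustration of the same independence argument, the per-sample reservoir responses (the rows of A) can be filled concurrently, here with std::thread and a hypothetical stand-in response() for the full NLDDE transient computation:

```cpp
#include <cassert>
#include <cmath>
#include <thread>
#include <vector>

// Placeholder for the reservoir transient of one input sample.
double response(double input) {
    return std::sin(input) * std::sin(input);
}

// Each row of A depends only on its own input sample, so the rows can be
// computed by independent workers without any synchronization.
std::vector<double> computeA(const std::vector<double>& inputs, unsigned nThreads) {
    std::vector<double> A(inputs.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t)
        workers.emplace_back([&A, &inputs, nThreads, t] {
            // cyclic partition: thread t handles every nThreads-th sample
            for (std::size_t i = t; i < inputs.size(); i += nThreads)
                A[i] = response(inputs[i]);
        });
    for (auto& w : workers) w.join();
    return A;
}
```

Since each worker writes to disjoint elements of A, the parallel result matches the serial one exactly, which is why the WER is unchanged.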

Performance on the speech recognition problem
• Similar classification accuracy → same WER
• Reduced computation time
→ we can now study problems whose Matlab computation time was prohibitive

FEMTO-ST Institute 8 / 16

Finding Optimal Parameters

What parameters can be optimized?
• Currently
  • Pitch of the Read-Out
  • Amplitude parameters → δ; β; φ0
  • Regression parameter → λ
• Next
  • Number of nodes significantly improving the solution (threshold)
  • Input data filter (convolution filter for images)

Potentially any parameter can be optimized

Optimization heuristics
• Currently → simulated annealing (probabilistic global search controlled by a cooling schedule)
• Next → other metaheuristics such as evolutionary algorithms
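A minimal sketch of such a simulated annealing loop with a geometric cooling schedule; the toy quadratic objective stands in for the real error measure (the WER/DER of a full simulation run), and every constant here is an arbitrary illustrative choice:

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Simulated annealing: accept improving moves always, degrading moves with
// probability exp(-delta / T), while T decays geometrically.
template <typename F>
double anneal(F error, double x0, double t0, double cooling, int steps,
              unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::normal_distribution<double> move(0.0, 0.5);  // random neighbour step
    double x = x0, best = x0, temp = t0;
    for (int i = 0; i < steps; ++i) {
        const double cand = x + move(rng);
        const double delta = error(cand) - error(x);
        if (delta < 0.0 || unit(rng) < std::exp(-delta / temp)) x = cand;
        if (error(x) < error(best)) best = x;         // keep best state seen
        temp *= cooling;                              // cooling schedule
    }
    return best;
}
```

Accepting some degrading moves at high temperature is what lets the search escape local minima; as the temperature decays, the walk becomes essentially greedy.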

FEMTO-ST Institute 9 / 16

Application on the MNIST problem

Task of handwritten digit recognition

National Institute of Standards and Technology database
• Training dataset → American Census Bureau employees
• Test dataset → American high school students

The Mixed-NIST (MNIST) database is widely used in machine learning
→ mixing of both datasets and improved images

• Datasets
  • Training → 60K samples
  • Test → 10K samples
• Grayscale images
  • Normalized to fit into a 20 × 20 pixel bounding box
  • Centered and anti-aliased

FEMTO-ST Institute 10 / 16

Performances of the parallel code

Classification error for 10K images
• 1 reservoir of 2000 neurons → Digit Error Rate (DER): 7.14%
• 1000 reservoirs of 2 neurons → DER: 3.85%

Speedup

[Figure: speedup vs. number of cores, for 1000 reservoirs of 2 neurons and for 1 reservoir of 2000 neurons, compared with the ideal linear speedup]

FEMTO-ST Institute 11 / 16

Exploring ways to improve the results

Using the parallel NTC code
• Many small reservoirs and one read-out
• Feature extraction using a simple 3 × 3 convolution filter
• Best error without convolution: around 3%
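A 3 × 3 convolution pass of the kind mentioned above can be sketched as follows; the Laplacian kernel used in the test is only one possible filter, not necessarily the one used in the experiments:

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<double>>;

// Apply a 3x3 kernel to every interior pixel; border pixels are left at zero
// (equivalent to discarding the one-pixel frame of the output).
Image convolve3x3(const Image& img, const double k[3][3]) {
    const std::size_t h = img.size(), w = img[0].size();
    Image out(h, std::vector<double>(w, 0.0));
    for (std::size_t y = 1; y + 1 < h; ++y)
        for (std::size_t x = 1; x + 1 < w; ++x)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    out[y][x] += k[dy + 1][dx + 1] * img[y + dy][x + dx];
    return out;
}
```

The filtered image, rather than the raw pixels, is then fed to the reservoir, which is what "feature extraction" means here.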

Using the Oger toolbox
• Increasing the dataset with transformed images
→ 15 × 15 pixel bounding box and rotated images
• Subsampling of the reservoir response
• Committee of reservoirs
• Lower errors with the complete reservoir response
  • 1 reservoir of 1200 neurons → 1.42%
  • Committee of 31 reservoirs of 1000 neurons → 1.25%

FEMTO-ST Institute 12 / 16

Comparison with other approaches

Convolutional Neural Networks
• Feedforward multilayer network for visual information
• Different types of layers
  • Convolutional layer → feature extraction
  • Pooling layer → reduces variance
• Many parameters to train

Multilayer Reservoir Computing (Jalalvand et al. - CICSyN 2015)
• Stacking of reservoirs → the next “corrects” the previous one
  • Same outputs
  • Trained one after the other
• 3-layer system
  • 16K neurons per reservoir
  • 528K trainable parameters → 16K nodes × 11 readouts × 3 layers

FEMTO-ST Institute 13 / 16

Comparison with other approaches

Classification errors

Approach                                  Error rate (%)   Reference
LeNet-1 (CNN)                             1.7              LeCun et al. - 1998
A reservoir of 1200 neurons               1.42             Schaetti et al. - 2015
SVM with Gaussian kernel                  1.4
Committee of 31 reservoirs                1.25             Schaetti et al. - 2015
3-layer reservoir                         0.92             Jalalvand et al. - 2015
CNN of 551 neurons                        0.35             Ciresan et al. - 2011
Committee of 7 CNNs (221 neurons each)    0.23             Ciresan et al. - 2012

Remarks
• CNNs give the best results, but have a long training time
• A reservoir of 1000 neurons is trained in 15 minutes
• Automatic feature extraction improves the results

FEMTO-ST Institute 14 / 16

Conclusion and perspectives

Results
• A parallel code allowing fast simulations
• A first evaluation on the MNIST problem

Future work
• Further code improvement → parallel regression
• Use of several reservoirs
  • Committees
  • Correcting the errors of one reservoir with another
• Other applications
  • Simulation of lung motion
  • Airflow prediction
  • etc.

FEMTO-ST Institute 15 / 16

Thank you for your attention

Questions?

FEMTO-ST Institute 16 / 16