
1 The modelling toolbox

To deal with the various project data in a principled fashion, it was decided by the project consortium to have WP4 develop a data processing toolbox. WP4 will be the primary user, but the toolbox was to be designed such that other project partners can use it after a short introduction. A comprehensive graphical interface was not envisioned, since this level of ease-of-use goes beyond the needs of the project and detracts too much manpower from algorithm development and data analysis. All software libraries and scripts mentioned here are available to project partners through Martin Felder upon request. Note however that devising and tuning ML algorithms is a very problem-specific task, so many of the analysis scripts are likely to change as the “real” data come in. The first part of this Deliverable therefore focuses on the toolbox itself and its usage, while the second part presents initial results obtained by using the toolbox through scripts.

1.1 Overview

The toolbox takes the form of a library written in the Python programming language. It was not developed solely for NBT, but contains algorithm contributions from other projects, especially in the field of reinforcement learning, which is not a focus of NBT. This has the added benefit of allowing us to quickly try out approaches other than the ones originally planned for in the project proposal, in case unforeseen insights about the data are gained.

Python was chosen as the programming language for several reasons, most notably because it is a very easy and clean language to learn, and can be used to add new features and experiments quickly. Owing to its origins as a scripting language, Python commands can be tried out on the command line and are easy to debug. This convenience of course comes at the cost of a certain speed disadvantage compared to compiled languages. Still, the use of optimised math plug-ins from a growing scientific community, and several options for optimising core functions, make our library fast enough for all data processing applications in NBT. In particular, the SciPy package in conjunction with the matplotlib visualisation library provides a completely free and portable alternative to commercial software like Matlab and IDL.

Our machine learning toolbox is called PyBrain (PYthon-Based Reinforcement learning, Artificial Intelligence and Neural networks library). Its general concept is to encapsulate different data processing algorithms in what we call Modules. A minimal Module contains a forward implementation and a collection of free parameters that can be adjusted, usually through some machine learning algorithm. Modules have an input and an output buffer, plus corresponding error buffers which are used in error backpropagation algorithms.

Modules are assembled into Networks by connecting them via Connectors, which again contain a number of adjustable parameters, the connection weights. Note that a Network itself is again a Module, such that it is easy to build hierarchical networks as well. Shortcuts exist


for building the most common network architectures, but in principle this system allows almost arbitrary connectionist systems to be assembled.
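To illustrate the Module and Connection concept, the following minimal sketch assembles a small feed-forward network by hand, using only the standard PyBrain structure classes (the buildNetwork shortcut introduced in Section 1.3.1 achieves the same in a single call):

from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection

# create an empty network and three layer Modules
net = FeedForwardNetwork()
inLayer = LinearLayer(3)
hiddenLayer = SigmoidLayer(5)
outLayer = LinearLayer(2)

# register the Modules with the network
net.addInputModule(inLayer)
net.addModule(hiddenLayer)
net.addOutputModule(outLayer)

# link consecutive layers with full (all-to-all) connections
net.addConnection(FullConnection(inLayer, hiddenLayer))
net.addConnection(FullConnection(hiddenLayer, outLayer))

# finalise the network before use
net.sortModules()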

The free parameters of the Network are adjusted by means of a Trainer, which in the supervised case uses a Dataset to learn the optimum parameters from examples.∗ Validation of the parameters usually requires a separate test dataset, unless the ML method chosen is very fast to train, such that cross-validation can be used, as detailed in Section 1.4.

Figure 1: Data analysis with the PyBrain toolkit. [Schematic: Raw Data → Preprocessing → PyBrain Dataset (train) → Trainer (BP, SVM, Evolution) → Model (SVM, FFN, LSTM) → Classification/Validation against the PyBrain Dataset (test) → Visualisation → Results.]

Figure 1 shows a general overview of the data processing chain. The remaining sections provide details and examples for ingesting data from project partners (Section 1.2), and for analyses using feed-forward neural networks and recurrent neural networks (Section 1.3), support vector machines (Section 1.4), and some novel, experimental algorithms that have been implemented so far (Section 1.5).

∗For reinforcement learning experiments, a simulation environment with an associated optimisation task is used instead of a Dataset. While this might be useful for NBT at some point, it is currently not foreseen and will thus be omitted from discussion.

1.2 Preprocessing

For preprocessing the data from different groups a collection of Matlab scripts is employed. The GU group uses Matlab extensively to store and analyse their data. SSSA also uses Matlab and LabView to record and treat their data. Hence writing a data preprocessor and converter in Matlab can be considered a canonical solution.

1.2.1 Microneurography preprocessing.

While the currently available microneurography data are not final, the data format will most likely not change much, therefore our preprocessing chain can be used for the upcoming WP2 project data as well. Initial calibration, error checking, and spike extraction are performed by GU, as described in Deliverable 4.1. We can then select options to perform any or all of the following steps:

1. Filter by number of spikes and experimental parameters (velocity, force, etc.).

2. Convert spikes into a series of interspike intervals (ISI).

3. Split the series into time windows of different size, number and location.

4. Calculate a selection of common statistics plus a configurable histogram on the time windows.

5. Assemble feature vectors by combining data from one or several windows.

6. Split the data into training and test dataset.

7. Equalise class distribution.

8. Normalise the features.

9. Store as ASCII or netCDF files.

Alternatively, the raw sequences of ISIs can be stored into a file, or converted to a temporal sampling representation more akin to the original measurements, where each time window of, say, 2 ms holds a 1 if a spike occurs in it, and a 0 otherwise. These data can then be processed with sequence learning methods.
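As an illustration of step 2 and of the temporal sampling representation, a minimal Python/NumPy sketch follows (spike times, trial length and bin width are hypothetical; the actual preprocessing is performed by the Matlab scripts mentioned above):

import numpy as np

# hypothetical spike times (in seconds) from one recorded afferent
spike_times = np.array([0.012, 0.019, 0.031, 0.052, 0.060])
duration = 0.064     # hypothetical trial length in seconds
bin_width = 0.002    # 2 ms bins, as in the text

# step 2: interspike intervals (ISI)
isi = np.diff(spike_times)

# temporal sampling representation: 1 if a bin contains a spike, 0 otherwise
n_bins = int(round(duration / bin_width))
counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, duration))
binary = (counts > 0).astype(int)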


1.2.2 Mechanical and simulated data.

Data from the artificial finger V1 and simulated data are easier to handle than spike trains, since they are merely continuous streams of floating point numbers. Measurements from WP5 and simulations by WP3 are delivered as ASCII files with metaparameters encoded in the file name. Preprocessing these data involves:

1. Filter by experimental parameters (velocity, surface, etc.).

2. Split the series into time windows of different size.

3. Assemble feature vectors by combining all data channels of one window.

4. Split the data into training and test dataset.

5. Normalise the features.

6. Store as ASCII or netCDF files.

Figure 2 shows a simple example of preprocessing steps 2 to 4. Again, storing the entire time series as a sequence is also possible.
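The windowing and the alternating train/test split of Figure 2 can be sketched in a few lines (function name and window size are illustrative, not part of the toolbox):

import numpy as np

def window_split(data, window=10):
    """Cut a (time steps x channels) array into non-overlapping windows,
    flatten each window into one feature vector, and assign the patterns
    alternately to the training and test set (cf. Figure 2)."""
    n_win = data.shape[0] // window
    patterns = data[:n_win * window].reshape(n_win, window * data.shape[1])
    return patterns[0::2], patterns[1::2]

# example: 400 time steps of 4 sensor channels -> 20 training and 20 test patterns
train, test = window_split(np.random.randn(400, 4), window=10)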

Figure 2: Preprocessing preliminary data from a MicroTAF sensor experiment conducted at SSSA by windowing. Here, one input pattern is defined as all data points that fall within a window of size 10 time steps (20 ms). Patterns are alternately placed into the training and test dataset. [Panels: sensor readout of channels V1–V4 over time / s, for experiments v_7_80, v_20_80 and v_40_80.]

1.2.3 PyBrain datasets.

In general, reading any kind of data into SciPy arrays is straightforward in Python. The PyBrain DataSet class and its subclasses SupervisedDataSet, SequentialDataSet and ClassificationDataSet can then be initialised via

from pybrain.datasets import SupervisedDataSet
mydata = SupervisedDataSet(inputs, targets)


where inputs is an array whose first dimension is the number of samples, and whose second dimension is the number of features per sample. The same goes for the parameter targets. In case of classification data, the targets will simply be the class numbers, from zero to N−1, if there are N classes. One complication arises when training neural networks on classification data: it has been found that encoding the classes in a “one-of-many” representation is advantageous. This means there are as many output neurons as there are classes, with the one being active at a given time encoding the current class. Conversion to and from this representation is facilitated by the ClassificationDataSet class:

# targets are class numbers
mydata = ClassificationDataSet(inputs, targets)
mydata._convertToOneOfMany()
# targets are now 'one-of-many'. To convert back:
mydata._convertToClassNb()

1.3 Neural networks

Neural networks are one of the fundamental ML techniques implemented in PyBrain. A biological neural network consists of a collection of neurons linked together in a certain way, often (but not always) in layers. In a simplified view, each cell collects incoming action potentials through its dendrites, processes their accumulated effect in a fairly simple manner, and forwards another electrical signal through its axon to a number of follow-on neurons, to which it is connected via synapses of varying conduction efficiency. Let w_{ji} be the weight associated with the synapse connecting neurons i and j of an arbitrary simulated network. Then, the activation a of neuron j is given by

a_j = \sum_{i=1}^{L} w_{ji} o_i + w_{j0}.    (1)

Here, o_i is the output of neuron i, there are L neurons connected to neuron j, and w_{j0} is the bias term of neuron j. Bias is often included in the sum by defining an “on-neuron” with o_0 ≡ 1, which is assumed connected to all neurons in the network. The response of a simulated neuron to its activation a_j is given by a transfer function

o_j = f_j(a_j).    (2)

In the simplest case, the transfer function can be selected to be the identity function, or another linear function. Input and output layers often use linear transfer functions. However, in order for the network to exhibit nonlinear properties, f(·) has to be nonlinear at least for one hidden layer. Smooth, S-shaped (sigmoid) functions behave like threshold or linear functions, depending on scaling, and can be differentiated. The PyBrain module SigmoidLayer implements the logistic function

f(x) = \frac{1}{1 + \exp(-x)}    (3)

which is equivalent to the hyperbolic tangent up to a rescaling of its input and output.
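The relation follows from a one-line calculation with Equation 3:

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{1 - e^{-2x}}{1 + e^{-2x}} = 2 f(2x) - 1,

so a SigmoidLayer and a tanh layer differ only by a fixed scaling of argument and output.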


1.3.1 Feed-forward neural networks

In PyBrain, the utility script buildNetwork can be used to construct neural networks of the type just described (also called multi-layer perceptrons). By default, each layer of neurons is completely connected to the next one. The result of a sample call is graphically depicted in Figure 3.

Figure 3: A neural network resulting from the command buildNetwork(3,5,2). Circles with lines and S-shapes denote linear and sigmoid neurons, respectively. Names of the corresponding PyBrain modules making up this network (LinearLayer, SigmoidLayer, FullConnection) are given.
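For reference, the call shown in Figure 3 and a single forward pass look like this (input values are arbitrary):

from pybrain.tools.shortcuts import buildNetwork

# 3 linear inputs, 5 sigmoid hidden units, 2 linear outputs (cf. Figure 3)
net = buildNetwork(3, 5, 2)

# forward pass for one input pattern
out = net.activate([0.2, -0.1, 0.7])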

Training a neural network means finding the best choice of weight parameters W = (w_{ij}) based on some training data set. This cannot be done analytically, therefore some iterative gradient descent algorithm is usually employed. Assume there is a set of paired input and target vectors, {x_n, t_n}, and we want to construct an FNN with output y(x_n; W), to model the conditional probability function p(t_n|x_n) according to the maximum likelihood principle, i.e. such that the error function

E = \frac{1}{2} \sum_n \left| y(x_n; W) - t_n \right|^2    (4)

is minimised. This can be achieved by demanding

\frac{\partial E}{\partial W} = 0, \quad \text{i.e.} \quad \frac{\partial E}{\partial w_{ij}} = 0 \;\; \forall\, w_{ij} = (W)_{ij}    (5)

Finding this minimum is the task of a gradient descent algorithm; the most famous one in this context is the so-called backpropagation of errors. For the sake of brevity, we will not lay out its full mathematical details here, which are described in the original literature [Rumelhart et al., 1986] as well as in all standard texts on neural network methods [e.g. Bishop, 2006; Duda et al., 2000]. In PyBrain, the algorithm is schematically implemented as follows:

1. Calculate activations of all neurons in the network given one input pattern (Equation 1).This is the forward pass.

2. Calculate the error function (Equation 4) by comparing with the target.

3. Based on this, calculate the error gradient with respect to the weights for each neuron,starting at the output layer. This is the backward pass.

4. Adjust the weights by a step along the gradient, the size of which is determined by thealgorithm parameters learnrate and momentum.

5. Go back to step 1, using the next pattern in the training data set. Complete presentation of all patterns in the set is called an epoch. If the epoch is finished, randomise the order of patterns and start over.

Most of the algorithm's complexity is hidden from the PyBrain user. The parameters learnrate (essentially a scaling factor for the gradient step) and momentum (a heuristic to avoid local minima) may have to be manually adjusted, by specifying them in the trainer creation call. By default, learnrate=0.01 and momentum=0. The optimal learnrate varies around this default by typically an order of magnitude, while the momentum term is usually set to either 0, 0.1, or 0.9. Listing 1 shows a complete example of how to train an FNN in PyBrain. Note that training the network in batches of epochs with test data performance evaluation in between usually serves to prevent overfitting, by stopping the training run once the error on the test set starts increasing again. This procedure is called early stopping regularisation. Figure 4 shows the error development during a typical training run.
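To make the roles of learnrate and momentum concrete, the weight update of step 4 can be written as the following schematic function (illustrative only, not the PyBrain implementation itself):

import numpy as np

def gradient_step(weights, gradient, prev_delta, learnrate=0.01, momentum=0.0):
    """One backpropagation update: a step against the error gradient, scaled
    by learnrate, plus a momentum fraction of the previous step."""
    delta = -learnrate * gradient + momentum * prev_delta
    return weights + delta, delta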

FNNs can be considered one of the best understood machine learning tools, and have been successfully applied to a great number of problems in different fields. The backpropagation algorithm has undergone a lot of changes and enhancements, among which the above-mentioned momentum term was one of the most successful. Another very successful development is the Resilient Propagation algorithm [RPROP; Riedmiller and Braun, 1993]. It has evolved into several subtypes described in [Igel and Hüsken, 2003]. We have implemented a version of RPROP called RPROP- in PyBrain. To use it, the BackpropTrainer in Listing 1 needs to be replaced with RPropMinusTrainer. RPROP adaptively tracks the required update step width for every weight separately, therefore the parameters momentum and learnrate are not necessary. This simplicity, combined with its very stable and fast training performance on most data sets, makes RPROP the method of choice for almost all problems encountered.
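In practice this amounts to a one-line change to Listing 1 (a sketch; the import path is an assumption about the installed PyBrain version):

from pybrain.supervised.trainers import RPropMinusTrainer

# replaces the BackpropTrainer line; no learnrate or momentum required
trainer = RPropMinusTrainer(fnn, dataset=trndata)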

1.3.2 Recurrent neural networks

The difference between recurrent networks (RNNs) and FNNs is that the former have circular connections feeding the output of certain neurons back to their own or other neurons' input.


#!/usr/bin/env python
# Example script for feed-forward network usage in PyBrain.

# load the necessary components
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer

# load the training data set
trndata = ClassificationDataSet.loadFromFile('traindata.svm')

# neural networks work better if classes are encoded using
# one output neuron per class
trndata._convertToOneOfMany()

# same for the independent test data set
tstdata = ClassificationDataSet.loadFromFile('testdata.svm')
tstdata._convertToOneOfMany()

# build a feed-forward network with 20 hidden units, plus
# a corresponding backpropagation trainer
fnn = buildNetwork(trndata.indim, 20, trndata.outdim)
trainer = BackpropTrainer(fnn, dataset=trndata, momentum=0.1)

# repeat 5 times
for i in range(5):

    # train the network for 10 epochs
    trainer.trainEpochs(10)

    # evaluate the result on the training and test data
    trnresult = percentError(trainer.testOnClassData(),
                             trndata['class'])
    tstresult = percentError(trainer.testOnClassData(dataset=tstdata),
                             tstdata['class'])

    # print the result
    print "epoch: %4d" % trainer.totalepochs, \
          " train error: %5.2f%%" % trnresult, \
          " test error: %5.2f%%" % tstresult

Listing 1: Example Python script to train an FNN in PyBrain.


Figure 4: FNN training on two datasets of the type shown in Figure 2, with different window size. The task was to distinguish between different sample classes (sandpaper grades). Since the total number of data points available did not change, a window size of 2 yields 5 times as many patterns as window size 10 to train on, but each pattern contains less information. Therefore the error decrease is slower, but overfitting does not occur as rapidly.

While there are many different ways to do this [Jordan, 1986; Elman, 1990; Lang et al., 1990], we will restrict the discussion here to the type displayed in Figure 5, where only the hidden layer feeds back into itself.

This means patterns are now presented by stepping through time sequences, not in random order. Let x^t ∈ ℝ^N be the input pattern vector at time step t†. For the hidden layer activations a_h of an RNN with M hidden neurons, Equation 1 has to be modified to yield

a_h = \sum_{i=1}^{N} w_{hi}\, x_i^t + \sum_{k=1}^{M} w_{hk}\, o_k^{t-1}.    (6)

With this seemingly small change the properties of the system change significantly. An RNN can provably approximate any measurable sequence-to-sequence mapping to arbitrary accuracy, given enough hidden neurons [Hammer, 2000]. Training algorithms have to take into account that errors occurring at the current pattern may have their source in past patterns. The main developments in this area are real time recurrent learning [RTRL; Robinson and Fallside, 1987] and backpropagation through time [BPTT; Werbos, 1988; Williams and Zipser, 1994]. Again, we refrain from presenting the mathematical details here, and refer to the original literature and further discussions in [Graves, 2008].

†We disregard bias here, which can easily be implemented by appending a constant x_0^t = 1 ∀t to the input.

Figure 5: The type of RNNs discussed here has a hidden layer feeding back into itself. Another perspective is to unfold it over time, such that the dependence of current outputs on past inputs becomes clearer.

While generic RNNs can be assembled in PyBrain, there is rarely a reason to do so because of their practical limitations: it was found that error information from previous time steps tends to either exponentially decrease or blow up. This so-called vanishing gradient problem [Hochreiter, 1991; Bengio et al., 1994] was found to limit the number of time steps over which the RNN can “remember” relevant information to about ten [Hochreiter et al., 2001], which makes it comparable to an FNN using a window of 10 time steps. While there were several more or less successful attempts to overcome this problem, a breakthrough was eventually achieved by Hochreiter and Schmidhuber [1997] with the introduction of the Long Short-Term Memory (LSTM) network. Its hidden layer consists of specialised blocks of neurons called memory cells (Figure 6) which allow the gradient information to be preserved over long time delays. They are hence particularly suited for problems that involve signal correlations over many time steps, like music generation [Eck and Schmidhuber, 2002], speech recognition [Graves and Schmidhuber, 2005] and handwriting recognition [Liwicki et al., 2007; Graves et al., 2008]. It is thus hoped that they will also yield good results at detecting tactile features at high sampling rates.

In PyBrain, LSTM networks can be constructed and trained almost as shown in Listing 1, namely by calling the buildNetwork function with the option hiddenclass=LSTMLayer, and using a SequentialDataSet for training.‡

‡A SupervisedDataSet or ClassificationDataSet can also be used, but in this case each input pattern is treated as a separate sequence with a length of one time step.
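A sketch of the corresponding setup (the dimensions and the single sample are placeholders):

from pybrain.structure import LSTMLayer
from pybrain.datasets import SequentialDataSet
from pybrain.tools.shortcuts import buildNetwork

# sequential dataset: samples are grouped into sequences of time steps
seqdata = SequentialDataSet(4, 1)          # e.g. 4 sensor channels, 1 target
seqdata.newSequence()
seqdata.addSample([0.1, 0.0, -0.2, 0.3], [1])

# recurrent network with an LSTM hidden layer
rnn = buildNetwork(seqdata.indim, 20, seqdata.outdim,
                   hiddenclass=LSTMLayer, recurrent=True)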


Figure 6: An LSTM cell is built around a central neuron, called the “constant error carousel” (CEC), which recycles status information from one time step to the next. Small blue circles indicate multiplicative connections. Whether the status is influenced by the input is controlled by the input gate neuron, while the output gate controls passing on status information. More recent additions are the forget gate to reset the status, and a peephole connection to directly access it.


1.4 Support vector machines (SVM)

SVMs [Vapnik, 1995] belong to the class of maximum margin classifiers. They separate two classes of data by a hyperplane, as sketched in Figure 7. A hyperplane can be defined by:

〈x,w〉 + b = 0

where w is normal to the hyperplane, b is the offset, x are the points on the hyperplane, and 〈a, b〉 = \sum_i a_i b_i denotes the scalar product. The classifier

class(x) = sign (〈w,x〉 + b)

then separates the data into the classes +1 and −1. The goal of a maximum margin classifier is to find the hyperplane with the widest separation between classes. For SVMs, this is done by quadratic programming techniques on the dual Lagrange formulation, as described e.g. in the excellent tutorials of Burges [1998] and Smola and Schölkopf [2004]. Eventually, the hyperplane is described through the support vectors, which basically are the data points at the margin. Classification of unknown data is performed by comparing it against the support vectors only, not against the full training data. This makes SVMs very sparse and thus scalable.

Linear boundaries do of course not always yield a good classifier. In fact, it can be shown that mapping the input data into a higher dimension, the feature space, by means of some transformation Φ(·), the feature map, enables one to model complex boundaries in the original data space (Figure 8). The kernel trick is a way of avoiding explicit mapping of x into the high-dimensional space, since the SVM algorithm only needs its scalar product, the kernel K:

K(x,w) = 〈Φ(x),Φ(w)〉 (7)


Figure 7: A maximum margin classifier, like an SVM, strives to maximise the separation 2/‖w‖ between two classes (circles and dots). In this linear case, the separating hyperplane (thick red line, 〈x,w〉 + b = 0) is defined via three support vectors (two crosses and one circle on the thin red lines, where 〈x,w〉 + b = ±1).

The most common kernels used (apart from the trivial linear kernel) are Gaussian kernels, also called radial basis function (RBF) kernels:

K(a, b) = \exp\left( - \frac{\| a - b \|^2}{2 \gamma^2} \right)

The RBF-SVM thus has the advantage of requiring only two crucial parameters (plus some numerical parameters of lesser import):

γ is the width of the Gaussian kernel function.

C is a regularisation parameter constraining the amount of “slack” allowed in the solution. A higher C means stronger punishment of misclassification.

Training an SVM is an iterative procedure and usually very fast, compared to training a neural network. Also, since quadratic programming is deterministic, there is no need to carry out multiple trials. However, the result depends strongly on the two meta-parameters C and γ, for which it is difficult to give default values; they depend heavily on the data set. Therefore, it is advisable to use some of the speed gain to systematically search for the best meta-parameters. Figure 9 shows the graphical representation of a typical classification performance surface over C and γ. Searching this entire grid at high resolution is very time consuming, therefore we implemented a design-of-experiments search procedure, GridSearchDOE, following recommendations of Staelin [2003], who has found it to be very efficient and robust.

Classification performance is evaluated here using stratified N-fold cross-validation: data from each class are randomly split into N parts, then a test data set is formed out of the first part of each class. The rest of the data becomes the training set. Training is carried out until convergence, and SVM performance is evaluated on the test set. This procedure is repeated for parts 2 to N, and the performance results are averaged. The N-fold increase in computation time can only be afforded for fast methods like SVMs. We usually use N = 5, since an 80/20 split is also quite common when constructing training/test sets for single trials.
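The splitting step can be sketched as follows (an illustration of the procedure just described, not the toolbox code itself):

import numpy as np

def stratified_folds(targets, n_folds=5):
    """Randomly split sample indices into n_folds parts with approximately
    equal class distribution in every part."""
    folds = [[] for _ in range(n_folds)]
    for cls in np.unique(targets):
        idx = np.random.permutation(np.where(targets == cls)[0])
        for k, part in enumerate(np.array_split(idx, n_folds)):
            folds[k].extend(part.tolist())
    return folds

# usage: fold k serves as the test set, the remaining folds form the training set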


Figure 8: Separation of two classes of points with an RBF-SVM. The separating hyperplane in the (infinite-dimensional) Gaussian feature space maps to a very involved boundary in data space.

One SVM-specific problem is their inability to discern more than two classes. Multi-class data sets thus need to be somehow reduced to binary problems. There is an ongoing discussion of how to best achieve this [Hsu and Lin, 2002; Rifkin and Klautau, 2004; El-Yaniv et al., 2006]. Common and simple, but still relatively performant solutions are:

one-vs-one: Split the data into pairs of classes and train an SVM on each pair. When faced with unknown data, present it to all such SVMs and calculate the distance from the boundary, d = 〈w,x〉 + b, for each one. Then use a voting mechanism to decide the class it is in. The rationale here is that only the SVMs that have been trained on the correct class will make a sizable contribution, while contributions from the others cancel out. Alternatively, it is possible to derive class membership probabilities from the raw distances [Wu et al., 2004]. A minimal sketch of the voting step is given after this list.

one-vs-rest: Separate one class from the rest of the data and train an SVM on this problem. Repeat for each class. This is probably the simplest way of generating binary problems, but may still yield good results due to the sparsity of SVMs.
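The one-vs-one voting mentioned above could look like this (function and variable names are illustrative; the distances and class pairs come from the pairwise SVMs):

import numpy as np

def one_vs_one_vote(distances, class_pairs, n_classes):
    """Combine raw distances d = <w,x> + b from all pairwise SVMs by voting:
    a positive distance votes for the first class of the pair, a negative one
    for the second; the class collecting most votes wins."""
    votes = np.zeros(n_classes)
    for d, (a, b) in zip(distances, class_pairs):
        votes[a if d > 0 else b] += 1
    return int(np.argmax(votes))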

Our toolkit provides two different options for using SVMs for classification: we have written a native implementation in PyBrain to test different algorithms and multiclass solutions. The software is working, but relatively slow and complex, hence it is used primarily as a power tool for particularly hard problems. As an alternative for regular problems, as encountered so far in NANOBIOTACT, we have also designed a wrapper around the popular and highly optimised LIBSVM library [Chang and Lin, 2001], the use of which is shown in Listing 2. Note that the structure is somewhat different to Listing 1, due to the different nature of the algorithm as compared to FNNs, and some compromises that had to be made to cater to the library interface.

Figure 9: Performance of an SVM classifier on sample microneurography data, conditioned on its metaparameters kernel width σ and slack C. lg(·) denotes the binary logarithm.

1.5 Experimental tools and algorithms

Several more or less experimental algorithms that have shown great promise on synthetic benchmarks and artificial data sets have been implemented in PyBrain, to be tested and evaluated on the project data.

1.5.1 Evolino

EVOlution of systems with LINear Outputs [Evolino; Schmidhuber et al., 2007; Wierstra et al., 2005] is a new class of methods that evolve the weights leading to the nonlinear, hidden nodes of RNNs. Since it is very difficult to evolve accurate networks, only the hidden layers are evolved, while the output is calculated from the hidden state by means of an optimal linear mapping. Both pseudo-inverse based linear regression and linear SVMs lend themselves to this second step. The neuroevolution part is performed through enforced sub-populations (ESP), by co-evolving the weights of different neurons, or LSTM cells, separately. Listing 3 sketches how to run an Evolino experiment in PyBrain, showing only the differences to Listing 1. The option outputbias=False is necessary because the weights to the output layer are computed directly and thus do not need a bias to facilitate learning.


#!/usr/bin/env python
# Example script for SVM classification using PyBrain and LIBSVM

# load the necessary components
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from svmunit import SVMUnit
from svmtrainer import SVMTrainer

# load the training and test data sets
trndata = ClassificationDataSet.loadFromFile('traindata.svm')
tstdata = ClassificationDataSet.loadFromFile('testdata.svm')

# initialize the SVM module and a corresponding trainer
svm = SVMUnit()
trainer = SVMTrainer(svm, trndata)

# train the SVM using a design-of-experiments grid search
trainer.train(search="GridSearchDOE")

# pass data sets through the SVM to get performance
trnresult = percentError(svm.forwardPass(dataset=trndata),
                         trndata['class'])
tstresult = percentError(svm.forwardPass(dataset=tstdata),
                         tstdata['class'])
print "train error: %5.2f%%" % trnresult, \
      ", test error: %5.2f%%" % tstresult

Listing 2: Example script for SVM classification using PyBrain and the LIBSVM wrapper.


# load Evolino modules and sequence evaluator
from pybrain.supervised.trainers.evolino import EvolinoTrainer
from pybrain.tools.validation import testOnSequenceData
...

# load data sets etc.
...

# build an LSTM network with 20 hidden units, and the trainer
net = buildNetwork(trndata.indim, 20, trndata.outdim,
                   hiddenclass=LSTMLayer, outputbias=False)
trainer = EvolinoTrainer(net, dataset=trndata,
                         evalfunc=testOnSequenceData)

# training loop
...
trnresult = testOnSequenceData(net, trndata) * 100.
...

Listing 3: Training an RNN with Evolino. Only the differences to Listing 1 are shown.

1.5.2 Multi-dimensional RNNs

One of the main advantages of RNNs compared to window-based methods is their ability to take context into account. For a one-dimensional time series, context obviously consists of past samples. It has long been known that in some cases, like language processing, taking the future context into account helps considerably [Schuster and Paliwal, 1997; Graves and Schmidhuber, 2005]. In this case, a separate hidden layer processes the (buffered) time series in the reverse direction, and the results of forward and reverse scan are combined to yield the network output. This procedure has recently been generalised to more than one dimension [Graves et al., 2007]. Roughly speaking, multi-dimensional RNNs (MDRNNs) scan a sheet or volume of data from all directions, and combine the results of the corresponding directional hidden layers.

PyBrain already contains an implementation of MDRNNs combined with LSTM, in the form of an MDLSTMLayer. Automatic scanning over a data set is realised through the SwipingNetwork class. Training can be carried out using standard backpropagation-type algorithms, as described in Section 1.3.
