
breze Documentation, Release 0.1

brml.de

February 19, 2017

Contents

1 Basics
  1.1 Specifying losses, norms, transfer functions etc.

2 Models and Algorithms
  2.1 Principal Component Analysis
  2.2 Extreme Component Analysis
  2.3 Sparse Filtering
  2.4 ICA with Reconstruction Cost
  2.5 Canonical Correlation Analysis
  2.6 Slow Feature Analysis
  2.7 K-Means
  2.8 Regularized Information Maximization
  2.9 Stochastic Gradient Variational Bayes
  2.10 Linear Denoiser
  2.11 Recurrent Neural Networks
  2.12 Multilayer Perceptrons
  2.13 Hybrid Monte Carlo
  2.14 Trainers

3 Helpers, convenience functions and tools
  3.1 Feature extraction
  3.2 Data manipulation
  3.3 Various utilities
  3.4 Helpers for plotting data

4 Architectures, Components
  4.1 Norms
  4.2 Transfer functions
  4.3 Loss functions
  4.4 Stochastic Corruption of Theano Variables
  4.5 Miscellaneous functionality
  4.6 Layers
  4.7 Common functions
  4.8 Univariate Normal Distribution
  4.9 Multivariate Normal Distribution
  4.10 Utilities
  4.11 Common functions

5 Implementation Notes
  5.1 Variance propagation

6 Indices and tables

Bibliography

Python Module Index

CHAPTER 1

Basics

Specifying losses, norms, transfer functions etc.

To maintain flexibility and conciseness, models can be configured in two ways: either by using a string or by using a function that follows a specific API.

Using the builtin loss functions

Let us start with an example. To instantiate a linear model, we can make use of the following notation:

from breze.learn.glm import Linear
model = Linear(5, 1, loss='squared')

In this case, we specify the sum of squares loss as a string. The logic behind this aims to be straightforward: for losses, a lookup is done in the module breze.arch.components.distance. Thus, the function breze.arch.component.distance.squared is used as the loss. This function follows a simple protocol. In the case of a supervised model, it is called with the target as its first argument and the output of the model as its second argument. Both are required to be Theano variables. In the case of an unsupervised model, the output of the model is the only argument passed on to the loss.

A list of supervised losses can be found by checking the contents of the breze.arch.components.distance module:

>>> from breze.arch.components import distance
>>> dir(distance)
['T', '__builtins__', '__doc__', '__file__', '__name__', '__package__',
 'absolute', 'bernoulli_kl', 'bernoulli_neg_cross_entropy',
 'discrete_entropy', 'distance_matrix', 'lookup', 'nca', 'neg_cross_entropy',
 'nominal_neg_cross_entropy', 'norm', 'squared']

Some of these names, such as T and the double-underscore attributes, are of course just module-level globals rather than losses.

Using custom loss functions

Using your own loss function comes down to implementing it according to the above protocol and operating on Theano variables. We can thus define the sum of squares loss ourselves as follows:

def squared(target, output):
    d = target - output
    return (d**2).sum()

We can also use more complicated loss functions. The Huber loss, for example, is a mix of the absolute error and the squared error, depending on the size of the error. It depends on an additional threshold parameter $\delta$ and is defined as follows:

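$$L_\delta(a) = \begin{cases} \frac{a^2}{2} & \text{if } |a| \le \delta, \\ \delta \left( |a| - \frac{\delta}{2} \right) & \text{otherwise.} \end{cases}$$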

We can implement this as follows:

import theano.tensor as T

delta = 0.1

def huber(target, output):
    d = target - output
    a = .5 * d**2
    b = delta * (abs(d) - delta / 2.)
    l = T.switch(abs(d) <= delta, a, b)
    return l.sum()

Unfortunately, this version relies on the global variable delta. A more elegant solution is to use a function template, that is, a factory function that binds the threshold in a closure:

import theano.tensor as T

def make_huber(delta):
    def inner(target, output):
        d = target - output
        a = .5 * d**2
        b = delta * (abs(d) - delta / 2.)
        l = T.switch(abs(d) <= delta, a, b)
        return l.sum()
    return inner

my_huber = make_huber(0.1)

This way we can create arbitrarily elaborate loss functions.
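As a quick sanity check of the piecewise definition, the same factory can be mirrored in NumPy and evaluated on concrete numbers. This is only a sketch: make_huber_np is a hypothetical name, and NumPy arrays stand in for the Theano variables that breze would actually pass to a loss.

```python
import numpy as np

def make_huber_np(delta):
    """NumPy analogue of the Huber factory (illustrative, not part of breze)."""
    def inner(target, output):
        d = target - output
        quadratic = 0.5 * d ** 2                      # used where |d| <= delta
        linear = delta * (np.abs(d) - delta / 2.0)    # used where |d| > delta
        return np.where(np.abs(d) <= delta, quadratic, linear).sum()
    return inner

huber = make_huber_np(1.0)
target = np.array([0.0, 0.0, 0.0])
output = np.array([0.5, 1.0, 3.0])

# Per-element contributions: 0.125 (quadratic), 0.5 (quadratic), 2.5 (linear).
loss = huber(target, output)
```

The first two errors fall inside the threshold and are penalised quadratically, while the third is penalised linearly, which is what makes the loss less sensitive to outliers than the plain sum of squares.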

Using norms and transfer functions

The story is similar when using norms and transfer functions. In the former case, the module of interest is breze.arch.component.norm. The protocol is that a single argument, a Theano variable, is given. The result is expected to be a Theano variable of the same shape. This is also the case for transfer functions, except that the module in question is breze.arch.component.transfer.


CHAPTER 2

Models and Algorithms

Learning representations, clustering:

Principal Component Analysis

This module provides functionality for principal component analysis.

class breze.learn.pca.Pca(n_components=None, whiten=False)
Class to perform principal component analysis.

Attributes

n_components (integer) Number of components to keep.
whiten (boolean) Flag indicating whether to whiten the covariance matrix.
weights (array_like) 2D array representing the map from observable to latent space.
singular_values (array_like) 1D array containing the singular values of the problem.

Methods

fit(X) Fit the parameters of the model.
inverse_transform(F) Perform an inverse transformation of transformed data according to the model.
reconstruct(X) Reconstruct the data according to the model.
transform(X) Transform data according to the model.

__init__(n_components=None, whiten=False)
Create a Pca object.

fit(X)
Fit the parameters of the model.

The data should be centered (that is, its mean subtracted rowwise) before using this method.

Parameters X : array_like

An array of shape (n, d) where n is the number of data points and d the input dimensionality.

inverse_transform(F)
Perform an inverse transformation of transformed data according to the model.

Parameters F : array_like

An array of shape (n, d) where n is the number of data points and d the dimensionality of the feature space.

Returns X : array_like

An array of shape (n, c) where n is the number of samples and c is the dimensionality of the input space.

reconstruct(X)
Reconstruct the data according to the model.



Returns Y : array_like

An array of shape (n, d) where n is the number of samples and d is the dimensionality of the input space.

transform(X)
Transform data according to the model.




An array of shape (n, c) where n is the number of samples and c is the number of components kept.
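The relationship between fit, transform and inverse_transform can be illustrated with plain NumPy. The following is a minimal sketch based on the singular value decomposition. It does not call breze, and whether breze.learn.pca.Pca follows exactly this route internally is an assumption. The variable name weights mirrors the attribute listed above.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
X = X - X.mean(axis=0)            # center the data, as fit() expects

# "fit": the right singular vectors give the principal directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
n_components = 2
weights = Vt[:n_components].T     # shape (d, c): observable -> latent space

# "transform": project onto the kept components, shape (n, c).
F = X @ weights

# "inverse_transform": map back to the input space, shape (n, d).
X_rec = F @ weights.T

# Keeping all components reproduces the centered data exactly.
X_full = (X @ Vt.T) @ Vt
cov_F = F.T @ F                   # diagonal: the components are uncorrelated
```

With fewer components than input dimensions, X_rec is the best rank-c approximation of X in the least squares sense.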

class breze.learn.pca.Zca(min_eig_val=0.1)
Class to perform zero component analysis.

Attributes

min_eig_val (float) Eigenvalues are increased by this value before reconstructing.
weights (array_like) 2D array representing the map from observable to latent space.
singular_values (array_like) 1D array containing the singular values of the problem.

Methods


__init__(min_eig_val=0.1)
Create a Zca object.

fit(X)
Fit the parameters of the model.


inverse_transform(F)
Perform an inverse transformation of transformed data according to the model.


An array of shape (n, d) where n is the number of data points and d the dimensionality of the feature space.


An array of shape (n, c) where n is the number of samples and c is the dimensionality of the input space.





An array of shape (n, d) where n is the number of samples and d is the dimensionality of the input space.
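To make the role of min_eig_val concrete, here is a hedged NumPy sketch of one common ZCA whitening formulation: the covariance is eigendecomposed, a small constant is added to the eigenvalues, and the data is rotated back into the original coordinate system. That breze's Zca uses exactly this formula is an assumption; the mixing matrix A and all variable names are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(1)
# Fixed, well-conditioned mixing matrix so the example is reproducible.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.3],
              [0.0, 0.0, 0.0, 0.5]])
X = rng.randn(500, 4) @ A
X = X - X.mean(axis=0)

cov = X.T @ X / X.shape[0]
eig_vals, eig_vecs = np.linalg.eigh(cov)

min_eig_val = 1e-8   # regulariser added to the eigenvalues
zca = eig_vecs @ np.diag(1.0 / np.sqrt(eig_vals + min_eig_val)) @ eig_vecs.T

X_white = X @ zca
cov_white = X_white.T @ X_white / X.shape[0]   # close to the identity
```

A larger min_eig_val (such as the default of 0.1 shown in the signature) damps directions of low variance instead of amplifying them, at the price of a covariance that is no longer exactly the identity.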






Extreme Component Analysis

This module provides functionality for extreme component analysis.

An explanation and derivation of the algorithm can be found in [XCA].

class breze.learn.xca.Xca(n_components, whiten=False)
Class implementing extreme component analysis.

The idea is that not only the principal components or the minor components of a data set are important, but a combination of the two. This algorithm works by combining probabilistic versions of PCA and MCA.

The central idea is that if n principal and m minor components are chosen, a gap of size D - m - n dimensions is formed in the list of singular values. The exact location of this gap is found by choosing the one which minimizes a likelihood combining PCA and MCA.



Attributes

n_components (integer) Number of components kept.

Methods


__init__(n_components, whiten=False)
Create an Xca object.

Parameters n_components : integer

Number of components to keep.





inverse_transform(F)
Perform an inverse transformation of transformed data according to the model.


An array of shape (n, d) where n is the number of data points and d the dimensionality of the feature space.


An array of shape (n, c) where n is the number of samples and c is the dimensionality of the input space.



An array of shape (n, d) where n is the number of samples and d is the dimensionality of the input space.




Returns F : array_like




Sparse Filtering

ICA with Reconstruction Cost

Canonical Correlation Analysis

breze.learn.cca.cca(X, Y)
Canonical Correlation Analysis


Observation matrix in first space, every column is one data point.

Y : array_like

Observation matrix in second space, every column is one data point.

Returns cA : array_like

Basis in X space

B : array_like

Basis in Y space.

clambdas : array_like

Correlation.

Slow Feature Analysis

Slow Feature Analysis.

This module provides functionality for slow feature analysis. A helpful article is hosted at scholarpedia.

class breze.learn.sfa.SlowFeatureAnalysis(n_components=None)
Class for performing slow feature analysis.

Attributes

n_components (integer) Number of components to keep.

Methods

fit(X) Fit the parameters of the model.
transform(X) Transform data according to the model.

__init__(n_components=None)
Create a SlowFeatureAnalysis object.


Number of components to keep.


http://www.scholarpedia.org/article/Slow_feature_analysis



The data should be centered (that is, its mean subtracted rowwise) and white (e.g. via pca.Pca) before using this method.

Parameters X : list of array_like

A list of sequences. Each entry is expected to be an array of shape (*, d) where * is the number of data points and may vary from item to item in the list. d is the input dimensionality and has to be consistent.

Returns F : list of array_like

List of sequences. Each item in the list is an array which corresponds to the sequence in X. It is of the same shape, except that d is replaced by n_components.



An array of shape (n, d) where n is the number of time steps and d the input dimensionality.

Returns F : array_like

An array of shape (n, c) where n is the number of time steps and c is the number of components kept.

K-Means

class breze.learn.kmeans.GainShapeKMeans(n_component, zscores=False, whiten=False, c_zca=1e-08, max_iter=10, random_state=None)

GainShapeKMeans class to perform K-means clustering for feature learning as described in [LFRKM].


Number of features to learn.

zscores : boolean, optional, default: False

Flag indicating whether the data should be normalized to zero mean and unit variance before training and transformation.

whiten : boolean, optional, default: False

Flag indicating whether the data should be whitened before training and transformation.

c_zca : float, optional, default: 1e-8

Small number that is added to each singular value during ZCA.

max_iter : integer, optional

Maximum number of iterations to perform.

random_state : None, integer or numpy.RandomState, optional, default: None

Generator to initialize the dictionary. If None, the numpy singleton generator is used.



References

[LFRKM]

Attributes

activation ({‘identity’, ‘omp-1’, ‘soft-threshold’}, optional, default: None) Activation to use for transformation. ‘identity’ does not alter the output. ‘omp-1’ only retains the component with the largest absolute value. ‘soft-threshold’ sets components below a certain threshold to zero, but separates positive and negative parts.

threshold (scalar) Threshold used for soft-thresholding activation. Ignored if another activation is used.

Methods

fit(X) Fit the parameters of the model.
iter_fit(X)
normalize_dict() Normalize the columns of the dictionary to unit length.
prepare(n_inpt) Initialize the model's internal structures.
transform(X[, activation]) Transform the data according to the dictionary.



Array of shape (n_samples, n_inpt) used for training.

transform(X, activation=None)
Transform the data according to the dictionary.


Input data of shape (n_samples, n_inpt).

activation : {‘identity’, ‘omp-1’, ‘soft-threshold’}, optional, default: None

Activation to use. ‘identity’ does not alter the output. ‘omp-1’ only retains the component with the largest absolute value. ‘soft-threshold’ sets components below a certain threshold to zero, but separates positive and negative parts. If None, .activation is used.
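The two non-trivial activations can be sketched in NumPy as follows. This is an interpretation of the descriptions above, not breze's code: the function names are hypothetical, and the layout of the soft-threshold output (positive parts followed by negative parts, doubling the number of columns) is an assumption.

```python
import numpy as np

def omp1(F):
    """Keep only the entry with the largest absolute value in each row."""
    out = np.zeros_like(F)
    rows = np.arange(F.shape[0])
    cols = np.abs(F).argmax(axis=1)
    out[rows, cols] = F[rows, cols]
    return out

def soft_threshold(F, threshold):
    """Zero out small entries and split positive and negative parts."""
    pos = np.maximum(F - threshold, 0.0)
    neg = np.maximum(-F - threshold, 0.0)
    return np.hstack([pos, neg])      # assumed layout: [positive | negative]

F = np.array([[0.2, -1.5, 0.7],
              [2.0, 0.1, -0.3]])
sparse = omp1(F)
split = soft_threshold(F, 0.5)
```

Both variants yield sparse codes: omp1 keeps exactly one active feature per sample, while soft_threshold keeps every feature whose magnitude exceeds the threshold.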

Regularized Information Maximization

Stochastic Gradient Variational Bayes

Variational Autoencoder

Denoising:



Linear Denoiser

Module for the linear denoiser.

class breze.learn.lde.LinearDenoiser(p_dropout)
Class that represents linear denoisers.

LinearDenoisers (LDEs) were later also named Marginalized Denoising AutoEncoders.

Introduced in [R1].

References

[R1]

Methods

fit(X) Fit the parameters of the model.
transform(X) Transform data according to the model.

__init__(p_dropout)
Create a LinearDenoiser object.

Parameters p_dropout : float

Probability of an input being dropped out.









Supervised Learning

Recurrent Neural Networks

Multilayer Perceptrons

Sampling



Hybrid Monte Carlo

breze.learn.sampling.hmc.sample(f_energy, f_energy_prime, position, n_steps, desired_accept=0.9, initial_step_size=0.01, step_size_grow=1.02, step_size_shrink=0.98, step_size_min=0.0001, step_size_max=0.25, avg_accept_slowness=0.9, sample_dim=0)

Return a sample from the distribution given by f_energy.

Parameters

• f_energy – Log of a function proportional to the density.

• f_energy_prime – Derivative of f_energy with respect to the current position.

• position – A NumPy array of any desired shape which represents multiple particles.

• n_steps – Number of steps to perform for the next sample.

• desired_accept – Desired acceptance rate of the underlying Metropolis-Hastings step.

• initial_step_size – Initial size of a step along the energy landscape.

• step_size_grow – If the acceptance rate is too high, increase the step size by this factor.

• step_size_shrink – If the acceptance rate is too low, decrease the step size by this factor.

• step_size_min – Don’t decrease the step size below this value.

• step_size_max – Don’t increase the step size above this value.

• avg_accept_slowness – When calculating the acceptance rate, use this value as a decay for an exponential average.

• sample_dim – The axis which discriminates the different particles given in the position array from each other.
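How the step size parameters interact can be sketched as a small pure-Python update rule. This follows the parameter descriptions above and is not taken from breze's source: update_step_size is a hypothetical helper, and the exact order of averaging, scaling and clipping is an assumption.

```python
def update_step_size(step_size, avg_accept, accepted,
                     desired_accept=0.9,
                     step_size_grow=1.02, step_size_shrink=0.98,
                     step_size_min=0.0001, step_size_max=0.25,
                     avg_accept_slowness=0.9):
    """Return the adapted step size and the updated average acceptance rate."""
    # Exponential moving average of the acceptance rate.
    avg_accept = (avg_accept_slowness * avg_accept
                  + (1.0 - avg_accept_slowness) * accepted)
    # Accepting too often means the steps are too timid, so grow; else shrink.
    factor = step_size_grow if avg_accept > desired_accept else step_size_shrink
    # Keep the step size inside the allowed interval.
    step_size = min(max(step_size * factor, step_size_min), step_size_max)
    return step_size, avg_accept

grown, avg_after_accept = update_step_size(0.01, 0.9, accepted=1.0)
shrunk, avg_after_reject = update_step_size(0.01, 0.9, accepted=0.0)
capped, _ = update_step_size(0.25, 0.99, accepted=1.0)
floored, _ = update_step_size(0.0001, 0.0, accepted=0.0)
```

With the default factors close to one, the step size drifts slowly, which keeps the sampler from oscillating between very large and very small steps.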

Trainers

Trainers

Trainer module

Score module

Module for various scoring strategies.

breze.learn.trainer.score.simple(f_score, *data)
Simple scoring strategy which just applies f_score to the passed arguments.

class breze.learn.trainer.score.MinibatchScore(max_samples, sample_dims)
MinibatchScore class.

Scoring strategy for very large data sets, where the score of only a subset of rows can be calculated at the same time. This score assumes that scores are averages.



Attributes

max_samples (int) Maximum number of samples to calculate the score for at the same time.
sample_dims (list of ints) Dimensions along which the samples are stored. The length of this list corresponds to the number of arguments the score takes. Each entry gives the axis along which different samples are stored in the corresponding argument.

Methods

__call__(f_score, *data) Return the score of the data.

__init__(max_samples, sample_dims)
Create MinibatchScore object.

Parameters max_samples : int

Maximum number of samples to calculate the score for at the same time.

sample_dims : list of ints

Dimensions along which the samples are stored. The length of this list corresponds to the number of arguments the score takes. Each entry gives the axis along which different samples are stored in the corresponding argument.
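The statement that scores are assumed to be averages matters for how chunk results are combined: each chunk score has to be weighted by the number of samples in the chunk. The following NumPy sketch illustrates this. minibatch_score and mse are hypothetical stand-ins written for this example and do not reproduce the MinibatchScore implementation.

```python
import numpy as np

def minibatch_score(f_score, max_samples, sample_dims, *data):
    """Average f_score over chunks of at most max_samples samples."""
    n_samples = data[0].shape[sample_dims[0]]
    total = 0.0
    for start in range(0, n_samples, max_samples):
        stop = min(start + max_samples, n_samples)
        idx = np.arange(start, stop)
        # Slice every argument along its own sample axis.
        batch = [np.take(d, idx, axis=dim)
                 for d, dim in zip(data, sample_dims)]
        # Weight the chunk average by the chunk size.
        total += f_score(*batch) * (stop - start)
    return total / n_samples

def mse(X, Z):
    return ((X - Z) ** 2).mean()

rng = np.random.RandomState(0)
X = rng.randn(10, 3)
Z = rng.randn(10, 3)

chunked = minibatch_score(mse, 4, [0, 0], X, Z)   # chunks of 4, 4 and 2 rows
full = mse(X, Z)
```

Without the weighting, the last, smaller chunk would count as much as the full-sized ones and the result would differ from the score on the whole data set.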

Report module


CHAPTER 3

Helpers, convenience functions and tools

Feature extraction

Basic feature extraction

breze.learn.feature.rbf(X, n_centers)
Return a design matrix with features given by radial basis functions.

n_centers Gaussian kernels are placed along each data dimension, equidistant between the minimum and the maximum along that dimension. The result then contains one column for each of the kernels.

Parameters

• X – NxD sized array.

• n_centers – Number of kernels to use for each dimension.

Returns Nx(n_centers * D) sized array.

Feature extraction for EMG and similar time series data

Module that holds various preprocessing routines for emg signals.

breze.learn.feature.emg.integrated(X)
Return the sum of the absolute values of a signal.

Parameters X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.

Returns An (n, d) array.

breze.learn.feature.emg.mean_absolute_value(X)
Return the mean absolute value of the signal.



breze.learn.feature.emg.modified_mean_absolute_value_1(X)
Return a weighted version of the mean absolute value.

Instead of equal weights, the first and last quarter of the signal receive only half the weight.



breze.learn.feature.emg.modified_mean_absolute_value_2(X)
Return a weighted version of the mean absolute value.

The central half of the signal has weight one. The beginning and the last quarter increase/decrease their weight towards that.



breze.learn.feature.emg.mean_absolute_value_slope(X)
Return the first derivative of the mean absolute value.



breze.learn.feature.emg.variance(X)
Return the variance of the signals.



breze.learn.feature.emg.root_mean_square(X)
Return the root mean square of the signals.



breze.learn.feature.emg.zero_crossing(X, threshold=1e-08)
Return the number of times the signal crosses zero.

Parameters

• X – A (t, n, d) array where t is the number of time steps, n is the number of different signals and d is the number of channels.

• threshold – Changes below this value are ignored. Useful to suppress noise.
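The (t, n, d) convention means that all of these features reduce over the first (time) axis. A small NumPy sketch of four of them is given below. The functions are re-implementations written for illustration, and in particular the exact way zero_crossing applies its threshold is an assumption.

```python
import numpy as np

def integrated(X):
    return np.abs(X).sum(axis=0)

def mean_absolute_value(X):
    return np.abs(X).mean(axis=0)

def root_mean_square(X):
    return np.sqrt((X ** 2).mean(axis=0))

def zero_crossing(X, threshold=1e-8):
    # A crossing is a sign change between neighbouring time steps whose
    # magnitude is at least the threshold (assumed semantics).
    sign_flip = X[:-1] * X[1:] < 0
    big_enough = np.abs(X[:-1] - X[1:]) >= threshold
    return (sign_flip & big_enough).sum(axis=0)

signal = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
X = signal[:, None, None]          # shape (t=5, n=1, d=1)

iemg = integrated(X)               # shape (1, 1)
mav = mean_absolute_value(X)
rms = root_mean_square(X)
crossings = zero_crossing(X)       # three sign changes in this signal
```

Each result has shape (n, d), so features from several functions can be concatenated along the last axis to build a feature vector per signal.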


breze.learn.feature.emg.slope_sign_change(X, threshold=1e-08)
Return the number of times the signal changes slope.

Parameters




breze.learn.feature.emg.willison_amplitude(X, threshold=1e-08)
Return the number of times the difference between two adjacent EMG segments exceeds a threshold.

Parameters






Data manipulation

Module for manipulating data.

breze.learn.data.shuffle(data)
Shuffle the first dimension of an indexable object in place.

breze.learn.data.padzeros(lst, front=True, return_mask=False)
Given a list of arrays, pad every array with zeros up front until they all reach a uniform length.

Each element of lst can have a different first dimension, but has to be equal on the other dimensions.
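The padding behaviour can be pictured with a few lines of NumPy. padzeros_sketch below is a hypothetical re-implementation for illustration: it ignores return_mask, and returning a list of arrays (rather than, say, one stacked array) is an assumption about the real function.

```python
import numpy as np

def padzeros_sketch(lst, front=True):
    """Pad every array along its first axis to the length of the longest one."""
    max_len = max(a.shape[0] for a in lst)
    padded = []
    for a in lst:
        pad = np.zeros((max_len - a.shape[0],) + a.shape[1:], dtype=a.dtype)
        parts = [pad, a] if front else [a, pad]
        padded.append(np.concatenate(parts, axis=0))
    return padded

seqs = [np.ones((2, 3)), np.ones((4, 3))]
front_padded = padzeros_sketch(seqs)               # zeros come first
back_padded = padzeros_sketch(seqs, front=False)   # zeros come last
```

Padding at the front keeps the last time step of every sequence aligned, which is convenient when only the final output of a sequence model is of interest.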

breze.learn.data.collapse_seq_borders(arr)
Given an array of ndim 3, return a view of ndim 2 where the first dimension is flattened out.

breze.learn.data.uncollapse_seq_borders(arr, shape)
Return a view of ndim 3, given an array of ndim 2, where the first dimension is expanded to 2 dimensions of the given shape.

breze.learn.data.skip(X, n, d=1)
Return an array X with the same number of rows, but only each n'th block of d consecutive columns is kept.

Crude way of reducing the dimensionality of time series.

breze.learn.data.interleave(lst)
Given a list of arrays, interleave the arrays in a way that the first dimension represents the first dimension of every array.

This is useful for time series, where multiple time series should be processed in a single sweep.

breze.learn.data.uninterleave(lst)
Given an array of interleaved arrays, return an uninterleaved version of it.

breze.learn.data.interpolate(X, n_intermediates, kind='linear')
Given an array of shape (j, k), return an array of size (j * n_intermediates, k) where each i * n_intermediates element refers to the i'th element in X while all the others are linearly interpolated.

breze.learn.data.windowify(X, size, offset=1)
Return a static array that represents a sliding window dataset of size size given by the list of arrays X.

breze.learn.data.iter_windows(X, size, offset=1)
Return an iterator that goes over a sequential dataset with a sliding time window.

X is expected to be a list of arrays, where each array represents a sequence along its first axis.
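A sliding window over a list of sequences can be sketched with a short generator. The two functions below are hypothetical re-implementations for illustration. That windows never span two sequences, and that incomplete windows at the end of a sequence are dropped, are assumptions about the behaviour of iter_windows and windowify.

```python
import numpy as np

def iter_windows_sketch(X, size, offset=1):
    """Yield windows of `size` time steps, moving `offset` steps each time."""
    for seq in X:
        for start in range(0, seq.shape[0] - size + 1, offset):
            yield seq[start:start + size]

def windowify_sketch(X, size, offset=1):
    """Collect all windows into one array of shape (n_windows, size, d)."""
    return np.array(list(iter_windows_sketch(X, size, offset)))

seqs = [np.arange(10).reshape(5, 2),    # 5 time steps, 2 channels
        np.arange(8).reshape(4, 2)]     # 4 time steps, 2 channels

windows = windowify_sketch(seqs, size=3)              # 3 + 2 windows
strided = windowify_sketch(seqs, size=3, offset=2)    # 2 + 1 windows
```

The iterator variant avoids materialising all windows at once, which matters when the windows overlap heavily and the sequences are long.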

breze.learn.data.split(X, maxlength)
Return a list of sequences where each sequence has a length of at most maxlength.

Given a list of sequences X, the sequences are split accordingly.

breze.learn.data.collapse(X, n)
Return a list of sequences, where n consecutive timesteps have been collapsed into a single timestep by concatenation for each sequence.

Timesteps are cut off to ensure divisibility by n.

breze.learn.data.uncollapse(X, n)
Return a list of sequences, where each timestep is divided into n consecutive timesteps.



breze.learn.data.consecutify(seqs)
Given sequences of equal second dimension, put them into a consecutive memory block M and return it. Also return a list of views to that block that represent the given sequences.

Various utilities

This function was taken from the deeplearning tutorials. The copyright notice is in the source.

Helpers for plotting data

breze.learn.display.scatterplot_matrix(X, C=None, symb='o', alpha=1, fig=None)
Return a figure containing a scatter plot matrix.

This is a useful tool for inspecting multi-dimensional data. Each dimension will be plotted against each other dimension as a scatter plot, arranged into a matrix. The diagonal will contain histograms.


2D array containing the points to plot.

C : array_like

Class labels (optional). Each row of X with the same value in C will be given the same color in the plots.

symb : string

Symbol to use for plotting. Will be forwarded to pylab.plot.

alpha : float

Between 0 and 1. Transparency of the points, where 1 means fully opaque.

fig : matplotlib.pyplot.Figure or None

Figure to plot into. If None, a new figure is created.

breze.learn.display.time_series_filter_plot(filters, n_rows=None, n_cols=None, fig=None)

Plot filters for time series data.

Each filter is plotted into its own axis.

Parameters filters : array_like

The argument filters is expected to be an array of shape (n_filters, window_size, n_channels). n_filters is the number of filter banks, window_size is the length of a time window and n_channels is the number of different sensors.

n_rows : int, optional, default: None

Number of rows for the plot. If not given, inferred from n_cols to match dimensions. If n_cols is not given as well, both are taken to be roughly the square root of the number of filters.

n_cols : int, optional, default: None


http://www.deeplearning.net/


Number of columns for the plot. If not given, inferred from n_rows to match dimensions. If n_rows is not given as well, both are taken to be roughly the square root of the number of filters.

fig : Figure, optional

Figure to plot the axes into. If not given, a new one is created.

Returns figure : matplotlib figure

Figure object to save or plot.

The following function was adapted from the scipy cookbook.

breze.learn.display.hinton(ax, W, max_weight=None)
Draws a Hinton diagram for the matrix W to axis ax.


http://www.scipy.org/Cookbook/Matplotlib/HintonDiagrams



CHAPTER 4

Architectures, Components

Norms

Module containing various norms.

breze.arch.component.norm.l2(arr, axis=None)
Return the L2 norm of a tensor.

Parameters arr : Theano variable.

The variable to calculate the norm of.

axis : integer, optional [default: None]

The sum will be performed along this axis. This makes it possible to calculate the norm of many tensors in parallel, given they are organized along some axis. If not given, the norm will be computed for the whole tensor.

Returns res : Theano variable.

If axis is None, this will be a scalar. Otherwise it will be a tensor with one dimension less, where the missing dimension corresponds to axis.

Examples

>>> v = T.vector()
>>> this_norm = l2(v)

>>> m = T.matrix()
>>> this_norm = l2(m, axis=1)

>>> m = T.matrix()
>>> this_norm = l2(m)

breze.arch.component.norm.l1(arr, axis=None)
Return the L1 norm of a tensor.









Examples

>>> v = T.vector()
>>> this_norm = l1(v)

>>> m = T.matrix()
>>> this_norm = l1(m, axis=1)

>>> m = T.matrix()
>>> this_norm = l1(m)

breze.arch.component.norm.soft_l1(inpt, eps=1e-08, axis=None)
Return a “soft” L1 norm of a tensor.

The term “soft” is used because we are using $\sqrt{x^2 + \epsilon}$ in favor of $|x|$, which is not smooth at $x = 0$.



eps : float, optional [default: 1e-8]

Small offset to make the function more smooth.





Examples

>>> v = T.vector()
>>> this_norm = soft_l1(v)

>>> m = T.matrix()
>>> this_norm = soft_l1(m, axis=1)

>>> m = T.matrix()
>>> this_norm = soft_l1(m)
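The effect of eps and of the axis argument can be checked numerically with a NumPy analogue. This sketch assumes that the per-element terms are summed over the given axis, as for the other norms. soft_l1_np is a hypothetical name and NumPy arrays stand in for Theano variables.

```python
import numpy as np

def soft_l1_np(x, eps=1e-8, axis=None):
    """Smooth surrogate of the L1 norm: sum of sqrt(x**2 + eps)."""
    return np.sqrt(x ** 2 + eps).sum(axis=axis)

v = np.array([3.0, -4.0])
m = np.array([[3.0, -4.0],
              [0.0, 0.0]])

whole = soft_l1_np(v)            # close to |3| + |-4| = 7
per_row = soft_l1_np(m, axis=1)  # one value per row, shape (2,)
```

Note that the all-zero row does not evaluate to exactly zero but to 2 * sqrt(eps). This small offset is the price for a function that is differentiable everywhere.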

breze.arch.component.norm.lp(inpt, p, axis=None)
Return the Lp norm of a tensor.





p : Theano variable or float.

Order of the norm.





Examples

>>> v = T.vector()
>>> this_norm = lp(v, .5)

>>> m = T.matrix()
>>> this_norm = lp(m, 3, axis=1)

>>> m = T.matrix()
>>> this_norm = lp(m, 4)

Transfer functions

Module that keeps various transfer functions as used in the context of neural networks.

breze.arch.component.transfer.tanh(inpt)
Tanh activation function.

Parameters inpt : Theano variable

Input to be transformed.

Returns output : Theano variable

Transformed output. Same shape as inpt.

breze.arch.component.transfer.tanhplus(inpt)
Tanh with added linear activation function.

$$f(x) = \tanh(x) + x$$





breze.arch.component.transfer.sigmoid(inpt)
Sigmoid activation function.

$$f(x) = \frac{1}{1 + \exp(-x)}$$







breze.arch.component.transfer.rectifier(inpt)
Rectifier activation function.

$$f(x) = \max(0, x)$$





breze.arch.component.transfer.softplus(inpt)
Soft plus activation function.

Smooth approximation to rectifier.

$$f(x) = \log(1 + \exp(x))$$





breze.arch.component.transfer.softsign(inpt)
Softsign activation function.

$$f(x) = \frac{x}{1 + |x|}$$





breze.arch.component.transfer.softmax(inpt)
Softmax activation function.

$$f(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

Here, the index runs over the columns of inpt.

Numerically stable version that subtracts the maximum of each row from all of its entries.

Wrapper for theano.nnet.softmax.


Array of shape (n, d). Input to be transformed.
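Why subtracting the row maximum helps can be seen in a NumPy analogue. The shift leaves the result unchanged mathematically, since the factor exp(-max) cancels in the fraction, but it keeps the exponentials from overflowing. softmax_np is a hypothetical name used for this sketch only.

```python
import numpy as np

def softmax_np(inpt):
    """Row-wise softmax with the usual max-subtraction trick."""
    shifted = inpt - inpt.max(axis=1, keepdims=True)   # largest entry becomes 0
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

x = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1000.0, 1000.0]])   # exp(1000) would overflow
p = softmax_np(x)
```

Each row of p sums to one, and the second row comes out as a uniform distribution instead of nan.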





Loss functions

Module containing several losses usable for supervised and unsupervised training.

A loss is of the form:

def loss(target, prediction, ...):
    ...

The result depends on the exact nature of the loss. Some examples are:

• coordinate wise loss, such as a sum of squares or a Bernoulli cross entropy with a one-of-k target,

• sample wise, such as neighbourhood component analysis.

In the case of coordinate-wise losses, the dimensionality of the result should be the same as that of the predictions and targets. In all other cases, it is important that the sample axis (usually the first axis) stays the same. The individual data points lie along the coordinate axis, which might change to 1.

Some examples of valid shape transformations:

(n, d) -> (n, d)
(n, d) -> (n, 1)

These are not valid:

(n, d) -> (1, d)
(n, d) -> (n,)

For some examples, consult the source code of this module.
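The shape conventions can also be made tangible with NumPy arrays standing in for Theano variables. The sketch below is illustrative only and shows one reduction for each of the four shape transformations listed above.

```python
import numpy as np

n, d = 4, 3
target = np.zeros((n, d))
prediction = np.ones((n, d))

# Valid: coordinate-wise loss, shape (n, d) -> (n, d).
coordinate_wise = (target - prediction) ** 2

# Valid: sample-wise loss that keeps the sample axis, (n, d) -> (n, 1).
sample_wise = coordinate_wise.sum(axis=1, keepdims=True)

# Not valid: the sample axis is reduced, (n, d) -> (1, d).
reduced_samples = coordinate_wise.sum(axis=0, keepdims=True)

# Not valid: the coordinate axis is dropped entirely, (n, d) -> (n,).
dropped_axis = coordinate_wise.sum(axis=1)
```

Keeping the coordinate axis with length one, rather than dropping it, means that downstream code can always rely on the sample axis being the first of two axes.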

breze.arch.component.loss.squared(target, prediction)
Return the element-wise squared loss between the target and the prediction.

Parameters target : Theano variable

An array of arbitrary shape representing the targets.

prediction : Theano variable

An array of arbitrary shape representing the predictions.

Returns res : Theano variable

An array of the same shape as target and prediction representing the pairwise distances.

breze.arch.component.loss.absolute(target, prediction)
Return the element-wise absolute difference between the target and the prediction.


An array of arbitrary shape representing the targets.


An array of arbitrary shape representing the predictions.


An array of the same shape as target and prediction representing the pairwise distances.

breze.arch.component.loss.cat_ce(target, prediction, eps=1e-08)
Return the cross entropy between the target and the prediction, where prediction is a summary of the statistics of a categorical distribution and target is some outcome.



Used for multiclass classification purposes.

The loss differs from ncat_ce in that target is not an array of integers but a one-hot (one-of-k) coding.

Note that predictions are clipped between eps and 1 - eps to ensure numerical stability.


An array of shape (n, k) where n is the number of samples and k is the number of classes. Each row represents a one-hot (one-of-k) coding. It should be zero except for one element, which has to be exactly one.


An array of shape (n, k). Each row is interpreted as a categorical probability. Thus, each row has to sum up to one and be non-negative.


An array of the same size as target and prediction representing the pairwise divergences.

breze.arch.component.loss.ncat_ce(target, prediction)
Return the cross entropy between the target and the prediction, where prediction is a summary of the statistics of the categorical distribution and target is some outcome.

Used for classification purposes.

The loss is different to cat_ce by that target is not a hot k coding but an array of integers.


An array of shape (n,) where n is the number of samples. Each entry of the arrayshould be an integer between 0 and k-1, where k is the number of classes.


An array of shape (n, k) or (t, n , k). Each row (i.e. entry in the last dimen-sion) is interpreted as a categorical probability. Thus, each row has to sum up to oneand be non-negative.


An array of shape (n, 1) as target containing the log probability that that exampleis classified correctly.
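The relation between the two categorical cross entropies can be sketched in plain Python. This is an illustration under assumptions, not the library code: the helper one_hot is hypothetical, the sketch returns a flat list of negative log probabilities rather than an (n, 1) array, and only cat_ce applies the clipping described above.

```python
import math


def cat_ce(target, prediction, eps=1e-8):
    """Element-wise cross entropy for hot k coded targets of shape (n, k)."""
    res = []
    for t_row, p_row in zip(target, prediction):
        # Clip predictions into [eps, 1 - eps] before taking the logarithm.
        clipped = [min(max(p, eps), 1 - eps) for p in p_row]
        res.append([-t * math.log(p) for t, p in zip(t_row, clipped)])
    return res


def ncat_ce(target, prediction):
    """Negative log probability of the integer class of each sample."""
    return [-math.log(p_row[t]) for t, p_row in zip(target, prediction)]


def one_hot(labels, k):
    """Hypothetical helper: turn integer labels into a hot k coding."""
    return [[1.0 if j == c else 0.0 for j in range(k)] for c in labels]


prediction = [[0.2, 0.5, 0.3], [0.7, 0.2, 0.1]]
labels = [1, 0]
per_sample_hot = [sum(row) for row in cat_ce(one_hot(labels, 3), prediction)]
per_sample_int = ncat_ce(labels, prediction)
```

Summing the cat_ce result over the class axis gives the same per-sample values as ncat_ce, as long as no prediction is affected by the clipping.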

breze.arch.component.loss.bern_ces(target, prediction)

Return the Bernoulli cross entropies between binary vectors target and a number of Bernoulli variables prediction.

Used in regression on binary variables, not classification.

Parameters target : Theano variable

An array of shape (n, k) where n is the number of samples and k is the number of outputs. Each entry should be either 0 or 1.

prediction : Theano variable

An array of shape (n, k). Each row is interpreted as a set of statistics of Bernoulli variables. Thus, each element has to lie in (0, 1).





breze.arch.component.loss.bern_bern_kl(X, Y)

Return the Kullback-Leibler divergence between Bernoulli variables represented by their sufficient statistics.

Parameters X : Theano variable

An array of arbitrary shape where each element represents the statistic of a Bernoulli variable and thus should lie in (0, 1).

Y : Theano variable

An array of the same shape as X where each element represents the statistic of a Bernoulli variable and thus should lie in (0, 1).
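The two Bernoulli quantities, bern_ces and bern_bern_kl, can be written out with the textbook formulas. The sketch below works on nested lists and assumes that the divergence is taken in the direction KL(X || Y); the source does not state the direction explicitly.

```python
import math


def bern_ces(target, prediction):
    """Element-wise Bernoulli cross entropy for arrays of shape (n, k)."""
    return [[-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]


def bern_bern_kl(X, Y):
    """Element-wise KL(X || Y) between Bernoulli variables (assumed direction)."""
    return [[x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))
             for x, y in zip(x_row, y_row)]
            for x_row, y_row in zip(X, Y)]


ces = bern_ces([[1, 0]], [[0.5, 0.5]])
kl_same = bern_bern_kl([[0.5]], [[0.5]])
kl_diff = bern_bern_kl([[0.5]], [[0.25]])
```

The divergence is zero when both arguments agree and positive otherwise.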



breze.arch.component.loss.ncac(target, embedding)

Return the NCA for classification loss.

This corresponds to the probability that a point is correctly classified with a soft kNN classifier using leave-one-out. Each neighbour is weighted according to an exponential of its negative Euclidean distance. Afterwards, a probability is calculated for each class depending on the weights of the neighbours. For details, we refer you to

‘Neighbourhood Component Analysis’ by J Goldberger, S Roweis, G Hinton, R Salakhutdinov (2004).

Parameters target : Theano variable

An array of shape (n,) where n is the number of samples. Each entry of the array should be an integer between 0 and k - 1, where k is the number of classes.

embedding : Theano variable

An array of shape (n, d) where each row represents a point in d-dimensional space.

Returns res : Theano variable

Array of shape (n, 1) holding a probability that a point is classified correctly.
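The leave-one-out computation described above can be sketched in plain Python. The function name ncac_probabilities is hypothetical, and the sketch follows the wording of this entry by weighting neighbours with the exponential of the negative Euclidean distance; whether the library uses the plain or the squared distance is not confirmed here.

```python
import math


def ncac_probabilities(target, embedding):
    """Probability that each point is classified correctly by soft kNN."""
    n = len(embedding)
    probs = []
    for i in range(n):
        same_class, total = 0.0, 0.0
        for j in range(n):
            if i == j:
                continue  # leave-one-out: a point is not its own neighbour
            dist = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(embedding[i], embedding[j])))
            weight = math.exp(-dist)
            total += weight
            if target[j] == target[i]:
                same_class += weight
        probs.append(same_class / total)
    return probs


embedding = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
target = [0, 0, 1, 1]
probs = ncac_probabilities(target, embedding)
```

With two well separated clusters, every point is almost certainly assigned to its own class.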

breze.arch.component.loss.ncar(target, embedding)

Return the NCA for regression loss.

This is similar to NCA for classification, except that not soft KNN classification but regression performance is maximized. (Actually, the negative performance is minimized.)

For details, we refer you to

‘Pose-sensitive embedding by nonlinear nca regression’ by Taylor, G. and Fergus, R. and Williams, G. and Spiro, I. and Bregler, C. (2010)

Parameters target : Theano variable

An array of shape (n, d) where n is the number of samples and d the dimensionality of the target space.

embedding : Theano variable

An array of shape (n, d) where each row represents a point in d-dimensional space.

Returns res : Theano variable

Array of shape (n, 1).



breze.arch.component.loss.drlim(push_margin, pull_margin, c_contrastive, push_loss='squared', pull_loss='squared')

Return a function that implements the loss from

‘Dimensionality reduction by learning an invariant mapping’ by Hadsell, R. and Chopra, S. and LeCun, Y. (2006).

For an example of such a function, see drlim1 with a margin of 1.

Parameters push_margin : Float

The minimum margin that negative pairs should be separated by. Pairs separated by higher distance than push_margin will not contribute to the loss.

pull_margin : Float

The maximum margin that positive pairs may be separated by. Pairs separated by lower distances do not contribute to the loss.

c_contrastive : Float

Coefficient to weigh the contrastive term relative to the positive term

push_loss : One of {‘squared’, ‘absolute’}, optional, default: ‘squared’

Loss to encourage Euclidean distances between non pairs.

pull_loss : One of {‘squared’, ‘absolute’}, optional, default: ‘squared’

Loss to punish Euclidean distances between pairs.

Returns loss : callable

Function that takes two arguments, a target and an embedding.

Stochastic Corruption of Theano Variables

This module contains functionality to corrupt Theano variables with noise.

breze.arch.component.corrupt.gaussian_perturb(arr, std, rng=None)

Return a Theano variable which is perturbed by additive zero-centred Gaussian noise with standard deviation std.

Parameters arr : Theano variable

Array of some shape n.

std : float or scalar Theano variable

Standard deviation of the Gaussian noise.

rng : Theano random number generator, optional [default: None]

Generator to draw random numbers from. If None, rng will be instantiated on the spot.


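Returns res : Theano variable

Of shape n.

Examples

>>> import theano.tensor as T
>>> from breze.arch.component.corrupt import gaussian_perturb
>>> m = T.matrix()
>>> c = gaussian_perturb(m, 0.1)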

breze.arch.component.corrupt.mask(arr, p, rng=None)

Return a Theano variable in which each element of arr is set to zero with probability p.

Parameters arr : Theano variable

Array of some shape n.

p : float or scalar Theano variable

Probability that a unit is set to zero.

rng : Theano random number generator, optional [default: None]

Generator to draw random numbers from. If None, rng will be instantiated on the spot.


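Returns res : Theano variable

Of shape n.

Examples

>>> import theano.tensor as T
>>> from breze.arch.component.corrupt import mask
>>> m = T.matrix()
>>> c = mask(m, 0.1)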

Miscellaneous functionality

Module holding miscellaneous functionality.

breze.arch.component.misc.pairwise_diff(X, Y=None)

Given two arrays with samples in the rows, compute the pairwise differences.

Parameters X : Theano variable

Has shape (n, d). Contains one item per first dimension.

Y : Theano variable, optional [default: None]

Has shape (m, d). If not given, defaults to X.


Returns res : Theano variable

Has shape (n, d, m).

breze.arch.component.misc.distance_matrix(X, Y=None, norm=<function l2>)

Return an expression containing the distances, as measured by norm, between the samples of up to two arrays.

Parameters X : Theano variable

Has shape (n, d). Contains one item per first dimension.

Y : Theano variable, optional [default: None]

Has shape (m, d). If not given, defaults to X.

norm : string or callable

Either a string pointing at a function in breze.arch.component.norm or a function that has the same signature as these.

Returns res : Theano variable

Has shape (n, m).

breze.arch.component.misc.distance_matrix_by_diff(diff, norm=<function l2>)

Return an expression containing the distances, as measured by norm, given an array of pairwise differences between samples.

Parameters diff : Theano variable

Has shape (n, d, m) and represents differences between two collections of the same set.

norm : string or callable

Either a string pointing at a function in breze.arch.component.norm or a function that has the same signature as these.

Returns res : Theano variable

Has shape (n, m).
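The shapes involved in pairwise_diff and the distance functions are easier to see in a small example. The sketch below uses nested lists and makes two assumptions that the source does not confirm: the sign convention diff[i][k][j] = X[i][k] - Y[j][k], and a plain Euclidean norm taken over the middle axis.

```python
import math


def pairwise_diff(X, Y=None):
    """Return differences of shape (n, d, m) for X of (n, d) and Y of (m, d)."""
    Y = X if Y is None else Y
    d = len(X[0])
    return [[[x[k] - y[k] for y in Y] for k in range(d)] for x in X]


def distance_matrix_by_diff(diff):
    """Reduce the middle (d) axis with the Euclidean norm, giving (n, m)."""
    n, d, m = len(diff), len(diff[0]), len(diff[0][0])
    return [[math.sqrt(sum(diff[i][k][j] ** 2 for k in range(d)))
             for j in range(m)]
            for i in range(n)]


X = [[0.0, 0.0], [3.0, 4.0]]
diff = pairwise_diff(X)
dist = distance_matrix_by_diff(diff)
```

Calling distance_matrix(X, Y) would then amount to chaining the two steps.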

breze.arch.component.misc.cat_entropy(arr)

Return the entropy of categorical distributions described by the rows in arr.

Parameters arr : Theano variable

Array of shape (n, d) describing n different categorical variables. Rows need to sum up to 1 and be non-negative.

Returns res : Theano variable

Has shape (n,).
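A minimal sketch of the row-wise entropy, assuming the natural logarithm and the convention that entries equal to zero contribute nothing. How the library treats exact zeros is not stated in this entry.

```python
import math


def cat_entropy(arr):
    """Entropy of each row of arr, where each row is a categorical distribution."""
    # Skipping p == 0 implements the convention 0 * log(0) = 0 (an assumption).
    return [-sum(p * math.log(p) for p in row if p > 0) for row in arr]


entropies = cat_entropy([[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]])
```

The uniform distribution over four outcomes attains the maximum log(4), while a deterministic outcome has entropy zero.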

breze.arch.component.misc.project_into_l2_ball(arr, radius=1)

Return arr projected into the L2 ball.

Parameters arr : Theano variable

Array of shape either (n, d) or (d,). If the former, all rows are projected individually.

radius : float, optional [default: 1]

Returns res : Theano variable

Projected result of the same shape as arr.
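Projection into the L2 ball is commonly done by rescaling every vector whose norm exceeds the radius and leaving the others untouched. The following sketch shows that rule on plain lists; it is an assumption that the library handles the boundary case in the same way.

```python
import math


def project_into_l2_ball(arr, radius=1.0):
    """Project a vector (d,) or every row of a matrix (n, d) into the L2 ball."""
    def project_row(row):
        norm = math.sqrt(sum(v * v for v in row))
        if norm <= radius:
            return list(row)  # already inside the ball
        return [v * radius / norm for v in row]

    if arr and isinstance(arr[0], (list, tuple)):
        return [project_row(row) for row in arr]
    return project_row(arr)


single = project_into_l2_ball([3.0, 4.0])
rows = project_into_l2_ball([[0.3, 0.4], [3.0, 4.0]])
```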

Layers

Module that contains various layer like components.

breze.arch.component.layer.simple(inpt, weights, bias, out_transfer, p_dropout=0, prefix='')

Return a dictionary containing computations from a simple layer.

The layer has the following form

$f((x \cdot d)^T W + b)$,

where $f$ corresponds to transfer, $x$ to input, $\cdot$ indicates the element-wise product, $d$ is a vector of Bernoulli samples with parameter p_dropout, $W$ is the weight matrix weights and $b$ is the bias.

Parameters inpt : Theano variable

Array of shape (n, d).

weights : Theano variable

Array of shape (d, e).

bias : Theano variable

Array of shape (e,).

transfer : function or string

If a function, it should, given a Theano variable, return a Theano variable of the same shape. If a string, it is used to get a transfer function from breze.arch.component.transfer.

p_dropout : Theano scalar or float

Needs to be in (0, 1). Indicates the probability that an input is set to zero.

prefix : string, optional [default: ‘’]

Each entry in the returned dictionary will be prefixed with this.

Returns d : dict

Has the following entries: output_in, the activation before application of transfer; output, the activation after application of transfer.
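The layer formula can be traced with a small plain-Python sketch. The function simple_layer below is a hypothetical stand-in working on nested lists, and it assumes that each input is zeroed independently with probability p_dropout; it only mirrors the two dictionary entries named above.

```python
import random


def simple_layer(inpt, weights, bias, transfer, p_dropout=0.0, prefix='',
                 rng=None):
    """Compute f((x * d) W + b) for inpt (n, d), weights (d, e), bias (e,)."""
    rng = rng or random.Random(0)
    n_out = len(bias)
    output_in = []
    for x in inpt:
        # Dropout mask: each input is set to zero with probability p_dropout.
        kept = [0.0 if rng.random() < p_dropout else xi for xi in x]
        output_in.append([
            sum(kept[i] * weights[i][j] for i in range(len(kept))) + bias[j]
            for j in range(n_out)])
    output = [[transfer(a) for a in row] for row in output_in]
    return {prefix + 'output_in': output_in, prefix + 'output': output}


def rectifier(a):
    return max(0.0, a)


layer = simple_layer([[1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]], [0.5, -3.0],
                     rectifier, prefix='l-')
```

With p_dropout left at 0, the result is deterministic: the pre-activation is [[1.5, -1.0]] and the rectified output is [[1.5, 0.0]].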

Common functions

Module that contains functionality common to many other modules.

breze.arch.component.common.supervised_loss(target, prediction, loss, coord_axis=1, imp_weight=False, prefix='')

Return a dictionary populated with several expressions for a supervised loss and corresponding targets and predictions.

Parameters target : Theano variable

Array representing the target variables.

prediction : Theano variable

Array representing the predictions.

loss : callable or string

If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form described in breze.arch.component.loss.

coord_axis : integer, optional [default: 1]

Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or some spatial axis.

prefix : string, optional [default: '']

Each key in the resulting dictionary will be prefixed with prefix.

imp_weight : Theano variable, float or boolean, optional [default: False]

Importance weights for the loss. Will be multiplied to the coordinate wise loss.


Returns res : dict

Dictionary containing the expressions. See example for keys.

Examples

>>> import theano.tensor as T
>>> prediction, target = T.matrix('prediction'), T.matrix('target')
>>> from breze.arch.component.loss import squared
>>> loss_dict = supervised_loss(target, prediction, squared,
...     prefix='mymodel-')
>>> sorted(loss_dict.items())
[('mymodel-loss', ...), ('mymodel-loss_coord_wise', ...), ('mymodel-loss_sample_wise', ...), ('mymodel-prediction', prediction), ('mymodel-target', target)]
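To make the meaning of the returned keys more concrete, here is a plain-Python sketch of one plausible aggregation. It assumes that the sample-wise loss is the sum of the coordinate-wise loss along the coordinate axis and that the total loss is the mean over samples; the source names the keys but does not spell out these reductions, and importance weights are left out.

```python
def squared(target, prediction):
    return [[(t - p) ** 2 for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]


def supervised_loss_sketch(target, prediction, loss, prefix=''):
    """Build a dictionary of loss expressions for arrays of shape (n, d)."""
    coord_wise = loss(target, prediction)            # shape (n, d)
    sample_wise = [sum(row) for row in coord_wise]   # assumed: sum over coords
    total = sum(sample_wise) / len(sample_wise)      # assumed: mean over samples
    return {
        prefix + 'target': target,
        prefix + 'prediction': prediction,
        prefix + 'loss_coord_wise': coord_wise,
        prefix + 'loss_sample_wise': sample_wise,
        prefix + 'loss': total,
    }


loss_dict = supervised_loss_sketch([[1.0, 2.0], [0.0, 0.0]],
                                   [[0.0, 2.0], [1.0, 1.0]],
                                   squared, prefix='mymodel-')
```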

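breze.arch.component.common.unsupervised_loss(output, loss, coord_axis=1, prefix='')

Return a dictionary populated with several expressions for an unsupervised loss and corresponding output.

Parameters output : Theano variable

Array representing the predictions.

loss : callable or string

If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form described in breze.arch.component.loss.

coord_axis : integer, optional [default: 1]

Axis along which the coordinates of a single sample are stored, i.e. not the sample axis or some spatial axis.

prefix : string, optional [default: '']

Each key in the resulting dictionary will be prefixed with prefix.

Returns res : dict

Dictionary containing the expressions. See example for keys.

Examples

>>> import theano.tensor as T
>>> output = T.matrix('output')
>>> my_loss = lambda x: abs(x)
>>> loss_dict = unsupervised_loss(output, my_loss, prefix='$')
>>> sorted(loss_dict.items())
[('$loss', ...), ('$loss_coord_wise', ...), ('$loss_sample_wise', ...), ('$output', ...)]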

Univariate Normal Distribution

breze.arch.component.distributions.normal.pdf(sample, location=0, scale=1)

Return a theano expression representing the values of the probability density function of a Gaussian distribution.

Parameters sample : Theano variable

Array of shape (n,) where n is the number of samples.

location : Theano variable



Scalar representing the mean of the distribution.

scale : Theano variable

Scalar representing the standard deviation of the distribution.

Returns l : Theano variable

Array of shape (n,) where each entry represents the density of the corresponding sample.

Examples

>>> import theano
>>> import theano.tensor as T
>>> import numpy as np
>>> from breze.learn.utils import theano_floatx
>>> from breze.arch.component.distributions.normal import pdf
>>> sample, mean, std = T.vector(), T.scalar(), T.scalar()
>>> p = pdf(sample, mean, std)
>>> f_p = theano.function([sample, mean, std], p)

>>> X, = theano_floatx(np.array([-1, 0, 1]))
>>> ps = f_p(X, 0.1, 1.2)
>>> np.allclose(ps, [0.21840613, 0.33129956, 0.25094786])
True

breze.arch.component.distributions.normal.cdf(sample, location=0, scale=1)

Return a theano expression representing the values of the cumulative distribution function of a Gaussian distribution.

Parameters sample : Theano variable

Array of shape (n,) where n is the number of samples.

location : Theano variable

Scalar representing the mean of the distribution.

scale : Theano variable

Scalar representing the standard deviation of the distribution.


Returns l : Theano variable

Array of shape (n,) where each entry represents the cumulative probability of the corresponding sample.

Examples

>>> import theano
>>> import theano.tensor as T
>>> import numpy as np
>>> from breze.learn.utils import theano_floatx
>>> from breze.arch.component.distributions.normal import cdf
>>> sample, mean, std = T.vector(), T.scalar(), T.scalar()
>>> c = cdf(sample, mean, std)
>>> f_c = theano.function([sample, mean, std], c)

>>> X, = theano_floatx(np.array([-1, 0, 1]))
>>> cs = f_c(X, 0.1, 1.2)
>>> np.allclose(cs, [0.17965868, 0.46679324, 0.77337265])
True
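For readers without a Theano installation, the numbers in the two examples above can be reproduced with the closed-form Gaussian density and the error function from the standard library. This is a sketch of the mathematics only, not of the library code.

```python
import math


def pdf(sample, location=0.0, scale=1.0):
    """Gaussian density of each entry in sample."""
    norm = scale * math.sqrt(2 * math.pi)
    return [math.exp(-0.5 * ((x - location) / scale) ** 2) / norm
            for x in sample]


def cdf(sample, location=0.0, scale=1.0):
    """Gaussian cumulative distribution function of each entry in sample."""
    return [0.5 * (1 + math.erf((x - location) / (scale * math.sqrt(2))))
            for x in sample]


ps = pdf([-1, 0, 1], 0.1, 1.2)
cs = cdf([-1, 0, 1], 0.1, 1.2)
```

The values agree with the doctest outputs up to floating point precision.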



Multivariate Normal Distribution

Module containing expression builders for the multivariate normal.

breze.arch.component.distributions.mvn.pdf(sample, mean, cov)

Return a theano expression representing the values of the probability density function of the multivariate normal.

Parameters sample : Theano variable

Array of shape (n, d) where n is the number of samples and d the dimensionality of the data.

mean : Theano variable

Array of shape (d,) representing the mean of the distribution.

cov : Theano variable

Array of shape (d, d) representing the covariance of the distribution.


Returns l : Theano variable

Array of shape (n,) where each entry represents the density of the corresponding sample.

Examples

>>> import theano
>>> import theano.tensor as T
>>> import numpy as np
>>> from breze.learn.utils import theano_floatx
>>> from breze.arch.component.distributions.mvn import pdf
>>> sample = T.matrix('sample')
>>> mean = T.vector('mean')
>>> cov = T.matrix('cov')
>>> p = pdf(sample, mean, cov)
>>> f_p = theano.function([sample, mean, cov], p)

>>> mu = np.array([-1, 1])
>>> sigma = np.array([[.9, .4], [.4, .3]])
>>> X = np.array([[-1, 1], [1, -1]])
>>> mu, sigma, X = theano_floatx(mu, sigma, X)
>>> ps = f_p(X, mu, sigma)
>>> np.allclose(ps, [4.798702e-01, 7.73744047e-17])
True

breze.arch.component.distributions.mvn.logpdf(sample, mean, cov)

Return a theano expression representing the values of the log probability density function of the multivariate normal.

Parameters sample : Theano variable

Array of shape (n, d) where n is the number of samples and d the dimensionality of the data.

mean : Theano variable

Array of shape (d,) representing the mean of the distribution.

cov : Theano variable

Array of shape (d, d) representing the covariance of the distribution.




Array of shape (n,) where each entry represents the log density of the corresponding sample.

Examples

>>> import theano
>>> import theano.tensor as T
>>> import numpy as np
>>> from breze.learn.utils import theano_floatx
>>> from breze.arch.component.distributions.mvn import logpdf
>>> sample = T.matrix('sample')
>>> mean = T.vector('mean')
>>> cov = T.matrix('cov')
>>> p = logpdf(sample, mean, cov)
>>> f_p = theano.function([sample, mean, cov], p)

>>> mu = np.array([-1, 1])
>>> sigma = np.array([[.9, .4], [.4, .3]])
>>> X = np.array([[-1, 1], [1, -1]])
>>> mu, sigma, X = theano_floatx(mu, sigma, X)
>>> ps = f_p(X, mu, sigma)
>>> np.allclose(ps, np.log([4.798702e-01, 7.73744047e-17]))
True
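The values in the two multivariate examples can be checked by hand for the two-dimensional case, where the covariance matrix has a closed-form inverse. The helper names below are hypothetical, and the sketch is restricted to d = 2 to stay within the standard library.

```python
import math


def mvn_logpdf_2d(sample, mean, cov):
    """Log density of a 2-D normal for every row of sample."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # log of the normalisation constant 1 / (2 * pi * sqrt(det))
    log_norm = -0.5 * (2 * math.log(2 * math.pi) + math.log(det))
    res = []
    for x in sample:
        dx = [x[0] - mean[0], x[1] - mean[1]]
        quad = sum(dx[i] * inv[i][j] * dx[j]
                   for i in range(2) for j in range(2))
        res.append(log_norm - 0.5 * quad)
    return res


def mvn_pdf_2d(sample, mean, cov):
    return [math.exp(l) for l in mvn_logpdf_2d(sample, mean, cov)]


ps = mvn_pdf_2d([[-1, 1], [1, -1]], [-1, 1], [[.9, .4], [.4, .3]])
```

Working in the log domain first, as logpdf does, avoids underflow for samples far from the mean.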

Utilities

class breze.arch.util.Model

Model class.

Intended as a base class for parameterized models providing a convenience method for compilation and a common interface.

We partition Theano variables for parametrized models into three groups: (1) the adaptable parameters, (2) external variables such as inputs and targets, i.e. the data, and (3) expressions composed out of the two, such as the prediction of a model or the loss resulting from those.

There are several “reserved” names for expressions.

•inpt: observations of a supervised or unsupervised model,

•target: desired outputs of a supervised model,

•loss: quantity to be optimized for fitting the parameters; might not refer to the criterion of interest, but instead to a regularized objective.

•true_loss: quantity of interest for the user, e.g. the loss without regularization or the empirical risk.

Overriding these names is possible in general, but they are part of the interface, and doing so will lead to unexpected behaviour with functionality building upon this.

Lookup of variables and expressions is typically done in the following ways.

•as the variable/expression itself,

•as a string which is the attribute/key to look for in the ParameterSet object/expression dictionary,

•as a path along these, e.g. the tuple ('foo', 'bar', 0) will identify .parameters.foo.bar[0] or .parameters['foo']['bar'][0] depending on the context.
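The path-style lookup in the last item can be illustrated with a small helper. The function resolve_path is hypothetical and not part of breze; it simply tries item access first and falls back to attribute access, which covers both forms mentioned above.

```python
import types


def resolve_path(root, path):
    """Follow a tuple of keys, indices or attribute names starting at root."""
    current = root
    for step in path:
        try:
            current = current[step]              # dictionaries, lists, tuples
        except (TypeError, KeyError, IndexError):
            current = getattr(current, step)     # objects with attributes
    return current


as_dict = {'foo': {'bar': [7, 8]}}
as_object = types.SimpleNamespace(foo=types.SimpleNamespace(bar=[7, 8]))

from_dict = resolve_path(as_dict, ('foo', 'bar', 0))
from_object = resolve_path(as_object, ('foo', 'bar', 0))
```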

Attributes

pars (ParameterSet object) Holding the adaptable parameters of the object.

exprs (dictionary) Containing the expressions. Out of convenience, the external variables are held in here as well.

updates (dict) Containing update variables, e.g. due to the use of theano.scan.

Methods

function

var_exp_for_gpu

function(variables, exprs, mode=None, explicit_pars=False, givens=None, on_unused_input='raise', numpy_result=False)

Return a compiled function for the given exprs given variables.

Parameters variables : list of strings

Each string refers to an item in .exprs and is considered an input to the function.

exprs : (List of) Theano expression or string

Expressions for which to create the function. If a single expression is given, the function will return a single value; if a list is given, the result will be a tuple containing one element for each. An expression can either be a Theano expression or a string. In the latter case, the corresponding expression will be retrieved from .exprs.

mode : string or None, optional, default: None

Mode to use for compilation. Passed on to theano.function. See Theano documentation for details. If None, self.mode will be used.

explicit_pars : boolean, optional, default: False

If True, the first argument to the function is expected to be an array representing the adaptable parameters of the model.

givens : dictionary, optional, default: None

Dictionary of substitutions for compilation. Not passed on to theano.function; instead the expressions are cloned. See code for further details.

on_unused_input : string

Specify behaviour in case of unused inputs. Passed on to theano.function. See Theano documentation for details.

numpy_result : boolean, optional, default: False

If set to True, a numpy array is always returned, even if the computation is done on the GPU and a gnumpy array would be more natural.

var_exp_for_gpu(variables, exprs, outputs=True)

Given variables and theano expressions built from these variables, return variables and expressions of the same form that are tailored towards GPU usage.

class breze.arch.util.ParameterSet

ParameterSet class.

This class provides functionality to group several Theano tensors of different sizes in a consecutive chunk of memory. The main aim of this is to allow a view on several tensors as a single long vector.

In the following, a (parameter) array refers to a concrete instantiation of a parameter variable (with concrete values) while a (parameter) tensor/variable refers to the symbolic Theano variable.

Initialization takes a variable amount of keyword arguments, where each has to be a single integer or a tuple of arbitrary length containing only integers. For each of the keyword argument keys a tensor of the shape given by the value will be created. The key is the identifier of that variable.

All symbolic variables can be accessed as attributes of the object, all concrete variables as keys. E.g. parameter_set.x references the symbolic variable, while parameter_set['x'] will give you the concrete array.
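The idea of viewing several tensors as one long vector boils down to bookkeeping of offsets. The sketch below computes such a layout from keyword arguments; the function layout and the ordering of the entries (keyword order) are assumptions for illustration, not the behaviour of ParameterSet itself.

```python
def layout(**shapes):
    """Map each name to (start, stop, shape) within one flat vector."""
    offset = 0
    slices = {}
    for name, shape in shapes.items():
        if isinstance(shape, int):
            shape = (shape,)          # a single integer means a vector
        size = 1
        for dim in shape:
            size *= dim
        slices[name] = (offset, offset + size, shape)
        offset += size
    return offset, slices


n_pars, slices = layout(weights=(2, 3), bias=3)
flat = list(range(n_pars))            # stands in for the concrete data array
start, stop, shape = slices['bias']
bias_values = flat[start:stop]
```

An optimizer can then work on the flat vector alone, while the model reads each parameter through its slice.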

Attributes

n_pars (integer) Total amount of parameters.

flat (Theano vector) Flat one dimensional tensor containing all the different tensors flattened out. Symbolic counterpart to data.

data (array_like) Concrete array containing all the different arrays flattened out. Concrete counterpart to flat.

views (dict) All parameter arrays can be accessed with their identifier as key in this dictionary.

Methods

alloc

declare

view

Nested Lists for Theano, etc.

breze.arch.util.flatten(nested)

Flatten nested tuples and/or lists into a flat list.

breze.arch.util.unflatten(tmpl, flat)

Nest the items in flat into the shape of tmpl.
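A minimal sketch of the two helpers, written from their one-line descriptions. It assumes depth-first, left-to-right ordering and that unflatten reproduces the container types of the template; neither detail is stated in the source.

```python
def flatten(nested):
    """Flatten nested tuples and/or lists into a flat list."""
    flat = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def unflatten(tmpl, flat):
    """Nest the items in flat into the shape of tmpl."""
    items = iter(flat)

    def build(t):
        if isinstance(t, (list, tuple)):
            return type(t)(build(sub) for sub in t)
        return next(items)

    return build(tmpl)


nested = [1, (2, [3, 4]), 5]
flat = flatten(nested)
renested = unflatten(nested, ['a', 'b', 'c', 'd', 'e'])
```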

breze.arch.util.theano_function_with_nested_exprs(variables, exprs, *args, **kwargs)

Creates and returns a theano.function that takes values for variables as arguments, where variables may contain nested lists and/or tuples, and returns values for exprs, where again exprs may contain nested lists and/or tuples.

All other arguments are passed to theano.function without modification.

breze.arch.util.theano_expr_bfs(expr)

Generator function to walk a Theano expression graph in breadth-first order.

breze.arch.util.tell_deterministic(expr)

Return True iff no random number generator is in the expression graph.



GPU related utilities

breze.arch.util.cpu_tensor_to_gpu(tensor)

Given a tensor for the CPU return a tensor of the same type and name for the GPU.

breze.arch.util.cpu_tensor_to_gpu_nested(inpts, cache=None)

Given a list (of lists of...) CPU tensor variables return a list of the same types of corresponding GPU tensor variables.

Also return a dictionary containing all substitutions done. This can be provided to future calls to not make conversions multiple times.

breze.arch.util.cpu_expr_to_gpu(expr, unsafe=False)

Given a CPU expr return the same expression for the GPU.

If unsafe is set to True, subsequent function calls evaluating the expression might return arrays pointing at the same memory region.

breze.arch.util.cpu_expr_to_gpu_nested(inpts, unsafe=False)

Given a list (of lists of...) expressions, return expressions for the GPU.

If unsafe is set to True, subsequent function calls evaluating the expression might return arrays pointing at the same memory region.

breze.arch.util.garray_to_cudandarray_nested(lst)

breze.arch.util.gnumpy_func_wrap(f)

Wrap a function that accepts and returns CudaNdArrays to accept and return gnumpy arrays.

Other

breze.arch.util.get_named_variables(dct, name=True, overwrite=False, prefix='')

Return a dictionary with all the items from dct with only Theano variables/expressions.

If name is set to True, the variables will be named accordingly, however not be overwritten unless overwrite is True as well.

breze.arch.util.lookup(what, where, default=None)

Return where.what if what is a string, otherwise what. If not found return default.

breze.arch.util.lookup_some_key(what, where, default=None)

Given a list of keys what, return the first of those to which there is an item in where.

If nothing is found, return default.

For variance propagation:

Common functions

breze.arch.component.varprop.common.supervised_loss(target, prediction, loss, coord_axis=1, imp_weight=False, prefix='')

Return a dictionary populated with several expressions for a supervised loss and corresponding targets and predictions.

Version for variance propagation, where the prediction is not only a point but a mean with a variance.

Parameters target : Theano variable

Array representing the target variables. Has size d along the coordinate axis coord_axis.

prediction : Theano variable

Array representing the predictions. Has size 2 * d along the coordinate axis, where the first half corresponds to the mean and the second half to the variance of the prediction.

loss : callable or string

If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form described in breze.arch.component.varprop.loss.



imp_weight : Theano variable, float or boolean, optional [default: False]

Importance weights for the loss. Will be multiplied to the coordinate wise loss.



Returns res : dict


Examples

>>> import theano.tensor as T
>>> prediction, target = T.matrix('prediction'), T.matrix('target')
>>> from breze.arch.component.varprop.loss import diag_gaussian_nll
>>> loss_dict = supervised_loss(target, prediction, diag_gaussian_nll,
...     prefix='mymodel-')
>>> sorted(loss_dict.items())
[('mymodel-loss', ...), ('mymodel-loss_coord_wise', ...), ('mymodel-loss_sample_wise', ...), ('mymodel-prediction', prediction), ('mymodel-target', target)]

breze.arch.component.varprop.common.unsupervised_loss(output, loss, coord_axis=1, prefix='')

Return a dictionary populated with several expressions for an unsupervised loss and corresponding output.

Version for variance propagation, where the prediction is not only a point but a mean with a variance.

Parameters output : Theano variable

Array representing the output of the model. Has size 2 * d along the coordinate axis, where the first half corresponds to the mean and the second half to the variance of the prediction.

loss : callable or string

If a string, should index a member of breze.arch.component.loss. If a callable, has to be of the form described in breze.arch.component.varprop.loss.







Returns res : dict


Examples

>>> import theano.tensor as T
>>> output = T.matrix('output')
>>> my_loss = lambda x: abs(x)
>>> loss_dict = unsupervised_loss(output, my_loss, prefix='$')
>>> sorted(loss_dict.items())
[('$loss', ...), ('$loss_coord_wise', ...), ('$loss_sample_wise', ...), ('$output', ...)]


CHAPTER 5

Implementation Notes

Variance propagation

This package implements variance propagating networks.

If we really want to talk about neural networks in a probabilistic way, the right way to do it is to treat every number in the network as a Dirac distributed value.

There have been numerous attempts to model the adaptable parameters of networks as random variables, leading to so-called “Bayesian Neural Networks”.

In some applications, it makes sense to treat the activations as random variables. This can be done very efficiently and with a very good approximation for the mean and the variance of random variables.

The algorithm for this was first described in [FD] and later described in the context of RNNs in [FD-RNN].
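To give a feeling for what propagating a mean and a variance looks like, the sketch below pushes both through dropout and an affine map, assuming independent inputs. The formulas are the standard moments of a product with a Bernoulli variable and of a weighted sum; the function names and the keep parameter (one minus the dropout probability) are assumptions for this illustration, not the package's API.

```python
def dropout_moments(mean, var, keep):
    """Moments of x * d, where d is Bernoulli with P(d = 1) = keep."""
    out_mean = [keep * m for m in mean]
    out_var = [keep * v + keep * (1 - keep) * m ** 2
               for m, v in zip(mean, var)]
    return out_mean, out_var


def linear_moments(mean, var, weights, bias):
    """Moments of x W + b for independent inputs with given mean and variance."""
    n_in, n_out = len(mean), len(bias)
    out_mean = [sum(mean[i] * weights[i][j] for i in range(n_in)) + bias[j]
                for j in range(n_out)]
    out_var = [sum(var[i] * weights[i][j] ** 2 for i in range(n_in))
               for j in range(n_out)]
    return out_mean, out_var


m, v = dropout_moments([1.0, 2.0], [0.0, 0.0], keep=0.5)
out_mean, out_var = linear_moments(m, v, [[1.0], [1.0]], [0.0])
```

Even though the inputs are deterministic here, dropout alone gives the output a non-zero variance, which is what a variance propagating network keeps track of.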

References

Recurrent Networks

Module implementing variance propagation and fast dropout for recurrent networks.

In this module, we will often deal with multiple sequences organized into a single Theano tensor. This tensor then has the shape of (t, n, d), where

• t is the number of time steps,

• n is the number of samples and

• d is the dimensionality of each sample.

We call such a tensor a “sequence tensor”. Sometimes, it makes sense to flatten out the time dimension to apply better optimized linear algebra, such as a dot product. In that case, we will talk of a “flat sequence tensor”.
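Flattening out the time dimension amounts to a reshape from (t, n, d) to (t * n, d) and back. The sketch below shows this on nested lists; the function names are hypothetical.

```python
def flatten_sequence(seq):
    """Turn a sequence tensor (t, n, d) into a flat sequence tensor (t * n, d)."""
    return [row for step in seq for row in step]


def unflatten_sequence(flat, t, n):
    """Restore the shape (t, n, d) from a flat sequence tensor."""
    return [flat[i * n:(i + 1) * n] for i in range(t)]


# Two time steps, three samples, one dimension per sample.
seq = [[[0.0], [1.0], [2.0]],
       [[3.0], [4.0], [5.0]]]
flat = flatten_sequence(seq)
restored = unflatten_sequence(flat, t=2, n=3)
```

A single matrix product on the flat form then replaces one product per time step.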

breze.arch.model.varprop.rnn.recurrent_layer(in_mean, in_var, weights, f, initial_hidden_mean, initial_hidden_var, p_dropout)

Return a theano variable representing a recurrent layer.

Parameters in_mean : Theano variable

Sequence tensor of shape (t, n, d). Represents the mean of the input to the layer.

in_var : Theano variable

Sequence tensor. Represents the variance of the input to the layer. Either (a) same shape as the mean or (b) scalar.

weights : Theano variable

Theano matrix of shape (d, d). Represents the recurrent weight matrix the hiddens are right multiplied with.

f : function

Function that takes a theano variable and returns a theano variable of the same shape. Meant as transfer function of the layer.

initial_hidden : Theano variable

Theano vector of size d, representing the initial hidden state.

p_dropout : Theano variable

Scalar representing the probability that unit is dropped out.

Returns hidden_in_mean_rec : Theano variable

Theano sequence tensor representing the mean of the hidden activations before the application of f.

hidden_in_var_rec : Theano variable

Theano sequence tensor representing the variance of the hidden activations before the application of f.

hidden_mean_rec : Theano variable

Theano sequence tensor representing the mean of the hidden activations after the application of f.

hidden_var_rec : Theano variable

Theano sequence tensor representing the variance of the hidden activations after the application of f.

Transfer functions

Module that contains transfer functions for variance propagation, working on Theano variables.

Each transfer function has the signature:

m2, s2 = f(m1, s1)

where f is the transfer function, m1 and s1 are the pre-synaptic mean and variance respectively; m2 and s2 are the post-synaptic mean and variance.

breze.arch.component.varprop.transfer.identity(mean, var)

Return the mean and variance unchanged.

Parameters mean : Theano variable

Theano variable of the shape s.

var : Theano variable


Returns mean_ : Theano variable

Theano variable of the shape r.



var_ : Theano variable


breze.arch.component.varprop.transfer.sigmoid(mean, var)

Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a logistic sigmoid.









breze.arch.component.varprop.transfer.rectifier(mean, var)

Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a rectified linear unit.
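For the rectifier, the post-synaptic moments have a closed form: they are the first two moments of a rectified Gaussian. The sketch below evaluates these textbook expressions for scalars; it is meant to illustrate the m2, s2 = f(m1, s1) signature and is not claimed to be the exact expression used in the library.

```python
import math


def rectifier_moments(mean, var):
    """Mean and variance of max(0, x) for x ~ N(mean, var), with var > 0."""
    std = math.sqrt(var)
    z = mean / std
    density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cumulative = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    out_mean = mean * cumulative + std * density
    second_moment = (mean ** 2 + var) * cumulative + mean * std * density
    return out_mean, second_moment - out_mean ** 2


m2, s2 = rectifier_moments(0.0, 1.0)
m_far, s_far = rectifier_moments(10.0, 1.0)
```

For a standard normal input the mean is 1/sqrt(2 pi) and the variance is 1/2 - 1/(2 pi); for inputs far in the positive range the unit is effectively linear and both moments pass through almost unchanged.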









breze.arch.component.varprop.transfer.tanh(mean, var)

Return the mean and variance of a Gaussian distributed random variable, described by its mean and variance, after passing it through a hyperbolic tangent.











Losses

Module containing several losses usable for supervised and unsupervised training. This is different from breze.component.loss in the sense that each prediction is also assumed to have a variance.

The losses in this module assume two inputs: a target and a prediction. Additionally, if the target has a dimensionality of D, the prediction is assumed to have a dimensionality of 2D. The first D elements constitute the mean while the latter D constitute the variance.

Additionally, all losses from breze.arch.component.loss are also available; here, we just ignore the variancepart of the input to the loss.
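The convention of packing mean and variance into one prediction of size 2D can be illustrated with a negative log-likelihood of a diagonal Gaussian. The name diag_gaussian_nll appears in the examples of this documentation, but the body below is a sketch written from the textbook formula and is an assumption about, not a copy of, the library function.

```python
import math


def diag_gaussian_nll(target, prediction):
    """Coordinate-wise Gaussian NLL for target (n, D) and prediction (n, 2D)."""
    res = []
    for t_row, p_row in zip(target, prediction):
        d = len(t_row)
        mean, var = p_row[:d], p_row[d:]   # first half mean, second half variance
        res.append([0.5 * math.log(2 * math.pi * v) + (t - m) ** 2 / (2 * v)
                    for t, m, v in zip(t_row, mean, var)])
    return res


nll = diag_gaussian_nll([[0.0, 1.0]], [[0.0, 3.0, 1.0, 4.0]])
```

The result has the shape of the target, so it can be reduced coordinate-wise and sample-wise like any other loss.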


CHAPTER 6

Indices and tables

• genindex

• modindex

• search


Bibliography

[XCA] Extreme component analysis, Welling et al (2003)

[LFRKM] Learning Feature Representations with K-means, Adam Coates (2012)

[R1] Xu, Zhixiang Eddie, Kilian Q. Weinberger, and Fei Sha. “Rapid feature learning with stacked linear denoisers.” arXiv preprint arXiv:1105.0972 (2011).

[FD] Wang, Sida, and Christopher Manning. “Fast dropout training.” Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013.

[FD-RNN] Bayer, Justin, et al. “On Fast Dropout and its Applicability to Recurrent Networks.” arXiv preprint arXiv:1311.0701 (2013).


Python Module Index

b
breze.arch.component.common
breze.arch.component.corrupt
breze.arch.component.distributions.mvn
breze.arch.component.distributions.normal
breze.arch.component.layer
breze.arch.component.loss
breze.arch.component.misc
breze.arch.component.norm
breze.arch.component.varprop.common
breze.arch.component.varprop.loss
breze.arch.component.varprop.transfer
breze.arch.model.varprop
breze.arch.model.varprop.rnn
breze.learn.data
breze.learn.feature
breze.learn.feature.emg
breze.learn.kmeans
breze.learn.lde
breze.learn.pca
breze.learn.sfa
breze.learn.trainer.score
breze.learn.xca


Index

Symbols__init__() (breze.learn.lde.LinearDenoiser method), 12__init__() (breze.learn.pca.Pca method), 5__init__() (breze.learn.pca.Zca method), 6__init__() (breze.learn.sfa.SlowFeatureAnalysis method),

9__init__() (breze.learn.trainer.score.MinibatchScore

method), 14__init__() (breze.learn.xca.Xca method), 8

Aabsolute() (in module breze.arch.component.loss), 25

Bbern_bern_kl() (in module breze.arch.component.loss),

26bern_ces() (in module breze.arch.component.loss), 26breze.arch.component.common (module), 31breze.arch.component.corrupt (module), 28breze.arch.component.distributions.mvn (module), 34breze.arch.component.distributions.normal (module), 32breze.arch.component.layer (module), 30breze.arch.component.loss (module), 25breze.arch.component.misc (module), 29breze.arch.component.norm (module), 21breze.arch.component.transfer (module), 23breze.arch.component.varprop.common (module), 38breze.arch.component.varprop.loss (module), 44breze.arch.component.varprop.transfer (module), 42breze.arch.model.varprop (module), 41breze.arch.model.varprop.rnn (module), 41breze.learn.data (module), 17breze.learn.feature (module), 15breze.learn.feature.emg (module), 15breze.learn.kmeans (module), 10breze.learn.lde (module), 12breze.learn.pca (module), 5breze.learn.sfa (module), 9breze.learn.trainer.score (module), 13breze.learn.xca (module), 7

C
cat_ce() (in module breze.arch.component.loss), 25
cat_entropy() (in module breze.arch.component.misc), 30
cca() (in module breze.learn.cca), 9
cdf() (in module breze.arch.component.distributions.normal), 33
collapse() (in module breze.learn.data), 17
collapse_seq_borders() (in module breze.learn.data), 17
consecutify() (in module breze.learn.data), 17
cpu_expr_to_gpu() (in module breze.arch.util), 38
cpu_expr_to_gpu_nested() (in module breze.arch.util), 38
cpu_tensor_to_gpu() (in module breze.arch.util), 38
cpu_tensor_to_gpu_nested() (in module breze.arch.util), 38

D
distance_matrix() (in module breze.arch.component.misc), 29
distance_matrix_by_diff() (in module breze.arch.component.misc), 30
drlim() (in module breze.arch.component.loss), 27

F
fit() (breze.learn.kmeans.GainShapeKMeans method), 11
fit() (breze.learn.lde.LinearDenoiser method), 12
fit() (breze.learn.pca.Pca method), 5
fit() (breze.learn.pca.Zca method), 6
fit() (breze.learn.sfa.SlowFeatureAnalysis method), 9
fit() (breze.learn.xca.Xca method), 8
flatten() (in module breze.arch.util), 37
function() (breze.arch.util.Model method), 36

G
GainShapeKMeans (class in breze.learn.kmeans), 10
garray_to_cudandarray_nested() (in module breze.arch.util), 38
gaussian_perturb() (in module breze.arch.component.corrupt), 28
get_named_variables() (in module breze.arch.util), 38
gnumpy_func_wrap() (in module breze.arch.util), 38

H
hinton() (in module breze.learn.display), 19

I
identity() (in module breze.arch.component.varprop.transfer), 42
integrated() (in module breze.learn.feature.emg), 15
interleave() (in module breze.learn.data), 17
interpolate() (in module breze.learn.data), 17
inverse_transform() (breze.learn.pca.Pca method), 5
inverse_transform() (breze.learn.pca.Zca method), 7
inverse_transform() (breze.learn.xca.Xca method), 8
iter_windows() (in module breze.learn.data), 17

L
l1() (in module breze.arch.component.norm), 21
l2() (in module breze.arch.component.norm), 21
LinearDenoiser (class in breze.learn.lde), 12
logpdf() (in module breze.arch.component.distributions.mvn), 34
lookup() (in module breze.arch.util), 38
lookup_some_key() (in module breze.arch.util), 38
lp() (in module breze.arch.component.norm), 22

M
mask() (in module breze.arch.component.corrupt), 29
mean_absolute_value() (in module breze.learn.feature.emg), 15
mean_absolute_value_slope() (in module breze.learn.feature.emg), 16
MinibatchScore (class in breze.learn.trainer.score), 13
Model (class in breze.arch.util), 35
modified_mean_absolute_value_1() (in module breze.learn.feature.emg), 15
modified_mean_absolute_value_2() (in module breze.learn.feature.emg), 15

N
ncac() (in module breze.arch.component.loss), 27
ncar() (in module breze.arch.component.loss), 27
ncat_ce() (in module breze.arch.component.loss), 26

P
padzeros() (in module breze.learn.data), 17
pairwise_diff() (in module breze.arch.component.misc), 29
ParameterSet (class in breze.arch.util), 37
Pca (class in breze.learn.pca), 5
pdf() (in module breze.arch.component.distributions.mvn), 34
pdf() (in module breze.arch.component.distributions.normal), 32
project_into_l2_ball() (in module breze.arch.component.misc), 30

R
rbf() (in module breze.learn.feature), 15
reconstruct() (breze.learn.pca.Pca method), 6
reconstruct() (breze.learn.pca.Zca method), 7
reconstruct() (breze.learn.xca.Xca method), 8
rectifier() (in module breze.arch.component.transfer), 24
rectifier() (in module breze.arch.component.varprop.transfer), 43
recurrent_layer() (in module breze.arch.model.varprop.rnn), 41
root_mean_square() (in module breze.learn.feature.emg), 16

S
sample() (in module breze.learn.sampling.hmc), 13
scatterplot_matrix() (in module breze.learn.display), 18
shuffle() (in module breze.learn.data), 17
sigmoid() (in module breze.arch.component.transfer), 23
sigmoid() (in module breze.arch.component.varprop.transfer), 43
simple() (in module breze.arch.component.layer), 30
simple() (in module breze.learn.trainer.score), 13
skip() (in module breze.learn.data), 17
slope_sign_change() (in module breze.learn.feature.emg), 16
SlowFeatureAnalysis (class in breze.learn.sfa), 9
soft_l1() (in module breze.arch.component.norm), 22
softmax() (in module breze.arch.component.transfer), 24
softplus() (in module breze.arch.component.transfer), 24
softsign() (in module breze.arch.component.transfer), 24
split() (in module breze.learn.data), 17
squared() (in module breze.arch.component.loss), 25
supervised_loss() (in module breze.arch.component.common), 31
supervised_loss() (in module breze.arch.component.varprop.common), 38

T
tanh() (in module breze.arch.component.transfer), 23
tanh() (in module breze.arch.component.varprop.transfer), 43
tanhplus() (in module breze.arch.component.transfer), 23
tell_deterministic() (in module breze.arch.util), 37
theano_expr_bfs() (in module breze.arch.util), 37
theano_function_with_nested_exprs() (in module breze.arch.util), 37
time_series_filter_plot() (in module breze.learn.display), 18
transform() (breze.learn.kmeans.GainShapeKMeans method), 11
transform() (breze.learn.lde.LinearDenoiser method), 12
transform() (breze.learn.pca.Pca method), 6
transform() (breze.learn.pca.Zca method), 7
transform() (breze.learn.sfa.SlowFeatureAnalysis method), 10
transform() (breze.learn.xca.Xca method), 8

U
uncollapse() (in module breze.learn.data), 17
uncollapse_seq_borders() (in module breze.learn.data), 17
unflatten() (in module breze.arch.util), 37
uninterleave() (in module breze.learn.data), 17
unsupervised_loss() (in module breze.arch.component.common), 32
unsupervised_loss() (in module breze.arch.component.varprop.common), 39

V
var_exp_for_gpu() (breze.arch.util.Model method), 36
variance() (in module breze.learn.feature.emg), 16

W
willison_amplitude() (in module breze.learn.feature.emg), 16
windowify() (in module breze.learn.data), 17

X
Xca (class in breze.learn.xca), 7

Z
Zca (class in breze.learn.pca), 6
zero_crossing() (in module breze.learn.feature.emg), 16
