
Page 1: Multidimensional RNN

Multi-Dimensional RNNsGrigory Sapunov

Moscow Computer Vision GroupMoscow, Yandex, 27.04.2016

[email protected]

Page 2: Multidimensional RNN

Outline

● RNN intro
  ○ RNN vs. FFNN
  ○ LSTM/GRU
  ○ CTC
● Some architectural issues
  ○ BRNN
  ○ MDRNN
  ○ MDMDRNN
  ○ HSRNN
● Some recent results
  ○ (same ideas) 2D LSTM, CLSTM, C-RNN, C-HRNN
  ○ (new ideas) ReNet, PyraMiD-LSTM, Grid LSTM
● (Discussion) RNN vs CNN for Computer Vision
● Resources

Page 3: Multidimensional RNN

A tiny intro into RNNs

Page 4: Multidimensional RNN

Feedforward NN vs. Recurrent NN

Recurrent neural networks (RNNs) allow cyclical connections.

Page 5: Multidimensional RNN

Unfolding the RNN and training using BPTT

Can do backprop on the unfolded network: Backpropagation through time (BPTT)
http://ir.hit.edu.cn/~jguo/docs/notes/bptt.pdf
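
A minimal numpy sketch of what the unfolding means in practice (sizes and names are illustrative, not from the slides): the same weight matrices are reused at every time step, and BPTT walks the unfolded graph backwards, accumulating their gradients across all steps.

```python
import numpy as np

# Illustrative sizes only.
T, n_in, n_hid = 5, 3, 4
rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.1, size=(n_hid, n_in))    # input -> hidden
Wh = rng.normal(scale=0.1, size=(n_hid, n_hid))   # hidden -> hidden (recurrent weights)
x = rng.normal(size=(T, n_in))
target = rng.normal(size=(T, n_hid))

# Forward: unfold the recurrence h_t = tanh(Wx x_t + Wh h_{t-1}) over T steps.
h = [np.zeros(n_hid)]
for t in range(T):
    h.append(np.tanh(Wx @ x[t] + Wh @ h[-1]))
loss = 0.5 * sum(np.sum((h[t + 1] - target[t]) ** 2) for t in range(T))

# Backward (BPTT): traverse the unfolded network in reverse and accumulate
# gradients for the *shared* weights at every time step.
dWx, dWh, dh_next = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros(n_hid)
for t in reversed(range(T)):
    dh = (h[t + 1] - target[t]) + dh_next     # loss at step t + gradient from step t+1
    dpre = dh * (1.0 - h[t + 1] ** 2)         # backprop through tanh
    dWx += np.outer(dpre, x[t])
    dWh += np.outer(dpre, h[t])
    dh_next = Wh.T @ dpre                     # pass gradient back to step t-1
```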

Page 6: Multidimensional RNN

Neural Network properties

Feedforward NN (FFNN):

● FFNN is a universal approximator: a feed-forward network with a single hidden layer containing a finite number of hidden neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function.

● Typical FFNNs have no inherent notion of order in time; they retain only what was learned during training.

Recurrent NN (RNN):

● RNNs are Turing-complete: they can compute anything that can be computed and have the capacity to simulate arbitrary procedures.

● RNNs possess a certain type of memory. They are much better suited to dealing with sequences, context modeling and time dependencies.

Page 7: Multidimensional RNN

RNN problem: Vanishing gradients

Solution: Long short-term memory (LSTM, Hochreiter, Schmidhuber, 1997)

Page 8: Multidimensional RNN

LSTM cell
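
The cell itself is shown only as a figure on this slide; as a reference, here is a minimal numpy sketch of one LSTM step (the standard input/forget/output-gate formulation, peephole connections omitted; all names and sizes are illustrative).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [x; h_prev] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input / forget / output gates
    g = np.tanh(g)                                 # candidate cell update
    c = f * c_prev + i * g                         # cell state: gated memory
    h = o * np.tanh(c)                             # hidden state (the cell's output)
    return h, c

# Illustrative usage.
n_in, n_hid = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, np.zeros(4 * n_hid))
```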

Page 9: Multidimensional RNN

LSTM network

Page 10: Multidimensional RNN

LSTM: Fixing vanishing gradient problem

Page 11: Multidimensional RNN

Comparing LSTM and Simple RNN

More on LSTMs: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 12: Multidimensional RNN

Another solution: Gated Recurrent Unit (GRU)

GRU (Cho et al., 2014) is a bit simpler than LSTM (fewer weights)
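
For comparison, a minimal numpy sketch of one GRU step (the standard Cho et al. formulation; names illustrative): two gates instead of three and no separate cell state, hence fewer weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Wr, Wh):
    """One GRU step; each weight matrix maps [x; h] to a hidden-sized vector."""
    xh = np.concatenate([x, h_prev])
    z = sigmoid(Wz @ xh)                                      # update gate
    r = sigmoid(Wr @ xh)                                      # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h_prev]))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                   # interpolate old and new
```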

Page 13: Multidimensional RNN

Another useful thing: CTC Output Layer

CTC (Connectionist Temporal Classification; Graves, Fernández, Gomez, Schmidhuber, 2006) was specifically designed for temporal classification tasks; that is, for sequence labelling problems where the alignment between the inputs and the target labels is unknown.

CTC models all aspects of the sequence with a single neural network, and does not require the network to be combined with a hidden Markov model. It also does not require presegmented training data, or external post-processing to extract the label sequence from the network outputs.

The CTC network predicts only the sequence of phonemes (typically as a series of spikes, separated by ‘blanks’, or null predictions), while the framewise network attempts to align them with the manual segmentation.
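
The output convention is easiest to see in code. Below is a minimal sketch of CTC best-path decoding only (training uses the full forward-backward algorithm over all alignments; function and label names are illustrative): take the most likely symbol per frame, collapse repeats, drop blanks.

```python
def ctc_best_path(frame_probs, blank=0):
    """frame_probs: per-frame probability vectors over labels (index 0 = blank)."""
    path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]  # argmax per frame
    labels, prev = [], None
    for s in path:
        if s != prev and s != blank:   # collapse repeats, then remove blanks
            labels.append(s)
        prev = s
    return labels

# Toy example: frame-wise argmaxes [1, 0, 1, 1, 0, 2] decode to [1, 1, 2].
probs = [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.2, 0.7, 0.1],
         [0.1, 0.8, 0.1], [0.9, 0.05, 0.05], [0.1, 0.2, 0.7]]
print(ctc_best_path(probs))   # [1, 1, 2]
```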

Page 14: Multidimensional RNN

Example: CTC vs. Framewise classification

Page 15: Multidimensional RNN

End of Intro

From here on we will not distinguish between RNN/GRU/LSTM and will usually use the word RNN for any kind of internal block. Most RNNs in practice are actually LSTMs.

A significant part of the presentation is based on the works of Alex Graves et al.

Page 16: Multidimensional RNN

Some interesting generalizations of simple RNN architecture

Page 17: Multidimensional RNN

#1 Directionality (BRNN/BLSTM)

Page 18: Multidimensional RNN

Bidirectional RNN/LSTM

There are many situations when you see the whole sequence at once (OCR, speech recognition, translation, caption generation, …).

So you can scan the [1-d] sequence in both directions, forward and backward.

Here comes BLSTM (Graves, Schmidhuber, 2005).
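
A minimal sketch of the bidirectional idea (a plain RNN step stands in for the LSTM; names illustrative): run one sweep left to right, one right to left, and concatenate the two hidden states at every position.

```python
import numpy as np

def rnn_step(x, h, Wx, Wh):
    # One plain RNN step; in practice this would be an LSTM/GRU step.
    return np.tanh(Wx @ x + Wh @ h)

def bidirectional(xs, fwd_params, bwd_params, n_hid):
    """Run the same sequence in both directions; one concatenated state per step."""
    h_f, h_b, fwd, bwd = np.zeros(n_hid), np.zeros(n_hid), [], []
    for x in xs:                       # left-to-right sweep
        h_f = rnn_step(x, h_f, *fwd_params)
        fwd.append(h_f)
    for x in reversed(xs):             # right-to-left sweep
        h_b = rnn_step(x, h_b, *bwd_params)
        bwd.append(h_b)
    bwd.reverse()                      # re-align with forward time order
    # Each output position now sees context from both the past and the future.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```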

Page 19: Multidimensional RNN

BLSTM

Page 20: Multidimensional RNN

Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN

Page 21: Multidimensional RNN

Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN

Page 22: Multidimensional RNN

Typical result: BRNN>RNN, LSTM>RNN, BLSTM>BRNN

Page 23: Multidimensional RNN

Example: BLSTM classifying the utterance “one oh five”

Page 24: Multidimensional RNN

#2 Dimensionality (MDRNN/MDLSTM)

Page 25: Multidimensional RNN

Multidimensional RNN/LSTM

Standard RNNs are inherently one-dimensional, and therefore poorly suited to multidimensional data (e.g. images).

The basic idea of MDRNNs (Graves, Fernandez, Schmidhuber, 2007) is to replace the single recurrent connection found in standard RNNs with as many recurrent connections as there are dimensions in the data.

It assumes some ordering on the multidimensional data.
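
A minimal numpy sketch of that idea for 2D data (a plain tanh unit instead of LSTM; names illustrative): in a raster-order scan, each pixel's hidden state depends on its input and on one already-computed neighbour per dimension (left and top).

```python
import numpy as np

def mdrnn_2d(image, W_in, W_left, W_top, n_hid):
    """image: (H, W, C). One unidirectional sweep from the top-left corner."""
    H, W, _ = image.shape
    h = np.zeros((H, W, n_hid))
    for i in range(H):
        for j in range(W):
            left = h[i, j - 1] if j > 0 else np.zeros(n_hid)   # recurrence along x
            top = h[i - 1, j] if i > 0 else np.zeros(n_hid)    # recurrence along y
            h[i, j] = np.tanh(W_in @ image[i, j] + W_left @ left + W_top @ top)
    return h

# Illustrative usage: an 8x8 RGB image, 16 hidden units.
rng = np.random.default_rng(0)
h = mdrnn_2d(rng.normal(size=(8, 8, 3)),
             rng.normal(scale=0.1, size=(16, 3)),
             rng.normal(scale=0.1, size=(16, 16)),
             rng.normal(scale=0.1, size=(16, 16)), 16)
```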

Page 26: Multidimensional RNN

MDRNN

The basic idea of MDRNNs is to replace the single recurrent connection found in standard RNNs with as many recurrent connections as there are dimensions in the data.

Page 27: Multidimensional RNN

Uni-directionality

MDRNN assumes some ordering on the multidimensional data. And it’s not the only possible one.

Page 28: Multidimensional RNN

Uni-directionality

Page 29: Multidimensional RNN

#3 Directionality + Dimensionality (MDMDRNN?)

Page 30: Multidimensional RNN

Multidirectional multidimensional RNN (MDMDRNN?)

The previously mentioned ordering is not the only possible one. It might be OK for some tasks, but it is usually preferable for the network to have access to the surrounding context in all directions. This is particularly true for tasks where precise localisation is required, such as image segmentation.

For one dimensional RNNs, the problem of multidirectional context was solved by the introduction of bidirectional recurrent neural networks (BRNNs). BRNNs contain two separate hidden layers that process the input sequence in the forward and reverse directions.

BRNNs can be extended to n-dimensional data by using 2n separate hidden layers, each of which processes the sequence using the ordering defined above, but with a different choice of axes.
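
In code, the 2^n scan directions for n = 2 can be produced by flipping the input along each subset of axes, running the same unidirectional scan (the mdrnn_2d sketch from earlier), and flipping the result back; a hedged sketch, with separate weights per direction:

```python
import itertools
import numpy as np

def multidirectional_2d(image, per_direction_params, n_hid):
    """Four directional hidden layers for 2D input, concatenated per pixel."""
    outputs = []
    # The four axis-flip combinations correspond to scans from the four corners.
    for flips, params in zip(itertools.product([False, True], repeat=2),
                             per_direction_params):
        axes = tuple(ax for ax, f in enumerate(flips) if f)
        flipped = np.flip(image, axis=axes) if axes else image
        h = mdrnn_2d(flipped, *params, n_hid)            # same unidirectional scan
        outputs.append(np.flip(h, axis=axes) if axes else h)
    # Every pixel now has context from all four surrounding quadrants.
    return np.concatenate(outputs, axis=-1)
```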

Page 31: Multidimensional RNN

Multi-directionality

Page 32: Multidimensional RNN

Multi-directionality

As before, the hidden layers are connected to a single output layer, which now has access to all surrounding context.

Page 33: Multidimensional RNN

MDMDRNN example: Air Freight database (2007)

A ray-traced colour image sequence that comes with a ground-truth segmentation into the different textures mapped onto the 3-d models. The sequence is 455 frames (160×120 px) long and contains 155 distinct textures.

Page 34: Multidimensional RNN

MDMDRNN example: Air Freight database (2007)

Network structure:

● Multidirectional 2D LSTM.
● 4 layers (not levels! just 4 directional layers on a single level) consisted of 25 memory blocks, each containing 1 cell, 2 forget gates, 1 input gate, 1 output gate and 5 peephole weights.
● The input and output activation function of the cells was tanh, and the activation function for the gates was the logistic sigmoid.
● The input layer was size 3 (RGB) and the output layer (softmax) was size 155 (one unit for each texture).
● The network contained 43,257 trainable weights in total.
● The final pixel classification error rate, after 330 training epochs, was 7.1% on the test set.

Page 35: Multidimensional RNN

MDMDRNN example: Air Freight database (2007)

Page 36: Multidimensional RNN

MDMDRNN example: MNIST (2007)

Additional evaluation on the warped dataset (not used in training at all)

Page 37: Multidimensional RNN

MDMDRNN example: MNIST (2007)

Page 38: Multidimensional RNN

#4 Hierarchical subsampling (HSRNN)

Page 39: Multidimensional RNN

Hierarchical Subsampling Networks (HSRNN)

So-called hierarchical subsampling is commonly used in fields such as computer vision where the volume of data is too great to be processed by a ‘flat’ architecture. As well as reducing computational cost, it also reduces the effective dispersal of the data, since inputs that are widely separated at the bottom of the hierarchy are transformed to features that are close together at the top.

A hierarchical subsampling recurrent neural network (HSRNN, Graves and Schmidhuber, 2009) consists of an input layer, an output layer and multiple levels of recurrently connected hidden layers. The output sequence of each level in the hierarchy is used as the input sequence for the next level up. All input sequences are subsampled using subsampling windows of predetermined width. The structure is similar to that used by convolutional networks, except with recurrent, rather than feedforward, hidden layers.
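
A minimal sketch of the subsampling step between levels (1D case, non-overlapping windows of a predetermined width; names illustrative): each level's output sequence is cut into windows, and each window becomes one input step for the next level up.

```python
import numpy as np

def subsample(seq, width):
    """Group a (T, d) sequence into non-overlapping windows of the given width.

    Returns a (T // width, width * d) sequence: higher levels of the hierarchy
    run over progressively shorter sequences with progressively wider context.
    """
    T, d = seq.shape
    T_trim = (T // width) * width            # drop the ragged tail for simplicity
    return seq[:T_trim].reshape(T // width, width * d)

# Illustrative: a 100-step, 8-dim level output becomes a 25-step, 32-dim input.
print(subsample(np.zeros((100, 8)), 4).shape)   # (25, 32)
```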

Page 40: Multidimensional RNN

HSRNN

Page 41: Multidimensional RNN

HSRNN

For each layer in the hierarchy, the forward pass equations are identical to those for a standard RNN, except that the sum over input units is replaced by a sum of sums over the subsampling window.

A good rule of thumb is to choose the layer sizes so that each level consumes roughly half the processing time of the level below.
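
In formula-as-code form, the only change to the plain recurrence is that the single input term becomes a sum over the window positions, each with its own weight matrix (a sketch with illustrative names):

```python
import numpy as np

def hsrnn_step(window, h_prev, W_per_pos, Wh):
    """window: (width, d) block of lower-level outputs feeding this step.

    The plain-RNN input term W @ x is replaced by a sum over window positions,
    i.e. the 'sum of sums over the subsampling window' in the forward pass.
    """
    pre = sum(W_per_pos[k] @ window[k] for k in range(len(window)))
    return np.tanh(pre + Wh @ h_prev)
```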

Page 42: Multidimensional RNN

HSRNN

Page 43: Multidimensional RNN

HSRNN

Can be easily extended to the multidimensional and multidirectional case.

The problem is that each level of the hierarchy requires 2^n hidden layers instead of one. To connect every layer at one level to every layer at the next therefore requires O(2^(2n)) weights.

One way to reduce the number of weights is to separate the levels with nonlinear feedforward layers, which reduces the number of weights between the levels to O(2^n), the same as for standard MDRNNs.

As a rule of thumb, giving each feedforward layer between half and one times as many units as the combined hidden layers in the level below appears to work well in practice.
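
A tiny worked count for 2D data (n = 2, so 4 directional hidden layers per level; each "block" below stands for one layer-to-layer weight matrix):

```python
# Counting weight-matrix blocks between two adjacent levels (illustrative only).
n = 2                                # data dimensionality -> 2**n directional layers
direct = (2 ** n) * (2 ** n)         # every layer to every layer: O(2**(2n)) -> 16
via_ffnn = (2 ** n) + (2 ** n)       # in and out of one feedforward layer: O(2**n) -> 8
print(direct, via_ffnn)              # 16 8
```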

Page 44: Multidimensional RNN

HSRNN

Page 45: Multidimensional RNN

HSRNN example: Arabic handwriting recognition

Network structure:

● The hierarchy contained three levels of multidirectional MDLSTM (so 4 hidden layers per level for 2D data).

● The three levels were separated by two feedforward layers with the tanh activation function.

● Subsampling windows were applied in three places: to the input sequence, to the output sequence of the first hidden level, and to the output sequence of the second hidden level.

Page 46: Multidimensional RNN

HSRNN

Page 47: Multidimensional RNN

Offline Arabic handwriting recognition (2009)

● 32,492 black-and-white images of individual handwritten Tunisian town and village names, of which we used 30,000 for training, and 2,492 for validation.
● Each image was supplied with a manual transcription for the individual characters, and the postcode of the corresponding town. There were 120 distinct characters in total.
● The task was to identify the postcode, from a list of 937 town names and corresponding postcodes. Many of the town names had transcription variants, giving a total of 1,518 entries in the complete postcode lexicon.
● The test data (which is not published) was divided into sets ‘f’ and ‘s’. The main competition results were based on set ‘f’. Set ‘s’ contains data collected in the United Arab Emirates using the same forms; its purpose was to test the robustness of the recognisers to regional writing variations.

Page 48: Multidimensional RNN

Offline Arabic handwriting recognition (2009)

Page 49: Multidimensional RNN

Offline Arabic handwriting recognition (2009)

Page 50: Multidimensional RNN

Some more recent examples using the same ideas

Page 51: Multidimensional RNN

Example #1: Scene Labeling with 2D LSTM

(CVPR 2015)

http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf

Page 52: Multidimensional RNN

2D LSTM for Scene Labeling (2015)

Page 53: Multidimensional RNN

2D LSTM for Scene Labeling (2015)

Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015,http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf

Page 54: Multidimensional RNN

2D LSTM for Scene Labeling (2015)

Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf

Page 55: Multidimensional RNN

Example #2: Convolutional LSTM (CLSTM)

(ILSVRC 2015)

http://image-net.org/challenges/posters/ILSVRC2015_Poster_VUNO.pdf

Page 56: Multidimensional RNN

Convolutional LSTM (CLSTM) (2015)

Actually an LSTM over the last layers of a CNN.

“Among various models, multi-dimensional recurrent neural network, specifically multi-dimensional long-short term memory (MD-LSTM) has shown promising results and can be naturally integrated and trained ‘end-to-end’ fashion. However, when we try to learn the structure with very low level representation such as input pixel level, the dependency structure can be too noisy or spatially long-term dependency information can be vanished while training. Therefore, we propose to use 2D-LSTM layer on top of convolutional layers by taking advantage of convolution layers to extract high level representation of the image and 2D-LSTM layer to learn global spatial dependencies. We call this network as convolutional LSTM (CLSTM)”

Page 57: Multidimensional RNN

Convolutional LSTM (CLSTM) (2015)

“Our CLSTM models are constructed by replacing the last two convolution layer of CNN with two 2D LSTM layers. Since we used multidirectional 2D LSTM, there are 2^2 directional nodes for each location of feature map.”

Page 58: Multidimensional RNN

Example #3: Convolutional RNN (C-RNN)

(CVPR 2015)

http://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W03/papers/Zuo_Convolutional_Recurrent_Neural_2015_CVPR_paper.pdf

Page 59: Multidimensional RNN

Convolutional RNN (C-RNN) (2015)

“The C-RNN is trained in an end-to-end manner from raw pixel images. CNN layers are firstly processed to generate middle level features. RNN layer is then learned to encode spatial dependencies.”

“In [13], MDLSTM was proposed to solve the handwriting recognition problem by using RNN. Different from this work, we utilize quad-directional 1D RNN instead of their 2D RNN, our RNN is simpler and it has fewer parameters, but it can already cover the context from all directions. Moreover, our C-RNN make both use of the discriminative representation power of CNN and contextual information modeling capability of RNN, which is more powerful for solving large scale image classification problem.”

Interestingly, it's not an LSTM, just a simple RNN.

Page 60: Multidimensional RNN

Convolutional RNN (C-RNN) (2015)

Page 61: Multidimensional RNN

Convolutional RNN (C-RNN) (2015)

“Our C-RNN had the same settings with Alex-net, except that Alex-net directly connects the output of the fifth convolutional layer to the sixth fully connected layer, while our C-RNN uses the RNN to connect the fifth convolutional layer and the fully connected layers”

Page 62: Multidimensional RNN

Example #3B: Convolutional Hierarchical RNN (C-HRNN) (2015)

http://arxiv.org/abs/1509.03877

Page 63: Multidimensional RNN

Convolutional hierarchical RNN (C-HRNN) (2015)

“In Hierarchical RNNs (HRNNs), each RNN layer focuses on modeling spatial dependencies among image regions from the same scale but different locations. While the cross RNN scale connections target on modeling scale dependencies among regions from the same location but different scales.”

Finally with LSTM:

“Specifically, we propose two recurrent neural network models: 1) hierarchical simple recurrent network (HSRN), which is fast and has low computational cost; and 2) hierarchical long-short term memory recurrent network (HLSTM), which performs better than HSRN with the price of more computational cost.”

Page 64: Multidimensional RNN

Convolutional hierarchical RNN (C-HRNN) (2015)

Page 65: Multidimensional RNN

Convolutional hierarchical RNN (C-HRNN) (2015)

“Thus, inspired by [22], we generate “2D sequences” for images, and each element simultaneously receives spatial contextual references from its 2D neighborhood elements.”

Page 66: Multidimensional RNN

Convolutional hierarchical RNN (C-HRNN) (2015)

Page 67: Multidimensional RNN

Some more recent examples using new ideas

Page 68: Multidimensional RNN

Example #4: ReNet (2015)

[Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio]
http://arxiv.org/abs/1505.00393

Page 69: Multidimensional RNN

ReNet (2015)

“Our model relies on purely uni-dimensional RNNs coupled in a novel way, rather than on a multi-dimensional RNN. The basic idea behind the proposed ReNet architecture is to replace each convolutional layer (with convolution+pooling making up a layer) in the CNN with four RNNs that sweep over lower-layer features in different directions: (1) bottom to top, (2) top to bottom, (3) left to right and (4) right to left.”

“The main difference between ReNet and the model of Graves and Schmidhuber [2009] is that we use the usual sequence RNN, instead of the multidimensional RNN.“
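
A minimal sketch of one ReNet layer under those four sweeps, built on the bidirectional helper sketched earlier (plain RNN steps instead of the units used in the paper; names illustrative): a bidirectional RNN first sweeps every column of patches vertically, then a second one sweeps every row of the result horizontally.

```python
import numpy as np

def renet_layer(patches, v_params, h_params, n_hid):
    """patches: (H, W, d) grid of features from non-overlapping image patches.

    Sweep 1: bidirectional RNN down/up each column -> (H, W, 2*n_hid).
    Sweep 2: bidirectional RNN left/right over each row of that result, so each
    output location ends up with context from the whole image.
    """
    H, W, _ = patches.shape
    vertical = np.stack([np.stack(bidirectional(patches[:, j], *v_params, n_hid))
                         for j in range(W)], axis=1)
    horizontal = np.stack([np.stack(bidirectional(vertical[i], *h_params, n_hid))
                           for i in range(H)], axis=0)
    return horizontal   # (H, W, 2*n_hid)
```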

Page 70: Multidimensional RNN

ReNet (2015)

“One important consequence of the proposed approach compared to the multidimensional RNN is that the number of RNNs at each layer scales now linearly with respect to the number of dimensions d of the input image (2d). A multidimensional RNN, on the other hand, requires the exponential number of RNNs at each layer (2^d). Furthermore, the proposed variant is more easily parallelizable, as each RNN is dependent only along a horizontal or vertical sequence of patches. This architectural distinction results in our model being much more amenable to distributed computing than that of Graves and Schmidhuber [2009]”.

… But note that for d = 2, 2d == 2^d, so the counts coincide.

“The main difference between ReNet and the model of Graves and Schmidhuber [2009] is that we use the usual sequence RNN, instead of the multidimensional RNN.“

Page 71: Multidimensional RNN

ReNet (2015)

Page 72: Multidimensional RNN

ReNet (2015)

Page 73: Multidimensional RNN

Example #5: “The Empire Strikes Back”
PyraMiD-LSTM (2015)

[Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber]
http://arxiv.org/abs/1506.07452

Page 74: Multidimensional RNN

PyraMiD-LSTM (2015)

“Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in a few sweeps through all pixels, especially when the RNN is a Long Short-Term Memory (LSTM). Despite these theoretical advantages, however, unlike CNNs, previous MD-LSTM variants were hard to parallelize on GPUs. Here we re-arrange the traditional cuboid order of computations in MD-LSTM in pyramidal fashion. The resulting PyraMiD-LSTM is easy to parallelize, especially for 3D data such as stacks of brain slice images.”

Page 75: Multidimensional RNN

PyraMiD-LSTM (2015)

Page 76: Multidimensional RNN

PyraMiD-LSTM (2015)

“One of the striking differences between PyraMiD-LSTM and MD-LSTM is the shape of the scanned contexts. Each LSTM of an MD-LSTM scans rectangle-like contexts in 2D or cuboids in 3D. Each LSTM of a PyraMiD-LSTM scans triangles in 2D and pyramids in 3D. An MD-LSTM needs 8 LSTMs to scan a volume, while a PyraMiD-LSTM needs only 6, since it takes 8 cubes or 6 pyramids to fill a volume. Given dimension d, the number of LSTMs grows as 2^d for an MD-LSTM (exponentially) and 2 × d for a PyraMiD-LSTM (linearly).”

Page 77: Multidimensional RNN

PyraMiD-LSTM (2015)

Page 78: Multidimensional RNN

PyraMiD-LSTM (2015)

Page 79: Multidimensional RNN

PyraMiD-LSTM (2015)

Page 80: Multidimensional RNN

PyraMiD-LSTM (2015)

Page 81: Multidimensional RNN

PyraMiD-LSTM (2015)

Page 82: Multidimensional RNN

PyraMiD-LSTM (2015)

On the MR brain dataset, training took around three days, and testing per volume took around 2 minutes. Networks contain three PyraMiD-LSTM layers:

1. 16 hidden units + fully-connected layer with 25 hidden units;
2. 32 hidden units + fully-connected layer with 45 hidden units;
3. 64 hidden units + fully-connected output layer whose size equals the number of classes.

“Previous MD-LSTM implementations, however, could not exploit the parallelism of modern GPU hardware. This has changed through our work presented here. Although our novel highly parallel PyraMiD-LSTM has already achieved state-of-the-art segmentation results in challenging benchmarks, we feel we have only scratched the surface of what will become possible with such PyraMiD-LSTM and other MD-RNNs.”

Page 83: Multidimensional RNN

Example #6: Grid LSTM (ICLR 2016) (Graves again!)

[Nal Kalchbrenner, Ivo Danihelka, Alex Graves]
http://arxiv.org/abs/1507.01526

Page 84: Multidimensional RNN

Grid LSTM (2016)

“This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a multidimensional grid that can be applied to vectors, sequences or higher dimensional data such as images. The network differs from existing deep LSTM architectures in that the cells are connected between network layers as well as along the spatiotemporal dimensions of the data. The network provides a unified way of using LSTM for both deep and sequential computation.”

Page 85: Multidimensional RNN

Grid LSTM (2016)

“Deep networks suffer from exactly the same problems as recurrent networks applied to long sequences: namely that information from past computations rapidly attenuates as it progresses through the chain – the vanishing gradient problem (Hochreiter, 1991) – and that each layer cannot dynamically select or ignore its inputs. It therefore seems attractive to generalise the advantages of LSTM to deep computation.”

Can be N-dimensional. N-dimensional Grid LSTM is called N-LSTM for short.

Page 86: Multidimensional RNN

Grid LSTM (2016)One-dimensional Grid LSTM corresponds to a feed-forward network that uses LSTM cells in place of transfer functions such as tanh and ReLU. These networks are related to Highway Networks (Srivastava et al., 2015) where a gated transfer function is used to successfully train feed-forward networks with up to 900 layers of depth.

Grid LSTM with two dimensions is analogous to the Stacked LSTM, but it adds cells along the depth dimension too.

Grid LSTM with three or more dimensions is analogous to Multidimensional LSTM, but differs from it not just by having the cells along the depth dimension, but also by using the proposed mechanism for modulating the N-way interaction that is not prone to the instability present in Multidimensional LSTM.

Page 87: Multidimensional RNN

Grid LSTM (2016)

Page 88: Multidimensional RNN

Grid LSTM (2016)

Page 89: Multidimensional RNN

Grid LSTM (2016)

Page 90: Multidimensional RNN

Grid LSTM (2016)

Page 91: Multidimensional RNN

Grid LSTM (2016)

The difference with the Multidimensional LSTM is that we apply multiple layers of depth to the image, use three-dimensional blocks and concatenate the top output vectors before classification.

The difference with the ReNet architecture is that the 3-LSTM processes the image according to the two inherent spatial dimensions; instead of stacking hidden layers as in the ReNet, the block also modulates directly what information is passed along the depth dimension.

Page 92: Multidimensional RNN

Time for Discussion: RNN vs. CNN for Computer Vision

Page 93: Multidimensional RNN

Resources

Page 94: Multidimensional RNN

Resources (The Classic)

- Multi-Dimensional Recurrent Neural Networks
  Alex Graves, Santiago Fernandez, Juergen Schmidhuber
  https://arxiv.org/abs/0705.2011
- Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks / NIPS 2009
  Alex Graves, Juergen Schmidhuber
  http://papers.nips.cc/paper/3449-offline-handwriting-recognition-with-multidimensional-recurrent-neural-networks

Page 95: Multidimensional RNN

Resources (The Classic)

- Supervised Sequence Labelling with Recurrent Neural Networks
  Alex Graves, Springer, 2012
  http://www.springer.com/us/book/9783642247965
  https://www.cs.toronto.edu/~graves/preprint.pdf
- RNNLIB https://sourceforge.net/projects/rnnl/
- http://deeplearning.cs.cmu.edu/slides.2015/20.graves.pdf

Page 96: Multidimensional RNN

Resources (more recent)

- Scene Labeling with LSTM Recurrent Neural Networks / CVPR 2015
  Wonmin Byeon, Thomas M. Breuel, Federico Raue, Marcus Liwicki
  http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf
- Deep Convolutional and Recurrent Neural Network for Object Classification and Localization / ILSVRC 2015
  http://image-net.org/challenges/posters/ILSVRC2015_Poster_VUNO.pdf
- Convolutional Recurrent Neural Networks: Learning Spatial Dependencies for Image Representation / CVPR 2015
  http://www.cv-foundation.org/openaccess/content_cvpr_workshops_2015/W03/papers/Zuo_Convolutional_Recurrent_Neural_2015_CVPR_paper.pdf

Page 97: Multidimensional RNN

Resources (more recent)

- ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
  Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio
  http://arxiv.org/abs/1505.00393
- Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation
  Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber
  http://arxiv.org/abs/1506.07452
- Grid Long Short-Term Memory
  Nal Kalchbrenner, Ivo Danihelka, Alex Graves
  http://arxiv.org/abs/1507.01526

Page 98: Multidimensional RNN

Resources (more recent)

- OCRopus
  https://github.com/tmbdev/ocropy
  https://github.com/tmbdev/clstm

The problem is that MDRNNs are mostly unsupported: not enough modern libraries implement them.

Page 99: Multidimensional RNN

More to come

- Graph LSTM http://arxiv.org/pdf/1603.07063v1.pdf
- Local-Global LSTM http://arxiv.org/pdf/1511.04510v1.pdf
- …