Simple and Efficient Learning with Automatic Operation Batching Graham Neubig http://dynet.io/autobatch/ joint work w/ Yoav Goldberg and Chris Dyer in https://github.com/neubig/howtocode-2017

Simple and Efficient Learning with Automatic Operation …phontron.com/slides/neubig17howtocode.pdf · Dynamic Decisions a=1 a=1 a=2. ... of Static Graphs ... • Variably Lengthed




Neural Networks w/ Complicated Structures

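[Figure: three kinds of complicated structure. (1) Words, phrases, and sentences. (2) A parse tree over “Alice gave a message to Bob” with nodes NP, PP, VP, VP, S. (3) Dynamic decisions: a=1, a=1, a=2.]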


Neural Net Programming Paradigms


What is Necessary for Neural Network Training

• define computation

• add data

• calculate result (forward)

• calculate gradients (backward)

• update parameters
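
To make the five steps concrete, here is a minimal sketch in plain Python. It assumes a toy one-parameter model (y = w * x) with squared-error loss; the function names and hyperparameters are illustrative and do not come from any toolkit discussed in these slides.

```python
# Toy model: predict y = w * x with squared-error loss.
# All names here are illustrative, not part of any toolkit.

def forward(w, x, y):
    """Calculate the result: the squared error of the prediction."""
    pred = w * x
    return (pred - y) ** 2

def backward(w, x, y):
    """Calculate the gradient of the loss with respect to w."""
    return 2.0 * (w * x - y) * x

def train(data, w=0.0, lr=0.05, epochs=50):
    """Run the training loop; the computation itself is defined above."""
    last_loss = None
    for _ in range(epochs):
        for x, y in data:                   # add data
            last_loss = forward(w, x, y)    # calculate result (forward)
            grad = backward(w, x, y)        # calculate gradients (backward)
            w -= lr * grad                  # update parameters
    return w, last_loss

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w, last_loss = train(data)
```

The three paradigms that follow differ mainly in when "define computation" happens relative to the other four steps.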


Paradigm 1: Static Graphs (TensorFlow, Theano)

• define

• for each data point:

  • add data

  • forward

  • backward

  • update


Advantages/Disadvantages of Static Graphs

• Advantages:

• Can be optimized at definition time

• Easy to feed data to GPUs, etc., via data iterators

• Disadvantages:

• Difficult to implement nets with varying structure (trees, graphs, flow control)

• Need to learn big API that implements flow control in the “graph” language


Paradigm 2: Dynamic+Eager Evaluation (PyTorch, Chainer)

• for each data point:

  • define/add data/forward

  • backward

  • update


Advantages/Disadvantages of Dynamic+Eager Evaluation

• Advantages:

  • Easy to implement nets with varying structure, API is closer to standard Python/C++

  • Easy to debug because errors occur immediately

• Disadvantages:

  • Cannot be optimized at definition time

  • Hard to serialize graphs w/o program logic, decide device placement, etc.


Paradigm 3: Dynamic+Lazy Evaluation (DyNet)

• for each data point:

  • define/add data

  • forward

  • backward

  • update
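
The separation of "define/add data" from "forward" can be illustrated with a toy lazy expression graph. This is a sketch only: the `Node` class and `const` helper are hypothetical and are not DyNet's API or implementation. It shows that building an expression performs no arithmetic until `forward()` is requested.

```python
class Node:
    """A graph node that records an operation but does not run it."""
    evaluations = 0  # counts how many operations have actually been computed

    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

    def forward(self):
        """Evaluate this node, and whatever it depends on, on demand."""
        if self.value is None:
            a, b = (n.forward() for n in self.inputs)
            self.value = a + b if self.op == "add" else a * b
            Node.evaluations += 1
        return self.value

def const(v):
    """Hypothetical helper: wrap a number as an already-known node."""
    return Node("const", value=v)

# define / add data: this builds the graph and computes nothing
loss = const(2.0) * const(3.0) + const(1.0)
built_without_computing = (Node.evaluations == 0)

# forward: the whole graph is known before any work is done
result = loss.forward()
```

Because the full graph exists before evaluation starts, a lazy toolkit has a chance to rearrange or combine operations first.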


Advantages/Disadvantages of Dynamic+Lazy Evaluation

• Advantages:

  • Easy to implement nets with varying structure, API is closer to standard Python/C++

  • Can be optimized at definition time (this presentation!)

• Disadvantages:

  • Harder to debug because errors do not occur immediately

  • Still hard to serialize graphs w/o program logic, decide device placement, etc.


Efficiency Tricks: Operation Batching


Efficiency Tricks: Mini-batching

• On modern hardware, 10 operations of size 1 are much slower than 1 operation of size 10

• Minibatching combines smaller operations into one big one
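
As a rough illustration of "combining smaller operations into one big one", the sketch below replaces three matrix-vector products with a single matrix-matrix product and checks that the results agree. It is written in plain Python with hypothetical helper names, so it demonstrates only the equivalence; the speed difference depends on the hardware and numerical library and is not measured here.

```python
def matvec(W, x):
    """One small operation: multiply matrix W by a single vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmat(W, X):
    """One big operation: multiply W by a matrix X (one input per column)."""
    return [[sum(w * v for w, v in zip(row, col)) for col in zip(*X)]
            for row in W]

W = [[1.0, 2.0], [3.0, 4.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three inputs of size 2

# Unbatched: three separate operations.
unbatched = [matvec(W, x) for x in inputs]

# Batched: stack the inputs as columns, then run a single operation.
X = [list(col) for col in zip(*inputs)]          # shape 2 x 3
batched = matmat(W, X)                           # shape 2 x 3

# Column j of the batched result equals the j-th unbatched result.
batched_columns = [list(col) for col in zip(*batched)]
```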


Minibatching


Manual Mini-batching

• DyNet has special minibatch operations for lookup and loss functions, everything else automatic

• You need to:

  • Group sentences into a mini batch (optionally, for efficiency group sentences by length)

  • Select the “t”th word in each sentence, and send them to the lookup and loss functions
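
The two manual steps above can be sketched in plain Python as follows. The helpers `make_minibatches` and `words_at` are hypothetical names introduced for illustration; the toolkit's batched lookup and loss calls, which would consume each column of words, are deliberately left out.

```python
from itertools import groupby

def make_minibatches(sentences, batch_size):
    """Group sentences of equal length into mini-batches."""
    batches = []
    by_length = sorted(sentences, key=len)   # stable sort keeps input order
    for _, group in groupby(by_length, key=len):
        group = list(group)
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    return batches

def words_at(batch, t):
    """Select the t-th word of every sentence in the batch."""
    return [sentence[t] for sentence in batch]

sentences = [
    "I hate this movie".split(),
    "I love this movie".split(),
    "I do n't hate this movie".split(),
]
batches = make_minibatches(sentences, batch_size=2)

# Each column holds the t-th word of every sentence in the first batch;
# a column is what would be sent to a batched lookup.
first = batches[0]
columns = [words_at(first, t) for t in range(len(first[0]))]
```

Grouping by length means every sentence in a batch has a word at every position `t`, so no padding is needed within the batch.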


Example Task: Sentiment

I hate this movie

I love this movie

I do n’t hate this movie

[Figure: each sentence is scored on a five-way scale: very good, good, neutral, bad, very bad.]


Continuous Bag of Words (CBOW)

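[Figure: each word of “I hate this movie” goes through a lookup; the four lookup vectors are summed (+ + +); the sum is multiplied by W and a bias is added to give the scores.]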


Batching CBOW

[Figure: the CBOW lookups and sums for “I hate this movie” and “I love that movie”, drawn together so that the operations of both sentences can be batched.]


Mini-batched Code Example


Mini-batching Sequences

this is an example </s>

this is another </s> </s>

• Padding

• Loss Calculation

• Mask (1 for real tokens, 0 for the padded position):

  1 1 1 1 1

  1 1 1 1 0

• Take Sum
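
The padding, mask, and sum steps can be sketched as below. The per-token loss values are made up for illustration (a real model would compute them), and `pad_batch` and `masked_loss_sum` are hypothetical helper names, not toolkit functions.

```python
PAD = "</s>"

def pad_batch(sentences):
    """Pad every sentence to the longest length; return tokens and masks."""
    max_len = max(len(s) for s in sentences)
    padded = [s + [PAD] * (max_len - len(s)) for s in sentences]
    masks = [[1] * len(s) + [0] * (max_len - len(s)) for s in sentences]
    return padded, masks

def masked_loss_sum(per_token_losses, masks):
    """Zero out the losses at padded positions, then take the sum."""
    return sum(loss * m
               for losses, mask in zip(per_token_losses, masks)
               for loss, m in zip(losses, mask))

sentences = [
    "this is an example </s>".split(),
    "this is another </s>".split(),
]
padded, masks = pad_batch(sentences)

# Hypothetical per-token losses; a real model would compute these.
per_token_losses = [[0.5, 0.5, 0.5, 0.5, 0.5],
                    [1.0, 1.0, 1.0, 1.0, 9.0]]  # 9.0 sits on the padded slot
total = masked_loss_sum(per_token_losses, masks)
```

The loss computed at the padded position is multiplied by 0, so it contributes nothing to the sum.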


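Bi-directional LSTM

[Figure: “I hate this movie” is read by two rows of LSTM cells, one per direction; their outputs are concatenated (concat), multiplied by W, and added to a bias to give the scores.]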


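Tree-structured RNN/LSTM

[Figure: RNN units combine the words of “I hate this movie” up a tree; the top representation is multiplied by W and added to a bias to give the scores.]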


And What About These?

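[Figure: three kinds of complicated structure. (1) Words, phrases, and sentences. (2) A parse tree over “Alice gave a message to Bob” with nodes NP, PP, VP, VP, S. (3) Dynamic decisions: a=1, a=1, a=2.]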


Automatic Operation Batching


Automatic Mini-batching!

• Innovated by TensorFlow Fold (faster than unbatched, but implementation relatively complicated)

• DyNet Autobatch (basically effortless implementation)


Programming Paradigm

    for minibatch in training_data:
        loss_values = []
        for x, y in minibatch:              # just write a for loop!
            loss_values.append(calculate_loss(x, y))
        loss_sum = sum(loss_values)
        loss_sum.forward()                  # batching occurs here
        loss_sum.backward()
        trainer.update()


Under the Hood

• Each node has a “profile”; same profile → batchable

• Batch and execute items with their dependencies satisfied
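
The two rules above can be sketched as a toy scheduler. This is an illustration under stated assumptions, not DyNet's implementation: here a profile is simply a string naming the operation type, the graph is a plain dictionary, and `schedule_batches` is a hypothetical function that runs every ready group before re-checking dependencies.

```python
from collections import defaultdict

def schedule_batches(nodes):
    """Group graph nodes into batches.

    `nodes` maps a node name to (profile, list of dependency names).
    Returns a list of batches in execution order; each batch is a list
    of node names that share a profile and are ready to run.
    """
    done, batches = set(), []
    while len(done) < len(nodes):
        # Nodes not yet run whose dependencies have all been executed.
        ready = [n for n, (_, deps) in nodes.items()
                 if n not in done and all(d in done for d in deps)]
        if not ready:
            raise ValueError("cycle or missing dependency in graph")
        # Same profile -> batchable.
        by_profile = defaultdict(list)
        for n in ready:
            by_profile[nodes[n][0]].append(n)
        # Assumption of this sketch: run every ready group, then re-check.
        # A real scheduler may choose groups with a different heuristic.
        for group in by_profile.values():
            batches.append(group)
            done.update(group)
    return batches

# Two two-word sentences passed through the same toy network.
graph = {
    "a1": ("lookup", []), "a2": ("lookup", []),
    "b1": ("lookup", []), "b2": ("lookup", []),
    "a_sum": ("add", ["a1", "a2"]),
    "b_sum": ("add", ["b1", "b2"]),
    "a_out": ("affine", ["a_sum"]),
    "b_out": ("affine", ["b_sum"]),
}
batches = schedule_batches(graph)
```

In this example eight individual nodes are executed as three batched operations: one lookup batch, one add batch, and one affine batch.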


Challenges

• This goes in your training loop: must be blazing fast!

• DyNet’s C++ implementation is highly optimized

• Profiles stored as hash functions

• Minimize memory allocation overhead


Synthetic Experiments

• Fixed-length RNN → ideal case for manual batching

• How close can we get?


Real NLP Tasks

• Variable-length RNN, RNN w/ character embeddings, tree LSTM, dependency parser


Let’s Try it Out!

http://dynet.io/autobatch/

https://github.com/neubig/howtocode-2017