52
30/08/2018 1 Source: rdn consulting CafeDSL, Aug 2018 Truyen Tran Deakin University @truyenoz truyentran.github.io [email protected] letdataspeak.blogspot.com goo.gl/3jJ1O0 Advances in Neural Turing Machines

Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

30/08/2018 1

Source: rdn consultingCafeDSL, Aug 2018

Truyen TranDeakin University @truyenoz

truyentran.github.io

[email protected]

letdataspeak.blogspot.com

goo.gl/3jJ1O0

Advances inNeural Turing Machines

Page 2: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

30/08/2018 2

https://twitter.com/nvidia/status/1010545517405835264

Page 3: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

(Real) Turing machine

30/08/2018 3

It is possible to invent a single machine which can be used to compute any computable sequence. If this machine U is supplied with the tape on the beginning of which is written the string of quintuples separated by semicolons of some computing machine M, then U will compute the same sequence as M.

Wikipedia

Page 4: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

30/08/2018 4

Can we learn from data a model that is as powerful as a Turing machine?

Page 5: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Agenda

Brief review of deep learningNeural Turing machine (NTM)Dual-controlling for read and write (PAKDD’18)Dual-view in sequences (KDD’18)Bringing variability in output sequences (NIPS’18 ?)Bringing relational structures into memory (IJCAI’17 WS+)

30/08/2018 5

Page 6: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

2016

Deep learning in a nutshell

http://blog.refu.co/wp-content/uploads/2009/05/mlp.png

1986

30/08/2018 62012

Page 7: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Let’s review current offerings

Feedforward nets (FFN)

Recurrent nets (RNN)

Convolutional nets (CNN)

Message-passing graph nets (MPGNN)

Universal transformer

…..

Work surprisingly well on LOTS of important problems

Enter the age of differentiable programming

30/08/2018 7

BUTS …

No storage of intermediate results.

Little choices over what to compute and what to use

Little support for complex chained reasoning

Little support for rapid switching of tasks

Page 8: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Searching for better priors

Translation invariance in CNN

Recurrence in RNN

Permutation invariance in attentions and graph neural networks

Memory for complex computation

Memory-augmented neural networks (MANN)

(LeCun, 2015)

Page 9: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

What is missing? A memoryUse multiple pieces of information

Store intermediate results (RAM like)

Episodic recall of previous tasks (Tape like)

Encode/compress & generate/decompress long sequences

Learn/store programs (e.g., fast weights)

Store and query external knowledge

Spatial memory for navigation

30/08/2018 9

Rare but important events (e.g., snake bite)

Needed for complex control

Short-cuts for ease of gradient propagation = constant path length

Division of labour: program, execution and storage

Working-memory is an indicator of IQ in human

Page 10: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Example: Code language model

10

Still needs a better memory for:

RepetitivenessE.g. for (int i = 0; i < n; i++)

LocalnessE.g. for (int size may appear more often that for (int i in some source files.

Very long sequence (big file, or char level)

Page 11: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Example: Electronic medical records

Three interwoven processes:Disease progressionInterventions & care processesRecording rules

30/08/2018 11

Source: medicalbillingcodings.org

visits/admissions

time gap ?

prediction point

Abstraction

Modelling

Need memory to handle thousands of events

Page 12: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

EMR visualisation

A prototype system developed iHops (our spin-off)

Page 13: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Conjecture: Healthcare is Turing computational

Healthcare processes as executable computer program obeying hidden “grammars”

The “grammars” are learnable through observational data

With “generative grammars”, entire health trajectory can be simulated.

30/08/2018 13

Get sick

See doctor

Enjoy life

Page 14: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Other possible applications of memory

Video captioning

QA, VQA

Machine translationMachine reading (stories, books, DNA)

Business process continuation

Software executionCode generation

30/08/2018 14

Graph as sequence of edges

Event sequences

Graph traversalAlgorithm learning (e.g., sort)

Dialog systems (e.g., chat bots)

Reinforcement learning agents

Page 15: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Neural Turing machine (NTM)

30/08/2018 15

Page 16: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

RNN: theoretically powerful, practically limited

ClassificationImage captioning

Sentence classification

Neural machine translation

Sequence labelling

Source: http://karpathy.github.io/assets/rnn/diags.jpeg30/08/2018 16

Page 17: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Neural Turing machine (NTM)

A controller that takes input/output and talks to an external memory module.

Memory has read/write operations.

The main issue is where to write, and how to update the memory state.All operations are differentiable.

https://rylanschaeffer.github.io/content/research/neural_turing_machine/main.html

Page 18: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

NTM operations

30/08/2018 18https://medium.com/@aidangomez/the-neural-turing-machine-79f6e806c0a1

https://rylanschaeffer.github.io

Page 19: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

30/08/2018 19

NTM unrolled in time with LSTM as controller

#Ref: https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315

Page 20: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Differentiable neural computer (DNC)

30/08/2018 20

Source: deepmind.com

#REF: Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external memory.” Nature 538.7626 (2016): 471-476.

https://rylanschaeffer.github.io

20162014

Page 21: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Dual-controlling for read and writeHung Le, Truyen Tran & Svetha Venkatesh

PAKDD’18

30/08/2018 21

Page 22: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

MANN with dual control (DC-MANN)

30/08/2018 22

Two controllers, for input & output

The encoder reads the input sequence is encoded into memory

The decoder reads the memory and produces a sequence of output symbols

During decoding, the memory is write-protected (DCw-MANN)

#REF: Hung Le, Truyen Tran, and Svetha Venkatesh. “Dual Control Memory Augmented Neural Networks for Treatment Recommendations”, PAKDD18.

Page 23: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

DC-MANN

LSTM LSTM LSTM LSTM LSTM LSTM

E11 N18

1916 1910

I10 1916 1910

1893

23#Ref: https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315

Page 24: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Result: Odd-Even Sequence Prediction Input: a sequence of random odd numbers output: a sequence of even numbersOutput:

24

Write-protected

helps

Without memory, LSTMs fail the task

Page 25: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Treatment recommendation

E11 I10 N18 1916 1910 Z86 E11 A08 1952 1893 E11 T81 A08

Admission 1 Admission N-1 Admission N (current)

? ? ?

Predict output sequence:Treatments for current admission

25

Page 26: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Result: Medicine prescription

26

Page 27: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Compared to DNC

27

Page 28: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Dual-view sequential problemsHung Le, Truyen Tran & Svetha Venkatesh

KDD’18

30/08/2018 28

Page 29: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Synchronous two-view sequential learning

Visual

Speech

1 2 3 4

Page 30: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Asynchronous two-view sequential learning Healthcare: medicine prescription

E11 I10 N18

1916 1910

Z86 E11

1952 1893

DOCU100L ACET325

Diagnoses

Procedures

Medicines

Page 31: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Asynchronous two-view sequential learning Healthcare: disease progression

E11 I10 N18

1916 1910

Z86 E11

DOCU100LACET325

Previous diagnoses

Previous interventions

Future diagnoses ???

Page 32: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Intra-view & inter-view interactions

output

Page 33: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Dual architecture

Dual Memory Neural Computer (DMNC). There are two encoders and one decoder implemented as LSTMs. The dash arrows represent cross-memory accessing in early-fusion mode

Intra-interaction

Inter-interaction

Long-term dependencies

#Ref: Le, Hung, Truyen Tran, and Svetha Venkatesh. "Dual Memory Neural Computer for Asynchronous Two-view Sequential Learning." KDD18.

Page 34: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Accuracy

Learning curve

Simple sum, but distant, asynchronous

Page 35: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

70 80 90

AUC

F1

P@1

P@2

P@3

Medicine prescription performance(data: MIMIC-III)

LSTM DNC WLAS DMNC

Page 36: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

67.6

61.3

57

53.6

50

47.1

65.9

60.8

56.5

51.8

48.9

45.7

66.2

59.6

53.752.7

49.4

46.2

44

49

54

59

64

69

P@1 Dieabies P@2 Dieabies P@3 Dieabies P@1 Mental P@2 Mental P@3 Mental

Disease progression performance(data: MIMIC-III)

DMNC WLAS DeepCare

Page 37: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Bringing variability in output sequencesHung Le, Truyen Tran & Svetha Venkatesh

Submitted to NIPS’18

30/08/2018 37

Page 38: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Motivation: Dialog system

30/08/2018 38

A dialog system needs to maintain the history of chat (e.g., could be hours)Memory is needed

The generation of response needs to be flexible, adapting to variation of moods, styles Current techniques are mostly based on LSTM, leading to “stiff” default responses

(e.g., “I see”).

There are many ways to express the same thought Variational generative methods are needed.

Page 39: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Variational Auto-Encoder (VAE)(Kingma & Welling, 2014)

Two separate processes: generative (hidden visible) versus recognition (visible hidden)

http://kvfrans.com/variational-autoencoders-explained/

Gaussian hidden variables

Data

Generative net

Recognisingnet

Page 40: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Variational memory encoder-decoder (VMED)

30/08/2018 40

Conditional Variational Auto-Encoder

contextgenerated

latent variables

VMED

contextgenerated

latent variables memory

reads

Page 41: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

30/08/2018 41

Page 42: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Sample response

30/08/2018 42

Page 43: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Sample response (2)

30/08/2018 43

Page 44: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Bringing relational structures into memoryTrang Pham, Truyen Tran & Svetha Venkatesh

IJCAI’17 WS+

30/08/2018 44

Page 45: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

NTM as matrix machine

Controller and memory operations can be conceptualized as matrix operations Controller is a vector

changing over time

Memory is a matrix changing over time

30/08/2018 45

#REF: Kien Do, Truyen Tran, Svetha Venkatesh, “Learning Deep Matrix Representations”, arXiv preprint arXiv:1703.01454

Recurrent dynamics

Page 46: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Idea: Relational memoryIndependent memory slots not suitable for relational reasoning

Human working memory sub-processes seem inter-dependent

30/08/2018 46

Relational structure

New memory proposalNew information

Transformation

Old memoryTime-aware bias

Page 47: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Relational Dynamic Memory Network (DMNN)

30/08/2018 47

Controller

Memory

Graph

Query Output

Read WriteOutputController

Memory

Query

Read Write

Relational Dynamic Memory NetworkNTM

Page 48: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

RDMN unrolled

30/08/2018 48

Input process

Memory process

Output process

Controller process

Message passing

Page 49: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Drug-disease response

30/08/2018 49

Molecule Bioactivity

Page 50: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Chemical reaction

30/08/2018 50

Molecules Reaction

Page 51: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

Team @ Deakin (A2I2)

30/08/2018 51

Thanks to many people who have created beautiful graphics & open-source programming frameworks.

Page 52: Advances in Neural Turing MachinesCafeDSL-UoW.pdf · Video captioning QA, VQA Machine translation Machine reading (stories, books, DNA) Business process continuation Software execution

References

Memory–Augmented Neural Networks for Predictive Process Analytics, A Khan, H Le, K Do, T Tran, A Ghose, H Dam, R Sindhgatta, arXiv preprint arXiv:1802.00938

Learning deep matrix representations, K Do, T Tran, S Venkatesh, arXiv preprint arXiv:1703.01454

Variational memory encoder-decoder, H Le, T Tran, T Nguyen, S Venkatesh, arXiv preprintarXiv:1807.09950

Relational dynamic memory networks, Trang Pham, Truyen Tran, Svetha Venkatesh, arXivpreprint arXiv:1808.04247

Dual Memory Neural Computer for Asynchronous Two-view Sequential Learning, H Le, T Tran, S Venkatesh, KDD'18

Dual control memory augmented neural networks for treatment recommendations, H Le, T Tran, S Venkatesh, PAKDD'18.

30/08/2018 52