
Local Representation Alignment: A Biologically Motivated Algorithm

for Training Neural Systems

Alexander G. Ororbia II

The Neural Adaptive Computing (NAC) Laboratory

Rochester Institute of Technology


Collaborators

• The Pennsylvania State University
  • Dr. C. Lee Giles
  • Dr. Daniel Kifer

• Rochester Institute of Technology (RIT)
  • Dr. Ifeoma Nwogu (Computer Vision)
  • Dr. Travis Desell (Neuro-evolution, distributed computing)

• Students
  • Ankur Mali (PhD student, Penn State, co-advised w/ Dr. C. Lee Giles)
  • Timothy Zee (PhD student, RIT, co-advised w/ Dr. Ifeoma Nwogu)
  • Abdelrahman Elsiad (PhD student, RIT, co-advised w/ Dr. Travis Desell)


Objectives

• Context: Credit assignment & algorithmic alternatives
  • Backpropagation of errors (backprop)
  • Feedback alignment algorithms
  • Target propagation (TP)
  • Contrastive Hebbian learning (CHL)

• Discrepancy Reduction – a family of learning procedures
  • Error-Driven Local Representation Alignment (LRA/LRA-E)
  • Adaptive Noise Difference Target Propagation (DTP-σ)

• Experimental Results & Variations

• Conclusions

Equilibrium propagation (EP)

Contrastive Divergence (CD)


Credit assignment: Backprop, CHL, LRA

Optimization: SGD, Adam, RMSprop

Loss functions: MSE, MAE, CNLL

Dataset: MNIST

Models: MLP, AE, BM, RNN

MLP = Multilayer perceptron; AE = Autoencoder; BM = Boltzmann machine

Problems with Backprop

Global optimization, back-prop through whole graph.


• The global feedback pathway
  • Vanishing/exploding gradients
  • In recurrent networks, this is even worse!

• The weight transport problem

• High sensitivity to initialization

• Activation constraints/conditions

• Requires the system to be fully differentiable → difficulty handling discrete-valued functions

• Requires sufficient linearity → adversarial samples

Feedforward Inference

Illustration: forward propagation in a multilayer perceptron (MLP) to collect activities

(Shared across most algorithms, i.e., backprop, random feedback alignment, direct feedback alignment, local representation alignment)
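The inference pass described above can be written out in a few lines. This is an illustrative numpy sketch, not code from the talk; the function and variable names are my own, and a small tanh MLP is assumed:

```python
import numpy as np

def forward_collect(x, weights, phi=np.tanh):
    """Feedforward pass through an MLP, keeping every layer's
    pre-activations and post-activations for later credit assignment."""
    pre, post = [], [x]
    for W in weights:
        h = W @ post[-1]      # pre-activation of this layer
        pre.append(h)
        post.append(phi(h))   # post-activation of this layer
    return pre, post

# Tiny example: a 3 -> 4 -> 2 network with random weights
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)) * 0.1, rng.standard_normal((2, 4)) * 0.1]
pre, post = forward_collect(rng.standard_normal(3), Ws)
```

All of the algorithms compared in this talk reuse exactly these collected activities; they differ only in how the error signal is brought back down.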


Backpropagation of Errors


Conducting credit assignment using the activities produced by the inference pass

Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)


Pass error signal back through the (incoming) synaptic weights to get the error signal transmitted to the post-activations in the layer below


Repeat the previous steps, layer by layer (recursive treatment of backprop procedure)
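The layer-by-layer recursion above, written out as a numpy sketch (tanh layers and a squared-error loss are assumed for concreteness; the names are my own):

```python
import numpy as np

def backprop_grads(x, y, weights):
    """Backprop through a tanh MLP under L = 0.5 * ||z - y||^2:
    push the output error through each post-activation derivative,
    then through the transpose of the feedforward weights."""
    zs = [x]                                  # inference pass (collect activities)
    for W in weights:
        zs.append(np.tanh(W @ zs[-1]))
    delta = zs[-1] - y                        # output error signal
    grads = []
    for W, z_in, z_out in zip(reversed(weights),
                              reversed(zs[:-1]), reversed(zs[1:])):
        d_pre = delta * (1.0 - z_out ** 2)    # back through post-activation (tanh')
        grads.append(np.outer(d_pre, z_in))   # this layer's weight gradient
        delta = W.T @ d_pre                   # back through W^T to the layer below
    return list(reversed(grads))

rng = np.random.default_rng(1)
Ws = [rng.standard_normal((4, 3)) * 0.5, rng.standard_normal((2, 4)) * 0.5]
grads = backprop_grads(rng.standard_normal(3), rng.standard_normal(2), Ws)
```

Note the `W.T @ d_pre` step: this is the weight transport problem mentioned earlier, since the feedback path must use the transpose of the forward weights.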


Random Feedback Alignment


Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)


Pass error signal back through fixed, random alignment weights (replaces backprop’s step of passing error through transpose of feedforward weights)


Repeat previous steps (similar to backprop)
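Relative to the backprop recursion, random feedback alignment changes exactly one line: the transpose of the feedforward weights is replaced by a fixed random matrix B. An illustrative sketch (shapes and names are my assumptions):

```python
import numpy as np

def rfa_grads(x, y, weights, feedbacks):
    """Random feedback alignment for a tanh MLP: identical to backprop
    except the error travels down fixed random matrices B, not W.T."""
    zs = [x]
    for W in weights:
        zs.append(np.tanh(W @ zs[-1]))
    delta = zs[-1] - y
    grads = []
    for B, z_in, z_out in zip(reversed(feedbacks),
                              reversed(zs[:-1]), reversed(zs[1:])):
        d_pre = delta * (1.0 - z_out ** 2)
        grads.append(np.outer(d_pre, z_in))
        delta = B @ d_pre          # fixed random feedback replaces W.T
    return list(reversed(grads))

rng = np.random.default_rng(2)
Ws = [rng.standard_normal((4, 3)) * 0.5, rng.standard_normal((2, 4)) * 0.5]
Bs = [rng.standard_normal(W.T.shape) * 0.1 for W in Ws]   # fixed, never updated
grads = rfa_grads(rng.standard_normal(3), rng.standard_normal(2), Ws, Bs)
```

Because the B matrices never change, this sidesteps the weight transport problem while keeping the same layer-by-layer structure.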


Direct Feedback Alignment


Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)


Pass error signal along first set of direct alignment weights to second layer


Pass error signal along next set of direct alignment weights to first layer


Treat the signals propagated along direct alignment connections as proxies for error derivatives and run them through post-activations in each layer, respectively
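The direct pathway described above can be sketched as follows: the output error is projected straight to each hidden layer along its own fixed matrix, with no layer-to-layer chain (illustrative numpy sketch; names and shapes are my assumptions):

```python
import numpy as np

def dfa_grads(x, y, weights, direct):
    """Direct feedback alignment for a tanh MLP: every hidden layer
    receives the output error through its own fixed random matrix."""
    zs = [x]
    for W in weights:
        zs.append(np.tanh(W @ zs[-1]))
    e = zs[-1] - y                                # global output error
    grads = []
    for l in range(len(weights)):
        # top layer uses the error itself; hidden layers use a direct projection
        proxy = e if l == len(weights) - 1 else direct[l] @ e
        d_pre = proxy * (1.0 - zs[l + 1] ** 2)    # through post-activation
        grads.append(np.outer(d_pre, zs[l]))
    return grads

rng = np.random.default_rng(3)
Ws = [rng.standard_normal((4, 3)) * 0.5, rng.standard_normal((2, 4)) * 0.5]
Ds = [rng.standard_normal((4, 2)) * 0.1]          # one direct matrix per hidden layer
grads = dfa_grads(rng.standard_normal(3), rng.standard_normal(2), Ws, Ds)
```

Since every layer's proxy signal depends only on the output error, all layer updates could in principle be computed in parallel.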


Backpropagation of Errors vs. Random Feedback Alignment vs. Direct Feedback Alignment

[Figure: side-by-side comparison of the three credit assignment pathways]


Global versus Local Signals

Global optimization, back-prop through whole graph. Local optimization, back-prop through sub-graphs.


Global feedback pathway

Will these yield coherent models?

Equilibrium Propagation

Negative phase | Positive phase

The Discrepancy Reduction Family

• General process (Ororbia et al., 2017 Adapt)
  • 1) Search for latent representations that better explain the input/output (targets)
  • 2) Reduce the mismatch between the currently “guessed” representations & the target representations
    • Sum of internal, local losses (in nats) → total discrepancy (akin to a “pseudo-energy”)
  • Coordinated local learning rules

• Algorithms
  • Difference target propagation (DTP) (Lee et al., 2014)
  • DTP-σ (Ororbia et al., 2019)
  • LRA (Ororbia et al., 2018; Ororbia et al., 2019)
  • Others – targets could come from an external, interacting process
    • NPC (neural predictive coding, Ororbia et al., 2017/2018/2019)
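The "total discrepancy" objective above is just a sum of per-layer local losses. A minimal sketch, with squared error standing in for whichever local loss a given member of the family actually uses:

```python
import numpy as np

def total_discrepancy(zs, targets):
    """Total discrepancy: the sum of each layer's local loss between its
    current representation z and its target representation t
    (a 'pseudo-energy' over the whole network)."""
    return sum(0.5 * float(np.sum((z - t) ** 2)) for z, t in zip(zs, targets))
```

Each term depends only on one layer's activity and target, which is what makes the resulting learning rules local and coordinated rather than driven by a single global pathway.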


Adaptive Noise Difference Target Propagation (DTP-σ)

Image adapted from (Lillicrap et al., 2018)

[Figure: layer activities z_{L-1}, z_L, their targets ẑ_{L-1}, ẑ_L, and the inverse mappings g(z_L), g(ẑ_L)]
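The target computation at the heart of difference target propagation (relating the z, ẑ, and g(z) quantities in the figure) can be sketched as below. DTP-σ additionally injects adaptive Gaussian noise when training the inverse mapping g; that part is not shown here:

```python
import numpy as np

def dtp_target(z_below, z_above, target_above, g):
    """DTP's corrected target for the layer below: map the layer-above
    target down through the learned (approximate) inverse g, plus a
    difference term that cancels out g's reconstruction error."""
    return g(target_above) + (z_below - g(z_above))

# When g maps z_above back onto z_below exactly, the correction vanishes:
t = dtp_target(np.array([1.0]), np.array([2.0]), np.array([4.0]), lambda v: 0.5 * v)
```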


Error-Driven Local Representation Alignment (LRA-E)


Transmit the error along the error feedback weights, and error-correct the post-activations using the transmitted displacement/delta


Calculate local error in layer below, measuring discrepancy between original post-activation and error-corrected post-activation


Repeat the past several steps, error-correcting each layer further down within the network/system
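One way to sketch the full LRA-E pass described over the last few slides. This is a loose illustrative sketch of the idea, not the paper's exact update rule: the error weights E, the correction rate β, and the squared-error local discrepancy are my simplifying assumptions:

```python
import numpy as np

def lra_e_pass(x, y, weights, error_W, beta=0.1):
    """Error-driven LRA sketch: transmit each layer's error down fixed
    error weights E, error-correct the post-activations toward targets,
    and derive each weight update from purely local quantities."""
    zs = [x]
    for W in weights:
        zs.append(np.tanh(W @ zs[-1]))
    targets = list(zs)
    e = zs[-1] - y                                        # top-level error unit
    deltas = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        deltas[l] = e
        if l > 0:
            targets[l] = zs[l] - beta * (error_W[l] @ e)  # error-corrected activity
            e = zs[l] - targets[l]                        # local discrepancy below
    # local updates: each layer's error times the representation feeding it
    grads = [np.outer(d, t) for d, t in zip(deltas, targets[:-1])]
    return grads, targets

rng = np.random.default_rng(4)
Ws = [rng.standard_normal((4, 3)) * 0.5, rng.standard_normal((2, 4)) * 0.5]
Es = [rng.standard_normal((3, 4)) * 0.1, rng.standard_normal((4, 2)) * 0.1]
grads, targets = lra_e_pass(rng.standard_normal(3), rng.standard_normal(2), Ws, Es)
```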


Optional…substitute & repeat!


Aligning Local Representations

• Credit assignment by optimizing subgraphs linked by error units

The Cauchy local loss:
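The Cauchy local loss is not reproduced in this transcript; a standard form of the Cauchy (log-quadratic) penalty between a post-activation and its target is:

```latex
\mathcal{L}\big(\mathbf{z}^{\ell}, \hat{\mathbf{z}}^{\ell}\big)
  \;=\; \sum_{i} \log\!\Big( 1 + \big( z^{\ell}_{i} - \hat{z}^{\ell}_{i} \big)^{2} \Big)
```

Unlike squared error, this penalty grows only logarithmically for large mismatches, so its derivative saturates and outlier units cannot dominate the local update.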


Aligning Local Representations

• Credit assignment by optimizing subgraphs linked by error units, motivated/inspired by (Rao & Ballard, 1999)

There is more than one way to compute these changes

Some Experimental Results


MNIST & Fashion MNIST

[Figure: sample inputs – the digits 7 and 3 from MNIST; Trousers, Dress, and Shirt from Fashion MNIST]

(Ororbia et al., 2018 Bio)

Acquired Filters

Third-level filters acquired, after a single pass through the data, by a tanh network trained with a) backprop, b) LRA.

[Figure panels: Backprop | LRA]

Visualization of Topmost Post-Activities


Angle between LRA, DFA, & DTP-σ against Backprop

Measuring Total Discrepancy in LRA-E

Equilibrium Propagation (8 layers): MNIST: 59.03%, Fashion MNIST: 67.33%
Equilibrium Propagation (3 layers): MNIST: 6.00%, Fashion MNIST: 16.71%

Training Deep (& Thin) Networks

(Ororbia et al., 2018 Credit)


Training Networks from Null Initialization

[Result panels: LWTA | SLWTA]

(Ororbia et al., 2018 Credit)

Training Stochastic Networks

(Ororbia et al., 2018 Credit)

If time permits…let’s talk about modeling time…


Training Neural Temporal/Recurrent Models

The Parallel Temporal Neural Coding Network (P-TNCN) (Ororbia et al., 2018)

(Ororbia et al., 2018 Continual)

• Integrating LRA into recurrent networks – result = Temporal Neural Coding Network


Removing Back-Propagation through Time!

• Each step in time entails: 1) generate a hypothesis, 2) error correction in light of evidence
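The two-step loop (hypothesis, then correction) can be sketched per time step with no backprop through time. This is an illustrative sketch of the idea only; the input-reconstruction error and all matrix names are my assumptions rather than the P-TNCN's exact formulation:

```python
import numpy as np

def temporal_step(z_prev, x_t, W_rec, W_in, E, beta=0.1):
    """One predict-then-correct step: 1) generate a hypothesis for the
    current state, 2) locally error-correct it against the observation."""
    z_hyp = np.tanh(W_rec @ z_prev + W_in @ x_t)   # 1) hypothesis
    e = x_t - W_in.T @ z_hyp                       # prediction error (illustrative)
    z_corr = z_hyp + beta * (E @ e)                # 2) local error correction
    return z_corr, e

rng = np.random.default_rng(5)
z = np.zeros(4)
W_rec, W_in, E = (rng.standard_normal(s) * 0.1 for s in [(4, 4), (4, 3), (4, 3)])
for x_t in rng.standard_normal((5, 3)):            # unroll over 5 steps, no BPTT
    z, e = temporal_step(z, x_t, W_rec, W_in, E)
```

Because each step only uses quantities available at that step, the memory cost is constant in sequence length, unlike backprop through time.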


Conclusions

• Backprop has issues; alignment algorithms fix one of them
  • Other algorithms such as DTP or EP are slow…

• Discrepancy reduction
  • Local representation alignment
  • Adaptive noise difference target propagation (DTP-σ)

• Showed promising results – stable and performant compared to alternatives such as Equilibrium Propagation & alignment algorithms
  • Can work with non-differentiable operators (discrete/stochastic)
  • Can be used to train recurrent/temporal models too!

Questions?


References

• (Ororbia et al., 2018, Credit) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Deep Credit Assignment by Aligning Local Distributed Representations”. arXiv:1803.01834 [cs.LG].

• (Ororbia et al., 2018, Continual) -- Alexander G. Ororbia II, Ankur Mali, C. Lee Giles, and Daniel Kifer. “Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations”. arXiv:1810.07411 [cs.LG].

• (Ororbia et al., 2017, Adapt) -- Alexander G. Ororbia II, Patrick Haffner, David Reitter, and C. Lee Giles. “Learning to Adapt by Minimizing Discrepancy”. arXiv:1711.11542 [cs.LG].

• (Ororbia et al., 2018, Lifelong) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting when Learning Cumulatively”. arXiv:1905.10696 [cs.LG].

• (Ororbia et al., 2018, Bio) -- Alexander G. Ororbia II and Ankur Mali. “Biologically Motivated Algorithms for Propagating Local Target Representations”. In: Thirty-Third AAAI Conference on Artificial Intelligence.
