
Local Representation Alignment: A Biologically Motivated Algorithm

for Training Neural Systems

Alexander G. Ororbia II

The Neural Adaptive Computing (NAC) Laboratory

Rochester Institute of Technology


Collaborators

• The Pennsylvania State University
  • Dr. C. Lee Giles
  • Dr. Daniel Kifer

• Rochester Institute of Technology (RIT)
  • Dr. Ifeoma Nwogu (Computer Vision)
  • Dr. Travis Desell (Neuro-evolution, distributed computing)

• Students
  • Ankur Mali (PhD student, Penn State, co-advised w/ Dr. C. Lee Giles)
  • Timothy Zee (PhD student, RIT, co-advised w/ Dr. Ifeoma Nwogu)
  • Abdelrahman Elsiad (PhD student, RIT, co-advised w/ Dr. Travis Desell)


Objectives

• Context: Credit assignment & algorithmic alternatives
  • Backpropagation of errors (backprop)
  • Feedback alignment algorithms
  • Target propagation (TP)
  • Contrastive Hebbian learning (CHL)

• Discrepancy Reduction – a family of learning procedures
  • Error-Driven Local Representation Alignment (LRA/LRA-E)
  • Adaptive Noise Difference Target Propagation (DTP-σ)

• Experimental Results & Variations

• Conclusions

Equilibrium propagation (EP)

Contrastive Divergence (CD)


Credit assignment: Backprop, CHL, LRA

Optimization: SGD, Adam, RMSprop

Loss functions: MSE, MAE, CNLL

Dataset: MNIST

Models: MLP, AE, BM, RNN

MLP = Multilayer perceptron; AE = Autoencoder; BM = Boltzmann machine

Problems with Backprop

Global optimization, back-prop through whole graph.


• The global feedback pathway
  • Vanishing/exploding gradients
  • In recurrent networks, this is even worse!

• The weight transport problem

• High sensitivity to initialization

• Activation constraints/conditions

• Requires the system to be fully differentiable → difficulty handling discrete-valued functions

• Requires sufficient linearity → adversarial samples

Feedforward Inference

Illustration: forward propagation in a multilayer perceptron (MLP) to collect activities

(Shared across most algorithms, i.e., backprop, random feedback alignment, direct feedback alignment, local representation alignment)
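The inference pass described above can be written out in a few lines. This is an illustrative numpy sketch, not code from the talk; the function and variable names are my own, and a small tanh MLP is assumed:

```python
import numpy as np

def forward_collect(x, weights, phi=np.tanh):
    """Feedforward pass through an MLP, keeping every layer's
    pre-activations and post-activations for later credit assignment."""
    pre, post = [], [x]
    for W in weights:
        h = W @ post[-1]      # pre-activation of this layer
        pre.append(h)
        post.append(phi(h))   # post-activation of this layer
    return pre, post

# Tiny example: a 3 -> 4 -> 2 network with random weights
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)) * 0.1, rng.standard_normal((2, 4)) * 0.1]
pre, post = forward_collect(rng.standard_normal(3), Ws)
```

All of the algorithms compared in this talk reuse exactly these collected activities; they differ only in how the error signal is brought back down.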


Backpropagation of Errors


Conducting credit assignment using the activities produced by the inference pass

Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)


Pass error signal back through the (incoming) synaptic weights to get the error signal transmitted to the post-activations in the layer below


Repeat the previous steps, layer by layer (recursive treatment of backprop procedure)
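The layer-by-layer recursion above, written out as a numpy sketch (tanh layers and a squared-error loss are assumed for concreteness; the names are my own):

```python
import numpy as np

def backprop_grads(x, y, weights):
    """Backprop through a tanh MLP under L = 0.5 * ||z - y||^2:
    push the output error through each post-activation derivative,
    then through the transpose of the feedforward weights."""
    zs = [x]                                  # inference pass (collect activities)
    for W in weights:
        zs.append(np.tanh(W @ zs[-1]))
    delta = zs[-1] - y                        # output error signal
    grads = []
    for W, z_in, z_out in zip(reversed(weights),
                              reversed(zs[:-1]), reversed(zs[1:])):
        d_pre = delta * (1.0 - z_out ** 2)    # back through post-activation (tanh')
        grads.append(np.outer(d_pre, z_in))   # this layer's weight gradient
        delta = W.T @ d_pre                   # back through W^T to the layer below
    return list(reversed(grads))

rng = np.random.default_rng(1)
Ws = [rng.standard_normal((4, 3)) * 0.5, rng.standard_normal((2, 4)) * 0.5]
grads = backprop_grads(rng.standard_normal(3), rng.standard_normal(2), Ws)
```

Note the `W.T @ d_pre` step: this is the weight transport problem mentioned earlier, since the feedback path must use the transpose of the forward weights.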


Random Feedback Alignment


Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)


Pass error signal back through fixed, random alignment weights (replaces backprop’s step of passing error through transpose of feedforward weights)


Repeat previous steps (similar to backprop)
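Relative to the backprop recursion, random feedback alignment changes exactly one line: the transpose of the feedforward weights is replaced by a fixed random matrix B. An illustrative sketch (shapes and names are my assumptions):

```python
import numpy as np

def rfa_grads(x, y, weights, feedbacks):
    """Random feedback alignment for a tanh MLP: identical to backprop
    except the error travels down fixed random matrices B, not W.T."""
    zs = [x]
    for W in weights:
        zs.append(np.tanh(W @ zs[-1]))
    delta = zs[-1] - y
    grads = []
    for B, z_in, z_out in zip(reversed(feedbacks),
                              reversed(zs[:-1]), reversed(zs[1:])):
        d_pre = delta * (1.0 - z_out ** 2)
        grads.append(np.outer(d_pre, z_in))
        delta = B @ d_pre          # fixed random feedback replaces W.T
    return list(reversed(grads))

rng = np.random.default_rng(2)
Ws = [rng.standard_normal((4, 3)) * 0.5, rng.standard_normal((2, 4)) * 0.5]
Bs = [rng.standard_normal(W.T.shape) * 0.1 for W in Ws]   # fixed, never updated
grads = rfa_grads(rng.standard_normal(3), rng.standard_normal(2), Ws, Bs)
```

Because the B matrices never change, this sidesteps the weight transport problem while keeping the same layer-by-layer structure.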


Direct Feedback Alignment


Pass error signal back through post-activations (get derivatives w.r.t. pre-activations)


Pass error signal along first set of direct alignment weights to second layer


Pass error signal along next set of direct alignment weights to first layer


Treat the signals propagated along direct alignment connections as proxies for error derivatives and run them through post-activations in each layer, respectively
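The direct pathway described above can be sketched as follows: the output error is projected straight to each hidden layer along its own fixed matrix, with no layer-to-layer chain (illustrative numpy sketch; names and shapes are my assumptions):

```python
import numpy as np

def dfa_grads(x, y, weights, direct):
    """Direct feedback alignment for a tanh MLP: every hidden layer
    receives the output error through its own fixed random matrix."""
    zs = [x]
    for W in weights:
        zs.append(np.tanh(W @ zs[-1]))
    e = zs[-1] - y                                # global output error
    grads = []
    for l in range(len(weights)):
        # top layer uses the error itself; hidden layers use a direct projection
        proxy = e if l == len(weights) - 1 else direct[l] @ e
        d_pre = proxy * (1.0 - zs[l + 1] ** 2)    # through post-activation
        grads.append(np.outer(d_pre, zs[l]))
    return grads

rng = np.random.default_rng(3)
Ws = [rng.standard_normal((4, 3)) * 0.5, rng.standard_normal((2, 4)) * 0.5]
Ds = [rng.standard_normal((4, 2)) * 0.1]          # one direct matrix per hidden layer
grads = dfa_grads(rng.standard_normal(3), rng.standard_normal(2), Ws, Ds)
```

Since every layer's proxy signal depends only on the output error, all layer updates could in principle be computed in parallel.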


Backpropagation of Errors vs. Random Feedback Alignment vs. Direct Feedback Alignment

[Figure: side-by-side comparison of the three credit assignment pathways]


Global versus Local Signals

Global optimization, back-prop through whole graph. Local optimization, back-prop through sub-graphs.


Global feedback pathway

Will these yield coherent models?

Equilibrium Propagation

Negative phase | Positive phase

The Discrepancy Reduction Family

• General process (Ororbia et al., 2017 Adapt)
  • 1) Search for latent representations that better explain the input/output (targets)
  • 2) Reduce the mismatch between the currently “guessed” representations & the target representations
    • Sum of internal, local losses (in nats) → total discrepancy (akin to a “pseudo-energy”)
  • Coordinated local learning rules

• Algorithms
  • Difference target propagation (DTP) (Lee et al., 2014)
  • DTP-σ (Ororbia et al., 2019)
  • LRA (Ororbia et al., 2018; Ororbia et al., 2019)
  • Others – targets could come from an external, interacting process
    • NPC (neural predictive coding, Ororbia et al., 2017/2018/2019)
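The "total discrepancy" objective above is just a sum of per-layer local losses. A minimal sketch, with squared error standing in for whichever local loss a given member of the family actually uses:

```python
import numpy as np

def total_discrepancy(zs, targets):
    """Total discrepancy: the sum of each layer's local loss between its
    current representation z and its target representation t
    (a 'pseudo-energy' over the whole network)."""
    return sum(0.5 * float(np.sum((z - t) ** 2)) for z, t in zip(zs, targets))
```

Each term depends only on one layer's activity and target, which is what makes the resulting learning rules local and coordinated rather than driven by a single global pathway.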


Adaptive Noise Difference Target Propagation (DTP-σ)

Image adapted from (Lillicrap et al., 2018)

[Figure: layer activities z_{L-1}, z_L, their targets ẑ_{L-1}, ẑ_L, and the inverse mappings g(z_L), g(ẑ_L)]
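The target computation at the heart of difference target propagation (relating the z, ẑ, and g(z) quantities in the figure) can be sketched as below. DTP-σ additionally injects adaptive Gaussian noise when training the inverse mapping g; that part is not shown here:

```python
import numpy as np

def dtp_target(z_below, z_above, target_above, g):
    """DTP's corrected target for the layer below: map the layer-above
    target down through the learned (approximate) inverse g, plus a
    difference term that cancels out g's reconstruction error."""
    return g(target_above) + (z_below - g(z_above))

# When g maps z_above back onto z_below exactly, the correction vanishes:
t = dtp_target(np.array([1.0]), np.array([2.0]), np.array([4.0]), lambda v: 0.5 * v)
```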


Error-Driven Local Representation Alignment (LRA-E)


Transmit the error along the error feedback weights, and error-correct the post-activations using the transmitted displacement/delta


Calculate local error in layer below, measuring discrepancy between original post-activation and error-corrected post-activation


Repeat the past several steps, error-correcting each layer further down within the network/system
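One way to sketch the full LRA-E pass described over the last few slides. This is a loose illustrative sketch of the idea, not the paper's exact update rule: the error weights E, the correction rate β, and the squared-error local discrepancy are my simplifying assumptions:

```python
import numpy as np

def lra_e_pass(x, y, weights, error_W, beta=0.1):
    """Error-driven LRA sketch: transmit each layer's error down fixed
    error weights E, error-correct the post-activations toward targets,
    and derive each weight update from purely local quantities."""
    zs = [x]
    for W in weights:
        zs.append(np.tanh(W @ zs[-1]))
    targets = list(zs)
    e = zs[-1] - y                                        # top-level error unit
    deltas = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        deltas[l] = e
        if l > 0:
            targets[l] = zs[l] - beta * (error_W[l] @ e)  # error-corrected activity
            e = zs[l] - targets[l]                        # local discrepancy below
    # local updates: each layer's error times the representation feeding it
    grads = [np.outer(d, t) for d, t in zip(deltas, targets[:-1])]
    return grads, targets

rng = np.random.default_rng(4)
Ws = [rng.standard_normal((4, 3)) * 0.5, rng.standard_normal((2, 4)) * 0.5]
Es = [rng.standard_normal((3, 4)) * 0.1, rng.standard_normal((4, 2)) * 0.1]
grads, targets = lra_e_pass(rng.standard_normal(3), rng.standard_normal(2), Ws, Es)
```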


Optional…substitute & repeat!


Aligning Local Representations

• Credit assignment by optimizing subgraphs linked by error units

The Cauchy local loss:
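The Cauchy local loss is not reproduced in this transcript; a standard form of the Cauchy (log-quadratic) penalty between a post-activation and its target is:

```latex
\mathcal{L}\big(\mathbf{z}^{\ell}, \hat{\mathbf{z}}^{\ell}\big)
  \;=\; \sum_{i} \log\!\Big( 1 + \big( z^{\ell}_{i} - \hat{z}^{\ell}_{i} \big)^{2} \Big)
```

Unlike squared error, this penalty grows only logarithmically for large mismatches, so its derivative saturates and outlier units cannot dominate the local update.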


Aligning Local Representations

• Credit assignment by optimizing subgraphs linked by error units, motivated/inspired by (Rao & Ballard, 1999)

There is more than one way to compute these changes

Some Experimental Results


MNIST & Fashion MNIST

[Figure: sample inputs – the digits 7 and 3 from MNIST; Trousers, Dress, and Shirt from Fashion MNIST]

(Ororbia et al., 2018 Bio)

Acquired Filters

Third-level filters acquired, after a single pass through the data, by a tanh network trained with a) backprop, b) LRA.

[Figure panels: Backprop | LRA]

Visualization of Topmost Post-Activities


Angle between LRA, DFA, & DTP-σ against Backprop

Measuring Total Discrepancy in LRA-E

Equilibrium Propagation (8 layers): MNIST: 59.03%, Fashion MNIST: 67.33%
Equilibrium Propagation (3 layers): MNIST: 6.00%, Fashion MNIST: 16.71%

Training Deep (& Thin) Networks

(Ororbia et al., 2018 Credit)


Training Networks from Null Initialization

[Result panels: LWTA | SLWTA]

(Ororbia et al., 2018 Credit)

Training Stochastic Networks

(Ororbia et al., 2018 Credit)

If time permits…let’s talk about modeling time…


Training Neural Temporal/Recurrent Models

The Parallel Temporal Neural Coding Network (P-TNCN) (Ororbia et al., 2018)

(Ororbia et al., 2018 Continual)

• Integrating LRA into recurrent networks – result = Temporal Neural Coding Network


Removing Back-Propagation through Time!

• Each step in time entails: 1) generate a hypothesis, 2) error correction in light of evidence
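The two-step loop (hypothesis, then correction) can be sketched per time step with no backprop through time. This is an illustrative sketch of the idea only; the input-reconstruction error and all matrix names are my assumptions rather than the P-TNCN's exact formulation:

```python
import numpy as np

def temporal_step(z_prev, x_t, W_rec, W_in, E, beta=0.1):
    """One predict-then-correct step: 1) generate a hypothesis for the
    current state, 2) locally error-correct it against the observation."""
    z_hyp = np.tanh(W_rec @ z_prev + W_in @ x_t)   # 1) hypothesis
    e = x_t - W_in.T @ z_hyp                       # prediction error (illustrative)
    z_corr = z_hyp + beta * (E @ e)                # 2) local error correction
    return z_corr, e

rng = np.random.default_rng(5)
z = np.zeros(4)
W_rec, W_in, E = (rng.standard_normal(s) * 0.1 for s in [(4, 4), (4, 3), (4, 3)])
for x_t in rng.standard_normal((5, 3)):            # unroll over 5 steps, no BPTT
    z, e = temporal_step(z, x_t, W_rec, W_in, E)
```

Because each step only uses quantities available at that step, the memory cost is constant in sequence length, unlike backprop through time.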


Conclusions

• Backprop has issues; alignment algorithms fix one of them
  • Other algorithms such as DTP or EP are slow…

• Discrepancy reduction
  • Local representation alignment
  • Adaptive noise difference target propagation (DTP-σ)

• Showed promising results – stable and performant compared to alternatives such as Equilibrium Propagation & alignment algorithms
  • Can work with non-differentiable operators (discrete/stochastic)
  • Can be used to train recurrent/temporal models too!

Questions?


References

• (Ororbia et al., 2018, Credit) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Deep Credit Assignment by Aligning Local Distributed Representations”. arXiv:1803.01834 [cs.LG].

• (Ororbia et al., 2018, Continual) -- Alexander G. Ororbia II, Ankur Mali, C. Lee Giles, and Daniel Kifer. “Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations”. arXiv:1810.07411 [cs.LG].

• (Ororbia et al., 2017, Adapt) -- Alexander G. Ororbia II, Patrick Haffner, David Reitter, and C. Lee Giles. “Learning to Adapt by Minimizing Discrepancy”. arXiv:1711.11542 [cs.LG].

• (Ororbia et al., 2018, Lifelong) -- Alexander G. Ororbia II, Ankur Mali, Daniel Kifer, and C. Lee Giles. “Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting when Learning Cumulatively”. arXiv:1905.10696 [cs.LG].

• (Ororbia et al., 2018, Bio) -- Alexander G. Ororbia II and Ankur Mali. “Biologically Motivated Algorithms for Propagating Local Target Representations”. In: Thirty-Third AAAI Conference on Artificial Intelligence.
