
Page 1

Synaptic [classical and quantum] fluctuations as a recipe for robust and efficient neural network training

R. Zecchina

F. Gerace, A. Ingrosso, C. Lucibello, L. Saglietti, E. Tartaglione

C. Borgs, J. Chayes

H.J. Kappen

P. Chaudhari, S. Soatto

Carlo Baldassi
Bocconi University, Milano, Italy

Page 2

Outline (1/4)

[Figure: perceptron diagram: inputs, unknown weights ("?"), output]

Simple (but surprisingly rich) neuronal models → Stat. Phys. analysis (replica method)

More complex neural networks, realistic data → numerical experiments

Page 3

Outline (2/4)

Learning as an optimization problem

Highly non-convex landscape, many sharp local minima, isolated global minima

Should be hopeless, yet efficient heuristics exist

[Figure: rugged energy landscape]

Page 4

Outline (3/4)

Large deviation analysis → hidden dense region (high local entropy)

[Figure: energy landscape with a hidden dense region]

Hidden (the equilibrium analysis ignores it)
Dense → robust → generalizes well
Accessible to simple algorithms

Page 5

Outline (4/4)

High-density states can be targeted by explicitly designed algorithms

But some processes are attracted to them too. In particular:
1) Quantum annealing
2) Simple gradient on stochastic synapses

[Figure: energy landscape]

Page 6

Motivation, in brief

● Deep learning models: impressive results (super-human in some cases), very versatile

● Would benefit from: improved theoretical understanding, reduced resource usage, dedicated hardware

● Biological neurons: low-precision, quite noisy

Page 7

Simplified neuronal models

● Simplest model neuron: perceptron (one layer, discretized time — no dynamics, no Dale's principle...)

[Figure: perceptron diagram: inputs, unknown weights ("?"), output]

Page 8

Simplified neuronal models

● Simplest model neuron: perceptron (building block of DNNs)


Page 9

Simplified neuronal models

● Simplest model neuron: perceptron (building block of DNNs)

● Learning as an optimization problem: minimize misclassifications

[Figure: perceptron diagram: inputs, output and desired output]

Page 10

Simplified neuronal models

● Simplest model neuron: perceptron (building block of DNNs)

● Learning as an optimization problem: minimize misclassifications

● Random i.i.d. patterns, large N → stat. phys. tools → phase transitions (e.g. SAT/UNSAT)


Page 11

Equilibrium statistical physics of neural networks

● Errors ~ energy → Gibbs measure → Stat. Phys. Tools

● SAT/UNSAT phase transition (there is a “critical capacity” αc)

[Figure: the neural network as a factor graph: variables = synaptic weights, interactions/factor nodes = patterns (inputs + desired outputs)]

SAT phase: E_min = 0;  UNSAT phase: E_min > 0
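To make the "errors ~ energy" mapping concrete, the standard way this is written for a binary perceptron with N synapses and αN random patterns is (a minimal sketch; the notation is the usual one, not taken from the slide):

E(W) = \sum_{\mu=1}^{\alpha N} \Theta\!\left( - y^{\mu} \sum_{i=1}^{N} W_i \, x_i^{\mu} \right),
\qquad
P(W) = \frac{e^{-\beta E(W)}}{Z(\beta)},
\qquad
Z(\beta) = \sum_{W \in \{\pm 1\}^N} e^{-\beta E(W)} .

E counts the misclassified patterns; the SAT phase (α < αc) is where configurations with E = 0 exist, the UNSAT phase (α > αc) is where E_min > 0.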

Page 12

Binary (±1) synapses, equilibrium results

● Good capacity: αc = 0.83

● But nasty optimization landscape: typical solutions are isolated, and immersed in exponentially many local minima

● Yet, simple efficient heuristics exist up to αmax ~ 0.75. They find non-isolated, i.e. atypical, solutions

● Need to move away from equilibrium analysis

W. Krauth, M. Mézard, J. Phys. France, 1989
H. Huang, Y. Kabashima, Phys. Rev. E, 2014
A. Braunstein, R. Zecchina, PRL, 2006
C. Baldassi, A. Braunstein, et al., PNAS, 2007
C. Baldassi, J. Stat. Phys., 2009
C. Baldassi, A. Braunstein, J. Stat. Mech., 2015

[Figure: energy landscape over the space of configurations]

Page 13

Large deviation analysis: local entropy

● Idea: skew the statistical analysis towards non-isolated regions

● Given a configuration, we want to weigh it by how many solutions surround it — local entropy

C. Baldassi, A. Ingrosso, C. Lucibello, L. Saglietti, R. Zecchina, PRL, 2015

Gibbs distribution with a different energy
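A hedged sketch of the reweighted measure (following the cited PRL 2015 construction; the exact form of the distance constraint is not in the text, so take this as illustrative): the local entropy of a reference configuration \tilde W counts the solutions within distance d of it, and configurations are weighted by its exponential,

S(\tilde W, d) = \frac{1}{N} \ln \sum_{W :\, E(W) = 0} \mathbb{1}\big[ d(W, \tilde W) \le d \big],
\qquad
P(\tilde W) \propto e^{\, y \, N \, S(\tilde W, d)} .

For large y this is a Gibbs distribution whose "energy" is −S: it concentrates on the centers of the densest regions of solutions rather than on the exponentially more numerous isolated ones.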

Page 14

Large deviation analysis: Results Summary

● Ultra-dense regions exist at least up to some α ~ 0.77 (good agreement with heuristic solvers)

● Better generalization properties (quasi-Bayesian) [note: dense regions are not planted, they are structural]

● Same phenomenology also with an arbitrary number of synaptic states

– capacity saturates fast with the number of states → high precision is not needed (with the right algorithms)

● Same phenomenology also in deeper networks with more realistic datasets

C. Baldassi, A. Ingrosso, C. Lucibello, L. Saglietti, R. Zecchina, PRL, 2015
C. Baldassi, F. Gerace, C. Lucibello, L. Saglietti, R. Zecchina, Phys. Rev. E, 2016

Page 15

[Figure: energy landscape: a dense region vs. isolated optima (the latter much more numerous than depicted here)]

Page 16

[Figure: energy and local entropy profiles, large d]

Page 17

[Figure: energy and local entropy profiles, intermediate d]

Page 18

[Figure: energy and local entropy profiles, small d]

Page 19

[Figure: energy and local entropy profiles]

Going from large to small d = a "focusing" or "scoping" process

Page 20

Exploring local entropy by interacting replicas

● Assume that the parameter y is integer and transform the partition function:

C. Baldassi, C. Borgs, J. Chayes, A. Ingrosso, C. Lucibello, L. Saglietti, R. Zecchina, PNAS, 2016

[Formula on slide: the large deviation measure written through the local free entropy, with a reference ("center") configuration coupled by an interaction term to y replicas]
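Sketch of the transformation (following the cited PNAS 2016 construction; here the hard distance constraint is replaced by a coupling of strength γ, one of the formulations used there): the large-deviation partition function weights a reference \tilde W by the exponential of y times its local free entropy Φ, and for integer y it becomes a system of y replicas interacting with the center,

Z = \sum_{\tilde W} e^{\, y \, \Phi(\tilde W; \beta, \gamma)},
\qquad
\Phi(\tilde W; \beta, \gamma) = \ln \sum_{W} e^{-\beta E(W) - \gamma \, d(W, \tilde W)},

Z = \sum_{\tilde W} \; \sum_{W^1, \dots, W^y} \exp\!\Big( -\beta \sum_{a=1}^{y} E(W^a) - \gamma \sum_{a=1}^{y} d(W^a, \tilde W) \Big) .

Each replica W^a sees a copy of the original factor graph; all replicas are attracted to the common center \tilde W through the γ terms.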

Page 21

Exploring local entropy by interacting replicas

● Assume that the parameter y is integer and transform the partition function: simple recipe to extend existing algorithms

C. Baldassi, C. Borgs, J. Chayes, A. Ingrosso, C. Lucibello, L. Saglietti, R. Zecchina, PNAS, 2016

[Figure: factor graph of the replicated system: a reference ("center") configuration coupled by interaction terms to y replicas, each carrying a copy of the original factor graph]

We call this the Robust Ensemble (RE)

Page 22

Replicated variants of existing algorithms

● Replicated Simulated Annealing → doesn't get stuck → exponential speed-up w.r.t. non-replicated version

● Replicated Belief Propagation → efficient solver + semi-analytical tool for studying dense states + link with reinforcement heuristic

● Replicated Stochastic Gradient Descent (SGD) → dramatic improvement in capacity and speed w.r.t. non-replicated SGD (see the sketch after this slide)

● (All of these were tested on 2-layer networks: same scenario as the 1-layer perceptron)

C. Baldassi, C. Borgs, J. Chayes, A. Ingrosso, C. Lucibello, L. Saglietti, R. Zecchina, PNAS, 2016
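As a concrete illustration of the replicated-SGD idea, here is a minimal sketch. It is not the exact algorithm of the paper cited above: the model, loss and the coupling/learning-rate schedules are placeholder choices.

import numpy as np

def replicated_sgd(grad_loss, w0, n_replicas=3, steps=1000,
                   lr=0.1, gamma=0.01, gamma_growth=1.005, seed=0):
    """Minimal replicated SGD sketch: y replicas elastically coupled to their center.

    grad_loss(w) must return a (stochastic) gradient of the training loss at w.
    gamma is the replica-center coupling, slowly increased (the "scoping")."""
    rng = np.random.default_rng(seed)
    replicas = [w0 + 0.1 * rng.standard_normal(w0.shape) for _ in range(n_replicas)]
    for _ in range(steps):
        center = np.mean(replicas, axis=0)            # reference ("center") configuration
        for a in range(n_replicas):
            g = grad_loss(replicas[a])                # gradient of the task loss
            g = g + gamma * (replicas[a] - center)    # elastic attraction toward the center
            replicas[a] = replicas[a] - lr * g
        gamma *= gamma_growth                         # focusing: gamma slowly grows
    return np.mean(replicas, axis=0)

# Toy usage with a placeholder noisy quadratic loss:
# w = replicated_sgd(lambda w: 2 * (w - 1.0) + 0.1 * np.random.randn(w.size), np.zeros(10))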

Page 23

Deep neural networks

● Replicated SGD is closely related to EASGD (Zhang et al., 2015) (11-layer convolutional architecture, real-life images [ImageNet])

● Entropy-SGD (2016): use Langevin dynamics to estimate local entropy, with excellent results (convolutional networks and recurrent networks, realistic inputs); see the sketch after this slide

● Parallel Local Entropy (Parle) (2017): state-of-the-art results on an array of tasks while being ~3x faster; distributed learning

● Theoretical and numerical evidence from other research groups: minima are not all the same, wide minima (dense states) generalize better

S. Zhang, A. Choromanska, Y. LeCun, NIPS 2015
P. Chaudhari, A. Choromanska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, R. Zecchina, ICLR 2017

P. Chaudhari, C. Baldassi, R. Zecchina, S. Soatto, A. Talwalkar, A. Oberman (arXiv 1707.00424), SysML 2018
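For reference, the local-entropy objective behind Entropy-SGD can be written as follows (this follows the formulation in the ICLR paper cited above, not text from the slide): for weights w and training loss f,

F(w; \gamma) = \log \int e^{-f(w') - \frac{\gamma}{2} \| w - w' \|^2} \, dw',
\qquad
\nabla_w F(w; \gamma) = \gamma \big( \langle w' \rangle - w \big),

where \langle w' \rangle is the mean of w' under the local Gibbs measure \propto e^{-f(w') - \frac{\gamma}{2}\|w - w'\|^2}; in practice this mean is estimated with a few steps of stochastic gradient Langevin dynamics, which is the "Langevin dynamics to estimate local entropy" mentioned above.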

Page 24

Quantum annealing

● Quantum annealing strategy: use quantum fluctuations (rather than thermal fluctuations) to overcome energetic barriers

– Classical energy function + quantum perturbation, slowly send the perturbation to zero

● So far: unclear if "true" QA really helps, compared to standard annealing, in any relevant concrete scenario

H(Γ) = H_cl(σ^z) − Γ Σ_i σ_i^x    (classical part + transverse field; send Γ to 0)

Page 25

Suzuki-Trotter transformation

● Partition function transformation → "effective" replicated classical Hamiltonian (with infinite replicas, y→∞)

Z ≈ Σ_{σ^1 … σ^y} exp[ −(β/y) Σ_a H_cl(σ^a) + γ Σ_{a,i} σ_i^a σ_i^{a+1} ],   σ^{y+1} ≡ σ^1

replicated classical part (first term) + interaction between neighbouring Trotter replicas (second term)

Γ → 0 ⟺ γ → ∞

● Can be simulated with MCMC (finite y) → Quantum Simulated Annealing (QSA)
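A minimal sketch of the resulting Quantum Simulated Annealing loop (path-integral Monte Carlo). The energy function, schedules and parameters are placeholder choices; the slice coupling Jp = ½ ln coth(βΓ/y) is the standard Suzuki-Trotter expression.

import numpy as np

def qsa(energy, n_spins, y=20, beta=2.0, gamma_start=2.0, gamma_end=1e-3,
        sweeps=2000, seed=0):
    """Toy QSA: y Trotter replicas of +/-1 spins, coupled in a ring along the
    Trotter direction, with the transverse field Gamma slowly sent to 0."""
    rng = np.random.default_rng(seed)
    S = rng.choice([-1, 1], size=(y, n_spins))               # replicas x spins
    E = np.array([energy(S[a]) for a in range(y)], dtype=float)
    for Gamma in np.linspace(gamma_start, gamma_end, sweeps):
        Jp = 0.5 * np.log(1.0 / np.tanh(beta * Gamma / y))  # Suzuki-Trotter slice coupling
        for _ in range(n_spins):                             # a few single-spin Metropolis proposals
            a, i = rng.integers(y), rng.integers(n_spins)
            S_new = S[a].copy()
            S_new[i] = -S_new[i]
            dE = energy(S_new) - E[a]                        # change of the classical energy
            up, dn = (a + 1) % y, (a - 1) % y
            d_tr = 2.0 * S[a, i] * (S[up, i] + S[dn, i])     # change of the Trotter-ring bonds
            dlogp = -(beta / y) * dE - Jp * d_tr             # Metropolis log-acceptance
            if np.log(rng.random()) < dlogp:
                S[a], E[a] = S_new, E[a] + dE
    best = int(np.argmin(E))
    return S[best], E[best]

# Placeholder usage: minimize the training errors of a binary perceptron on random patterns
# X, t = np.random.choice([-1, 1], (50, 101)), np.random.choice([-1, 1], 50)
# w, e = qsa(lambda W: float(np.sum(t * (X @ W) <= 0)), n_spins=101)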

Page 26

Quantum annealing vs Robust ensemble

● Effective Hamiltonian after Suzuki-Trotter transformation: very similar to the robust ensemble description...

[Figure: replicas of the original factor graph; after Suzuki-Trotter, QA couples neighbouring replicas along the Trotter ring, while the RE couples every replica to a common reference]

C. Baldassi, R. Zecchina, PNAS 2018 (in press, arXiv:1706.08470)

Page 27

QSA on binary neural networks: study

● Analytical calculations + numerical experiments + comparison with true QA on small instances

● Ends up in the dense states (exponential speed-up w.r.t. thermal annealing – a physical device would work in ~O(1)...)

● QA lowers kinetic energy by delocalizing → favors dense regions

C. Baldassi, R. Zecchina, PNAS 2018 (in press, arXiv:1706.08470)

(D-Wave-like)

Page 28

Geometric structure is essential

C. Baldassi, R. Zecchina, PNAS 2018 (in press, arXiv:1706.08470)

Page 29

Stochastic synapses (overview)

● Consider a network with binary stochastic synapses, each controlled by a single parameter (a magnetization)

● At each presentation of a pattern, the actual values of the synapses are drawn at random

● Maximum likelihood approach (gradient descent, simulated annealing...)

● Natural way to enforce robustness: the network must try to get in the middle of a dense region where almost all solutions are good.

● …and indeed it does → ends up in high local-entropy states

[This is no accident, the intuition is corroborated by a formal similarity with the RE setting]

C. Baldassi, F. Gerace, H.J. Kappen, C. Lucibello, L. Saglietti, E. Tartaglione, R. Zecchina, arXiv:1710.09825, 2017

Page 30

Stochastic synapses

C. Baldassi, F. Gerace, H.J. Kappen, C. Lucibello, L. Saglietti, E. Tartaglione, R. Zecchina, arXiv:1710.09825, 2017

Standard: minimize the training error over the binary weights W.

Bayes: keep a posterior distribution over the weights and average predictions.

Our approach: maximize the log-likelihood of a stochastic network, in which for each pattern (x,y) the synapses W are sampled at random.

More complicated than Standard, less than Bayes. We get some of the goodies of the Bayesian approach (encoding structural priors, implicit robustness) while in the end we solve the Standard problem.

It can be understood as a relaxation of the standard problem: near the end of training the synaptic distributions polarize onto a single binary configuration. Now we can use SGD to train binary networks!
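The formulas on this slide were images and did not survive the conversion; a hedged reconstruction, consistent with the description above and with the setup of the cited arXiv:1710.09825 paper (notation mine), is:

\text{Standard:} \quad \min_{W \in \{\pm 1\}^N} \sum_{(x,y) \in \mathcal{D}} \ell(W; x, y)

\text{Bayes:} \quad P(W \mid \mathcal{D}) \propto P(W) \prod_{(x,y) \in \mathcal{D}} P(y \mid x, W), \quad \text{predict by averaging over } P(W \mid \mathcal{D})

\text{Our approach:} \quad \max_{m \in [-1,1]^N} \sum_{(x,y) \in \mathcal{D}} \log \, \mathbb{E}_{W \sim P_m} \, P(y \mid x, W),
\qquad P_m(W_i = \pm 1) = \frac{1 \pm m_i}{2} .

For every pattern the binary synapses are resampled from the product distribution P_m; at the end of training the m_i polarize to ±1 and a single binary network is read off.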

Page 31

Stochastic binary perceptron

C. Baldassi, F. Gerace, H.J. Kappen, C. Lucibello, L. Saglietti, E. Tartaglione, R. Zecchina, arXiv:1710.09825, 2017

For large N the log-likelihood takes a simple closed form in the synaptic magnetizations.

Smoothed-out landscape: we can do Gradient Descent on it [or any other simple algorithm, actually] (a toy sketch follows below).

Eventually, we can get to a polarized configuration which is in the middle of a dense region (verified analytically and numerically).
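A toy sketch of gradient ascent on the large-N form of the log-likelihood for a single stochastic binary perceptron. The closed form used below (the preactivation treated as a Gaussian with mean m·x and variance Σ_i (1 − m_i²) x_i², so that the per-pattern likelihood is Φ(y · m·x / σ)) is my reconstruction from the description above, not copied from the slide.

import numpy as np
from scipy.special import log_ndtr   # log of the standard normal CDF, numerically stable

def train_stochastic_perceptron(X, y, epochs=200, lr=0.05, seed=0):
    """Gradient ascent on sum_mu log Phi(y_mu * m.x_mu / sigma_mu) over magnetizations m.

    X: (P, N) inputs; y: (P,) labels in {-1, +1}. Each synapse is W_i = +/-1 with
    probability (1 +/- m_i)/2. Returns the binarized weights sign(m)."""
    rng = np.random.default_rng(seed)
    P, N = X.shape
    m = 0.1 * rng.standard_normal(N)                       # magnetizations, kept inside (-1, 1)
    for _ in range(epochs):
        for mu in rng.permutation(P):
            x, lab = X[mu], y[mu]
            sigma = np.sqrt(np.sum((1.0 - m**2) * x**2) + 1e-12)
            z = lab * np.dot(m, x) / sigma
            # d/dz log Phi(z) = phi(z) / Phi(z), computed in a numerically stable way
            dlogPhi = np.exp(-0.5 * z**2 - log_ndtr(z)) / np.sqrt(2.0 * np.pi)
            # chain rule: z depends on m both through m.x and through sigma
            dz = lab * x / sigma + lab * np.dot(m, x) * (m * x**2) / sigma**3
            m = np.clip(m + lr * dlogPhi * dz, -0.999, 0.999)
    return np.sign(m)

# Placeholder usage on a random teacher:
# N = 201; W_t = np.random.choice([-1, 1], N)
# X = np.random.choice([-1, 1], (300, N)); labels = np.sign(X @ W_t)
# W_learned = train_stochastic_perceptron(X, labels)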

Page 32

Stochastic binary multi-layer networks

C. Baldassi, C. Borgs, J. Chayes, F. Gerace, C. Lucibello, L. Saglietti, E. Tartaglione, R. Zecchina, in preparation, 2018

● The procedure can be applied to more complex architectures (e.g. 3 or more hidden layers).

● However, in order to be efficient, we must introduce an approximation (uncorrelated neurons) which can be quite crude.

– In cases where we can keep the correlations under control, the results are indeed very good (work in progress…)

● Or we could estimate the log-likelihood by sampling (also WIP; a minimal illustration follows)
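A minimal illustration of that sampling estimate (prob_correct and sample_weights below are hypothetical user-supplied callables, not functions from the papers above):

import numpy as np

def sampled_loglik(prob_correct, sample_weights, x, y, n_samples=100):
    """Monte Carlo estimate of log E_{W ~ P_m}[ prob_correct(W, x, y) ]:
    draw weight configurations W from P_m and average the per-pattern likelihoods."""
    vals = [prob_correct(sample_weights(), x, y) for _ in range(n_samples)]
    return float(np.log(np.mean(vals) + 1e-30))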

Page 33

Conclusions, future directions

● A wide family of stochastic processes is attracted to out-of-equilibrium states with peculiar (and useful) geometric properties in the space of configurations. Can we extend and generalize these results?

● Learning with low-precision synapses can be made extremely simple, and the performance is very good: can we improve existing networks / design new neural hardware?

● Accessible states in other problems – by exploring the robust ensemble we can find dense regions as if they were typical

● Applications in inference?

● We want to address unsupervised learning (automatic feature extraction) as well [we have preliminary results, on hold due to time constraints...]

Page 34

Thanks

R. Zecchina, C. Baldassi

F. Gerace, A. Ingrosso, C. Lucibello, L. Saglietti, E. Tartaglione

C. Borgs, J. Chayes

H.J. Kappen

P. Chaudhari, S. Soatto