



Exponential expressivity in deep neural networks through transient chaos
Ben Poole¹, Subhaneil Lahiri¹, Maithra Raghu²,³, Jascha Sohl-Dickstein³, Surya Ganguli¹

¹Stanford University, ²Cornell University, ³Google Brain

Goal: develop a theoretical understanding of deep neural networks
● Expressivity: represent a large class of functions
● Trainability: tractable algorithms for finding good solutions
● Generalizability: work well in unseen regions of input space

Introduction

Expressivity in random neural networks

Signal propagation in random neural networks

Length propagation

Correlation propagation

(Figure columns: local stretching χ1, local curvature, global curvature (Grassmannian length))

(Figure: an input manifold propagating through the network, x0(θ) → x1(θ) → x2(θ) → x3(θ), from input to output)

Fully-connected neural network with nonlinearity φ:
h_i^l = Σ_j W_ij^l x_j^{l-1} + b_i^l,    x_i^l = φ(h_i^l)

Independent random Normal weights and biases, with weight variance σ_w² and bias variance σ_b²:
W_ij^l ~ N(0, σ_w²/N_{l-1}),    b_i^l ~ N(0, σ_b²)
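As a concrete illustration, here is a minimal NumPy sketch of sampling and applying such a network (not the code from the deepchaos repository; the function name random_deep_net and the depth, width, and σ values are illustrative choices):

```python
import numpy as np

def random_deep_net(x0, depth, width, sigma_w, sigma_b, phi=np.tanh, seed=0):
    """Propagate inputs x0 (shape [N0, num_points]) through a random deep net.

    Weights W^l ~ N(0, sigma_w^2 / N_{l-1}), biases b^l ~ N(0, sigma_b^2),
    pre-activations h^l = W^l x^{l-1} + b^l, activations x^l = phi(h^l).
    Returns the list of pre-activations h^1, ..., h^depth.
    """
    rng = np.random.default_rng(seed)
    x = x0
    hs = []
    for _ in range(depth):
        n_in = x.shape[0]
        W = rng.normal(0.0, sigma_w / np.sqrt(n_in), size=(width, n_in))
        b = rng.normal(0.0, sigma_b, size=(width, 1))
        h = W @ x + b
        hs.append(h)
        x = phi(h)
    return hs

# Two nearby input points propagated in the ordered vs. chaotic regime.
x0 = np.random.default_rng(1).normal(size=(1000, 2))
x0[:, 1] = x0[:, 0] + 1e-3 * np.random.default_rng(2).normal(size=1000)
for sigma_w in (1.0, 4.0):   # for tanh: small sigma_w is ordered, large is chaotic
    h = random_deep_net(x0, depth=20, width=1000, sigma_w=sigma_w, sigma_b=0.3)[-1]
    corr = np.dot(h[:, 0], h[:, 1]) / np.sqrt(np.dot(h[:, 0], h[:, 0]) * np.dot(h[:, 1], h[:, 1]))
    print(f"sigma_w={sigma_w}: final-layer correlation of the two points = {corr:.3f}")
```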

Theory: how do simple input manifolds propagate through a deep network?
● A single point: when does its length grow or shrink, and how fast?
● A pair of points: do they become more similar or more different?
● A smooth manifold: how do its curvature and volume change?

Experiment: random deep nets are more expressive than wide shallow nets

Self-averaging approximation: for large N_l, the average over neurons in a layer ≈ the average over random weights for one neuron
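A quick numerical check of this approximation (an illustrative sketch with assumed width and variance values, not taken from the poster):

```python
import numpy as np

rng = np.random.default_rng(0)
N_in, N_l = 1000, 10000
sigma_w, sigma_b = 2.0, 0.5
x = rng.normal(size=N_in)                      # a fixed input to the layer

# (a) Spatial average: mean of h_i^2 over the N_l neurons of one sampled layer.
W = rng.normal(0.0, sigma_w / np.sqrt(N_in), size=(N_l, N_in))
b = rng.normal(0.0, sigma_b, size=N_l)
layer_avg = np.mean((W @ x + b) ** 2)

# (b) Ensemble average: mean of h^2 over many independent draws of a single neuron.
W2 = rng.normal(0.0, sigma_w / np.sqrt(N_in), size=(N_l, N_in))
b2 = rng.normal(0.0, sigma_b, size=N_l)
ensemble_avg = np.mean((W2 @ x + b2) ** 2)

# Both approximate sigma_w^2 * |x|^2 / N_in + sigma_b^2.
print(layer_avg, ensemble_avg, sigma_w**2 * np.dot(x, x) / N_in + sigma_b**2)
```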

(Figure: iterative maps for increasing σ_W)

Chaotic regime: nearby points become decorrelated

Ordered regime: nearby points become more correlated

q* - fixed point of iterative map

The correlation map always has a fixed point at c = 1, whose stability is determined by the slope of the correlation map at 1: χ1 = ∂c^l / ∂c^{l-1} |_{c=1} = σ_w² ∫ Dz [φ'(√q* z)]²

Recursion relation for the length of a point as it propagates through the network: q^l = V(q^{l-1}) = σ_w² ∫ Dz φ(√q^{l-1} z)² + σ_b², where q^l = (1/N_l) Σ_i (h_i^l)² is the squared length at layer l and Dz is the standard Gaussian measure
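A minimal numerical sketch of this recursion for φ = tanh, evaluating the Gaussian integral with Gauss-Hermite quadrature (the helper name length_map and all parameter values here are illustrative assumptions):

```python
import numpy as np

def length_map(q, sigma_w, sigma_b, phi=np.tanh, n_quad=80):
    """One step of the mean-field recursion q^l = V(q^{l-1}):
    V(q) = sigma_w^2 * E_{z~N(0,1)}[ phi(sqrt(q) z)^2 ] + sigma_b^2."""
    t, w = np.polynomial.hermite.hermgauss(n_quad)   # nodes/weights for exp(-t^2)
    z = np.sqrt(2.0) * t                             # rescale to z ~ N(0, 1)
    return sigma_w**2 * np.sum(w * phi(np.sqrt(q) * z) ** 2) / np.sqrt(np.pi) + sigma_b**2

# Track the squared length q^l over depth; it converges rapidly to a fixed point q*.
q = 10.0                                             # arbitrary input length
for l in range(10):
    q = length_map(q, sigma_w=2.5, sigma_b=0.3)
    print(f"layer {l + 1}: q = {q:.4f}")
```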

χ1 acts like a local stretching factor:
● χ1 < 1: nearby points come closer together
● χ1 > 1: nearby points are driven apart
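A self-contained sketch of this quantity for φ = tanh (the helper names gauss_avg, q_star, and chi1 and the σ values are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def gauss_avg(f, n_quad=80):
    """Average of f(z) over z ~ N(0, 1) via Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(n_quad)
    z = np.sqrt(2.0) * t
    return np.sum(w * f(z)) / np.sqrt(np.pi)

def q_star(sigma_w, sigma_b, n_iter=200):
    """Fixed point of the length map q -> sigma_w^2 E[tanh(sqrt(q) z)^2] + sigma_b^2."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w**2 * gauss_avg(lambda z: np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2
    return q

def chi1(sigma_w, sigma_b):
    """Slope of the correlation map at c = 1: chi1 = sigma_w^2 E[phi'(sqrt(q*) z)^2]."""
    q = q_star(sigma_w, sigma_b)
    dphi = lambda z: 1.0 - np.tanh(np.sqrt(q) * z) ** 2   # phi'(h) for phi = tanh
    return sigma_w**2 * gauss_avg(lambda z: dphi(z) ** 2)

for sigma_w in (0.5, 1.0, 2.5, 4.0):
    c1 = chi1(sigma_w, sigma_b=0.3)
    print(f"sigma_w={sigma_w}: chi1 = {c1:.3f}  ({'ordered' if c1 < 1 else 'chaotic'})")
```

For tanh with σ_b → 0 the crossing χ1 = 1 occurs at σ_w = 1; with larger σ_b it shifts to somewhat larger σ_w.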

                    Local Stretch        Local Curvature      Grassmannian Length
Ordered (χ1 < 1)    Exponential decay    Exponential growth   Constant
Chaotic (χ1 > 1)    Exponential growth   Constant             Exponential growth

References
M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, J. Sohl-Dickstein. On the expressive power of deep neural networks. arXiv:1606.05336
S. Schoenholz, J. Gilmer, S. Ganguli, J. Sohl-Dickstein. Deep Information Propagation. arXiv:1611.01232
Code to reproduce all results: github.com/ganguli-lab/deepchaos

Riemannian geometry of manifold propagation


Our work: expressivity in neural networks with random weights:
● New framework for analyzing random deep networks using mean field theory and Riemannian geometry
● Random deep nets are exponentially more expressive than shallow: deep → shallow requires exponentially more neurons!
● Can represent exponentially curved decision boundaries in input space

Acknowledgements: BP is supported by NSF IGERT and SIGF. SG and SL thank the Burroughs-Wellcome, Sloan, McKnight, James S. McDonnell, and Simons Foundations, and the Office of Naval Research for support.

(Figure panel: autocorrelation)

In the chaotic regime, deep networks with random weights create a space-filling curve that exponentially expands in length without decreasing local curvature, leading to an exponential growth in the global curvature.