Harmonic Analysis & Deep Learning (Sungbin Lim)

Harmonic Analysis and Deep Learning


Page 1: Harmonic Analysis and Deep Learning

Harmonic Analysis &

Deep Learning

Sungbin Lim

Page 2: Harmonic Analysis and Deep Learning

In this talk…

A mathematical theory of filters, activations, and pooling through multiple layers, based on deep CNNs

Encompassing the general ingredients:

Lipschitz continuity & Deformation sensitivity

WARNING: very tough mathematics… though without non-Euclidean geometry (e.g. geometric DL)

Page 3: Harmonic Analysis and Deep Learning

What is Harmonic Analysis?

f(x) = \sum_{n \in \mathbb{N}} a_n \phi_n(x), \qquad a_n := \langle f, \phi_n \rangle_{\mathcal{H}}

How to represent a function efficiently in the sense of Hilbert space?

Number theory

Signal processing

Quantum mechanics

Neuroscience, Statistics, Finance, etc…

Includes PDE theory, Stochastic Analysis



Page 9: Harmonic Analysis and Deep Learning

Hilbert space & Inner product

Banach space: Normed space + Completeness (e.g. \mathbb{C}^n,\ L^p,\ W_p^n,\ \cdots)

Hilbert space: Banach space + Inner product (e.g. \mathbb{R}^d,\ L^2,\ W_2^n,\ \cdots)

\langle u, v \rangle = \sum_{k=1}^{d} u_k v_k

\langle f, g \rangle_{L^2} = \int f(x)\, g(x)\, dx

\langle f, g \rangle_{W_2^n} = \langle f, g \rangle_{L^2} + \sum_{k=1}^{n} \langle \partial_x^k f,\ \partial_x^k g \rangle_{L^2}

© Kyung-Min Rho
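These inner products can be checked numerically; a minimal sketch (the grid, the Riemann-sum quadrature, and the test functions sin(πx) and x(1−x) are illustrative choices, not from the slides):

```python
import numpy as np

# Discretize [0, 1] and two smooth functions; approximate the L^2 and
# first-order Sobolev (W^1_2) inner products by Riemann sums.
x, dx = np.linspace(0.0, 1.0, 1001, retstep=True)
f = np.sin(np.pi * x)
g = x * (1.0 - x)

l2 = np.sum(f * g) * dx                 # <f, g>_{L^2}
df = np.gradient(f, dx)                 # numerical d/dx
dg = np.gradient(g, dx)
w12 = l2 + np.sum(df * dg) * dx         # adds <f', g'>_{L^2}

print(l2, w12)   # close to 4/pi^3 and 4/pi^3 + 4/pi
```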

Page 10: Harmonic Analysis and Deep Learning

Why Harmonic Analysis?

Pn(x) = anxn + an�1x

n�1 + · · ·+ a1x+ a0

Page 11: Harmonic Analysis and Deep Learning

Why Harmonic Analysis?

Pn(x) = anxn + an�1x

n�1 + · · ·+ a1x+ a0

(an, an�1, . . . , a1 , a0)

Encoding

Page 12: Harmonic Analysis and Deep Learning

Why Harmonic Analysis?

Pn(x) = anxn + an�1x

n�1 + · · ·+ a1x+ a0

(an, an�1, . . . , a1 , a0)

Encoding

Pn(x) = anxn + an�1x

n�1 + · · ·+ a1x+ a0

Decoding

Page 13: Harmonic Analysis and Deep Learning

Why Harmonic Analysis?

P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0

Encoding: P_n \mapsto (a_n, a_{n-1}, \dots, a_1, a_0)

Decoding: (a_n, a_{n-1}, \dots, a_1, a_0) \mapsto P_n

Why do we prefer polynomials?

Page 14: Harmonic Analysis and Deep Learning

Stone–Weierstrass theorem

Polynomials are universal approximators!

\forall f \in C(\mathcal{X}),\ \forall \varepsilon > 0,\ \exists P_n \ \text{s.t.}\ \max_{x \in \mathcal{X}} |f(x) - P_n(x)| < \varepsilon

© Wikipedia


Page 18: Harmonic Analysis and Deep Learning

Stone–Weierstrass theorem

Polynomials are universal approximators!

\forall f \in C^k(\mathcal{X}),\ \exists P_n \ \text{s.t.}\ \lim_{n \to \infty} \|f - P_n\|_{C^k} = 0

We can even approximate derivatives!

Universal approximators = {DL, polynomials, trees, …}

But why do we not use polynomials?

© Wikipedia
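The theorem can be watched in action: least-squares Chebyshev fits of increasing degree drive the sup-norm error to zero (the target exp(x), the grid, and the degrees are illustrative choices):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 2001)
f = np.exp(x)

def sup_error(deg):
    # Least-squares polynomial approximation of degree `deg` in a Chebyshev basis.
    p = np.polynomial.chebyshev.Chebyshev.fit(x, f, deg)
    return float(np.max(np.abs(f - p(x))))

errs = [sup_error(d) for d in (2, 4, 8)]
print(errs)   # strictly decreasing toward 0
```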


Page 21: Harmonic Analysis and Deep Learning

Local interpolation works well in low dimension

Need \varepsilon^{-d} points to cover [0,1]^d at a distance \varepsilon

High dimension ⇒ Curse of dimensionality!

© H. Bölcskei
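The ε^{-d} count is easy to make concrete; a one-liner with ε = 0.1 (the spacing is an illustrative choice):

```python
# Number of grid points at spacing eps needed to cover [0, 1]^d.
eps = 0.1
counts = [int(round(eps ** -d)) for d in range(1, 6)]
print(counts)   # [10, 100, 1000, 10000, 100000] -- exponential in d
```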

Page 22: Harmonic Analysis and Deep Learning

Universal approximator = Good feature extractor?

Page 23: Harmonic Analysis and Deep Learning

Universal approximator ≠ Good feature extractor

…in HIGH dimension!

Page 24: Harmonic Analysis and Deep Learning

Nonlinear Feature Extraction

© S. Mallat, © H. Bölcskei


Page 26: Harmonic Analysis and Deep Learning

Dimension Reduction ⇒ Invariants

How?

© S. Mallat

Page 27: Harmonic Analysis and Deep Learning

Main Topic in Harmonic Analysis

Linear operator ⇒ Convolution + Multiplier

Invariance vs Discriminability

L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \hat{K}(\omega)\,\hat{f}(\omega)
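The convolution/multiplier equivalence is exactly the (discrete, circular) convolution theorem; a sketch on a length-64 circle (the random filter and signal are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
f = rng.standard_normal(N)
K = rng.standard_normal(N)

# Spatial side: circular convolution straight from the definition.
direct = np.array([sum(K[m] * f[(n - m) % N] for m in range(N)) for n in range(N)])

# Frequency side: the Fourier multiplier K_hat acting pointwise.
mult = np.real(np.fft.ifft(np.fft.fft(K) * np.fft.fft(f)))

print(np.max(np.abs(mult - direct)))   # tiny: the two sides agree
```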


Page 30: Harmonic Analysis and Deep Learning

Main Topic in Harmonic Analysis

L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \hat{K}(\omega)\,\hat{f}(\omega)

Linear operator ⇒ Convolution + Multiplier

Discriminability vs Invariance: Littlewood–Paley condition ⇒ Semi-discrete frame

A\,\|f\|_{\mathcal{H}} \le \|L[f]\|_{\mathcal{H}} \le B\,\|f\|_{\mathcal{H}}

\|L[f_1] - L[f_2]\|_{\mathcal{H}} = \|L[f_1 - f_2]\|_{\mathcal{H}} \ge A\,\|f_1 - f_2\|_{\mathcal{H}}, \quad \text{i.e. } f_1 \ne f_2 \Rightarrow L[f_1] \ne L[f_2]
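For a discrete convolution operator the frame constants are just the extreme values of the multiplier, A = min|K̂(ω)| and B = max|K̂(ω)|; a sketch checking the sandwich bound via Parseval (the multiplier and signal are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 128
K_hat = 1.0 + rng.random(N)               # multiplier bounded away from zero
A, B = K_hat.min(), K_hat.max()

f = rng.standard_normal(N)
Lf = np.fft.ifft(K_hat * np.fft.fft(f))   # L[f] defined on the Fourier side

norm_f, norm_Lf = np.linalg.norm(f), np.linalg.norm(Lf)
print(A * norm_f <= norm_Lf <= B * norm_f)   # True
```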


Page 32: Harmonic Analysis and Deep Learning

Main Topic in Harmonic Analysis

L[f](x) = \langle T_x[K], f \rangle \iff \widehat{L[f]}(\omega) = \hat{K}(\omega)\,\hat{f}(\omega)

Linear operator ⇒ Convolution + Multiplier

Discriminability vs Invariance: Littlewood–Paley condition ⇒ Semi-discrete frame

A\,\|f\|_{\mathcal{H}} \le \|L[f]\|_{\mathcal{H}} \le B\,\|f\|_{\mathcal{H}}

\| \underbrace{L \circ \cdots \circ L}_{n\text{-fold}} [f] \|_{\mathcal{H}} \le B\, \| \underbrace{L \circ \cdots \circ L}_{(n-1)\text{-fold}} [f] \|_{\mathcal{H}} \le \cdots \le B^n\, \|f\|_{\mathcal{H}}

Banach fixed-point theorem
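Iterating the operator compounds the upper frame bound geometrically, which is the B^n growth of the slide; a sketch reusing the multiplier picture (filter and signal are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 128
K_hat = 0.5 + rng.random(N)
B = K_hat.max()

f = rng.standard_normal(N)
f_hat = np.fft.fft(f)

# n-fold application of L is the multiplier K_hat**n.
ok = all(
    np.linalg.norm(np.fft.ifft(K_hat ** n * f_hat)) <= B ** n * np.linalg.norm(f)
    for n in range(1, 6)
)
print(ok)   # True
```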

Page 33: Harmonic Analysis and Deep Learning

Main Tasks in Deep CNN

Representation learning

Feature Extraction

Nonlinear transform


Page 35: Harmonic Analysis and Deep Learning

Main Tasks in Deep CNN

Representation learning

Feature Extraction

Nonlinear transform: Lipschitz continuity (ex: ReLU, tanh, sigmoid, …)

|f(x) - f(y)| \le C\,\|x - y\| \iff \|\nabla f(x)\| \le C
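For a one-variable activation the two sides of this equivalence can be brute-forced on a grid; tanh, for instance, has Lipschitz constant 1 because |tanh′| ≤ 1 (the grid and range are illustrative):

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 401)
X, Y = np.meshgrid(x, x)

# Difference quotients |tanh(x) - tanh(y)| / |x - y| never exceed sup |tanh'| = 1.
ratios = np.abs(np.tanh(X) - np.tanh(Y)) / np.maximum(np.abs(X - Y), 1e-12)
print(ratios.max() <= 1.0)   # True
```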

Page 36: Harmonic Analysis and Deep Learning

How to control the Lipschitz constant?

Theorem: \|\rho(L[f])\|_{\mathcal{H}} \le N(B, C)\,\|f\|_{\mathcal{H}}

No change in Invariance!


Page 43: Harmonic Analysis and Deep Learning

How to control the Lipschitz constant?

Theorem: \|\rho(L[f])\|_{\mathcal{H}} \le N(B, C)\,\|f\|_{\mathcal{H}} (no change in Invariance!)

Proof) Let \rho = \mathrm{ReLU} and \mathcal{H} = W_2^1. Then

\|\rho(L[f])\|_{W_2^1} = \|\max\{L[f], 0\}\|_{L^2} + \|\nabla\rho(L[f])\|_{L^2}
\le \|L[f]\|_{L^2} + \|\underbrace{\rho'(L[f])}_{=1 \text{ or } 0}\,\nabla(L[f])\|_{L^2}
\le \|L[f]\|_{L^2} + \|\nabla(L[f])\|_{L^2} = \|L[f]\|_{W_2^1} \le B\,\|f\|_{W_2^1}

What about Discriminability?
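The two mechanisms in the proof, |ReLU(u)| ≤ |u| and the ρ′ ∈ {0, 1} bound on the gradient term, survive discretization; a sketch with a discrete W^1_2-style norm ‖u‖₂ + ‖Δu‖₂ (the random vector and the finite-difference norm are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(256)

relu = lambda v: np.maximum(v, 0.0)
# Discrete analogue of the W^1_2 norm: L^2 norm plus L^2 norm of differences.
sobolev = lambda v: np.linalg.norm(v) + np.linalg.norm(np.diff(v))

print(np.linalg.norm(relu(u)) <= np.linalg.norm(u))                    # |relu(u)| <= |u|
print(np.linalg.norm(np.diff(relu(u))) <= np.linalg.norm(np.diff(u)))  # ReLU is 1-Lipschitz
print(sobolev(relu(u)) <= sobolev(u))                                  # hence the Sobolev bound
```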

Page 44: Harmonic Analysis and Deep Learning

Scale Invariant Feature

Translation Invariant

Stable under Deformation

© S. Mallat


Page 46: Harmonic Analysis and Deep Learning

Scattering Network (Mallat, 2012)

\Phi(f) = \bigcup_n \Big\{ \underbrace{\big|\cdots\big|\,|f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}}\big| \cdots * g_{\lambda^{(p)}}\big|}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)},\cdots,\lambda^{(p)}}

© H. Bölcskei
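The formula is a cascade of "band-pass, modulus, low-pass" stages; a toy 1-D version (the Gabor-like filters, the averaging low-pass χ, and the two-layer path choice are all illustrative, not Mallat's actual wavelets):

```python
import numpy as np

def bandpass(N, freq, width=8):
    # Crude Gabor-like atom: Gaussian envelope times an oscillation.
    t = np.arange(N) - N // 2
    return np.exp(-(t / width) ** 2) * np.cos(2 * np.pi * freq * t / N)

def conv(f, g):
    # Circular convolution via FFT.
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

def scatter(f, freqs, lowpass):
    # One layer per filter: take modulus of band-pass output, emit low-passed features.
    paths = [f]
    features = []
    for freq in freqs:
        paths = [np.abs(conv(p, bandpass(len(f), freq))) for p in paths]
        features.extend(conv(p, lowpass) for p in paths)
    return np.concatenate(features)

N = 128
f = np.sin(2 * np.pi * 5 * np.arange(N) / N)
chi = np.ones(N) / N                        # crude low-pass (global average)
feat = scatter(f, freqs=[5, 11], lowpass=chi)
print(feat.shape)                           # (256,): 2 layers x 128 samples
```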

Page 47: Harmonic Analysis and Deep Learning

Generalized Scattering Network (Wiatowski, 2015)

\Phi(f) = \bigcup_n \Big\{ \underbrace{\big|\cdots\big|\,|f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}}\big| \cdots * g_{\lambda^{(p)}}\big|}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)},\cdots,\lambda^{(p)}}

Choice of frame: Gabor frame, tensor wavelets, directional wavelets, ridgelet frame, curvelet frame

© H. Bölcskei


Page 50: Harmonic Analysis and Deep Learning

Generalized Scattering Network (Wiatowski, 2015)

\Phi(f) = \bigcup_n \Big\{ \underbrace{\big|\cdots\big|\,|f * g_{\lambda^{(j)}}| * g_{\lambda^{(k)}}\big| \cdots * g_{\lambda^{(p)}}\big|}_{n\text{-fold convolution}} * \chi_n \Big\}_{\lambda^{(j)},\cdots,\lambda^{(p)}}

Linearize symmetries

“Space folding”, Cho (2014)

© S. Mallat


Page 52: Harmonic Analysis and Deep Learning

Generalized Scattering Network (Wiatowski, 2015)

Theorem: For f \mapsto S_n^{d/2}\,P_n(f)(S_n\,\cdot),

\|\Phi_n(T_t f) - \Phi_n(f)\| = O\!\left(\frac{\|t\|}{\prod_{j=1}^{n} S_j}\right)

Features become more translation invariant with increasing network depth
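The depth effect can be mimicked with a single pooling stage: after a low-pass average, a shifted signal and the original are far closer than before it (the spike signal, shift t = 1, and box filter are illustrative):

```python
import numpy as np

N = 64
f = np.zeros(N)
f[10] = 1.0
f_shift = np.roll(f, 1)                 # translation by t = 1

box = np.pad(np.ones(8) / 8, (0, N - 8))   # crude low-pass / pooling filter
smooth = lambda v: np.real(np.fft.ifft(np.fft.fft(v) * np.fft.fft(box)))

raw_gap = np.linalg.norm(f_shift - f)
pooled_gap = np.linalg.norm(smooth(f_shift) - smooth(f))
print(pooled_gap < raw_gap)             # True: pooled features move far less
```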


Page 54: Harmonic Analysis and Deep Learning

Generalized Scattering Network (Wiatowski, 2015)

© Philip Scott Johnson

Theorem: For F_{\tau,\omega}(x) = e^{2\pi i \omega(x)}\,f(x - \tau(x)),

\|\Phi(F_{\tau,\omega}) - \Phi(f)\| \le C\,(\|\tau\|_\infty + \|\omega\|_\infty)\,\|f\|_{L^2}

Multi-layer convolutions linearize features, i.e. they are stable to deformations

Page 55: Harmonic Analysis and Deep Learning

Generalized Scattering Network (Wiatowski, 2015)

© Philip Scott Johnson

Page 56: Harmonic Analysis and Deep Learning

Ergodic Reconstructions

© Philip Scott Johnson

© S. Mallat

Page 57: Harmonic Analysis and Deep Learning

David Hilbert

Wir müssen wissen. Wir werden wissen.
(We must know. We will know.)

Page 58: Harmonic Analysis and Deep Learning

Q&A