Normalization Methods


Layer Normalization (Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton)
Presenter: Nishida Geio

Outline: this talk compares three normalization methods: Batch Normalization, Weight Normalization, and Layer Normalization.

State-of-the-art DNNs rely on Batch Normalization, but it has problems: 1. it depends on the batch size (it needs reasonably large mini-batches); 2. it must behave differently at training and test time; 3. it is hard to apply to RNNs. Layer Normalization is proposed to address these.

Background: training time is the bottleneck for DNNs

VGG net (Karen Simonyan et al., ICLR 2015): 16/19 layers of CNN + FC. GPUs: 4x Titan Black. Training time: 2~3 weeks.

https://arxiv.org/pdf/1509.07627v1.pdf
Google's Neural Machine Translation system (Yonghui Wu et al., 2016): 8-layer LSTM RNN. GPUs: 96x Tesla K80.

Background: can we just distribute the training?

Large-scale distributed DNN training (Jeffrey Dean et al., NIPS 2012) parallelizes a DNN across many machines with dedicated software, but it roughly doubles the cost, so throwing hardware at the problem is not a satisfying answer.

Background: the ill-conditioned problem

Training becomes ill-conditioned when parameter scales are poorly matched and activations sit in their saturation regions. Batch Normalization (Sergey Ioffe et al., NIPS 2015) conditions the problem by keeping activations out of the saturation region.

Background: making the problem well-conditioned

Question: why does learning stall?

In the saturation region of an activation function, the gradient is nearly 0, so parameter updates become nearly 0 and learning stops making progress.

One remedy is to use an activation that does not saturate for positive inputs: ReLU(x) = max(x, 0).
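A minimal NumPy sketch (not from the slides) contrasting the two gradients:

```python
# Sketch: sigmoid gradients vanish in the saturation region, ReLU's do not.
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)            # at most 0.25, nearly 0 for large |x|

def relu_grad(x):
    return (x > 0).astype(float)    # exactly 1 wherever the unit is active

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid_grad(x))  # ~[4.5e-05 0.105 0.25 0.105 4.5e-05]: stalls at the ends
print(relu_grad(x))     # [0. 0. 0. 1. 1.]: no saturation on the positive side
```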

Q. Why does DNN training slow down? A. Because of (internal) covariate shift.

Background: what is covariate shift?

Covariate shift: the distribution of the inputs (covariates) differs between training and test time, so a model fit to the training distribution degrades at test time (see …, vol. 13, no. 3, pp. 111-118, 2006).

Background: parameter updates shift layer input distributions

Even if the DNN's input is normalized to mean 0 and variance 1, the inputs to the hidden layers are not: their distributions move every time the parameters are updated.

(Internal) covariate shift inside a DNN: write the input distribution as p (mean 0, variance 1). The first hidden layer receives q(p), the next receives r(q), and so on. When an earlier layer's parameters are updated, q(p) changes, so every later layer sees a shifted input distribution and must keep re-adapting to it. This internal covariate shift is what slows training.
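A small NumPy sketch of the effect (the two-layer toy network is hypothetical, not from the slides): the statistics of what layer 2 sees change as soon as layer 1's weights are updated.

```python
# Sketch: internal covariate shift in a toy two-layer network.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((10000, 50))        # input p: mean 0, variance 1

W1 = 0.1 * rng.standard_normal((50, 50))    # layer-1 parameters
q = np.tanh(x @ W1)                         # q(p): what layer 2 receives
print(q.mean(), q.std())                    # statistics of q(p) before the update

W1 += 0.05 * rng.standard_normal(W1.shape)  # simulate one gradient update
q_shifted = np.tanh(x @ W1)                 # q(p) has moved...
print(q_shifted.mean(), q_shifted.std())    # ...so layer 2 must re-adapt
```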

Approach: normalize inside the network

Three approaches: Batch Normalization (Sergey Ioffe et al., NIPS 2015), Weight Normalization (Tim Salimans et al., NIPS 2016), and Layer Normalization (Jimmy Lei Ba et al., NIPS 2016). Each inserts a normalization step so that the summed inputs each layer sees stay near mean 0, variance 1; they differ in which statistics the normalization uses, as the sketch below illustrates.
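A minimal sketch of the difference in what gets normalized (array shapes are illustrative, not from the slides):

```python
# Sketch: same pre-activation matrix, different normalization axes.
import numpy as np

A = np.random.default_rng(1).standard_normal((128, 64)) * 3.0 + 1.0  # (batch, units)

# Batch Norm: one (mean, std) per unit, computed over the batch axis.
bn = (A - A.mean(axis=0)) / A.std(axis=0)

# Layer Norm: one (mean, std) per example, computed over the unit axis.
ln = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)

# Weight Norm instead rescales the weight vectors themselves, not the activations.
```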

Batch Normalization

BN normalizes each unit's summed input using statistics over the mini-batch. A unit computes a_i = w_i^T x and h_i = f(a_i + b_i); BN inserts a normalization between them:

\bar{a}_i = \frac{g_i}{\sigma_i}\,(a_i - \mu_i), \qquad
\mu_i = \mathop{\mathbb{E}}_{x \sim P(x)}[a_i], \qquad
\sigma_i = \sqrt{\mathop{\mathbb{E}}_{x \sim P(x)}\big[(a_i - \mu_i)^2\big]}

where g_i is a learned gain and the expectations are estimated from the current mini-batch. Because \mu_i and \sigma_i are per-unit statistics over the batch, BN depends on the batch size.
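A minimal NumPy sketch of the formula above (training mode only; the function name is mine):

```python
# Sketch: Batch Normalization of the summed inputs, per unit over the batch.
import numpy as np

def batch_norm(a, gain, eps=1e-5):
    """a: (batch, units) summed inputs a_i; gain: (units,) learned g_i."""
    mu = a.mean(axis=0)                   # mu_i: per-unit mean over the mini-batch
    sigma = np.sqrt(a.var(axis=0) + eps)  # sigma_i: per-unit std over the mini-batch
    return gain * (a - mu) / sigma        # \bar{a}_i = (g_i / sigma_i)(a_i - mu_i)
```

At test time a real implementation replaces mu and sigma with running averages collected during training, which is exactly the train/test mismatch listed among BN's drawbacks.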

Batch Normalization

BN: results

Large speedup on ImageNet classification with GoogLeNet: the BN paper reports matching the baseline's accuracy in 14x fewer training steps.

BN: properties

BN makes the output invariant to rescaling of the weights: if a weight vector is scaled by a constant, \mu_i and \sigma_i scale with it and the normalized summed input is unchanged, while the Jacobian (gradient with respect to the scaled weights) is multiplied by the reciprocal of that constant, so larger weights get smaller updates and parameter growth stays stable.

Weight Normalization

WN: reparameterize the weights instead of normalizing the activations

Weight Normalization decouples the direction of each weight vector from its length:

w_i = \frac{g_i}{\lVert v_i \rVert}\, v_i

and the unit still computes a_i = w_i^T x, h_i = f(a_i + b_i). There are no mini-batch statistics at all, so WN is independent of the samples in the mini-batch, applies to RNNs as well as CNNs, and is lower-cost than BN.
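A minimal NumPy sketch of a WN unit under the reparameterization above (the function name and the choice of ReLU are mine):

```python
# Sketch: Weight Normalization, w_i = (g_i / ||v_i||) v_i, no batch statistics.
import numpy as np

def weight_norm_unit(x, v, g, b):
    """x: (batch, d) inputs; v: (d, units) directions; g, b: (units,) gain, bias."""
    w = g * v / np.linalg.norm(v, axis=0)  # rescale each column to length g_i
    a = x @ w                              # a_i = w_i^T x
    return np.maximum(a + b, 0.0)          # h_i = f(a_i + b_i) with f = ReLU
```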

Layer Normalization

LN: motivation

Can we keep BN's benefits while removing the dependence on the mini-batch samples, so that it also works for RNNs (where Batch Norm is awkward) and CNNs?

Where BN normalizes each unit over the batch, LN normalizes over the units of a layer: for a single training case, the summed inputs a_1, ..., a_H of a layer are normalized together before the nonlinearity h_i = f(\bar{a}_i + b_i), and the next layer does the same for a_{i+1}, b_{i+1}, h_{i+1}. Unlike BN, nothing here depends on the batch size.

Layer Normalization

LN computes the normalization statistics over the H hidden units of the layer:

\mu = \frac{1}{H}\sum_{i=1}^{H} a_i, \qquad
\sigma = \sqrt{\frac{1}{H}\sum_{i=1}^{H}\,(a_i - \mu)^2}

All units in a layer share the same \mu and \sigma, but each training case has its own, so LN is independent of the batch size.
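A minimal NumPy sketch of the formula above (names are mine):

```python
# Sketch: Layer Normalization, one (mu, sigma) per training case over H units.
import numpy as np

def layer_norm(a, gain, bias, eps=1e-5):
    """a: (batch, H) summed inputs; gain, bias: (H,) learned g_i, b_i."""
    mu = a.mean(axis=1, keepdims=True)                   # mean over the H hidden units
    sigma = np.sqrt(a.var(axis=1, keepdims=True) + eps)  # std over the H hidden units
    return gain * (a - mu) / sigma + bias                # identical for any batch size
```

Nothing here looks at the batch axis, so the function behaves identically with batch size 128, 4, or 1, which is the robustness shown in the next slide.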

LN: results

[Figure: validation curves at batch size 128 vs batch size 4. BN degrades sharply at batch size 4, while LN's curves barely change; LN also works well inside LSTMs.]

Q&A

Q. Can BN be applied to RNNs at all? A. Only awkwardly: it needs separate statistics for every time step.

[Figures: LN vs WN vs baseline on LSTM tasks, and on DRAW generative modeling (MNIST), where LN converges faster than BN and the baseline (Tim Salimans et al., 2016; Jimmy Lei Ba et al., 2016).]

Comparing BN, WN, LN

Why does each normalization help? If we consider that learning moves around a manifold of models, the curvature of that manifold is captured implicitly by the Fisher information matrix. The analysis proceeds in two steps: approximate the curvature quadratic form ds^2 with the Fisher information matrix, and approximate the network model by a Generalized Linear Model (GLM). Under this approximation, the geometry of the LN parameters can be compared directly with that of BN and WN.

A Generalized Linear Model (GLM) is a model whose log-likelihood is linear in the summed input a = w^T x with E[y | x] = f(a + b); for such a model the Fisher information F(\theta) can be written down in closed form.
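As a sketch in the notation of the Layer Normalization paper, the two objects being approximated are:

```latex
% Curvature of the model manifold under a small parameter change \delta,
% approximated by the Fisher information matrix F(\theta):
ds^2 \;=\; D_{\mathrm{KL}}\!\big[\,P(y \mid x;\,\theta)\,\big\|\,P(y \mid x;\,\theta+\delta)\,\big]
\;\approx\; \tfrac{1}{2}\,\delta^{\top} F(\theta)\,\delta

% GLM log-likelihood with summed input a = w^\top x, bias b,
% log-partition function \eta and scale parameter \phi:
\log P(y \mid x;\, w, b) \;=\; \frac{(a + b)\,y - \eta(a + b)}{\phi} + c(y, \phi)
```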

For the normalized GLM, F(\theta) shows how LN reshapes the geometry: the metric along the direction of each weight vector w_i is damped by the normalization scalar, so as the norm of w_i grows, the curvature along w_i, and with it the effective update to the weight magnitude, shrinks (roughly halving when the norm doubles). Normalization therefore stabilizes learning by acting as an implicit brake on weight growth.

Gain parameters ~BN, LN~ vs ~WN~

For BN and LN, the update to the gain parameter depends only on the magnitude of the prediction error, not on the scale of the input, so gain learning is robust to input scaling. For WN, the gain update depends on a_i, i.e., it grows with the norm of the input, so WN's gain learning is more sensitive to input scale than BN's or LN's.

BN WN LN: summary

Batch Norm (Sergey Ioffe et al., NIPS 2015)
  Pros: robust to weight scaling and to shifts/scaling of the data; strong results on CNNs.
  Cons: depends on the batch size; hard to apply to RNNs/LSTMs; needs separate statistics at test time.

Weight Norm (Tim Salimans et al., NIPS 2016)
  Pros: robust to weight scaling; independent of the mini-batch, so it applies to RNNs; low cost.
  Cons: fewer invariances than BN/LN (not invariant to scaling or shifting of the data).

Layer Norm (Jimmy Lei Ba et al., NIPS 2016)
  Pros: robust to weight scaling and to scaling/shifting of each training case; independent of the batch size; applies to RNNs.
  Cons: on CNNs it is less effective than BN.