Neural Networks for Data Science Applications
Master's Degree in Data Science
Lecture 4: Feedforward neural networks
Lecturer: S. Scardapane

ispac.diet.uniroma1.it/scardapane/wp-content/uploads/...



Content for this lecture

1. We build upon linear models to introduce feedforward neural networks.
2. We consider the problem of computing the gradient via reverse-mode automatic differentiation (back-propagation).
3. We describe some activation functions and the problem of vanishing and exploding gradients.


Feedforward neural networks

Limitations of linear models: the XOR function


In what way is a linear model limited?

To understand the limitations of a linear model $f(x)$, consider an input vector $\hat{x}$, equal to $x$ except for a single feature $\hat{x}_i = 2x_i$ (e.g., a client with double the income).

The outputs on the two inputs are related by:

$$f(\hat{x}) = f(x) + w_i x_i . \qquad (1)$$

Effects are linearly superimposed: it is impossible to model any form of interaction between two features (e.g., 'a client with 10k income is not trustworthy, unless he is very young').
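As a small numerical check of Eq. (1), the sketch below uses a hypothetical linear model $f(x) = w^T x + b$ with made-up weights and inputs (none of these values come from the lecture). It verifies that doubling one feature changes the output by exactly $w_i x_i$, whatever the values of the other features.

```python
# Hypothetical linear model f(x) = w^T x + b; all numbers are illustrative.
def f(x, w, b):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

w = [0.5, -2.0, 1.0]
b = 0.25
x = [4.0, 1.0, 3.0]

i = 0                       # index of the feature that is doubled
x_hat = list(x)
x_hat[i] = 2 * x[i]

lhs = f(x_hat, w, b)                  # f(x_hat)
rhs = f(x, w, b) + w[i] * x[i]        # f(x) + w_i * x_i, as in Eq. (1)

# Changing another feature does not change the size of the effect.
x_other = [4.0, -7.0, 3.0]
x_other_hat = [8.0, -7.0, 3.0]
effect_other = f(x_other_hat, w, b) - f(x_other, w, b)
```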


The XOR function

Consider the XOR problem, for two binary inputs $x_1$ and $x_2$:

$$\text{XOR}(x_1, x_2) = \begin{cases} 0 & \text{if } (x_1 = x_2 = 0) \text{ or } (x_1 = x_2 = 1), \\ 1 & \text{otherwise.} \end{cases} \qquad (2)$$

This has only four training patterns that we can enumerate: $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$.

However, we cannot learn the XOR by logistic regression: we say XOR is not linearly separable.
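One way to get a feel for this claim is a brute-force search. The sketch below is an illustration, not a proof: it tries every linear decision rule $w_1 x_1 + w_2 x_2 + b > 0$ on an arbitrary grid of values (the grid and the helper name `separable_on_grid` are assumptions made here). It finds no rule that reproduces XOR, while the OR function is matched easily.

```python
import itertools

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def fits(patterns, w1, w2, b):
    # Predict class 1 when w1*x1 + w2*x2 + b > 0 (thresholded linear model).
    return all((w1 * x1 + w2 * x2 + b > 0) == (y == 1)
               for (x1, x2), y in patterns)

def separable_on_grid(patterns):
    grid = [k / 4 for k in range(-20, 21)]   # -5.0, -4.75, ..., 5.0
    return any(fits(patterns, w1, w2, b)
               for w1, w2, b in itertools.product(grid, repeat=3))

xor_found = separable_on_grid(XOR)   # no linear rule on the grid fits XOR
or_found = separable_on_grid(OR)     # e.g. w1 = w2 = 1, b = -0.5 fits OR
```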


Visualizing the XOR function

Figure 1: Visualization of the XOR function: y = 0 (red circles) vs. y = 1 (blue squares). No straight line can perfectly divide the two.


Feedforward neural networks

Introducing hidden layers


Modelling XOR as a sequence of steps

Interestingly, the XOR can be decomposed into simpler logical expressions, each of which allows linear modeling:

Figure 2: XOR using NAND and NOR gates (source: WikiMedia, author Crystallizedcarbon).

Very loosely, this corresponds to the following reasoning:

1. Check if one of the two inputs is true;
2. Check if both are true simultaneously;
3. Combine the two bits of information.
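The three steps can be sketched with simple threshold units, each of which is a linear model followed by a step function. The thresholds below are hand-picked for illustration, and the sketch follows the loose OR/AND reasoning rather than the NAND/NOR circuit of Figure 2.

```python
def step(s):
    # Hard threshold: 1 if the linear score is positive, 0 otherwise.
    return 1 if s > 0 else 0

def xor_by_steps(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # step 1: at least one input is true (OR)
    h2 = step(x1 + x2 - 1.5)     # step 2: both inputs are true (AND)
    return step(h1 - h2 - 0.5)   # step 3: one is true, but not both

truth_table = {(a, b): xor_by_steps(a, b) for a in (0, 1) for b in (0, 1)}
```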


Learning features vs. learning classifiers

Layering decisions could be a good way of introducing non-linearity by model composition:

• First learn a model $g(x)$ that produces a set of 'useful' features $h = [h_1, \ldots, h_p]^T$.
• Then apply a linear model $w^T h$ on top of it.

Even if all models are linear in their inputs, their combination can create significant complexity in the final function.

We would like the overall model to be trainable simultaneously (end-to-end) with gradient descent, so how should we model $g(x)$?


Learning intermediate computations

Figure 3: How can we define the intermediate model $g(x)$?


The incorrect way: layering fully-linear models

Can we layer two linear models?

$$h = Wx , \qquad (3)$$

$$f(h) = w^T h . \qquad (4)$$

We could, but this is equivalent to a single linear model:

$$f(x) = \left(w^T W\right) x . \qquad (5)$$

We need some way of separating the two linear operations.
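The collapse in Eq. (5) is easy to verify numerically. The following sketch uses arbitrary example values for $W$, $w$ and $x$ (assumptions for illustration only) and plain Python lists instead of a linear algebra library.

```python
def matvec(A, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

W = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0]]   # 3 hidden units, 2 inputs
w = [2.0, -1.0, 0.5]
x = [0.3, -0.7]

two_layers = dot(w, matvec(W, x))            # w^T (W x), Eqs. (3)-(4)

# Row vector w^T W: a single weight per input feature.
w_collapsed = [sum(w[k] * W[k][j] for k in range(len(w)))
               for j in range(len(x))]
single_layer = dot(w_collapsed, x)           # (w^T W) x, Eq. (5)
```

Both quantities agree up to floating-point rounding, so the hidden layer adds no expressive power here.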


Feedforward neural networks

We can avoid the 'collapse' of the two linear models by interleaving them with some element-wise nonlinearity $\phi$:

$$h = \phi(Wx) , \qquad (6)$$

$$f(h) = w^T h . \qquad (7)$$

This is the prototype of a feedforward neural network (NN), sometimes known as a multilayer perceptron.
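To see that the nonlinearity makes a difference, the sketch below evaluates Eqs. (6)-(7) with $\phi$ chosen as $\max(0, s)$ and with hand-picked weights (both are assumptions for this illustration; in practice the weights would be learned). With this choice the network computes $|x_1 - x_2|$, which coincides with XOR on binary inputs.

```python
def relu(s):
    return max(0.0, s)

def forward(x, W, w):
    # Hidden layer: element-wise nonlinearity applied to W x, Eq. (6).
    h = [relu(sum(wij * xj for wij, xj in zip(row, x))) for row in W]
    # Output layer: linear model on top of the hidden features, Eq. (7).
    return sum(wk * hk for wk, hk in zip(w, h))

W = [[1.0, -1.0], [-1.0, 1.0]]   # h1 = relu(x1 - x2), h2 = relu(x2 - x1)
w = [1.0, 1.0]                   # f = h1 + h2 = |x1 - x2|

outputs = {x: forward(x, W, w) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```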


Visualization of a feedforward NN

Figure 4: Visualization of a simple feedforward NN. In general, we will never show the nonlinearities explicitly in the graphs from now on.


Feedforward neural networks

Training and complexity considerations


Training the network

Training of the network can proceed similarly to a linear model. For example, given a dataset $\{x_i, y_i\}_{i=1}^n$ for regression, we can minimize the least-squares (LS) loss:

$$W^*, w^* = \arg\min_{W, w} \frac{1}{n} \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 . \qquad (8)$$

For classification, we can wrap the output in a sigmoid and minimize a cross-entropy loss.


Approximation results *

Theorem 1.1: Cybenko (1989) - Hornik (1991) - Leshno (1993)

Let $h(x)$ be a continuous function defined on a compact subset $S \subset \mathbb{R}^d$ and $\epsilon > 0$. For a sufficiently large $p$, there exists an $f(x)$ as in Eq. (6)-(7) with $p$ hidden units such that:

$$|h(x) - f(x)| < \epsilon, \quad \forall x \in S . \qquad (9)$$

This holds for any non-constant, bounded, continuous $\phi$.

An NN of this form is a universal approximator. However, this theorem does not tell us about the feasibility for a given problem (e.g., how large should $p$ be?).


Computational complexity

Because the NN can be highly non-convex, its optimization problem has multiple local minima and/or saddle points.

In fact, training a neural network is NP-hard, even for very simple architectures, and highly dependent on a good initialization.

Hint

Finding the global optimum requires running GD from almost everywhere: this is similar to an exhaustive search.

Blum, A. and Rivest, R.L., 1989. Training a 3-node neural network is NP-complete. In Advances in Neural Information Processing Systems (pp. 494-501).


Adding hidden layers

Nothing prevents us from adding additional 'hidden' (intermediate) layers:

$$f(x) = w^T \cdot \overbrace{\phi\Big(Z \cdot \underbrace{\phi(W \cdot x)}_{h_1} + c\Big)}^{h_2} \qquad (10)$$

(Diagram: inputs $x_1, x_2$ followed by a sequence of hidden layers with units $h^1_1, h^1_2, \ldots, h^L_1, h^L_2$.)


Parameters and hyper-parameters

Unlike a linear model, an NN has several design choices that we are free to make:

• The nonlinearity (sometimes called the activation function);
• The dimensionality of the hidden layers;
• Several others that we will introduce in the next lectures.

We call these hyper-parameters to differentiate them from parameters (weights) to be trained via GD.

Choosing the correct set of hyper-parameters is called the model selection / hyper-parameter optimization problem.


Playing with a neural network

Figure 5: https://playground.tensorflow.org/.


Training of the networks

Stochastic optimization


Handling large datasets

Consider the steps needed for a single iteration of GD:

1. Computing the output of the NN $f(x)$ on all examples;
2. Computing the gradient of the cost function with respect to the weights.

Both these operations scale linearly in the number of examples, which is infeasible when we have $10^5$ or even more examples.

In practice, to train NNs we use stochastic versions of GD, which approximate the real gradient from smaller amounts of data.


Stochastic GD

The key observation is that the training problem in NNs is an expectation with respect to all data ($\theta$ is the set of weights):

$$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 . \qquad (11)$$

To limit the complexity, we use only a mini-batch $B$ of $M$ examples from the full dataset:

$$J(\theta) \approx \tilde{J}(\theta) = \frac{1}{M} \sum_{i \in B} \left(y_i - f(x_i)\right)^2 . \qquad (12)$$

We can use the gradient from the approximated function to perform an iteration of GD.
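A minimal sketch of Eqs. (11)-(12), assuming a toy scalar model $f(x) = \theta x$ and synthetic data (both invented here for illustration). The loss on a random mini-batch is a noisy estimate of the full loss, and averaging the mini-batch losses over a partition of the dataset recovers the full loss.

```python
import random

rng = random.Random(0)

def f(x, theta):
    return theta * x                      # toy scalar model (assumption)

n, M = 6400, 64
xs = [rng.uniform(-1, 1) for _ in range(n)]
ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]
theta = 2.0

def loss(indices):
    # Mean squared error over the given example indices.
    return sum((ys[i] - f(xs[i], theta)) ** 2 for i in indices) / len(indices)

full_loss = loss(range(n))                # J(theta), Eq. (11)

batch = rng.sample(range(n), M)           # a random mini-batch B of M examples
batch_loss = loss(batch)                  # approximate loss, Eq. (12)

# Split a shuffled copy of the indices into n / M disjoint mini-batches.
order = list(range(n))
rng.shuffle(order)
batch_losses = [loss(order[s:s + M]) for s in range(0, n, M)]
mean_batch_loss = sum(batch_losses) / len(batch_losses)
```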


Characteristics of stochastic GD

The previous algorithm is called stochastic gradient descent (SGD).

The computational complexity of an iteration of SGD depends only on the mini-batch size $M$ and does not depend on the size of the dataset.

Because we assumed that samples are i.i.d., we can prove that SGD also converges to a stationary point on average, albeit with noisy steps.


Visualization of the loss

Figure 6: SGD will converge on average to a stationary point, although in an apparently noisy fashion (Source: EngMRK).


Extracting batches from the data

In order to extract mini-batches from our dataset, we can apply a simple procedure:

1. Shuffle the full dataset;
2. Split the dataset into blocks of $M$ elements and process them sequentially;
3. After the last block, return to point (1) and iterate.

A full pass over the dataset is called an epoch.
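The procedure above could be sketched as a Python generator. The function name `minibatches` and its arguments are hypothetical; the last block of an epoch is simply allowed to be smaller when $M$ does not divide the dataset size, which is one possible convention among several.

```python
import random

def minibatches(dataset, M, epochs, seed=0):
    """Yield (epoch, block) pairs, reshuffling at the start of every epoch."""
    rng = random.Random(seed)
    data = list(dataset)
    for epoch in range(epochs):
        rng.shuffle(data)                        # step 1: shuffle the dataset
        for start in range(0, len(data), M):     # step 2: blocks of M elements
            yield epoch, data[start:start + M]
        # step 3: after the last block, loop back to the shuffle

# Usage: 10 examples, mini-batches of 4, two epochs.
batches = list(minibatches(range(10), M=4, epochs=2))
```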


Extracting batches from the data

(Diagram: the dataset is shuffled and split into mini-batches #1, #2, ..., #B, which together form one epoch; the next pass produces mini-batches #(B+1), #(B+2), ..., #2B.)

Figure 7: Dividing the optimization process into epochs is also helpful to define metrics and stopping criteria.


Choosing the batch size

The last choice is the size of the mini-batch:

• Smaller mini-batches are faster, but the network might require more iterations to converge because the gradient is noisier.
• Larger mini-batches provide a more reliable estimation of the gradient, but are slower.

In general, it is typical to choose power-of-two sizes (32, 64, 128, ...) depending on the hardware configuration and the total memory available.

In the limit $M = 1$ we obtain an online (streaming) optimization.


Training of the networks

Computing the gradient


A brief summary

Let us summarize the steps up to now:

1. Define some model $f(x)$, with some parameters $\theta$.
2. Define an optimization problem $J(\theta)$ to minimize the error.
3. Randomly initialize $\theta$ (more on this in the next lectures).
4. Iteratively improve $\theta$ by moving towards $-\nabla J$.

One question remains: how do we efficiently and automatically compute the gradient of $J$?

TensorFlow does it for us, but understanding how is key to a successful implementation of deep learning.
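The four steps can be put together in a few lines. This is only a sketch under stated assumptions: a scalar linear model, three made-up data points, a fixed starting point instead of a random one (for reproducibility), and a finite-difference gradient standing in for the efficient, automatic gradient computation that remains the open question.

```python
def f(x, theta):
    return theta[0] * x + theta[1]            # step 1: model with parameters theta

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # toy data generated by y = 2x + 1

def J(theta):
    # step 2: mean squared error over the dataset
    return sum((y - f(x, theta)) ** 2 for x, y in data) / len(data)

def numerical_grad(J, theta, eps=1e-6):
    # Central finite differences: one pair of evaluations of J per parameter.
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += eps
        down = list(theta)
        down[k] -= eps
        grad.append((J(up) - J(down)) / (2 * eps))
    return grad

theta = [0.0, 0.0]        # step 3: initialization (fixed here, not random)
learning_rate = 0.1
for _ in range(2000):     # step 4: repeatedly move against the gradient
    g = numerical_grad(J, theta)
    theta = [t - learning_rate * gk for t, gk in zip(theta, g)]
```

Finite differences need two evaluations of $J$ per parameter, which is exactly what becomes impractical for models with many weights.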


Neural networks and computational graphs

We focus on scalar values for our explanation (for simplicity).

Irrespective of the model we choose, we can always represent our optimization problem as a computational graph describing the sequence of operations, as they are performed.

Using this formulation, we can devise an extremely efficient technique for computing the gradients with respect to any quantity involved.


Some notation

We use this notation:

• $x$ is the input value;
• $\theta_1, \ldots, \theta_p$ are a set of parameters of the model;
• $u_1, \ldots, u_n$ are some intermediate computations (e.g., the output of an activation function, a multiplication, ...);
• $P_i$ is the set of parameters that contribute to the $i$-th operation;
• $z$ is the output (e.g., the loss function).

We order the operations such that $u_i$ can only depend on $x$ and $u_1, \ldots, u_{i-1}$.


Example of computational graph

(Diagram: $x$ and $\theta_1$ feed the product $u_1 = \theta_1 x$; $u_1$ and $\theta_2$ feed the sum $u_2 = u_1 + \theta_2$; the output is $z = \phi(u_2)$.)

Figure 8: Example of a simple computational graph corresponding to a scalar linear model $z = \phi(\theta_1 x + \theta_2)$. In this case, $P_1 = \{1\}$, and $P_2 = \{2\}$.


Chain rule

Suppose we only have a single intermediate node $u_1$. In this case, by the chain rule:

$$\frac{dz}{d\theta_1} = \frac{dz}{du_1} \cdot \frac{du_1}{d\theta_1} . \qquad (13)$$

For example, for $u_1 = \theta_1 x$ and $z = \phi(u_1)$ we obtain:

$$\frac{dz}{d\theta_1} = \phi'(u_1) x . \qquad (14)$$
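Eq. (14) can be checked against a finite-difference approximation. The sketch assumes $\phi$ is the sigmoid and uses arbitrary values for $\theta_1$ and $x$.

```python
import math

def phi(s):
    return 1.0 / (1.0 + math.exp(-s))        # sigmoid, chosen for illustration

def phi_prime(s):
    return phi(s) * (1.0 - phi(s))

theta1, x = 0.7, 1.5
u1 = theta1 * x

analytic = phi_prime(u1) * x                 # chain rule, Eq. (14)

eps = 1e-6                                   # central finite difference in theta1
numeric = (phi((theta1 + eps) * x) - phi((theta1 - eps) * x)) / (2 * eps)
```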

Can we extend this reasoning to more than a single intermediate value?


Chain rule with many intermediate values

By exploiting the ordering of the intermediate values, we can generalize the previous expression as:

$$\frac{dz}{d\theta_i} = \sum_j \frac{dz}{du_j} \cdot \underbrace{\frac{du_j}{d\theta_i}}_{0 \text{ if } i \notin P_j} . \qquad (15)$$

Because the operations $u_i$ are ordered, we can compute the adjoint variables $\frac{dz}{du_j}$ in the opposite order.

In its simplicity, this backpropagation algorithm allows us to compute the derivatives of any (scalar) computational expression with respect to any intervening terms.


Example of backpropagation 1/3

Figure 9: Example of backpropagation on the previous graph. First of all, we compute the gradient of the last-to-output terms.


Example of backpropagation 2/3

Figure 10: When appropriate, we can compute the gradients for all the parameters of the model.


Example of backpropagation 3/3

Figure 11: This intermediate information is propagated backwards until the beginning of the graph.


Summarizing the example

Original program:

$$u_1 = \theta_1 x$$
$$u_2 = u_1 + \theta_2$$
$$z = \phi(u_2)$$

Differentiated program (a prime denotes the derivative of $z$ with respect to that variable; the lines are evaluated from the last one upwards):

$$u_1' = u_2'$$
$$u_2' = \phi'(u_2)$$
$$z' = 1$$

Inside a tf.GradientTape(), all operations in red (the original program) are traced.

When calling gradient(), operations are 'unrolled' to compute the green (differentiated) program.
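Written out by hand, the two programs look as follows. This is a plain-Python sketch of what an automatic differentiation tool does behind the scenes, not TensorFlow's actual implementation; $\phi$ is assumed to be the sigmoid, and the names `d_u1`, `d_u2`, ... are hypothetical stand-ins for the primed variables.

```python
import math

def phi(s):
    return 1.0 / (1.0 + math.exp(-s))        # sigmoid (assumption)

def phi_prime(s):
    return phi(s) * (1.0 - phi(s))

def forward_backward(x, theta1, theta2):
    # Original program (forward pass); intermediate values are kept in memory.
    u1 = theta1 * x
    u2 = u1 + theta2
    z = phi(u2)

    # Differentiated program, in reverse order; d_v stands for dz/dv.
    d_z = 1.0
    d_u2 = phi_prime(u2) * d_z
    d_u1 = d_u2                  # because du2/du1 = 1
    d_theta2 = d_u2              # because du2/dtheta2 = 1
    d_theta1 = d_u1 * x          # because du1/dtheta1 = x
    return z, d_theta1, d_theta2

z, g_theta1, g_theta2 = forward_backward(x=1.5, theta1=0.7, theta2=-0.2)
```

Note that the backward pass reuses `u2` from the forward pass, which is why the intermediate values have to be stored.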


Difference between AD and symbolic differentiation

Figure 12: Difference between AD and pure symbolic differentiation (Source: Wikipedia).


Extending to vector-valued computations *

Now suppose the input value $x$ and the intermediate computations $u_i$ are vector-valued, but the output remains scalar.

For a single intermediate value, we obtain:

$$\nabla_\theta z = \left[\frac{\partial u_1}{\partial \theta}\right]^T \left[\nabla_{u_1} z\right] , \qquad (16)$$

where the term $\frac{\partial u_1}{\partial \theta}$ is the Jacobian matrix containing all pairwise partial derivatives.

Essentially, a general vector-valued backpropagation can be implemented by a series of Jacobian-vector products.
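A small worked instance of Eq. (16), assuming (for illustration only) a linear intermediate value $u_1 = A\theta$ with a fixed matrix $A$ and the scalar output $z = \sum_k u_{1,k}^2$. In that case the Jacobian is $A$ and $\nabla_{u_1} z = 2u_1$.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]    # Jacobian of u1 = A theta (3 x 2)
theta = [0.5, -1.0]

u1 = matvec(A, theta)                        # forward pass
grad_u1 = [2.0 * u for u in u1]              # gradient of z = sum(u1^2) w.r.t. u1
grad_theta = matvec(transpose(A), grad_u1)   # Eq. (16): transposed Jacobian times gradient
```

The full Jacobian is formed explicitly here only because the example is tiny; the point of Eq. (16) is that only its product with a vector is needed.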


How efficient is backpropagation?

The previous algorithm allows us to compute all the gradients at a cost that is linear w.r.t. the cost of computing the original $z$ (typically, 2-3 times as costly).

However, we are required to keep all intermediate computations in memory, so back-propagation is memory-intensive.

In the computer science community, backpropagation is known as a special case of a more general algorithm called reverse-mode automatic differentiation.


Implementing reverse-mode AD

• TensorFlow, like most deep learning frameworks, implements reverse-mode AD by a mix of a domain-specific language and operator overloading. [1]
• The TF runtime executor translates the instructions to a hardware-dependent optimized implementation.
• This is an area of active research (e.g., Myia, Google Tangent, Flux, ...).

[1] Agrawal, A. et al., 2019. TensorFlow Eager: A multi-stage, Python-embedded DSL for machine learning. arXiv preprint arXiv:1903.01855.


Training of the networks

Choosing an activation function


How do we choose a nonlinearity?

Understanding back-propagation gives some interesting insights into how to choose a proper activation function.

Consider the application of an element-wise nonlinearity:

$$u_{i+1} = \phi(u_i) , \qquad (17)$$

where $u_i$ is itself an intermediate computation. We have:

$$\frac{du_{i+1}}{du_{i-1}} = \phi'(u_i) \cdot \frac{du_i}{du_{i-1}} . \qquad (18)$$


Vanishing and exploding gradients

Whenever a value goes through an activation function, during backpropagation we multiply by the derivative of the function.

If we have many layers, we make a lot of these multiplications. As a result:

• If $\phi'(\cdot) < 1$ always, the gradient will go to zero exponentially fast in the number of layers (vanishing gradient).
• If $\phi'(\cdot) > 1$ always, the gradient will explode exponentially fast in the number of layers (exploding gradient).
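The effect of repeated multiplication can be illustrated by assuming, as a simplification, that the derivative takes the same constant value at every layer. The values 0.25, 1.0 and 1.5 are arbitrary examples.

```python
def backprop_factor(derivative, num_layers):
    # Product of the activation derivatives accumulated across the layers.
    factor = 1.0
    for _ in range(num_layers):
        factor *= derivative
    return factor

depths = (1, 5, 10, 20)
factors = {d: [backprop_factor(d, L) for L in depths] for d in (0.25, 1.0, 1.5)}

for d, values in factors.items():
    print(f"derivative {d}: {values}")
```

With a derivative of 0.25 the factor shrinks geometrically with depth, with 1.5 it grows geometrically, and only a derivative of exactly 1 leaves the gradient scale unchanged.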


Derivative of the sigmoid

[Figure: plot of $\sigma(s)$ and $\sigma'(s)$ for $s \in [-10, 10]$; vertical axis "Sigmoid and its derivative", ranging from 0.0 to 1.0.]

Figure 13: The sigmoid function $\sigma(\cdot)$ is a poor choice as activation function (for deep networks), because its derivative is bounded in $[0, 0.25]$.
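The bound stated in the caption follows from $\sigma'(s) = \sigma(s)(1 - \sigma(s))$, which is maximal at $s = 0$. The short sketch below evaluates the derivative on a grid over the plotted range; the grid spacing and the variable names are arbitrary choices made for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    # sigma'(s) = sigma(s) * (1 - sigma(s))
    sig = sigmoid(s)
    return sig * (1.0 - sig)

# Scan the range plotted in the figure, s in [-10, 10], with step 0.1.
grid = [i / 10.0 for i in range(-100, 101)]
values = [sigmoid_derivative(s) for s in grid]

max_derivative = max(values)                     # largest value on the grid
argmax_s = grid[values.index(max_derivative)]    # where it is attained
```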


The rectified linear unit (ReLU)

A very common choice for deep networks is the rectified linear unit (ReLU), defined as:

$$\phi(s) = \max(0, s). \tag{19}$$

Its derivative is either 1 (whenever $s > 0$), or 0 otherwise.

(The ReLU is not differentiable for $s = 0$, but this can be easily taken care of with the notion of subgradients.)
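A minimal sketch of Eq. (19) and its derivative is given below. Returning 0 at $s = 0$ is one common convention (any value in $[0, 1]$ is a valid subgradient there); it is an assumption of this sketch rather than something fixed by the slides.

```python
def relu(s):
    # Eq. (19): phi(s) = max(0, s)
    return max(0.0, s)

def relu_derivative(s):
    # 1 for s > 0, 0 otherwise. At s = 0 we pick the subgradient 0
    # (an assumed convention; any value in [0, 1] would be valid).
    return 1.0 if s > 0 else 0.0

def path_factor(pre_activations):
    # Product of ReLU derivatives along a chain of pre-activations:
    # it is 1 if every unit on the path is active, and 0 otherwise.
    factor = 1.0
    for s in pre_activations:
        factor *= relu_derivative(s)
    return factor
```

Unlike the sigmoid, an all-active path contributes a factor of exactly 1 no matter how many layers it crosses.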

Hint

The ReLU is a good default choice in most applications.


ReLU and its derivative

[Figure: plot of ReLU(s) and of the derivative of ReLU(s) for $s \in [-3, 3]$; vertical axis "ReLU and its derivative", ranging from 0.0 to 3.0.]

Figure 14: A plot of ReLU and its derivative.


An example of vanishing gradient

[Figure: gradient norm of the weight matrix (logarithmic vertical axis, ticks at $10^{-1}$ and $10^{0}$) versus layer index (1 to 5), with one curve for sigmoid and one for ReLU.]

Figure 15: We initialize a NN with 5 hidden layers. Weights are sampled from the uniform distribution on $[-0.5, 0.5]$. We show the norm of a typical gradient (cross-entropy on a few examples) with sigmoid and ReLU activation functions.
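An experiment in the spirit of Figure 15 can be sketched in plain Python. This is not the code used to produce the figure: the input dimension (8), the layer width (16), the use of a single random example, the absence of biases, the sigmoid output unit, and the random seed are all assumptions introduced here, so the numbers will differ from those plotted.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

ACTIVATIONS = {
    # name: (function, derivative as a function of the pre-activation s)
    "sigmoid": (sigmoid, lambda s: sigmoid(s) * (1.0 - sigmoid(s))),
    "relu": (lambda s: max(0.0, s), lambda s: 1.0 if s > 0 else 0.0),
}

def gradient_norms(activation, n_in=8, width=16, n_hidden=5, seed=0):
    """Frobenius norm of dLoss/dW for each hidden layer, on one random example."""
    rng = random.Random(seed)
    phi, dphi = ACTIVATIONS[activation]
    sizes = [n_in] + [width] * n_hidden
    # Weights sampled uniformly in [-0.5, 0.5], as in the caption; no biases.
    Ws = [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[l])]
           for _ in range(sizes[l + 1])] for l in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(width)]
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
    y = 1.0  # binary target of the single example

    # Forward pass, storing the input and pre-activation of every layer.
    hs, ss = [x], []
    for W in Ws:
        s = [sum(w * h for w, h in zip(row, hs[-1])) for row in W]
        ss.append(s)
        hs.append([phi(v) for v in s])
    p = sigmoid(sum(w * h for w, h in zip(w_out, hs[-1])))

    # Backward pass. For cross-entropy with a sigmoid output,
    # the derivative of the loss with respect to the logit is p - y.
    delta_h = [(p - y) * w for w in w_out]  # dLoss/dh at the last hidden layer
    norms = [0.0] * n_hidden
    for l in reversed(range(n_hidden)):
        # Multiply by the activation derivative, as in Eq. (18).
        delta_s = [dh * dphi(s) for dh, s in zip(delta_h, ss[l])]
        # dLoss/dW[l][j][k] = delta_s[j] * hs[l][k]
        norms[l] = math.sqrt(sum((ds * h) ** 2 for ds in delta_s for h in hs[l]))
        # Propagate to the input of layer l: W^T delta_s.
        delta_h = [sum(Ws[l][j][k] * delta_s[j] for j in range(len(delta_s)))
                   for k in range(sizes[l])]
    return norms

for name in ("sigmoid", "relu"):
    print(name, gradient_norms(name))
```

Each returned list is ordered from the first hidden layer to the last, so a vanishing gradient shows up as norms that shrink towards the beginning of the list.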


About TensorFlow

tf.data and tf.keras.layers


Overview: TF 2.0 mid-level architecture

- Low-level TF (variables, autograd, ...)
- tf.data: data processing
- tf.metrics: model evaluation
- tf.keras.layers: premade layers
- tf.keras.optimizers: optimization algorithms
- Keras high-level interface


An example of tf.data

tf.data lets you load data of various types and apply common operations such as batching, shuffling, and so on:

    import tensorflow as tf

    # X and y are assumed to be already-loaded arrays with the same first dimension
    dataset = tf.data.Dataset.from_tensor_slices((X, y))
    for xb, yb in dataset.shuffle(1000).batch(32):
        pass  # Do something with the batch

There is a comprehensive guide on the website:

https://www.tensorflow.org/guide/datasets


Building models with tf.keras.layers

In addition, the module tf.keras.layers lets you build complex models by 'stacking' more basic blocks:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # A network with one hidden layer of dimension 10
    network = Sequential([
        Dense(10, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
