Deep Neural Networks Are Our Friends
Wang Ling
Outline

● Part I - Neural Networks are our friends
○ Numbers are our friends
○ Operators are our friends
○ Functions are our friends
○ Parameters are our friends
○ Cost Functions are our friends
○ Optimizers are our friends
○ Gradients are our friends
○ Computation Graphs are our friends

● Part II - Into Deep Learning
○ Nonlinear Neural Models
○ Multilayer Perceptrons
○ Using Discrete Variables
○ Example Applications
Numbers are our friends
Numbers are our friends
Abby Cadabby
How many apples does Abby have?
Numbers are our friends
4
Abby Cadabby
Numbers are our friends
● Types of Numbers:
○ Integers : 5
○ Rationals : 1/2
○ Reals : 1.4e10 ...
Operators are our friends
4
Bert
Operators are our friends
41
Bert
If Abby has 4 apples, and gives Bert 1 apple, how many apples will
Abby have?
Operators are our friends
3 1
Bert
Operators are our friends
● Arithmetic Operators
○ Addition : 23 + 12 = 35
○ Subtraction : 31 - 15 = 16
○ Multiplication : 4 x 5 = 20
○ Division : 20 / 5 = 4
Functions are our friends
41
Functions are our friends
4
5?
1
If Bert always returns 3 bananas for each apple, how many bananas will
Abby receive for 2 apples?
Functions are our friends
y = 3x
● Input, x - Number of Apples given by Abby
Functions are our friends
y = 3x
● Input, x - Number of Apples given by Abby
● Output, y - Number of Bananas received by Abby
Functions are our friends
4
5?
1
y = 3x
Functions are our friends
4
5?
1
y = 3x , x =1
Functions are our friends
4
53
1
y = 3x , x = 1
y = 3
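The slides' exchange rule can be written as a one-line function; a minimal sketch, using the slides' y = 3x:

```python
# Bert's exchange rule from the slides: y = 3x
# (x apples in, y bananas out).
def f(x):
    return 3 * x

print(f(1))  # → 3, the case worked on the slide
print(f(2))  # → 6, answering "how many bananas for 2 apples?"
```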
Functions are our friends
y = 3x

Functions are our friends
y = 3x

Cookie Monster

Functions are our friends
y = 3x    y = ??

Functions are our friends
y = ??
0
1
Functions are our friendsy = ??
0
1
16
5
Functions are our friendsy = ??
0
1
16
5
20
6
Functions are our friendsy = ??
0
1
16
5
20
6
?
3
If Abby gives Cookie Monster 3 apples, how many bananas
does she get?
Parameters are our friends
y = 3x + 1
● Input
● Output
Parameters are our friends
y = wx + b
● Input
● Output
● Parameters

Input - Fixed, comes from data
Parameters - Need to be estimated
Parameters are our friendsy = wx + b
0
1
16
5
20
6
?
3
Data
Parameters are our friendsy = wx + b
0
1
16
5
20
6
?
3
Parameters are our friends
y = wx + b

x  y
1  0
5  16
6  20
3  ?
Parameters are our friends
y = wx + bx y
1 0
5 16
6 20
Data Model
Parameters are our friends
y = wx + bx y
1 0
5 16
6 20
Data Model
How to find the parameters w and b?
Parameters are our friends
y = wx + b

Data:
x  y
1  0
5  16
6  20

Model Candidate 1: y = 1x + 0
x  y   ŷ
1  0   1
5  16  5
6  20  6
Parameters are our friends
y = wx + bx y
1 0
5 16
6 20
Data ModelModel
Candidate 1x y ŷ
1 0 1
5 16 5
6 20 6
Model Candidate 2 x y ŷ
1 0 4
5 16 12
6 20 14
y = 1x + 0
y = 2x + 2
Parameters are our friends
y = wx + b

Data:
x  y
1  0
5  16
6  20

Model Candidate 1: y = 1x + 0
x  y   ŷ
1  0   1
5  16  5
6  20  6

Model Candidate 2: y = 2x + 2
x  y   ŷ
1  0   4
5  16  12
6  20  14

Which one is better?
Cost functions are our friends
ŷn = wxn + b

n  x  y
0  1  0
1  5  16
2  6  20
Data ModelModel
Candidate 1x y ŷ
1 0 1
5 16 5
6 20 6
Model Candidate 2 x y ŷ
1 0 4
5 16 12
6 20 14
y = 1x + 0
y = 2x + 2
Cost functions are our friends
yn = wxn + bn x y
0 1 0
1 5 16
2 6 20
Data ModelModel
Candidate 1x y ŷ
1 0 1
5 16 5
6 20 6
Model Candidate 2 x y ŷ
1 0 4
5 16 12
6 20 14
y = 1x + 0
y = 2x + 2
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²
Cost functions are our friends
yn = wxn + bn x y
0 1 0
1 5 16
2 6 20
Data ModelModel
Candidate 1
Model Candidate 2 x y ŷ
1 0 4
5 16 12
6 20 14
y = 1x + 0
y = 2x + 2
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

n  x  y   ŷ  (y-ŷ)²
0  1  0   1  1
1  5  16  5
2  6  20  6
Cost functions are our friends
yn = wxn + bn x y
0 1 0
1 5 16
2 6 20
Data ModelModel
Candidate 1
Model Candidate 2 x y ŷ
1 0 4
5 16 12
6 20 14
y = 1x + 0
y = 2x + 2
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

n  x  y   ŷ  (y-ŷ)²
0  1  0   1  1
1  5  16  5  121
2  6  20  6
Cost functions are our friends
yn = wxn + bn x y
0 1 0
1 5 16
2 6 20
Data ModelModel
Candidate 1
Model Candidate 2 x y ŷ
1 0 4
5 16 12
6 20 14
y = 1x + 0
y = 2x + 2
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

n  x  y   ŷ  (y-ŷ)²
0  1  0   1  1
1  5  16  5  121
2  6  20  6  196
Cost functions are our friends
yn = wxn + bn x y
0 1 0
1 5 16
2 6 20
Data ModelModel
Candidate 1
Model Candidate 2 x y ŷ
1 0 4
5 16 12
6 20 14
y = 1x + 0
y = 2x + 2
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

n  x  y   ŷ  (y-ŷ)²
0  1  0   1  1
1  5  16  5  121
2  6  20  6  196
C(1,0) = 318
Cost functions are our friends
yn = wxn + bn x y
0 1 0
1 5 16
2 6 20
Data ModelModel
Candidate 1
n x y ŷ (y-ŷ)
0 1 0 1 1
1 5 16 5 121
2 6 20 6 196
Model Candidate 2
y = 1x + 0
y = 2x + 2
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

C(1,0) = 318

Candidate 2 errors:
n  x  y   ŷ   (y-ŷ)²
0  1  0   4   16
1  5  16  12  16
2  6  20  14  36
C(2,2) = 68
Cost functions are our friends
yn = wxn + bn x y
0 1 0
1 5 16
2 6 20
Data ModelModel
Candidate 1
Model Candidate 2
y = 1x + 0
y = 2x + 2
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

C(1,0) = 318
C(2,2) = 68
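The cost comparison above is easy to reproduce; a minimal sketch of the slides' sum-of-squared-errors cost on the data {(1,0), (5,16), (6,20)}:

```python
# The slides' cost C(w,b) = sum over the data of (y - ŷ)²,
# where ŷ = w*x + b is the model's prediction.
data = [(1, 0), (5, 16), (6, 20)]

def cost(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in data)

print(cost(1, 0))  # → 318  (Candidate 1)
print(cost(2, 2))  # → 68   (Candidate 2: the better model)
```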
Cost functions are our friends
ŷn = wxn + b

n  x  y
0  1  0
1  5  16
2  6  20
Data Model
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²
Cost functions are our friends
yn = wxn + bn x y
0 1 0
1 5 16
2 6 20
Data Model
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²
How to find the parameters w and b?
Optimizers are our friends
ŷn = wxn + b

n  x  y
0  1  0
1  5  16
2  6  20
Data Model
Cost
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

Optimizer
arg min w,b∈[-∞,∞] C(w,b)
Optimizers are our friends

Optimizer
arg min w,b∈[-∞,∞] C(w,b)
w
b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w0,b0 = 2,2 : C(w0,b0) = 68
w
b
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w0,b0 = 2,2 : C(w0,b0) = 68
w
b
2
2
68
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w0,b0 = 2,2 : C(w0,b0) = 68
w1,b1 = 3,2 : C(w1,b1) = ?
w
b
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w0,b0 = 2,2 : C(w0,b0) = 68
w1,b1 = 3,2 : C(w1,b1) = 26
n x y ŷ (y-ŷ)
0 1 0 5 25
1 5 16 17 1
2 6 20 20 0
C(3,2) 26
w
b
2
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w0,b0 = 2,2 : C(w0,b0) = 68w1,b1 = 3,2 : C(w1,b1) = 26
n x y ŷ (y-ŷ)
0 1 0 5 25
1 5 16 17 1
2 6 20 20 0
C(3,2) 26
w
b
2
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w1,b1 = 3,2 : C(w1,b1) = 26w2,b2 = 4,2 : C(w2,b2) = ??
w
b
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w1,b1 = 3,2 : C(w1,b1) = 26w2,b2 = 4,2 : C(w2,b2) = 136
w
b
n x y ŷ (y-ŷ)
0 1 0 6 36
1 5 16 22 64
2 6 20 26 36
C(4,2) 136
2
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w1,b1 = 3,2 : C(w1,b1) = 26
w
b
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w1,b1 = 3,2 : C(w1,b1) = 26w2,b2 = 3,3 : C(w2,b2) = 41
w
b
n x y ŷ (y-ŷ)
0 1 0 6 36
1 5 16 18 4
2 6 20 21 1
C(3,3) 41
2
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w1,b1 = 3,2 : C(w1,b1) = 26
w
b
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w1,b1 = 3,2 : C(w1,b1) = 26w2,b2 = 3,1 : C(w2,b2) = 17
w
b
n x y ŷ (y-ŷ)
0 1 0 4 16
1 5 16 16 0
2 6 20 19 1
C(3,1) 17
2
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w2,b2 = 3,1 : C(w2,b2) = 17
w
b
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w2,b2 = 3,1 : C(w2,b2) = 17
w
b
w3,b3 = 3,0 : C(w3,b3) = 14

n  x  y   ŷ   (y-ŷ)²
0  1  0   3   9
1  5  16  15  1
2  6  20  18  4
C(3,0) = 14
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w3,b3 = 3,0 : C(w3,b3) = 14
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w3,b3 = 3,0 : C(w3,b3) = 14
w4,b4 = 3,-1 : C(w4,b4) = 17
n x y ŷ (y-ŷ)
0 1 0 2 4
1 5 16 14 4
2 6 20 17 9
C(3,-1) 17
2
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w3,b3 = 3,0 : C(w3,b3) = 14
w4,b4 = 2,0 : C(w4,b4) = 104
n x y ŷ (y-ŷ)
0 1 0 2 4
1 5 16 10 36
2 6 20 12 64
C(2,0) 104
2
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w3,b3 = 3,0 : C(w3,b3) = 14
w4,b4 = 4,0 : C(w4,b4) = 48

n  x  y   ŷ   (y-ŷ)²
0  1  0   4   16
1  5  16  20  16
2  6  20  24  16
C(4,0) = 48
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w3,b3 = 3,0 : C(w3,b3) = 14
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w?,b? = 4,-2 : C(w?,b?) = ??
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
n x y ŷ (y-ŷ)
0 1 0 2 4
1 5 16 18 4
2 6 20 22 4
C(4,-2) 12
2
w?,b? = 4,-2 : C(w?,b?) = 12
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w3,b3 = 3,0 : C(w3,b3) = 14
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w3,b3 = 3,0 : C(w3,b3) = 14
Search Problem
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w3,b3 = 3,0 : C(w3,b3) = 14
w4,b4 = 3.01,0 : C(w4,b4) = 13.73

n  x  y   ŷ      (y-ŷ)²
0  1  0   3.01   9.06
1  5  16  15.05  0.90
2  6  20  18.06  3.76
C(3.01,0) = 13.73
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w*,b* = 4,-2 : C(w*,b*) = 12
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w*,b* = 4,-2 : C(w*,b*) = 12
y = wx + b
Optimizers are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
w*,b* = 4,-4 : C(w*,b*) = 0
y = wx + b
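The arg min above can be sketched as a brute-force search; the integer grid [-10, 10] is an assumption for illustration (the slides search by trial and error over all reals):

```python
# Brute-force sketch of arg min C(w,b) over a small integer grid.
data = [(1, 0), (5, 16), (6, 20)]

def cost(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in data)

best = min(((w, b) for w in range(-10, 11) for b in range(-10, 11)),
           key=lambda p: cost(*p))
print(best, cost(*best))  # → (4, -4) 0
```

The line y = 4x - 4 passes exactly through all three data points, so the minimum cost is 0.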
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
Should be used sparingly
y = wx + b
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
hwhw = 1
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
hwhw = 1C(w0+hw,b0) = C(3,2) = 26
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
hw = 1
C(w0+hw,b0) = C(3,2) = 26
rw = (C(w0+hw,b0) - C(w0,b0)) / hw
rw = (C(3,2) - C(2,2)) / 1 = -42
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
hw = 1, r = -42
hw = 0.1, r = -98
hw = 0.01, r ≈ -103.4
hw = 0.001, r ≈ -103.9
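The shrinking-step-size slope estimates above can be reproduced with a finite-difference sketch:

```python
# Finite-difference slope of the cost along w, as on the slides:
# r = (C(w+h, b) - C(w, b)) / h for shrinking step sizes h.
data = [(1, 0), (5, 16), (6, 20)]

def cost(w, b):
    return sum((y - (w * x + b)) ** 2 for x, y in data)

def finite_diff_w(w, b, h):
    return (cost(w + h, b) - cost(w, b)) / h

print(finite_diff_w(2, 2, 1))     # → -42.0
print(finite_diff_w(2, 2, 1e-6))  # ≈ -104, the true partial derivative ∂C/∂w at (2,2)
```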
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
hw = 1, r = -42
hw = 0.1, r = -98
hw = 0.01, r ≈ -103.4
hw = 0.001, r ≈ -103.9
hw → 0, r = ∂C/∂w (w0,b0)
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
∂C/∂w = ∂[∑n (yn - ŷn)²]/∂w
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
∂C/∂w = ∂[∑n (yn - ŷn)²]/∂w = ∑n -2(yn - ŷn)xn
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w0,b0 = 2,2 : C(w0,b0) = 68
∂C/∂w = ∂[∑n (yn - ŷn)²]/∂w = ∑n -2(yn - ŷn)xn

hw → 0, rw = ∂C/∂w (w0,b0) = -104

n  x  y   ŷ   (y-ŷ)  -2(y-ŷ)x
0  1  0   4   -4     8
1  5  16  12  4      -40
2  6  20  14  6      -72
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w
b
y = wx + b
w0,b0 = 2,2 : C(w0,b0) = 68
2
2
68
∂C/∂w = ∂[∑n (yn - ŷn)²]/∂w = ∑n -2(yn - ŷn)xn
∂C/∂b = ∂[∑n (yn - ŷn)²]/∂b = ∑n -2(yn - ŷn)
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w0,b0 = 2,2 : C(w0,b0) = 68
hw → 0, rw = ∂C/∂w (w0,b0) = -104
hb → 0, rb = ∂C/∂b (w0,b0) = -12

n  x  y   ŷ   (y-ŷ)  -2(y-ŷ)
0  1  0   4   -4     8
1  5  16  12  4      -8
2  6  20  14  6      -12
Gradients are our friendsOptimizer
arg min C(w,b)w,b∈[-∞,∞]
w0,b0 = 2,2 : C(w0,b0) = 68
∂w(w0,b0)hw → 0, rw = = -104
∂C
∂w(w0,b0)hb → 0, rb = = -12
∂C
w
b
y = wx + b
w1 = w0 - η·rw
b1 = b0 - η·rb        η → Learning Rate
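The update rule above, iterated, is gradient descent; a minimal sketch starting from the slides' (w,b) = (2,2). The learning rate 0.01 and iteration count are assumptions chosen to converge on this data:

```python
# Gradient descent on C(w,b) = sum (y - (w*x + b))², starting at (2, 2).
data = [(1, 0), (5, 16), (6, 20)]

w, b = 2.0, 2.0
eta = 0.01  # learning rate (assumed)
for _ in range(5000):
    dw = sum(-2 * (y - (w * x + b)) * x for x, y in data)
    db = sum(-2 * (y - (w * x + b)) for x, y in data)
    w -= eta * dw
    b -= eta * db

print(round(w, 2), round(b, 2))  # → 4.0 -4.0
```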
Gradients are our friends
y = 4x - 4
Data
0
1
16
5
20
6
?
3
Gradients are our friends
y = 4x - 4
Data
0
1
16
5
20
6
8
3
Computation Graphs are our friends
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

y = wx + b

∂C/∂w = ∑n 2(yn - ŷn)xn
∂C/∂b = ∑n 2(yn - ŷn)

Easy!
Computation Graphs are our friends
Harder!
y = wx + b + tanh(yx + b)²
Computation Graphs are our friends
Computation Graphs can
compute gradients for you!
y = wx + b + tanh(yx + b)²
Computation Graphs are our friends
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)²

y = wx + b

∂C/∂w = ∑n 2(yn - ŷn)xn
∂C/∂b = ∑n 2(yn - ŷn)
Computation Graphs are our friends
C(w,b) = ∑(yn-ŷn)n∈{0,1,2}
2
∂C
∂w=
∂(ŷn-yn)
∂ynn
= ∑-2(ŷn-yn)xn n
2
= ∑-2(ŷn-yn) n
y = wx + b
∂yn
∂w
2
∑
∂C
∂b=
∂(ŷn-yn)
∂ynn
∂yn
∂b∑
Computation Graphs are our friends
C(w,b) = ∑(yn-ŷn)n∈{0,1,2}
2
∂C
∂w=
∂(ŷn-yn)
∂ynn
2
y = wx + b
∂yn
∂w
2
∑
∂C
∂b=
∂(ŷn-yn)
∂ynn ∂b∑ ∂yn
Computation Graphs are our friends
C(w,b) = ∑(yn-ŷn)n∈{0,1,2}
2
∂C
∂w=
∂(ŷn-yn)
∂ynn
2
y = o + bo = wx
∂yn
∂w
2
∑
∂C
∂b=
∂(ŷn-yn)
∂ynn ∂b∑ ∂yn
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂ynn
2
c = dd = y - ŷy = o + bo = wx
∂yn
∂w
2
∑
∂C
∂b=
∂(ŷn-yn)
∂ynn ∂b∑ ∂yn
2
∂(ŷn-yn)
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
2
c = dd = y - ŷy = o + bo = wx
∂on
∂w∑
∂C
∂b=
∂(ŷn-yn)
∂ynn ∂b∑ ∂yn
2
∂dn
∂yn
∂yn
∂on
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
c = dd = y - ŷy = o + bo = wx
∂on
∂w∑
∂C
∂b
2
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Computation Graphs are our friends
C(w,b) = ∑n∈{0,1,2} cn
c = d²,  d = y - ŷ,  y = o + b,  o = wx

∂C/∂w = ∑n (∂cn/∂dn)(∂dn/∂yn)(∂yn/∂on)(∂on/∂w)
∂C/∂b = ∑n (∂cn/∂dn)(∂dn/∂yn)(∂yn/∂b)

Power 2
Sub
Add
Product
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
c = dd = y - ŷy = o + bo = wx
∂on
∂w∑
∂C
∂b
2
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
Add
Product
forward(x,y) → z
backward(x,y,dz) → dx,dy
Sub
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
c = dd = y - ŷy = o + bo = wx
∂on
∂w∑
∂C
∂b
2
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
Add
Product
forward(x,y) : return x - y
backward(x,y,dz) : return dz, -dz
Sub
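The forward/backward interface above can be sketched as a small class; a minimal version of the Sub node z = x - y, where backward receives dz (the gradient of the cost with respect to z) and returns the gradients passed on to its two inputs:

```python
# The slides' Sub operation: z = x - y.
# ∂z/∂x = 1 and ∂z/∂y = -1, so backward returns (dz, -dz).
class Sub:
    def forward(self, x, y):
        return x - y

    def backward(self, x, y, dz):
        return dz, -dz

op = Sub()
print(op.forward(12, 16))         # → -4
print(op.backward(12, 16, -8.0))  # → (-8.0, 8.0)
```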
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
c = dd = y - ŷy = o + bo = wx
∂on
∂w∑
∂C
∂b
2
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
Add
Product
forward(x,y) : return x - y
backward(x,y,dz) : return dz, -dz
Sub
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
c = dd = y - ŷy = o + bo = wx
∂on
∂w∑
∂C
∂b
2
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
Add
Product
forward(x,y) : return x - y
backward(x,y,dz) : return dz, -dz    (∂dn/∂yn = 1, ∂dn/∂ŷn = -1)

Sub
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
c = dd = y - ŷy = o + bo = wx
∂on
∂w∑
∂C
∂b
2
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
Add
Product
o
w x
Product
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
c = dd = y - ŷ
∂on
∂w∑
∂C
∂b
2
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
o
w x
Product
b
Add
y
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0,1,2}
∂C
∂w=
∂cn
∂dnn
∂on
∂w∑
∂C
∂b
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
o
w x
Product
b
Add
y ŷ
d c
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0}
∂C
∂w=
∂cn
∂dnn
∂on
∂w∑
∂C
∂b
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0}
∂C
∂w=
∂cn
∂dnn
∂on
∂w∑
∂C
∂b
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
Input
Computation Graphs are our friends
C(w,b) = ∑cnn∈{0}
∂C
∂w=
∂cn
∂dnn
∂on
∂w∑
∂C
∂b
∂dn
∂yn
∂yn
∂on
= ∂cn
∂dnn
∑ ∂dn
∂yn
∂yn
∂b
Power 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
Input
Parameters
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:
1-Initialize inputs
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:
1-Initialize inputs
2-Initialize variables
Variables
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables
Variables
2 values: x and dx
0,0
0,0
0,00,0 0,0
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
0,0
0,00,0 0,0
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
0,0
0,00,0 0,0
1st
2nd
3rd4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables
10,0
0,0
0,00,0 0,0
1st
2nd
3rd4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables
10,0
12,0
0,00,0 0,0
1st
2nd
3rd4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
0,0
0,00,0 0,0
1st
2nd
3rd4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
2,0
0,00,0 0,0
1st
2nd
3rd4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables
10,0
2,0
0,00,0 0,0
1st
2nd
3rd4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
0,0
0,00,0 0,0
Computation Graphs are our friendsPower 2
Sub
o
Add
y
d c Id CForward:
1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
0,0
0,00,0 0,0
Computation Graphs are our friends
o
y
d c CForward:
1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
0,0
0,00,0 0,0
1st
2nd
3rd4th 5th
Computation Graphs are our friendsPower 2
Sub
o
Add
y
d c Add CForward:
1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
0,0
0,00,0 0,0
g0,0
Add
s 0,0
Computation Graphs are our friends
o
y
d c CForward:
1-Initialize inputs2-Initialize variables3-Topological Sort variables
0,0
0,0
0,00,0 0,0
g0,0
s 0,0
1st
2nd
3th
4th
5th 6th 7th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:
1-Initialize inputs
2-Initialize variables
3-Topological Sort variables
4-For each variable in topological order, run the forward method of all operations that link to them
0,0
0,0
0,00,0 0,0
1st
2nd
3rd
4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them
10,0
0,0
0,00,0 0,0
1st
2nd
3rd
4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them
10,0
12,0
0,00,0 0,0
1st
2nd
3rd
4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them
10,0
12,0
-4,00,0 0,0
1st
2nd
3rd
4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them
10,0
12,0
-4,016,0 0,0
1st
2nd
3rd
4th 5th
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them
10,0
12,0
-4,016,0
1st
2nd
3rd
4th 5th16,0
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them
5-Set gradients to final variables
10,0
12,0
-4,016,0
1st
2nd
3rd
4th 5th16,1
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,0
12,0
-4,016,0
1st
2nd
3rd
4th 5th16,1
∂C
∂c C=c =1
dc = dC ∂C
∂c
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,0
12,0
-4,016,1
1st
2nd
3rd
4th 5th16,1
∂C
∂c C=c =1
dc = dC ∂C
∂c
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,0
12,0
-4,016,1
1st
2nd
3rd
4th 5th16,1
c = d2
dd = dc ∂c
∂d
∂c
∂d= 2d
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,0
12,0
-4,016,1
1st
2nd
3rd
4th 5th16,1
c = d2
dd = dc ∂c
∂d
∂c
∂d= 2 x -4
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,0
12,0
-4,016,1
1st
2nd
3rd
4th 5th16,1
c = d2
dd = dc ∂c
∂d
∂c
∂d= -8
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,0
12,0
-4,-816,1
1st
2nd
3rd
4th 5th16,1
c = d2
dd = dc ∂c
∂d
∂c
∂d= -8
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,0
12,0
-4,-816,1
1st
2nd
3rd
4th 5th16,1
d = y - ŷ ∂d
∂y= 1
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,0
12,-8
-4,-816,1
1st
2nd
3rd
4th 5th16,1
d = y - ŷ ∂d
∂y= 1
dy = dd ∂d
∂y
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,-8
12,-8
-4,-816,1
1st
2nd
3rd
4th 5th16,1
y = o + b
∂y
∂o= 1
do = dy ∂y
∂o
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,-8
12,-8
-4,-816,1
1st
2nd
3rd
4th 5th16,1
y = o + b
∂y
∂o= 1
∂y
∂b= 1
bt+1 = b - dy ∂y
∂b
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,-8
12,-8
-4,-816,1
1st
2nd
3rd
4th 5th16,1
y = o + b
∂y
∂o= 1
∂y
∂b= 1
bt+1 = b - dy ∂y
∂b
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,-8
12,-8
-4,-816,1
1st
2nd
3rd
4th 5th16,1
y = o + b
∂y
∂o= 1
∂y
∂b= 1
bt+1 = b - ∂c
∂d
∂d∂y
∂y∂b
∂C
∂c
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,-8
12,-8
-4,-816,1
1st
2nd
3rd
4th 5th16,1
y = o + b
∂y
∂o= 1
∂y
∂b= 1
bt+1 = b - ∂C
∂b
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52
2
Forward:1-Initialize inputs2-Initialize variables3-Topological Sort variables4-For each variable in topological
order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables6-run the operations backward method
in reverse order (Backward)10,-8
12,-8
-4,-816,1
1st
2nd
3rd
4th 5th16,1
o = wx
∂o
∂w= x
wt+1 = w - do ∂o
∂w
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52.8
2.2
Forward:
1-Initialize inputs
2-Initialize variables
3-Topological Sort variables
4-For each variable in topological order, run the forward method of all operations that link to them (Forward)
5-Set gradients to final variables
6-Run the operations' backward method in reverse order (Backward)
7-Update parameters

10,-8
12,-8
-4,-8  16,1
1st
2nd
3rd
4th 5th16,1
o = wx
∂o
∂w= x
wt+1 = w - do ∂o
∂w
Computation Graphs are our friendsPower 2
Sub
o
w x
Product
b
Add
y ŷ
d c Id C
16
52.8
2.210,-8
12,-8
-4,-816,1 16,1
o = wx
∂o
∂w= x
wt+1 = w - do ∂o
∂w
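The whole walkthrough above, for the single data point (x=5, ŷ=16) at (w,b) = (2,2), can be sketched end to end; the variable names mirror the graph nodes (o, y, d, c), and each backward line is one chain-rule step:

```python
# One forward + backward pass through the slides' graph:
# o = wx, y = o + b, d = y - ŷ, c = d².
x, target, w, b = 5, 16, 2.0, 2.0

# Forward pass
o = w * x         # 10
y = o + b         # 12
d = y - target    # -4
c = d ** 2        # 16 (the cost)

# Backward pass: dv holds ∂c/∂v for each variable v.
dc = 1.0          # gradient of the final variable is set to 1
dd = dc * 2 * d   # Power 2: ∂c/∂d = 2d  → -8
dy = dd * 1       # Sub:     ∂d/∂y = 1   → -8
db = dy * 1       # Add:     ∂y/∂b = 1   → -8
do = dy * 1       # Add:     ∂y/∂o = 1   → -8
dw = do * x       # Product: ∂o/∂w = x   → -40

print(c, dw, db)  # → 16.0 -40.0 -8.0
```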
Existing Tools:
- Tensorflow ( https://www.tensorflow.org )
- Torch ( https://github.com/torch/nn )
- CNN ( https://github.com/clab/cnn )
- JNN ( https://github.com/wlin12/JNN )
- Theano ( http://deeplearning.net/software/theano/ )
Into Deep Learning
Nonlinear Neural Models
y = 4x - 4
Data
0
1
16
5
20
6
?
3
Nonlinear Neural Models
Data
0
1
16
5
20
6
?
3
There is a limit to the number of bananas I can give you
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
Data
[plot: x vs y, with the line y = 4x-4]
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data
x
y y = 4x-4
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data
[plot: x vs y, with the line y = 2x+3]
Model Problem
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data
x
y y = 2x+3
Model Problem
Underfitting
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data
[plot: x vs y]
y = ???
Can we learn arbitrary functions?
Nonlinear Neural Models
y = (w1x + b1)s1 + (w2x+b2)s2
Use different linear functions depending on the value of x?
Nonlinear Neural Models
y = (w1x + b1)s1 + (w2x + b2)s2
s1 - 1 if x < 6 and 0 otherwise
s2 - 1 if x >= 6 and 0 otherwise
Nonlinear Neural Models
y = (w1x + b1)s1 + (w2x+b2)s2
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data
y = (4x - 4)s1 + (0x + 20)s2
s1 - 1 if x < 6 and 0 otherwise
s2 - 1 if x >= 6 and 0 otherwise
Nonlinear Neural Models
s = σ(wx + b)
σ(t) = 1 / (1 + e^-t)
Nonlinear Neural Models
s = σ(1000x)
x = 0.1 then σ(1000x) ≈ 1
x = -0.1 then σ(1000x) ≈ 0
Nonlinear Neural Models
s = σ(1000x)
x = 0.1 then σ(1000x) ≈ 1
x = -0.1 then σ(1000x) ≈ 0
Nonlinear Neural Models
s = σ(1000x - 6000)
x = 6.1 then σ(1000x - 6000) ≈ 1
x = 5.9 then σ(1000x - 6000) ≈ 0
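The step-like behavior of this steep sigmoid is easy to check numerically; a minimal sketch (the exp is guarded so large negative arguments don't overflow):

```python
import math

# Numerically stable logistic sigmoid σ(t) = 1 / (1 + e^-t).
def sigmoid(t):
    if t >= 0:
        return 1 / (1 + math.exp(-t))
    return math.exp(t) / (1 + math.exp(t))

# σ(1000x - 6000) behaves like a step at x = 6:
print(round(sigmoid(1000 * 6.1 - 6000), 6))  # → 1.0
print(round(sigmoid(1000 * 5.9 - 6000), 6))  # → 0.0
```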
Nonlinear Neural Models
y = (w1x + b1)s1 + (w2x + b2)s2
s1 = σ(w3x + b3)
s2 = σ(w4x + b4)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data    y = (4x - 4)s1 + (0x + 20)s2
s1 = σ(-1000x + 6000)
s2 = σ(1000x - 6000)
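The gated model above can be sketched directly: below x = 6 the s1 gate is on and the model follows 4x - 4; above it the s2 gate switches the output to the constant 20.

```python
import math

# y = (4x - 4)·σ(-1000x + 6000) + 20·σ(1000x - 6000)
def sigmoid(t):
    # numerically stable logistic sigmoid
    if t >= 0:
        return 1 / (1 + math.exp(-t))
    return math.exp(t) / (1 + math.exp(t))

def y(x):
    s1 = sigmoid(-1000 * x + 6000)  # ≈ 1 when x < 6
    s2 = sigmoid(1000 * x - 6000)   # ≈ 1 when x > 6
    return (4 * x - 4) * s1 + 20 * s2

print(y(5), y(9), y(11))  # → 16.0 20.0 20.0, matching the data table
```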
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data    y = (4x - 4)s1 + (0x + 20)s2
s1 = σ(-1000x + 6000)
s2 = σ(1000x - 6000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (16)s1 + (0x+20)s2
s1 = (-1000x + 6000)s2 = (1000x - 6000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (16)s1 + (20)s2
s1 = (-1000x + 6000)s2 = (1000x - 6000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (16)s1 + (20)s2
s1 = (1000)s2 = (1000x - 6000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (16)s1 + (20)s2
s1 = (1000)s2 = (-1000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (16)1 + (20)0
s1 = (1000)s2 = (-1000)
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = 16
s1 = (1000)s2 = (-1000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (4x - 4)s1 + (0x+20)s2
s1 = (-1000x + 6000)s2 = (1000x - 6000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (32)s1 + (0x+20)s2
s1 = (-1000x + 6000)s2 = (1000x - 6000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (32)s1 + (20)s2
s1 = (-1000x + 6000)s2 = (1000x - 6000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (32)s1 + (20)s2
s1 = (-3000)s2 = (1000x - 6000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (32)s1 + (20)s2
s1 = (-3000)s2 = (3000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = (32)0 + (20)1
s1 = (-3000)s2 = (3000)
Nonlinear Neural Models
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data y = 20
s1 = (-3000)s2 = (3000)
Nonlinear Neural Models
Data
0
1
16
5
20
6
?
3
If you give me too many apples, I will give them to...
Nonlinear Neural Models
Data
0
1
16
5
20
6
?
3
Count Von Count
Nonlinear Neural Models
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
Data
y = (4x - 4)s1 + (0x + 20)s2
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (4x - 4)s1 + (0x + 20)s2
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (4x - 4)s1 + (0x + 20)s2 + (0x + 1)s3
s1 = σ(-1000x + 6000)
s2 = ????
s3 = σ(1000x - 15000)
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (4x - 4)s1 + (0x + 20)s2 + (0x + 1)s3
s1 = σ(-1000x + 6000)
s2 = not s1 and not s3
s3 = σ(1000x - 15000)
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x + b2)s2 + (w3x + b3)s3
s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x + b2)s2 + (w3x + b3)s3
s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)
Layer 1 Perceptron
Layer 1 Perceptron
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x + b2)s2 + (w3x + b3)s3
s1 = σ(w4x + b4)
s2 = σ(w5s1 + w6s3 + b5)
s3 = σ(w7x + b6)
Layer 2 Perceptron
Layer 1 Perceptron
Layer 1 Perceptron
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (4x - 4)s1 + (0x + 20)s2 + (0x + 1)s3
s1 = σ(-1000x + 6000)
s2 = not s1 and not s3
s3 = σ(1000x - 15000)
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3
s1 = (-1000x + 6000)
s2 = (-1000s1 - 1000s3 + 500)
s3 = (1000x - 15000)
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (40)s1 + (20)s2 + (1)s3
s1 = (-1000x + 6000)
s2 = (-1000s1 - 1000s3 + 500)
s3 = (1000x - 15000)
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (40)s1 + (20)s2 + (1)s3
s1 = (-5000) = 0
s2 = (-1000s1 - 1000s3 + 500)
s3 = (-4000) = 0
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (40)s1 + (20)s2 + (1)s3
s1 = (-5000) = 0
s2 = (-1000s1 - 1000s3 + 500)
s3 = (-4000) = 0
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (40)s1 + (20)s2 + (1)s3
s1 = (-5000) = 0
s2 = (500)
s3 = (-4000) = 0
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (40)s1 + (20)s2 + (1)s3
s1 = (-5000) = 0
s2 = (500) = 1
s3 = (-4000) = 0
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (40)0 + (20)1 + (1)0
s1 = (-5000) = 0
s2 = (500) = 1
s3 = (-4000) = 0
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = 20
s1 = (-5000) = 0
s2 = (500) = 1
s3 = (-4000) = 0
Multilayer Perceptrons
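The arithmetic on the slides above can be checked directly. Below is a minimal Python sketch (the helper names step and forward are ours; the weights are the slides') of the three-unit network with hard-step activations:

```python
# Minimal sketch of the slides' 3-unit network with hard-step activations.
# The helper names (step, forward) are ours; the weights come from the slides.

def step(z):
    # Hard threshold: positive pre-activation -> 1, otherwise -> 0.
    return 1 if z > 0 else 0

def forward(x):
    s1 = step(-1000 * x + 6000)              # fires when x < 6
    s3 = step(1000 * x - 15000)              # fires when x > 15
    s2 = step(-1000 * s1 - 1000 * s3 + 500)  # fires when neither s1 nor s3 does
    return (4 * x - 4) * s1 + 20 * s2 + 1 * s3

print(forward(11))  # middle region: y = 20
print(forward(19))  # right region:  y = 1
print(forward(1))   # left region:   y = 4*1 - 4 = 0
```

Running it reproduces the table rows: x = 1 gives 0, x = 5 gives 16, x = 11 gives 20, x = 19 gives 1.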
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3
s1 = (-1000x + 6000)
s2 = (-1000s1 - 1000s3 + 500)
s3 = (1000x - 15000)
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (72)s1 + (20)s2 + (1)s3
s1 = (-1000x + 6000)
s2 = (-1000s1 - 1000s3 + 500)
s3 = (1000x - 15000)
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (72)s1 + (20)s2 + (1)s3
s1 = (-13000) = 0
s2 = (-1000s1 - 1000s3 + 500)
s3 = (4000) = 1
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (72)s1 + (20)s2 + (1)s3
s1 = (-13000) = 0
s2 = (0 - 1000 + 500)
s3 = (4000) = 1
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (72)s1 + (20)s2 + (1)s3
s1 = (-13000) = 0
s2 = (-500) = 0
s3 = (4000) = 1
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = (72)0 + (20)0 + (1)1
s1 = (-13000) = 0
s2 = (-500) = 0
s3 = (4000) = 1
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
y = 1
s1 = (-13000) = 0
s2 = (-500) = 0
s3 = (4000) = 1
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
x
y
y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3
s1 = (w4x + b4)
s2 = (w5s1 + w6s3 + b5)
s3 = (w7x + b6)
Layer 2 Perceptron
Layer 1 Perceptron
Layer 1 Perceptron
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3
s1 = (w4x + b4)
s2 = (w5s1 + w6s3 + b5)
s3 = (w7x + b6)
x
s1
s3
s2
w4x
b4
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3
s1 = (w4x + b4)
s2 = (w5s1 + w6s3 + b5)
s3 = (w7x + b6)
x
s2
w4x
b4
w7x
b6
s1
s3
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3
s1 = (w4x + b4)
s2 = (w5s1 + w6s3 + b5)
s3 = (w7x + b6)
x
s2
s1
s3
w5s1   w6s3
b5
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3
x
s2
s1
s3
x < 6   x > 15
!(x > 15) & !(x < 6)
Multilayer Perceptrons
y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3
x
s2
s1
s3
x < 6   x > 15
x∈[6,15]
Multilayer Perceptrons
x
s2
s1
s3
x < 6   x > 15
x∈[6,15]
s4
x∈]-∞,6] & ]15,∞]
Multilayer Perceptrons
x
s5
s1
s2
x < 6   x > 15
x∈[6,15]
s3 x > 2
s4 x < 3
s7
s6
s7
x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]
Multilayer Perceptrons
x
s5
s1
s2
x < 6   x > 15
x∈[6,15]
s3 x > 2
s4 x < 3
s7
s6
s7
x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]
Input
Layer 1 (Input Features)
Layer 2 (And and Or Combinations)
Multilayer Perceptrons
x
s5
s1
s2
x < 6   x > 15
x∈[6,15]
s3 x > 2
s4 x < 3
s7
s6
s7
x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]
Input
Layer 1 (Input Features)
Layer 2 (And and Or Combinations)
And(s1,s2) = (1000s1 + 1000s2 - 1500)
Or(s1,s2) = (1000s1 + 1000s2 - 500)
Multilayer Perceptrons
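With binary inputs, the And/Or units above can be verified exhaustively. A small sketch (the function names are ours; the weights and thresholds are the slide's):

```python
# The slide's And/Or perceptrons: a hard-step unit over a weighted sum.
# Threshold 1500 requires both inputs on; 500 requires at least one.

def step(z):
    return 1 if z > 0 else 0

def and_unit(a, b):
    return step(1000 * a + 1000 * b - 1500)

def or_unit(a, b):
    return step(1000 * a + 1000 * b - 500)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_unit(a, b), or_unit(a, b))
```

The loop prints the full truth tables for both gates.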
x
s5
s1
s2
s3
s4
s7
s6
s7
Input
Layer 1 (Input Features)
Layer 2 (And and Or Combinations)
Layer 3 (Xor Combinations)
s8
s9
sa
sb
Multilayer Perceptrons
x
s5
s1
s2
s3
s4
s7
s6
s7
Input
Layer 1 (Input Features)
Layer 2 (And and Or Combinations)
Layer 3 (Xor Combinations)
s8
s9
sa
sb
Xor(s1,s2) = Or(And(s1,!s2), And(!s1,s2))
Multilayer Perceptrons
x
s5
s1
s2
s3
s4
s7
s6
s7
Input
Layer 1 (Input Features)
Layer 2 (And and Or Combinations)
Layer 3 (Xor Combinations)s8
s9
sa
sb
Xor(s1,s2) = Or(s5, s6)
Multilayer Perceptrons
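Xor is not linearly separable, which is why it needs the extra layer: And units over the (negated) inputs, then an Or on top. A sketch following Xor(s1,s2) = Or(And(s1,!s2), And(!s1,s2)) (helper names are ours):

```python
# Xor built from the slide's And/Or step units; negating a 0/1 value is 1 - value.

def step(z):
    return 1 if z > 0 else 0

def and_unit(a, b):
    return step(1000 * a + 1000 * b - 1500)

def or_unit(a, b):
    return step(1000 * a + 1000 * b - 500)

def xor_unit(a, b):
    # Xor(s1,s2) = Or(And(s1, !s2), And(!s1, s2))
    return or_unit(and_unit(a, 1 - b), and_unit(1 - a, b))

print([xor_unit(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))])  # [0, 1, 1, 0]
```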
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
x
y
Universal approximator
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
x
y
but...
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 9 20
4 11 20
5 15 1
6 19 1
Data
x
y
No guarantee that the best function will
be found
Multilayer Perceptrons
x
s5
s1
s2
x > 1   x < 2
x∈]-∞,1]
s3 x < 5
s4 x < 6
s7
s6
x∈[5,6[ x∈[6,∞]
n x y
0 1 0
1 5 16
2 6 20
y
Multilayer Perceptrons
x
s5
s1
s2
x > 1   x < 2
x∈]-∞,1]
s3 x < 5
s4 x < 6
s7
s6
x∈[5,6[ x∈[6,∞]
n x y
0 1 0
1 5 16
2 6 20
y = 0s5 + 16s6 + 20s7
y
Multilayer Perceptrons
x
s5
s1
s2
x > 1   x < 2
x∈]-∞,1]
s3 x < 5
s4 x < 6
s7
s6
x∈[5,6[ x∈[6,∞]
n x y
0 1 0
1 5 16
2 6 20
y
y = 0s5 + 16s6 + 20s7
Multilayer Perceptrons
x
s5
s1
s2
x > 1   x < 2
x∈]-∞,1]
s3 x < 5
s4 x < 6
s7
s6
x∈[5,6[ x∈[6,∞]
n x y
0 1 0
1 5 16
2 6 20
Overfitting
y = 0s5 + 16s6 + 20s7
Multilayer Perceptrons
y
Model Problem
Task Complexity
Model Complexity
Multilayer Perceptrons
Task Complexity
Model Complexity
Underfitting
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Linear Regression
MLP 1 Layer
MLP 2 Layer
MLP 3 Layer
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Linear Regression
Linear Regression + more features
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Linear Regression
MLP 1 Layer
MLP 2 Layer
MLP 3 Layer
Sentiment analysis
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Linear Regression
MLP 1 Layer
MLP 2 Layer
MLP 3 Layer
Sentiment analysis
Machine Translation
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Data
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Data
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Data
Multilayer Perceptrons
y
n x y
0 1 0
1 5 16
2 6 20
y y
Multilayer Perceptrons
y
n x y
0 1 0
1 5 16
2 6 20
3 2 4
y y
Multilayer Perceptrons
n x y
0 1 0
1 5 16
2 6 20
3 2 4
y y
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Model Bias
Multilayer Perceptrons
Task Complexity
Model Complexity
Overfitting
Underfitting
Happy Zone
Model Bias
L1 & L2 Regularization
Stochastic Dropout (Srivastava et al, 2014)
Model Structure (CNN, RNNs)
Multilayer Perceptrons
Regularization
C(w,b) = ∑n∈{0,1,2} (yn - ŷn)² + β(w + b)
β = Regularization constant
Multilayer Perceptrons
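The deck lists L1 & L2 regularization; a concrete (assumed) L2 variant of the cost above, for the linear model y = wx + b on the table's first three rows, can be sketched as follows (the function name cost is ours):

```python
# L2-regularized squared-error cost for a linear model y = w*x + b.
# beta is the deck's "regularization constant": it trades data fit
# against parameter magnitude (here the assumed L2 penalty w**2 + b**2).

def cost(w, b, data, beta):
    squared_error = sum((y - (w * x + b)) ** 2 for x, y in data)
    penalty = beta * (w ** 2 + b ** 2)
    return squared_error + penalty

data = [(1, 0), (5, 16), (6, 20)]  # the (x, y) rows n = 0, 1, 2
print(cost(4, -4, data, beta=0.0))  # perfect fit, no penalty: 0.0
print(cost(4, -4, data, beta=0.1))  # same fit, now pays 0.1 * (16 + 16)
```

With beta > 0, two models that fit the data equally well are ranked by how small their parameters are.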
x
s5
s1
s2
x > 1   x < 2
x∈]-∞,1]
s3 x < 5
s4 x < 6
s7
s6
x∈[5,6[ x∈[6,∞]
y
Regularization
Multilayer Perceptrons
x
s5
s1
s2
x > 1   nothing
x∈]-∞,1]
s3 nothing
s4 x < 6
s7
s6
nothing x∈[6,∞]
y
Regularization
Multilayer Perceptrons
x
s5
s1
s2
x > 1   nothing
x∈]-∞,1]
s3 nothing
s4 x < 6
s7
s6
nothing x∈[6,∞]
y
Regularization
Find solutions that require less effort
Multilayer Perceptrons
x
s5
s1
s2
x > 1   x < 2
x∈]-∞,1]
s3 x < 5
s4 x < 6
s7
s6
x∈[5,6[ x∈[6,∞]
y
Stochastic Dropout (Srivastava et al, 2014)
Multilayer Perceptrons
Stochastic Dropout (Srivastava et al, 2014)
x
s5
s1
s2
x > 1   0
x∈]-∞,1]
s3 x < 5
s4 x < 6
s7
s6
0 0
y
Multilayer Perceptrons
Stochastic Dropout (Srivastava et al, 2014)
x
s5
s1
s2
x > 1   0
x∈]-∞,1]
s3 x < 5
s4 x < 6
s7
s6
0 0
y
Find robust models
Multilayer Perceptrons
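The zeroed-out units in the diagram are exactly what dropout does at training time: each unit is kept with some probability and zeroed otherwise. A plain-Python sketch (function name and keep probability are ours):

```python
import random

def dropout(activations, keep_prob, rng):
    # Zero each unit independently with probability 1 - keep_prob, and scale
    # survivors by 1/keep_prob ("inverted dropout") so the expected value
    # of each activation is unchanged.
    return [a * (1.0 / keep_prob) if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], keep_prob=0.5, rng=rng))
```

At test time no units are dropped; the 1/keep_prob scaling during training is what makes that consistent.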
Model Structure
Weighted sum of linear functions VS MLP
y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3
Multilayer Perceptrons
Model Structure
Weighted sum of linear functions VS MLP
y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3
Convolutional Vs RNNs
Multilayer Perceptrons
s1 = (w4x + b4)
s2 = (w5s1 + w6s3 + b5)
s3 = (w7x + b6)
x
s2
s1
s3
w5s1   w6s3
b5
Representation
Multilayer Perceptrons
s1 = (W3x + b3)
s2 = (W4s1 + b4)
Representation
s1
s2
2
1
1xx
s2
s1
s3
Multilayer Perceptrons
Representation
s1
s2
1000
1000
100
x
s1 = (Ws2 + b)
Multilayer Perceptrons
Representation
s1
s2
1000
1000
100
x
s1 = (Ws2 + b)
TensorFlow Code
s1 = tf.matmul(x, W1) + b1
s1 = tf.nn.sigmoid(s1)
s2 = tf.matmul(s1, W2) + b2
s2 = tf.nn.sigmoid(s2)
Multilayer Perceptrons
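For readers without TensorFlow, the same two sigmoid layers can be sketched in numpy so the shapes are explicit. Assuming, per the diagram, a 100-unit input x and two 1000-unit layers, with placeholder random weights (not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 100))            # input layer: 100 units
W1 = rng.normal(size=(100, 1000)) * 0.01
b1 = np.zeros(1000)
W2 = rng.normal(size=(1000, 1000)) * 0.01
b2 = np.zeros(1000)

s1 = sigmoid(x @ W1 + b1)                # first layer: 1000 units
s2 = sigmoid(s1 @ W2 + b2)               # second layer: 1000 units
print(s1.shape, s2.shape)                # (1, 1000) (1, 1000)
```

Each line mirrors one line of the TensorFlow snippet: a matmul plus bias, then an elementwise sigmoid.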
Using Discrete Variables
Data
0
1
16
5
20
6
?
3
Using Discrete Variables
Data
0
1
16
5
20
6
?
3
?
Using Discrete Variables
x
s5
s1
s2
s3
s4
s7
s6
y
Number of fruit to offer
Number of fruit received
Using Discrete Variables
x
y
Number of fruit to offer
Number of fruit received
s1
s2
Using Discrete Variables
x
y
Number of fruit to offer
u
Type of fruit to offer
v
Number of fruit received
Type of fruit received
s1
s2
Using Discrete Variables
x
y
Number of fruit to offer
u
Type of fruit to offer
v
Number of fruit received
Type of fruit received
s1
s2
u∈{Apple, Banana, Coconut}
v∈{Apple, Banana, Coconut}
Using Discrete Variables
Lookup Tables
e1 e2 e3 e4
Apple 0.1 -0.4 0.2 0.5
Banana 0.4 1.4 -1.0 0.1
Coconut 1.1 0.9 1.1 0.5
u
V = 3
Using Discrete Variables
Lookup Tables
e1 e2 e3 e4
Apple 0.1 -0.4 0.2 0.5
Banana 0.4 1.4 -1.0 0.1
Coconut 1.1 0.9 1.1 0.5
u
Embedding for u Size = 4
V = 3
Using Discrete Variables
Lookup Tables
e1 e2 e3 e4
Apple 0.1 -0.4 0.2 0.5
Banana 0.4 1.4 -1.0 0.1
Coconut 1.1 0.9 1.1 0.5
u
Embedding for u
Banana
Size = 4
V = 3
Using Discrete Variables
Lookup Tables
e1 e2 e3 e4
0 0.1 -0.4 0.2 0.5
1 0.4 1.4 -1.0 0.1
2 1.1 0.9 1.1 0.5
u
Embedding for u
1
Size = 4
V = 3
Using Discrete Variables
Lookup Tables
u
Embedding for u
1
Lookup
Size = 4
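A lookup table is just row selection in a V x size embedding matrix. A sketch using the slide's values (the dict/function names are ours):

```python
# Embedding lookup: map a discrete symbol to its row in a V x size matrix.
# V = 3 symbols, embedding size = 4; values copied from the slide's table.

vocab = {"Apple": 0, "Banana": 1, "Coconut": 2}
embeddings = [
    [0.1, -0.4, 0.2, 0.5],   # Apple
    [0.4, 1.4, -1.0, 0.1],   # Banana
    [1.1, 0.9, 1.1, 0.5],    # Coconut
]

def lookup(word):
    # Symbol -> integer id -> embedding row.
    return embeddings[vocab[word]]

print(lookup("Banana"))  # [0.4, 1.4, -1.0, 0.1]
```

The rows themselves are parameters: they are trained by gradient descent like any other weight.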
Using Discrete Variables
x
y
Number of fruit to offer
u
Type of fruit to offer
v
Number of fruit received
Type of fruit received
s1
s2
u∈{Apple, Banana, Coconut}
v∈{Apple, Banana, Coconut}
eu
Lookup
Using Discrete Variables
Softmax
V = 3
Apple Banana Coconut
w1 0.1 -0.4 0.2
w2 0.4 1.4 -1.0
w3 1.1 0.9 1.1
w4 1.3 0.1 0.4
Using Discrete Variables
Softmax
Input vector Size = 4
V = 3
Apple Banana Coconut
w1 0.1 -0.4 0.2
w2 0.4 1.4 -1.0
w3 1.1 0.9 1.1
w4 1.3 0.1 0.4
Using Discrete Variables
Softmax
Input vector Size = 4
logits Size = V
V = 3
Apple Banana Coconut
w1 0.1 -0.4 0.2
w2 0.4 1.4 -1.0
w3 1.1 0.9 1.1
w4 1.3 0.1 0.4
Using Discrete Variables
Softmax
Input Vector
Logits
V = 3
Apple Banana Coconut
w1 0.1 -0.4 0.2
w2 0.4 1.4 -1.0
w3 1.1 0.9 1.1
w4 1.3 0.1 0.4
s1
s2
s3
s4
d1
d2
d3
1 -1 -2
Using Discrete Variables
Softmax
Input Vector
Logits
V = 3
Apple Banana Coconut
w1 0.1 -0.4 0.2
w2 0.4 1.4 -1.0
w3 1.1 0.9 1.1
w4 1.3 0.1 0.4
s1
s2
s3
s4
d1
d2
d3
1 -1 -2
p1
p2
p3
0.84 0.11 0.04
Using Discrete Variables
Softmax
Input Vector
Logits
V = 3
Apple Banana Coconut
w1 0.1 -0.4 0.2
w2 0.4 1.4 -1.0
w3 1.1 0.9 1.1
w4 1.3 0.1 0.4
s1
s2
s3
s4
d1
d2
d3
1 -1 -2
p1
p2
p3
0.84 0.11 0.04
Apple
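The logits-to-probabilities step can be reproduced exactly. A pure-Python sketch over the slide's logits 1, -1, -2 (the function name is ours):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, exponentiate, normalize.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, -1.0, -2.0])
print([round(p, 2) for p in probs])  # [0.84, 0.11, 0.04] -> argmax is Apple
```

The largest logit always wins the argmax, so the model predicts Apple.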
Using Discrete Variables
x
y
Number of fruit to offer
u
Type of fruit to offer
v
Number of fruit received
Type of fruit received
s1
s2
u∈{Apple, Banana, Coconut}
v∈{Apple, Banana, Coconut}
eu
Softmax
Lookup
Example Applications
Window-based Tagging (Collobert et al, 2011)
Abby likes to eat apples and bananas
NNP VBZ TO VB NNS CC NNS
Example Applications
Window-based Tagging (Collobert et al, 2011)
Abby likes to eat apples and bananas
e-2 e-1 e0 e1 e2
Example Applications
Window-based Tagging (Collobert et al, 2011)
Abby likes to eat apples and bananas
e-2 e-1 e0 e1 e2 Word Embeddings
Non-Linear Layer 1
s1
s2 Non-Linear Layer 2
Example Applications
Window-based Tagging (Collobert et al, 2011)
Abby likes to eat apples and bananas
e-2 e-1 e0 e1 e2 Word Embeddings
Non-Linear Layer 1
s1
s2 Non-Linear Layer 2
VB Softmax
Example Applications
Window-based Tagging (Collobert et al, 2011)
Abby likes to eat apples and bananas
e-2 e-1 e0 e1 e2 Word Embeddings
Non-Linear Layer 1
s1
s2 Non-Linear Layer 2
VB Softmax
Example Applications
Window-based Tagging (Collobert et al, 2011)
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
Context
Predict
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
e-4 e-3 e-2 e-1
s1
s2
Softmax
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
<s>
0.2
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
0.2 0.1
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
0.2 0.1 0.3
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
0.2 0.1 0.3 0.5 0.7 0.4 0.2 → 0.000378
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas 0.000378
Abby dislikes to drink apples and bananas 0.00012
John does to eat coconuts and bananas 0.00003
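The sentence score is the product of the per-word probabilities the model assigns given each word's context. A sketch (the function name is ours; the probabilities are the slide's illustrative per-word numbers, and the exact headline score depends on the figures used):

```python
# Rescoring: a candidate translation's score is the product of the
# probability the model gives each word in its left-to-right context.

def sentence_score(word_probs):
    score = 1.0
    for p in word_probs:
        score *= p
    return score

probs = [0.2, 0.1, 0.3, 0.5, 0.7, 0.4, 0.2]  # one probability per word
print(sentence_score(probs))
```

Longer sentences multiply in more factors, so in practice these scores are summed in log space to avoid underflow.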
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
Context
Predict
Translation
Source
Abby gosta de comer maçãs e bananas
Example Applications
Translation Rescoring (Devlin et al, 2014)
Abby likes to eat apples and bananas
Translation
maçãs
e-4 e-3 e-2 e-1
s1
s2
f-1
Example Applications
Translation Rescoring (Devlin et al, 2014)
Translation Score (BLEU) Arabic - English Chinese - English
Best Rescored System 52.8 34.7
1st OpenMT12 49.5 32.6
Hierarchical 43.4 30.1
Deep Neural Networks are our friends?
Convolutional Neural Network
Deep Neural Networks are our friends?
Convolutional Neural Network
x1 x2 x3 x4
x5 x6 x7 x8
x9 x10 x11 x12
x13 x14 x15 x16
4x4 image
Deep Neural Networks are our friends?
Convolutional Neural Network
x1 x2 x3 x4
x5 x6 x7 x8
x9 x10 x11 x12
x13 x14 x15 x16
4x4 image
z1
z1 = w1x1 + w2x2 + ... + w9x11
Deep Neural Networks are our friends?
Convolutional Neural Network
x1 x2 x3 x4
x5 x6 x7 x8
x9 x10 x11 x12
x13 x14 x15 x16
4x4 image
z1 z2
z2 = w1x2 + w2x3 + ... + w9x12
Deep Neural Networks are our friends?
Convolutional Neural Network
x1 x2 x3 x4
x5 x6 x7 x8
x9 x10 x11 x12
x13 x14 x15 x16
4x4 image
z1 z2
z3 z4
Deep Neural Networks are our friends?
Convolutional Neural Network
x1 x2 x3 x4
x5 x6 x7 x8
x9 x10 x11 x12
x13 x14 x15 x16
4x4 image
z1 z2
z3 z4
z1
z2
z3
z4
y
Is this a cat?
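Sliding the same 3x3 weight window over the 4x4 image is what produces the 2x2 feature map z1..z4. A sketch (the image values and kernel are placeholders, not from the slides):

```python
# 2D convolution (valid padding): slide a k x k filter over an n x n image,
# reusing the same weights at every position -> an (n-k+1) x (n-k+1) map.

def conv2d(image, kernel):
    n, k = len(image), len(kernel)
    out = []
    for i in range(n - k + 1):
        row = []
        for j in range(n - k + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]  # picks out each window's centre pixel

print(conv2d(image, kernel))  # [[6, 7], [10, 11]]
```

Weight sharing is the point: the filter has 9 parameters no matter how large the image is.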