Deep Neural Networks Are Our Friends Wang Ling

Deep Neural Networks Are Our Friendslxmls.it.pt/2016/Deep-Neural-Networks-Are-Our-Friends.pdf · 2016. 7. 26. · Gradients are our friends Computation Graphs are our friends Outline


Outline

● Part I - Neural Networks are our friends
    ○ Numbers are our friends
    ○ Operators are our friends
    ○ Functions are our friends
    ○ Parameters are our friends
    ○ Cost Functions are our friends
    ○ Optimizers are our friends
    ○ Gradients are our friends
    ○ Computation Graphs are our friends
● Part II - Into Deep Learning
    ○ Nonlinear Neural Models
    ○ Multilayer Perceptrons
    ○ Using Discrete Variables
    ○ Example Applications

Numbers are our friends

[Figure: Abby Cadabby]

How many apples does Abby have?

● Types of Numbers:
    ○ Integers: 5
    ○ Rationals: 1/2
    ○ Reals: 1.4e10 ...

Operators are our friends

If Abby has 4 apples, and gives Bert 1 apple, how many apples will Abby have?

Abby: 3 apples, Bert: 1 apple

● Arithmetic Operators
    ○ Addition: 23 + 12 = 35
    ○ Subtraction: 31 - 15 = 16
    ○ Multiplication: 4 x 5 = 20
    ○ Division: 20 / 5 = 4

Functions are our friends

If Bert always returns 3 bananas for each apple, how many bananas will Abby receive for 2 apples?

y = 3x

● Input, x - Number of Apples given by Abby
● Output, y - Number of Bananas received by Abby

y = 3x, x = 1
y = 3

Functions are our friends

Bert: y = 3x
Cookie Monster: y = ??

Apples given (x)    Bananas received (y)
1                   0
5                   16
6                   20
3                   ?

If Abby gives Cookie Monster 3 apples, how many bananas does she get?

Parameters are our friends

y = 3x + 1

● Input: x
● Output: y

y = wx + b

● Input: x
● Output: y
● Parameters: w, b

Input - Fixed, comes from data
Parameters - Need to be estimated

Parameters are our friends

Data:

x    y
1    0
5    16
6    20
3    ?

Model: y = wx + b

How to find the parameters w and b?

Parameters are our friends

Model: y = wx + b

Model Candidate 1: y = 1x + 0

x    y     ŷ
1    0     1
5    16    5
6    20    6

Model Candidate 2: y = 2x + 2

x    y     ŷ
1    0     4
5    16    12
6    20    14

Which one is better?

Cost functions are our friends

Data:

n    x    y
0    1    0
1    5    16
2    6    20

Model: $\hat{y}_n = w x_n + b$, where $\hat{y}_n$ is the model's prediction for $x_n$

Cost: $C(w,b) = \sum_{n \in \{0,1,2\}} (y_n - \hat{y}_n)^2$

Model Candidate 1: y = 1x + 0

n    x    y     ŷ    (y-ŷ)²
0    1    0     1    1
1    5    16    5    121
2    6    20    6    196

C(1,0) = 318

Cost functions are our friends

Model Candidate 1: y = 1x + 0

C(1,0) = 318

Model Candidate 2: y = 2x + 2

n    x    y     ŷ     (y-ŷ)²
0    1    0     4     16
1    5    16    12    16
2    6    20    14    36

C(2,2) = 68

Cost: $C(w,b) = \sum_{n \in \{0,1,2\}} (y_n - \hat{y}_n)^2$

How to find the parameters w and b?
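The cost comparison of the two candidates can be reproduced in a few lines. This is a minimal sketch, assuming the three data points from the slides; the names `DATA` and `cost` are illustrative and do not come from the deck.

```python
# Data from the slides: apples given (x) and bananas received (y).
DATA = [(1, 0), (5, 16), (6, 20)]


def cost(w, b, data=DATA):
    """Sum of squared differences between observed y and prediction w*x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in data)


print(cost(1, 0))  # Model Candidate 1: 318
print(cost(2, 2))  # Model Candidate 2: 68
```

The lower cost of Candidate 2 is what makes it the better of the two models under this measure.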

Optimizers are our friends

Data:

n    x    y
0    1    0
1    5    16
2    6    20

Model: $\hat{y}_n = w x_n + b$

Cost: $C(w,b) = \sum_{n \in \{0,1,2\}} (y_n - \hat{y}_n)^2$

Optimizer: $\arg\min_{w,b \in (-\infty,\infty)} C(w,b)$

[Plot: axes w and b]

Optimizers are our friends

Model: y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68
w1,b1 = 3,2 : C(w1,b1) = 26

n    x    y     ŷ     (y-ŷ)²
0    1    0     5     25
1    5    16    17    1
2    6    20    20    0

C(3,2) = 26

Optimizers are our friends

w1,b1 = 3,2 : C(w1,b1) = 26
w2,b2 = 4,2 : C(w2,b2) = 108

n    x    y     ŷ     (y-ŷ)²
0    1    0     6     36
1    5    16    22    36
2    6    20    26    36

C(4,2) = 108

w1,b1 = 3,2 : C(w1,b1) = 26

Optimizers are our friends

w1,b1 = 3,2 : C(w1,b1) = 26
w2,b2 = 3,3 : C(w2,b2) = 41

n    x    y     ŷ     (y-ŷ)²
0    1    0     6     36
1    5    16    18    4
2    6    20    21    1

C(3,3) = 41

w1,b1 = 3,2 : C(w1,b1) = 26

Optimizers are our friends

w1,b1 = 3,2 : C(w1,b1) = 26
w2,b2 = 3,1 : C(w2,b2) = 17

n    x    y     ŷ     (y-ŷ)²
0    1    0     4     16
1    5    16    16    0
2    6    20    19    1

C(3,1) = 17

w2,b2 = 3,1 : C(w2,b2) = 17

Optimizers are our friends

w2,b2 = 3,1 : C(w2,b2) = 17
w3,b3 = 3,0 : C(w3,b3) = 14

n    x    y     ŷ     (y-ŷ)²
0    1    0     3     9
1    5    16    15    1
2    6    20    18    4

C(3,0) = 14

w3,b3 = 3,0 : C(w3,b3) = 14

Optimizers are our friends

w3,b3 = 3,0 : C(w3,b3) = 14
w4,b4 = 3,-1 : C(w4,b4) = 17

n    x    y     ŷ     (y-ŷ)²
0    1    0     2     4
1    5    16    14    4
2    6    20    17    9

C(3,-1) = 17

w3,b3 = 3,0 : C(w3,b3) = 14
w4,b4 = 2,0 : C(w4,b4) = 104

n    x    y     ŷ     (y-ŷ)²
0    1    0     2     4
1    5    16    10    36
2    6    20    12    64

C(2,0) = 104

w3,b3 = 3,0 : C(w3,b3) = 14
w4,b4 = 4,0 : C(w4,b4) = 48

n    x    y     ŷ     (y-ŷ)²
0    1    0     4     16
1    5    16    20    16
2    6    20    24    16

C(4,0) = 48

w3,b3 = 3,0 : C(w3,b3) = 14

Optimizers are our friends

w?,b? = 4,-2 : C(w?,b?) = 12

n    x    y     ŷ     (y-ŷ)²
0    1    0     2     4
1    5    16    18    4
2    6    20    22    4

C(4,-2) = 12

w3,b3 = 3,0 : C(w3,b3) = 14

Search Problem
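The step-by-step trials above behave like a greedy neighbour search. The sketch below is one possible reading of that procedure, not the author's implementation: it assumes unit steps along w or b and stops when no neighbour lowers the cost. `greedy_search` and `step` are hypothetical names.

```python
DATA = [(1, 0), (5, 16), (6, 20)]


def cost(w, b):
    """Sum of squared errors of the line w*x + b on the slide data."""
    return sum((y - (w * x + b)) ** 2 for x, y in DATA)


def greedy_search(w, b, step=1):
    """Move to the best of the four axis neighbours until none improves."""
    while True:
        neighbours = [(w + step, b), (w - step, b), (w, b + step), (w, b - step)]
        best = min(neighbours, key=lambda p: cost(*p))
        if cost(*best) >= cost(w, b):
            return w, b  # no neighbour is better: the search is stuck here
        w, b = best


w, b = greedy_search(2, 2)
print(w, b, cost(w, b))  # stops at w=3, b=0
```

Starting from (2,2) the search walks through (3,2) and (3,1) and halts at (3,0), where the squared errors are 9 + 1 + 4 = 14, even though (4,-2) has a lower cost of 12. That is the sense in which fitting the parameters is a search problem.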

Optimizers are our friends

w3,b3 = 3,0 : C(w3,b3) = 14
w4,b4 = 3.01,0 : C(w4,b4) = 13.73

n    x    y     ŷ        (y-ŷ)²
0    1    0     3.01     9.06
1    5    16    15.05    0.90
2    6    20    18.06    3.76

C(3.01,0) = 13.73

w*,b* = 4,-2 : C(w*,b*) = 12

w*,b* = 4,-4 : C(w*,b*) = 0
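For a problem this small, the best integer parameters can also be found by trying every pair in a bounded range. This is a sketch under an assumed search range of -5 to 5 for both w and b; the range is an illustration and is not given in the slides.

```python
from itertools import product

DATA = [(1, 0), (5, 16), (6, 20)]


def cost(w, b):
    """Sum of squared errors of the line w*x + b on the slide data."""
    return sum((y - (w * x + b)) ** 2 for x, y in DATA)


# Assumed search range: integer w and b between -5 and 5 inclusive.
grid = product(range(-5, 6), repeat=2)
w_best, b_best = min(grid, key=lambda p: cost(*p))
print(w_best, b_best, cost(w_best, b_best))  # 4 -4 0
```

Exhaustive search only works because the grid is tiny; the number of candidates grows quickly with the range and the number of parameters.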

Gradients are our friends

Optimizer: $\arg\min_{w,b \in (-\infty,\infty)} C(w,b)$

Should be used sparingly

Model: y = wx + b

w0,b0 = 2,2 : C(w0,b0) = 68

Gradients are our friends

w0,b0 = 2,2 : C(w0,b0) = 68

hw = 1
C(w0+hw,b0) = C(3,2) = 26

$r_w = \frac{C(w_0 + h_w, b_0) - C(w_0, b_0)}{h_w} = \frac{C(3,2) - C(2,2)}{1} = -42$

hw = 1, r = -42
hw = 0.1, r = -97.8
hw = 0.01, r = -103.4
hw = 0.001, r = -103.9

$h_w \to 0,\; r_w = \frac{\partial C}{\partial w}(w_0, b_0)$
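The shrinking-step experiment can be checked numerically. This is a minimal sketch assuming the slide data; `slope_w` is a hypothetical helper name for the difference quotient in w.

```python
DATA = [(1, 0), (5, 16), (6, 20)]


def cost(w, b):
    """Sum of squared errors of the line w*x + b on the slide data."""
    return sum((y - (w * x + b)) ** 2 for x, y in DATA)


def slope_w(w, b, h):
    """Finite-difference slope of the cost along w with step h."""
    return (cost(w + h, b) - cost(w, b)) / h


for h in (1, 0.1, 0.01, 0.001):
    print(h, round(slope_w(2, 2, h), 1))
```

As h shrinks, the printed slope approaches -104, the value of the partial derivative of C with respect to w at (2,2).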

Gradients are our friends

w0,b0 = 2,2 : C(w0,b0) = 68

With the prediction $\hat{y}_n = w x_n + b$:

$\frac{\partial C}{\partial w} = \frac{\partial \sum_n (\hat{y}_n - y_n)^2}{\partial w} = \sum_n 2(\hat{y}_n - y_n)\, x_n$

n    x    y     ŷ     (ŷ-y)    2(ŷ-y)x
0    1    0     4     4        8
1    5    16    12    -4       -40
2    6    20    14    -6       -72

$h_w \to 0,\; r_w = \frac{\partial C}{\partial w}(w_0, b_0) = -104$

Gradients are our friends

w0,b0 = 2,2 : C(w0,b0) = 68

With the prediction $\hat{y}_n = w x_n + b$:

$\frac{\partial C}{\partial w} = \frac{\partial \sum_n (\hat{y}_n - y_n)^2}{\partial w} = \sum_n 2(\hat{y}_n - y_n)\, x_n$

$\frac{\partial C}{\partial b} = \frac{\partial \sum_n (\hat{y}_n - y_n)^2}{\partial b} = \sum_n 2(\hat{y}_n - y_n)$

$h_w \to 0,\; r_w = \frac{\partial C}{\partial w}(w_0, b_0) = -104$

n    x    y     ŷ     (ŷ-y)    2(ŷ-y)
0    1    0     4     4        8
1    5    16    12    -4       -8
2    6    20    14    -6       -12

$h_b \to 0,\; r_b = \frac{\partial C}{\partial b}(w_0, b_0) = -12$
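The two partial derivatives can be computed directly from the data. The sketch below assumes the convention that the prediction is w*x + b and y is the observed value; `gradients` is an illustrative name.

```python
DATA = [(1, 0), (5, 16), (6, 20)]


def gradients(w, b):
    """Partial derivatives of the squared-error cost with respect to w and b."""
    dw = sum(2 * ((w * x + b) - y) * x for x, y in DATA)
    db = sum(2 * ((w * x + b) - y) for x, y in DATA)
    return dw, db


print(gradients(2, 2))   # (-104, -12)
print(gradients(4, -4))  # (0, 0): both derivatives vanish at the perfect fit
```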

Gradients are our friends

w0,b0 = 2,2 : C(w0,b0) = 68

$h_w \to 0,\; r_w = \frac{\partial C}{\partial w}(w_0, b_0) = -104$

$h_b \to 0,\; r_b = \frac{\partial C}{\partial b}(w_0, b_0) = -12$

Model: y = wx + b

w1 = w0 - α rw
b1 = b0 - α rb

α → Learning Rate
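Repeating the update rule gives a gradient descent loop. This is a sketch, not the deck's own code: the learning rate of 0.01 and the 5000 steps are assumptions chosen so that the iteration stays stable on this data.

```python
DATA = [(1, 0), (5, 16), (6, 20)]


def gradients(w, b):
    """Partial derivatives of the squared-error cost with respect to w and b."""
    dw = sum(2 * ((w * x + b) - y) * x for x, y in DATA)
    db = sum(2 * ((w * x + b) - y) for x, y in DATA)
    return dw, db


def gradient_descent(w, b, learning_rate=0.01, steps=5000):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    for _ in range(steps):
        dw, db = gradients(w, b)
        w, b = w - learning_rate * dw, b - learning_rate * db
    return w, b


w, b = gradient_descent(2.0, 2.0)
print(round(w, 3), round(b, 3))  # approaches w = 4, b = -4
```

A much larger learning rate would make the updates overshoot and diverge on this data, which is why the step is scaled down rather than applying the raw gradients.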

Gradients are our friends

y = 4x - 4

Data:

x    y
1    0
5    16
6    20
3    ?

x = 3: y = 4·3 - 4 = 8

Computation Graphs are our friends

Model: $\hat{y}_n = w x_n + b$

$C(w,b) = \sum_{n \in \{0,1,2\}} (y_n - \hat{y}_n)^2$

$\frac{\partial C}{\partial w} = \frac{\partial \sum_n (\hat{y}_n - y_n)^2}{\partial w} = \sum_n 2(\hat{y}_n - y_n)\, x_n$

$\frac{\partial C}{\partial b} = \frac{\partial \sum_n (\hat{y}_n - y_n)^2}{\partial b} = \sum_n 2(\hat{y}_n - y_n)$

Easy!

Computation Graphs are our friends

$y = wx + b + \tanh(yx + b)^2$

Harder!

Computation Graphs can compute gradients for you!

Computation Graphs are our friends

Model: $\hat{y}_n = w x_n + b$

$C(w,b) = \sum_{n \in \{0,1,2\}} (y_n - \hat{y}_n)^2$

$\frac{\partial C}{\partial w} = \sum_n \frac{\partial (\hat{y}_n - y_n)^2}{\partial \hat{y}_n} \frac{\partial \hat{y}_n}{\partial w} = \sum_n 2(\hat{y}_n - y_n)\, x_n$

$\frac{\partial C}{\partial b} = \sum_n \frac{\partial (\hat{y}_n - y_n)^2}{\partial \hat{y}_n} \frac{\partial \hat{y}_n}{\partial b} = \sum_n 2(\hat{y}_n - y_n)$

Page 94: Deep Neural Networks Are Our Friendslxmls.it.pt/2016/Deep-Neural-Networks-Are-Our-Friends.pdf · 2016. 7. 26. · Gradients are our friends Computation Graphs are our friends Outline

Computation Graphs are our friends

C(w,b) = ∑(yn-ŷn)n∈{0,1,2}

2

∂C

∂w=

∂(ŷn-yn)

∂ynn

2

y = wx + b

∂yn

∂w

2

∂C

∂b=

∂(ŷn-yn)

∂ynn ∂b∑ ∂yn

Page 95: Deep Neural Networks Are Our Friendslxmls.it.pt/2016/Deep-Neural-Networks-Are-Our-Friends.pdf · 2016. 7. 26. · Gradients are our friends Computation Graphs are our friends Outline

Computation Graphs are our friends

C(w,b) = ∑(yn-ŷn)n∈{0,1,2}

2

∂C

∂w=

∂(ŷn-yn)

∂ynn

2

y = o + bo = wx

∂yn

∂w

2

∂C

∂b=

∂(ŷn-yn)

∂ynn ∂b∑ ∂yn

Page 96: Deep Neural Networks Are Our Friendslxmls.it.pt/2016/Deep-Neural-Networks-Are-Our-Friends.pdf · 2016. 7. 26. · Gradients are our friends Computation Graphs are our friends Outline

Computation Graphs are our friends

C(w,b) = ∑cnn∈{0,1,2}

∂C

∂w=

∂ynn

2

c = dd = y - ŷy = o + bo = wx

∂yn

∂w

2

∂C

∂b=

∂(ŷn-yn)

∂ynn ∂b∑ ∂yn

2

∂(ŷn-yn)

Page 97

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n, \qquad c = d^2, \quad d = y - \hat{y}, \quad y = o + b, \quad o = wx$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial (\hat{y}_n - y_n)^2}{\partial y_n}\,\frac{\partial y_n}{\partial b}$$

Page 98

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n, \qquad c = d^2, \quad d = y - \hat{y}, \quad y = o + b, \quad o = wx$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$
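Filling in each local derivative from the definitions c = d², d = y - ŷ, y = o + b and o = wx shows that the factored form agrees with the direct derivative computed earlier. This is a worked expansion added for illustration:

```latex
\begin{aligned}
\frac{\partial c_n}{\partial d_n} &= 2d_n, \quad
\frac{\partial d_n}{\partial y_n} = 1, \quad
\frac{\partial y_n}{\partial o_n} = 1, \quad
\frac{\partial o_n}{\partial w} = x_n, \quad
\frac{\partial y_n}{\partial b} = 1 \\
\frac{\partial C}{\partial w} &= \sum_n 2d_n\,x_n = \sum_n -2(\hat{y}_n - y_n)\,x_n \\
\frac{\partial C}{\partial b} &= \sum_n 2d_n = \sum_n -2(\hat{y}_n - y_n)
\end{aligned}
```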

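Page 99

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n, \qquad c = d^2, \quad d = y - \hat{y}, \quad y = o + b, \quad o = wx$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

Operations: Power 2 ($c = d^2$), Sub ($d = y - \hat{y}$), Add ($y = o + b$), Product ($o = wx$)

Sub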

Page 100

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n, \qquad c = d^2, \quad d = y - \hat{y}, \quad y = o + b, \quad o = wx$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

Operations: Power 2 ($c = d^2$), Sub ($d = y - \hat{y}$), Add ($y = o + b$), Product ($o = wx$)

Sub:
forward(x,y) → z
backward(x,y,dz) → dx,dy

Page 101

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n, \qquad c = d^2, \quad d = y - \hat{y}, \quad y = o + b, \quad o = wx$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

Operations: Power 2 ($c = d^2$), Sub ($d = y - \hat{y}$), Add ($y = o + b$), Product ($o = wx$)

Sub:
forward(x,y) : return x - y
backward(x,y,dz) : return dz, -dz
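The forward/backward interface sketched on this slide can be written out for all four operations in the example. This is a minimal sketch: the class names mirror the slide labels, but the static-method layout and the single-argument signature of Power2 are assumptions made here for illustration.

```python
class Sub:
    @staticmethod
    def forward(x, y):
        return x - y

    @staticmethod
    def backward(x, y, dz):
        # dz/dx = 1 and dz/dy = -1, each scaled by the incoming gradient dz
        return dz, -dz


class Add:
    @staticmethod
    def forward(x, y):
        return x + y

    @staticmethod
    def backward(x, y, dz):
        # Both inputs receive the incoming gradient unchanged
        return dz, dz


class Product:
    @staticmethod
    def forward(x, y):
        return x * y

    @staticmethod
    def backward(x, y, dz):
        # dz/dx = y and dz/dy = x
        return dz * y, dz * x


class Power2:
    @staticmethod
    def forward(x):
        return x * x

    @staticmethod
    def backward(x, dz):
        # dz/dx = 2x
        return dz * 2 * x


print(Sub.forward(12, 16))          # d = y - ŷ
print(Sub.backward(12, 16, -8))     # gradients for y and ŷ
print(Product.backward(2, 5, -8))   # gradients for w and x
```

Each operation only needs its own local derivative; chaining the backward calls reproduces the product of partial derivatives in the formulas above.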

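Page 103

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n, \qquad c = d^2, \quad d = y - \hat{y}, \quad y = o + b, \quad o = wx$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

Operations: Power 2 ($c = d^2$), Sub ($d = y - \hat{y}$), Add ($y = o + b$), Product ($o = wx$)

Sub:
forward(x,y) : return x - y
backward(x,y,dz) : return 1, -1

$$\frac{\partial d_n}{\partial \hat{y}_n}$$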

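Page 104

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n, \qquad c = d^2, \quad d = y - \hat{y}, \quad y = o + b, \quad o = wx$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

Operations: Power 2 ($c = d^2$), Sub ($d = y - \hat{y}$), Add ($y = o + b$), Product ($o = wx$)

[Figure: computation graph under construction. w, x → Product → o]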

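Page 105

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n, \qquad c = d^2, \quad d = y - \hat{y}$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

Operations: Power 2 ($c = d^2$), Sub ($d = y - \hat{y}$)

[Figure: computation graph under construction. w, x → Product → o; o, b → Add → y]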

Page 106

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0,1,2\}} c_n$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c]

Page 107

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0\}} c_n$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Page 108

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0\}} c_n$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Labelled in the figure: Input

Page 109

Computation Graphs are our friends

$$C(w,b) = \sum_{n\in\{0\}} c_n$$

$$\frac{\partial C}{\partial w} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial o_n}\frac{\partial o_n}{\partial w}, \qquad \frac{\partial C}{\partial b} = \sum_n \frac{\partial c_n}{\partial d_n}\frac{\partial d_n}{\partial y_n}\frac{\partial y_n}{\partial b}$$

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Labelled in the figure: Input, Parameters

Page 110

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs

Page 111

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables

Labelled in the figure: Variables

Page 112

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables

Variables hold 2 values: x and dx

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Page 113

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Page 114

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)
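The topological sort in step 3 can be sketched with Kahn's algorithm. The dictionary encoding of the graph and the name `y_hat` for ŷ are assumptions made for this example; the slides do not prescribe a data structure or a particular sorting algorithm.

```python
from collections import deque

# Each computed variable maps to the variables its operation reads.
# Inputs and parameters (w, x, b, y_hat) are not computed, so they have no entry.
producers = {
    "o": ["w", "x"],      # Product
    "y": ["o", "b"],      # Add
    "d": ["y", "y_hat"],  # Sub
    "c": ["d"],           # Power 2
    "C": ["c"],           # Id
}


def topological_order(producers):
    """Return computed variables so that each one comes after everything it reads."""
    # Count only dependencies on other computed variables.
    pending = {
        v: len({u for u in deps if u in producers})
        for v, deps in producers.items()
    }
    ready = deque(v for v, count in pending.items() if count == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for u, deps in producers.items():
            if v in deps:
                pending[u] -= 1
                if pending[u] == 0:
                    ready.append(u)
    if len(order) != len(producers):
        raise ValueError("graph has a cycle")
    return order


order = topological_order(producers)
print(order)
```

Running the forward methods in this order guarantees that every operation sees up-to-date inputs; the reverse of the same order is what the backward pass uses.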

Page 115

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 10,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 116

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 10,0; y = 12,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 117

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 118

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 2,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 119

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 10,0; y = 2,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 120

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Page 121

Computation Graphs are our friends

[Figure: computation graph without its input and parameter nodes. Remaining variables: o, y, d, c, C; remaining operations: Add, Sub, Power 2, Id]

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Page 122

Computation Graphs are our friends

[Figure: the variables o, y, d, c, C alone]

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 123

Computation Graphs are our friends

[Figure: a larger computation graph. Variables: o, y, d, c, C, g, s; operations: Add, Sub, Power 2, Add, Add]

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0; g = 0,0; s = 0,0

Page 124

Computation Graphs are our friends

[Figure: the variables o, y, d, c, C, g, s labelled with their positions in the topological order: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th]

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0; g = 0,0; s = 0,0

Page 125

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them

Variables (value, gradient): o = 0,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 126

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them

Variables (value, gradient): o = 10,0; y = 0,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 127

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them

Variables (value, gradient): o = 10,0; y = 12,0; d = 0,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 128

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 0,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 129

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 16,0; C = 0,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 130

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 16,0; C = 16,0

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 131

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them
5- Set gradients to final variables

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 16,0; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

Page 132

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 16,0; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$C = c, \qquad \frac{\partial C}{\partial c} = 1, \qquad dc = dC\,\frac{\partial C}{\partial c}$$

Page 133

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$C = c, \qquad \frac{\partial C}{\partial c} = 1, \qquad dc = dC\,\frac{\partial C}{\partial c}$$

Page 134

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$c = d^2, \qquad dd = dc\,\frac{\partial c}{\partial d}, \qquad \frac{\partial c}{\partial d} = 2d$$

Page 135

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$c = d^2, \qquad dd = dc\,\frac{\partial c}{\partial d}, \qquad \frac{\partial c}{\partial d} = 2 \times (-4)$$

Page 136

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,0; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$c = d^2, \qquad dd = dc\,\frac{\partial c}{\partial d}, \qquad \frac{\partial c}{\partial d} = -8$$

Page 137

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$c = d^2, \qquad dd = dc\,\frac{\partial c}{\partial d}, \qquad \frac{\partial c}{\partial d} = -8$$

Page 138

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,0; y = 12,0; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$d = y - \hat{y}, \qquad \frac{\partial d}{\partial y} = 1$$

Page 139

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,0; y = 12,-8; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$d = y - \hat{y}, \qquad \frac{\partial d}{\partial y} = 1, \qquad dy = dd\,\frac{\partial d}{\partial y}$$

Page 140

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,-8; y = 12,-8; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$y = o + b, \qquad \frac{\partial y}{\partial o} = 1, \qquad do = dy\,\frac{\partial y}{\partial o}$$

Page 141

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,-8; y = 12,-8; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$y = o + b, \qquad \frac{\partial y}{\partial o} = 1, \qquad \frac{\partial y}{\partial b} = 1, \qquad b_{t+1} = b - dy\,\frac{\partial y}{\partial b}$$

Page 143

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,-8; y = 12,-8; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$y = o + b, \qquad \frac{\partial y}{\partial o} = 1, \qquad \frac{\partial y}{\partial b} = 1, \qquad b_{t+1} = b - \frac{\partial C}{\partial c}\frac{\partial c}{\partial d}\frac{\partial d}{\partial y}\frac{\partial y}{\partial b}$$

Page 144

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,-8; y = 12,-8; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$y = o + b, \qquad \frac{\partial y}{\partial o} = 1, \qquad \frac{\partial y}{\partial b} = 1, \qquad b_{t+1} = b - \frac{\partial C}{\partial b}$$
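With the forward values from this example (d = -4), the chain of local derivatives behind ∂C/∂b can be evaluated numerically. This is a worked check added for illustration:

```latex
\frac{\partial C}{\partial b}
  = \frac{\partial C}{\partial c}\,\frac{\partial c}{\partial d}\,\frac{\partial d}{\partial y}\,\frac{\partial y}{\partial b}
  = 1 \cdot 2d \cdot 1 \cdot 1
  = 2 \cdot (-4)
  = -8
```

This matches the gradient value -8 stored on y in the graph.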

Page 145

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2, b = 2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)

Variables (value, gradient): o = 10,-8; y = 12,-8; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$o = wx, \qquad \frac{\partial o}{\partial w} = x, \qquad w_{t+1} = w - do\,\frac{\partial o}{\partial w}$$

Page 146

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2.8, b = 2.2.

Forward:
1- Initialize inputs
2- Initialize variables
3- Topological Sort variables
4- For each variable in topological order, run the forward method of all operations that link to them (Forward)
5- Set gradients to final variables
6- Run the operations' backward method in reverse order (Backward)
7- Update parameters

Variables (value, gradient): o = 10,-8; y = 12,-8; d = -4,-8; c = 16,1; C = 16,1

Topological order: o (1st), y (2nd), d (3rd), c (4th), C (5th)

$$o = wx, \qquad \frac{\partial o}{\partial w} = x, \qquad w_{t+1} = w - do\,\frac{\partial o}{\partial w}$$
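The seven steps can be traced end to end for this single example. The sketch below is not the author's implementation; in particular the step size `lr = 0.02` is an inference, chosen because it reproduces the updated parameters shown here (w = 2.8, and b = 2.16, which rounds to the 2.2 on the slide). The slides' update formulas do not show a step size.

```python
# Inputs and parameters from the running example.
x, y_hat = 5.0, 16.0
w, b = 2.0, 2.0
lr = 0.02  # assumed step size, inferred from the updated values on the slide

# Forward pass, in topological order.
o = w * x          # Product
y = o + b          # Add
d = y - y_hat      # Sub
c = d ** 2         # Power 2
C = c              # Id

# Backward pass, in reverse order. Each line multiplies the incoming
# gradient by the local derivative of one operation.
dC = 1.0           # gradient of the final variable
dc = dC * 1.0      # C = c
dd = dc * 2 * d    # c = d^2
dy = dd * 1.0      # d = y - y_hat
do = dy * 1.0      # y = o + b
db = dy * 1.0      # y = o + b
dw = do * x        # o = w * x

# Parameter update.
w_new = w - lr * dw
b_new = b - lr * db

print(o, y, d, c, C)
print(dd, dy, do, dw, db)
print(w_new, b_new)
```

The intermediate values match the (value, gradient) pairs in the graph: o = 10, y = 12, d = -4, c = 16, with gradient -8 flowing back through d, y and o.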

Page 147

Computation Graphs are our friends

[Figure: computation graph. w, x → Product → o; o, b → Add → y; y, ŷ → Sub → d; d → Power 2 → c; c → Id → C]

Inputs: ŷ = 16, x = 5. Parameters: w = 2.8, b = 2.2.

Variables (value, gradient): o = 10,-8; y = 12,-8; d = -4,-8; c = 16,1; C = 16,1

$$o = wx, \qquad \frac{\partial o}{\partial w} = x, \qquad w_{t+1} = w - do\,\frac{\partial o}{\partial w}$$

Existing Tools:
- Tensorflow ( https://www.tensorflow.org )
- Torch ( https://github.com/torch/nn )
- CNN ( https://github.com/clab/cnn )
- JNN ( https://github.com/wlin12/JNN )
- Theano ( http://deeplearning.net/software/theano/ )

Page 148

Into Deep Learning

Page 149

Nonlinear Neural Models

y = 4x-4

Data: x = 1, y = 0; x = 5, y = 16; x = 6, y = 20; x = 3, y = ?

Page 150

Nonlinear Neural Models

Data: x = 1, y = 0; x = 5, y = 16; x = 6, y = 20; x = 3, y = ?

There is a limit of bananas I can give you

Page 151

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20

[Plot: y against x, with the line y = 4x-4]

Page 152

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

[Plot: y against x, with the line y = 4x-4]

Page 153

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

[Plot: y against x, with the line y = 2x+3]

Model Problem

Page 154

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

[Plot: y against x, with the line y = 2x+3]

Model Problem

Underfitting
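To see the underfitting concretely, the sketch below fits the best straight line to the five data points by ordinary least squares and lists the residuals. Least squares is an assumed fitting criterion (it matches the squared-error cost used earlier in the slides); the variable names are illustrative.

```python
# The five data points from the table.
xs = [1.0, 5.0, 6.0, 9.0, 11.0]
ys = [0.0, 16.0, 20.0, 20.0, 20.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares solution for a single input variable.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = target minus the line's prediction.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
max_abs_residual = max(abs(r) for r in residuals)

print(slope, intercept)  # close to the y = 2x+3 shown on the slide
print(residuals)
```

Even the best line misses several points by a wide margin, because no single straight line can both rise through the first three points and stay flat at 20 afterwards.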

Page 155

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

[Plot: y against x, with y = ???]

Can we learn arbitrary functions?

Page 156

Nonlinear Neural Models

$$y = (w_1 x + b_1)\,s_1 + (w_2 x + b_2)\,s_2$$

Use different linear functions depending on the value of x?

Page 157

Nonlinear Neural Models

$$y = (w_1 x + b_1)\,s_1 + (w_2 x + b_2)\,s_2$$

$s_1$ = 1 if x < 6 and 0 otherwise
$s_2$ = 1 if x >= 6 and 0 otherwise

Page 158

Nonlinear Neural Models

$$y = (w_1 x + b_1)\,s_1 + (w_2 x + b_2)\,s_2$$

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

$$y = (4x - 4)\,s_1 + (0x + 20)\,s_2$$

$s_1$ = 1 if x < 6 and 0 otherwise
$s_2$ = 1 if x >= 6 and 0 otherwise
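A short check that the two gated linear pieces reproduce every row of the data table. This is a minimal sketch; the function name `piecewise_model` is a hypothetical name chosen here.

```python
# (x, y) rows from the data table.
data = [(1, 0), (5, 16), (6, 20), (9, 20), (11, 20)]


def piecewise_model(x):
    """y = (4x - 4) * s1 + (0x + 20) * s2 with hard 0/1 gates."""
    s1 = 1 if x < 6 else 0    # selects the rising line
    s2 = 1 if x >= 6 else 0   # selects the constant 20
    return (4 * x - 4) * s1 + (0 * x + 20) * s2


predictions = [piecewise_model(x) for x, _ in data]
print(predictions)
```

The hard gates fit the data exactly, but they are step functions of x with hand-picked thresholds, which is what motivates replacing them with a smooth, learnable gate.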

Page 159

Nonlinear Neural Models

$$s = \sigma(wx + b), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}$$

Page 160

Nonlinear Neural Models

$$s = \sigma(1000x)$$

x = 0.1 then $\sigma(1000x) \approx 1$

x = -0.1 then $\sigma(1000x) \approx 0$

Page 162

Nonlinear Neural Models

$$s = \sigma(1000x - 6000)$$

x = 6.1 then $\sigma(1000x - 6000) \approx 1$

x = 5.9 then $\sigma(1000x - 6000) \approx 0$
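The steep gate can be evaluated directly. The sketch below uses a numerically stable form of the logistic function, since the naive 1 / (1 + exp(-t)) overflows in floating point for very negative t such as 1000 * 1 - 6000. The function names are illustrative.

```python
import math


def sigmoid(t):
    """Logistic function 1 / (1 + e^{-t}), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For negative t, exp(t) is small, so this form never overflows.
    e = math.exp(t)
    return e / (1.0 + e)


def gate(x):
    """s = sigmoid(1000x - 6000): close to 0 below x = 6, close to 1 above."""
    return sigmoid(1000 * x - 6000)


for x in (1.0, 5.9, 6.0, 6.1, 11.0):
    print(x, gate(x))
```

Exactly at the threshold x = 6 the gate is 0.5 rather than 0 or 1; away from the threshold it is indistinguishable from a hard step.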

Page 163

Nonlinear Neural Models

$$y = (w_1 x + b_1)\,s_1 + (w_2 x + b_2)\,s_2$$

$$s_1 = \sigma(w_3 x + b_3), \qquad s_2 = \sigma(w_4 x + b_4)$$

Page 164

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

$$y = (4x - 4)\,s_1 + (0x + 20)\,s_2$$

$$s_1 = \sigma(-1000x + 6000), \qquad s_2 = \sigma(1000x - 6000)$$

Page 166

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

For x = 5:

$$y = (16)\,s_1 + (0x + 20)\,s_2$$

$$s_1 = \sigma(-1000x + 6000), \qquad s_2 = \sigma(1000x - 6000)$$

Page 167

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

For x = 5:

$$y = (16)\,s_1 + (20)\,s_2$$

$$s_1 = \sigma(-1000x + 6000), \qquad s_2 = \sigma(1000x - 6000)$$

Page 168

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

For x = 5:

$$y = (16)\,s_1 + (20)\,s_2$$

$$s_1 = \sigma(1000), \qquad s_2 = \sigma(1000x - 6000)$$

Page 169

Nonlinear Neural Models

Data:

n   x   y
0   1   0
1   5   16
2   6   20
3   9   20
4   11  20

For x = 5:

$$y = (16)\,s_1 + (20)\,s_2$$

$$s_1 = \sigma(1000), \qquad s_2 = \sigma(-1000)$$

Page 170: Deep Neural Networks Are Our Friendslxmls.it.pt/2016/Deep-Neural-Networks-Are-Our-Friends.pdf · 2016. 7. 26. · Gradients are our friends Computation Graphs are our friends Outline

Nonlinear Neural Models

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = (16)1 + (20)0

s1 = σ(1000)

s2 = σ(-1000)


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = 16

s1 = σ(1000)

s2 = σ(-1000)

Nonlinear Neural Models


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = (4x - 4)s1 + (0x+20)s2

s1 = σ(-1000x + 6000)

s2 = σ(1000x - 6000)

Nonlinear Neural Models


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = (32)s1 + (0x+20)s2

s1 = σ(-1000x + 6000)

s2 = σ(1000x - 6000)

Nonlinear Neural Models


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = (32)s1 + (20)s2

s1 = σ(-1000x + 6000)

s2 = σ(1000x - 6000)

Nonlinear Neural Models


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = (32)s1 + (20)s2

s1 = σ(-3000)

s2 = σ(1000x - 6000)

Nonlinear Neural Models


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = (32)s1 + (20)s2

s1 = σ(-3000)

s2 = σ(3000)

Nonlinear Neural Models


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = (32)0 + (20)1

s1 = σ(-3000)

s2 = σ(3000)

Nonlinear Neural Models


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

y = 20

s1 = σ(-3000)

s2 = σ(3000)

Nonlinear Neural Models
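The two walkthroughs above (x = 5 and x = 9) can be replayed in a few lines of Python. This is only a sketch: it assumes the gates are logistic sigmoids, and the helper names `sigmoid` and `gated_model` are made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_model(x):
    """y = (4x - 4)s1 + (0x + 20)s2 with the two gates from the slides."""
    s1 = sigmoid(-1000 * x + 6000)  # close to 1 when x < 6, close to 0 when x > 6
    s2 = sigmoid(1000 * x - 6000)   # close to 0 when x < 6, close to 1 when x > 6
    return (4 * x - 4) * s1 + (0 * x + 20) * s2


for x in (1, 5, 9, 11):
    print(x, gated_model(x))
```

The large weights (1000) make each gate behave almost like a hard switch, which is why the model reproduces the table values for these inputs.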


Data

0

1

16

5

20

6

?

3

If you give me too many apples, I will give them to...

Nonlinear Neural Models


Data

0

1

16

5

20

6

?

3

Count Von Count

Nonlinear Neural Models


Multilayer Perceptrons

n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

Data

x

y

y = (4x - 4)s1 + (0x+20)s2


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y

y = (4x - 4)s1 + (0x+20)s2

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)

s2 = ????

s3 = σ(1000x - 15000)

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)

s2 = not s1 and not s3

s3 = σ(1000x - 15000)

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)

s2 = σ(w5s1 + w6s3 + b5)

s3 = σ(w7x + b6)

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)

s2 = σ(w5s1 + w6s3 + b5)

s3 = σ(w7x + b6)

Layer 1 Perceptron

Layer 1 Perceptron

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)

s2 = σ(w5s1 + w6s3 + b5)

s3 = σ(w7x + b6)

Layer 2 Perceptron

Layer 1 Perceptron

Layer 1 Perceptron

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)

s2 = not s1 and not s3

s3 = σ(1000x - 15000)

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)

s2 = σ(-1000s1 - 1000s3 + 500)

s3 = σ(1000x - 15000)

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)

s2 = σ(-1000s1 - 1000s3 + 500)

s3 = σ(1000x - 15000)

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-1000x + 6000)

s2 = σ(-1000s1 - 1000s3 + 500)

s3 = σ(1000x - 15000)

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-5000) = 0

s2 = σ(-1000s1 - 1000s3 + 500)

s3 = σ(-4000) = 0

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-5000) = 0

s2 = σ(-1000s1 - 1000s3 + 500)

s3 = σ(-4000) = 0

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-5000) = 0

s2 = σ(500)

s3 = σ(-4000) = 0

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)s1 + (20)s2 + (1)s3

s1 = σ(-5000) = 0

s2 = σ(500) = 1

s3 = σ(-4000) = 0

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (40)0 + (20)1 + (1)0

s1 = σ(-5000) = 0

s2 = σ(500) = 1

s3 = σ(-4000) = 0

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = 20

s1 = σ(-5000) = 0

s2 = σ(500) = 1

s3 = σ(-4000) = 0

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

s1 = σ(-1000x + 6000)

s2 = σ(-1000s1 - 1000s3 + 500)

s3 = σ(1000x - 15000)

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)s1 + (20)s2 + (1)s3

s1 = σ(-1000x + 6000)

s2 = σ(-1000s1 - 1000s3 + 500)

s3 = σ(1000x - 15000)

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)s1 + (20)s2 + (1)s3

s1 = σ(-13000) = 0

s2 = σ(-1000s1 - 1000s3 + 500)

s3 = σ(4000) = 1

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)s1 + (20)s2 + (1)s3

s1 = σ(-13000) = 0

s2 = σ(0 - 1000 + 500)

s3 = σ(4000) = 1

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)s1 + (20)s2 + (1)s3

s1 = σ(-13000) = 0

s2 = σ(-500) = 0

s3 = σ(4000) = 1

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = (72)0 + (20)0 + (1)1

s1 = σ(-13000) = 0

s2 = σ(-500) = 0

s3 = σ(4000) = 1

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

y = 1

s1 = σ(-13000) = 0

s2 = σ(-500) = 0

s3 = σ(4000) = 1

Multilayer Perceptrons
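The x = 11 and x = 19 walkthroughs can be checked with a short Python sketch of the three-gate model. It assumes sigmoid gates; the names `sigmoid` and `mlp` are hypothetical helpers, not part of the slides.

```python
import math


def sigmoid(z):
    """Logistic function, written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp(x):
    # Layer 1: two gates that look directly at the input x.
    s1 = sigmoid(-1000 * x + 6000)   # roughly "x < 6"
    s3 = sigmoid(1000 * x - 15000)   # roughly "x > 15"
    # Layer 2: a gate built from the layer 1 gates ("not s1 and not s3").
    s2 = sigmoid(-1000 * s1 - 1000 * s3 + 500)
    return (4 * x - 4) * s1 + (0 * x + 20) * s2 + (0 * x + 1) * s3


for x in (1, 5, 9, 11, 19):
    print(x, mlp(x))
```

Inputs that sit exactly on a gate boundary (where a sigmoid argument is 0) give a gate value of 0.5, so the sketch only evaluates points away from the boundaries.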


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y

y = (4x - 4)s1 + (0x+20)s2 + (0x+1)s3

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)

s2 = σ(w5s1 + w6s3 + b5)

s3 = σ(w7x + b6)

Layer 2 Perceptron

Layer 1 Perceptron

Layer 1 Perceptron

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)

s2 = σ(w5s1 + w6s3 + b5)

s3 = σ(w7x + b6)

x

s1

s3

s2

w4x

b4

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)

s2 = σ(w5s1 + w6s3 + b5)

s3 = σ(w7x + b6)

x

s2

w4x

b4

w7x

b5

s1

s3

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

s1 = σ(w4x + b4)

s2 = σ(w5s1 + w6s3 + b5)

s3 = σ(w7x + b6)

x

s2

s1

s3

w6s3w5s1

b5

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

x

s2

s1

s3x < 6 x > 15

!(x > 15) & !(x < 6)

Multilayer Perceptrons


y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

x

s2

s1

s3x < 6 x > 15

x∈[6,15]

Multilayer Perceptrons


x

s2

s1

s3x < 6 x > 15

x∈[6,15]

s4

x∈]-∞,6] & ]15,∞]

Multilayer Perceptrons


x

s5

s1

s2x < 6 x > 15

x∈[6,15]

s3 x > 2

s4 x < 3

s7

s6

s7

x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]

Multilayer Perceptrons


x

s5

s1

s2x < 6 x > 15

x∈[6,15]

s3 x > 2

s4 x < 3

s7

s6

s7

x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

Multilayer Perceptrons


x

s5

s1

s2x < 6 x > 15

x∈[6,15]

s3 x > 2

s4 x < 3

s7

s6

s7

x∈]-∞,6] & ]15,∞] x∈[2,15] x∈[2,3]

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

And(s1,s2) = σ(1000s1 + 1000s2 - 1500)

Or(s1,s2) = σ(1000s1 + 1000s2 - 500)

Multilayer Perceptrons
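A quick truth-table check of the And and Or perceptrons, as a sketch. It assumes sigmoid activations and 0/1 inputs; `and_gate` and `or_gate` are illustrative names.

```python
import math


def sigmoid(z):
    """Logistic function, written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def and_gate(a, b):
    # Fires only when both inputs are on: 1000 + 1000 - 1500 > 0.
    return sigmoid(1000 * a + 1000 * b - 1500)


def or_gate(a, b):
    # Fires when at least one input is on: 1000 - 500 > 0.
    return sigmoid(1000 * a + 1000 * b - 500)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(and_gate(a, b)), round(or_gate(a, b)))
```

Only the bias differs between the two gates: it sets how many active inputs are needed before the sigmoid switches on.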


x

s5

s1

s2

s3

s4

s7

s6

s7

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

Layer 3 (Xor Combinations)

s8

s9

sa

sb

Multilayer Perceptrons


x

s5

s1

s2

s3

s4

s7

s6

s7

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

Layer 3 (Xor Combinations)

s8

s9

sa

sb

Xor(s1,s2) = Or(And(s1,!s2), And(!s1,s2))

Multilayer Perceptrons


x

s5

s1

s2

s3

s4

s7

s6

s7

Input

Layer 1 (Input Features)

Layer 2 (And and Or Combinations)

Layer 3 (Xor Combinations)

s8

s9

sa

sb

Xor(s1,s2) = Or(s5, s6)

Multilayer Perceptrons
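Xor needs one more layer because it is an Or over two And-style units. The sketch below assumes sigmoid units and 0/1 inputs; the negation is folded into a negative weight (`and_not`), which is an illustrative choice and not something the slides spell out.

```python
import math


def sigmoid(z):
    """Logistic function, written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def and_not(a, b):
    # "a and not b": on only for a = 1, b = 0 (hypothetical weights).
    return sigmoid(1000 * a - 1000 * b - 500)


def or_gate(a, b):
    return sigmoid(1000 * a + 1000 * b - 500)


def xor_gate(a, b):
    s5 = and_not(a, b)      # And(s1, !s2)
    s6 = and_not(b, a)      # And(!s1, s2)
    return or_gate(s5, s6)  # Xor(s1, s2) = Or(s5, s6)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_gate(a, b)))
```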


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y

Universal approximator

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y

but...

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 9 20

4 11 20

5 15 1

6 19 1

Data

x

y

No guarantee that the best function will be found

Multilayer Perceptrons


x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

y

Multilayer Perceptrons


x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

y = 0s5 + 16s6 + 20s7

y

Multilayer Perceptrons


x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

y

y = 0s5 + 16s6 + 20s7

Multilayer Perceptrons


x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

y

y = 0s5 + 16s6 + 20s7

Multilayer Perceptrons


x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

n x y

0 1 0

1 5 16

2 6 20

Overfitting

y = 0s5 + 16s6 + 20s7

Multilayer Perceptrons

y

Model Problem


Task Complexity

Model Complexity

Multilayer Perceptrons


Task Complexity

Model Complexity

Underfitting

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression

MLP 1 Layer

MLP 2 Layer

MLP 3 Layer

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression

Linear Regression more features

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression

MLP 1 Layer

MLP 2 Layer

MLP 3 Layer

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression

MLP 1 Layer

MLP 2 Layer

MLP 3 Layer

Sentiment analysis

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Linear Regression

MLP 1 Layer

MLP 2 Layer

MLP 3 Layer

Sentiment analysis

Machine Translation

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Data

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Data

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Data

Multilayer Perceptrons


yn x y

0 1 0

1 5 16

2 6 20

y y

Multilayer Perceptrons


yn x y

0 1 0

1 5 16

2 6 20

3 2 4

y y

Multilayer Perceptrons


n x y

0 1 0

1 5 16

2 6 20

3 2 4

y y

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Model Bias

Multilayer Perceptrons


Task Complexity

Model Complexity

Overfitting

Underfitting

Happy Zone

Model Bias

● L1 & L2 Regularization
● Stochastic Dropout (Srivastava et al., 2014)
● Model Structure (CNNs, RNNs)

Multilayer Perceptrons


Regularization

C(w,b) = ∑_{n∈{0,1,2}} (y_n - ŷ_n)² + (w+b)β

β = Regularization constant

Multilayer Perceptrons
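A minimal sketch of a regularized cost in Python. Two assumptions are made here and are not taken from the slide: the model is the linear one, ŷ = wx + b, and the penalty is the common L2 form β(w² + b²), swapped in for illustration. The function name `regularized_cost` is hypothetical.

```python
# The three data points n ∈ {0, 1, 2} from the slides, as (x, y) pairs.
DATA = [(1, 0), (5, 16), (6, 20)]


def regularized_cost(w, b, data, beta):
    """Squared error plus an L2 penalty on the parameters (assumed form)."""
    squared_error = sum((y - (w * x + b)) ** 2 for x, y in data)
    penalty = beta * (w ** 2 + b ** 2)
    return squared_error + penalty


# w = 4, b = -4 fits all three points exactly, so only the penalty remains.
print(regularized_cost(4, -4, DATA, beta=0.0))
print(regularized_cost(4, -4, DATA, beta=0.1))
```

With β > 0, two parameter settings that fit the data equally well no longer cost the same: the one with smaller weights wins.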


x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

y

Regularization

Multilayer Perceptrons


x

s5

s1

s2x > 1 nothing

x∈]-∞,1]

s3 nothing

s4 x < 6

s7

s6

nothing x∈[6,∞]

y

Regularization

Multilayer Perceptrons


x

s5

s1

s2x > 1 nothing

x∈]-∞,1]

s3 nothing

s4 x < 6

s7

s6

nothing x∈[6,∞]

y

Regularization

Find solutions that require less effort

Multilayer Perceptrons


x

s5

s1

s2x > 1 x < 2

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

x∈[5,6[ x∈[6,∞]

y

Stochastic Dropout (Srivastava et al., 2014)

Multilayer Perceptrons


Stochastic Dropout (Srivastava et al., 2014)

x

s5

s1

s2x > 1 0

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

0 0

y

Multilayer Perceptrons


Stochastic Dropout (Srivastava et al., 2014)

x

s5

s1

s2x > 1 0

x∈]-∞,1]

s3 x < 5

s4 x < 6

s7

s6

0 0

y

Find robust models

Multilayer Perceptrons
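Dropout can be sketched as a random mask over a layer's activations. The version below is the "inverted" variant, which rescales the surviving units during training; that variant and the name `dropout` are assumptions for illustration, not details given on the slides.

```python
import random


def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob, rescale the rest."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [
        a / keep_prob if rng.random() >= drop_prob else 0.0
        for a in activations
    ]


rng = random.Random(0)                 # fixed seed so runs are repeatable
hidden = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(dropout(hidden, 0.5, rng))       # each entry is either 0.0 or 2.0
```

Because a different subset of units is silenced on every training step, the network cannot rely on any single unit, which pushes it toward more robust solutions.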


Model Structure

Weighted sum of linear functions VS MLP

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

Multilayer Perceptrons


Model Structure

Weighted sum of linear functions VS MLP

y = (w1x + b1)s1 + (w2x+b2)s2 + (w3x+b3)s3

Convolutional Vs RNNs

Multilayer Perceptrons


s1 = σ(w4x + b4)

s2 = σ(w5s1 + w6s3 + b5)

s3 = σ(w7x + b6)

x

s2

s1

s3

w6s3w5s1

b5

Representation

Multilayer Perceptrons


s1 = σ(W3x + b3)

s2 = σ(W4s1 + b4)

Representation

s1

s2

2

1

1xx

s2

s1

s3

Multilayer Perceptrons


Representation

s1

s2

1000

1000

100x

s1 = (Ws2 + b)

Multilayer Perceptrons


Representation

s1

s2

1000

1000

100x

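s1 = (Ws2 + b)

TensorFlow Code

s1 = tf.matmul(x, W1) + b1   # layer 1: linear map of the input
s1 = tf.nn.sigmoid(s1)       # layer 1: sigmoid activation
s2 = tf.matmul(s1, W2) + b2  # layer 2: linear map of layer 1
s2 = tf.nn.sigmoid(s2)       # layer 2: sigmoid activation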

Multilayer Perceptrons
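For readers without TensorFlow at hand, the same two-layer computation can be sketched in plain Python. The sizes here are deliberately tiny (3 inputs, 4 and 2 hidden units) rather than the 100 and 1000 shown on the slide, and `layer` and `forward` are hypothetical helper names.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer(inputs, weights, biases):
    """sigmoid(inputs · weights + biases); weights[i][j] links input i to unit j."""
    return [
        sigmoid(sum(v * row[j] for v, row in zip(inputs, weights)) + biases[j])
        for j in range(len(biases))
    ]


def forward(x, W1, b1, W2, b2):
    s1 = layer(x, W1, b1)     # first hidden representation
    s2 = layer(s1, W2, b2)    # second hidden representation
    return s1, s2


rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
W2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
s1, s2 = forward([1.0, 0.5, -2.0], W1, [0.0] * 4, W2, [0.0] * 2)
print(len(s1), len(s2))
```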


Using Discrete Variables

Data

0

1

16

5

20

6

?

3


Using Discrete Variables

Data

0

1

16

5

20

6

?

3


Using Discrete Variables

Data

0

1

16

5

20

6

?

3

?


Using Discrete Variables

x

s5

s1

s2

s3

s4

s7

s6

y

Number of fruit to offer

Number of fruit received


Using Discrete Variables

x: Number of fruit to offer

y: Number of fruit received

s1

s2


Using Discrete Variables

x: Number of fruit to offer

u: Type of fruit to offer

y: Number of fruit received

v: Type of fruit received

s1

s2


Using Discrete Variables

x: Number of fruit to offer

u: Type of fruit to offer

y: Number of fruit received

v: Type of fruit received

s1

s2

u∈{Apple, Banana, Coconut}

v∈{Apple, Banana, Coconut}


Using Discrete Variables

Lookup Tables

e1 e2 e3 e4

Apple 0.1 -0.4 0.2 0.5

Banana 0.4 1.4 -1.0 0.1

Coconut 1.1 0.9 1.1 0.5

u

V = 3


Using Discrete Variables

Lookup Tables

e1 e2 e3 e4

Apple 0.1 -0.4 0.2 0.5

Banana 0.4 1.4 -1.0 0.1

Coconut 1.1 0.9 1.1 0.5

u

V = 3


Using Discrete VariablesLookup Tables

e1 e2 e3 e4

Apple 0.1 -0.4 0.2 0.5

Banana 0.4 1.4 -1.0 0.1

Coconut 1.1 0.9 1.1 0.5

u

Embedding for u Size = 4

V = 3

Using Discrete Variables

Lookup Tables

           e1     e2     e3     e4
Apple      0.1   -0.4    0.2    0.5
Banana     0.4    1.4   -1.0    0.1
Coconut    1.1    0.9    1.1    0.5

u = Banana

Embedding for u (size = 4)

V = 3

Using Discrete Variables

Lookup Tables

     e1     e2     e3     e4
0    0.1   -0.4    0.2    0.5
1    0.4    1.4   -1.0    0.1
2    1.1    0.9    1.1    0.5

u = 1

Embedding for u (size = 4)

V = 3

Using Discrete Variables

Lookup Tables

[Diagram: u = 1 -> Lookup -> embedding for u (size = 4)]
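
The lookup step can be sketched in a few lines of Python. This is a minimal illustration using the table values from the slides. The names `FRUITS`, `EMBEDDINGS`, `FRUIT_TO_INDEX` and `lookup` are assumptions made for this sketch, not names from the lecture.

```python
# Lookup table from the slides: one row per fruit, one column per embedding dimension.
FRUITS = ["Apple", "Banana", "Coconut"]  # V = 3
EMBEDDINGS = [
    [0.1, -0.4, 0.2, 0.5],   # index 0: Apple
    [0.4, 1.4, -1.0, 0.1],   # index 1: Banana
    [1.1, 0.9, 1.1, 0.5],    # index 2: Coconut
]

# Hypothetical helper: map a fruit name to its row index.
FRUIT_TO_INDEX = {name: i for i, name in enumerate(FRUITS)}


def lookup(u):
    """Return the embedding (a list of 4 floats) for the discrete value u.

    u may be a fruit name or an integer row index.
    """
    index = FRUIT_TO_INDEX[u] if isinstance(u, str) else u
    return EMBEDDINGS[index]


print(lookup("Banana"))  # [0.4, 1.4, -1.0, 0.1]
print(lookup(1))         # the same row, addressed by index
```

Replacing names by integer indices, as on the slide, means the lookup is just a row selection from a V x 4 matrix.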

Using Discrete Variables

[Diagram: inputs x (number of fruit to offer) and u (type of fruit to offer); u passes through Lookup to give the embedding e_u; nodes s1 and s2; outputs y (number of fruit received) and v (type of fruit received)]

u ∈ {Apple, Banana, Coconut}

v ∈ {Apple, Banana, Coconut}

Using Discrete Variables

Softmax

      Apple   Banana   Coconut
w1     0.1    -0.4      0.2
w2     0.4     1.4     -1.0
w3     1.1     0.9      1.1
w4     1.3     0.1      0.4

V = 3

Using Discrete Variables

Softmax

Input vector (size = 4)

      Apple   Banana   Coconut
w1     0.1    -0.4      0.2
w2     0.4     1.4     -1.0
w3     1.1     0.9      1.1
w4     1.3     0.1      0.4

V = 3

Using Discrete Variables

Softmax

Input vector (size = 4)

      Apple   Banana   Coconut
w1     0.1    -0.4      0.2
w2     0.4     1.4     -1.0
w3     1.1     0.9      1.1
w4     1.3     0.1      0.4

Logits (size = V)

V = 3

Using Discrete Variables

Softmax

Input vector:   s1   s2   s3   s4

      Apple   Banana   Coconut
w1     0.1    -0.4      0.2
w2     0.4     1.4     -1.0
w3     1.1     0.9      1.1
w4     1.3     0.1      0.4

Logits:   d1   d2   d3
           1   -1   -2

V = 3
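
As a rough sketch of how the logits could be obtained from the input vector and the weight table, each logit is a weighted sum of the inputs: d_j = sum_i s_i * w_ij. The weights below are the ones on the slide. The input vector `s` is a made-up value, since the slides do not give the input that produces the logits 1, -1, -2, and no bias term is included because none is shown.

```python
# Weight matrix from the slide: rows w1..w4 (one per input dimension),
# columns Apple, Banana, Coconut (one per output class, V = 3).
W = [
    [0.1, -0.4, 0.2],
    [0.4, 1.4, -1.0],
    [1.1, 0.9, 1.1],
    [1.3, 0.1, 0.4],
]


def logits(s, weights):
    """Compute d_j = sum_i s[i] * weights[i][j] for each class j."""
    num_classes = len(weights[0])
    return [
        sum(s_i * row[j] for s_i, row in zip(s, weights))
        for j in range(num_classes)
    ]


# Hypothetical input vector (not from the slides): only the first input is active,
# so the logits equal the first row of W.
s = [1.0, 0.0, 0.0, 0.0]
d = logits(s, W)
print(d)
```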

Using Discrete Variables

Softmax

Input vector:   s1   s2   s3   s4

      Apple   Banana   Coconut
w1     0.1    -0.4      0.2
w2     0.4     1.4     -1.0
w3     1.1     0.9      1.1
w4     1.3     0.1      0.4

Logits:          d1     d2     d3
                  1     -1     -2

Probabilities:   p1     p2     p3
                 0.84   0.11   0.05

V = 3

Using Discrete Variables

Softmax

Input vector:   s1   s2   s3   s4

      Apple   Banana   Coconut
w1     0.1    -0.4      0.2
w2     0.4     1.4     -1.0
w3     1.1     0.9      1.1
w4     1.3     0.1      0.4

Logits:          d1     d2     d3
                  1     -1     -2

Probabilities:   p1     p2     p3
                 0.84   0.11   0.05

Predicted: Apple

V = 3
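
The step from logits to probabilities is the softmax function, p_j = exp(d_j) / sum_k exp(d_k). The sketch below applies it to the logits on the slide. The function and variable names are chosen for illustration, and subtracting the maximum logit is a common numerical-stability trick that the slides do not mention. The result is roughly 0.84, 0.11 and 0.04, which agrees with the slide's figures up to rounding in the last digit, and the most probable class is Apple.

```python
import math

CLASSES = ["Apple", "Banana", "Coconut"]


def softmax(logits):
    """Exponentiate and normalise so the outputs are positive and sum to 1."""
    m = max(logits)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(d - m) for d in logits]
    total = sum(exps)
    return [e / total for e in exps]


d = [1.0, -1.0, -2.0]  # logits d1, d2, d3 from the slide
p = softmax(d)
best = max(range(len(p)), key=lambda j: p[j])

print([round(x, 2) for x in p])
print(CLASSES[best])  # Apple
```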

Using Discrete Variables

[Diagram: inputs x (number of fruit to offer) and u (type of fruit to offer); u passes through Lookup to give the embedding e_u; nodes s1 and s2; outputs y (number of fruit received) and v (type of fruit received), with v produced by a Softmax]

u ∈ {Apple, Banana, Coconut}

v ∈ {Apple, Banana, Coconut}
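
Putting the pieces together, one possible forward pass looks like the sketch below: look up e_u, join it with x, pass the result through two nonlinear layers, then produce y with a linear output and v with a softmax. Only the embedding table comes from the slides. The hidden size, the tanh nonlinearity, the random weights and all function names are assumptions made for this illustration, and the weight matrices here store one row per output unit.

```python
import math
import random

FRUITS = ["Apple", "Banana", "Coconut"]
EMBEDDINGS = {  # lookup table from the slides
    "Apple": [0.1, -0.4, 0.2, 0.5],
    "Banana": [0.4, 1.4, -1.0, 0.1],
    "Coconut": [1.1, 0.9, 1.1, 0.5],
}


def dense(inputs, weights, bias):
    """Linear layer: one weighted sum (plus bias) per row of `weights`."""
    return [sum(w * a for w, a in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(d - m) for d in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical, untrained parameters (a real model would learn these).
rng = random.Random(0)
HIDDEN = 5
W1, b1 = random_matrix(HIDDEN, 1 + 4, rng), [0.0] * HIDDEN   # input: x plus 4-dim e_u
W2, b2 = random_matrix(HIDDEN, HIDDEN, rng), [0.0] * HIDDEN
Wy, by = random_matrix(1, HIDDEN, rng), [0.0]                # numeric output y
Wv, bv = random_matrix(len(FRUITS), HIDDEN, rng), [0.0] * len(FRUITS)  # class output v


def forward(x, u):
    """Return (y, distribution over v) for a number x and a fruit type u."""
    e_u = EMBEDDINGS[u]                      # Lookup
    inputs = [float(x)] + e_u                # continuous and embedded inputs together
    s1 = [math.tanh(a) for a in dense(inputs, W1, b1)]
    s2 = [math.tanh(a) for a in dense(s1, W2, b2)]
    y = dense(s2, Wy, by)[0]                 # number of fruit received
    p_v = softmax(dense(s2, Wv, bv))         # type of fruit received
    return y, dict(zip(FRUITS, p_v))


y, p_v = forward(3, "Banana")
print(y)
print(p_v)
```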

Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby   likes   to   eat   apples   and   bananas
NNP    VBZ     TO   VB    NNS      CC    NNS


Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby likes to eat apples and bananas

e-2 e-1 e-0 e1 e2

Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby likes to eat apples and bananas

e-2 e-1 e-0 e1 e2    Word Embeddings
s1                   Non-Linear Layer 1
s2                   Non-Linear Layer 2

Example Applications

Window-based Tagging (Collobert et al, 2011)

Abby likes to eat apples and bananas

e-2 e-1 e-0 e1 e2    Word Embeddings
s1                   Non-Linear Layer 1
s2                   Non-Linear Layer 2
VB                   Softmax
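
To make the window idea concrete, the sketch below extracts the five-word window around each word of the example sentence. These are the words whose embeddings (e-2 to e2) feed the network that predicts the centre word's tag. The padding token `<pad>` and the function name `windows` are assumptions for this sketch. The slides do not say how sentence boundaries are handled.

```python
PAD = "<pad>"  # hypothetical padding token for positions outside the sentence


def windows(tokens, size=5):
    """Yield (centre_word, window) pairs; every window has `size` tokens."""
    half = size // 2
    padded = [PAD] * half + tokens + [PAD] * half
    for i, word in enumerate(tokens):
        yield word, padded[i:i + size]


sentence = "Abby likes to eat apples and bananas".split()
for word, window in windows(sentence):
    print(word, window)
```

For the word "eat", whose tag on the slide is VB, the window is "likes to eat apples and".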

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

Context    Predict


Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

e-4 e-3 e-2 e-1

s1

s2

Softmax

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

<s>    0.2

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

0.1 0.2

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

0.1 0.2 0.3

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas

0.1 0.2 0.3 0.5 0.7 0.4 0.2

0.000378
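
The sentence score is built by multiplying the per-word probabilities together. A minimal sketch is shown below, summing logarithms rather than multiplying directly, which is a common way to limit underflow on long sentences. The function name is an assumption for this sketch. With the seven values as they appear in this extracted text the product is 0.000168, while the slide lists 0.000378, so the factors and the total do not agree as extracted.

```python
import math


def sentence_probability(word_probs):
    """Multiply per-word probabilities by summing their logs."""
    return math.exp(sum(math.log(p) for p in word_probs))


word_probs = [0.1, 0.2, 0.3, 0.5, 0.7, 0.4, 0.2]  # per-word values as listed above
print(sentence_probability(word_probs))
```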

Example Applications

Translation Rescoring (Devlin et al, 2014)

Abby likes to eat apples and bananas         0.000378
Abby dislikes to drink apples and bananas    0.00012
John does to eat coconuts and bananas        0.00003
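
Rescoring then amounts to ranking the candidate translations by their model score. The sketch below uses the three candidates and scores from the slide. It is a simplification that ranks by this single score only, and `rescore` is a hypothetical helper name.

```python
# Candidate translations and their scores, as given on the slide.
candidates = {
    "Abby likes to eat apples and bananas": 0.000378,
    "Abby dislikes to drink apples and bananas": 0.00012,
    "John does to eat coconuts and bananas": 0.00003,
}


def rescore(scored):
    """Return (sentence, score) pairs sorted from most to least probable."""
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


ranked = rescore(candidates)
best_sentence, best_score = ranked[0]
print(best_sentence)  # Abby likes to eat apples and bananas
```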

Example Applications

Translation Rescoring (Devlin et al, 2014)

Translation:   Abby likes to eat apples and bananas
Source:        Abby gosta de comer macas e bananas

Context    Predict

Example Applications

Translation Rescoring (Devlin et al, 2014)

Translation:   Abby likes to eat apples and bananas

e-4 e-3 e-2 e-1    f-1 (macas)

s1

s2
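
One way to picture the joint context is as a list of tokens: the four previous target words (e-4 to e-1) followed by a source word (f-1). The sketch below builds that list for the word "apples" in the example. The function name, the use of `<s>` as padding before the sentence start, and passing the source word in by hand are all assumptions for this illustration. How the source word is chosen is not shown on the slide.

```python
START = "<s>"  # assumed padding symbol for positions before the sentence start


def joint_context(target_words, position, source_word, history=4):
    """Context for predicting target_words[position]:
    the previous `history` target words (e-4 .. e-1) plus one source word (f-1)."""
    padded = [START] * history + target_words
    target_history = padded[position:position + history]
    return target_history + [source_word]


translation = "Abby likes to eat apples and bananas".split()

# Predicting "apples" (position 4) with the source word "macas".
print(joint_context(translation, 4, "macas"))
```

Each token in the returned list would then be mapped to an embedding with a lookup table before entering the layers s1 and s2.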

Example Applications

Translation Rescoring (Devlin et al, 2014)

Translation Score (BLEU)    Arabic - English    Chinese - English
Best Rescored System        52.8                34.7
1st OpenMT12                49.5                32.6
Hierarchical                43.4                30.1

Deep Neural Networks are our friends?

Convolutional Neural Network

Deep Neural Networks are our friends?

Convolutional Neural Network

x1    x2    x3    x4
x5    x6    x7    x8
x9    x10   x11   x12
x13   x14   x15   x16

4x4 image

Deep Neural Networks are our friends?

Convolutional Neural Network

x1    x2    x3    x4
x5    x6    x7    x8
x9    x10   x11   x12
x13   x14   x15   x16

4x4 image

z1

[Diagram: z1 is computed from x1, x2, ..., x11 with weights w1, ..., w9]

Deep Neural Networks are our friends?

Convolutional Neural Network

x1    x2    x3    x4
x5    x6    x7    x8
x9    x10   x11   x12
x13   x14   x15   x16

4x4 image

z1   z2

[Diagram: z2 is computed from x2, x3, ..., x12 with the same weights w1, ..., w9]

Deep Neural Networks are our friends?

Convolutional Neural Network

x1    x2    x3    x4
x5    x6    x7    x8
x9    x10   x11   x12
x13   x14   x15   x16

4x4 image

z1   z2
z3   z4
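
The sliding computation can be sketched as follows: the same nine weights w1 to w9 are applied to each 3x3 patch of the 4x4 image, giving the 2x2 grid z1 to z4. The pixel values and the all-ones filter are placeholders chosen so the result is easy to check by hand. The function name is an assumption, and no bias or nonlinearity is applied because the slides do not show one.

```python
def conv2d_valid(image, kernel):
    """Slide a square `kernel` over a square `image` (no padding, stride 1)."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            # Weighted sum of the k x k patch whose top-left corner is (r, c).
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(k) for j in range(k)))
        output.append(row)
    return output


# Placeholder 4x4 image: the value at position x_n is simply n (1..16).
image = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
# Placeholder 3x3 filter w1..w9: all ones, so each z is the sum of its patch.
kernel = [[1] * 3 for _ in range(3)]

z = conv2d_valid(image, kernel)
print(z)  # [[54, 63], [90, 99]]
```

Here z1 = 54 is the sum over x1, x2, x3, x5, x6, x7, x9, x10, x11, matching the patch shown for z1.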

Deep Neural Networks are our friends?

Convolutional Neural Network

x1    x2    x3    x4
x5    x6    x7    x8
x9    x10   x11   x12
x13   x14   x15   x16

4x4 image

z1   z2
z3   z4

[Diagram: z1, z2, z3 and z4 feed the output y]

y: Is this a cat?
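
A minimal sketch of the final step is given below, assuming the yes/no output y is produced by a single logistic (sigmoid) unit over the flattened values z1 to z4. The slide only shows z1 to z4 feeding y, so the sigmoid, the weights, the bias and the feature values here are all made up for illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def is_cat_probability(z_grid, weights, bias):
    """Flatten z1..z4 and apply one logistic output unit."""
    z = [value for row in z_grid for value in row]
    return sigmoid(sum(w * v for w, v in zip(weights, z)) + bias)


z_grid = [[0.5, -1.0], [2.0, 0.0]]   # hypothetical feature map z1..z4
weights = [0.3, -0.2, 0.1, 0.4]      # hypothetical output weights
bias = 0.0                           # hypothetical bias

y = is_cat_probability(z_grid, weights, bias)
print(y)  # a value between 0 and 1, read as the probability of "cat"
```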