CSE 473 Lecture 28 (Chapter 18): Neural Networks and Ensemble Learning © CSE AI faculty + Chris Bishop, Dan Klein, Stuart Russell, Andrew Moore


Page 1

CSE 473

Lecture 28 (Chapter 18)

Neural Networks and Ensemble Learning

© CSE AI faculty + Chris Bishop, Dan Klein, Stuart Russell, Andrew Moore


Page 2

What if you want your neural network to predict continuous outputs rather than +1/-1 (i.e., perform regression)?

E.g., Teaching a network to drive

Image Source: Wikimedia Commons

Page 3

Continuous Outputs with Sigmoid Networks

[Figure: a single-layer network with input nodes u = (u_1, u_2, u_3)^T, a weight vector w, and one sigmoid output unit producing v]

Sigmoid output function:

    g(a) = \frac{1}{1 + e^{-\beta a}}

The parameter \beta controls the slope of g(a).

Output:

    v = g(w^T u) = g\Big(\sum_i w_i u_i\Big)
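
A minimal Python sketch of such a sigmoid unit (the slope parameter beta and the example vectors are illustrative assumptions, not values from the slides):

# A single sigmoid unit: v = g(w . u), with g(a) = 1 / (1 + exp(-beta * a)).
import numpy as np

def g(a, beta=1.0):
    """Sigmoid output function; beta controls the slope."""
    return 1.0 / (1.0 + np.exp(-beta * a))

def sigmoid_unit(w, u, beta=1.0):
    """Continuous output of one sigmoid unit with weights w and inputs u."""
    return g(np.dot(w, u), beta)

# Example: inputs u = (u1, u2, u3)^T and a weight vector w
u = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(sigmoid_unit(w, u))   # a value in (0, 1), not just +1/-1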

Page 4

Learning the weights

Given: Training data (input u, desired output d)

Problem: How do we learn the weights w?

Idea: Minimize squared error between network’s output and desired output:

    E(w) = (d - v)^2, \quad \text{where} \quad v = g(w^T u)

Starting from random values for w, we want to change w so that E(w) is minimized. How?

[Figure: the single-layer network again, with inputs u = (u_1, u_2, u_3)^T, weights w, and output v = g(w^T u)]

Page 5

Learning by Gradient-Descent (opposite of “Hill-Climbing”)

Change w so that E(w) is minimized

• Use Gradient Descent: Change w in proportion to –dE/dw (why?)

    \frac{dE}{dw} = -2 (d - v) \frac{dv}{dw} = -2 (d - v)\, g'(w^T u)\, u

Here (d - v) is the error ("delta") and g' is the derivative of the sigmoid.

Weight update:

    w \leftarrow w - \epsilon \frac{dE}{dw}

Also known as the "delta rule" or "LMS (least mean square) rule".

(Recall E(w) = (d - v)^2 and v = g(w^T u).)
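
A minimal sketch of this delta rule in Python (the learning rate eps, the toy training pair, and the number of iterations are illustrative assumptions):

# Delta / LMS rule for one sigmoid unit: w <- w + eps * (d - v) * g'(w . u) * u
# (the factor of 2 from the derivative is absorbed into the learning rate eps)
import numpy as np

def g(a):
    """Sigmoid output function."""
    return 1.0 / (1.0 + np.exp(-a))

def delta_rule_step(w, u, d, eps=0.5):
    """One gradient-descent step on E(w) = (d - v)^2 with v = g(w . u)."""
    a = np.dot(w, u)
    v = g(a)
    g_prime = v * (1.0 - v)                 # derivative of the sigmoid at a
    return w + eps * (d - v) * g_prime * u  # move opposite to dE/dw

# Tiny illustrative run on a single (input, desired output) pair
w = np.zeros(3)
u, d = np.array([1.0, 0.5, -0.5]), 0.9
for _ in range(2000):
    w = delta_rule_step(w, u, d)
print(g(np.dot(w, u)))   # approaches the desired output 0.9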

Page 6

But wait!

This rule is for a one-layer network.

• One-layer networks are not that interesting! (remember XOR?)

What if we have multiple layers?

Page 7

Learning Multilayer Networks

Start with random weights W, w.

Given an input vector u, the network produces an output vector v with components

    v_i = g\Big(\sum_j W_{ij}\, g\Big(\sum_k w_{jk} u_k\Big)\Big)

Use gradient descent to find the W and w that minimize the total error over all output units (labeled i):

    E(W, w) = \frac{1}{2} \sum_i (d_i - v_i)^2

This leads to the famous "Backpropagation" learning rule.

Page 8

Backpropagation: Output Weights

Notation (see figure): u_k are the inputs, x_j the hidden-unit outputs, and

    v_i = g\Big(\sum_j W_{ij} x_j\Big), \qquad E(W, w) = \frac{1}{2} \sum_i (d_i - v_i)^2

Learning rule for hidden-to-output weights W:

    W_{ij} \leftarrow W_{ij} - \epsilon \frac{dE}{dW_{ij}}    {gradient descent}

    \frac{dE}{dW_{ij}} = -(d_i - v_i)\, g'\Big(\sum_j W_{ij} x_j\Big)\, x_j    {delta rule}

Page 9

Backpropagation: Hidden Weights

As before, v_i = g\big(\sum_j W_{ij} x_j\big) with hidden-unit outputs x_j = g\big(\sum_k w_{jk} u_k\big), and E(W, w) = \frac{1}{2} \sum_i (d_i - v_i)^2.

Learning rule for input-to-hidden weights w:

    w_{jk} \leftarrow w_{jk} - \epsilon \frac{dE}{dw_{jk}}

    \frac{dE}{dw_{jk}} = \frac{dE}{dx_j} \cdot \frac{dx_j}{dw_{jk}}    {chain rule}

                       = \Big[ -\sum_i (d_i - v_i)\, g'\Big(\sum_j W_{ij} x_j\Big) W_{ij} \Big] \cdot g'\Big(\sum_k w_{jk} u_k\Big)\, u_k
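
Putting the two rules together, here is a minimal NumPy sketch of one backpropagation step for this two-layer network (the layer sizes, learning rate, and data below are illustrative assumptions):

# One backprop step for v_i = g(sum_j W_ij * g(sum_k w_jk * u_k)).
import numpy as np

def g(a):
    """Sigmoid output function."""
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(W, w, u, d, eps=0.5):
    """Gradient-descent step on E = 0.5 * sum_i (d_i - v_i)^2."""
    # Forward pass
    a_hidden = w @ u                 # sum_k w_jk u_k
    x = g(a_hidden)                  # hidden outputs x_j
    a_out = W @ x                    # sum_j W_ij x_j
    v = g(a_out)                     # network outputs v_i

    # Errors ("deltas") at the output and hidden layers (chain rule)
    delta_out = (d - v) * v * (1 - v)            # (d_i - v_i) g'(a_out_i)
    delta_hid = (W.T @ delta_out) * x * (1 - x)  # error backpropagated to x_j

    # Weight updates: W_ij += eps * delta_out_i * x_j, w_jk += eps * delta_hid_j * u_k
    W = W + eps * np.outer(delta_out, x)
    w = w + eps * np.outer(delta_hid, u)
    return W, w

# Illustrative shapes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
W, w = rng.normal(size=(2, 4)) * 0.1, rng.normal(size=(4, 3)) * 0.1
u, d = np.array([1.0, 0.0, -1.0]), np.array([0.2, 0.8])
W, w = backprop_step(W, w, u, d)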

Page 10

Examples: Pole Balancing and Backing up a Truck (courtesy of Keith Grochow)

• Neural network learns to balance a pole on a cart

• Input: xcart, vcart, θpole, vpole

• Output: New force on cart

• Network learns to back a truck into a loading dock

• Input: x, y, θ of truck

• Output: Steering angle

[Figure: cart-pole system labeled with x_cart, v_cart, θ_pole, v_pole]

Page 11

Ensemble Learning

Sometimes each learning technique yields a different “hypothesis” (function)

But no perfect hypothesis…

Could we combine several imperfect hypotheses to get a better hypothesis?

Page 12

Why Ensemble Learning?

Wisdom of the Crowds…

Page 13

Example

This line is one simple classifier, saying that everything to the left is + and everything to the right is -

Combining 3 such linear classifiers yields a more complex classifier.

Page 14

Ensemble Learning: Motivation

Analogies:

• Elections combine voters’ choices to pick a good candidate (hopefully)

• Committees combine experts’ opinions to make better decisions

• Students working together on a capstone project

Intuitions:

• Individuals make mistakes, but the “majority” may be less likely to

• Individuals often have partial knowledge; a committee can pool expertise to make better decisions

Page 15

Ensemble Technique 1: Bagging

Combine hypotheses (classifiers) via majority voting

Page 16

Bagging: Details

1. Generate m new training datasets by sampling with replacement from the given dataset

2. Train m classifiers h1, …, hm (e.g., decision trees), one from each newly generated dataset

3. Classify a new input by running it through the m classifiers and choosing the class that receives the most “votes”

Example: Random forest = bagging with m decision-tree classifiers, each tree constructed from a random subset of attributes
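
A minimal sketch of this procedure in Python, assuming NumPy and scikit-learn's DecisionTreeClassifier as the base learner (m, the voting helper, and the integer-label assumption are illustrative, not from the slides):

# Bagging: train m classifiers on bootstrap samples, predict by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, m=25, seed=0):
    """Train m classifiers, each on a dataset sampled with replacement from (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(m):
        idx = rng.integers(0, n, size=n)   # bootstrap sample of size n
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Classify each row of X by majority vote (assumes non-negative integer labels)."""
    votes = np.stack([h.predict(X) for h in classifiers]).astype(int)  # (m, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

Passing max_features='sqrt' to each DecisionTreeClassifier, so that every split considers only a random subset of attributes, turns this sketch into (roughly) a random forest.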

Page 17

Bagging: Analysis

Error probability went down from 0.1 to 0.01! (For example, with 5 independent classifiers that each err with probability 0.1, the majority vote is wrong only when at least 3 of them are wrong, which happens with probability ≈ 0.01.)
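
A quick check of that number (the choice of m = 5 independent classifiers, each with error 0.1, is the standard textbook setup, assumed here):

# P(majority of m independent classifiers is wrong) when each errs with probability p.
from math import comb

def majority_vote_error(p=0.1, m=5):
    k_min = m // 2 + 1   # wrong votes needed for the majority to be wrong
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range(k_min, m + 1))

print(majority_vote_error())   # ≈ 0.0086, i.e. roughly 0.01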

Page 18

Weighted Majority Voting

In practice, hypotheses are rarely independent.

Some hypotheses make fewer errors than others, so not all votes should count equally!

Idea: Let’s take a weighted majority

How do we compute the weights?

Page 19

Next Time

• Weighted Majority Ensemble Classification

• Boosting

• Survey of AI Applications

• To Do:

• Project 4 due tonight!

• Finish Chapter 18