
CSE 473

Lecture 28 (Chapter 18)

Neural Networks and Ensemble Learning

© CSE AI faculty + Chris Bishop, Dan Klein, Stuart Russell, Andrew Moore


What if you want your neural network to predict continuous outputs rather than +1/-1 (i.e., perform regression)?

E.g., Teaching a network to drive

Image Source: Wikimedia Commons

Continuous Outputs with Sigmoid Networks

[Figure: single-layer network with input nodes u = (u_1 \ u_2 \ u_3)^T, weights w, and a single output v]

Sigmoid output function:

g(a) = \frac{1}{1 + e^{-\beta a}}

The parameter \beta controls the slope.

Output of the network:

v = g(\mathbf{w}^T \mathbf{u}) = g\left( \sum_i w_i u_i \right)
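A minimal sketch of this single-layer sigmoid network in Python; the slope value beta = 1 and the example input and weight values are illustrative assumptions, not from the slides:

```python
import numpy as np

def sigmoid(a, beta=1.0):
    """Sigmoid output function g(a) = 1 / (1 + exp(-beta * a)); beta controls the slope."""
    return 1.0 / (1.0 + np.exp(-beta * a))

# Single-layer network: continuous output v = g(w . u)
u = np.array([0.5, -1.0, 2.0])   # input vector u = (u1, u2, u3)^T (example values)
w = np.array([0.1, 0.4, -0.3])   # weights (example values)
v = sigmoid(w @ u)               # continuous output in (0, 1), not just +1/-1
print(v)
```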


Learning the weights

Given: Training data (input u, desired output d)

Problem: How do we learn the weights w?

Idea: Minimize squared error between network’s output and desired output:

E(\mathbf{w}) = (d - v)^2, \quad \text{where } v = g(\mathbf{w}^T \mathbf{u})

Starting from random values for w, we want to change w so that E(w) is minimized. How?

[Figure: single-layer network with inputs u = (u_1 \ u_2 \ u_3)^T, weights w, and output v = g(\mathbf{w}^T \mathbf{u})]

Learning by Gradient-Descent (opposite of “Hill-Climbing”)

Change w so that E(w) is minimized

• Use Gradient Descent: Change w in proportion to –dE/dw (why?)

\frac{dE}{d\mathbf{w}} = 2(d - v)\left( -\frac{dv}{d\mathbf{w}} \right) = -2(d - v) \, g'(\mathbf{w}^T \mathbf{u}) \, \mathbf{u}

Here g' is the derivative of the sigmoid and (d - v) is the error ("delta").

Update rule (ε is a small learning rate):

\mathbf{w} \rightarrow \mathbf{w} - \epsilon \frac{dE}{d\mathbf{w}}

with E(\mathbf{w}) = (d - v)^2 and v = g(\mathbf{w}^T \mathbf{u})

Also known as the "delta rule" or the "LMS (least mean square) rule"
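A minimal sketch of this delta-rule update for a single sigmoid unit; the learning rate eps = 0.5, the toy training pair (u, d), and the random initialization are illustrative assumptions:

```python
import numpy as np

def g(a):
    return 1.0 / (1.0 + np.exp(-a))          # sigmoid

def g_prime(a):
    s = g(a)
    return s * (1.0 - s)                     # derivative of the sigmoid

def delta_rule_step(w, u, d, eps=0.5):
    """One gradient-descent step on E(w) = (d - v)^2 with v = g(w . u)."""
    a = w @ u
    v = g(a)
    # dE/dw = -2 (d - v) g'(w . u) u, so move w opposite the gradient
    return w + eps * 2.0 * (d - v) * g_prime(a) * u

rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal(3)             # start from random values for w
u, d = np.array([0.5, -1.0, 2.0]), 0.8       # training pair: input u, desired output d
for _ in range(100):
    w = delta_rule_step(w, u, d)
print(g(w @ u))                              # network output approaches d
```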


But wait!

This rule is for a one-layer network

• One-layer networks are not that interesting!! (remember XOR?)

What if we have multiple layers?


Learning Multilayer Networks

Start with random weights W, w

Given input vector u, the network produces output vector v:

v_i = g\left( \sum_j W_{ji} \, g\left( \sum_k w_{kj} u_k \right) \right)

Use gradient descent to find W and w that minimize the total error over all output units (labeled i):

E(\mathbf{W}, \mathbf{w}) = \frac{1}{2} \sum_i (d_i - v_i)^2

This leads to the famous "Backpropagation" learning rule


Backpropagation: Output Weights

[Figure: two-layer network with inputs u_k, hidden outputs x_j, and outputs v_i = g(\sum_j W_{ji} x_j)]

Learning rule for hidden-output weights W:

W_{ji} \rightarrow W_{ji} - \epsilon \frac{dE}{dW_{ji}}   {gradient descent}

\frac{dE}{dW_{ji}} = -(d_i - v_i) \, g'\left( \sum_j W_{ji} x_j \right) x_j   {delta rule}

where E(\mathbf{W}, \mathbf{w}) = \frac{1}{2} \sum_i (d_i - v_i)^2


Backpropagation: Hidden Weights

[Figure: two-layer network with inputs u_k, hidden outputs x_j = g(\sum_k w_{kj} u_k), and outputs v_i = g(\sum_j W_{ji} x_j)]

Learning rule for input-hidden weights w:

w_{kj} \rightarrow w_{kj} - \epsilon \frac{dE}{dw_{kj}}   {gradient descent}

\frac{dE}{dw_{kj}} = \frac{dE}{dx_j} \cdot \frac{dx_j}{dw_{kj}}   {chain rule}

But x_j = g\left( \sum_k w_{kj} u_k \right), so:

\frac{dE}{dw_{kj}} = \left[ \sum_i -(d_i - v_i) \, g'\left( \sum_j W_{ji} x_j \right) W_{ji} \right] g'\left( \sum_k w_{kj} u_k \right) u_k

where E(\mathbf{W}, \mathbf{w}) = \frac{1}{2} \sum_i (d_i - v_i)^2
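A minimal sketch of both backpropagation updates (hidden-output weights W and input-hidden weights w) for a single training example; the learning rate, toy input/target values, layer sizes, and random initialization are illustrative assumptions. Here W[i, j] stores W_ji and w[j, k] stores w_kj:

```python
import numpy as np

def g(a):
    return 1.0 / (1.0 + np.exp(-a))          # sigmoid; note g'(a) = g(a) (1 - g(a))

def backprop_step(W, w, u, d, eps=0.5):
    """One gradient-descent step on E = 1/2 * sum_i (d_i - v_i)^2 for a two-layer sigmoid network."""
    # Forward pass
    a_hid = w @ u                            # a_j = sum_k w_kj u_k
    x = g(a_hid)                             # hidden outputs x_j
    a_out = W @ x                            # a_i = sum_j W_ji x_j
    v = g(a_out)                             # outputs v_i

    # Backward pass
    delta_out = (d - v) * v * (1.0 - v)              # -dE/da_i (output "deltas")
    delta_hid = (W.T @ delta_out) * x * (1.0 - x)    # -dE/da_j (errors propagated back through W)

    # Weight updates: W_ji <- W_ji - eps dE/dW_ji,  w_kj <- w_kj - eps dE/dw_kj
    W = W + eps * np.outer(delta_out, x)
    w = w + eps * np.outer(delta_hid, u)
    return W, w

rng = np.random.default_rng(0)
u, d = np.array([1.0, 0.0, 1.0]), np.array([0.2, 0.9])   # one training example
W = 0.1 * rng.standard_normal((2, 4))                    # hidden-output weights
w = 0.1 * rng.standard_normal((4, 3))                    # input-hidden weights
for _ in range(1000):
    W, w = backprop_step(W, w, u, d)
print(g(W @ g(w @ u)))                                   # outputs approach the desired d
```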

Examples: Pole Balancing and Backing up a Truck (courtesy of Keith Grochow)

• Neural network learns to balance a pole on a cart

• Input: x_cart, v_cart, θ_pole, v_pole

• Output: New force on cart

• Network learns to back a truck into a loading dock

• Input: x, y, θ of truck

• Output: Steering angle



Ensemble Learning

Sometimes each learning technique yields a different “hypothesis” (function)

But no perfect hypothesis…

Could we combine several imperfect hypotheses to get a better hypothesis?

Why Ensemble Learning?

Wisdom of the Crowds…


Example

[Figure: a single line as one simple classifier: everything to its left is classified + and everything to its right is -]

Combining 3 such linear classifiers gives a more complex classifier.
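A minimal sketch of this example, assuming three hypothetical half-plane classifiers combined by an unweighted majority vote (the coefficients are made up for illustration):

```python
def linear_classifier(a, b, c):
    """Half-plane classifier: predicts +1 where a*x + b*y + c > 0, else -1."""
    return lambda x, y: 1 if a * x + b * y + c > 0 else -1

# Three simple linear classifiers (example coefficients)
h1 = linear_classifier(-1.0, 0.0, 2.0)   # + to the left of the line x = 2
h2 = linear_classifier(0.0, 1.0, -1.0)   # + above the line y = 1
h3 = linear_classifier(1.0, 1.0, -3.0)   # + above the line x + y = 3

def majority(x, y):
    """Combined classifier: its decision region is piecewise linear, more complex than any single line."""
    return 1 if h1(x, y) + h2(x, y) + h3(x, y) > 0 else -1

print(majority(0.0, 4.0), majority(5.0, 0.0))   # e.g. +1 and -1
```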


Analogies:

• Elections combine voters’ choices to pick a good candidate (hopefully)

• Committees combine experts’ opinions to make better decisions

• Students working together on a capstone project

Intuitions:

Individuals make mistakes but the “majority” may be less likely to

Individuals often have partial knowledge; a committee can pool expertise to make better decisions

Ensemble Learning: Motivation


Ensemble Technique 1: Bagging

Combine hypotheses (classifiers) via majority voting

Bagging: Details

1. Generate m new training datasets by sampling with replacement from the given dataset

2. Train m classifiers h1,…,hm (e.g., decision trees), one from each newly generated dataset

3. Classify a new input by running it through the m classifiers and choosing the class that receives the most "votes"

Example: Random forest = bagging with m decision-tree classifiers, each tree constructed from a random subset of the attributes
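A minimal sketch of steps 1–3, assuming decision stumps (one-feature threshold classifiers) as the base learners and a toy 1-D dataset; the stump learner and the data are illustrative assumptions, not from the slides:

```python
import numpy as np

def train_stump(X, y):
    """Fit a threshold classifier (decision stump) on 1-D data by brute-force search."""
    best = None
    for thresh in np.unique(X):
        for sign in (+1, -1):
            err = np.mean(np.where(X > thresh, sign, -sign) != y)
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    _, thresh, sign = best
    return lambda x: np.where(x > thresh, sign, -sign)

def bagging(X, y, m, rng):
    classifiers = []
    for _ in range(m):
        # 1. Generate a new dataset by sampling with replacement from the given dataset
        idx = rng.integers(0, len(X), size=len(X))
        # 2. Train one classifier on each newly generated dataset
        classifiers.append(train_stump(X[idx], y[idx]))
    # 3. Classify a new input by majority vote of the m classifiers
    return lambda x: np.sign(sum(h(x) for h in classifiers))

rng = np.random.default_rng(0)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, -1, +1, +1, +1])
ensemble = bagging(X, y, m=5, rng=rng)
print(ensemble(np.array([1.5, 4.5])))    # majority-vote predictions for two new inputs
```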


Bagging: Analysis

Error probability went down from 0.1 to 0.01!
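A worked version of the kind of calculation this refers to, assuming (for illustration, not necessarily the exact setup behind the slide's figure) 5 independent classifiers that each err with probability 0.1 and an unweighted majority vote. The vote is wrong only if at least 3 of the 5 classifiers err:

P(\text{majority wrong}) = \binom{5}{3}(0.1)^3(0.9)^2 + \binom{5}{4}(0.1)^4(0.9) + \binom{5}{5}(0.1)^5 \approx 0.0086 \approx 0.01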


Weighted Majority Voting

In practice, hypotheses are rarely independent

Some hypotheses make fewer errors than others → all votes are not equal!

Idea: Let’s take a weighted majority

How do we compute the weights?


Next Time

• Weighted Majority Ensemble Classification

• Boosting

• Survey of AI Applications

• To Do:

• Project 4 due tonight!

• Finish Chapter 18
