
Introduction to Artificial Intelligence CSCI 3202: The Perceptron Algorithm


Page 1: Introduction to Artificial Intelligence CSCI 3202: The Perceptron Algorithm

Greg Grudic Intro AI 1

Introduction to Artificial IntelligenceCSCI 3202:

The Perceptron Algorithm

Greg Grudic

Page 2

Questions?

Page 3

Binary Classification

• A binary classifier is a mapping from a set of d inputs to a single output which can take on one of TWO values

• In the most general setting:

  inputs: $\mathbf{x} \in \mathbb{R}^d$
  output: $y \in \{-1, +1\}$

• Specifying the output classes as -1 and +1 is arbitrary!
  – Often done as a mathematical convenience

Page 4

A Binary Classifier

Given learning data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$

a classification model $M(\mathbf{x})$ is constructed that maps an input $\mathbf{x}$ to an output $\hat{y} \in \{-1, +1\}$.

Page 5

Linear Separating Hyper-Planes

[Figure: a separating hyperplane in the $(x_1, x_2)$ plane]

$\beta_0 + \sum_{i=1}^{d} \beta_i x_i = 0$  (the decision boundary)

$\beta_0 + \sum_{i=1}^{d} \beta_i x_i > 0 \;\Rightarrow\; y = +1$

$\beta_0 + \sum_{i=1}^{d} \beta_i x_i \le 0 \;\Rightarrow\; y = -1$

Page 6

Linear Separating Hyper-Planes

• The Model:

  $\hat{y} = M(\mathbf{x}) = \mathrm{sgn}\left[\hat{\beta}_0 + (\hat{\beta}_1, \ldots, \hat{\beta}_d)^T \mathbf{x}\right]$

• Where:

  $\mathrm{sgn}[A] = \begin{cases} 1 & \text{if } A > 0 \\ -1 & \text{otherwise} \end{cases}$

• The decision boundary:

  $\hat{\beta}_0 + (\hat{\beta}_1, \ldots, \hat{\beta}_d)^T \mathbf{x} = 0$
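The model above can be written as a short Python sketch (the helper names `sgn` and `predict` are illustrative, not from the slides):

```python
import numpy as np

def sgn(a):
    # sgn[A] = 1 if A > 0, -1 otherwise (as defined on this slide)
    return 1 if a > 0 else -1

def predict(beta0, beta, x):
    # The linear model: y_hat = sgn(beta0 + beta^T x)
    return sgn(beta0 + np.dot(beta, x))

# Example: the boundary x1 + x2 - 1 = 0 in two dimensions
beta0, beta = -1.0, np.array([1.0, 1.0])
print(predict(beta0, beta, np.array([2.0, 2.0])))   # above the boundary -> 1
print(predict(beta0, beta, np.array([0.0, 0.0])))   # below the boundary -> -1
```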

Page 7

Linear Separating Hyper-Planes

• The model parameters are: $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d)$

• The hat on the betas means that they are estimated from the data

• Many different learning algorithms have been proposed for determining $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d)$

Page 8

Rosenblatt’s Perceptron Learning Algorithm

• Dates back to the 1950s and is the motivation behind Neural Networks

• The algorithm:
  – Start with a random hyperplane $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d)$
  – Incrementally modify the hyperplane such that points that are misclassified move closer to the correct side of the boundary
  – Stop when all learning examples are correctly classified

Page 9

Rosenblatt’s Perceptron Learning Algorithm

• The algorithm is based on the following property:
  – The signed distance of any point $\mathbf{x}$ to the boundary is:

    $D(\mathbf{x}) = \frac{\hat{\beta}_0 + (\hat{\beta}_1, \ldots, \hat{\beta}_d)^T \mathbf{x}}{\sqrt{\sum_{i=1}^{d} \hat{\beta}_i^2}} \;\propto\; \hat{\beta}_0 + (\hat{\beta}_1, \ldots, \hat{\beta}_d)^T \mathbf{x}$

• Therefore, if $M$ is the set of misclassified learning examples, we can push them closer to the boundary by minimizing the following:

  $D(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d) = -\sum_{i \in M} y_i \left(\hat{\beta}_0 + (\hat{\beta}_1, \ldots, \hat{\beta}_d)^T \mathbf{x}_i\right)$
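This cost can be computed directly. A minimal sketch (the function name `perceptron_cost` is hypothetical, and a point is counted as misclassified when its margin $y_i(\hat{\beta}_0 + \hat{\boldsymbol{\beta}}^T \mathbf{x}_i)$ is non-positive):

```python
import numpy as np

def perceptron_cost(beta0, beta, X, y):
    # D = -sum over misclassified points of y_i * (beta0 + beta^T x_i)
    margins = y * (beta0 + X @ beta)
    misclassified = margins <= 0   # points on the wrong side (or on the boundary)
    return -np.sum(margins[misclassified])

# Two points and the boundary x1 = 0 (beta0 = 0, beta = (1, 0))
X = np.array([[1.0, 0.0],    # y = +1, correctly classified
              [0.5, 1.0]])   # y = -1, misclassified (margin = -0.5)
y = np.array([1.0, -1.0])
print(perceptron_cost(0.0, np.array([1.0, 0.0]), X, y))  # 0.5
```

Note that $D$ is always non-negative, and it is zero exactly when no point is misclassified.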

Page 10

Rosenblatt’s Minimization Function

• This is classic Machine Learning!
• First define a cost function in model parameter space:

  $D(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d) = -\sum_{i \in M} y_i \left(\hat{\beta}_0 + \sum_{k=1}^{d} \hat{\beta}_k x_{ik}\right)$

• Then find an algorithm that modifies $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d)$ such that this cost function is minimized
• One such algorithm is Gradient Descent

Page 11

Gradient Descent

[Figure: the cost $D(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d)$ plotted as a surface over the parameter axes $\hat{\beta}_0$ and $\hat{\beta}_1$]

Page 12

The Gradient Descent Algorithm

$\hat{\beta}_i \leftarrow \hat{\beta}_i - \rho \, \frac{\partial D(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d)}{\partial \hat{\beta}_i}$

Where the learning rate is defined by: $\rho > 0$
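The update rule above can be sketched for any differentiable cost, assuming we are handed a function that evaluates the gradient (the names `gradient_descent` and `grad` are illustrative):

```python
def gradient_descent(grad, theta, rho=0.1, steps=100):
    # Repeat: theta <- theta - rho * dD/dtheta
    for _ in range(steps):
        theta = theta - rho * grad(theta)
    return theta

# Minimize D(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3)
theta = gradient_descent(lambda t: 2.0 * (t - 3.0), theta=0.0)
print(round(theta, 3))  # 3.0
```

With $\rho$ too large the iterates can overshoot and diverge; with $\rho$ too small convergence is slow. That trade-off is why $\rho > 0$ is left as a tunable parameter.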

Page 13

The Gradient Descent Algorithm for the Perceptron

The gradient of the cost function has the components:

$\frac{\partial D(\hat{\beta}_0, \ldots, \hat{\beta}_d)}{\partial \hat{\beta}_0} = -\sum_{i \in M} y_i, \qquad \frac{\partial D(\hat{\beta}_0, \ldots, \hat{\beta}_d)}{\partial \hat{\beta}_j} = -\sum_{i \in M} y_i x_{ij}, \quad j = 1, \ldots, d$

Two Versions of the Perceptron Algorithm:

Update one misclassified point $i$ at a time (online):

$\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} \leftarrow \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} + \rho \begin{pmatrix} y_i \\ y_i x_{i1} \\ \vdots \\ y_i x_{id} \end{pmatrix}$

Update all misclassified points at once (batch):

$\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} \leftarrow \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} + \rho \begin{pmatrix} \sum_{i \in M} y_i \\ \sum_{i \in M} y_i x_{i1} \\ \vdots \\ \sum_{i \in M} y_i x_{id} \end{pmatrix}$
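The online version can be sketched end to end in a few lines (the function name `perceptron_train` and the toy dataset are illustrative, not from the slides):

```python
import numpy as np

def perceptron_train(X, y, rho=1.0, max_epochs=1000):
    # Online perceptron: update on one misclassified point at a time.
    N, d = X.shape
    beta0, beta = 0.0, np.zeros(d)
    for _ in range(max_epochs):
        errors = 0
        for i in range(N):
            if y[i] * (beta0 + beta @ X[i]) <= 0:   # point i is misclassified
                beta0 += rho * y[i]                 # beta0 <- beta0 + rho * y_i
                beta += rho * y[i] * X[i]           # beta_j <- beta_j + rho * y_i * x_ij
                errors += 1
        if errors == 0:        # all learning examples correctly classified: stop
            return beta0, beta
    return beta0, beta         # may not have converged (non-separable data)

# A small linearly separable dataset
X = np.array([[2.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
beta0, beta = perceptron_train(X, y)
print(all(y[i] * (beta0 + beta @ X[i]) > 0 for i in range(len(y))))  # True
```

The batch version would instead accumulate the sums over all misclassified points before applying a single update per epoch.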

Page 14

The Learning Data

• Matrix representation of N learning examples of d-dimensional inputs

Training Data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$

$X = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{N1} & \cdots & x_{Nd} \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$

Page 15

The Good Theoretical Properties of the Perceptron Algorithm

• If a solution exists, the algorithm will always converge in a finite number of steps!

• Question: Does a solution always exist?

Page 16

Linearly Separable Data

• Which of these datasets are separable by a linear boundary?

[Figure: two scatter plots, (a) and (b), each showing + and − points in two dimensions]

Page 17

Linearly Separable Data

• Which of these datasets are separable by a linear boundary?

[Figure: the same two scatter plots, (a) and (b); the second dataset is labeled "Not Linearly Separable!"]

Page 18

Bad Theoretical Properties of the Perceptron Algorithm

• If the data is not linearly separable, the algorithm cycles forever!
  – It cannot converge!
  – This property "stopped" active research in this area between 1968 and 1984…
    • Perceptrons, Minsky and Papert, 1969
• Even when the data is separable, there are infinitely many solutions
  – Which solution is best?
• When the data is linearly separable, the number of steps to converge can be very large (it depends on the size of the gap between the classes)

Page 19

What about Nonlinear Data?

• Data that is not linearly separable is called nonlinear data

• Nonlinear data can often be mapped, via a nonlinear transformation, into a space where it is linearly separable

Page 20

Nonlinear Models

• The Linear Model:

  $\hat{y} = M(\mathbf{x}) = \mathrm{sgn}\left[\hat{\beta}_0 + \sum_{i=1}^{d} \hat{\beta}_i x_i\right]$

• The Nonlinear (basis function) Model:

  $\hat{y} = M(\mathbf{x}) = \mathrm{sgn}\left[\hat{\beta}_0 + \sum_{i=1}^{k} \hat{\beta}_i \phi_i(\mathbf{x})\right]$

• Examples of Nonlinear Basis Functions:

  $\phi_1(\mathbf{x}) = x_1^2, \quad \phi_2(\mathbf{x}) = x_2^2, \quad \phi_3(\mathbf{x}) = x_1 x_2, \quad \phi_4(\mathbf{x}) = \sin(x_5)$
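The basis function model only changes what is fed into the linear classifier. A minimal sketch, using a hypothetical `phi` built from basis functions like those above:

```python
import numpy as np

def sgn(a):
    return 1 if a > 0 else -1

# Hypothetical basis functions in the style of the slide's examples:
# phi_1(x) = x1^2, phi_2(x) = x2^2, phi_3(x) = x1*x2
def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, x[0] * x[1]])

def predict_nonlinear(beta0, beta, x):
    # y_hat = sgn(beta0 + sum_i beta_i * phi_i(x))
    return sgn(beta0 + beta @ phi(x))

# A circular boundary x1^2 + x2^2 = 1 is linear in (phi_1, phi_2) space
beta0, beta = -1.0, np.array([1.0, 1.0, 0.0])
print(predict_nonlinear(beta0, beta, np.array([2.0, 0.0])))   # outside the circle -> 1
print(predict_nonlinear(beta0, beta, np.array([0.1, 0.1])))   # inside the circle -> -1
```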

Page 21

Linear Separating Hyper-Planes In Nonlinear Basis Function Space

[Figure: a separating hyperplane in the $(\phi_1, \phi_2)$ basis function space]

$\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i = 0$  (the decision boundary)

$\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i > 0 \;\Rightarrow\; y = +1$

$\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i \le 0 \;\Rightarrow\; y = -1$

Page 22

An Example

[Figure: left, a dataset in the $(x_1, x_2)$ plane (classes $y=+1$ and $y=-1$) that is not linearly separable; right, the same data after the mapping $\Phi$ with $\phi_1 = x_1^2$ and $\phi_2 = x_2^2$, where it becomes linearly separable]

Page 23

Kernels as Nonlinear Transformations

• Polynomial

  $K(\mathbf{x}_i, \mathbf{x}_j) = \left(\langle \mathbf{x}_i, \mathbf{x}_j \rangle + q\right)^k$

• Sigmoid

  $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh\left(\kappa \langle \mathbf{x}_i, \mathbf{x}_j \rangle + \theta\right)$

• Gaussian or Radial Basis Function (RBF)

  $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2\sigma^2} \left\| \mathbf{x}_i - \mathbf{x}_j \right\|^2\right)$
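The three kernels can be sketched directly from their formulas (the function names and the default hyperparameter values `q`, `k`, `kappa`, `theta`, `sigma` are illustrative choices):

```python
import numpy as np

def poly_kernel(xi, xj, q=1.0, k=2):
    # Polynomial: (<xi, xj> + q)^k
    return (np.dot(xi, xj) + q) ** k

def sigmoid_kernel(xi, xj, kappa=1.0, theta=0.0):
    # Sigmoid: tanh(kappa * <xi, xj> + theta)
    return np.tanh(kappa * np.dot(xi, xj) + theta)

def rbf_kernel(xi, xj, sigma=1.0):
    # Gaussian / RBF: exp(-||xi - xj||^2 / (2 * sigma^2))
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

xi, xj = np.array([1.0, 0.0]), np.array([1.0, 0.0])
print(poly_kernel(xi, xj))   # (1 + 1)^2 = 4.0
print(rbf_kernel(xi, xj))    # identical points -> 1.0
```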

Page 24

The Kernel Model

Training Data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$

$\hat{y} = M(\mathbf{x}) = \mathrm{sgn}\left[\hat{\beta}_0 + \sum_{i=1}^{N} \hat{\beta}_i K(\mathbf{x}, \mathbf{x}_i)\right]$

The number of basis functions equals the number of training examples!
  – Unless some of the betas get set to zero…
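Prediction with the kernel model is a sum over the training examples. A minimal sketch, assuming an RBF kernel and some already-trained betas (the values below are made up for illustration, one beta per training example):

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def kernel_predict(beta0, betas, X_train, x, kernel=rbf_kernel):
    # y_hat = sgn(beta0 + sum_{i=1}^{N} beta_i * K(x, x_i))
    s = beta0 + sum(b * kernel(x, xi) for b, xi in zip(betas, X_train))
    return 1 if s > 0 else -1

# Two training examples, so N = 2 basis functions
X_train = np.array([[0.0, 0.0], [3.0, 3.0]])
betas = np.array([1.0, -1.0])   # hypothetical trained values
print(kernel_predict(0.0, betas, X_train, np.array([0.1, 0.0])))  # near x_1 -> 1
```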

Page 25

Gram (Kernel) Matrix

Training Data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$

$K = \begin{pmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \end{pmatrix}$

Properties:
• Positive definite matrix
• Symmetric
• Positive on the diagonal
• N by N
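Building the Gram matrix and checking these properties is a short exercise (a sketch assuming the RBF kernel; the helper name `gram_matrix` is illustrative):

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def gram_matrix(X, kernel=rbf_kernel):
    # K[i, j] = kernel(x_i, x_j): an N-by-N matrix over the training examples
    N = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gram_matrix(X)
print(K.shape == (3, 3))        # N by N -> True
print(np.allclose(K, K.T))      # symmetric -> True
print(np.all(np.diag(K) > 0))   # positive on the diagonal -> True
```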

Page 26

Picking a Model Structure?

• How do you pick the Kernels?
  – Kernel parameters
• These are called learning parameters or hyperparameters
• Two approaches to choosing learning parameters:
  – Bayesian
    • Learning parameters must maximize the probability of correct classification on future data, based on prior biases
  – Frequentist
    • Use the training data to learn the model parameters $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d)$
    • Use validation data to pick the best hyperparameters
• More on learning parameter selection later

Page 27

Perceptron Algorithm Convergence

• Two problems:
  – No convergence when the data is not separable in basis function space
  – Infinitely many solutions when the data is separable
• Can we modify the algorithm to fix these problems?