
Linear Classification: The Perceptron

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Linear Classifiers

• A hyperplane partitions R^d into two half-spaces
  – Defined by the normal vector θ ∈ R^d
    • θ is orthogonal to any vector lying on the hyperplane
  – Assumed to pass through the origin
    • This is because we incorporated the bias term θ_0 into θ by setting x_0 = 1
• Consider classification with +1, -1 labels ...

Based on slide by Piyush Rai
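To make the bias trick concrete, here is a minimal NumPy sketch (my own illustration, not from the slides) that prepends x_0 = 1 to every input so that θ_0 plays the role of the bias:

import numpy as np

# Hypothetical raw inputs with d = 2 features per example
X = np.array([[ 2.0, 1.0],
              [-1.0, 3.0]])

# Prepend x_0 = 1 to each row so theta_0 acts as the bias term
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
print(X_aug)   # [[ 1.  2.  1.]
               #  [ 1. -1.  3.]]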

Linear Classifiers

• Linear classifiers: represent the decision boundary by a hyperplane

      h(x) = sign(θᵀx)    where    sign(z) = +1 if z ≥ 0,  −1 if z < 0

  and

      xᵀ = [1  x_1  ...  x_d],      θ = [θ_0  θ_1  ...  θ_d]ᵀ

• Note that:
      θᵀx > 0  ⟹  y = +1
      θᵀx < 0  ⟹  y = −1
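A minimal sketch of this classifier in NumPy (an illustration under my own variable names, not code from the course); the tie-breaking rule sign(0) = +1 matches the definition above:

import numpy as np

def predict(theta, x):
    # h(x) = sign(theta^T x), with sign(0) = +1 as defined above
    return 1 if theta @ x >= 0 else -1

# Hypothetical example with d = 2, input augmented with x_0 = 1
theta = np.array([-1.0, 2.0, 0.5])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 1.0, 1.0])        # [x_0 = 1, x_1, x_2]
print(predict(theta, x))             # theta^T x = 1.5 > 0, so +1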

The Perceptron

• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

      θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

  – If the prediction matches the label, make no change
  – Otherwise, adjust θ
  – Note that (h_θ(x^(i)) − y^(i)) is either 2 or −2 when the prediction is wrong

The Perceptron

• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

      θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

• Re-write as (only upon misclassification):

      θ_j ← θ_j + α y^(i) x_j^(i)

  – Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn't affect performance

Perceptron Rule: If (x^(i), y^(i)) is misclassified, do  θ ← θ + y^(i) x^(i)
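A small sketch of this mistake-driven rule (assumptions: NumPy vectors, α dropped as the slide suggests, and the y·θᵀx ≤ 0 misclassification test used later in the online algorithm):

import numpy as np

def perceptron_update(theta, x, y):
    # Perceptron rule: change theta only if (x, y) is misclassified
    if y * (theta @ x) <= 0:          # wrong prediction (or on the boundary)
        theta = theta + y * x         # theta <- theta + y x
    return theta

# Hypothetical positive example currently classified as negative
theta = np.array([0.0, -1.0, 0.0])
x = np.array([1.0, 2.0, 0.0])            # augmented with x_0 = 1
print(perceptron_update(theta, x, +1))   # [1. 1. 0.]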

Why the Perceptron Update Works

[Figure: the old parameter vector θ_old, a misclassified positive point x, and the updated vector θ_new = θ_old + x, whose decision boundary has rotated toward correctly classifying x]

Based on slide by Piyush Rai

Why the Perceptron Update Works

• Consider the misclassified example (y = +1)
  – The perceptron wrongly thinks that θ_oldᵀ x < 0

• Update:
      θ_new = θ_old + y x = θ_old + x   (since y = +1)

• Note that
      θ_newᵀ x = (θ_old + x)ᵀ x = θ_oldᵀ x + xᵀx
  and xᵀx = ‖x‖₂² > 0

• Therefore, θ_newᵀ x is less negative than θ_oldᵀ x
  – So, we are making ourselves more correct on this example!

Based on slide by Piyush Rai
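A quick numerical check of this argument, using made-up values:

import numpy as np

theta_old = np.array([1.0, -2.0])
x = np.array([0.5, 1.0])              # a positive example (y = +1)

print(theta_old @ x)                  # -1.5, so the example is misclassified

theta_new = theta_old + x             # update with y = +1
print(theta_new @ x)                  # -0.25 = -1.5 + ||x||_2^2 = -1.5 + 1.25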

The Perceptron Cost Function

• Prediction is correct if  y^(i) θᵀx^(i) > 0

• Could have used the 0/1 loss

      J_0/1(θ) = (1/n) Σ_{i=1}^{n} ℓ(sign(θᵀx^(i)), y^(i))

  where ℓ(·) is 0 if the prediction is correct, 1 otherwise
  – Doesn't produce a useful gradient

Based on slide by Alan Fern

The Perceptron Cost Function

• The perceptron uses the following cost function

      J_p(θ) = (1/n) Σ_{i=1}^{n} max(0, −y^(i) θᵀx^(i))

  – max(0, −y^(i) θᵀx^(i)) is 0 if the prediction is correct
  – Otherwise, it is the confidence in the misprediction
  – Nice gradient

Based on slide by Alan Fern
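Both cost functions translate directly into NumPy; the following is a sketch on hypothetical toy data, not code from the course:

import numpy as np

def zero_one_cost(theta, X, y):
    # J_0/1(theta): fraction of misclassified examples (no useful gradient)
    preds = np.where(X @ theta >= 0, 1, -1)
    return np.mean(preds != y)

def perceptron_cost(theta, X, y):
    # J_p(theta) = (1/n) sum_i max(0, -y^(i) theta^T x^(i))
    margins = y * (X @ theta)
    return np.mean(np.maximum(0.0, -margins))

# Hypothetical toy data: rows of X are augmented examples [x_0 = 1, x_1]
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1, -1, 1])
theta = np.array([0.0, -1.0])         # misclassifies all three points
print(zero_one_cost(theta, X, y))     # 1.0
print(perceptron_cost(theta, X, y))   # about 1.17: average confidence in the mispredictions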

Online Perceptron Algorithm

1.) Let θ ← [0, 0, ..., 0]
2.) Repeat:
3.)   Receive training example (x^(i), y^(i))
4.)   if y^(i) x^(i)ᵀ θ ≤ 0      // prediction is incorrect
5.)     θ ← θ + y^(i) x^(i)

Online learning – the learning mode where the model update is performed each time a single observation is received

Batch learning – the learning mode where the model update is performed after observing the entire training set

Based on slide by Alan Fern
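A runnable Python sketch of the algorithm above (NumPy assumed; here the "stream" is just a list of examples, and the step comments refer to the numbered lines of the pseudocode):

import numpy as np

def online_perceptron(stream, d):
    # Labels are assumed to be in {-1, +1}; inputs are augmented with x_0 = 1
    theta = np.zeros(d)                      # 1.) theta <- [0, 0, ..., 0]
    for x, y in stream:                      # 2.)-3.) receive one example at a time
        if y * (x @ theta) <= 0:             # 4.) prediction is incorrect
            theta = theta + y * x            # 5.) theta <- theta + y x
    return theta

# Hypothetical stream of labeled examples
stream = [(np.array([1.0,  2.0,  1.0]), +1),
          (np.array([1.0, -1.0, -2.0]), -1),
          (np.array([1.0,  1.5,  0.5]), +1)]
print(online_perceptron(stream, d=3))        # [1. 2. 1.] on this toy stream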

Online Perceptron Algorithm

See the perceptron in action: www.youtube.com/watch?v=vGwemZhPlsA
• Red points are labeled +
• Blue points are labeled −

Based on slide by Alan Fern

Batch Perceptron

1.) Given training data {(x^(i), y^(i))}_{i=1}^{n}
2.) Let θ ← [0, 0, ..., 0]
3.) Repeat:
4.)   Let Δ ← [0, 0, ..., 0]
5.)   for i = 1 ... n, do
6.)     if y^(i) x^(i)ᵀ θ ≤ 0      // prediction for the i-th instance is incorrect
7.)       Δ ← Δ + y^(i) x^(i)
8.)   Δ ← Δ / n                     // compute average update
9.)   θ ← θ + α Δ
10.) Until ‖Δ‖₂ < ε

• Simplest case: α = 1 and don't normalize (i.e., skip the averaging in step 8), which yields the fixed increment perceptron
• Guaranteed to find a separating hyperplane if one exists

Based on slide by Alan Fern
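A sketch of the batch version in NumPy (my own translation of the pseudocode above; alpha, eps, and the max_iters safety cap are assumed knobs, the last one not present in the pseudocode):

import numpy as np

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_iters=1000):
    # Average the perceptron updates over the whole training set on each pass
    n, d = X.shape
    theta = np.zeros(d)                          # 2.) theta <- [0, 0, ..., 0]
    for _ in range(max_iters):                   # 3.) Repeat
        delta = np.zeros(d)                      # 4.) delta <- [0, 0, ..., 0]
        for i in range(n):                       # 5.) for i = 1 ... n
            if y[i] * (X[i] @ theta) <= 0:       # 6.) prediction for instance i is incorrect
                delta += y[i] * X[i]             # 7.) accumulate the update
        delta /= n                               # 8.) average update
        theta += alpha * delta                   # 9.) take a step
        if np.linalg.norm(delta) < eps:          # 10.) stop once the average update is tiny
            break
    return theta

# Hypothetical separable toy data (columns: x_0 = 1, x_1)
X = np.array([[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]])
y = np.array([1, 1, -1, -1])
print(batch_perceptron(X, y))                    # finds a separating theta on this toy data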

Improving the Perceptron

• The Perceptron produces many θ's during training
• The standard Perceptron simply uses the final θ at test time
  – This may sometimes not be a good idea!
  – Some other θ may be correct on 1,000 consecutive examples, but one mistake ruins it!

• Idea: Use a combination of multiple perceptrons
  – (i.e., neural networks!)

• Idea: Use the intermediate θ's
  – Voted Perceptron: vote on the predictions of the intermediate θ's
  – Averaged Perceptron: average the intermediate θ's (see the sketch below)

Based on slide by Piyush Rai
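A minimal sketch of the averaged-perceptron idea (assumptions: multiple passes over a fixed training set, the same mistake test as the online algorithm, and every intermediate θ weighted equally, including repeats between mistakes):

import numpy as np

def averaged_perceptron(X, y, epochs=10):
    # Train a standard perceptron but return the average of all intermediate theta's
    n, d = X.shape
    theta = np.zeros(d)
    theta_sum = np.zeros(d)                     # running sum of the intermediate theta's
    count = 0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ theta) <= 0:      # usual perceptron mistake test
                theta = theta + y[i] * X[i]
            theta_sum += theta                  # every intermediate theta contributes once
            count += 1
    return theta_sum / count                    # averaged theta, used at test time

# Hypothetical toy data (x_0 = 1 prepended)
X = np.array([[1.0, 2.0], [1.0, -1.5], [1.0, 0.5], [1.0, -2.0]])
y = np.array([1, -1, 1, -1])
theta_avg = averaged_perceptron(X, y)
print(np.sign(X @ theta_avg))                   # predictions of the averaged classifier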
