
Linear Classification: The Perceptron

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Linear Classifiers

• A hyperplane partitions R^d into two half-spaces
  – Defined by the normal vector θ ∈ R^d
    • θ is orthogonal to any vector lying on the hyperplane
  – Assumed to pass through the origin
    • This is because we incorporated the bias term θ_0 into θ by setting x_0 = 1
• Consider classification with +1, -1 labels ...

Based on slide by Piyush Rai
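To make the bias trick concrete, here is a minimal NumPy sketch (my own illustration, not from the slides) that prepends x_0 = 1 to every input so that θ_0 plays the role of the bias:

import numpy as np

# Hypothetical raw inputs with d = 2 features per example
X = np.array([[ 2.0, 1.0],
              [-1.0, 3.0]])

# Prepend x_0 = 1 to each row so theta_0 acts as the bias term
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
print(X_aug)   # [[ 1.  2.  1.]
               #  [ 1. -1.  3.]]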

Linear Classifiers

• Linear classifiers: represent the decision boundary by a hyperplane

      h(x) = sign(θᵀx)    where    sign(z) = +1 if z ≥ 0,  −1 if z < 0

  and

      xᵀ = [1  x_1  ...  x_d],      θ = [θ_0  θ_1  ...  θ_d]ᵀ

• Note that:
      θᵀx > 0  ⟹  y = +1
      θᵀx < 0  ⟹  y = −1
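A minimal sketch of this classifier in NumPy (an illustration under my own variable names, not code from the course); the tie-breaking rule sign(0) = +1 matches the definition above:

import numpy as np

def predict(theta, x):
    # h(x) = sign(theta^T x), with sign(0) = +1 as defined above
    return 1 if theta @ x >= 0 else -1

# Hypothetical example with d = 2, input augmented with x_0 = 1
theta = np.array([-1.0, 2.0, 0.5])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 1.0, 1.0])        # [x_0 = 1, x_1, x_2]
print(predict(theta, x))             # theta^T x = 1.5 > 0, so +1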

The Perceptron

• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

      θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

  – If the prediction matches the label, make no change
  – Otherwise, adjust θ
  – Note that (h_θ(x^(i)) − y^(i)) is either 2 or −2 when the prediction is wrong

The Perceptron

• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

      θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

• Re-write as (only upon misclassification):

      θ_j ← θ_j + α y^(i) x_j^(i)

  – Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn't affect performance

Perceptron Rule: If (x^(i), y^(i)) is misclassified, do  θ ← θ + y^(i) x^(i)
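A small sketch of this mistake-driven rule (assumptions: NumPy vectors, α dropped as the slide suggests, and the y·θᵀx ≤ 0 misclassification test used later in the online algorithm):

import numpy as np

def perceptron_update(theta, x, y):
    # Perceptron rule: change theta only if (x, y) is misclassified
    if y * (theta @ x) <= 0:          # wrong prediction (or on the boundary)
        theta = theta + y * x         # theta <- theta + y x
    return theta

# Hypothetical positive example currently classified as negative
theta = np.array([0.0, -1.0, 0.0])
x = np.array([1.0, 2.0, 0.0])            # augmented with x_0 = 1
print(perceptron_update(theta, x, +1))   # [1. 1. 0.]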

Why the Perceptron Update Works

[Figure: the old parameter vector θ_old, a misclassified positive point x, and the updated vector θ_new = θ_old + x, whose decision boundary has rotated toward correctly classifying x]

Based on slide by Piyush Rai

Why the Perceptron Update Works

• Consider the misclassified example (y = +1)
  – The perceptron wrongly thinks that θ_oldᵀ x < 0

• Update:
      θ_new = θ_old + y x = θ_old + x   (since y = +1)

• Note that
      θ_newᵀ x = (θ_old + x)ᵀ x = θ_oldᵀ x + xᵀx
  and xᵀx = ‖x‖₂² > 0

• Therefore, θ_newᵀ x is less negative than θ_oldᵀ x
  – So, we are making ourselves more correct on this example!

Based on slide by Piyush Rai
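A quick numerical check of this argument, using made-up values:

import numpy as np

theta_old = np.array([1.0, -2.0])
x = np.array([0.5, 1.0])              # a positive example (y = +1)

print(theta_old @ x)                  # -1.5, so the example is misclassified

theta_new = theta_old + x             # update with y = +1
print(theta_new @ x)                  # -0.25 = -1.5 + ||x||_2^2 = -1.5 + 1.25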

The Perceptron Cost Function

• Prediction is correct if  y^(i) θᵀx^(i) > 0

• Could have used the 0/1 loss

      J_0/1(θ) = (1/n) Σ_{i=1}^{n} ℓ(sign(θᵀx^(i)), y^(i))

  where ℓ(·) is 0 if the prediction is correct, 1 otherwise
  – Doesn't produce a useful gradient

Based on slide by Alan Fern

The Perceptron Cost Function

• The perceptron uses the following cost function

      J_p(θ) = (1/n) Σ_{i=1}^{n} max(0, −y^(i) θᵀx^(i))

  – max(0, −y^(i) θᵀx^(i)) is 0 if the prediction is correct
  – Otherwise, it is the confidence in the misprediction
  – Nice gradient

Based on slide by Alan Fern
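Both cost functions translate directly into NumPy; the following is a sketch on hypothetical toy data, not code from the course:

import numpy as np

def zero_one_cost(theta, X, y):
    # J_0/1(theta): fraction of misclassified examples (no useful gradient)
    preds = np.where(X @ theta >= 0, 1, -1)
    return np.mean(preds != y)

def perceptron_cost(theta, X, y):
    # J_p(theta) = (1/n) sum_i max(0, -y^(i) theta^T x^(i))
    margins = y * (X @ theta)
    return np.mean(np.maximum(0.0, -margins))

# Hypothetical toy data: rows of X are augmented examples [x_0 = 1, x_1]
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1, -1, 1])
theta = np.array([0.0, -1.0])         # misclassifies all three points
print(zero_one_cost(theta, X, y))     # 1.0
print(perceptron_cost(theta, X, y))   # about 1.17: average confidence in the mispredictions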

Online Perceptron Algorithm

1.) Let θ ← [0, 0, ..., 0]
2.) Repeat:
3.)   Receive training example (x^(i), y^(i))
4.)   if y^(i) x^(i)ᵀ θ ≤ 0      // prediction is incorrect
5.)     θ ← θ + y^(i) x^(i)

Online learning – the learning mode where the model update is performed each time a single observation is received

Batch learning – the learning mode where the model update is performed after observing the entire training set

Based on slide by Alan Fern
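A runnable Python sketch of the algorithm above (NumPy assumed; here the "stream" is just a list of examples, and the step comments refer to the numbered lines of the pseudocode):

import numpy as np

def online_perceptron(stream, d):
    # Labels are assumed to be in {-1, +1}; inputs are augmented with x_0 = 1
    theta = np.zeros(d)                      # 1.) theta <- [0, 0, ..., 0]
    for x, y in stream:                      # 2.)-3.) receive one example at a time
        if y * (x @ theta) <= 0:             # 4.) prediction is incorrect
            theta = theta + y * x            # 5.) theta <- theta + y x
    return theta

# Hypothetical stream of labeled examples
stream = [(np.array([1.0,  2.0,  1.0]), +1),
          (np.array([1.0, -1.0, -2.0]), -1),
          (np.array([1.0,  1.5,  0.5]), +1)]
print(online_perceptron(stream, d=3))        # [1. 2. 1.] on this toy stream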

Online Perceptron Algorithm

See the perceptron in action: www.youtube.com/watch?v=vGwemZhPlsA
• Red points are labeled +
• Blue points are labeled −

Based on slide by Alan Fern

Batch Perceptron

1.) Given training data {(x^(i), y^(i))}_{i=1}^{n}
2.) Let θ ← [0, 0, ..., 0]
3.) Repeat:
4.)   Let Δ ← [0, 0, ..., 0]
5.)   for i = 1 ... n, do
6.)     if y^(i) x^(i)ᵀ θ ≤ 0      // prediction for the i-th instance is incorrect
7.)       Δ ← Δ + y^(i) x^(i)
8.)   Δ ← Δ / n                     // compute average update
9.)   θ ← θ + α Δ
10.) Until ‖Δ‖₂ < ε

• Simplest case: α = 1 and don't normalize (i.e., skip the averaging in step 8), which yields the fixed increment perceptron
• Guaranteed to find a separating hyperplane if one exists

Based on slide by Alan Fern
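A sketch of the batch version in NumPy (my own translation of the pseudocode above; alpha, eps, and the max_iters safety cap are assumed knobs, the last one not present in the pseudocode):

import numpy as np

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_iters=1000):
    # Average the perceptron updates over the whole training set on each pass
    n, d = X.shape
    theta = np.zeros(d)                          # 2.) theta <- [0, 0, ..., 0]
    for _ in range(max_iters):                   # 3.) Repeat
        delta = np.zeros(d)                      # 4.) delta <- [0, 0, ..., 0]
        for i in range(n):                       # 5.) for i = 1 ... n
            if y[i] * (X[i] @ theta) <= 0:       # 6.) prediction for instance i is incorrect
                delta += y[i] * X[i]             # 7.) accumulate the update
        delta /= n                               # 8.) average update
        theta += alpha * delta                   # 9.) take a step
        if np.linalg.norm(delta) < eps:          # 10.) stop once the average update is tiny
            break
    return theta

# Hypothetical separable toy data (columns: x_0 = 1, x_1)
X = np.array([[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]])
y = np.array([1, 1, -1, -1])
print(batch_perceptron(X, y))                    # finds a separating theta on this toy data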

Improving the Perceptron

• The Perceptron produces many θ's during training
• The standard Perceptron simply uses the final θ at test time
  – This may sometimes not be a good idea!
  – Some other θ may be correct on 1,000 consecutive examples, but one mistake ruins it!

• Idea: Use a combination of multiple perceptrons
  – (i.e., neural networks!)

• Idea: Use the intermediate θ's
  – Voted Perceptron: vote on the predictions of the intermediate θ's
  – Averaged Perceptron: average the intermediate θ's (see the sketch below)

Based on slide by Piyush Rai
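A minimal sketch of the averaged-perceptron idea (assumptions: multiple passes over a fixed training set, the same mistake test as the online algorithm, and every intermediate θ weighted equally, including repeats between mistakes):

import numpy as np

def averaged_perceptron(X, y, epochs=10):
    # Train a standard perceptron but return the average of all intermediate theta's
    n, d = X.shape
    theta = np.zeros(d)
    theta_sum = np.zeros(d)                     # running sum of the intermediate theta's
    count = 0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ theta) <= 0:      # usual perceptron mistake test
                theta = theta + y[i] * X[i]
            theta_sum += theta                  # every intermediate theta contributes once
            count += 1
    return theta_sum / count                    # averaged theta, used at test time

# Hypothetical toy data (x_0 = 1 prepended)
X = np.array([[1.0, 2.0], [1.0, -1.5], [1.0, 0.5], [1.0, -2.0]])
y = np.array([1, -1, 1, -1])
theta_avg = averaged_perceptron(X, y)
print(np.sign(X @ theta_avg))                   # predictions of the averaged classifier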
