
Page 1: Linear Classification: The Perceptron

Linear Classification: The Perceptron

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Page 2: Linear Classification: The Perceptron

Linear Classifiers
• A hyperplane partitions ℝ^d into two half-spaces
  – Defined by the normal vector θ ∈ ℝ^d
    • θ is orthogonal to any vector lying on the hyperplane
  – Assumed to pass through the origin
    • This is because we incorporated the bias term θ_0 into θ by setting x_0 = 1
• Consider classification with +1, −1 labels ...

Based on slide by Piyush Rai

Page 3: Linear Classification: The Perceptron

Linear Classifiers
• Linear classifiers: represent the decision boundary by a hyperplane

  h(x) = sign(θ^T x),  where  sign(z) = +1 if z ≥ 0, and −1 if z < 0

  x^T = [1  x_1  ...  x_d],   θ = [θ_0  θ_1  ...  θ_d]^T

• Note that:
  θ^T x > 0 ⟹ y = +1
  θ^T x < 0 ⟹ y = −1
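The prediction rule on this slide is easy to sketch in code. The following is a minimal illustration (the function name `predict` and the example hyperplane are my own, not from the slides); it assumes x already includes the constant feature x_0 = 1 so that θ_0 acts as the bias:

```python
import numpy as np

def predict(theta, x):
    """Predict a +1/-1 label via h(x) = sign(theta^T x).

    x is assumed to already include the constant feature x0 = 1,
    so theta[0] plays the role of the bias term.
    """
    # The slides define sign(z) = +1 when z >= 0, -1 when z < 0
    # (note this differs from numpy.sign, which returns 0 at 0).
    return 1 if theta @ x >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 1 = 0 in homogeneous form:
theta = np.array([-1.0, 1.0, 1.0])
print(predict(theta, np.array([1.0, 2.0, 2.0])))   # point on the + side -> 1
print(predict(theta, np.array([1.0, 0.0, 0.0])))   # point on the - side -> -1
```

Writing the sign convention out explicitly avoids the edge case where θ^T x = 0, which the slides assign to the +1 class.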

Page 4: Linear Classification: The Perceptron

The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

  θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

  where h(x) = sign(θ^T x), and sign(z) = +1 if z ≥ 0, −1 if z < 0

  – If the prediction matches the label, make no change
  – Otherwise, adjust θ

  Note: upon misclassification, (h_θ(x^(i)) − y^(i)) is either 2 or −2

Page 5: Linear Classification: The Perceptron

The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance (x^(i), y^(i)):

  θ_j ← θ_j − (α/2) (h_θ(x^(i)) − y^(i)) x_j^(i)

  where (h_θ(x^(i)) − y^(i)) is either 2 or −2 upon misclassification

• Re-write as (only upon misclassification):

  θ_j ← θ_j + α y^(i) x_j^(i)

  – Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn't affect performance

Perceptron Rule: If (x^(i), y^(i)) is misclassified, do  θ ← θ + y^(i) x^(i)

Page 6: Linear Classification: The Perceptron

Why the Perceptron Update Works

[Figure: a misclassified point x with label +; the update rotates the weight vector θ_old toward x, yielding θ_new.]

Based on slide by Piyush Rai

Page 7: Linear Classification: The Perceptron

Why the Perceptron Update Works
• Consider the misclassified example (y = +1)
  – The perceptron wrongly thinks that θ_old^T x < 0
• Update:

  θ_new = θ_old + y x = θ_old + x   (since y = +1)

• Note that

  θ_new^T x = (θ_old + x)^T x = θ_old^T x + x^T x,   and x^T x = ‖x‖_2^2 > 0

• Therefore, θ_new^T x is less negative than θ_old^T x
  – So, we are making ourselves more correct on this example!

Based on slide by Piyush Rai
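The algebra above is easy to check numerically. The following is a small sketch with example values of my own choosing, not from the slides:

```python
import numpy as np

# A point with true label y = +1 that theta_old misclassifies:
theta_old = np.array([-2.0, 0.5])
x = np.array([1.0, 1.0])
assert theta_old @ x < 0  # score is negative, so the prediction is wrong

# Perceptron update for y = +1: theta_new = theta_old + x
theta_new = theta_old + x

# The derivation says theta_new^T x = theta_old^T x + ||x||_2^2,
# so the score on this example increases by exactly ||x||_2^2.
lhs = theta_new @ x
rhs = theta_old @ x + np.dot(x, x)
print(lhs, rhs)  # equal values; the score moved toward positive
```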

Page 8: Linear Classification: The Perceptron

The Perceptron Cost Function
• Prediction is correct if y^(i) θ^T x^(i) > 0
• Could have used the 0/1 loss

  J_{0/1}(θ) = (1/n) Σ_{i=1}^{n} ℓ(sign(θ^T x^(i)), y^(i))

  where ℓ(·) is 0 if the prediction is correct, 1 otherwise
  – Doesn't produce a useful gradient

Based on slide by Alan Fern

Page 9: Linear Classification: The Perceptron

The Perceptron Cost Function
• The perceptron uses the following cost function

  J_p(θ) = (1/n) Σ_{i=1}^{n} max(0, −y^(i) θ^T x^(i))

  – max(0, −y^(i) θ^T x^(i)) is 0 if the prediction is correct
  – Otherwise, it is the confidence in the misprediction
  – Nice gradient

Based on slide by Alan Fern
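The perceptron cost J_p(θ) is straightforward to compute; the following is an illustrative sketch (the function name and toy data are mine), assuming each row of X already includes the constant feature x_0 = 1:

```python
import numpy as np

def perceptron_cost(theta, X, y):
    """J_p(theta) = (1/n) * sum_i max(0, -y_i * theta^T x_i)."""
    margins = y * (X @ theta)          # y^(i) theta^T x^(i) for every example
    return np.mean(np.maximum(0.0, -margins))

# Toy data: rows of X include the constant bias feature x0 = 1.
X = np.array([[1.0,  2.0],
              [1.0, -1.0]])
y = np.array([1.0, -1.0])

theta_good = np.array([0.0, 1.0])    # classifies both points correctly
theta_bad = np.array([0.0, -1.0])    # misclassifies both points

print(perceptron_cost(theta_good, X, y))  # 0.0: every margin is positive
print(perceptron_cost(theta_bad, X, y))   # positive: sums the misprediction confidences
```

Correctly classified points contribute exactly 0, so only the misclassified examples (and how confidently they are wrong) drive the cost, which is what gives the useful gradient mentioned above.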

Page 10: Linear Classification: The Perceptron

Online Perceptron Algorithm

1.) Let θ ← [0, 0, ..., 0]
2.) Repeat:
3.)   Receive training example (x^(i), y^(i))
4.)   if y^(i) θ^T x^(i) ≤ 0   // prediction is incorrect
5.)     θ ← θ + y^(i) x^(i)

Online learning – the learning mode where the model update is performed each time a single observation is received

Batch learning – the learning mode where the model update is performed after observing the entire training set

Based on slide by Alan Fern
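The pseudocode above maps to Python almost line for line. A minimal sketch, assuming numpy arrays with x_0 = 1 as the bias feature and a finite stream of examples (the helper name `online_perceptron` is mine):

```python
import numpy as np

def online_perceptron(stream, d):
    """Online perceptron over a stream of (x, y) pairs.

    stream: iterable of (x, y), with x a length-d array (x[0] = 1
            for the bias) and y in {+1, -1}.
    """
    theta = np.zeros(d)                # 1.) theta <- [0, ..., 0]
    for x, y in stream:                # 2-3.) receive (x^(i), y^(i))
        if y * (theta @ x) <= 0:       # 4.) prediction is incorrect
            theta = theta + y * x      # 5.) theta <- theta + y x
    return theta

# Linearly separable toy stream (first component is the bias feature).
data = [(np.array([1.0,  2.0]),  1),
        (np.array([1.0, -2.0]), -1),
        (np.array([1.0,  3.0]),  1)]
theta = online_perceptron(data, d=2)
print(theta)
```

Note the test on line 4 uses y^(i) θ^T x^(i) ≤ 0 rather than comparing h(x) to y, which handles the θ = 0 start and matches the misclassification condition from the cost function slide.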

Page 11: Linear Classification: The Perceptron

Online Perceptron Algorithm

See the perceptron in action: www.youtube.com/watch?v=vGwemZhPlsA
(Red points are labeled +; blue points are labeled −)

Based on slide by Alan Fern

Page 12: Linear Classification: The Perceptron

Batch Perceptron

1.)  Given training data {(x^(i), y^(i))}_{i=1}^{n}
2.)  Let θ ← [0, 0, ..., 0]
3.)  Repeat:
4.)    Let Δ ← [0, 0, ..., 0]
5.)    for i = 1 ... n, do
6.)      if y^(i) θ^T x^(i) ≤ 0   // prediction for the i-th instance is incorrect
7.)        Δ ← Δ + y^(i) x^(i)
8.)    Δ ← Δ/n   // compute average update
9.)    θ ← θ + αΔ
10.) Until ‖Δ‖_2 < ε

• Simplest case: α = 1 and don't normalize; this yields the fixed increment perceptron
• Guaranteed to find a separating hyperplane if one exists

Based on slide by Alan Fern
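The batch pseudocode can be sketched directly in Python. The `max_iters` guard is my addition (so the loop also terminates on non-separable data); everything else follows the numbered steps:

```python
import numpy as np

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_iters=1000):
    """Batch perceptron; max_iters is an added safeguard for
    non-separable data, not part of the original pseudocode."""
    n, d = X.shape
    theta = np.zeros(d)                        # 2.) theta <- [0, ..., 0]
    for _ in range(max_iters):                 # 3.) repeat:
        delta = np.zeros(d)                    # 4.) delta <- [0, ..., 0]
        for i in range(n):                     # 5.) for i = 1..n
            if y[i] * (X[i] @ theta) <= 0:     # 6.) instance i misclassified
                delta += y[i] * X[i]           # 7.) accumulate the update
        delta /= n                             # 8.) average update
        theta += alpha * delta                 # 9.) take the averaged step
        if np.linalg.norm(delta) < eps:        # 10.) until ||delta||_2 < eps
            break
    return theta

# Separable toy set; column 0 is the constant bias feature x0 = 1.
X = np.array([[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
theta = batch_perceptron(X, y)
print(theta)
```

Unlike the online version, each pass accumulates all misclassification corrections into Δ and applies them in one averaged step, so the update direction is the (sub)gradient of J_p(θ) from the cost-function slide.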

Page 13: Linear Classification: The Perceptron

Improving the Perceptron
• The Perceptron produces many θ's during training
• The standard Perceptron simply uses the final θ at test time
  – This may sometimes not be a good idea!
  – Some other θ may be correct on 1,000 consecutive examples, but one mistake ruins it!
• Idea: Use a combination of multiple perceptrons
  – (i.e., neural networks!)
• Idea: Use the intermediate θ's
  – Voted Perceptron: vote on predictions of the intermediate θ's
  – Averaged Perceptron: average the intermediate θ's

Based on slide by Piyush Rai
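The averaged variant is a small change to the training loop. The sketch below is one common way to realize it (keep a running sum of the intermediate θ's, one snapshot per training instance, and predict with their average); details such as the snapshot schedule are my assumptions, not specified on the slide:

```python
import numpy as np

def averaged_perceptron(X, y, epochs=10):
    """Train a perceptron, but return the average of the intermediate
    theta's (one snapshot after each training instance)."""
    n, d = X.shape
    theta = np.zeros(d)
    theta_sum = np.zeros(d)        # running sum of intermediate theta's
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ theta) <= 0:
                theta = theta + y[i] * X[i]
            theta_sum += theta     # record the current theta
    return theta_sum / (epochs * n)  # average of all snapshots

# Toy separable data; column 0 is the constant bias feature x0 = 1.
X = np.array([[1.0, 2.0], [1.0, -2.0], [1.0, 3.0], [1.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
theta_avg = averaged_perceptron(X, y)
print(theta_avg)
```

The averaged θ is dominated by weight vectors that survived many consecutive examples without a mistake, which is exactly the robustness argument made in the bullets above; the voted perceptron uses the same snapshots but takes a majority vote of their predictions instead.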