12
Linear Classification: The Perceptron Robot Image Credit: Viktoriya Sukhanova © 123RF.com These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

LinearClassification:ThePerceptron

RobotImageCredit:Viktoriya Sukhanova ©123RF.com

TheseslideswereassembledbyByronBoots,withonlyminormodificationsfromEricEaton’sslidesandgratefulacknowledgementtothemanyotherswhomadetheircoursematerialsfreelyavailableonline.Feelfreetoreuseoradapttheseslidesforyourownacademicpurposes,providedthatyouincludeproperattribution.

Page 2: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

LinearClassifiers• Ahyperplane partitionsintotwohalf-spaces

– Definedbythenormalvector• isorthogonaltoanyvectorlyingonthehyperplane

– Assumedtopassthroughtheorigin• Thisisbecauseweincorporatedbiastermintoitby

• Considerclassificationwith+1,-1labels...

2

Rd

✓ 2 Rd

✓ 2 Rd

✓0 x0 = 1

BasedonslidebyPiyush Rai

Page 3: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

LinearClassifiers• Linearclassifiers:representdecisionboundarybyhyperplane

– Notethat:

3

h(x) = sign(✓|x) sign(z) =

⇢1 if z � 0

�1 if z < 0where

x

| =⇥1 x1 . . . xd

⇤✓ =

2

6664

✓0✓1...✓d

3

7775

|x > 0 =) y = +1

|x < 0 =) y = �1

Page 4: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

• Theperceptronusesthefollowingupdateruleeachtimeitreceivesanewtraininginstance

– Ifthepredictionmatchesthelabel,makenochange– Otherwise,adjustθ

h(x) = sign(✓|x) sign(z) =

⇢1 if z � 0

�1 if z < 0where

✓j ✓j �↵

2

⇣h✓

⇣x

(i)⌘� y

(i)⌘x

(i)j

ThePerceptron

4

(x(i), y(i))

either2or-2

Page 5: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

✓j ✓j �↵

2

⇣h✓

⇣x

(i)⌘� y

(i)⌘x

(i)j

• Theperceptronusesthefollowingupdateruleeachtimeitreceivesanewtraininginstance

• Re-writeas(onlyuponmisclassification)

– Caneliminateα inthiscase,sinceitsonlyeffectistoscaleθbyaconstant,whichdoesn’taffectperformance

ThePerceptron

5

(x(i), y(i))

either2or-2

✓j ✓j + ↵y

(i)x

(i)j

PerceptronRule:Ifismisclassified,do✓ ✓ + y(i)x(i)

✓ ✓ + y(i)x(i)

Page 6: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

✓old

+

WhythePerceptronUpdateWorks

6

x

x ✓old

+✓new

✓old

+misclassified

BasedonslidebyPiyush Rai

Page 7: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

WhythePerceptronUpdateWorks• Considerthemisclassifiedexample(y =+1)– Perceptronwronglythinksthat

• Update:

• Notethat

• Therefore,islessnegativethan– So,wearemakingourselvesmorecorrect onthisexample!

7

|old

x < 0

new

= ✓

old

+ yx = ✓

old

+ x (since y = +1)✓

new

= ✓

old

+ yx = ✓

old

+ x (since y = +1)

|new

x = (✓old

+ x)|x= ✓

|old

x+ x

|x

|new

x = (✓old

+ x)|x= ✓

|old

x+ x

|x

kxk22 > 0

|new

x = (✓old

+ x)|x= ✓

|old

x+ x

|x

|old

x < 0

BasedonslidebyPiyush Rai

Page 8: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

ThePerceptronCostFunction• Theperceptronusesthefollowingcostfunction

– is0ifthepredictioniscorrect– Otherwise,itistheconfidenceinthemisprediction

8

Jp(✓) =1

n

nX

i=1

max(0,�y

(i)x

(i)✓)

Jp(✓) =1

n

nX

i=1

max(0,�y

(i)x

(i)✓)

BasedonslidebyAlanFern

Page 9: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

OnlinePerceptronAlgorithm

9BasedonslidebyAlanFern

1.) Let ✓ [0, 0, . . . , 0]2.) Repeat:

3.) Receive training example (x

(i), y(i))4.) if y(i)x(i)

✓ 0 // prediction is incorrect

5.) ✓ ✓ + y(i)x(i)

Onlinelearning– thelearningmodewherethemodelupdateisperformedeachtimeasingleobservationisreceived

Batchlearning– thelearningmodewherethemodelupdateisperformedafterobservingtheentiretrainingset

Page 10: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

OnlinePerceptronAlgorithm

10BasedonslidebyAlanFern

Redpointsarelabeled+

Bluepointsarelabeled-

Page 11: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

BatchPerceptron

11

1.) Given training data

�(x

(i), y(i)) n

i=12.) Let ✓ [0, 0, . . . , 0]2.) Repeat:

2.) Let � [0, 0, . . . , 0]3.) for i = 1 . . . n, do4.) if y(i)x(i)

✓ 0 // prediction for i

thinstance is incorrect

5.) � �+ y(i)x(i)

6.) � �/n // compute average update

6.) ✓ ✓ + ↵�8.) Until k�k2 < ✏

• Simplestcase:α=1anddon’tnormalize,yieldsthefixedincrementperceptron

• Guaranteedtofindaseparatinghyperplane ifoneexistsBasedonslidebyAlanFern

Page 12: Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

ImprovingthePerceptron• ThePerceptronproducesmanyθ‘s duringtraining• ThestandardPerceptronsimplyusesthefinalθ attesttime

– Thismaysometimesnotbeagoodidea!– Someotherθmaybecorrecton1,000consecutiveexamples,butonemistakeruinsit!

• Idea:Useacombinationofmultipleperceptrons– (i.e.,neuralnetworks!)

• Idea:Usetheintermediateθ‘s– VotedPerceptron:voteonpredictionsoftheintermediateθ‘s– AveragedPerceptron:averagetheintermediateθ‘s

12BasedonslidebyPiyush Rai