Machine Learning

Nearest Neighbor Classification
This lecture

• K-nearest neighbor classification
  – The basic algorithm
  – Different distance measures
  – Some practical aspects
• Voronoi Diagrams and Decision Boundaries
  – What is the hypothesis space?
• The Curse of Dimensionality
How would you color the blank circles?

(Figure: blank circles A, B, and C among colored training points)

If we based it on the color of their nearest neighbors, we would get:
A: Blue, B: Red, C: Red

Training data partitions the entire instance space (using labels of nearest neighbors)
Nearest Neighbors: The basic version

• Training examples are vectors xi associated with a label yi
  – E.g. xi = a feature vector for an email, yi = SPAM
• Learning: Just store all the training examples
• Prediction for a new example x
  – Find the training example xi that is closest to x
  – Predict the label of x to be the label yi associated with xi
K-Nearest Neighbors

• Training examples are vectors xi associated with a label yi
  – E.g. xi = a feature vector for an email, yi = SPAM
• Learning: Just store all the training examples
• Prediction for a new example x
  – Find the k closest training examples to x
  – Construct the label of x using these k points. How?
  – For classification: Every neighbor votes on the label. Predict the most frequent label among the neighbors.
  – For regression: Predict the mean value of the k nearest neighbors
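The prediction rule above can be sketched in a few lines of Python. This is a minimal illustration, not an efficient implementation; the function names and the toy data are ours, not from the slides.

```python
# Minimal k-nearest neighbors classification: store the training set,
# then at prediction time find the k closest points and take a vote.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, x, k):
    # "Learning" was just storing train_x / train_y; all work happens here.
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]  # most frequent label among the k neighbors

train_x = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
train_y = ["Blue", "Blue", "Red", "Red"]
print(knn_predict(train_x, train_y, (1.1, 1.0), 3))  # near the blue cluster: Blue
```

For regression, the `Counter` vote would be replaced by the mean of the neighbors' values.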
Instance based learning

• A class of learning methods
  – Learning: Storing examples with labels
  – Prediction: When presented a new example, predict its label using similar stored examples
• The K-nearest neighbors algorithm is an example of this class of methods
• Also called lazy learning, because most of the computation (in the simplest case, all computation) is performed only at prediction time

Questions?
Distance between instances

• How do we measure distances between instances?
• In general, a good place to inject knowledge about the domain
• The behavior of this approach can depend heavily on this choice
Distance between instances

Numeric features, represented as n-dimensional vectors
– Euclidean distance: d(x, z) = (Σi (xi − zi)²)^(1/2)
– Manhattan distance: d(x, z) = Σi |xi − zi|
– Lp-norm: d(x, z) = (Σi |xi − zi|^p)^(1/p)
  • Euclidean = L2
  • Manhattan = L1
  • Exercise: What is L∞?
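The three distances above can be written as one function of p. A small sketch (the function names are ours):

```python
# Lp-norm distances between numeric vectors.
def minkowski(x, z, p):
    # Lp-norm distance: (sum_i |x_i - z_i|^p)^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1.0 / p)

def euclidean(x, z):  # the L2 special case
    return minkowski(x, z, 2)

def manhattan(x, z):  # the L1 special case
    return minkowski(x, z, 1)

x, z = (0.0, 0.0), (3.0, 4.0)
print(manhattan(x, z))  # 7.0
print(euclidean(x, z))  # 5.0
```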
Distance between instances

What about symbolic/categorical features?

The most common distance is the Hamming distance
– Number of bits that are different
– Or: Number of features that have a different value
– Also called the overlap
– Example:
  X1: {Shape=Triangle, Color=Red, Location=Left, Orientation=Up}
  X2: {Shape=Triangle, Color=Blue, Location=Left, Orientation=Down}
  Hamming distance = 2
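The Hamming distance example above, as code (the feature dicts come from the slide; the helper name is ours):

```python
# Hamming distance over categorical features: count the features
# whose values differ between the two instances.
def hamming(x1, x2):
    return sum(1 for f in x1 if x1[f] != x2[f])

x1 = {"Shape": "Triangle", "Color": "Red", "Location": "Left", "Orientation": "Up"}
x2 = {"Shape": "Triangle", "Color": "Blue", "Location": "Left", "Orientation": "Down"}
print(hamming(x1, x2))  # 2 (Color and Orientation differ)
```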
Advantages

• Training is very fast
  – Just adding labeled instances to a list
  – More complex indexing methods can be used, which slow down learning slightly to make prediction faster
• Can learn very complex functions
• We always have the training data
  – For other learning algorithms, after training, we don't store the data anymore. What if we want to do something with it later…
Disadvantages

• Needs a lot of storage
  – Is this really a problem now?
• Prediction can be slow!
  – Naïvely: O(dN) for N training examples in d dimensions
  – More data will make it slower
  – Compare to other classifiers, where prediction is very fast
• Nearest neighbors are fooled by irrelevant attributes
  – Important and subtle
Questions?
Summary: K-Nearest Neighbors

• Probably the first "machine learning" algorithm
  – Guarantee: If there are enough training examples, the error of the nearest neighbor classifier converges to at most twice the error of the optimal (i.e. best possible) predictor
• In practice, use an odd K. Why?
  – To break ties
• How to choose K? Using a held-out set or by cross-validation
• Feature normalization could be important
  – Often a good idea to center the features to make them zero mean and unit standard deviation. Why?
  – Because different features could have different scales (weight, height, etc.); but the distance weights them equally
• Variants exist
  – Neighbors' labels could be weighted by their distance
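The feature normalization mentioned above can be sketched as a column-wise z-score. This assumes numeric features and a nonzero standard deviation per column; the helper name is ours.

```python
# Zero-mean, unit-variance standardization: rescale each feature so
# that all features contribute on a comparable scale to the distance.
import statistics

def standardize(rows):
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # assumes no constant column
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

rows = [(238.0, 0.026), (100.0, 0.011), (120.0, 0.040), (237.0, 0.095)]
scaled = standardize(rows)
# Each column of `scaled` now has mean ~0 and standard deviation ~1.
```

At prediction time, test points must be rescaled with the training-set means and standard deviations, not their own.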
Where are we?

• K-nearest neighbor classification
  – The basic algorithm
  – Different distance measures
  – Some practical aspects
• Voronoi Diagrams and Decision Boundaries
  – What is the hypothesis space?
• The Curse of Dimensionality
The decision boundary for KNN

Is the K nearest neighbors algorithm explicitly building a function?
– No, it never forms an explicit hypothesis

But we can still ask: Given a training set, what is the implicit function that is being computed?
The Voronoi Diagram

For any point x in a training set S, the Voronoi cell of x is a polytope consisting of all points closer to x than any other points in S.

The Voronoi diagram is the union of all Voronoi cells
• Covers the entire space

Voronoi diagrams of training examples

Points in the Voronoi cell of a training example are closer to it than any others.

(The picture uses Euclidean distance with 1-nearest neighbors.)

What about K-nearest neighbors?
– Also partitions the space, but with a much more complex decision boundary

What about points on the boundary? What label will they get?
Exercise

If you have only two training points, what will the decision boundary for 1-nearest neighbor be?
– The perpendicular bisector of the segment joining the two points
This lecture

• K-nearest neighbor classification
  – The basic algorithm
  – Different distance measures
  – Some practical aspects
• Voronoi Diagrams and Decision Boundaries
  – What is the hypothesis space?
• The Curse of Dimensionality
Why your classifier might go wrong

Two important considerations with learning algorithms:

• Overfitting: We have already seen this
• The curse of dimensionality
  – Methods that work with low dimensional spaces may fail in high dimensions
  – What might be intuitive for 2 or 3 dimensions does not always apply to high dimensional spaces

Check out the 1884 book Flatland: A Romance of Many Dimensions for a fun introduction to the fourth dimension
Of course, irrelevant attributes will hurt

Suppose we have 1000 dimensional feature vectors
– But only 10 features are relevant
– Distances will be dominated by the large number of irrelevant features

But even with only relevant attributes, high dimensional spaces behave in odd ways
The Curse of Dimensionality

Intuitions that are based on 2 or 3 dimensional spaces do not always carry over to high dimensional spaces.

Example 1: What fraction of the points in a cube lie outside the sphere inscribed in it?

In two dimensions: what fraction of the square (i.e. the cube) is outside the inscribed circle (i.e. the sphere)? For a square of side 2r, the fraction is 1 − πr²/(2r)² = 1 − π/4 ≈ 0.21.

In three dimensions: what fraction of the cube is outside the inscribed sphere? 1 − (4/3)πr³/(2r)³ = 1 − π/6 ≈ 0.48.

But distances do not behave the same way in high dimensions. As the dimensionality increases, this fraction approaches 1!

In high dimensions, most of the volume of the cube is far away from the center!
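Example 1 can be checked numerically using the standard volume formula for a d-dimensional ball of radius r, V_d(r) = π^(d/2) r^d / Γ(d/2 + 1). The function name is ours; the cube has side 2r with a radius-r inscribed ball, so we can take r = 1.

```python
# Fraction of a side-2 cube lying outside the inscribed unit ball.
import math

def fraction_outside_inscribed_ball(d):
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # volume of radius-1 ball
    cube = 2.0 ** d                                    # volume of side-2 cube
    return 1.0 - ball / cube

for d in (2, 3, 10, 20):
    print(d, fraction_outside_inscribed_ball(d))
```

For d = 2 and d = 3 this reproduces 1 − π/4 and 1 − π/6; by d = 20 essentially all of the cube is outside the ball.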
The Curse of Dimensionality

Intuitions that are based on 2 or 3 dimensional spaces do not always carry over to high dimensional spaces.

Example 2: What fraction of the volume of a unit sphere lies between radius 1 − ε and radius 1?

In two dimensions: what fraction of the area of the circle is in the thin outer ring? The fraction is 1 − (1 − ε)².

But distances do not behave the same way in high dimensions. In d dimensions, the fraction is 1 − (1 − ε)^d.

As d increases, this fraction goes to 1!

In high dimensions, most of the volume of the sphere is far away from the center!
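Example 2 follows from the fact that the volume of a d-dimensional ball scales as r^d, so the inner ball of radius 1 − ε holds a (1 − ε)^d fraction of the volume. A quick numeric check (the function name is ours):

```python
# Fraction of a unit ball's volume in the shell between radius 1-eps and 1.
def shell_fraction(eps, d):
    # Volume scales as r^d, so the inner ball holds (1-eps)^d of the volume.
    return 1.0 - (1.0 - eps) ** d

for d in (2, 10, 100, 1000):
    print(d, shell_fraction(0.01, d))
```

Even with ε = 0.01, by d = 1000 nearly all of the volume sits in that thin outer shell.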
Questions?
The Curse of Dimensionality

• Most of the points in high dimensional spaces are far away from the origin!
  – In 2 or 3 dimensions, most points are near the center
  – Need more data to "fill up the space"
• Bad news for nearest neighbor classification in high dimensional spaces
  – Even if most/all features are relevant, in high dimensional spaces, most points are equally far from each other!
  – The "neighborhood" becomes very large
  – Presents computational problems too
Dealing with the curse of dimensionality

• Most "real-world" data is not uniformly distributed in the high dimensional space
  – Different ways of capturing the underlying dimensionality of the space
  – E.g.: Dimensionality reduction techniques, manifold learning
• Feature selection is an art
  – Different methods exist
  – Select features, maybe by information gain
  – Try out different feature sets of different sizes and pick a good set based on a validation set
• Prior knowledge or preferences about the hypotheses can also help
Questions?
Summary: Nearest neighbors classification

• Probably the oldest and simplest learning algorithm
  – Prediction is expensive
  – Efficient data structures help. k-d trees: the most popular; work well in low dimensions
  – Approximate nearest neighbors may be good enough sometimes. Hashing based algorithms exist
• Requires a distance measure between instances
  – Metric learning: Learn the "right" distance for your problem
• Partitions the space into a Voronoi Diagram
• Beware the curse of dimensionality
Questions?
Exercises

1. What will happen when you choose K to be the number of training examples?
   – The label will always be the most common label in the training data

2. Suppose you want to build a nearest neighbors classifier to predict whether a beverage is a coffee or a tea using two features: the volume of the liquid (in milliliters) and the caffeine content (in grams). You collect the following data:

   Volume (ml)   Caffeine (g)   Label
   238           0.026          Tea
   100           0.011          Tea
   120           0.040          Coffee
   237           0.095          Coffee

   What is the label for a test point with Volume = 120, Caffeine = 0.013?
   – Coffee
   Why might this be incorrect?
   – Because Volume will dominate the distance
   How would you fix the problem?
   – Rescale the features. Maybe to zero mean, unit variance
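Exercise 2 can be worked through in code. The data is from the exercise; the helper names are ours, and we assume Euclidean distance with 1-nearest neighbor.

```python
# 1-NN on the coffee/tea data, with and without standardization.
import math
import statistics

data = [((238.0, 0.026), "Tea"), ((100.0, 0.011), "Tea"),
        ((120.0, 0.040), "Coffee"), ((237.0, 0.095), "Coffee")]
test = (120.0, 0.013)

def nn_label(points, x):
    # Label of the single nearest training point under Euclidean distance.
    return min(points, key=lambda p: math.dist(p[0], x))[1]

print(nn_label(data, test))  # Coffee: Volume dominates the raw distance

# Standardize each feature to zero mean, unit variance, using the
# training statistics for the test point as well.
cols = list(zip(*(p for p, _ in data)))
means = [statistics.mean(c) for c in cols]
stds = [statistics.pstdev(c) for c in cols]
scale = lambda x: tuple((v - m) / s for v, m, s in zip(x, means, stds))
scaled = [(scale(p), y) for p, y in data]

print(nn_label(scaled, scale(test)))  # Tea: caffeine now matters too
```

After rescaling, the test point's low caffeine content pulls it toward the teas rather than the 120 ml coffee.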