Machine Learning

Nearest Neighbor Classification
This lecture

• K-nearest neighbor classification
  – The basic algorithm
  – Different distance measures
  – Some practical aspects
• Voronoi Diagrams and Decision Boundaries
  – What is the hypothesis space?
• The Curse of Dimensionality
How would you color the blank circles?

(Figure: blank circles A, B, and C among colored training points)

If we based it on the color of their nearest neighbors, we would get:
A: Blue, B: Red, C: Red

Training data partitions the entire instance space (using labels of nearest neighbors)
Nearest Neighbors: The basic version

• Training examples are vectors xi associated with a label yi
  – E.g. xi = a feature vector for an email, yi = SPAM
• Learning: Just store all the training examples
• Prediction for a new example x
  – Find the training example xi that is closest to x
  – Predict the label of x to be the label yi associated with xi
K-Nearest Neighbors

• Training examples are vectors xi associated with a label yi
  – E.g. xi = a feature vector for an email, yi = SPAM
• Learning: Just store all the training examples
• Prediction for a new example x
  – Find the k closest training examples to x
  – Construct the label of x using these k points. How?
  – For classification: Every neighbor votes on the label. Predict the most frequent label among the neighbors.
  – For regression: Predict the mean value of the k nearest neighbors
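The prediction rule above can be sketched in a few lines of Python. This is a minimal illustration, not an efficient implementation; the function names and the toy data are ours, not from the slides.

```python
# Minimal k-nearest neighbors classification: store the training set,
# then at prediction time find the k closest points and take a vote.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, x, k):
    # "Learning" was just storing train_x / train_y; all work happens here.
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]  # most frequent label among the k neighbors

train_x = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
train_y = ["Blue", "Blue", "Red", "Red"]
print(knn_predict(train_x, train_y, (1.1, 1.0), 3))  # near the blue cluster: Blue
```

For regression, the `Counter` vote would be replaced by the mean of the neighbors' values.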
Instance based learning

• A class of learning methods
  – Learning: Storing examples with labels
  – Prediction: When presented a new example, predict its label using similar stored examples
• The K-nearest neighbors algorithm is an example of this class of methods
• Also called lazy learning, because most of the computation (in the simplest case, all computation) is performed only at prediction time

Questions?
Distance between instances

• How do we measure distances between instances?
• In general, a good place to inject knowledge about the domain
• The behavior of this approach can depend heavily on this choice
Distance between instances

Numeric features, represented as n-dimensional vectors
– Euclidean distance: d(x, z) = (Σi (xi − zi)²)^(1/2)
– Manhattan distance: d(x, z) = Σi |xi − zi|
– Lp-norm: d(x, z) = (Σi |xi − zi|^p)^(1/p)
  • Euclidean = L2
  • Manhattan = L1
  • Exercise: What is L∞?
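The three distances above can be written as one function of p. A small sketch (the function names are ours):

```python
# Lp-norm distances between numeric vectors.
def minkowski(x, z, p):
    # Lp-norm distance: (sum_i |x_i - z_i|^p)^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1.0 / p)

def euclidean(x, z):  # the L2 special case
    return minkowski(x, z, 2)

def manhattan(x, z):  # the L1 special case
    return minkowski(x, z, 1)

x, z = (0.0, 0.0), (3.0, 4.0)
print(manhattan(x, z))  # 7.0
print(euclidean(x, z))  # 5.0
```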
Distance between instances

What about symbolic/categorical features?

The most common distance is the Hamming distance
– Number of bits that are different
– Or: Number of features that have a different value
– Also called the overlap
– Example:
  X1: {Shape=Triangle, Color=Red, Location=Left, Orientation=Up}
  X2: {Shape=Triangle, Color=Blue, Location=Left, Orientation=Down}
  Hamming distance = 2
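The Hamming distance example above, as code (the feature dicts come from the slide; the helper name is ours):

```python
# Hamming distance over categorical features: count the features
# whose values differ between the two instances.
def hamming(x1, x2):
    return sum(1 for f in x1 if x1[f] != x2[f])

x1 = {"Shape": "Triangle", "Color": "Red", "Location": "Left", "Orientation": "Up"}
x2 = {"Shape": "Triangle", "Color": "Blue", "Location": "Left", "Orientation": "Down"}
print(hamming(x1, x2))  # 2 (Color and Orientation differ)
```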
Advantages

• Training is very fast
  – Just adding labeled instances to a list
  – More complex indexing methods can be used, which slow down learning slightly to make prediction faster
• Can learn very complex functions
• We always have the training data
  – For other learning algorithms, after training, we don't store the data anymore. What if we want to do something with it later…
Disadvantages

• Needs a lot of storage
  – Is this really a problem now?
• Prediction can be slow!
  – Naïvely: O(dN) for N training examples in d dimensions
  – More data will make it slower
  – Compare to other classifiers, where prediction is very fast
• Nearest neighbors are fooled by irrelevant attributes
  – Important and subtle
Questions?
Summary: K-Nearest Neighbors

• Probably the first "machine learning" algorithm
  – Guarantee: If there are enough training examples, the error of the nearest neighbor classifier converges to at most twice the error of the optimal (i.e. best possible) predictor
• In practice, use an odd K. Why?
  – To break ties
• How to choose K? Using a held-out set or by cross-validation
• Feature normalization could be important
  – Often a good idea to center the features to make them zero mean and unit standard deviation. Why?
  – Because different features could have different scales (weight, height, etc.); but the distance weights them equally
• Variants exist
  – Neighbors' labels could be weighted by their distance
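The feature normalization mentioned above can be sketched as a column-wise z-score. This assumes numeric features and a nonzero standard deviation per column; the helper name is ours.

```python
# Zero-mean, unit-variance standardization: rescale each feature so
# that all features contribute on a comparable scale to the distance.
import statistics

def standardize(rows):
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # assumes no constant column
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

rows = [(238.0, 0.026), (100.0, 0.011), (120.0, 0.040), (237.0, 0.095)]
scaled = standardize(rows)
# Each column of `scaled` now has mean ~0 and standard deviation ~1.
```

At prediction time, test points must be rescaled with the training-set means and standard deviations, not their own.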
Where are we?

• K-nearest neighbor classification
  – The basic algorithm
  – Different distance measures
  – Some practical aspects
• Voronoi Diagrams and Decision Boundaries
  – What is the hypothesis space?
• The Curse of Dimensionality
The decision boundary for KNN

Is the K nearest neighbors algorithm explicitly building a function?
– No, it never forms an explicit hypothesis

But we can still ask: Given a training set, what is the implicit function that is being computed?
The Voronoi Diagram

For any point x in a training set S, the Voronoi cell of x is a polytope consisting of all points closer to x than any other points in S.

The Voronoi diagram is the union of all Voronoi cells
• Covers the entire space

Voronoi diagrams of training examples

Points in the Voronoi cell of a training example are closer to it than any others.

(The picture uses Euclidean distance with 1-nearest neighbors.)

What about K-nearest neighbors?
– Also partitions the space, but with a much more complex decision boundary

What about points on the boundary? What label will they get?
Exercise

If you have only two training points, what will the decision boundary for 1-nearest neighbor be?
– The perpendicular bisector of the segment joining the two points
This lecture

• K-nearest neighbor classification
  – The basic algorithm
  – Different distance measures
  – Some practical aspects
• Voronoi Diagrams and Decision Boundaries
  – What is the hypothesis space?
• The Curse of Dimensionality
Why your classifier might go wrong

Two important considerations with learning algorithms:

• Overfitting: We have already seen this
• The curse of dimensionality
  – Methods that work with low dimensional spaces may fail in high dimensions
  – What might be intuitive for 2 or 3 dimensions does not always apply to high dimensional spaces

Check out the 1884 book Flatland: A Romance of Many Dimensions for a fun introduction to the fourth dimension
Of course, irrelevant attributes will hurt

Suppose we have 1000 dimensional feature vectors
– But only 10 features are relevant
– Distances will be dominated by the large number of irrelevant features

But even with only relevant attributes, high dimensional spaces behave in odd ways
The Curse of Dimensionality

Intuitions that are based on 2 or 3 dimensional spaces do not always carry over to high dimensional spaces.

Example 1: What fraction of the points in a cube lie outside the sphere inscribed in it?

In two dimensions: what fraction of the square (i.e. the cube) is outside the inscribed circle (i.e. the sphere)? For a square of side 2r, the fraction is 1 − πr²/(2r)² = 1 − π/4 ≈ 0.21.

In three dimensions: what fraction of the cube is outside the inscribed sphere? 1 − (4/3)πr³/(2r)³ = 1 − π/6 ≈ 0.48.

But distances do not behave the same way in high dimensions. As the dimensionality increases, this fraction approaches 1!

In high dimensions, most of the volume of the cube is far away from the center!
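Example 1 can be checked numerically using the standard volume formula for a d-dimensional ball of radius r, V_d(r) = π^(d/2) r^d / Γ(d/2 + 1). The function name is ours; the cube has side 2r with a radius-r inscribed ball, so we can take r = 1.

```python
# Fraction of a side-2 cube lying outside the inscribed unit ball.
import math

def fraction_outside_inscribed_ball(d):
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # volume of radius-1 ball
    cube = 2.0 ** d                                    # volume of side-2 cube
    return 1.0 - ball / cube

for d in (2, 3, 10, 20):
    print(d, fraction_outside_inscribed_ball(d))
```

For d = 2 and d = 3 this reproduces 1 − π/4 and 1 − π/6; by d = 20 essentially all of the cube is outside the ball.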
The Curse of Dimensionality

Intuitions that are based on 2 or 3 dimensional spaces do not always carry over to high dimensional spaces.

Example 2: What fraction of the volume of a unit sphere lies between radius 1 − ε and radius 1?

In two dimensions: what fraction of the area of the circle is in the thin outer ring? The fraction is 1 − (1 − ε)².

But distances do not behave the same way in high dimensions. In d dimensions, the fraction is 1 − (1 − ε)^d.

As d increases, this fraction goes to 1!

In high dimensions, most of the volume of the sphere is far away from the center!
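Example 2 follows from the fact that the volume of a d-dimensional ball scales as r^d, so the inner ball of radius 1 − ε holds a (1 − ε)^d fraction of the volume. A quick numeric check (the function name is ours):

```python
# Fraction of a unit ball's volume in the shell between radius 1-eps and 1.
def shell_fraction(eps, d):
    # Volume scales as r^d, so the inner ball holds (1-eps)^d of the volume.
    return 1.0 - (1.0 - eps) ** d

for d in (2, 10, 100, 1000):
    print(d, shell_fraction(0.01, d))
```

Even with ε = 0.01, by d = 1000 nearly all of the volume sits in that thin outer shell.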
Questions?
The Curse of Dimensionality

• Most of the points in high dimensional spaces are far away from the origin!
  – In 2 or 3 dimensions, most points are near the center
  – Need more data to "fill up the space"
• Bad news for nearest neighbor classification in high dimensional spaces
  – Even if most/all features are relevant, in high dimensional spaces, most points are equally far from each other!
  – The "neighborhood" becomes very large
  – Presents computational problems too
Dealing with the curse of dimensionality

• Most "real-world" data is not uniformly distributed in the high dimensional space
  – Different ways of capturing the underlying dimensionality of the space
  – E.g.: Dimensionality reduction techniques, manifold learning
• Feature selection is an art
  – Different methods exist
  – Select features, maybe by information gain
  – Try out different feature sets of different sizes and pick a good set based on a validation set
• Prior knowledge or preferences about the hypotheses can also help
Questions?
Summary: Nearest neighbors classification

• Probably the oldest and simplest learning algorithm
  – Prediction is expensive
  – Efficient data structures help. k-d trees: the most popular; work well in low dimensions
  – Approximate nearest neighbors may be good enough sometimes. Hashing based algorithms exist
• Requires a distance measure between instances
  – Metric learning: Learn the "right" distance for your problem
• Partitions the space into a Voronoi Diagram
• Beware the curse of dimensionality
Questions?
Exercises

1. What will happen when you choose K to be the number of training examples?
   – The label will always be the most common label in the training data

2. Suppose you want to build a nearest neighbors classifier to predict whether a beverage is a coffee or a tea using two features: the volume of the liquid (in milliliters) and the caffeine content (in grams). You collect the following data:

   Volume (ml)   Caffeine (g)   Label
   238           0.026          Tea
   100           0.011          Tea
   120           0.040          Coffee
   237           0.095          Coffee

   What is the label for a test point with Volume = 120, Caffeine = 0.013?
   – Coffee
   Why might this be incorrect?
   – Because Volume will dominate the distance
   How would you fix the problem?
   – Rescale the features. Maybe to zero mean, unit variance
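Exercise 2 can be worked through in code. The data is from the exercise; the helper names are ours, and we assume Euclidean distance with 1-nearest neighbor.

```python
# 1-NN on the coffee/tea data, with and without standardization.
import math
import statistics

data = [((238.0, 0.026), "Tea"), ((100.0, 0.011), "Tea"),
        ((120.0, 0.040), "Coffee"), ((237.0, 0.095), "Coffee")]
test = (120.0, 0.013)

def nn_label(points, x):
    # Label of the single nearest training point under Euclidean distance.
    return min(points, key=lambda p: math.dist(p[0], x))[1]

print(nn_label(data, test))  # Coffee: Volume dominates the raw distance

# Standardize each feature to zero mean, unit variance, using the
# training statistics for the test point as well.
cols = list(zip(*(p for p, _ in data)))
means = [statistics.mean(c) for c in cols]
stds = [statistics.pstdev(c) for c in cols]
scale = lambda x: tuple((v - m) / s for v, m, s in zip(x, means, stds))
scaled = [(scale(p), y) for p, y in data]

print(nn_label(scaled, scale(test)))  # Tea: caffeine now matters too
```

After rescaling, the test point's low caffeine content pulls it toward the teas rather than the 120 ml coffee.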