
© Eric Xing @ CMU, 2006-2010

Machine Learning

Introduction & Nonparametric Classifiers

Eric Xing

Lecture 1, August 12, 2010

Reading:


Source: http://www.cs.cmu.edu/~epxing/Class/10701/ and http://www.cs.cmu.edu/~epxing/Class/10708/

Machine Learning


Logistics

Textbooks:
Chris Bishop, Pattern Recognition and Machine Learning (required)
Tom Mitchell, Machine Learning
David MacKay, Information Theory, Inference, and Learning Algorithms
Daphne Koller and Nir Friedman, Probabilistic Graphical Models

Class resource: http://bcmi.sjtu.edu.cn/ds/

Hosts: Lu Baoliang (Shanghai Jiao Tong University); Xue Xiangyang (Fudan University)

Instructors:


Local Hosts and co-instructors


Logistics

Class mailing list: Adv.ml@gmail.com

Homework

Exam

Project


What is Learning?

Examples: apoptosis + medicine; grammatical rules; manufacturing procedures; natural laws; …

Inference

Learning is about seeking a predictive and/or executable understanding of natural/artificial subjects, phenomena, or activities from …


Machine Learning


Fetching a stapler from inside an office --- the Stanford STAIR robot


What is Machine Learning?

Machine Learning seeks to develop theories and computer systems for

representing; classifying, clustering and recognizing; reasoning under uncertainty; predicting; and reacting to …

complex, real-world data, based on the system's own experience with data, and (hopefully) under a unified model or mathematical framework that

can be formally characterized and analyzed;
can take into account human prior knowledge;
can generalize and adapt across data and domains;
can operate automatically and autonomously;
and can be interpreted and perceived by humans.


Where is Machine Learning being used, or where can it be useful?

Speech recognition

Information retrieval

Computer vision

Robotic control

Planning

Games

Evolution

Pedigree


Natural language processing and speech recognition

Most pocket speech recognizers or translators now run on some sort of learning device --- the more you use them, the smarter they become!


Object Recognition

Behind a security camera, there is most likely a computer that is learning and/or checking!


Robotic Control I

The best helicopter pilot is now a computer! It runs a program that learns how to fly and perform acrobatic maneuvers by itself --- no taped instructions, joysticks, or things like …

A. Ng 2005


Text Mining

Reading, digesting, and categorizing a vast text database is too much for humans!

We want:


Bioinformatics

[Slide shows several thousand bases of genomic DNA sequence (tgtatcccagcatattttacgtaaaaacaaaacg…), with the phrase "where is the gene" hidden inside it.]

Where is the gene?


Paradigms of Machine Learning

Supervised Learning: Given $D = \{X_i, Y_i\}$, learn $f(\cdot): Y_i = f(X_i)$, s.t. $Y_j = f(X_j)$ for a new $D_{new} = \{X_j\}$. (A minimal code sketch follows after this list.)

Unsupervised Learning: Given $D = \{X_i\}$, learn $f(\cdot): Y_i = f(X_i)$, s.t. $Y_j = f(X_j)$ for a new $D_{new} = \{X_j\}$.

Reinforcement Learning: Given $D = \{\text{env}, \text{actions}, \text{rewards}, \text{simulator/trace/real game}\}$, learn $\text{policy}: e \to a$ and $\text{utility}: a, e \to r$, s.t. play a new real game via actions $a_1, a_2, a_3, \ldots$

Active Learning: Given $D \sim G(\cdot)$, learn the $Y_j$'s of $D_{new}$, $f(\cdot)$, and the policy, s.t. generalize to new $D \sim G'(\cdot)$, for all $G'(\cdot)$.
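To make the supervised template concrete, here is a minimal Python sketch (toy data and a deliberately trivial learner, both invented for illustration) of learning f from D and applying it to a new input:

from collections import Counter

def learn(D):
    # Return a learned f(.) from labeled pairs D = [(x_i, y_i), ...].
    # This toy "learner" just memorizes the majority label.
    majority = Counter(y for _, y in D).most_common(1)[0][0]
    return lambda x_new: majority  # a real learner would actually use x_new

D = [([1.0, 2.0], "spam"), ([0.5, 0.1], "ham"), ([1.2, 1.8], "spam")]
f = learn(D)
print(f([0.9, 1.5]))  # -> spam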


Elements of Learning

Here are some important elements to consider before you start:

Task: Embedding? Classification? Clustering? Topic extraction? …

Data and other info: Input and output (e.g., continuous, binary, counts, …). Supervised, unsupervised, or a blend of everything? Prior knowledge? Bias?

Models and paradigms: BN? MRF? Regression? SVM? Bayesian/Frequentist? Parametric/Nonparametric?

Objective/Loss function: MLE? MCLE? Max margin? Log loss, hinge loss, square loss? …

Tractability and exactness trade-off: Exact inference? MCMC? Variational? Gradient? Greedy search? Online? Batch? Distributed?

Evaluation: Visualization? Human interpretability? Perplexity? Predictive accuracy?

It is better to consider one element at a time!


Theories of Learning

For the learned $F(\cdot\,; \theta)$:

Consistency (value, pattern, …); bias versus variance; sample complexity; learning rate; convergence; error bound; confidence; stability; …


Classification

Representing data:

Hypothesis (classifier)


Decision-making as dividing a high-dimensional space

Classification-specific Dist.: P(X|Y)

Class prior (i.e., "weight"): P(Y)

$$p(X \mid Y = 1) = p_1(X;\, \mu_1, \Sigma_1)$$

$$p(X \mid Y = 2) = p_2(X;\, \mu_2, \Sigma_2)$$


The Bayes Rule

What we just did leads to the following general expression:

This is Bayes Rule:

$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$
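As a quick numeric illustration (all probabilities invented), Bayes rule can be evaluated directly:

# Hypothetical two-class example of Bayes rule:
# posterior P(Y|X) = P(X|Y) P(Y) / P(X), where the evidence P(X)
# comes from summing the joint over both classes.
p_y = {1: 0.3, 2: 0.7}          # class priors P(Y)
p_x_given_y = {1: 0.8, 2: 0.1}  # likelihoods P(X|Y) at the observed X

p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)  # evidence P(X)
posterior = {y: p_x_given_y[y] * p_y[y] / p_x for y in p_y}
print(posterior)  # {1: 0.774..., 2: 0.225...}: class 1 wins despite its smaller prior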


The Bayes Decision Rule for Minimum Error

The a posteriori probability of a sample:

$$q_i(X) \equiv P(Y = i \mid X) = \frac{p(X \mid Y = i)\, P(Y = i)}{p(X)}$$

Bayes Test: decide the class $i$ with the largest posterior $q_i(X)$.

Likelihood Ratio (two classes): decide $Y = 1$ when

$$\ell(X) = \frac{p(X \mid Y = 1)}{p(X \mid Y = 2)} > \frac{P(Y = 2)}{P(Y = 1)}$$

Discriminant function: equivalently, decide $Y = 1$ when $h(X) = -\log \ell(X) < \log \frac{P(Y = 1)}{P(Y = 2)}$.
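A minimal sketch of this test for two 1-D Gaussian class-conditionals (all parameters invented for illustration):

import math

def gauss_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

prior = {1: 0.5, 2: 0.5}                 # class priors P(Y = i)
params = {1: (0.0, 1.0), 2: (3.0, 1.0)}  # (mu_i, sigma_i) for p(X | Y = i)

def bayes_decide(x):
    # Decide Y = 1 iff the likelihood ratio exceeds P(Y=2)/P(Y=1).
    ratio = gauss_pdf(x, *params[1]) / gauss_pdf(x, *params[2])
    return 1 if ratio > prior[2] / prior[1] else 2

print(bayes_decide(0.5), bayes_decide(2.9))  # -> 1 2 (boundary at x = 1.5 here)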


Example of Decision Rules

When each class is a normal …

We can write the decision boundary analytically in some cases … homework!!


Bayes Error

We must calculate the probability of error: the probability that a sample is assigned to the wrong class.

Given a datum X, what is the risk?

The Bayes error (the expected risk):
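In the two-class notation above, these quantities can be written as (a sketch of the standard definitions):

$$r(X) = \min\bigl[\,P(Y = 1 \mid X),\; P(Y = 2 \mid X)\,\bigr], \qquad E = \mathbb{E}[r(X)] = \int r(X)\, p(X)\, dX$$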


More on Bayes Error

Bayes error is a lower bound on the probability of classification error.

The Bayes classifier is the theoretically best classifier, i.e., the one that minimizes the probability of classification error.

Computing the Bayes error is in general a very complex problem. Why?

Density estimation:

Integrating the density function:


Learning Classifier

The decision rule:

Learning strategies

Generative Learning

Discriminative Learning

Instance-based Learning (store all past experience in memory): a special case of nonparametric classifier


Supervised Learning

K-Nearest-Neighbor Classifier: here the hypothesis h(X) is represented by all the data, together with an algorithm
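A concrete sketch of a voting kNN in plain Python (toy vectors and labels invented for illustration):

from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, x, k=3):
    # train: list of (vector, label) pairs; x: query vector.
    # Vote among the k training points closest to x.
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [([0.0, 0.0], "Arts"), ([0.1, 0.2], "Arts"),
         ([1.0, 1.0], "Sports"), ([0.9, 1.2], "Sports")]
print(knn_predict(train, [0.8, 0.9], k=3))  # -> Sports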


Recall: Vector Space Representation

         Doc 1   Doc 2   Doc 3   ...
Word 1     3       0       0     ...
Word 2     0       8       1     ...
Word 3    12       1      10     ...
...        0       1       3     ...
...        0       0       0     ...

Each document is a vector, one component for each term (= word). Normalize to unit length.

High-dimensional vector space: terms are axes (10,000+ dimensions, or even 100,000+); docs are vectors in this space.
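A small sketch of this representation (term counts invented): normalize each count vector to unit length so that similarity reflects direction rather than document length:

import math

def unit_normalize(v):
    # Scale v to unit L2 norm (leave the zero vector alone).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v

doc1 = unit_normalize([3, 0, 12, 0, 0])  # counts of Words 1..5 in Doc 1
doc2 = unit_normalize([0, 8, 1, 1, 0])
cosine = sum(a * b for a, b in zip(doc1, doc2))
print(cosine)  # in [0, 1] for nonnegative counts; higher = more similar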


Test Document = ?

[Figure: documents from three classes (Sports, Science, Arts) in the vector space, with an unlabeled test document]


1-Nearest Neighbor (kNN) classifier

[Figure: Sports / Science / Arts vector space; classification with k = 1]


2-Nearest Neighbor (kNN) classifier

[Figure: Sports / Science / Arts vector space; classification with k = 2]


3-Nearest Neighbor (kNN) classifier

[Figure: Sports / Science / Arts vector space; classification with k = 3]


K-Nearest Neighbor (kNN) classifier

[Figure: Sports / Science / Arts vector space; voting among k nearest neighbors]

Voting kNN


Classes in a Vector Space

[Figure: Sports / Science / Arts class regions in the vector space]


kNN Is Close to Optimal

Cover and Hart (1967): asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate [the error rate of a classifier knowing the model that generated the data].

In particular, the asymptotic error rate is 0 if the Bayes rate is 0.
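In formula form (the two-class Cover–Hart bound as usually stated, with $R^*$ the Bayes error rate and $R$ the asymptotic 1-NN error rate):

$$R^{*} \;\le\; R \;\le\; 2R^{*}(1 - R^{*}) \;\le\; 2R^{*}$$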

Where does kNN come from? Nonparametric density estimation


Nearest-Neighbor Learning Algorithm

Learning is just storing the representations of the training examples in D.

Testing instance x: Compute similarity between x and all examples in D. Assign x the category of the most similar example in D.

Does not explicitly compute a generalization or category prototypes.

Also called: case-based learning, memory-based learning, lazy learning


kNN is an instance of Instance-Based Learning

What makes an Instance-Based Learner?

A distance metric

How many nearby neighbors to look at?

A weighting function (optional)

How to fit with the local points?


Euclidean Distance Metric

$$D(x, x') = \sqrt{\sum_i \sigma_i^2\,(x_i - x'_i)^2}$$

Or equivalently,

$$D(x, x') = \sqrt{(x - x')^{T}\,\Sigma\,(x - x')}$$

Other metrics:

L1 norm: $|x - x'|$

L∞ norm: $\max |x - x'|$ (elementwise …)

Mahalanobis: as above, where $\Sigma$ is full and symmetric

Correlation, angle, Hamming distance, Manhattan distance, …
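A short sketch of a few of these metrics with NumPy (vectors and scales invented for illustration):

import numpy as np

x, xp = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.5])
sigma = np.array([1.0, 0.5, 2.0])  # per-axis scales (made up)
Sigma = np.diag(sigma ** 2)        # a full symmetric matrix also works

d_scaled_euclid = np.sqrt(np.sum(sigma ** 2 * (x - xp) ** 2))  # scaled Euclidean
d_l1 = np.sum(np.abs(x - xp))                                  # L1 / Manhattan
d_linf = np.max(np.abs(x - xp))                                # L-infinity
d_mahal = np.sqrt((x - xp) @ Sigma @ (x - xp))                 # Mahalanobis form
print(d_scaled_euclid, d_l1, d_linf, d_mahal)  # first and last agree for diagonal Sigma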


Case Study: kNN for Web Classification

Dataset: 20 Newsgroups (20 classes). Download: http://people.csail.mit.edu/jrennie/20Newsgroups/. 61,118 words, 18,774 documents, class labels with descriptions.
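One way to reproduce a setup like this today is sketched below with scikit-learn, which ships a loader for this same dataset (this is not the original experimental code, and the accuracy will depend on preprocessing choices):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

vec = TfidfVectorizer()  # tf-idf rows are L2-normalized by default
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, train.target)
print("accuracy:", clf.score(X_test, test.target))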


Results: Binary Classes


[Figure: accuracy vs. k for three binary tasks: alt.atheism vs. comp.graphics; rec.autos vs. rec.sport.baseball; comp.windows.x vs. rec.motorcycles]


Results: Multiple Classes


[Figure: accuracy vs. k for randomly selected 5-out-of-20 classes (10 runs, averaged) and for all 20 classes]


Is kNN ideal? … more later

Effect of Parameters

Sample size: the more the better, but this requires an efficient algorithm for nearest-neighbor search.

Dimensionality: the curse of dimensionality.

Density: how smooth?

Metric: the relative scalings in the distance metric affect region shapes.

Weight: spurious or less relevant points need to be downweighted.

K: how many neighbors?


Summary

Machine Learning is cool and useful!!

Paradigms of Machine Learning. Design elements of learning. Theories of learning.

Fundamental theory of classification: the Bayes optimal classifier.

Instance-based learning: kNN, a nonparametric classifier. A nonparametric method does not rely on any assumption concerning the structure of the underlying density function; very little "learning" is involved in these methods.

Good news: simple and powerful methods, flexible and easy to apply to many problems. The kNN classifier asymptotically approaches the Bayes classifier, which is theoretically the best classifier in the sense of minimizing the probability of classification error.

Bad news: high memory requirements, and very dependent on the scale factor for a specific problem.
