
Page 1:

© Eric Xing @ CMU, 2006-2010

Machine Learning

Introduction &

Nonparametric Classifiers

Eric Xing

Lecture 1, August 12, 2010

Reading:

Page 2:

Where does it come from:
http://www.cs.cmu.edu/~epxing/Class/10701/
http://www.cs.cmu.edu/~epxing/Class/10708/

Machine Learning

Page 3:

Logistics

Textbooks:
Chris Bishop, Pattern Recognition and Machine Learning (required)
Tom Mitchell, Machine Learning
David MacKay, Information Theory, Inference, and Learning Algorithms
Daphne Koller and Nir Friedman, Probabilistic Graphical Models

Class resource: http://bcmi.sjtu.edu.cn/ds/

Hosts: Lu Baoliang (Shanghai Jiao Tong University); Xue Xiangyang (Fudan University)

Instructors:

Page 4:

Local Hosts and co-instructors

Page 5:

Logistics

Class mailing list [email protected]

Homework

Exam

Project

Page 6:

What is Learning?

Learning is about seeking a predictive and/or executable understanding of natural/artificial subjects, phenomena, or activities from …

[Figure: inference of grammatical rules, manufacturing procedures, natural laws, … from data; example: apoptosis + medicine]

Page 7:

Machine Learning

Page 8:

Fetching a stapler from inside an office --- the Stanford STAIR robot

Page 9:

What is Machine Learning?

Machine Learning seeks to develop theories and computer systems for

representing; classifying, clustering and recognizing; reasoning under uncertainty; predicting; and reacting to …

complex, real-world data, based on the system's own experience with data,

and (hopefully) under a unified model or mathematical framework that:

can be formally characterized and analyzed;
can take into account human prior knowledge;
can generalize and adapt across data and domains;
can operate automatically and autonomously;
and can be interpreted and perceived by humans.

Page 10:

Where is Machine Learning being used, or where can it be useful?

Speech recognition

Information retrieval

Computer vision

Robotic control

Planning

Games

Evolution

Pedigree

Page 11:

Natural language processing and speech recognition

Most pocket speech recognizers and translators now run on some sort of learning device --- the more you play with/use them, the smarter they become!

Page 12:

Object Recognition

Behind a security camera, most likely there is a computer that is learning and/or checking!

Page 13:

Robotic Control I

The best helicopter pilot is now a computer! It runs a program that learns how to fly and perform acrobatic maneuvers by itself --- no taped instructions, joysticks, or anything like …

A. Ng 2005

Page 14:

Text Mining

Reading, digesting, and categorizing a vast text database is too much for humans!

We want:

Page 15:

Bioinformatics

[Two pages of raw genomic sequence (tgtatcccagcatattttacgt…), with the string "whereisthegene" buried in the middle]

Where is the gene?

Page 16:

Paradigms of Machine Learning

Supervised Learning: given D = {X_i, Y_i}, learn f(·): Y = f(X), s.t. D_new = {X_j} → {Y_j}

Unsupervised Learning: given D = {X_i}, learn f(·): Y = f(X), s.t. D_new = {X_j} → {Y_j}

Reinforcement Learning: given D = {env, actions, rewards, simulator/trace/real game}, learn policy: e → a and utility: (a, e) → r, s.t. in a new env/real game, take actions a_1, a_2, a_3, …

Active Learning: given D ~ G(·), learn f(·) and a query policy D' ~ G'(·), s.t. all Y_j for new D ~ G'(·) are predicted well

Page 17:

Elements of Learning Here are some important elements to consider before you start:

Task: Embedding? Classification? Clustering? Topic extraction? …

Data and other info: Input and output (e.g., continuous, binary, counts, …) Supervised or unsupervised, or a blend of everything? Prior knowledge? Bias?

Models and paradigms: BN? MRF? Regression? SVM? Bayesian/frequentist? Parametric/nonparametric?

Objective/Loss function: MLE? MCLE? Max margin? Log loss, hinge loss, square loss? …

Tractability and exactness trade-off: Exact inference? MCMC? Variational? Gradient? Greedy search? Online? Batch? Distributed?

Evaluation: Visualization? Human interpretability? Perplexity? Predictive accuracy?

It is better to consider one element at a time!

Page 18:

Theories of Learning

For the learned F(·; θ):

Consistency (value, pattern, …)
Bias versus variance
Sample complexity
Learning rate
Convergence
Error bound
Confidence
Stability
…

Page 19:

Classification

Representing data:

Hypothesis (classifier)

Page 20:

Decision-making as dividing a high-dimensional space

Classification-specific distributions: P(X|Y)

p(X|Y=1) = p_1(X; μ_1, Σ_1)
p(X|Y=2) = p_2(X; μ_2, Σ_2)

Class prior (i.e., "weight"): P(Y)

Page 21:

The Bayes Rule

What we just did leads to the following general expression:

P(Y|X) = P(X|Y) p(Y) / P(X)

This is Bayes' rule.
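As a toy illustration (not from the slides; the prior and likelihood numbers are made up), Bayes' rule in a few lines of Python:

```python
# A minimal sketch of Bayes' rule for a two-class problem.
prior = {1: 0.7, 2: 0.3}           # P(Y)
likelihood = {1: 0.05, 2: 0.20}    # p(X|Y) evaluated at one observed X

# Evidence: p(X) = sum_y p(X|Y=y) P(Y=y)
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Posterior: P(Y|X) = p(X|Y) p(Y) / p(X)
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}
print(posterior)  # {1: 0.368..., 2: 0.631...}
```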

Page 22:

The Bayes Decision Rule for Minimum Error

The a posteriori probability of a sample:

q_i(X) = P(Y=i|X) = p(X|Y=i) P(Y=i) / p(X) = π_i p(X|Y=i) / p(X)

Bayes test: decide class 1 if q_1(X) > q_2(X), and class 2 otherwise.

Likelihood ratio: ℓ(X) = p(X|Y=1) / p(X|Y=2), compared against the threshold π_2/π_1.

Discriminant function: h(X), e.g. h(X) = −log ℓ(X), compared against log(π_1/π_2).
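To make this concrete, here is a small sketch (my own illustrative setup, not from the slides) of the Bayes decision rule for two 1-D Gaussian classes via the likelihood ratio:

```python
# Sketch: Bayes decision rule for two 1-D Gaussian class-conditionals.
# Means, variances, and priors below are illustrative assumptions.
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, var):
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def bayes_decide(x, mu1=0.0, var1=1.0, pi1=0.5, mu2=2.0, var2=1.0, pi2=0.5):
    # Decide class 1 iff the likelihood ratio exceeds the prior ratio:
    # p(x|Y=1) / p(x|Y=2) > pi2 / pi1  <=>  q_1(x) > q_2(x)
    ratio = gaussian_pdf(x, mu1, var1) / gaussian_pdf(x, mu2, var2)
    return 1 if ratio > pi2 / pi1 else 2

print([bayes_decide(x) for x in (-1.0, 0.9, 1.1, 3.0)])  # [1, 1, 2, 2]
```

With equal priors and equal variances the rule reduces to picking the nearer mean, so the boundary here sits at x = 1.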

Page 23:

Example of Decision Rules

When each class-conditional distribution is a normal …

We can write the decision boundary analytically in some cases … homework!!
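For reference, the boundary comes from setting the log posterior ratio to zero; with Gaussian class-conditionals p(X|Y=i) = N(X; μ_i, Σ_i), a standard derivation (spelled out here rather than on the slide) gives:

```latex
\log\frac{P(Y=1\mid X)}{P(Y=2\mid X)}
  = -\tfrac{1}{2}(X-\mu_1)^\top\Sigma_1^{-1}(X-\mu_1)
    +\tfrac{1}{2}(X-\mu_2)^\top\Sigma_2^{-1}(X-\mu_2)
    -\tfrac{1}{2}\log\frac{|\Sigma_1|}{|\Sigma_2|}
    +\log\frac{\pi_1}{\pi_2} = 0
```

This is quadratic in X in general, and linear when Σ_1 = Σ_2.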

Page 24:

Bayes Error

We must calculate the probability of error --- the probability that a sample is assigned to the wrong class.

Given a datum X, what is the risk?

The Bayes error (the expected risk):
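The slide's formulas did not survive extraction; the standard two-class definitions are:

```latex
% Conditional risk at a point X: the posterior probability of the losing class
r(X) = \min\left[\,P(Y=1\mid X),\; P(Y=2\mid X)\,\right]

% Bayes error: the expected risk under the data distribution
E = \mathbb{E}_X\left[r(X)\right] = \int r(X)\,p(X)\,dX
```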

Page 25:

More on Bayes Error

The Bayes error is a lower bound on the probability of classification error.

The Bayes classifier is the theoretically best classifier, in the sense that it minimizes the probability of classification error.

Computing the Bayes error is in general a very hard problem. Why?

Density estimation:

Integrating the density function:

Page 26:

Learning Classifier

The decision rule:

Learning strategies

Generative Learning

Discriminative Learning

Instance-based Learning (store all past experience in memory): a special case of a nonparametric classifier

Page 27:

Supervised Learning

K-Nearest-Neighbor Classifier: h(X) is represented by all of the data, together with an algorithm

Page 28:

Recall: Vector Space Representation

         Doc 1   Doc 2   Doc 3   ...
Word 1       3       0       0   ...
Word 2       0       8       1   ...
Word 3      12       1      10   ...
...          0       1       3   ...
...          0       0       0   ...

Each document is a vector, with one component for each term (= word), normalized to unit length.

This is a high-dimensional vector space:

Terms are axes: 10,000+ dimensions, or even 100,000+
Docs are vectors in this space
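A minimal sketch of this representation (the toy documents below are made up for illustration):

```python
# Sketch: build unit-length term-count vectors for a toy corpus.
import math
from collections import Counter

docs = ["the game was a great game", "the election was over", "very clean match"]

vocab = sorted({w for d in docs for w in d.split()})  # terms are the axes

def doc_vector(doc):
    counts = Counter(doc.split())
    vec = [counts[w] for w in vocab]           # one component per term
    norm = math.sqrt(sum(v * v for v in vec))  # normalize to unit length
    return [v / norm for v in vec]

vectors = [doc_vector(d) for d in docs]
print(len(vocab), [round(v, 2) for v in vectors[0]])
```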

Page 29:

Test Document = ?

[Figure: documents in the vector space, grouped into classes Sports, Science, Arts --- where does the test document belong?]

Page 30:

1-Nearest Neighbor (kNN) classifier

[Figure: Sports / Science / Arts documents in the vector space]

Page 31:

2-Nearest Neighbor (kNN) classifier

[Figure: Sports / Science / Arts documents in the vector space]

Page 32:

3-Nearest Neighbor (kNN) classifier

[Figure: Sports / Science / Arts documents in the vector space]

Page 33:

K-Nearest Neighbor (kNN) classifier

[Figure: Sports / Science / Arts documents in the vector space]

Voting kNN

Page 34:

Classes in a Vector Space

[Figure: Sports / Science / Arts class regions in the vector space]

Page 35:

kNN Is Close to Optimal

Cover and Hart (1967): asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate [the error rate of a classifier that knows the model that generated the data].

In particular, the asymptotic error rate is 0 if the Bayes rate is 0.

Where does kNN come from? Nonparametric density estimation
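Stated precisely (the Cover–Hart bound, with R* the Bayes risk, R the asymptotic 1-NN risk, and c the number of classes):

```latex
R^{*} \;\le\; R \;\le\; R^{*}\!\left(2 - \frac{c}{c-1}\,R^{*}\right) \;\le\; 2R^{*}
```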

Page 36:

Nearest-Neighbor Learning Algorithm

Learning is just storing the representations of the training examples in D.

Testing instance x: Compute similarity between x and all examples in D. Assign x the category of the most similar example in D.

Does not explicitly compute a generalization or category prototypes.

Also called: case-based learning, memory-based learning, lazy learning.
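A minimal numpy sketch of this algorithm (the cosine similarity and majority vote are illustrative choices; the names are mine):

```python
# Sketch: voting kNN with cosine similarity. "Learning" is just storing D.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Compute similarity between x and all stored examples (cosine here).
    sims = X_train @ x / (np.linalg.norm(X_train, axis=1) * np.linalg.norm(x))
    # Take the k most similar examples and vote on their labels.
    top_k = np.argsort(sims)[-k:]
    return Counter(y_train[i] for i in top_k).most_common(1)[0][0]

# Toy usage with made-up 2-D points.
X = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]])
y = np.array(["Sports", "Sports", "Arts", "Arts"])
print(knn_predict(X, y, np.array([1.0, 0.0])))  # "Sports"
```

Cosine similarity fits the unit-length document vectors described earlier; other metrics slot in the same way.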

Page 37:

kNN is an instance of Instance-Based Learning

What makes an Instance-Based Learner?

A distance metric

How many nearby neighbors to look at?

A weighting function (optional)

How to relate to the local points?

Page 38:

Euclidean Distance Metric

D²(x, x') = Σ_i σ_i² (x_i − x'_i)²

Or equivalently,

D²(x, x') = (x − x')ᵀ Σ (x − x')

Other metrics:
L1 norm: |x − x'|
L∞ norm: max |x − x'| (elementwise …)
Mahalanobis: the quadratic form above, where Σ is full and symmetric
Correlation
Angle
Hamming distance, Manhattan distance, …
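A small sketch of a few of these metrics on made-up vectors (the Σ below is an illustrative full, symmetric matrix):

```python
# Sketch: a few of the distance metrics above, on made-up vectors.
import numpy as np

x = np.array([1.0, 2.0])
x2 = np.array([2.5, 0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])  # full, symmetric (illustrative)

d = x - x2
print(np.sqrt(d @ d))          # Euclidean: sqrt(sum_i (x_i - x'_i)^2)
print(np.abs(d).sum())         # L1 norm
print(np.abs(d).max())         # L-infinity norm (elementwise max)
print(np.sqrt(d @ Sigma @ d))  # quadratic form (x - x')^T Sigma (x - x')
```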

Page 39:

Case Study: kNN for Web Classification

Dataset: 20 Newsgroups (20 classes)
Download: http://people.csail.mit.edu/jrennie/20Newsgroups/
61,118 words, 18,774 documents
Class label descriptions

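A sketch of this kind of experiment using scikit-learn's copy of the dataset (one binary pair; the tf-idf vectorizer and the value of k are my assumptions, not necessarily what the slides used):

```python
# Sketch: kNN on 20 Newsgroups (one binary pair), via scikit-learn.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

cats = ["alt.atheism", "comp.graphics"]
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = TfidfVectorizer()  # produces unit-length tf-idf vectors by default
X_train, X_test = vec.fit_transform(train.data), vec.transform(test.data)

knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn.fit(X_train, train.target)
print("accuracy:", knn.score(X_test, test.target))
```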

Page 40:

Results: Binary Classes

[Figure: accuracy vs. k for three binary tasks: alt.atheism vs. comp.graphics; rec.autos vs. rec.sport.baseball; comp.windows.x vs. rec.motorcycles]

Page 41:

Results: Multiple Classes

[Figure: accuracy vs. k, for 5 classes randomly selected out of 20 (repeated over 10 runs and averaged) and for all 20 classes]

Page 42:

Is kNN ideal?

Page 43:

Is kNN ideal? … more later

Page 44:

Effect of Parameters

Sample size: the more the better; needs an efficient search algorithm for NN.

Dimensionality: curse of dimensionality.

Density: how smooth?

Metric: the relative scalings in the distance metric affect region shapes.

Weight: spurious or less relevant points need to be downweighted.

K: how many neighbors to look at.

Page 45:

Summary: Machine Learning is Cool and Useful!!

Paradigms of machine learning; design elements of learning; theories of learning.

Fundamental theory of classification: the Bayes optimal classifier.

Instance-based learning: kNN, a nonparametric classifier. A nonparametric method does not rely on any assumption concerning the structure of the underlying density function; very little "learning" is involved in these methods.

Good news: simple and powerful methods, flexible and easy to apply to many problems. The kNN classifier asymptotically approaches the Bayes classifier, which is theoretically the best classifier for minimizing the probability of classification error.

Bad news: high memory requirements; very dependent on the scale factor for a specific problem.