LOGO Classification I Lecturer: Dr. Bo Yuan E-mail: [email protected]


Overview

K-Nearest Neighbor Algorithm

Naïve Bayes Classifier


[Figure: portrait of Thomas Bayes.]


Classification


Definition

Classification is one of the fundamental skills for survival.

Food vs. Predator

A kind of supervised learning

Techniques for deducing a function from data

<Input, Output>

Input: a vector of features

Output: a Boolean value (binary classification) or integer (multiclass)

“Supervised” means:

A teacher or oracle is needed to label each data sample.

We will talk about unsupervised learning later.


Classifiers

[Figure: a scatter of students (Mary, Lisa, Jane, Jack, Peter, Tom, Sam, Helen) in the Height-Weight plane.]

A classifier is a function z = f(height, weight) mapping each sample to a label in {boy, girl}.


K-Nearest Neighbor Algorithm

The algorithm procedure:

Given a set of n training data in the form of <x, y>.

Given an unknown sample x′.

Calculate the distance d(x′, xi) for i=1 … n.

Select the K samples with the shortest distances.

Assign x′ the label that dominates the K samples.

It is the simplest classifier you will ever meet (I mean it!).

No Training (literally)

A memory of the training data is maintained.

All computation is deferred until classification.

Produces satisfactory results in many cases.

Should give it a go whenever possible.
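The procedure above can be sketched in a few lines of Python. A minimal illustration only: the toy height/weight samples, the value of K, and the use of Euclidean distance are assumptions for the example.

```python
from collections import Counter
import math

def knn_classify(train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs; x_new: feature vector.
    """
    # Compute the distance from x_new to every training sample (no training phase).
    dists = [(math.dist(x, x_new), y) for x, y in train]
    # Select the K samples with the shortest distances.
    nearest = sorted(dists)[:k]
    # Assign the label that dominates the K neighbors.
    return Counter(y for _, y in nearest).most_common(1)[0][0]

# Hypothetical (height, weight) data labeled boy/girl.
data = [((170, 60), 'boy'), ((175, 70), 'boy'), ((180, 75), 'boy'),
        ((155, 45), 'girl'), ((160, 50), 'girl'), ((165, 52), 'girl')]
print(knn_classify(data, (172, 65), k=3))  # 'boy'
```

Note that all the work happens at query time, which is exactly the "no training (literally)" property described above.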


Properties of KNN


Instance-Based Learning

No explicit description of the target function

Can handle complicated situations.


Properties of KNN

Dependent on the data distribution.

Can make mistakes at boundaries.

[Figure: a query point "?" assigned differently by a K=1 neighborhood and a K=7 neighborhood.]


Challenges of KNN

The Value of K

Non-monotonic impact on accuracy

Too Big vs. Too Small

Rules of thumb

Weights

Different features may have different impacts …

Distance

There are many different ways to measure the distance.

Euclidean, Manhattan …

Complexity

Need to calculate the distance between x′ and all training data.

In proportion to the size of the training data.

[Figure: accuracy as a function of K.]


Distance Metrics

Minkowski distance:  L_k(x, y) = ( Σ_{i=1}^{d} |x_i − y_i|^k )^{1/k}

Euclidean distance (k=2):  L_2(x, y) = ( Σ_{i=1}^{d} (x_i − y_i)^2 )^{1/2}

Manhattan distance (k=1):  L_1(x, y) = Σ_{i=1}^{d} |x_i − y_i|
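These metrics can be computed directly in plain Python. An illustrative sketch only; vectors are assumed to be equal-length lists.

```python
def minkowski(x, y, k):
    """L_k distance: the k-th root of the sum of |x_i - y_i|^k."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1 / k)

def euclidean(x, y):   # L_2
    return minkowski(x, y, 2)

def manhattan(x, y):   # L_1
    return minkowski(x, y, 1)

x, y = [0, 0], [3, 4]
print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7.0
```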


Distance Metrics


The shortest path between two points …


Mahalanobis Distance


Distance from a point to a point set


Mahalanobis Distance

D_M(x) = √( (x − μ)^T S^{−1} (x − μ) ),  where μ is the mean and S the covariance matrix of the point set.

For the identity matrix S, it reduces to the Euclidean distance:

D_M(x) = √( (x − μ)^T (x − μ) )

For a diagonal matrix S:

D_M(x) = √( Σ_{i=1}^{n} (x_i − μ_i)^2 / σ_i^2 )
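The point-to-set distance can be sketched with NumPy (assuming NumPy is available; the sample point set is made up for illustration):

```python
import numpy as np

def mahalanobis(x, points):
    """Distance from point x to a point set, using the set's mean
    and the inverse of its covariance matrix S."""
    mu = points.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(points, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ S_inv @ d))

# Hypothetical 2-D point set.
pts = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5], [4.0, 5.0], [5.0, 5.5]])
print(mahalanobis(np.array([3.0, 4.0]), pts))
```

By construction the distance from the set's own mean is zero, and points far from the set (relative to its spread) score high.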


Voronoi Diagram


[Figure: Voronoi diagram; cell boundaries are perpendicular bisectors between neighboring points.]


Structured Data

[Figure: measuring the distance between structured data samples, with example edge weights 0, 0.5 and 1.]


KD-Tree

function kdtree(list of points pointList, int depth)
{
    if pointList is empty
        return nil;

    // Select axis based on depth so that axis cycles through all valid values
    var int axis := depth mod k;

    // Sort point list and choose median as pivot element
    select median by axis from pointList;

    // Create node and construct subtrees
    var tree_node node;
    node.location := median;
    node.leftChild := kdtree(points in pointList before median, depth+1);
    node.rightChild := kdtree(points in pointList after median, depth+1);
    return node;
}



KD-Tree



Evaluation

Accuracy

Recall what we have learned in the first lecture …

Confusion Matrix

ROC Curve

Training Set vs. Test Set

N-fold Cross Validation

[Figure: N-fold cross validation; in each fold a different block of the data serves as the test set.]


LOOCV

Leave One Out Cross Validation

An extreme case of N-fold cross validation

N=number of available samples

Usually very time consuming but okay for KNN

Now, let’s try KNN+LOOCV …

All students in this class are given one of two labels.

Gender: Male vs. Female

Major: CS vs. EE vs. Automation
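The KNN+LOOCV exercise can be sketched as follows. The toy 1-D data and the value of K are assumptions; each sample in turn is held out as the single-element test set.

```python
from collections import Counter

def knn_predict(train, x_new, k):
    dists = sorted((abs(x - x_new), y) for x, y in train)
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Leave-One-Out CV: N folds, each holding out a single sample."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]          # leave sample i out
        correct += (knn_predict(train, x, k) == y)
    return correct / len(data)

# Hypothetical 1-D samples: a single feature (e.g. height) -> label.
data = [(150, 'F'), (155, 'F'), (160, 'F'), (170, 'M'), (175, 'M'), (180, 'M')]
print(loocv_accuracy(data, k=3))  # 1.0 on this cleanly separated toy set
```

LOOCV is expensive in general, but for KNN each fold is just one more distance scan, which is why the slide calls it "okay for KNN".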


10 Minutes …


Bayes Theorem

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)

Bayes Theorem:

P(A|B) = P(B|A) P(A) / P(B)

posterior = likelihood × prior / evidence


Fish Example

Salmon vs. Tuna

P(ω1)=P(ω2)

P(ω1)>P(ω2)

Additional information

P(ω_i | x) = P(x | ω_i) P(ω_i) / P(x)


Shooting Example

Probability of Kill

P(A): 0.6

P(B): 0.5

The target is killed with:

One shot from A

One shot from B

What is the probability that it is shot down by A?

C: The target is killed.

P(C) = 0.6 × 0.5 + 0.6 × 0.5 + 0.4 × 0.5 = 0.8

P(A|C) = P(C|A) P(A) / P(C) = (1 × 0.6) / 0.8 = 3/4


Cancer Example

ω1: Cancer; ω2: Normal

P(ω1)=0.008; P(ω2)=0.992

Lab Test Outcomes: + vs. –

P(+|ω1)=0.98; P(-|ω1)=0.02

P(+|ω2)=0.03; P(-|ω2)=0.97

Now someone has a positive test result…

Is he/she doomed?


Cancer Example

P(+|ω1) P(ω1) = 0.98 × 0.008 = 0.0078

P(+|ω2) P(ω2) = 0.03 × 0.992 = 0.0298

P(ω1|+) = 0.0078 / (0.0078 + 0.0298) = 0.21

P(ω1|+) < P(ω2|+)
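The arithmetic can be checked in a few lines, transcribing the slide's numbers directly:

```python
p_cancer, p_normal = 0.008, 0.992        # priors
p_pos_cancer, p_pos_normal = 0.98, 0.03  # P(+|cancer), P(+|normal)

# Unnormalized posteriors for a positive test result
joint_cancer = p_pos_cancer * p_cancer   # ~0.0078
joint_normal = p_pos_normal * p_normal   # ~0.0298

posterior = joint_cancer / (joint_cancer + joint_normal)
print(round(posterior, 2))  # 0.21
```

So even after a positive test, the low prior keeps the probability of cancer around 21%.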


Headache & Flu Example

H=“Having a headache”

F=“Coming down with flu”

P(H)=1/10; P(F)=1/40; P(H|F)=1/2

What does this mean?

One day you wake up with a headache …

Since 50% flu cases are associated with headaches …

I must have a 50-50 chance of coming down with flu!



Headache & Flu Example

P(F|H) = P(H|F) P(F) / P(H) = (1/2 × 1/40) / (1/10) = 1/8

[Figure: Venn diagram of the Flu and Headache events.]

The truth is …


Naïve Bayes Classifier

MAP: Maximum A Posteriori

ω_MAP = argmax_{ω_i} P(ω_i | a_1, a_2, …, a_n)

      = argmax_{ω_i} P(a_1, a_2, …, a_n | ω_i) P(ω_i) / P(a_1, a_2, …, a_n)

      = argmax_{ω_i} P(a_1, a_2, …, a_n | ω_i) P(ω_i)

Assuming the attributes are Conditionally Independent given the class:

ω_MAP = argmax_{ω_i} P(ω_i) Π_j P(a_j | ω_i)
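The final MAP rule can be sketched directly from counts. A minimal discrete Naïve Bayes; the tiny weather-style dataset is made up for illustration.

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (attribute_tuple, label). Returns priors and
    per-attribute value counts needed by the MAP rule."""
    priors = Counter(y for _, y in samples)
    cond = defaultdict(Counter)              # (label, position) -> value counts
    for attrs, y in samples:
        for j, a in enumerate(attrs):
            cond[(y, j)][a] += 1
    return priors, cond, len(samples)

def predict_nb(model, attrs):
    priors, cond, n = model
    best, best_score = None, -1.0
    for y, cnt in priors.items():
        # P(y) * prod_j P(a_j | y), estimated by relative frequencies
        score = cnt / n
        for j, a in enumerate(attrs):
            score *= cond[(y, j)][a] / cnt
        if score > best_score:
            best, best_score = y, score
    return best

data = [(('sunny', 'hot'), 'no'), (('sunny', 'mild'), 'no'),
        (('rain', 'mild'), 'yes'), (('overcast', 'hot'), 'yes'),
        (('rain', 'cool'), 'yes')]
model = train_nb(data)
print(predict_nb(model, ('sunny', 'hot')))  # 'no'
```

Note the raw frequency estimates give probability zero to unseen values; Laplace smoothing (introduced later) fixes this.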


Independence

Independent:  P(A ∩ B) = P(A) P(B)

Since P(A ∩ B) = P(A) P(B|A), this is equivalent to P(B|A) = P(B).

Conditionally Independent:

P(A, B | G) = P(A|G) P(B|G)

Equivalently, P(A | G, B) = P(A | G):

P(A | B, G) = P(A, B, G) / P(B, G)
            = P(A, B | G) P(G) / (P(B | G) P(G))
            = P(A, B | G) / P(B | G)
            = P(A | G)


Conditional Independence

P(R ∩ B | Y) = P(R | Y) P(B | Y)


Independent ≠ Uncorrelated

Y = X²,  X ∈ [−1, 1]

Cov(X, Y) = 0  ⟹  X and Y are uncorrelated

However, Y is completely determined by X.

 X     | Y
 1     | 1
 0.5   | 0.25
 0.2   | 0.04
 0     | 0
−0.2   | 0.04
−0.5   | 0.25
−1     | 1

[Figure: the parabola Y = X² on [−1, 1].]

ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y) = E[(X − μ_X)(Y − μ_Y)] / (σ_X σ_Y)
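This can be verified numerically on the table's seven points (plain Python; the covariance is left unnormalized, which does not affect whether it is zero):

```python
xs = [1, 0.5, 0.2, 0, -0.2, -0.5, -1]
ys = [x ** 2 for x in xs]               # Y is fully determined by X

mx = sum(xs) / len(xs)                  # 0: the x values are symmetric
my = sum(ys) / len(ys)

# Unnormalized covariance; zero here because E[X^3] = 0 by symmetry
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
print(abs(cov) < 1e-9)  # True: X and Y are uncorrelated yet dependent
```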


Estimating P(αj|ωi)

[Table: five training samples with attributes α1, α2, α3 and class ω; three samples belong to ω1 (with α2 values +, −, +) and two to ω2.]

P(α2 = '+' | ω1) = 2/3

P(α2 = '−' | ω1) = 1/3

P(ω1) = 3/5;  P(ω2) = 2/5

Laplace Smoothing:

P(α_j = a_jk | ω_i) = (|α_j = a_jk ∧ ω_i| + 1) / (|ω_i| + |α_j|)

How about continuous variables?
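The smoothed estimate can be wrapped in a tiny helper (the function name is hypothetical; the counts below are the slide's own, where α2 takes two possible values):

```python
def laplace_estimate(count_value_and_class, count_class, n_values):
    """P(a_j = v | w_i) with add-one (Laplace) smoothing:
    (count + 1) / (class count + number of possible values of a_j)."""
    return (count_value_and_class + 1) / (count_class + n_values)

# 2 of the 3 w1 samples have a2 = '+', and a2 has 2 possible values.
print(laplace_estimate(2, 3, 2))  # (2+1)/(3+2) = 0.6
print(laplace_estimate(0, 3, 2))  # an unseen value still gets (0+1)/(3+2) = 0.2
```

The key benefit is the second line: no attribute value is ever assigned probability zero, so one unseen value cannot veto an entire class.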


Tennis Example

Day     Outlook   Temperature  Humidity  Wind    PlayTennis

Day1    Sunny     Hot          High      Weak    No
Day2    Sunny     Hot          High      Strong  No
Day3    Overcast  Hot          High      Weak    Yes
Day4    Rain      Mild         High      Weak    Yes
Day5    Rain      Cool         Normal    Weak    Yes
Day6    Rain      Cool         Normal    Strong  No
Day7    Overcast  Cool         Normal    Strong  Yes
Day8    Sunny     Mild         High      Weak    No
Day9    Sunny     Cool         Normal    Weak    Yes
Day10   Rain      Mild         Normal    Weak    Yes
Day11   Sunny     Mild         Normal    Strong  Yes
Day12   Overcast  Mild         High      Strong  Yes
Day13   Overcast  Hot          Normal    Weak    Yes
Day14   Rain      Mild         High      Strong  No


Tennis Example

Given: <Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong>

Predict: PlayTennis (yes or no)?

Bayes Solution:

P(PlayTennis = yes) = 9/14
P(PlayTennis = no) = 5/14
P(Wind = strong | PlayTennis = yes) = 3/9
P(Wind = strong | PlayTennis = no) = 3/5
...

P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = 0.0053

P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = 0.0206

The conclusion is not to play tennis, with probability 0.0206 / (0.0206 + 0.0053) = 0.795.
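A direct transcription of the counts from the 14-day table confirms the result:

```python
# 9 yes days and 5 no days in the table.
p_yes = (9/14) * (2/9) * (3/9) * (3/9) * (3/9)   # sunny, cool, high, strong | yes
p_no  = (5/14) * (3/5) * (1/5) * (4/5) * (3/5)   # sunny, cool, high, strong | no

print(round(p_yes, 4))                  # 0.0053
print(round(p_no, 4))                   # 0.0206
print(round(p_no / (p_yes + p_no), 3))  # 0.795
```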


Text Classification Example


Interesting? Boring?

Politics? Entertainment? Sports?


Text Representation


α1 α2 α3 α4 … αn ω

Long long ago there … king 1

New sanctions will be … Iran 0

Hidden Markov models are … method 0

The Federal Court today … investigate 0

We need to estimate probabilities such as P(α_i = w_k | ω_j).

However, there are 2×n×|Vocabulary| terms in total. For n=100 and a vocabulary of 50,000 distinct words, it adds up to 10 million terms!


Text Representation

By only considering the probability of encountering a specific word instead of the specific word position, we can reduce the number of probabilities to be estimated.

We only count the frequency of each word.

Now, 2×50,000=100,000 terms need to be estimated.

n: the total number of word positions in all training samples whose target value is ωi.

nk: the number of times word Vk is found among these n positions.

P(V_k | ω_i) = (n_k + 1) / (n + |Vocabulary|)
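The estimate can be sketched as follows. The two toy documents and their labels are made up for illustration.

```python
from collections import Counter

def word_probs(docs):
    """P(V_k | class) = (n_k + 1) / (n + |Vocabulary|), where n is the
    total number of word positions in the class's documents."""
    vocab = {w for words, _ in docs for w in words}
    probs = {}
    for label in {y for _, y in docs}:
        words = [w for ws, y in docs if y == label for w in ws]
        counts = Counter(words)
        n = len(words)
        probs[label] = {w: (counts[w] + 1) / (n + len(vocab)) for w in vocab}
    return probs

docs = [("the king ruled the land".split(), 'interesting'),
        ("new sanctions on trade".split(), 'boring')]
p = word_probs(docs)
print(p['interesting']['king'])  # (1+1)/(5+8) = 2/13
```

Note that word position is ignored entirely: only per-class word frequencies are kept, which is exactly the reduction described above.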


Case Study: Newsgroups

Classification

Joachims, 1996

20 newsgroups

20,000 documents

Random Guess: 5%

NB: 89%

Recommendation

Lang, 1995

NewsWeeder

User rated articles

Interesting vs. Uninteresting

Top 10% selected articles

16% vs. 59%



Reading Materials

C. C. Aggarwal, A. Hinneburg and D. A. Keim, “On the Surprising Behavior of Distance Metrics in High Dimensional Space,” Proc. the 8th International Conference on Database Theory, LNCS 1973, pp. 420-434, London, UK, 2001.

J. H. Friedman, J. L. Bentley, and R. A. Finkel, “An Algorithm for Finding Best Matches in Logarithmic Expected Time,” ACM Transactions on Mathematical Software, 3(3):209-226, 1977.

S. M. Omohundro, “Bumptrees for Efficient Function, Constraint, and Classification Learning,” Advances in Neural Information Processing Systems 3, pp. 693-699, Morgan Kaufmann, 1991.

Tom Mitchell, Machine Learning (Chapter 6), McGraw-Hill.

Additional reading about Naïve Bayes Classifier: http://www-2.cs.cmu.edu/~tom/NewChapters.html

Software for text classification using Naïve Bayes Classifier: http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html


Review

What is classification?

What is supervised learning?

What does KNN stand for?

What are the major challenges of KNN?

How to accelerate KNN?

What is N-fold cross validation?

What does LOOCV stand for?

What is Bayes Theorem?

What is the key assumption in Naïve Bayes Classifiers?



Next Week’s Class Talk

Volunteers are required for next week’s class talk.

Topic 1: Efficient KNN Implementations

Hints:

Ball Trees

Metric Trees

R Trees

Topic 2: Bayesian Belief Networks

Length: 20 minutes plus question time
