Page 1: Data Mining and Machine Learning Naive Bayes David Corne, HWU dwcorne@gmail.com

Data Mining and Machine Learning

Naive Bayes

David Corne, HWU

dwcorne@gmail.com

A very simple dataset – one field / one class

P34 level    Prostate cancer
High         Y
Medium       Y
Low          Y
Low          N
Low          N
Medium       N
High         Y
High         N
Low          N
Medium       Y

A new patient has a blood test: his P34 level is HIGH.

What is our best guess for prostate cancer?

It’s useful to know: P(cancer = Y)

- on the basis of this tiny dataset, P(cancer = Y) is 5/10 = 0.5
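As a minimal sketch of this count, assuming the ten rows are held as (P34 level, cancer) pairs; the names `data` and `prior` are illustrative and not from the slides:

```python
# The ten patients from the slide, as (P34 level, prostate cancer) pairs.
data = [
    ("High", "Y"), ("Medium", "Y"), ("Low", "Y"), ("Low", "N"), ("Low", "N"),
    ("Medium", "N"), ("High", "Y"), ("High", "N"), ("Low", "N"), ("Medium", "Y"),
]


def prior(rows, label):
    """Estimate P(class = label) as the fraction of rows with that class."""
    return sum(1 for _, c in rows if c == label) / len(rows)


p_yes = prior(data, "Y")
print(p_yes)  # 5 of the 10 rows are Y, so this prints 0.5
```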

But we know that P34 = H, so actually we want: P(cancer = Y | P34 = H)

- the probability that cancer is Y, given that P34 is High

- of the 3 patients with High P34, 2 have cancer, so this seems to be 2/3 ≈ 0.67

So we have:

P(c = Y | P34 = H) = 0.67
P(c = N | P34 = H) = 0.33

The class value with the highest probability is our best guess.
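The counting behind these two numbers, and the choice of the winner, might be sketched as follows. The helper name `posterior` is an assumption made for illustration, and the sketch assumes the requested level occurs at least once in the data:

```python
from collections import Counter

# The ten patients from the slide, as (P34 level, prostate cancer) pairs.
data = [
    ("High", "Y"), ("Medium", "Y"), ("Low", "Y"), ("Low", "N"), ("Low", "N"),
    ("Medium", "N"), ("High", "Y"), ("High", "N"), ("Low", "N"), ("Medium", "Y"),
]


def posterior(rows, level):
    """Estimate P(class | P34 = level) by counting only the matching rows."""
    counts = Counter(c for lvl, c in rows if lvl == level)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


probs = posterior(data, "High")       # roughly {"Y": 0.67, "N": 0.33}
best_guess = max(probs, key=probs.get)
print(best_guess)
```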

In general we may have any number of class values

P34 level    Prostate cancer
High         Y
Medium       Y
Low          Y
Low          N
Low          N
Medium       N
High         Y
High         N
High         Maybe
Medium       Y

Suppose again we know that P34 is High; here we have:

P(c = Y | P34 = H) = 0.5
P(c = N | P34 = H) = 0.25
P(c = Maybe | P34 = H) = 0.25

... and again, Y is the winner.
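The same counting works unchanged for any number of class values. A small sketch on the three-class table, using exact fractions so the values can be compared directly with the slide (the variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction

# The modified table, where one High patient is labelled "Maybe".
data3 = [
    ("High", "Y"), ("Medium", "Y"), ("Low", "Y"), ("Low", "N"), ("Low", "N"),
    ("Medium", "N"), ("High", "Y"), ("High", "N"), ("High", "Maybe"), ("Medium", "Y"),
]

# Count class values among the rows where P34 is High, then normalise.
counts = Counter(c for lvl, c in data3 if lvl == "High")
total = sum(counts.values())
probs = {c: Fraction(n, total) for c, n in counts.items()}

winner = max(probs, key=probs.get)
print(probs, winner)
```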

That is the essence of Naive Bayes, but the probability calculations are much trickier when there is more than one field, so we make a ‘naive’ assumption that makes them simpler.

Bayes’ theorem

As we saw, the table above illustrates:

P(cancer = Y | P34 = H)

And now we are illustrating

P(P34 = H | cancer = Y)

This is a different quantity, which turns out to be 2/5 = 0.4: of the 5 patients with cancer, 2 have High P34.
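To make the difference between the two conditionals concrete, here is a sketch that estimates both from the same ten rows. The generic helper `cond_prob` and the two predicates are hypothetical names introduced only for this example:

```python
# The ten patients from the slide, as (P34 level, prostate cancer) pairs.
data = [
    ("High", "Y"), ("Medium", "Y"), ("Low", "Y"), ("Low", "N"), ("Low", "N"),
    ("Medium", "N"), ("High", "Y"), ("High", "N"), ("Low", "N"), ("Medium", "Y"),
]


def is_high(row):
    return row[0] == "High"


def has_cancer(row):
    return row[1] == "Y"


def cond_prob(rows, event, given):
    """Estimate P(event | given): restrict to rows where `given` holds, then count `event`."""
    given_rows = [r for r in rows if given(r)]
    return sum(1 for r in given_rows if event(r)) / len(given_rows)


p_cancer_given_high = cond_prob(data, has_cancer, is_high)  # 2 of 3 High rows
p_high_given_cancer = cond_prob(data, is_high, has_cancer)  # 2 of 5 Y rows
print(p_cancer_given_high, p_high_given_cancer)
```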

Bayes’ theorem is this:

P(A | B) = P(B | A) × P(A) / P(B)

It is very useful when it is hard to get P(A | B) directly, but easier to get the things on the right
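As a quick numerical check of the theorem on the P34 dataset, take A to be "cancer = Y" and B to be "P34 = H". The three inputs below are read off the ten-row table (P(B) = 3/10 because 3 of the 10 patients have High P34); exact fractions are used so that the result can be compared with the 2/3 found earlier by direct counting:

```python
from fractions import Fraction

p_b_given_a = Fraction(2, 5)   # P(P34 = H | cancer = Y)
p_a = Fraction(5, 10)          # P(cancer = Y)
p_b = Fraction(3, 10)          # P(P34 = H)

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)
```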

Bayes’ theorem in the 1-non-class-field DMML context:

P(Class = X | Fieldval = F) = P(Fieldval = F | Class = X) × P(Class = X) / P(Fieldval = F)

We want to compute this for each class and choose the class that gives the highest value.

E.g. we compare:

P(F | Yes) × P(Yes)
P(F | No) × P(No)
P(F | Maybe) × P(Maybe)

... We can ignore the P(Fieldval = F) bit ... Why?
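One way to see the answer: P(Fieldval = F) is the same number for every class, so dividing by it rescales all the scores equally and cannot change which one is largest. A small sketch on the P34 = High case, with the likelihoods and priors read off the ten-row table (the dictionary names are illustrative):

```python
from fractions import Fraction

likelihood = {"Y": Fraction(2, 5), "N": Fraction(1, 5)}  # P(P34 = H | class)
prior = {"Y": Fraction(1, 2), "N": Fraction(1, 2)}       # P(class)

# Unnormalised scores: the numerators of Bayes' theorem.
scores = {c: likelihood[c] * prior[c] for c in prior}

# The shared denominator P(P34 = H) is just the sum of the numerators.
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

# Same winner with or without the division.
best_unnormalised = max(scores, key=scores.get)
best_normalised = max(posteriors, key=posteriors.get)
print(best_unnormalised, best_normalised)
```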

And that is exactly how we do Naive Bayes for a 1-field dataset.

Note how this relates to our beloved histograms

[Figure: histograms of P34 level (Low, Med, High) within each class, vertical axis from 0 to 0.6; labelled bars include P(L | N) and P(H | Y)]
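The bar heights in such a histogram are exactly the class-conditional probabilities that Naive Bayes uses. A sketch that computes them from the ten-row P34 table; the function name `class_histogram` is an assumption for this example:

```python
from collections import Counter

# The ten patients from the slide, as (P34 level, prostate cancer) pairs.
data = [
    ("High", "Y"), ("Medium", "Y"), ("Low", "Y"), ("Low", "N"), ("Low", "N"),
    ("Medium", "N"), ("High", "Y"), ("High", "N"), ("Low", "N"), ("Medium", "Y"),
]

LEVELS = ("Low", "Medium", "High")


def class_histogram(rows, label):
    """Return P(P34 = level | class = label) for each level, as relative frequencies."""
    levels_in_class = [lvl for lvl, c in rows if c == label]
    counts = Counter(levels_in_class)
    return {lvl: counts[lvl] / len(levels_in_class) for lvl in LEVELS}


hist_y = class_histogram(data, "Y")
hist_n = class_histogram(data, "N")
print(hist_y["High"], hist_n["Low"])  # the bars labelled P(H | Y) and P(L | N)
```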

Naive Bayes with many fields

P34 level    P61 level    BMI       Prostate cancer
High         Low          Medium    Y
Medium       Low          Medium    Y
Low          Low          High      Y
Low          High         Low       N
Low          Low          Low       N
Medium       Medium       Low       N
High         Low          Medium    Y
High         Medium       Low       N
Low          Low          High      N
Medium       High         High      Y

New patient: P34 = M, P61 = M, BMI = H

Best guess at the cancer field?

P(p34=M | Y) × P(p61=M | Y) × P(BMI=H | Y) × P(cancer = Y)
P(p34=M | N) × P(p61=M | N) × P(BMI=H | N) × P(cancer = N)

Which of these gives the highest value?

Plugging in the values estimated from the table:

Y: 0.4 × 0 × 0.4 × 0.5 = 0
N: 0.2 × 0.4 × 0.2 × 0.5 = 0.008

The N product is the larger, so the best guess is cancer = N.
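The individual factors can be recovered from the table by counting within each class. A sketch, assuming the rows are stored as (P34, P61, BMI, cancer) tuples; the helper name `factors` is illustrative:

```python
import math

# The ten patients as (P34 level, P61 level, BMI, prostate cancer).
rows = [
    ("High", "Low", "Medium", "Y"), ("Medium", "Low", "Medium", "Y"),
    ("Low", "Low", "High", "Y"), ("Low", "High", "Low", "N"),
    ("Low", "Low", "Low", "N"), ("Medium", "Medium", "Low", "N"),
    ("High", "Low", "Medium", "Y"), ("High", "Medium", "Low", "N"),
    ("Low", "Low", "High", "N"), ("Medium", "High", "High", "Y"),
]
new_patient = ("Medium", "Medium", "High")  # P34 = M, P61 = M, BMI = H


def factors(rows, instance, label):
    """Return [P(F1=v1 | label), ..., P(Fn=vn | label), P(label)], estimated by counting."""
    in_class = [r for r in rows if r[-1] == label]
    conditionals = [
        sum(1 for r in in_class if r[i] == v) / len(in_class)
        for i, v in enumerate(instance)
    ]
    return conditionals + [len(in_class) / len(rows)]


for label in ("Y", "N"):
    fs = factors(rows, new_patient, label)
    print(label, fs, math.prod(fs))
```

Note that a single zero factor, here P(p61 = M | Y), forces the whole Y product to zero regardless of the other fields.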

Naive Bayes in general

n fields, q possible class values. New unclassified instance: F1 = v1, F2 = v2, ..., Fn = vn

What is the class value? i.e. is it c1, c2, ... or cq?

Calculate each of these q products; the biggest one gives the class:

P(F1=v1 | c1) × P(F2=v2 | c1) × ... × P(Fn=vn | c1) × P(c1)
P(F1=v1 | c2) × P(F2=v2 | c2) × ... × P(Fn=vn | c2) × P(c2)
...
P(F1=v1 | cq) × P(F2=v2 | cq) × ... × P(Fn=vn | cq) × P(cq)
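A compact sketch of this general procedure for categorical fields is given below. It is one possible arrangement rather than a reference implementation: the function names `train` and `classify` and the count-table layout are assumptions, the class is taken to be the last element of each row, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train(rows):
    """Count class frequencies, and field-value frequencies within each class."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = defaultdict(Counter)  # (field index, class) -> Counter of values
    for *fields, label in rows:
        for i, v in enumerate(fields):
            value_counts[(i, label)][v] += 1
    return class_counts, value_counts


def classify(model, instance):
    """Return the class with the largest product, plus all q products."""
    class_counts, value_counts = model
    n_rows = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / n_rows                             # P(c)
        for i, v in enumerate(instance):
            score *= value_counts[(i, label)][v] / count   # P(Fi = vi | c)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Usage on the three-field prostate cancer table from the earlier slides.
rows = [
    ("High", "Low", "Medium", "Y"), ("Medium", "Low", "Medium", "Y"),
    ("Low", "Low", "High", "Y"), ("Low", "High", "Low", "N"),
    ("Low", "Low", "Low", "N"), ("Medium", "Medium", "Low", "N"),
    ("High", "Low", "Medium", "Y"), ("High", "Medium", "Low", "N"),
    ("Low", "Low", "High", "N"), ("Medium", "High", "High", "Y"),
]
label, scores = classify(train(rows), ("Medium", "Medium", "High"))
print(label, scores)
```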

Actually ... what we normally do, when there are more than a handful of fields, is this.

Calculate:

log(P(F1=v1 | c1)) + ... + log(P(Fn=vn | c1)) + log(P(c1))
log(P(F1=v1 | c2)) + ... + log(P(Fn=vn | c2)) + log(P(c2))
...

and choose the class based on the highest of these. Why?

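Because

log(a × b × c × …) = log(a) + log(b) + log(c) + …

and log is monotonically increasing, so the class with the highest sum of logs is also the class with the highest product. This means we won’t get underflow errors, which we would otherwise get with, e.g.

0.003 × 0.000296 × 0.001 × … [100 fields] × 0.042 …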
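The underflow is easy to reproduce. The sketch below repeats the four example values from the slide to stand in for a long list of small probabilities (the repetition count of 50 is an arbitrary choice for illustration), and it assumes every probability is strictly positive, since `math.log(0)` raises a `ValueError`:

```python
import math

# 200 small probabilities, built by repeating the slide's example values.
probs = [0.003, 0.000296, 0.001, 0.042] * 50

# Direct product: falls below the smallest representable double and becomes 0.0.
product = 1.0
for p in probs:
    product *= p

# Sum of logs: a large negative number, but perfectly representable.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

With the direct product, every class would end up with a score of 0.0 and the comparison would be meaningless; the log sums remain distinguishable.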

Deriving NB

The essence of Naive Bayes, with 1 non-class field, is to calculate this for each class value, given some new instance with Fieldval = F:

P(class = C | Fieldval = F)

For many fields, our new instance is (e.g.) (F1, F2, ..., Fn), and the ‘essence of Naive Bayes’ is to calculate this for each class:

P(class = C | F1, F2, F3, ..., Fn)

i.e. what is the probability of class C, given all these field values together?

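Apply magic dust and Bayes’ theorem, and ...

... if we make the naive assumption that all of the fields are independent of each other, given the class

(e.g. P(F1 | F2, C) = P(F1 | C), etc.) ... then

P(class = C | F1, F2, F3, ..., Fn) is proportional to

P(F1 and F2 and ... and Fn | C) × P(C)
= P(F1 | C) × P(F2 | C) × ... × P(Fn | C) × P(C)

which is what we calculate in NB. The omitted denominator, P(F1 and F2 and ... and Fn), is the same for every class, so it does not affect which class wins.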
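Written out step by step, the ‘magic dust’ might look like the following. This is a sketch of the standard argument rather than a transcription of the slides: the first line is Bayes’ theorem, the second drops the class-independent denominator, and the third applies the assumption that the fields are independent given the class.

```latex
\begin{align*}
P(C \mid F_1, \dots, F_n)
  &= \frac{P(F_1, \dots, F_n \mid C)\, P(C)}{P(F_1, \dots, F_n)} \\
  &\propto P(F_1, \dots, F_n \mid C)\, P(C) \\
  &= P(F_1 \mid C)\, P(F_2 \mid C) \cdots P(F_n \mid C)\, P(C)
\end{align*}
```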

EN(B)D