
Page 1:

[Figure: tic-tac-toe game boards numbered 2)–13) with minimax-style values v = 1 and v = –1, illustrating optimal play.]
INTRODUCTION TO ARTIFICIAL INTELLIGENCE

DATA15001

EPISODE 5: PROBABILISTIC INFERENCE

Page 2:

TODAY'S MENU

1. BAYES RULE
2. NAIVE BAYES CLASSIFIER
3. SPAM FILTER

Page 3:

HOW TO MAKE MONEY

Page 4:

HOW TO MAKE MONEY

FROM: "MARGARETTA NITA" <[email protected]>SUBJECT: SPECIAL OFFER : VIAGRA ON SALE AT $1.38 !!!X-BOGOSITY: YES, TESTS=BOGOFILTER, SPAMICITY=0.99993752, VERSION=2011-08-29DATE: MON, 26 SEP 2011 21:52:26 +0300X-CLASSIFICATION: JUNK - AD HOC SPAM DETECTED (CODE = 73)

SPECIAL OFFER : VIAGRA ON SALE AT $1.38 !!!

COMPARE THE BEST ONLINE PHARMACIES TO BUY VIAGRA. ORDER VIAGRA ONLINE WITH HUGE DISCOUNT.
MULTIPLE BENEFITS INCLUDE FREE SHIPPING, REORDER DISCOUNTS, BONUS PILLS
HTTP://RXPHARMACYCVS.RU

Page 5:

NAIVE BAYES FOR SPAM FILTERING

[Figure: the naive Bayes model as a directed graph, with a CLASS node pointing to the word nodes WORD1, WORD2, WORD3, WORD4, WORD6, WORD7.]

• General strategy:
  – estimate word probabilities in each class (spam/ham)
    => we'll return to this a bit later
  – use the estimates to compute the posterior probability P(spam | message),
    where message is the contents (words) of a new message
  – if P(spam | message) > 0.5, classify as spam

THE SEMANTICS OF THIS GRAPH: WORDS ARE "CONDITIONALLY INDEPENDENT OF EACH OTHER GIVEN THE CLASS"

Page 6:

NAIVE BAYES

Random variables:
– Class: spam/ham
– Word1
– Word2
– ...

Distributions:
– P(Class = spam) = 0.5
– P(Wordi = 'viagra' | spam) = 0.002
– P(Wordi = 'viagra' | ham) = 0.0001
– P(Wordi = '$' | spam) = 0.005
– P(Wordi = '$' | ham) = 0.0002
– P(Wordi = 'is' | spam) = 0.002
– P(Wordi = 'is' | ham) = 0.002
– P(Wordi = 'algorithm' | spam) = 0.0001
– P(Wordi = 'algorithm' | ham) = 0.002
– ...
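These toy numbers can be written straight into code. A minimal sketch in Python; the names (PRIOR_SPAM, WORD_PROBS) and the dictionary layout are mine, and the probabilities are the illustrative values from this slide, not estimates from real data:

    # illustrative naive Bayes parameters, copied from the slide above
    PRIOR_SPAM = 0.5                 # P(Class = spam)
    PRIOR_HAM = 1.0 - PRIOR_SPAM     # P(Class = ham)

    # P(Wordi = w | Class) for a handful of example words: (P(w | spam), P(w | ham))
    WORD_PROBS = {
        'viagra':    (0.002,  0.0001),
        '$':         (0.005,  0.0002),
        'is':        (0.002,  0.002),
        'algorithm': (0.0001, 0.002),
    }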

Page 7:

NAIVE BAYES

(Repeats the random variables and example distributions from the previous slide.)

Page 8:

NAIVE BAYES

Inference:

1. P(spam) = 0.5

2. P(spam | Word1 = 'viagra') = P(spam) P(Word1 = 'viagra' | spam) / P(Word1 = 'viagra')

BAYES RULES!

Page 9:

NAIVE BAYES

Inference:

1. P(spam) = 0.5

2. P(spam | Word1 = 'viagra') = P(spam) P(Word1 = 'viagra' | spam) / P(Word1 = 'viagra'),

   where P(Word1 = 'viagra') = P(spam) P(Word1 = 'viagra' | spam) + P(ham) P(Word1 = 'viagra' | ham)
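A quick numerical check of this formula, using the illustrative probabilities from the previous slides (the variable names below are hypothetical, not part of the course code):

    p_spam = 0.5
    p_ham = 1.0 - p_spam
    p_viagra_given_spam = 0.002    # P(Word1 = 'viagra' | spam)
    p_viagra_given_ham = 0.0001    # P(Word1 = 'viagra' | ham)

    # denominator via the law of total probability
    p_viagra = p_spam * p_viagra_given_spam + p_ham * p_viagra_given_ham

    # Bayes rule: P(spam | Word1 = 'viagra')
    posterior = p_spam * p_viagra_given_spam / p_viagra
    print(posterior)   # about 0.952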

Page 10:

NAIVE BAYES

Inference:

1. P(spam) = 0.5

2. P(spam | Word1 = 'viagra') = P(spam) P(Word1 = 'viagra' | spam) / P(Word1 = 'viagra')

3. P(spam | Word1 = 'viagra', Word2 = 'is')
   = P(spam) P(Word1 = 'viagra', Word2 = 'is' | spam) / P(Word1 = 'viagra', Word2 = 'is')

4. P(spam | Word1 = 'viagra', Word2 = 'is', Word3 = 'algorithm')
   = P(spam) P(Word1 = 'viagra', Word2 = 'is', Word3 = 'algorithm' | spam) / P(Word1 = 'viagra', Word2 = 'is', Word3 = 'algorithm')

Page 11:

NAIVE BAYES

Avoiding the denominator:

R = P(spam | message) / P(ham | message)   ⇔   P(spam | message) = R / (1 + R)

R = [ P(spam) P(message | spam) / P(message) ] / [ P(ham) P(message | ham) / P(message) ]

Page 12:

NAIVE BAYES

Avoiding the denominator:

R = P(spam | message) / P(ham | message)   ⇔   P(spam | message) = R / (1 + R)

R = [ P(spam) P(message | spam) / P(message) ] / [ P(ham) P(message | ham) / P(message) ]
  = P(spam) P(message | spam) / ( P(ham) P(message | ham) )

4. P(spam | Word1 = 'viagra', Word2 = 'is', Word3 = 'algorithm') / P(ham | Word1 = 'viagra', Word2 = 'is', Word3 = 'algorithm')
   = P(spam) P(Word1 = 'viagra', Word2 = 'is', Word3 = 'algorithm' | spam) / ( P(ham) P(Word1 = 'viagra', Word2 = 'is', Word3 = 'algorithm' | ham) )
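The same cancellation, sketched in Python: P(message) is never computed, and the posterior is recovered as R / (1 + R). The likelihood values below are made up purely for illustration:

    # hypothetical class-conditional likelihoods of one particular message
    p_message_given_spam = 1.2e-9
    p_message_given_ham = 3.0e-10
    p_spam, p_ham = 0.5, 0.5

    # odds ratio: P(message) cancels, so it is never needed
    R = (p_spam * p_message_given_spam) / (p_ham * p_message_given_ham)

    # back to a probability
    p_spam_given_message = R / (1.0 + R)
    print(R, p_spam_given_message)   # 4.0 0.8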

Page 13:

NAIVE BAYES

Factorization and the naive Bayes assumption:

P(Word1 = 'viagra', Word2 = 'is', Word3 = 'algorithm' | spam)
  = P(Word1 = 'viagra' | spam) · P(Word2 = 'is' | spam) · P(Word3 = 'algorithm' | spam)

• This is starting to look good, because terms like P(Word3 = 'algorithm' | spam) can be estimated from a training data set...

• ...whereas estimating the probability of combinations of three or more words (n-grams) would require massive data sets
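Under the assumption, the joint likelihood is just the product of the single-word estimates; a tiny sketch in Python (with the illustrative probabilities from the earlier slides):

    import math

    # P(Wordi = w | spam) for the three example words (illustrative values)
    p_word_given_spam = {'viagra': 0.002, 'is': 0.002, 'algorithm': 0.0001}

    # naive Bayes factorization of the joint likelihood
    message = ['viagra', 'is', 'algorithm']
    p_message_given_spam = math.prod(p_word_given_spam[w] for w in message)
    print(p_message_given_spam)   # 0.002 * 0.002 * 0.0001 = 4e-10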

Page 14:

NAIVE BAYES

Putting things together

P(spam | 'viagra', 'is', 'algorithm') / P(ham | 'viagra', 'is', 'algorithm')
  = P(spam) P('viagra' | spam) P('is' | spam) P('algorithm' | spam) / ( P(ham) P('viagra' | ham) P('is' | ham) P('algorithm' | ham) )
  = [ P(spam) / P(ham) ] · [ P('viagra' | spam) / P('viagra' | ham) ] · [ P('is' | spam) / P('is' | ham) ] · [ P('algorithm' | spam) / P('algorithm' | ham) ]

• Here we use the fact that (ABC) / (DEF) = (A/D) (B/E) (C/F)

• A nice routine calculation that smoothly becomes a program

Page 15:

NAIVE BAYES

Putting things together

def spamicity(words, estimates):
    # start from the prior odds P(spam) / P(ham)
    R = estimates.prior_spam / estimates.prior_ham
    # multiply by the likelihood ratio of each word in the message
    for w in words:
        R = R * estimates.spam_prob(w) / estimates.ham_prob(w)
    return R

• estimates.prior_spam = P(spam)
  estimates.prior_ham = P(ham) = 1 – P(spam)

• estimates.spam_prob(w) = P(Wordi = 'w' | spam)
  estimates.ham_prob(w) = P(Wordi = 'w' | ham)

• Recall that P(spam | message) = R / (1+R)
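One way the pieces might fit together as a runnable sketch: Estimates below is a hypothetical container with exactly the interface spamicity assumes (prior_spam, prior_ham, spam_prob, ham_prob), filled with the illustrative probabilities from earlier slides.

    class Estimates:
        """Hypothetical container for the estimated probabilities."""
        def __init__(self, prior_spam, word_probs):
            self.prior_spam = prior_spam          # P(spam)
            self.prior_ham = 1.0 - prior_spam     # P(ham)
            self._word_probs = word_probs         # word -> (P(w | spam), P(w | ham))

        def spam_prob(self, w):
            return self._word_probs[w][0]

        def ham_prob(self, w):
            return self._word_probs[w][1]

    estimates = Estimates(0.5, {
        'viagra':    (0.002,  0.0001),
        'is':        (0.002,  0.002),
        'algorithm': (0.0001, 0.002),
    })

    R = spamicity(['viagra', 'is', 'algorithm'], estimates)
    print(R)             # (0.002/0.0001) * (0.002/0.002) * (0.0001/0.002) = 1.0
    print(R / (1 + R))   # P(spam | message) = 0.5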

Page 16:

NAIVE BAYES: INTERIM SUMMARY

• What we need to estimate from training data:
  – prior probabilities P(spam) and P(ham)
  – conditional probabilities for words, P(Wordi = 'w' | spam) and P(Wordi = 'w' | ham)

• The conditional independence assumption, a.k.a. the "naive Bayes assumption", implies that P(Wordi = 'w', Wordj = 'v' | spam) = P(Wordi = 'w' | spam) P(Wordj = 'v' | spam)

• The essential quantity is the ratio P(Wordi= 'w'| spam) / P(Wordi = 'w' | ham)

Page 17:

NAIVE BAYES: INTERIM SUMMARY

Additional remarks

• The cost function is asymmetric:
  – it is worse to misclassify ham as spam than vice versa
  – a message should therefore be classified as spam only if P(spam | message) > a, for some a > 0.5

• Under- and overflows:
  – a product with many terms tends to become either very large or very small (close to zero)
  – this can be solved by computing log(R) instead of R
  – log(AB) = log(A) + log(B), log(A/B) = log(A) – log(B)
  – at the end, recover R as R = exp(log(R))

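A minimal sketch of a log-space version of the spamicity routine above (same hypothetical estimates interface); only the arithmetic changes, the returned R is the same:

    import math

    def log_spamicity(words, estimates):
        # work in log space so that the product of many small or large
        # ratios neither underflows nor overflows
        log_r = math.log(estimates.prior_spam) - math.log(estimates.prior_ham)
        for w in words:
            log_r += math.log(estimates.spam_prob(w)) - math.log(estimates.ham_prob(w))
        return math.exp(log_r)   # recover R = exp(log R) at the end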

Page 18:

NAIVE BAYES: ESTIMATION

Estimating the word probabilities from training data:

     spam (TOTAL 2386)                    ham (TOTAL 283736)
  count  word        freq              count  word        freq
      1  MONEY       0.04 %               21  ALGORITHM   0.01 %
    ...                                  ...
      5  VIAGRA      0.21 %               62  MONEY       0.02 %
    ...                                  ...
     10  IS          0.42 %             2199  FOR         0.78 %
    ...                                 2492  THAT        0.88 %
     19  REPLICA     0.80 %             2990  YOU         1.05 %
     20  EMAIL       0.84 %             3141  IN          1.11 %
     20  YOU         0.84 %             3160  I           1.11 %
     21  DATABASE    0.88 %             3218  AND         1.13 %
     25  EMAILS      1.05 %             3283  IS          1.16 %
     26  OF          1.09 %             3472  OF          1.22 %
     31  TO          1.30 %             3874  A           1.37 %
     43  AND         1.80 %             5442  TO          1.92 %
     48  THE         2.01 %             9196  THE         3.24 %

(freq = count / TOTAL, rounded)
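The percentages above are just word counts divided by the class's total word count; a small sketch of that estimation step, using a few of the counts from the table (the dictionary names are mine):

    # word counts in the spam and ham training corpora (a few rows of the table)
    spam_counts = {'money': 1, 'viagra': 5, 'is': 10, 'the': 48}
    ham_counts = {'algorithm': 21, 'money': 62, 'is': 3283, 'the': 9196}
    spam_total, ham_total = 2386, 283736

    # relative-frequency estimates: P(Wordi = w | class) = count(w) / total
    p_w_spam = {w: c / spam_total for w, c in spam_counts.items()}
    p_w_ham = {w: c / ham_total for w, c in ham_counts.items()}

    print(p_w_spam['is'])   # about 0.0042
    print(p_w_ham['is'])    # about 0.0116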

Page 19:

NAIVE BAYES: ESTIMATION

Estimating the word probabilities from training data:

(The same spam/ham word-frequency table as on the previous slide.)

P(Wordi = 'money' | spam) / P(Wordi = 'money' | ham) = 0.0004 / 0.0002 = 1.918 > 1
(1.918 is computed from the unrounded frequencies, (1 / 2386) / (62 / 283736))

P(Wordi = 'is' | spam) / P(Wordi = 'is' | ham) = 0.0042 / 0.0116 = 0.3622 < 1

Page 20:

NAIVE BAYES: ESTIMATION

Estimating the word probabilities from training data:

(The same spam/ham word-frequency table as on the previous slides.)

P('algorithm' | spam) = 0.0? This may lead to a situation where the ratio is 0/0! Solution: replace 0 by, e.g., 0.000001.
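One minimal way to implement this fix in the estimation step (the helper word_prob and its arguments are hypothetical; 0.000001 is the slide's example value):

    EPS = 0.000001   # the slide's example replacement for a zero probability

    def word_prob(word, counts, total):
        # relative frequency, but never exactly zero, so that the
        # spam/ham ratio can never become 0/0
        count = counts.get(word, 0)
        return count / total if count > 0 else EPS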