
Lecture 1 Bayesian inference and maximum likelihood (Idź na całość)


Page 1

Lecture 1 Bayesian inference and maximum likelihood

 

(Idź na całość, the Polish version of "Let's Make a Deal")

[Figure: the game. A car is hidden behind one of three gates; the other two hide a Zonk (the booby prize). Each first choice is correct with P = 1/3. Suppose the car is behind gate 1:

1st choice   Shown by moderator       Remain   Change
Gate 1       Gate 2 or 3 (P = 1/2)    win      lose
Gate 2       Gate 3                   lose     win
Gate 3       Gate 2                   lose     win

Changing wins in two of the three equally likely cases.]
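A quick way to check the 1/3 versus 2/3 split is simulation. Below is a minimal sketch (my addition, not part of the original slides) that plays the game many times under both strategies:

```python
import random

def play(change: bool) -> bool:
    """Play one round; return True if the contestant wins the car."""
    car = random.randrange(3)         # gate hiding the car
    choice = random.randrange(3)      # contestant's first choice
    # The moderator opens a gate that is neither the choice nor the car.
    shown = next(g for g in range(3) if g != choice and g != car)
    if change:                        # switch to the remaining closed gate
        choice = next(g for g in range(3) if g != choice and g != shown)
    return choice == car

n = 100_000
print("remain:", sum(play(False) for _ in range(n)) / n)  # ~1/3
print("change:", sum(play(True) for _ in range(n)) / n)   # ~2/3
```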

Page 2

The law of conditional probability:

$$p(A \cap B) = p(A \mid B)\,p(B) \quad\Rightarrow\quad p(A \mid B) = \frac{p(A \cap B)}{p(B)}$$

$$p(B \cap A) = p(B \mid A)\,p(A) \quad\Rightarrow\quad p(B \mid A) = \frac{p(B \cap A)}{p(A)}$$

Since $p(A \cap B) = p(B \cap A)$, we have $p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$, and therefore Bayes' theorem:

$$\underbrace{p(A \mid B)}_{\text{posterior}} = \frac{\overbrace{p(B \mid A)}^{\text{conditional}}\;\overbrace{p(A)}^{\text{prior}(A)}}{\underbrace{p(B)}_{\text{prior}(B)}}$$

Page 3

The law of total probability:

$$p(A) = \sum_{i=1}^{n} p(A \mid B_i)\,p(B_i)$$

[Figure: a Venn diagram of event A over a partition B1, B2, B3, with priors P(B1), P(B2), P(B3) and conditionals P(A|B1), P(A|B2), ...]

Inserting this into Bayes' theorem $p(A \mid B) = p(B \mid A)\,p(A)/p(B)$ gives

$$p(B_i \mid A) = \frac{p(B_i)\,p(A \mid B_i)}{\sum_{i=1}^{n} p(A \mid B_i)\,p(B_i)}$$

(Idź na całość) Assume we choose gate 1 (G1) at the first choice. We are looking for the probability p(G1|M3) that the car is behind gate 1 given that the moderator opened gate 3 (M3):

$$p(G1 \mid M3) = \frac{p(M3 \cap G1)}{p(M3)} = \frac{p(M3 \mid G1)\,p(G1)}{p(M3 \mid G1)\,p(G1) + p(M3 \mid G2)\,p(G2) + p(M3 \mid G3)\,p(G3)} = \frac{1/2 \cdot 1/3}{1/2 \cdot 1/3 + 1 \cdot 1/3 + 0 \cdot 1/3} = \frac{1}{3}$$

Hence remaining at gate 1 wins with probability 1/3, while changing to gate 2 wins with probability 2/3.
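As a check, a short sketch (my addition) that evaluates this formula for all three gates:

```python
# Priors: the car is equally likely behind each gate.
prior = {"G1": 1/3, "G2": 1/3, "G3": 1/3}
# Likelihoods p(M3 | Gi): we chose gate 1, the moderator opens gate 3.
likelihood = {"G1": 1/2, "G2": 1.0, "G3": 0.0}

evidence = sum(likelihood[g] * prior[g] for g in prior)   # p(M3) = 1/2
posterior = {g: likelihood[g] * prior[g] / evidence for g in prior}
print(posterior)  # {'G1': 0.333..., 'G2': 0.666..., 'G3': 0.0}
```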

Page 4

Calopteryx splendens

We study the occurrence of the damselfly Calopteryx splendens at small rivers. We know from the literature that C. splendens occurs at about 10% of all rivers. Occurrence depends on water quality. Suppose we have five quality classes that occur in 10% (class I), 15% (class II), 27% (class III), 43% (class IV), and 5% (class V) of all rivers. The probability of finding Calopteryx in these five classes is 1% (class I), 7% (class II), 14% (class III), 31% (class IV), and 47% (class V).

To which class does a river probably belong if we find Calopteryx (event A)?

$$p(\text{class I} \mid A) = \frac{p(A \mid \text{class I})\,p(\text{class I})}{\sum_{i=\text{I}}^{\text{V}} p(A \mid \text{class } i)\,p(\text{class } i)} = \frac{0.1 \cdot 0.01}{0.1 \cdot 0.01 + 0.15 \cdot 0.07 + 0.27 \cdot 0.14 + 0.43 \cdot 0.31 + 0.05 \cdot 0.47} = 0.0048$$

Likewise, p(class II|A) = 0.051, p(class III|A) = 0.183, p(class IV|A) = 0.647, p(class V|A) = 0.114.

Indicator values: finding Calopteryx most probably indicates a class IV river (p = 0.647).
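The same computation as a small sketch (my addition, using the numbers above):

```python
priors      = [0.10, 0.15, 0.27, 0.43, 0.05]   # p(class i)
likelihoods = [0.01, 0.07, 0.14, 0.31, 0.47]   # p(A | class i)

evidence = sum(p * l for p, l in zip(priors, likelihoods))  # p(A) ≈ 0.206
for i, (p, l) in enumerate(zip(priors, likelihoods), start=1):
    print(f"p(class {i} | A) = {p * l / evidence:.3f}")
# 0.005, 0.051, 0.183, 0.647, 0.114 — class IV is the most probable
```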

Page 5

Bayes and forensics

The false positive fallacy (the error of the prosecutor)

Let's take a standard DNA test for identifying persons. The test has a precision of more than 99%. What is the probability that we identify the wrong person?

[Figure: of 500 suspects, 1 person has identical DNA and tests positive; of the 499 persons with non-identical DNA, 4 nevertheless test positive and 495 test negative.]

Let c denote "the suspect is the right person (identical DNA)" and + a positive test:

$$p(c \mid +) = \frac{p(+ \mid c)\,p(c)}{p(+)} = \frac{1 \cdot 1/500}{5/500} = \frac{1}{5}$$

The forensic version of Bayes' theorem:

$$p(c \mid +) = \frac{p(+ \mid c)\,p(c)}{p(+ \mid c)\,p(c) + p(+ \mid \bar{c})\,p(\bar{c})} = \frac{1 \cdot \frac{1}{500}}{1 \cdot \frac{1}{500} + \frac{4}{499} \cdot \frac{499}{500}} = \frac{1}{5}$$

Despite the test's precision of more than 99%, a positive test points to the wrong person with probability 4/5.
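The numbers from the figure, evaluated directly (a sketch I added):

```python
p_c = 1 / 500            # prior: the suspect is the right person
p_pos_c = 1.0            # p(+ | c): the right person always tests positive
p_pos_notc = 4 / 499     # p(+ | not c): 4 false positives among 499

evidence = p_pos_c * p_c + p_pos_notc * (1 - p_c)   # p(+) = 5/500
print(p_pos_c * p_c / evidence)                     # p(c | +) = 0.2
```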

Page 6

The error of the advocate

In the trial of the American football star O. J. Simpson, one of his advocates (a Harvard professor) argued that although Simpson had sometimes beaten his wife, only very few men who beat their wives later murder them (about 0.1%).

[Figure: the whole population of 250 000 000 splits into beaten wives (250 000 000 − N) and not beaten wives (N). In each group the probability of being murdered by someone other than the husband is P = 1/10000; for beaten wives the probability of being murdered by the husband is also P = 1/10000. Among 10000 beaten wives we therefore expect one murdered by her husband for every one murdered otherwise: P = 1/2.]

The advocate's 0.1% answers the wrong question. Given that a beaten wife (b) was in fact murdered, the probability that the husband did it is

$$p(m_h \mid b, m) = \frac{p(m_h \mid b)}{p(m_h \mid b) + p(m_{\bar{h}} \mid b)} = \frac{1/10000}{1/10000 + 1/10000} = \frac{1}{2}$$

where $m_h$ denotes murder by the husband and $m_{\bar{h}}$ murder by somebody else.
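The same result from expected counts (a sketch I added, using the slide's rates):

```python
beaten = 10_000                       # cohort of beaten wives
p_murdered_by_husband = 1 / 10_000
p_murdered_otherwise  = 1 / 10_000

by_husband = beaten * p_murdered_by_husband   # ≈ 1 expected case
otherwise  = beaten * p_murdered_otherwise    # ≈ 1 expected case
# Given that a beaten wife was murdered:
print(by_husband / (by_husband + otherwise))  # 0.5, not 0.1%
```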

Page 7

Assume a hawk searches a terrain for prey. Let p(suc) be the probability of finding prey. Assume the hawk scans a number of pixels at a time that can be modelled as squares; hence it surveys a grid. Of course it knows where to search: it knows a priori probabilities p(prey) for each square of the virtual grid. However, there is a second probability for each square, the probability p(suc|prey) of finding the prey given that it is there. Our hawk will now systematically modify its a priori probabilities p(prey) according to its successive failures, and search where it assumes the highest probability of success.

Photo: Peter Schild

After an unsuccessful search ($\overline{suc}$) of a square, the prior is updated. From total probability, with $p(prey \mid suc) = 1$ and $p(suc) = p(suc \mid prey)\,p(prey)$:

$$p(prey) = p(prey \mid suc)\,p(suc) + p(prey \mid \overline{suc})\,p(\overline{suc}) = p(suc) + p(prey \mid \overline{suc})\,(1 - p(suc))$$

Solving for the updated prior:

$$p(prey \mid \overline{suc}) = \frac{p(suc)\,(1 - p(suc \mid prey))}{p(suc \mid prey)\,(1 - p(suc))}$$

Now let $A_T$ be the total area of search and $A_E$ the empty part without prey. $k$ denotes the number of successful hunts within the part of the area with prey; hence $p(suc \mid prey) = k/(A_T - A_E)$ and $p(suc) = k/A_T$. Substituting:

$$p(prey \mid \overline{suc}) = \frac{\frac{k}{A_T}\left(1 - \frac{k}{A_T - A_E}\right)}{\frac{k}{A_T - A_E}\left(1 - \frac{k}{A_T}\right)} = \frac{\frac{k}{A_T} \cdot \frac{A_T - A_E - k}{A_T - A_E}}{\frac{k}{A_T - A_E} \cdot \frac{A_T - k}{A_T}} = \frac{A_T - A_E - k}{A_T - k}$$
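A small numeric sketch of the update (my addition; the area and hunt numbers are illustrative, not from the slides):

```python
def updated_prior(a_t: float, a_e: float, k: float) -> float:
    """p(prey | no success) via the formula above."""
    p_suc_prey = k / (a_t - a_e)   # p(suc | prey)
    p_suc = k / a_t                # p(suc)
    return p_suc * (1 - p_suc_prey) / (p_suc_prey * (1 - p_suc))

a_t, a_e, k = 100.0, 40.0, 10.0
print(updated_prior(a_t, a_e, k))      # 0.555...
print((a_t - a_e - k) / (a_t - k))     # same value: (A_T - A_E - k)/(A_T - k)
```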

Page 8

Gains and costs

Assume a parasitic wasp that attacks clutches of aphids. The clutches are of different quality (size, exposure). The wasp visits one clutch after another; however, since it of course has competitors, it has to choose after a certain time. How long should the wasp search to make the best choice, that is, to attack the best clutch available in the given situation?

Photo: R. Long

We define gain and cost functions and apply the odds strategy.

For each clutch k, let $p_k$ be the probability that it is the best one; the odds are

$$o_k = \frac{p_k}{q_k} = \frac{p_k}{1 - p_k}$$

Summing the odds backwards from the last clutch n:

$$O_s = \sum_{k=s}^{n} o_k$$

Stopping rule:

Stop when the sum of the odds > 1.

Page 9

Clutch   p(best)   q = 1-p(best)   odds = p/q   Σ odds   Π q     P(S) = Π q · Σ odds
20       0.050     0.950           0.053        0.053    0.950   0.050
19       0.053     0.947           0.056        0.108    0.900   0.097368
18       0.056     0.944           0.059        0.167    0.850   0.141959
17       0.059     0.941           0.063        0.230    0.800   0.183609
16       0.063     0.938           0.067        0.296    0.750   0.222133
15       0.067     0.933           0.071        0.368    0.700   0.257324
14       0.071     0.929           0.077        0.445    0.650   0.288944
13       0.077     0.923           0.083        0.528    0.600   0.316717
12       0.083     0.917           0.091        0.619    0.550   0.340324
11       0.091     0.909           0.100        0.719    0.500   0.359386
10       0.100     0.900           0.111        0.830    0.450   0.373447
9        0.111     0.889           0.125        0.955    0.400   0.381953
8        0.125     0.875           0.143        1.098    0.350   0.384209
7        0.143     0.857           0.167        1.264    0.300   0.379322
6        0.167     0.833           0.200        1.464    0.250   0.366102
5        0.200     0.800           0.250        1.714    0.200   0.342881
4        0.250     0.750           0.333        2.048    0.150   0.307161
3        0.333     0.667           0.500        2.548    0.100   0.254774
2        0.500     0.500           1.000        3.548    0.050   0.177387
1        1.000     0.000           —            —        —       —

The success probability P(S) = Π q · Σ odds is maximal (0.384) at clutch 8, where the sum of the odds first exceeds 1.
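The table can be regenerated with a few lines (a sketch I added; as the p(best) column suggests, $p_k = 1/k$ where k is the clutch index counted from the end):

```python
# Odds strategy: accumulate odds and the product of q_k backwards.
n = 20
sum_odds, prod_q = 0.0, 1.0
for k in range(n, 1, -1):            # clutch 20 down to clutch 2
    p = 1 / k
    q = 1 - p
    sum_odds += p / q                # odds o_k = p/(1-p) = 1/(k-1)
    prod_q *= q
    p_success = prod_q * sum_odds    # win probability when stopping here
    print(f"{k:2d}  sum_odds={sum_odds:.3f}  prod_q={prod_q:.2f}  P(S)={p_success:.6f}")
```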

[Figure: success probability against the stopping number n (0 to 20); the curve rises to its maximum of about 0.384 at n = 8 and falls off toward n = 20. Axes: n (stopping number) versus success probability.]

The success probability of stopping at position r is

$$p(S) = \prod_{k=r}^{n}(1 - p_k)\sum_{k=r}^{n}\frac{p_k}{1 - p_k} = \prod_{k=r}^{n} q_k \sum_{k=r}^{n} o_k$$

With $p_k = 1/k$ this can be approximated by an integral:

$$p(S) = \sum_{a=r}^{n}\frac{1}{n}\,\frac{r-1}{a-1} \approx \frac{r}{n}\int_r^n \frac{da}{a} = \frac{r}{n}\ln\left(\frac{n}{r}\right)$$

Maximizing with respect to r:

$$\frac{dp(S)}{dr} = \frac{1}{n}\ln\left(\frac{n}{r}\right) - \frac{1}{n} = 0 \;\Rightarrow\; \ln\left(\frac{n}{r}\right) = 1 \;\Rightarrow\; r = \frac{n}{e}$$

For our clutch example r = 20/e = 7.358. This is called the 1/e stopping rule.

The wasp should attack the first clutch (from position r on) that is better than the best of the previous r − 1. The probability that the second best clutch is within the first r trials is p2 = r/(a − 1), where a is the position of the best clutch.
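A short numeric check (my addition) that the discrete optimum matches n/e:

```python
import math

n = 20

def p_success(r: int) -> float:
    """Exact P(S) for stopping position r, with p_k = 1/k."""
    prod_q = math.prod(1 - 1/k for k in range(r, n + 1))
    sum_o = sum((1/k) / (1 - 1/k) for k in range(r, n + 1))
    return prod_q * sum_o

best_r = max(range(2, n + 1), key=p_success)
print(best_r, round(p_success(best_r), 4))   # 8, 0.3842
print(round(n / math.e, 3))                  # 7.358 — the 1/e rule
```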

Page 10

Maximum likelihood

Suppose you studied 50 patients in a clinical trial and detected the presence of a certain bacterial disease in 30 of them.

What is the most probable frequency of this disease in the population?

Try several candidate frequencies p; the probability of observing 30 positives among 50 patients is binomial:

$$p_{0.5}(30 \mid 50) = \binom{50}{30}\left(\frac{1}{2}\right)^{50} = 0.042$$

$$p_{0.6}(30 \mid 50) = \binom{50}{30}\left(\frac{3}{5}\right)^{30}\left(\frac{2}{5}\right)^{20} = 0.115$$

$$p_{0.8}(30 \mid 50) = \binom{50}{30}\left(\frac{4}{5}\right)^{30}\left(\frac{1}{5}\right)^{20} = 0.001$$

In general, the likelihood of the parameter p given the observations is

$$L_p = f_p(x_1 \ldots x_n)$$

[Figure: the likelihood function L(p) for 30 positives out of 50, plotted for p from 0 to 1; it peaks at about L = 0.115 near p = 0.6.]

Maximizing the likelihood analytically:

$$L_p = \binom{50}{30}\,p^{30}(1-p)^{20}$$

$$\frac{dL_p}{dp} = \binom{50}{30}\left[30\,p^{29}(1-p)^{20} - p^{30}\,20\,(1-p)^{19}\right] = 0$$

$$3(1-p) = 2p \;\Rightarrow\; p = \frac{3}{5}$$
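A sketch (my addition) that evaluates the likelihood on a grid and recovers the maximum at p = 0.6:

```python
from math import comb

def likelihood(p: float, k: int = 30, n: int = 50) -> float:
    """Binomial likelihood L(p) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

for p in (0.5, 0.6, 0.8):
    print(p, round(likelihood(p), 3))        # 0.042, 0.115, 0.001

grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=likelihood))             # 0.6
```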

Page 11

It is usually easier to maximize the log-likelihood estimator ln(L_p):

$$\ln(L_p) = \ln\binom{50}{30} + 30\ln(p) + 20\ln(1-p)$$

$$\frac{d\ln L_p}{dp} = \frac{30}{p} - \frac{20}{1-p} = 0 \;\Rightarrow\; p = \frac{3}{5}$$
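The same maximum found from the log-likelihood (my addition; the constant ln C(50, 30) can be dropped since it does not depend on p):

```python
from math import log

def log_likelihood(p: float, k: int = 30, n: int = 50) -> float:
    """ln L(p) up to the constant ln C(n, k)."""
    return k * log(p) + (n - k) * log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)                             # 0.6
# The derivative 30/p - 20/(1-p) vanishes exactly at p = 3/5:
print(30 / p_hat - 20 / (1 - p_hat))     # ≈ 0
```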