
Tutorial 2 – the outline

Page 1: Tutorial 2  –  the outline


Example-1 from linear algebra

Conditional probability

Example 2: Bernoulli Distribution

Bayes' Rule

Example 3

Example 4 The game of three doors

Tutorial 2 – the outline

Page 2: Tutorial 2  –  the outline


A line can be written as $ax + by = 1$. You are given a number of example points $(x_1, y_1), \dots, (x_N, y_N)$.

Let
$$P = \begin{pmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_N & y_N \end{pmatrix}, \qquad M = \begin{pmatrix} a \\ b \end{pmatrix}.$$

• (A) Write a single matrix equation that expresses the constraint that each of these points lies on a single line.

• (B) Is it always the case that some M exists?

• (C) Write an expression for M assuming it does exist.

Example – 1: Linear Algebra

Page 3: Tutorial 2  –  the outline


(A) For all the points to lie on the line, a and b must satisfy the following set of simultaneous equations:

$$a x_1 + b y_1 = 1$$
$$\vdots$$
$$a x_N + b y_N = 1$$

This can be written much more compactly in matrix form (the linear regression equation): $PM = \mathbf{1}$, where $\mathbf{1}$ is an $N \times 1$ column vector of 1's.

(B) An M satisfying the matrix equation in part (A) will not exist unless all the points are collinear (i.e. fall on the same line). In general, three or more points may not be collinear.

(C) If M exists, then we can find it by inverting P; but since P is in general not a square matrix, $P^{-1}$ may not exist, so we use the left (pseudo-) inverse $(P^T P)^{-1} P^T$. Thus $M = (P^T P)^{-1} P^T \mathbf{1}$.

Example – 1: Linear Algebra
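As a quick numeric check of part (C), here is a minimal Python/NumPy sketch; the sample points below are illustrative (they lie exactly on the line x + 2y = 1) and are not taken from the tutorial.

```python
import numpy as np

# Four illustrative points lying exactly on the line x + 2y = 1 (a = 1, b = 2).
P = np.array([[ 1.0,  0.0],
              [ 0.0,  0.5],
              [-1.0,  1.0],
              [ 3.0, -1.0]])           # N x 2 matrix with rows (x_i, y_i)
ones = np.ones((P.shape[0], 1))        # the N x 1 column vector of 1's

# M = (P^T P)^{-1} P^T 1, the pseudo-inverse solution derived above.
M = np.linalg.inv(P.T @ P) @ P.T @ ones
print(M.ravel())                       # -> [1. 2.], i.e. a = 1, b = 2

# np.linalg.lstsq solves the same least-squares problem more robustly.
M_lstsq, *_ = np.linalg.lstsq(P, ones, rcond=None)
```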

Page 4: Tutorial 2  –  the outline


When 2 variables are statistically dependent, knowing the value of one of them lets us get a better estimate of the value of the other one. This is expressed by the conditional probability of x given y:

$$\Pr\{x = v_i \mid y = w_j\} = \frac{\Pr\{x = v_i,\ y = w_j\}}{\Pr\{y = w_j\}}, \quad \text{or} \quad P(x \mid y) = \frac{P(x, y)}{P(y)}.$$

If x and y are statistically independent, then
$$P(x \mid y) = P(x).$$

Conditional probability
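For concreteness, here is a tiny joint distribution with made-up numbers (not from the tutorial) showing how P(x | y) is computed and how the independence condition can fail.

```python
# Made-up joint probabilities P(x, y) over x in {sun, rain}, y in {warm, cold}.
P_xy = {('sun', 'warm'): 0.4, ('sun', 'cold'): 0.1,
        ('rain', 'warm'): 0.1, ('rain', 'cold'): 0.4}

# Marginal P(y = warm) and conditional P(x = sun | y = warm) = P(x, y) / P(y).
P_warm = sum(p for (x, y), p in P_xy.items() if y == 'warm')      # 0.5
P_sun_given_warm = P_xy[('sun', 'warm')] / P_warm                 # 0.8

# Independence would require P(x | y) = P(x); here it fails (0.8 vs. 0.5).
P_sun = sum(p for (x, y), p in P_xy.items() if x == 'sun')        # 0.5
print(P_sun_given_warm, P_sun)
```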

Page 5: Tutorial 2  –  the outline


Bayes' Rule

• The law of total probability: If event A can occur in m different ways A1, A2, …, Am, and if they are mutually exclusive, then the probability of A occurring is the sum of the probabilities of A1, A2, …, Am:

$$P(y) = \sum_x P(x, y).$$

From the definition of conditional probability:
$$P(x, y) = P(x \mid y)\, P(y) = P(y \mid x)\, P(x).$$

Page 6: Tutorial 2  –  the outline


Bayes' Rule

$$P(x \mid y) = \frac{P(x, y)}{P(y)} = \frac{P(y \mid x)\, P(x)}{P(y)}, \quad \text{or} \quad P(x \mid y) = \frac{P(y \mid x)\, P(x)}{\sum_x P(y \mid x)\, P(x)}$$

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$
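To make the roles of prior, likelihood, and evidence concrete, here is a minimal numeric sketch; the two-hypothesis setup and all of the numbers are invented for illustration.

```python
# Two hypotheses x in {0, 1}; made-up prior P(x) and likelihood P(y=1 | x).
prior = {0: 0.7, 1: 0.3}
likelihood = {0: 0.1, 1: 0.8}

# Evidence by the law of total probability: P(y=1) = sum_x P(y=1 | x) P(x).
evidence = sum(likelihood[x] * prior[x] for x in prior)

# Posterior P(x | y=1) = likelihood * prior / evidence.
posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}
print(posterior)   # {0: 0.2258..., 1: 0.7741...}
```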

Page 7: Tutorial 2  –  the outline


• For continuous random variables we refer to densities rather than probabilities; in particular,

• The Bayes’ rule for densities becomes:

Bayes' rule – continuous case

$$p(x \mid y) = \frac{p(x, y)}{p(y)}$$

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{\int p(y \mid x)\, p(x)\, dx}$$
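As a rough numeric sketch of the continuous case, the snippet below forms the posterior density on a grid by multiplying likelihood and prior and normalizing with a numerical integral; the Gaussian prior/likelihood and the grid are assumptions made only for illustration.

```python
import numpy as np

# Grid over x and a made-up Gaussian prior p(x) = N(0, 1).
xs = np.linspace(-5.0, 5.0, 2001)
dx = xs[1] - xs[0]
prior = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)

# Likelihood p(y | x) = N(x, 1) evaluated at an observed y.
y_obs = 1.5
likelihood = np.exp(-(y_obs - xs)**2 / 2) / np.sqrt(2 * np.pi)

# Evidence: numerical approximation of the integral of p(y | x) p(x) dx.
evidence = np.sum(likelihood * prior) * dx
posterior = likelihood * prior / evidence

print(np.sum(posterior) * dx)     # ~1.0: the posterior integrates to one
print(xs[np.argmax(posterior)])   # ~0.75: posterior mode for this Gaussian case
```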

Page 8: Tutorial 2  –  the outline


Call x a ‘cause’ and y an effect. Assuming x is present, we know the likelihood of observing y.

Bayes’ rule allows us to determine the probability of a cause x given an observation y. (Note that there may be many causes producing y.)

Bayes’ rule shows how the probability of x changes from the prior p(x), before we observe anything, to the posterior p(x | y), once we have observed y.

Bayes' formula - importance

Page 9: Tutorial 2  –  the outline


A random variable X has a Bernoulli distribution with parameter θ if it can assume the value 1 with probability θ and the value 0 with probability (1 − θ). The random variable X is also known as a Bernoulli variable with parameter θ and has the following probability mass function:
$$P(x; \theta) = \theta^{x} (1 - \theta)^{1 - x}, \qquad x \in \{0, 1\}.$$

The mean of a random variable X that has a Bernoulli distribution with parameter θ is
$$E(X) = 1 \cdot \theta + 0 \cdot (1 - \theta) = \theta.$$

The variance of X is
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 1^2 \cdot \theta + 0^2 \cdot (1 - \theta) - \theta^2 = \theta (1 - \theta).$$

Example – 2: Bernoulli Distribution
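A short simulation can sanity-check the mean and variance formulas; the value θ = 0.3 and the number of draws are arbitrary illustration choices.

```python
import random

# Empirical check of E(X) = theta and Var(X) = theta * (1 - theta).
theta, n_draws = 0.3, 200_000
draws = [1 if random.random() < theta else 0 for _ in range(n_draws)]

mean = sum(draws) / n_draws
var = sum((d - mean) ** 2 for d in draws) / n_draws

print(round(mean, 3), theta)               # ~0.3
print(round(var, 3), theta * (1 - theta))  # ~0.21
```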

Page 10: Tutorial 2  –  the outline


A random variable whose value represents the outcome of a coin toss (1 for heads, 0 for tails, or vice versa) is a Bernoulli variable with parameter θ, where θ is the probability that the outcome corresponding to the value 1 occurs. For an unbiased coin, where heads or tails are equally likely to occur, θ = 0.5.

For a Bernoulli random variable x_n the probability mass function is:
$$P(x_n \mid \theta) = P(x_n) = \theta^{x_n} (1 - \theta)^{1 - x_n}.$$

For N independent Bernoulli trials we have the random sample
$$\mathbf{x} = (x_0, x_1, \dots, x_{N-1}).$$

Example – 2: Bernoulli Distribution

Page 11: Tutorial 2  –  the outline


The distribution of the random sample is:
$$P(\mathbf{x}) = \prod_{n=0}^{N-1} \theta^{x_n} (1 - \theta)^{1 - x_n} = \theta^{k} (1 - \theta)^{N - k},$$
where
$$k = \sum_{n=0}^{N-1} x_n = \text{the number of ones in } \mathbf{x} = (x_0, x_1, \dots, x_{N-1}).$$

The distribution of the number of ones in N independent Bernoulli trials is:
$$P(k) = \binom{N}{k} \theta^{k} (1 - \theta)^{N - k}.$$

The joint probability of observing the sample x and the number of ones k is:
$$P(\mathbf{x}, k) = \begin{cases} P(\mathbf{x}), & k = \text{number of ones in } \mathbf{x} \\ 0, & \text{otherwise.} \end{cases}$$

Example – 2: Bernoulli Distribution

Page 12: Tutorial 2  –  the outline


The conditional probability of x given the number k of ones:
$$P(\mathbf{x} \mid k) = \frac{P(\mathbf{x}, k)}{P(k)} = \frac{\theta^{k} (1 - \theta)^{N - k}}{\binom{N}{k} \theta^{k} (1 - \theta)^{N - k}} = \frac{1}{\binom{N}{k}}.$$

Thus
$$P(\mathbf{x}) = P(\mathbf{x} \mid k)\, P(k) = \frac{1}{\binom{N}{k}}\, P(k).$$

Example – 2: Bernoulli Distribution
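The uniform conditional P(x | k) = 1 / C(N, k) can be checked by brute-force enumeration; N, k, and θ below are arbitrary illustration choices.

```python
from itertools import product
from math import comb

N, theta, k = 5, 0.3, 2

# All length-N binary sequences with exactly k ones, and their probabilities.
seqs = [s for s in product([0, 1], repeat=N) if sum(s) == k]
probs = [theta**sum(s) * (1 - theta)**(N - sum(s)) for s in seqs]

print(len(seqs) == comb(N, k))                # True: there are C(N, k) such sequences
print(len(set(probs)) == 1)                   # True: each sequence is equally likely
print(probs[0] / sum(probs), 1 / comb(N, k))  # both ~0.1: P(x | k) = 1 / C(N, k)
```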

Page 13: Tutorial 2  –  the outline


Assume that X is distributed according to the Gaussian density with mean μ = 0 and variance σ² = 1.
• (A) What is the probability that x = 0?

Assume that Y is distributed according to the Gaussian density with mean μ = 1 and variance σ² = 1.
• (B) What is the probability that y = 0?

Given the distribution
$$\Pr(Z = z) = \tfrac{1}{2}\Pr(X = z) + \tfrac{1}{2}\Pr(Y = z),$$
known as a mixture (i.e. half of the data points are generated by the X process and half by the Y process).
• (C) If Z = 0, what is the probability that the X process generated this data point?

Example - 3

Page 14: Tutorial 2  –  the outline


(A) Since p(x) is a continuous density, the probability that x = 0 is
$$\Pr(0 \le x \le 0) = \int_0^0 p(x)\, dx = 0.$$

(B) As in part (A), the probability that y = 0 is
$$\Pr(0 \le y \le 0) = \int_0^0 p(y)\, dy = 0.$$

(C) Let ω1 (ω2) be the state where the X (Y) process generates a data point. We want Pr(ω1 | Z = 0). Using Bayes’ rule and working with the probability densities to get the total probability:

Example – 3 solutions

Page 15: Tutorial 2  –  the outline


$$\Pr(\omega_1 \mid Z = 0) = \frac{p(Z = 0 \mid \omega_1)\,\Pr(\omega_1)}{p(Z = 0)} = \frac{p(Z = 0 \mid \omega_1)\,\Pr(\omega_1)}{p(Z = 0 \mid \omega_1)\,\Pr(\omega_1) + p(Z = 0 \mid \omega_2)\,\Pr(\omega_2)}$$
$$= \frac{0.5\, p_X(0)}{0.5\, p_X(0) + 0.5\, p_Y(0)} = \frac{0.3989}{0.3989 + 0.2420} = 0.6224,$$

where
$$p_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad p_Y(y) = \frac{1}{\sqrt{2\pi}}\, e^{-(y-1)^2/2}.$$

Example - 3
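The arithmetic in part (C) can be reproduced directly; the helper function below is just an explicit unit-variance Gaussian density, written out for this check.

```python
from math import exp, pi, sqrt

def gaussian_pdf(z, mu, sigma=1.0):
    # Gaussian density with mean mu and standard deviation sigma.
    return exp(-(z - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

z = 0.0
px = gaussian_pdf(z, mu=0.0)   # p_X(0) ~ 0.3989
py = gaussian_pdf(z, mu=1.0)   # p_Y(0) ~ 0.2420

posterior_x = 0.5 * px / (0.5 * px + 0.5 * py)
print(round(px, 4), round(py, 4), round(posterior_x, 4))
# 0.3989 0.242 0.6225 (the slide's 0.6224 comes from using the rounded values)
```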

Page 16: Tutorial 2  –  the outline


A game: 3 doors, there is a prize behind one of them. You have to select one door.

Then one of the other two doors is opened (not revealing the prize).

At this point you may either stick with your door, or switch to the other (still closed) door.

Should you stick with your initial choice or switch, and does the choice make any difference at all?

Example 4 The game of three doors

Page 17: Tutorial 2  –  the outline


Let Hi denote the hypothesis “the prize is behind door i”.

Assumption:
$$\Pr(H_1) = \Pr(H_2) = \Pr(H_3) = \frac{1}{3}.$$

Suppose w.l.o.g.: the initial choice is door 1, then door 3 is opened. We can stick with 1 or switch to 2.

Let D denote the door which is opened by the host. We assume:
$$\Pr(D = 2 \mid H_1) = \frac{1}{2}, \quad \Pr(D = 2 \mid H_2) = 0, \quad \Pr(D = 2 \mid H_3) = 1,$$
$$\Pr(D = 3 \mid H_1) = \frac{1}{2}, \quad \Pr(D = 3 \mid H_2) = 1, \quad \Pr(D = 3 \mid H_3) = 0.$$

By Bayes’ formula:
$$\Pr(H_i \mid D = 3) = \frac{\Pr(D = 3 \mid H_i)\,\Pr(H_i)}{\Pr(D = 3)},$$
$$\Pr(H_1 \mid D = 3) = \frac{\frac{1}{2}\cdot\frac{1}{3}}{\Pr(D = 3)}, \quad \Pr(H_2 \mid D = 3) = \frac{\frac{1}{3}}{\Pr(D = 3)}, \quad \Pr(H_3 \mid D = 3) = 0.$$

The game of three doors

Page 18: Tutorial 2  –  the outline


The denominator is a normalizing factor:
$$\Pr(D = 3) = \frac{1}{2}.$$

So we get
$$\Pr(H_1 \mid D = 3) = \frac{1}{3}, \quad \Pr(H_2 \mid D = 3) = \frac{2}{3}, \quad \Pr(H_3 \mid D = 3) = 0,$$

which means that we are more likely to win the prize if we switch to door 2.

The game of three doors – the solution
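As a sanity check of the 1/3 versus 2/3 result, here is a small Monte Carlo sketch; the convention of always picking door 1 and the trial count are arbitrary illustration choices.

```python
import random

def win_rate(switch, trials=200_000):
    wins = 0
    for _ in range(trials):
        prize = random.randint(1, 3)
        choice = 1
        # The host opens a door that is neither the chosen one nor the prize,
        # choosing at random between doors 2 and 3 when both are available.
        opened = random.choice([d for d in (2, 3) if d != prize])
        if switch:
            choice = next(d for d in (1, 2, 3) if d not in (choice, opened))
        wins += (choice == prize)
    return wins / trials

print(win_rate(switch=False))  # ~0.333: sticking with door 1
print(win_rate(switch=True))   # ~0.667: switching to the remaining door
```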

Page 19: Tutorial 2  –  the outline


A violent earthquake occurs just before the host has opened one of the doors; door 3 is opened accidentally, and there is no prize behind it. The host says “it is a valid door, let’s let it stay open and go on with the game”.

What should we do now?

First, any number of doors might have been opened by the earthquake. There are 8 possible outcomes, which we assume to be equiprobable: d=(0,0,0),…,d=(1,1,1).

A value of D now consists of the outcome of the quake, and the visibility of the prize; e.g., <(0,0,1),NO>

We have to compare Pr(H1 | D) vs. Pr(H2 | D).

Further complication

Page 20: Tutorial 2  –  the outline


Pr(D | Hi) is hard to estimate, but we know that Pr(D | H3) = 0.

Also, from Pr(H3, D) = Pr(H3 | D) Pr(D) = Pr(D | H3) Pr(H3), and from Pr(D) ≠ 0, we have Pr(H3 | D) = 0.

Further, we have to assume that Pr(D | H1) = Pr(D | H2) (we don’t know the values, but we assume they are equal).

Now we have

The earthquake-continued

Page 21: Tutorial 2  –  the outline


$$\Pr(H_1 \mid D) = \frac{\Pr(D \mid H_1)\,\Pr(H_1)}{\Pr(D)} = \frac{1}{2}, \qquad \Pr(H_2 \mid D) = \frac{\Pr(D \mid H_2)\,\Pr(H_2)}{\Pr(D)} = \frac{1}{2}.$$

(Why are they ½? Because $\Pr(H_1) = \Pr(H_2)$ and $\Pr(D \mid H_1) = \Pr(D \mid H_2)$ we get $\Pr(H_1 \mid D) = \Pr(H_2 \mid D)$; also $\Pr(H_1 \mid D) + \Pr(H_2 \mid D) + \Pr(H_3 \mid D) = 1$ and $\Pr(H_3 \mid D) = 0$.)

So, we might just as well stick with our choice.

We get a different outcome because the data here is of a different nature (although it looks the same).

The earthquake-continued
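To illustrate why the earthquake data leads to a different answer even though it looks the same, here is a small simulation sketch; modelling the quake as opening each door independently with probability 1/2 (the 8 equiprobable outcomes), and the trial count, are assumptions made only for illustration.

```python
import random

def earthquake_posterior(trials=300_000):
    # Condition on the observed data D = <(0,0,1), NO>: only door 3 is open
    # and the prize is not visible; estimate Pr(H_i | D) for each door i.
    counts = {1: 0, 2: 0, 3: 0}
    for _ in range(trials):
        prize = random.randint(1, 3)
        opened = tuple(random.randint(0, 1) for _ in range(3))   # quake outcome
        prize_visible = opened[prize - 1] == 1
        if opened == (0, 0, 1) and not prize_visible:
            counts[prize] += 1
    total = sum(counts.values())
    return {door: c / total for door, c in counts.items()}

print(earthquake_posterior())  # ~{1: 0.5, 2: 0.5, 3: 0.0}: no gain from switching
```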