Probability theory Much inspired by the presentation of Kren and Samuelsson


Page 1

Probability theory

Much inspired by the presentation of Kren and Samuelsson

Page 2

Three views of probability

• Frequentist

• Mathematical

• Bayesian (knowledge-based)

Page 3

Sample space

  • A universe of elementary outcomes. In elementary treatments, we pretend that we can come up with sets of equiprobable outcomes (dice, coins, ...). Outcomes are very small: the finest-grained possibilities.

  • An event is a set of those outcomes. Events are bigger than outcomes -- and more interesting.

Page 4

Probability measure

  • Every event (= set of outcomes) is assigned a probability, by a function we call a probability measure.

• The probability of every set is between 0 and 1, inclusive.

• The probability of the whole set of outcomes is 1.

  • If A and B are two events with no common outcomes, then the probability of their union is the sum of their probabilities.

Page 5

Cards

  • Our universe of outcomes is single card pulls.

  • Events: a red card (probability 1/2); a jack (1/13).
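A minimal Python sketch of this slide (the helper and event names are my own): treat the 52 single-card pulls as equiprobable outcomes and measure an event by counting the outcomes in it.

```python
from fractions import Fraction

# Sample space: 52 equiprobable single-card pulls.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

def prob(event):
    """Probability of an event = (outcomes in the event) / (all outcomes)."""
    return Fraction(len([c for c in deck if event(c)]), len(deck))

red = prob(lambda c: c[1] in ("hearts", "diamonds"))   # 1/2
jack = prob(lambda c: c[0] == "J")                     # 1/13
```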

Page 6

Other things to remember

  • The probability that event P will not happen (= event ~P will happen) is 1 − prob(P).

  • Prob(null event) = 0.

  • p(A ∪ B) = p(A) + p(B) − p(A ∩ B).

Page 7

Independence (definition)

  • Two events A and B are independent if the probability of A ∩ B equals the probability of A times the probability of B (that is, p(A ∩ B) = p(A) · p(B)).
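A small sketch of the definition at work, assuming two fair dice (my own example, not from the slides): the events "first die is even" and "second die shows 6" satisfy the product rule.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equiprobable rolls of two dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len([o for o in space if event(o)]), len(space))

A = lambda o: o[0] % 2 == 0      # first die is even (prob 1/2)
B = lambda o: o[1] == 6          # second die shows 6 (prob 1/6)
AB = lambda o: A(o) and B(o)     # both at once (prob 1/12)

independent = prob(AB) == prob(A) * prob(B)   # the dice don't interact
```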

Page 8

Conditional probability

This means: what's the probability of A if I already know B is true?

p(A|B) = p(A and B) / p(B) = p(A ∩ B) / p(B)

Probability of A given B.

p(A) is the prior probability; p(A|B) is called a posterior probability. Once you know B is true, the universe you care about shrinks to B.
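As a sketch (my own card example), shrinking the universe to B and counting inside it gives the same answer as the formula: the probability that a card is a jack, given that it is red, is 2/26 = 1/13.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(r, s) for r in ranks for s in suits]

def cond_prob(A, B):
    """p(A|B): once you know B is true, the universe shrinks to B."""
    in_B = [c for c in deck if B(c)]
    return Fraction(len([c for c in in_B if A(c)]), len(in_B))

# p(jack | red) = p(jack and red) / p(red) = (2/52) / (26/52) = 1/13
p = cond_prob(lambda c: c[0] == "J", lambda c: c[1] in ("hearts", "diamonds"))
```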

Page 9

Bayes' rule

• prob (A and B) = prob (B and A); so

• prob (A |B) prob (B) = prob (B|A) prob (A)

-- just using the definition of prob(X|Y);

• hence

prob(A|B) = prob(B|A) · prob(A) / prob(B)
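A worked numeric sketch of the rule; the test-accuracy numbers are my own illustrative assumptions, not from the slides. A is "has the condition", B is "tests positive".

```python
# Bayes' rule: prob(A|B) = prob(B|A) * prob(A) / prob(B).
# Assumed numbers: a condition affecting 1 in 1000 people, a test that is
# 99% sensitive, with a 5% false-positive rate.
p_A = 0.001               # prior: prob(condition)
p_B_given_A = 0.99        # prob(positive | condition)
p_B_given_notA = 0.05     # prob(positive | no condition)

# Total probability of B, over the two ways a positive can happen:
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior: small despite the accurate test, because the prior is small.
p_A_given_B = p_B_given_A * p_A / p_B
```

The surprise here is the point of the rule: the posterior is only about 2%, because the prior belief in A was so low.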

Page 10

Bayes’ rule as scientific reasoning

• A hypothesis H which is supported by a set of data D merits our belief to the degree that:

• 1. We believed H before we learned about D;

• 2. H predicts data D; and

• 3. D is unlikely.

Page 11

A random variable

• a.k.a. stochastic variable.

• A random variable isn't a variable. It's a function. It maps from the sample space to the real numbers. This is a convenience: it is our way of translating events (whatever they are) to numbers.

Page 12

Distribution function

• Distribution function:

  • This is a function F that takes a real number x as its input, and gives the probability of all those outcomes in the sample space that map onto x or anything less than x: F(x) = P(X ≤ x).

• For a die, F(0) = 0; F(1) = 1/6; F(2) = 1/3; F(3) = 1/2; F(4) = 2/3; F(5) = 5/6; and F(6) = F(7) = 1.
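The die values above can be reproduced with a short sketch (the function name F follows the slide):

```python
def F(x):
    """Distribution function of a fair die: probability that the roll is <= x."""
    faces = range(1, 7)
    return sum(1 for f in faces if f <= x) / 6

# Matches the slide: F(0) = 0, F(1) = 1/6, F(3) = 1/2, F(6) = F(7) = 1.
```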

Page 13

discrete distribution function

Page 14

discrete, continuous

  • If the set of values that the random variable takes on is finite or countable, then the random variable (which isn't a variable, it's a function) is discrete; otherwise it's continuous (and its distribution function ought to be mostly differentiable).

Page 15

Distribution function aggregates

  • It's a little bit counterintuitive, in a way. What about a function P for a die that tells us that P(1) = 1/6, P(2) = 1/6, ..., P(6) = 1/6?

  • That's a frequency function, or probability function. We'll use the letter f for this. For the case of continuous variables, we don't want to ask what the probability of a single exact value is, because the answer is always 0...

Page 16

• Rather, we ask what's the probability that the value is in the interval (a,b) -- that's OK. So for continuous variables, we care about the derivative of the distribution function at a point (that's the derivative of an integral, after all...). This is called a probability density function. The probability that a random variable has a value in a set A is the integral of the p.d.f. over that set A.

Page 17

Frequency function f

• The sum of the values of the frequency function f must add up to 1!

• The integral of the probability density function must be 1.

• A set of numbers that adds up to 1 is called a distribution.

Page 18

Means that have nothing to do with meaning

• The mean is the average; in everyday terms, we add all the values and divide by the number of items. The symbol is 'E', for 'expected' (why is the mean expected? What else would you expect?)

• Since the frequency function f tells you how many there are of any particular value, the mean is

E[X] = Σᵢ xᵢ f(xᵢ)

Page 19

Weight a moment...

  • The mean is the first moment; the second central moment is the variance, which tells you how much the random variable jiggles. It's the sum of the differences from the mean (square those differences so they're positive), weighted by f. The square root of this is the standard deviation. (We don't divide by N here; that's inside the f-function, remember?)

Var(X) = Σᵢ (xᵢ − μ)² f(xᵢ)
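Both moments can be computed straight from the frequency function; a sketch for the fair die (my own code):

```python
# Mean and variance of a fair die, from its frequency function f.
f = {x: 1/6 for x in range((1), 7)}     # frequency (probability) function

mean = sum(x * f[x] for x in f)                      # E[X] = 3.5
variance = sum((x - mean) ** 2 * f[x] for x in f)    # 35/12, about 2.917
std_dev = variance ** 0.5
```

Note that f already carries the 1/6 weights, which is exactly the "we don't divide by N here" remark above.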

Page 20

Particular probability distributions:

• Binomial

• Gaussian, also known as normal

• Poisson

Page 21

Binomial distribution

If we run an experiment n times (independently: simultaneous or not, we don't care), and we care only about how many times altogether a particular outcome occurs -- that's a binomial distribution, with 2 parameters: the probability p of that outcome on a single trial, and n the number of trials.

Page 22

• If you toss a coin 4 times, what's the probability that you'll get 3 heads?

• If you draw a card 5 times (with replacement), what's the probability that you'll get exactly 1 ace?

• If you generate words randomly, what's the probability that you'll have two the's in the first 10 words?

Page 23

  • In general, the answer is

P(k) = (n choose k) pᵏ qⁿ⁻ᵏ = [n! / (k! (n − k)!)] pᵏ qⁿ⁻ᵏ, where q = 1 − p.
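A sketch that plugs the coin and card questions from the previous slide into this formula (`binom_pmf` is my own helper name):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# 3 heads in 4 coin tosses: C(4,3) * (1/2)^4 = 4/16 = 0.25
p_heads = binom_pmf(3, 4, 0.5)

# Exactly 1 ace in 5 draws with replacement: 5 * (1/13) * (12/13)^4, about 0.279
p_ace = binom_pmf(1, 5, 1 / 13)
```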

Page 24

Normal or Gaussian distribution

• Start off with something simple, like this:

e^(−x²)

That's symmetric around the y-axis (negative and positive x are treated the same way): if x = 0, then the value is 1, and it slides to 0 as you go off to infinity, either positive or negative.

Page 25

Gaussian or normal distribution

  • Well, x's average can be something other than 0: it can be any old mean μ:

e^(−(x − μ)²)

Page 26

e^(−(x − μ)² / (2σ²))

And its variance (σ²) can be other than 1.

Page 27

And then normalize--

so that it all adds up (integrates, really) to 1, we have to divide by a normalizing factor:

(1 / (σ √(2π))) e^(−(x − μ)² / (2σ²))
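Putting the pieces together, a sketch of the density (my own function name), with a crude numerical check that the normalizing factor really makes the area under the curve come out to 1:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Crude Riemann sum over a wide interval: should be very close to 1.
dx = 0.001
area = sum(normal_pdf(-10 + i * dx, mu=1, sigma=2) * dx for i in range(20000))
```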