Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)


Page 1: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability & Statistical Inference Lecture 2

MSc in Computing (Data Analytics)

Page 2: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Lecture Outline

Introduction

Introduction to Probability Theory

Discrete Probability Distributions

Question Time

Page 3: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Introduction

Page 4: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability & Statistics

We want to make decisions based on evidence from a sample, i.e. extrapolate from sample evidence to a general population.

To make such decisions we need to be able to quantify our (un)certainty about how good or bad our sample information is.

[Diagram: Population → Representative Sample → Sample Statistic. The sample statistic describes the sample and is used to make inference about the population.]

Page 5: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability & Statistics - Example

Example: How many voters will give F.F. a first preference in the next general election?

- Researcher A takes a sample of size 10 and finds 4 people who say they will.

- Researcher B takes a sample of size 100 and finds 25 people who say they will.

Researcher A => 40%
Researcher B => 25%

Who would you believe?

Page 6: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability & Statistics - Example

Intuitively the bigger sample would get more credence, but how much better is it, and is either of the samples any good?

Probability helps

Descriptive statistics are helpful but still lead to decision making by 'intuition'.

Probability helps to quantify (un)certainty, which is a more powerful aid to the decision maker.

Page 7: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability & Statistics

Using probability theory we can measure the amount of uncertainty/certainty in our statistics.

Page 8: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Intuitions and Probability – Lotto example

If you had an Irish lotto ticket, which of these sets of numbers is more likely to win?

1. 1 2 3 4 5 6

Odds of winning are 1 in 8,145,060

2. 2 11 26 27 35 42

Odds of winning are 1 in 8,145,060
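The quoted odds can be checked by counting combinations. The sketch below assumes the draw picks 6 numbers from 45; that format is not stated on the slide, but it is the one that reproduces the figure of 8,145,060. The variable names are illustrative.

```python
from math import comb

# Assumption: 6 numbers are drawn from 45 (this reproduces the quoted odds).
n_numbers, n_picked = 45, 6
total_combinations = comb(n_numbers, n_picked)

ticket_a = {1, 2, 3, 4, 5, 6}
ticket_b = {2, 11, 26, 27, 35, 42}

# Each specific set of six numbers is exactly one of the possible combinations,
# so both tickets have the same chance of matching the draw.
p_a = 1 / total_combinations
p_b = 1 / total_combinations
print(total_combinations, p_a == p_b)
```

The "obviously patterned" ticket is no less likely to win than the random-looking one.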

Page 9: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Intuitions and Probability – Disease example

Suppose we have a diagnostic test for a disease which is 99% accurate.

A person is picked at random and tested for the disease.

The test gives a positive result. What is the probability that the person actually has the disease?

99%?

Page 10: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Disease example

No! It depends on how common or rare the disease is. Suppose the disease affects 1 person in 10,000.

If you take a population of 1,000,000:

Group                    People    Test negative   Test positive
Don't have the disease   999,900   989,901         9,999
Do have the disease      100       1               99

Of those who test positive, only

$\frac{99}{9{,}999 + 99} = 0.0098$

have the disease.
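The counts for this example can be reproduced with a few lines of arithmetic. This is a sketch that assumes "99% accurate" means the test gives the right answer for 99% of diseased people and for 99% of healthy people, which is what the slide's counts imply. The variable names are illustrative.

```python
from fractions import Fraction

population = 1_000_000
prevalence = Fraction(1, 10_000)   # 1 person in 10,000 has the disease
accuracy = Fraction(99, 100)       # assumed to apply to both groups

diseased = population * prevalence            # people with the disease
healthy = population - diseased               # people without the disease
true_positives = diseased * accuracy          # diseased and test positive
false_positives = healthy * (1 - accuracy)    # healthy but test positive

# Share of all positive tests that come from people who really are diseased.
p_disease_given_positive = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, float(p_disease_given_positive))
```

Because healthy people vastly outnumber diseased people, the false positives swamp the true positives even with a 99% accurate test.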

Page 11: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Introduction to Probability Theory

Page 12: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Some Definitions

An experiment that can result in different outcomes, even though it is repeated in the same manner every time, is called a random experiment.

The set of all possible outcomes of a random experiment is called the sample space of the experiment and is denoted by S.

Example:

Experiment: Toss two coins and observe the up face on each.

Sample space:

1. Observe HH
2. Observe HT
3. Observe TH
4. Observe TT

S = {HH, HT, TH, TT}
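A sample space like this one can be enumerated mechanically. The sketch below uses Python's standard `itertools.product`; the variable names are illustrative.

```python
from itertools import product

faces = ["H", "T"]

# Every ordered pair of faces is one outcome of tossing two coins.
sample_space = ["".join(outcome) for outcome in product(faces, repeat=2)]
print(sample_space)
```

Changing `repeat` to 3 enumerates the outcomes for three tosses in the same way.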

Page 13: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Some Definitions

A sample space is discrete if it consists of a finite or countably infinite set of outcomes.

A sample space is continuous if it contains an interval of real numbers.

An event is a subset of the sample space of a random experiment.

Page 14: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Counting

A permutation of the elements is an ordered sequence of the elements.

Example: S = {a, b, c}. All the permutations of the elements of S are abc, acb, bca, bac, cba and cab.

The number of permutations of n different elements is n!, where:

n! = n * (n-1) * (n-2) * ... * 2 * 1

Above, n = 3 => 3! = 3 * 2 * 1 = 6
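As a quick check of the n! rule, the permutations of {a, b, c} can be listed and counted. This is a minimal sketch using the standard library; the names are illustrative.

```python
from itertools import permutations
from math import factorial

elements = ["a", "b", "c"]

# List every ordering of the three elements.
all_orderings = ["".join(p) for p in permutations(elements)]
print(all_orderings)

# The count should equal n! for n = 3.
print(len(all_orderings), factorial(len(elements)))
```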

Page 15: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Counting

The number of permutations of subsets of r elements selected from a set of n different elements is

$P^n_r = \frac{n!}{(n-r)!}$

A selection of r elements from a set of n different elements, where order is not important, is called a combination. The number of combinations is

$C^n_r = \frac{n!}{r!\,(n-r)!}$
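Both counting formulas translate directly into code. The sketch below writes them out with factorials and compares them with `math.perm` and `math.comb` (available from Python 3.8). The helper names and the values n = 5, r = 2 are illustrative.

```python
from math import comb, factorial, perm

def n_permutations(n, r):
    """Ordered selections of r items from n: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

def n_combinations(n, r):
    """Unordered selections of r items from n: n! / (r! (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

n, r = 5, 2  # illustrative values
print(n_permutations(n, r), perm(n, r))
print(n_combinations(n, r), comb(n, r))
```

Each combination of r items corresponds to r! permutations, which is why the two counts differ by a factor of r!.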

Page 16: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability

Whenever a sample space consists of N possible outcomes that are equally likely, the probability of each outcome is 1/N.

For a discrete sample space, the probability of an event E, denoted by P(E), equals the sum of the probabilities of the outcomes in E.

Some rules for probabilities: for a given sample space containing n simple events E1, E2, E3, ..., En:

1. All simple event probabilities must lie between 0 and 1: 0 <= P(Ei) <= 1 for i = 1, 2, ..., n

2. The sum of the probabilities of all the simple events within a sample space must be equal to 1:

$\sum_{i=1}^{n} P(E_i) = 1$

Page 17: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability – Example 1

Example:

Experiment: Toss two coins and observe the up face on each.

Sample space: S = {HH, HT, TH, TT}

Probability of each event:

1. E = HH => P(HH) = 1/4
2. E = HT => P(HT) = 1/4
3. E = TH => P(TH) = 1/4
4. E = TT => P(TT) = 1/4

Page 18: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability – Example 1

The probability of an event A is equal to the sum of the probabilities of all the outcomes in event A.

Example:

Experiment: Toss two coins and observe the up face on each.

Event A: {Observe exactly one head}

P(A) = P(HT) + P(TH) = ¼ + ¼ = ½

Event B: {Observe at least one head}

P(B) = P(HH) + P(HT) + P(TH) = ¼ + ¼ + ¼ = ¾
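The rule "add up the probabilities of the outcomes in the event" can be sketched as follows. Exact fractions are used so the results match the slide. The function and variable names are illustrative.

```python
from fractions import Fraction

sample_space = ["HH", "HT", "TH", "TT"]

# Equally likely outcomes: each has probability 1/N.
p_outcome = {o: Fraction(1, len(sample_space)) for o in sample_space}

def probability(event):
    """Sum the probabilities of the outcomes that make up the event."""
    return sum(p_outcome[o] for o in event)

event_a = [o for o in sample_space if o.count("H") == 1]   # exactly one head
event_b = [o for o in sample_space if o.count("H") >= 1]   # at least one head
print(probability(event_a), probability(event_b))
```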

Page 19: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Probability – Example 2

5-card poker example. It doesn't matter in poker in what order the cards are dealt; only the combination of cards in the hand matters. So, how many different hands of poker are there?

$C^{52}_5$ = ?

What is the probability of being dealt a royal straight flush?

What is the probability of being dealt a poker (i.e. 4 of a kind)?
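One way to work through these questions is to count hands with `math.comb`. The sketch below assumes a standard 52-card deck with 13 ranks and 4 suits; the counting arguments are given in the comments and the names are illustrative.

```python
from fractions import Fraction
from math import comb

# Number of distinct 5-card hands when order does not matter.
total_hands = comb(52, 5)

# Royal straight flush: A-K-Q-J-10 all in one suit, so one hand per suit.
royal_flush_hands = 4

# Four of a kind: choose the rank (13 ways), then any one of the other 48 cards.
four_of_a_kind_hands = 13 * 48

p_royal = Fraction(royal_flush_hands, total_hands)
p_four = Fraction(four_of_a_kind_hands, total_hands)
print(total_hands, p_royal, p_four)
```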

Page 20: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Compound Events

The union of two events A and B is the event that occurs if either A or B, or both, occur on a single performance of the experiment. It is denoted by A ∪ B (A or B).

The intersection of two events A and B is the event that occurs if both A and B occur on a single performance of the experiment. It is denoted by A ∩ B (A and B).

Example: Consider a die-tossing experiment with equally likely simple events {1,2,3,4,5,6}. Define the events A, B and C:

A: {Toss an even number} = {2,4,6}
B: {Toss a number less than or equal to 3} = {1,2,3}
C: {Toss a number greater than 1} = {2,3,4,5,6}

Find: P(A ∪ B ∪ C) and P(A ∩ B ∩ C)
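Unions and intersections map directly onto Python set operations. The sketch below assumes the two quantities asked for are the three-way union and the three-way intersection of A, B and C. The function name is illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}          # even number
B = {1, 2, 3}          # less than or equal to 3
C = {2, 3, 4, 5, 6}    # greater than 1

def probability(event):
    """Equally likely outcomes: size of the event over size of the sample space."""
    return Fraction(len(event), len(sample_space))

p_union = probability(A | B | C)          # at least one of A, B, C occurs
p_intersection = probability(A & B & C)   # all of A, B, C occur
print(p_union, p_intersection)
```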

Page 21: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Complementary Event

The complement of an event A is the event that A does not occur, denoted by A'.

Note that A ∪ A' = S, the sample space.

P(A) + P(A') = 1 => P(A) = 1 – P(A')

Page 22: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Questions

1. What is the sample space when a coin is tossed 3 times?

2. What is the probability of tossing all heads or all tails?

3. What is the sample space of throwing a fair die?

4. If a fair die is thrown what is the probability of throwing a prime number (2,3,5)?

Page 23: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Questions

5. A factory has two assembly lines, each of which is shut down (S), at partial capacity (P), or at full capacity (F). The following table gives the sample space, where for example (S,P) denotes that the first assembly line is shut down and the second one is operating at partial capacity.

Event A   P(A)    Event A   P(A)    Event A   P(A)
(S,S)     0.02    (S,P)     0.06    (S,F)     0.05
(P,S)     0.07    (P,P)     0.14    (P,F)     0.20
(F,S)     0.06    (F,P)     0.21    (F,F)     0.19

What is the probability that:

a) Both assembly lines are shut down?

b) Neither assembly line is shut down?

c) At least one assembly line is at full capacity?

d) Exactly one assembly line is at full capacity?

Page 24: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Conditional Probability

The conditional probability of event A conditional on event B is

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

for P(B) > 0. It measures the probability that event A occurs when it is known that event B occurs.

Example: A = odd result on die = {1,3,5}, B = result > 3 = {4,5,6}

$P(A|B) = \frac{1/6}{3/6} = \frac{1}{3}$
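The definition can be sketched for equally likely outcomes as below. The helper names `probability` and `conditional` are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    return Fraction(len(event), len(sample_space))

def conditional(a, b):
    """P(A|B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if probability(b) == 0:
        raise ValueError("P(B) must be greater than 0")
    return probability(a & b) / probability(b)

A = {1, 3, 5}   # odd result
B = {4, 5, 6}   # result greater than 3
print(conditional(A, B))
```

Knowing that B occurred shrinks the sample space to {4, 5, 6}, and only the outcome 5 is also in A.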

Page 25: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Conditional Probability Example

Example: A study was carried out to investigate the link between people's lifestyles and cancer. One of the areas looked at was the link between lung cancer and smoking. 10,000 people over the age of 55 were studied over a 10-year period. In that time 277 developed lung cancer.

What is the likelihood of somebody developing lung cancer given that they smoke?

             Cancer   No Cancer   Total
Smoker       241      3,325       3,566
Non-Smoker   36       6,398       6,434
Total        277      9,723       10,000

Page 26: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Conditional Probability Example

Event A: A person develops lung cancer

Event B: A person is a smoker

P(A) = 277/10,000 = 0.0277
P(B) = 3,566/10,000 = 0.3566

$P(A \cap B) = 241/10{,}000 = 0.0241$

$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.0241}{0.3566} = 0.068$
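The same calculation can be run straight from the study's counts. This is a sketch; the dictionary layout and names are illustrative.

```python
# Counts from the study table: (smoking status, outcome) -> number of people.
counts = {
    ("smoker", "cancer"): 241,
    ("smoker", "no cancer"): 3325,
    ("non-smoker", "cancer"): 36,
    ("non-smoker", "no cancer"): 6398,
}
total = sum(counts.values())

p_a_and_b = counts[("smoker", "cancer")] / total   # P(A and B)
p_b = sum(n for (status, _), n in counts.items() if status == "smoker") / total

p_a_given_b = p_a_and_b / p_b                      # P(A|B)
print(round(p_a_given_b, 3))
```

Dividing the two probabilities is the same as dividing 241 by the 3,566 smokers directly.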

Page 27: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Exercises

1. A ball is chosen at random from a bag containing 150 balls that are either red or blue and either dull or shiny. There are 36 red, shiny balls and 54 blue balls. There are 72 dull balls.

   1. What is the probability of a chosen ball being shiny conditional on it being red?

   2. What is the probability of a chosen ball being dull conditional on it being blue?

Page 28: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Mutually Exclusive Events

Two events, A and B, are mutually exclusive if, when A happens, B cannot also happen.

Example: Roll of a die
A = less than 2
B = even result

There is no way that A and B can happen at the same time, therefore they are mutually exclusive events.

Page 29: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Rules for Unions

Additive Rule:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Additive Rule for Mutually Exclusive Events:

$P(A \cup B) = P(A) + P(B)$

Page 30: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Example

Records at an industrial plant show that 12% of all injured workers are admitted to hospital for treatment, 16% are back on the job the next day, and 2% are both admitted to a hospital for treatment and back to work the next day. If a worker is injured, what is the probability that the worker will be either admitted to hospital or back on the job the next day or both?
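One way to set this up is with the additive rule for unions. The event labels H (admitted to hospital) and J (back on the job the next day) are introduced here for illustration only:

```latex
P(H \cup J) = P(H) + P(J) - P(H \cap J) = 0.12 + 0.16 - 0.02 = 0.26
```

The 2% who are in both groups are subtracted once because they would otherwise be counted twice.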

Page 31: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Independent Events

Events A and B are independent if A happening does not alter the probability that B happens.

Example:
A = even result on die
B = result > 2

Suppose we are told that the result on the die (which someone has observed, but not us) is even. Knowing this, what is the probability that event B has happened?

Sample space: {2, 4, 6}

B = 4 or 6 => P(B|A) = 2/3

Page 32: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Independent Events

But if we didn't know about the even result we would get:

Sample space: {1, 2, 3, 4, 5, 6}

B = 3 or 4 or 5 or 6 => P(B) = 4/6 = 2/3

So knowledge about event A has in no way changed our probability assessment concerning event B.

Page 33: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Rules for Intersection

Multiplicative Rule of Probability:

$P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A)$

If events A and B are independent then

$P(A \cap B) = P(A)\,P(B)$
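The product rule gives a simple test for independence. The sketch below checks it for the die events used earlier (A = even result, B = result > 2). The event C = result > 3 is a hypothetical extra event added here for contrast, and the helper names are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    return Fraction(len(event), len(sample_space))

def independent(a, b):
    """Events are independent when P(A and B) equals P(A) * P(B)."""
    return probability(a & b) == probability(a) * probability(b)

A = {2, 4, 6}       # even result
B = {3, 4, 5, 6}    # result greater than 2
C = {4, 5, 6}       # hypothetical contrast event: result greater than 3

print(independent(A, B), independent(A, C))
```

For A and C the product P(A)P(C) = 1/4 differs from P(A ∩ C) = 1/3, so knowing the result is even does change the chance of C.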

Page 34: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Bayes Theorem

One of a number of very useful results. Here is the simplest definition:

Suppose you have two events which are mutually exclusive (ME) and exhaustive, i.e. they account for all of the sample space. Call these events A and A' (read 'not A').

Further suppose there is another event B, such that P(B|A) > 0 and P(A|B) > 0.

Then Bayes theorem states:

$P(A|B) = \frac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|A')\,P(A')}$
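As an illustration, the formula can be applied to the diagnostic test example from earlier in the lecture, with A = "has the disease" and B = "tests positive". The sketch assumes the 99% accuracy applies to both diseased and healthy people, so P(B|A) = 0.99 and P(B|A') = 0.01. The function and parameter names are illustrative.

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) for the two-event partition {A, A'}."""
    p_not_a = 1 - p_a
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * p_not_a)

p_disease = 1 / 10_000          # disease affects 1 person in 10,000
p_pos_given_disease = 0.99      # assumed: test is right for 99% of diseased people
p_pos_given_healthy = 0.01      # assumed: test is wrong for 1% of healthy people

p_disease_given_pos = bayes(p_pos_given_disease, p_disease, p_pos_given_healthy)
print(round(p_disease_given_pos, 4))
```

The result agrees with the count-based answer of 99 out of 10,098 positive tests.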

Page 35: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Discrete Probability Distributions

Page 36: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Discrete Random Variable

A Random Variable (RV) is obtained by assigning a numerical value to each outcome of a particular experiment.

Probability Distribution: A table or formula that specifies the probability of each possible value for the Discrete Random Variable (DRV)

DRV: a RV that takes a whole number value only

Page 37: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Example: What is the probability distribution for the experiment to assess the number of tails from tossing 2 coins?

Sample space:

Coin 1   Coin 2
T        T
T        H
H        T
H        H

x = no. of tails is the RV

x   P(x)
0   P(HH) = 0.25
1   P(TH) + P(HT) = 0.50
2   P(TT) = 0.25

P(any other value) = 0

N.B. $\sum P(x) = 1$ and $0 \le P(x) \le 1$ for all values of x
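The distribution of the number of tails can be built by enumerating the sample space and counting. This is a minimal sketch; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two coins.
outcomes = list(product("HT", repeat=2))

# Map each outcome to the value of the random variable (number of tails).
tail_counts = Counter(outcome.count("T") for outcome in outcomes)

distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(tail_counts.items())
}
print(distribution)
```

The two checks in the N.B. line hold: every probability lies between 0 and 1, and they sum to 1.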

Page 38: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Mean of a Discrete Random Variable

Mean of a DRV = μ = Σ x * P(x)

Example: Throw a fair die

x   P(x)     x * P(x)
1   0.1667   0.17
2   0.1667   0.33
3   0.1667   0.50
4   0.1667   0.67
5   0.1667   0.83
6   0.1667   1.00
P(any other value) = 0   0

Mean = μ = Σ x * P(x) = 3.5

Page 39: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Some Examples

Simulated sample, size = 10

3 1 4 6 6 6 1 3 6 1

mean = 3.7, S.D. = 2.1

Simulated sample, size = 100

4 6 4 6 3 2 4 2 1 3

5 1 3 2 6 3 3 1 5 6

2 3 2 5 6 4 6 2 3 3

5 2 5 4 4 3 1 4 1 3

6 5 6 4 3 3 2 2 2 3

3 1 5 5 1 5 1 2 5 3

6 2 4 3 6 1 3 1 1 2

2 3 5 3 4 6 3 3 5 4

6 5 6 1 2 4 3 6 1 4

2 4 5 6 6 6 6 3 1 5

mean = 3.54, S.D. = 1.67

Page 40: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Simulated sample, size = 1000

1 3 3 2 2 6 1 3 4 6 1 4 4 4 1 3 2 3 1 4 6 2 2 2 2 5 5 2 3

5 5 1 3 1 3 4 5 1 4 4 6 1 3 1 4 4 6 3 1 5 6 1 5 5 1 6 3 2

3 1 4 1 6 3 3 5 6 6 6 6 3 2 5 3 6 5 5 5 2 6 6 6 1 3 4 3 3

5 3 3 6 5 5 1 2 5 5 2 6 1 4 3 2 1 1 1 4 4 5 1 5 3 2 5 1 3

3 2 1 1 2 6 3 5 3 1 5 6 2 3 5 6 6 2 6 2 2 1 6 3 5 3 4 3 5

4 4 3 2 1 2 6 5 6 2 1 2 1 6 4 2 5 6 3 6 6 5 1 6 5 4 6 2 1

5 4 2 5 4 1 3 4 6 5 3 5 1 3 3 6 1 3 3 2 1 4 6 6 4 4 1 5 3

4 2 1 3 4 2 6 3 2 2 3 3 5 1 3 5 5 3 1 6 6 1 3 4 5 3 3 1 5

2 1 5 2 5 2 1 6 1 2 6 3 2 6 5 1 1 5 3 6 1 1 2 1 4 1 1 2 3

4 3 4 4 5 5 4 3 3 6 5 5 6 2 6 6 1 6 2 5 6 3 2 6 5 4 2 2 6

4 3 3 1 3 4 4 3 2 2 1 6 2 2 1 5 2 1 6 1 4 6 4 5 4 5 2 4 4

3 6 1 4 2 1 6 6 2 6 4 2 6 2 5 5 4 5 6 2 3 1 2 5 1 6 5 1 5

2 5 6 5 2 3 2 1 1 3 6 6 3 1 5 3 5 2 6 6 6 3 2 1 2 6 2 6 5

1 4 2 5 1 3 2 1 1 3 3 6 5 6 3 6 4 3 2 5 2 5 2 2 4 3 3 6 1

2 3 4 6 1 1 5 4 5 6 1 4 1 2 1 1 3 1 2 5 4 5 1 6 6 4 2 1 6

5 4 6 2 2 4 3 3 5 5 6 2 3 5 1 1 1 4 2 4 1 4 2 1 3 1 2 3 1

1 2 2 4 2 2 2 1 4 5 2 2 2 5 1 4 1 2 2 2 6 5 5 3 1 2 3 6 6

4 2 4 3 4 5 4 1 5 3 5 2 1 6 6 6 6 6 2 6 2 2 5 3 6 2 4 5 4

6 4 4 5 1 2 1 6 3 6 2 1 4 1 2 2 5 6 1 3 5 2 5 5 1 2 3 6 1

6 4 4 5 1 2 4 1 4 4 6 3 3 5 6 5 4 6 4 5 5 2 5 2 1 2 3 2 6

4 6 3 4 3 6 5 6 6 5 1 6 6 5 2 1 5 3 5 1 1 2 6 4 1 3 5 1 3

3 6 3 1 2 2 3 3 4 5 5 6 3 2 1 5 1 4 3 5 5 2 4 3 6 3 2 6 2

4 6 5 3 1 6 6 5 1 2 6 3 5 3 4 4 2 2 5 1 4 5 3 3 5 6 6 3 1

2 2 5 3 5 5 5 1 5 4 3 3 3 5 1 5 4 5 4 5 2 6 2 1 5 3 3 2 1

6 4 1 3 1 1 4 1 1 4 4 1 5 4 1 2 2 6 6 5 2 4 2 6 3 1 5 3 6

Mean = 3.49 S.D. = 1.73

Page 41: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Note:

The largest simulation had the mean closest to that predicted by the probability distribution.

As the simulations got bigger, the mean approached 3.5.

The mean of a DRV is the mean of a large number of independent experiments (trials).
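The simulations can be repeated with a short script. This is a sketch, not the code used to produce the slides: the seed and the extra sample size of 100,000 are arbitrary choices, so the individual numbers will differ from those shown. The population form of the standard deviation (`pstdev`) is used because it matches the S.D. quoted for the sample of 10.

```python
import random
from statistics import mean, pstdev

def simulate_die(n_rolls, seed):
    """Roll a fair six-sided die n_rolls times; return (mean, S.D.) of the rolls."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return mean(rolls), pstdev(rolls)

for n in (10, 100, 1000, 100_000):
    m, s = simulate_die(n, seed=1)
    print(n, round(m, 2), round(s, 2))
```

With more rolls, the sample mean settles near 3.5 and the S.D. near 1.71.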

Page 42: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

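Standard Deviation of a DRV

$\sigma^2 = \sum (x - \mu)^2 \, P(X = x)$

$\sigma^2 = \sum x^2 \, P(X = x) - \left( \sum x \, P(X = x) \right)^2$

The standard deviation σ is the square root of the variance σ².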

Page 43: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Example: Rolling one die

x   P(x)     x² * P(x)
1   0.1667   0.17
2   0.1667   0.67
3   0.1667   1.50
4   0.1667   2.67
5   0.1667   4.17
6   0.1667   6.00
P(any other value) = 0   0

Σ x² * P(x) = 15.17

σ² = 15.17 - (3.5)² = 15.17 - 12.25 = 2.92 => S.D. = 1.71

Simulations:

N=10 => S.D. = 2.1

N=100 => S.D. = 1.67

N=1,000 => S.D. = 1.73
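The mean, variance and standard deviation of the fair die can be computed exactly from its distribution. The sketch below evaluates both forms of the variance formula to show that they agree. The variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of a fair die.
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in distribution.items())          # mean
e_x2 = sum(x**2 * p for x, p in distribution.items())     # sum of x^2 * P(x)

variance = e_x2 - mu**2                                   # shortcut form
variance_alt = sum((x - mu) ** 2 * p for x, p in distribution.items())

sd = sqrt(variance)
print(mu, e_x2, variance, round(sd, 2))
```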

Page 44: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial (Probability) Distribution

Many experiments lead to dichotomous responses (i.e. either success/failure, yes/no etc.).

Often a number of independent trials make up the experiment.

Example: number of people in a survey who agree with a particular statement. Survey 100 people => 100 independent trials of Yes/No.

The random variable of interest is the no. of successes (however defined).

These are Binomial Random Variables.

Page 45: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial Distribution Example

4 people are tested for the presence of a particular gene.
success = presence of gene

P(gene present / success) = 0.55
P(gene absent / failure) = 0.45

P(3 randomly tested people from 4 have gene)?

Assume trials are independent, e.g. the people are not related.

There are 4 ways of getting 3 successes:

Outcome 1   Outcome 2   Outcome 3   Outcome 4
Y           Y           Y           N
Y           Y           N           Y
Y           N           Y           Y
N           Y           Y           Y

Page 46: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial Distribution Example

Using the independence rule we can calculate the probability of each outcome:

Outcome 1: 0.55 × 0.55 × 0.55 × 0.45 = 0.07486875
Outcome 2: 0.55 × 0.55 × 0.45 × 0.55 = 0.07486875
Outcome 3: 0.55 × 0.45 × 0.55 × 0.55 = 0.07486875
Outcome 4: 0.45 × 0.55 × 0.55 × 0.55 = 0.07486875

4 ways of getting the result, each with P = 0.07486875

=> 4 × 0.07486875 = 0.299475
=> P(3 randomly tested people have gene) = 0.299475
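The same enumeration can be done in code: list every Y/N sequence for four people, keep those with exactly three Y, and multiply the per-person probabilities (valid because the trials are assumed independent). This is a sketch with illustrative names; `math.prod` needs Python 3.8 or later.

```python
from itertools import product
from math import prod

p_success = 0.55
p_result = {"Y": p_success, "N": 1 - p_success}

# All sequences of four results with exactly three successes.
outcomes = [o for o in product("YN", repeat=4) if o.count("Y") == 3]

# Independence: the probability of a sequence is the product of its parts.
outcome_probs = [prod(p_result[r] for r in o) for o in outcomes]

p_three_successes = sum(outcome_probs)
print(len(outcomes), round(p_three_successes, 6))
```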

Page 47: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial Distribution Example

A more convenient way of mathematically writing the same result is as follows:

$\binom{4}{3} (0.55)^3 (0.45)^1 = 0.2995$

The number of ways you can get three successes from 4 trials is a combination:

$C^n_r = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}$

Page 48: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial Distribution – General Formula

This all leads to a very general rule for calculating binomial probabilities:

In general, Binomial(n, p):

$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$

n = no. of trials

p = probability of a success

x = value of the RV (no. of successes)

where P(X = x) is read as the probability of seeing x successes.

Page 49: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

$P(X = 0) = \binom{4}{0} (0.55)^0 (0.45)^4 = 0.04100625$

P(X = 1) = ?

P(X = 2) = ?

P(X = 3) = ?

P(X = 4) = ?
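The general formula is short enough to code directly. The sketch below defines a hypothetical helper `binomial_pmf` and tabulates the whole distribution for the gene example (n = 4, p = 0.55).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a Binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.55
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in pmf.items():
    print(x, round(prob, 4))
```

The five probabilities cover every possible value of X, so they sum to 1.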

Page 50: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial Distribution

For all binomials the mean is given by the simple formula:

μ = n p

Example: from previous example, μ = 4 × 0.55 = 2.2

The standard deviation also has a simple formula for all binomials:

$\sigma^2 = np(1-p)$

$\sigma = \sqrt{np(1-p)}$

Example: from previous example, σ = $\sqrt{4 \times 0.55 \times 0.45}$ = 0.995
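These shortcut formulas can be checked against the general definitions of mean and variance for a discrete random variable. This is a sketch for the gene example; the names are illustrative.

```python
from math import comb, sqrt

n, p = 4, 0.55

# Full binomial distribution for n = 4, p = 0.55.
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Mean and variance from the general definitions for a discrete RV.
mean_from_pmf = sum(x * q for x, q in pmf.items())
var_from_pmf = sum((x - mean_from_pmf) ** 2 * q for x, q in pmf.items())

# Shortcut formulas for the binomial.
mean_shortcut = n * p
sd_shortcut = sqrt(n * p * (1 - p))

print(round(mean_from_pmf, 3), round(sqrt(var_from_pmf), 3))
print(round(mean_shortcut, 3), round(sd_shortcut, 3))
```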

Page 51: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial Distribution

What is P(< 3 people have gene) from a group of four people tested at random?

Use the fact that the possible outcomes are mutually exclusive (ME):

P(X < 3) = P(0) + P(1) + P(2)
= 0.041 + 0.200 + 0.368
= 0.609 [to 3 decimal places]

We can write this probability like this:

$P(X < 3) = \sum_{x=0}^{2} \binom{4}{x} (0.55)^x (0.45)^{4-x} = 0.609$

P(X > 3) = ?

Page 52: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial Question

There are two hospitals in a town. In Hospital A, 10 babies are born each day; in Hospital B, 30 babies are born each day. If the hospitals only count those days on which over 70% of babies born are girls, and assuming the probability that a girl is born is ½, which of the two hospitals will record more such days?

Hospital A: Binomial (n=10, p=0.5)

Hospital B: Binomial (n=30, p=0.5)

Page 53: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Answer

Hospital A (n = 10; 70% of 10 is 7), calculate:

$P(X \ge 7) = 1 - \sum_{i=0}^{6} P(X = i) = 1 - 0.828125 = 0.17188$

Hospital B (n = 30; 70% of 30 is 21), calculate:

$P(X \ge 21) = 1 - \sum_{i=0}^{20} P(X = i) = 1 - 0.978613 = 0.02139$

There is a higher probability of at least 70% of the babies born being girls at Hospital A.
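The two tail probabilities can be reproduced as follows. The sketch follows the answer's reading of the threshold as "at least 70%", i.e. X ≥ 7 of 10 and X ≥ 21 of 30. The helper names are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def prob_at_least(k, n, p):
    """P(X >= k) computed as 1 minus the sum of P(X = i) for i = 0 .. k-1."""
    return 1 - sum(binomial_pmf(i, n, p) for i in range(k))

p_hospital_a = prob_at_least(7, 10, 0.5)    # at least 7 girls out of 10
p_hospital_b = prob_at_least(21, 30, 0.5)   # at least 21 girls out of 30
print(round(p_hospital_a, 5), round(p_hospital_b, 5))
```

The smaller hospital sees extreme proportions more often because a proportion based on fewer births is more variable.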

Page 54: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Binomial Question

A flu virus hits a company employing 180 people. Independent of other employees, there is a probability p=0.35 that each person needs to take sick leave. What are the expectation and variance of the proportion of the workforce who need to take sick leave? In general, what is the value of the sick rate p that produces the largest variance for this proportion?

Page 55: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Poisson Probability

Many experiments don't have a simple success/failure response.

Responses can be the number of events occurring over time, area, volume etc.

We don't know the number of 'failures' just the number of successes.

Example: The number of calls to a telesales company

- we know how many calls got through (successes)

- but don't know how many failed (lines busy etc.)

Knowledge of the mean number of events over time etc => Poisson Random Variable

Events must occur randomly

Page 56: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Poisson Probability Distribution

Probability distribution for Poisson:

$P(X = x) = \frac{\mu^x e^{-\mu}}{x!}$

where μ is the known mean,
x is the value of the RV with possible values 0, 1, 2, 3, ...
e = irrational constant (like π) with value 2.71828...

The standard deviation, σ, is given by the simple relationship:

σ = $\sqrt{\mu}$
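The Poisson formula can be sketched as a small function. The name `poisson_pmf` and the mean of 2.0 are illustrative choices, not values from the lecture.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

mu = 2.0  # illustrative mean
for x in range(5):
    print(x, round(poisson_pmf(x, mu), 4))

# The standard deviation is the square root of the mean.
print(round(sqrt(mu), 4))
```

Unlike the binomial, x has no upper limit, so the probabilities only sum to 1 over all x = 0, 1, 2, ...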

Page 57: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Example: Bombing of London WW2

In 1944 German V1 rockets fell on London.

Were they aimed at specific targets or falling randomly?

Important in AA strategy & Civil Defence.

Divide London into a 24 × 24 grid of equal sizes (576 equal square areas).

Page 58: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Example: Bombing of London WW2

If rockets are random => the number falling in each square should follow a Poisson random variable.

μ (mean) = No. of bombs / No. of squares = 535/576 = 0.9288

So, for a particular square (assuming randomness):

$P(X = x) = \frac{(0.9288)^x e^{-0.9288}}{x!}$

where x is the number of bombs landing in the square on the map grid.

Page 59: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

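$P(X = 0) = \frac{(0.9288)^0 e^{-0.9288}}{0!} = 0.395$

$P(X = 1) = \frac{(0.9288)^1 e^{-0.9288}}{1!} = 0.367$

$P(X = 2) = \frac{(0.9288)^2 e^{-0.9288}}{2!} = 0.170$

$P(X = 3) = \frac{(0.9288)^3 e^{-0.9288}}{3!} = 0.053$

$P(X = 4) = \frac{(0.9288)^4 e^{-0.9288}}{4!} = 0.012$

$P(X > 4) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)] = 1 - 0.997 = 0.003$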

Page 60: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Example: Bombing of London WW2

X = no. of rockets   P(x)    576 × P(x)   Actual no. of squares hit
0                    0.395   228          229
1                    0.367   211          211
2                    0.170   98           93
3                    0.053   31           35
4                    0.012   7            7
> 4 (i.e. 5+)        0.003   2            1

Prediction from Poisson so good => British concluded rockets were not being aimed at specific targets but were falling randomly on London.
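The predicted counts can be recomputed from the Poisson formula and set beside the observed counts. This is a sketch with illustrative names. It uses the unrounded mean 535/576, so an expected count can differ by one from a figure obtained by multiplying the rounded probabilities.

```python
from math import exp, factorial

n_squares = 576
n_bombs = 535
mu = n_bombs / n_squares   # mean number of bombs per square

# Observed number of squares with 0, 1, 2, 3, 4 and 5+ hits (from the slide).
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, "5+": 1}

def poisson_pmf(x, mu):
    return mu**x * exp(-mu) / factorial(x)

probs = {x: poisson_pmf(x, mu) for x in range(5)}
probs["5+"] = 1 - sum(probs.values())   # remaining probability mass

expected = {k: n_squares * q for k, q in probs.items()}

for k in observed:
    print(k, round(probs[k], 3), round(expected[k], 1), observed[k])
```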

Page 61: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Other Basic Discrete Probability Distributions

Geometric – no. of independent trials to first success.

Negative Binomial – no. of independent trials to first, second, third, fourth… success.

Hypergeometric – lottery-type experiments.

Many others…

Page 62: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Question

The number of cracks in a ceramic tile has a Poisson distribution with a mean µ = 2.4.

What is the probability that a tile has no cracks?

What is the probability that a tile has four or more cracks?

Page 63: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Question Time

Page 64: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Exercise: There is more than one way to skin a cat!

1. If two fair dice are thrown, what is the probability that at least one score is a prime number (2, 3, 5)?

2. What is the complement of the event?

3. What is its probability?

There are three ways (at least) that we can approach this problem.

Page 65: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Exercise

Consider a multiple choice test in which:

Each question has 4 possible answers of which only 1 is correct

The test is made up of 10 questions

The pass mark is 40%

How well could we do if we just guessed each answer?

1. What is the probability of guessing a single question correctly?

2. What is the probability of getting no answers correct in the test?

3. What is the probability of getting at least one question correct in the test?

4. What is the probability of getting a score of 40% in the test?

5. What is the probability of passing the test?

Page 66: Probability & Statistical Inference Lecture 2 MSc in Computing (Data Analytics)

Next Week

A quick recap

Solutions to questions

Continuous distributions