6/7/2007 Unit 2 - Stat 571 - Ramón V. León 1

Unit 2: Review of Probability

Statistics 571: Statistical Methods
Ramón V. León

Approaches to Probability

• Approaches to probability
  – Classical approach
  – Frequentist
  – Personal or subjective approach
  – Axiomatic approach
• Basic ideas of axiomatic approach
  – Sample space
  – Events
  – Union
  – Intersection
  – Complement
  – Disjoint or mutually exclusive events
  – Inclusion

Axioms of Probability

• Axioms:
  – P(A) ≥ 0
  – P(S) = 1, where S is the sample space
  – P(A ∪ B) = P(A) + P(B) if A and B are mutually exclusive events
• Theorems about probability can be proved using these axioms
• These theorems can be used in probability calculations
  – E.g. assuming all elements of the sample space are equally likely
  – Counting arguments used. (Take a look at Birthday Problem on Page 13.)
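
As an illustration of a counting argument under equally likely outcomes, here is a minimal sketch of the classic birthday-problem calculation. It assumes 365 equally likely birthdays and ignores leap years; the function name `prob_shared_birthday` is made up for this example.

```python
from fractions import Fraction

def prob_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming `days`
    equally likely birthdays (an idealization; leap years ignored)."""
    # Probability that all n birthdays are distinct, built factor by factor.
    p_all_distinct = Fraction(1)
    for i in range(n):
        p_all_distinct *= Fraction(days - i, days)
    return 1 - p_all_distinct

for n in (10, 23, 50):
    print(n, round(float(prob_shared_birthday(n)), 4))
```

With 23 people the probability already exceeds one half, which is why the problem is a popular example.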

Conditional Probability and Independence

• Conditional probability
  – P(A | B) = P(A ∩ B) / P(B), defined when P(B) > 0
• Events A and B are mutually independent if P(A | B) = P(A)
  – Implies P(A ∩ B) = P(A)P(B)

Tossing Two Dice

Each outcome is an ordered pair. Rows: first die outcome. Columns: second die outcome.

       1      2      3      4      5      6
1    (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
2    (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
3    (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
4    (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
5    (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
6    (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)

Sample space has 6 × 6 = 36 outcomes

Conditional Probability Example

P(A)=8/36

P(B)=18/36
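
Conditional probabilities on the two-dice sample space can be checked by brute-force enumeration of the 36 equally likely outcomes. The sketch below uses two hypothetical events chosen only for illustration (they are not necessarily the events A and B of this slide): "the dice sum to 8" and "the first die is even".

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first die, second die), equally likely.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favorable = [w for w in sample_space if event(w)]
    return Fraction(len(favorable), len(sample_space))

# Hypothetical events, for illustration only.
def sum_is_eight(w):
    return w[0] + w[1] == 8

def first_is_even(w):
    return w[0] % 2 == 0

p_both = prob(lambda w: sum_is_eight(w) and first_is_even(w))
p_cond = p_both / prob(first_is_even)   # P(A | B) = P(A and B) / P(B)

print(prob(sum_is_eight), prob(first_is_even), p_cond)   # 5/36 1/2 1/6
print(p_cond == prob(sum_is_eight))   # False: these events are not independent
```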

AIDS Example

                  AIDS    Not AIDS     Total
Test positive       95         495       590
Test negative        5        9405      9410
Total              100        9900     10000

Given:
P(A) = 100/10000 = .01
P(+|A) = 95/100 = .95
P(−|~A) = 9405/9900 = .95

Conclude:
P(A|+) = 95/590 = .16

The usual way of solving this problem uses Bayes Theorem
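
The same conclusion can be reached with Bayes' theorem directly from the three given probabilities. This is a minimal sketch; the function and parameter names (`posterior`, `sensitivity`, `specificity`) are illustrative labels for P(A|+), P(+|A) and P(−|~A).

```python
def posterior(prior, sensitivity, specificity):
    """P(A | +) via Bayes' theorem."""
    true_pos = sensitivity * prior                 # P(+ | A) P(A)
    false_pos = (1 - specificity) * (1 - prior)    # P(+ | ~A) P(~A)
    return true_pos / (true_pos + false_pos)

p = posterior(prior=0.01, sensitivity=0.95, specificity=0.95)
print(round(p, 2))   # 0.16
```

Even with a test that is right 95% of the time, most positives are false positives because the condition is rare.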

What Does a Positive HIV Test Mean?


Bayes Theorem Consequences

\[ P(A) \longrightarrow P(A \mid B) \]

\[ P(A \mid B) \neq P(B \mid A) \]

\[ P(A) \longrightarrow P(A \mid \text{Data}) \]

Independence Example

Random Variables

• A random variable (r.v.) associates a unique numerical value with each outcome in the sample space
• Example:

\[ X = \begin{cases} 1 & \text{if coin toss results in a head} \\ 0 & \text{if coin toss results in a tail} \end{cases} \]

• Discrete random variables: number of possible values is finite or countably infinite: x1, x2, x3, x4, x5, x6, …
• Probability mass (density) function (p.m.f. or p.d.f.)
  – f(x) = P(X = x)
• Cumulative distribution function (c.d.f.)
  – $F(x) = P(X \le x) = \sum_{k \le x} f(k)$

Discrete Random Variable Example

Graphs of Probability Mass (Density) Function and Probability Distribution Function

Continuous Random Variables

• An r.v. is continuous if it can assume any value from one or more intervals of real numbers
• Probability density function f(x):

\[ f(x) \ge 0 \]

\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]

\[ P(a \le X \le b) = \int_a^b f(x)\,dx \quad \text{for any } a \le b \]

Cumulative Distribution Function

The cumulative distribution function (c.d.f.), denoted by F(x), for a continuous random variable is given by:

\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy \]

It follows that

\[ f(x) = \frac{dF(x)}{dx} \]

[Figure: f(x) and F(x) plotted against x]

Exponential Distribution Example

Mean and Variance of Random Variables: Discrete Case

\[ E(X) = \sum x f(x), \qquad \operatorname{Var}(X) = \sum (x - E(X))^2 f(x) \]

Mean and Variance of Sum of Two Dice Tosses
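
The discrete-case formulas E(X) = Σ x f(x) and Var(X) = Σ (x − E(X))² f(x) can be applied directly to the sum of two dice. The sketch below assumes two fair, independent dice and builds the p.m.f. by enumerating the 36 outcomes; exact fractions avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# p.m.f. of the sum of two fair dice, built by enumerating 36 outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in counts.items()}

mean = sum(x * f for x, f in pmf.items())                # E(X) = sum x f(x)
var = sum((x - mean) ** 2 * f for x, f in pmf.items())   # Var(X)

print(mean, var)   # 7 35/6
```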

Expected Value or Mean of Random Variables

The expected value or mean of a discrete r.v. X, denoted by E(X), $\mu_X$, or simply $\mu$, is defined as:

\[ E(X) = \mu = \sum_x x f(x) = x_1 f(x_1) + x_2 f(x_2) + \cdots \]

The expected value of a continuous r.v. is defined as:

\[ E(X) = \mu = \int x f(x)\,dx \]

Mean of Exponential Distribution:

\[ E(X) = \int_0^{\infty} x \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda} \]

Variance and Standard Deviation

The variance of an r.v. X, denoted by Var(X), $\sigma_X^2$, or simply $\sigma^2$, is defined as

\[ \operatorname{Var}(X) = \sigma^2 = E\left[(X - \mu)^2\right] \]

We can show that

\[ \operatorname{Var}(X) = E(X^2) - \left(E(X)\right)^2 \]

The standard deviation (SD) is the square root of the variance

Challenge exercise: Show that for the exponential distribution the standard deviation is 1/λ
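
Before attempting the proof, the claim can be sanity-checked numerically. The sketch below approximates E(X) and E(X²) for the exponential density λe^(−λx) with a midpoint rule and then uses Var(X) = E(X²) − (E(X))². The rate, the truncation point and the step count are arbitrary choices for illustration, and a numerical check is no substitute for the analytic argument.

```python
import math

def exponential_moment(k, lam, upper=60.0, steps=200_000):
    """Approximate E(X^k) = integral_0^inf x^k * lam * exp(-lam x) dx
    with a midpoint rule truncated at `upper` (an arbitrary cutoff)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** k * lam * math.exp(-lam * x)
    return total * h

lam = 2.0  # arbitrary rate chosen for illustration
mean = exponential_moment(1, lam)
var = exponential_moment(2, lam) - mean ** 2   # Var(X) = E(X^2) - (E(X))^2
print(mean, math.sqrt(var), 1 / lam)
```

All three printed values should agree to several decimal places.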

Variance of the Mean of Independent, Identically Distributed Random Variables

\[
\begin{aligned}
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) \\
&= \left(\frac{1}{n^2}\right) \operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) \\
&= \left(\frac{1}{n^2}\right) \sum_{i=1}^{n} \operatorname{Var}(X_i) && \text{by independence} \\
&= \left(\frac{1}{n^2}\right) n\sigma^2 && \text{since the r.v.'s are identically distributed} \\
&= \frac{\sigma^2}{n}
\end{aligned}
\]

We often refer to $X_1, X_2, \ldots, X_n$ as a random sample (drawn with replacement or from a very large population).
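
The result Var(X̄) = σ²/n can be verified exactly for a small case. The sketch below assumes i.i.d. rolls of a fair die (an example chosen here, not taken from the slide), enumerates every sample of size n, and compares the variance of the sample means with σ²/n.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def variance(values):
    """Variance of a r.v. uniformly distributed over the list `values`."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

sigma2 = variance(list(faces))   # variance of one fair die
for n in (1, 2, 3):
    # All equally likely samples of size n, reduced to their sample means.
    means = [Fraction(sum(s), n) for s in product(faces, repeat=n)]
    print(n, variance(means), sigma2 / n)
```

On each line the last two values are identical, as the derivation predicts.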

Quantiles and Percentiles

For $0 \le p \le 1$ the pth quantile (or the 100pth percentile), denoted by $\theta_p$, of a continuous r.v. X is defined by the following equation:

\[ P(X \le \theta_p) = F(\theta_p) = p \]

$\theta_{.5}$ is called the median

[Figure: F(x) with p and θ_p marked]

Exponential Distribution Percentiles
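
For the exponential distribution, whose c.d.f. is F(x) = 1 − e^(−λx), the defining equation F(θ_p) = p can be solved in closed form: θ_p = −ln(1 − p)/λ. The sketch below implements that formula; the function name and the rate λ = 0.1 are hypothetical choices for illustration.

```python
import math

def exponential_quantile(p, lam):
    """Solve F(theta) = 1 - exp(-lam * theta) = p for theta, 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1 - p) / lam

lam = 0.1  # hypothetical rate
median = exponential_quantile(0.5, lam)
print(median)                        # equals ln(2) / lam
print(1 - math.exp(-lam * median))   # recovers p = 0.5 up to rounding
```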

Jointly Distributed Random Variables

f(x, y) = joint probability mass function

\[ \frac{32}{200} = 0.16 \]

Marginal Distribution

Discrete: $g(x) = P(X = x) = \sum_y f(x, y)$

Continuous: $g(x) = f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$

Conditional Distribution

Conditional probability mass function (p.m.f.):

\[ f(y \mid x) = P(Y = y \mid X = x) = \frac{f(x, y)}{g(x)} \]

\[ P(Y = 1 \mid X = 4) = \frac{P(X = 4, Y = 1)}{P(X = 4)} = \frac{0.005}{0.315} \]

Independent Random Variables

X and Y are independent r.v.'s if

\[ f(x, y) = g(x)\,h(y) \]

Note that

\[ f(y \mid x) = \frac{f(x, y)}{g(x)} = h(y) \]
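
Marginal, conditional and independence calculations are mechanical once a joint p.m.f. is tabulated. The sketch below uses a small hypothetical 2 × 2 joint p.m.f. (invented for illustration, not data from these slides) and follows the definitions of g(x), h(y) and f(y | x) given above.

```python
from fractions import Fraction

# Hypothetical joint p.m.f. f(x, y) on a 2 x 2 grid (illustration only).
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals: g(x) = sum over y of f(x, y); h(y) = sum over x of f(x, y).
g = {x: sum(joint[x, y] for y in ys) for x in xs}
h = {y: sum(joint[x, y] for x in xs) for y in ys}

# Conditional p.m.f. f(y | x) = f(x, y) / g(x), keyed as (y, x).
cond = {(y, x): joint[x, y] / g[x] for x in xs for y in ys}

# Independence holds only if f(x, y) = g(x) h(y) for every pair.
independent = all(joint[x, y] == g[x] * h[y] for x in xs for y in ys)

print(g, h)
print(cond[1, 0], independent)   # 3/4 False
```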

Covariance and Correlation

\[ \operatorname{Cov}(X, Y) = \sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - E(X)E(Y) \]

Note that:

\[ E(XY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\,dx\,dy \]

If X and Y are independent then E(XY) = E(X)E(Y), so the covariance is zero. The other direction is not true; zero covariance does not imply independence.

\[ \rho_{XY} = \operatorname{corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \]

Measures strength of linear association

\[ -1 \le \rho_{XY} \le 1 \]
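
A standard counterexample shows why zero covariance does not imply independence. The sketch below assumes X is uniform on {−1, 0, 1} and Y = X² (a textbook-style example chosen here, not taken from the slide): Y is completely determined by X, yet the covariance is zero.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1} and Y = X**2.
pmf = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(fn):
    """E[fn(X, Y)] under the joint p.m.f."""
    return sum(fn(x, y) * p for (x, y), p in pmf.items())

# Cov(X, Y) = E(XY) - E(X) E(Y)
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(X=0, Y=0) = P(X=0) P(Y=0).
p_x0 = sum(p for (x, _), p in pmf.items() if x == 0)
p_y0 = sum(p for (_, y), p in pmf.items() if y == 0)
p_x0_y0 = pmf.get((0, 0), Fraction(0))

print(cov)                     # 0
print(p_x0_y0, p_x0 * p_y0)    # 1/3 1/9
```

The joint probability 1/3 differs from the product 1/9, so X and Y are dependent despite the zero covariance.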

Covariance Example

• A positive covariance indicates positive dependence
• A negative covariance indicates negative dependence

Correlation Example

Chebyshev’s Inequality

Let c > 0 be a constant. Then irrespective of the distribution of X

\[ P\left(|X - E(X)| \ge c\right) \le \frac{\operatorname{Var}(X)}{c^2} \]

Weak Law of Large Numbers

Let $\bar{X}$ be the sample mean of n i.i.d. observations from a population with finite mean $\mu$ and variance $\sigma^2$. Then for any fixed c > 0

\[ P\left(|\bar{X} - \mu| \ge c\right) \le \frac{\sigma^2}{n c^2} \to 0 \quad \text{as } n \to \infty \]

We see that $\bar{X}$ converges to $\mu$ in probability as n gets large.

This follows from Chebyshev’s inequality and the fact that

\[ E(\bar{X}) = \mu \quad \text{and} \quad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} \]
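
To see how loose or tight the Chebyshev bound σ²/(nc²) is, it can be compared with an exact probability in a small case. The sketch below assumes i.i.d. rolls of a fair die (μ = 7/2, σ² = 35/12) and a tolerance c = 1; both choices are made up for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mu = Fraction(7, 2)        # mean of one fair die
sigma2 = Fraction(35, 12)  # variance of one fair die
c = Fraction(1)            # arbitrary tolerance chosen for illustration

for n in (1, 2, 4, 6):
    # Exact P(|Xbar - mu| >= c) by enumerating all 6**n samples.
    samples = list(product(faces, repeat=n))
    hits = sum(1 for s in samples if abs(Fraction(sum(s), n) - mu) >= c)
    exact = Fraction(hits, len(samples))
    # Chebyshev bound, capped at 1 because it bounds a probability.
    bound = min(Fraction(1), sigma2 / (n * c ** 2))
    print(n, float(exact), float(bound))
```

The exact probability never exceeds the bound, and both shrink as n grows.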

Selected Discrete Distributions

Bernoulli distribution:

\[ f(x) = P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases} \]

\[ E(X) = p, \qquad \operatorname{Var}(X) = p(1 - p) \]

Binomial distribution:

\[ f(x) = P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, \ldots, n \]

\[ E(X) = np, \qquad \operatorname{Var}(X) = np(1 - p) \]

\[ \binom{n}{k} = \frac{n!}{k!\,(n - k)!}, \quad \text{e.g., } \binom{5}{3} = \frac{5!}{3!\,2!} = 10 \]

Binomial Distribution Example

Suppose that the probability of a thumbtack landing with the pin up is 0.9. If we toss the thumbtack ten times, what is the probability that it lands with the pin up exactly 7 times?

Answer:

\[ P(X = 7) = \binom{10}{7} (.9)^7 (1 - .9)^3 = 120\,(.9)^7 (.1)^3 = .057 \]

Also note:

\[ E(X) = np = 10 \times .9 = 9, \qquad \operatorname{Var}(X) = np(1 - p) = 10 \times .9 \times .1 = .9 \]

See Example 2.30, Page 43 for another application of the Binomial distribution
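
The thumbtack calculation can be reproduced with a few lines of code. This is a minimal sketch using only the standard library; the helper name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binomial_pmf(7, 10, 0.9), 3))   # 0.057

# Mean and variance recomputed from the p.m.f. as a sanity check.
mean = sum(x * binomial_pmf(x, 10, 0.9) for x in range(11))
var = sum((x - mean) ** 2 * binomial_pmf(x, 10, 0.9) for x in range(11))
print(round(mean, 6), round(var, 6))
```

The recomputed mean and variance agree with np = 9 and np(1 − p) = .9 up to floating-point rounding.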

Hypergeometric Distribution
(Sampling without replacement from a small population)

A lot of 50 tables has two defective tables. A sample of five tables is selected without replacement. What is the probability that none of these five tables is defective?

\[ P(X = 0) = \frac{\binom{2}{0}\binom{48}{5}}{\binom{50}{5}} = .8082 \]

Suppose the five tables had been selected with replacement. What would the probability be then?

\[ P(X = 0) = \binom{5}{0} \left(\frac{2}{50}\right)^0 \left(\frac{48}{50}\right)^5 = .8154 \]
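
Both answers can be reproduced numerically. The sketch below uses a hypothetical helper `hypergeom_pmf` whose parameter names (N for lot size, K for defectives in the lot, n for sample size) are labels chosen for this example.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x defectives in a sample of n drawn without replacement
    from a lot of N items containing K defectives."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

without = hypergeom_pmf(0, N=50, K=2, n=5)
with_repl = comb(5, 0) * (2 / 50) ** 0 * (48 / 50) ** 5   # binomial case
print(round(without, 4), round(with_repl, 4))   # 0.8082 0.8154
```

The two probabilities are close because the sample is small relative to the lot.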

Poisson Distribution:

\[ f(x) = P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} \quad \text{for } x = 0, 1, 2, \ldots \]

\[ E(X) = \lambda, \qquad \operatorname{Var}(X) = \lambda \]

Example: On average, five Prussian soldiers die from horse kicks in a year. What is the probability that exactly four soldiers are killed this way in a given year?

\[ P(X = 4) = \frac{e^{-5}\,(5)^4}{4!} = .175 \]
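
The horse-kick probability follows directly from the Poisson p.m.f. A minimal sketch, with an illustrative helper name:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

print(round(poisson_pmf(4, 5), 3))   # 0.175
```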

Geometric Distribution

Distribution of the waiting time (number of trials) until an event first occurs in independent trials

\[ P(X = x) = (1 - p)^{x - 1} p, \quad x = 1, 2, \ldots \]

\[ E(X) = \frac{1}{p} \quad \text{and} \quad \operatorname{Var}(X) = \frac{1 - p}{p^2} \]

Suppose the probability of winning the jackpot on a slot machine is .01. What is the expected number of tries to win the jackpot? What is the probability that you hit the jackpot for the first time on your fifth try?

\[ E(X) = \frac{1}{.01} = 100, \qquad P(X = 5) = (.99)^4 (.01) = .0096 \]
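
The slot-machine numbers can be checked the same way. A minimal sketch with an illustrative helper name:

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x), x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

p = 0.01
print(1 / p, round(geometric_pmf(5, p), 4))   # 100.0 0.0096
```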

Uniform Distribution

Distribution when all values in an interval are equally likely

Suppose that you select a real number at random in the interval [1,5]. What is the probability that it turns out to be between 2 and 4?

\[ P(2 \le X \le 4) = \frac{4 - 2}{5 - 1} = 0.5 \]

Ratio of the length of the interval [2,4] to the length of the interval [1,5]

Exponential Distribution

Distribution of waiting time when arrivals occur at random

\[ f(x) = \lambda e^{-\lambda x}, \qquad F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x} \quad \text{for } x \ge 0 \]

\[ E(X) = \frac{1}{\lambda} \quad \text{and} \quad \operatorname{Var}(X) = \frac{1}{\lambda^2} \]

\[ P(X > x) = 1 - P(X \le x) = 1 - F(x) = e^{-\lambda x} \]

Memoryless Property of the Exponential Distribution

\[ P(X > s + t \mid X > s) = P(X > t) \]

\[ P(X > 5 + 3 \mid X > 5) = P(X > 3) \]

The probability of having to wait t additional minutes after having waited s minutes is the same as the probability of having to wait t minutes to begin with.
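
The property follows in two steps from the definition of conditional probability and the exponential survival function P(X > x) = e^(−λx). The derivation below is a sketch that assumes s, t ≥ 0; it uses the fact that the event {X > s + t} is contained in {X > s}.

```latex
\begin{aligned}
P(X > s + t \mid X > s)
  &= \frac{P(X > s + t \text{ and } X > s)}{P(X > s)}
   = \frac{P(X > s + t)}{P(X > s)} \\
  &= \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
   = e^{-\lambda t}
   = P(X > t).
\end{aligned}
```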

Normal Distribution

A continuous r.v. X has a normal distribution with parameters $\mu$ and $\sigma^2$ if its p.d.f. is given by

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty \]

\[ E(X) = \mu \quad \text{and} \quad \operatorname{Var}(X) = \sigma^2 \]

Notation: $X \sim N(\mu, \sigma^2)$

Standard Normal Distribution = N(0,1)

If $X \sim N(\mu, \sigma^2)$ then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$

\[ P(X \le x) = P\left(Z = \frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right) \]

$X \sim N(205, 5^2)$:

\[
\begin{aligned}
P(X < 200) &= P\left(Z = \frac{X - 205}{5} < \frac{200 - 205}{5}\right) \\
&= P(Z < -1) = \Phi(-1) = 0.1587
\end{aligned}
\]

Standard Normal Table
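
Values of Φ(z) that would be read from a standard normal table can also be computed from the error function, using the identity Φ(z) = ½(1 + erf(z/√2)). The sketch below uses illustrative function names and reproduces the N(205, 5²) calculation P(X < 200) = Φ(−1).

```python
import math

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma**2), by standardizing."""
    return std_normal_cdf((x - mu) / sigma)

print(round(normal_cdf(200, 205, 5), 4))   # 0.1587
for z in (-1.0, 0.0, 1.0, 1.96):
    print(z, round(std_normal_cdf(z), 4))
```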

Empirical Rule

Mean of i.i.d. Normal Random Variables

Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed $N(\mu, \sigma^2)$. We say that $X_1, X_2, \ldots, X_n$ is a random sample from a $N(\mu, \sigma^2)$ population.

Then for $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$ we have $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.

Hint: Use this result to do homework problem 2.83

Percentiles of the Normal Distribution

Suppose that the scores on a standardized test are normally distributed with mean 500 and standard deviation 100. What is the 75th percentile score of this test?

For x to be the 75th percentile means that $P(X \le x) = .75$. So

\[ P(X \le x) = P\left(\frac{X - 500}{100} \le \frac{x - 500}{100}\right) = P\left(Z \le \frac{x - 500}{100}\right) = \Phi\left(\frac{x - 500}{100}\right) = .75 \]

From Table A.3, $\Phi(0.675) = 0.75$. So

\[ \frac{x - 500}{100} = 0.675 \;\Rightarrow\; x = 500 + (0.675)(100) = 567.5 \]
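
The table lookup can be cross-checked with Python's standard library, which provides an inverse normal c.d.f. This is a sketch using `statistics.NormalDist` (available in Python 3.8 and later); the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)
x75 = scores.inv_cdf(0.75)          # 75th percentile
print(round(x75, 1))                # 567.4
print(round(scores.cdf(x75), 4))    # 0.75
```

The result is slightly below 567.5 because the table value 0.675 is a rounded version of the exact standard normal quantile.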