Brief Review Probability and Statistics

Brief Review Probability and Statistics. Probability distributions Continuous distributions


Brief Review

Probability and Statistics


Probability distributions

Continuous distributions


Defn (density function)

Let x denote a continuous random variable. Then f(x) is called the density function of x if

1) f(x) ≥ 0

2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$

3) $\int_a^b f(x)\,dx = P[a \le x \le b]$

Defn (Joint density function)

Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables. Then

f(x) = f(x1, x2, x3, ..., xn)

is called the joint density function of x = (x1, x2, x3, ..., xn) if

1) f(x) ≥ 0

2) $\int f(\mathbf{x})\,d\mathbf{x} = 1$

3) $\int_R f(\mathbf{x})\,d\mathbf{x} = P[\mathbf{x} \in R]$

Note:

$\int f(\mathbf{x})\,d\mathbf{x} = \int \cdots \int f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$

$\int_R f(\mathbf{x})\,d\mathbf{x} = \int \cdots \int_R f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$

Defn (Marginal density function)

The marginal density of $\mathbf{x}_1 = (x_1, x_2, x_3, \ldots, x_p)$ (p < n) is defined by:

$f_1(\mathbf{x}_1) = \int f(\mathbf{x})\,d\mathbf{x}_2 = \int f(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_2$

where $\mathbf{x}_2 = (x_{p+1}, x_{p+2}, x_{p+3}, \ldots, x_n)$.

The marginal density of $\mathbf{x}_2 = (x_{p+1}, x_{p+2}, x_{p+3}, \ldots, x_n)$ is defined by:

$f_2(\mathbf{x}_2) = \int f(\mathbf{x})\,d\mathbf{x}_1 = \int f(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_1$

where $\mathbf{x}_1 = (x_1, x_2, x_3, \ldots, x_p)$.

Defn (Conditional density function)

The conditional density of $\mathbf{x}_1$ given $\mathbf{x}_2$ (the subvectors defined on the previous slide, p < n) is defined by:

$f_{1|2}(\mathbf{x}_1 | \mathbf{x}_2) = \dfrac{f(\mathbf{x})}{f_2(\mathbf{x}_2)} = \dfrac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_2(\mathbf{x}_2)}$

The conditional density of $\mathbf{x}_2$ given $\mathbf{x}_1$ is defined by:

$f_{2|1}(\mathbf{x}_2 | \mathbf{x}_1) = \dfrac{f(\mathbf{x})}{f_1(\mathbf{x}_1)} = \dfrac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_1(\mathbf{x}_1)}$

Marginal densities describe how the subvector xi behaves ignoring xj

Conditional densities describe how the subvector xi behaves when the subvector xj is held fixed


Defn (Independence)

The two sub-vectors (x1 and x2) are called independent if:

f(x) = f(x1, x2) = f1(x1)f2(x2)

= product of marginals

or

the conditional density of xi given xj :

fi|j(xi |xj) = fi(xi) = marginal density of xi


Example (p-variate Normal)

The random vector x (p × 1) is said to have the p-variate Normal distribution with mean vector μ (p × 1) and covariance matrix Σ (p × p) (written x ~ Np(μ, Σ)) if:

$f(\mathbf{x}) = \dfrac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$

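Example (bivariate Normal)

The random vector $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ is said to have the bivariate Normal distribution with mean vector

$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$

and covariance matrix

$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$

if its density is the p-variate Normal density with p = 2:

$f(\mathbf{x}) = \dfrac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$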

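$f(x_1, x_2) = \dfrac{1}{2\pi|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] = \dfrac{1}{2\pi(\sigma_{11}\sigma_{22}-\sigma_{12}^2)^{1/2}} \exp\left[-\tfrac{1}{2}Q(x_1, x_2)\right]$

where

$Q(x_1, x_2) = (\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) = \dfrac{\sigma_{22}(x_1-\mu_1)^2 - 2\sigma_{12}(x_1-\mu_1)(x_2-\mu_2) + \sigma_{11}(x_2-\mu_2)^2}{\sigma_{11}\sigma_{22}-\sigma_{12}^2}$

Equivalently, in terms of σ1, σ2 and ρ:

$f(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\tfrac{1}{2}Q(x_1, x_2)\right]$

$Q(x_1, x_2) = \dfrac{1}{1-\rho^2}\left[\left(\dfrac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\dfrac{x_1-\mu_1}{\sigma_1}\right)\left(\dfrac{x_2-\mu_2}{\sigma_2}\right) + \left(\dfrac{x_2-\mu_2}{\sigma_2}\right)^2\right]$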
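The two forms of the bivariate Normal density can be checked against each other numerically. The sketch below is only an illustration: the function names and the parameter values are made up for this example, and it uses nothing beyond the Python standard library.

```python
import math

def bvn_pdf_rho(x1, x2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density written with sigma1, sigma2 and rho."""
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))

def bvn_pdf_sigma(x1, x2, mu1, mu2, s11, s12, s22):
    """Same density written with the covariance entries sigma11, sigma12, sigma22."""
    det = s11 * s22 - s12 * s12          # determinant of Sigma
    d1, d2 = x1 - mu1, x2 - mu2
    q = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

# Hypothetical parameter values, chosen only for illustration
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6
a = bvn_pdf_rho(0.3, -1.7, mu1, mu2, s1, s2, rho)
b = bvn_pdf_sigma(0.3, -1.7, mu1, mu2, s1 * s1, rho * s1 * s2, s2 * s2)
print(a, b)  # the two values should agree up to rounding
```

With ρ = 0 and unit variances the density at the mean reduces to 1/(2π), which is a quick sanity check on either function.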

[Figure: The Bivariate Normal Distribution (three surface plots of f(x, y) against x and y)]

[Figure: Contour Plots of the Bivariate Normal Distribution (three panels)]

[Figure: Scatter Plots of data from the Bivariate Normal Distribution (three panels)]

Theorem (Transformations)

Let x = (x1 ,x2 ,x3 , ... , xn) denote a vector of continuous random variables with joint density function f(x1 ,x2 ,x3 , ... , xn) = f(x). Let

y1 = φ1(x1, x2, x3, ..., xn)

y2 = φ2(x1, x2, x3, ..., xn)

...

yn = φn(x1, x2, x3, ..., xn)

define a 1-1 transformation of x into y.

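Then the joint density of y is g(y), given by:

g(y) = f(x)|J|, where

$J = \dfrac{\partial(\mathbf{x})}{\partial(\mathbf{y})} = \dfrac{\partial(x_1, x_2, x_3, \ldots, x_n)}{\partial(y_1, y_2, y_3, \ldots, y_n)} = \det\begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{pmatrix}$

= the Jacobian of the transformation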

Corollary (Linear Transformations)

Let x = (x1 ,x2 ,x3 , ... , xn) denote a vector of continuous random variables with joint density function f(x1 ,x2 ,x3 , ... , xn) = f(x). Let

y1 = a11x1 + a12x2 + a13x3 + ... + a1nxn

y2 = a21x1 + a22x2 + a23x3 + ... + a2nxn

...

yn = an1x1 + an2x2 + an3x3 + ... + annxn

define a 1-1 transformation of x into y.

Then the joint density of y is g(y), given by:

$g(\mathbf{y}) = f(\mathbf{x})\dfrac{1}{|\det(A)|} = f(A^{-1}\mathbf{y})\dfrac{1}{|\det(A)|}$

where

$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$

Corollary (Linear Transformations for Normal Random variables)

Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables having an n-variate Normal distribution with mean vector μ and covariance matrix Σ, i.e. x ~ Nn(μ, Σ). Let

y1 = a11x1 + a12x2 + a13x3 + ... + a1nxn

y2 = a21x1 + a22x2 + a23x3 + ... + a2nxn

...

yn = an1x1 + an2x2 + an3x3 + ... + annxn

define a 1-1 transformation of x into y.

Then y = (y1, y2, y3, ..., yn) ~ Nn(Aμ, AΣA')
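As a small illustration of the corollary, the mean vector Aμ and covariance matrix AΣA' can be computed directly. The helper names and the numbers below are assumptions made for this sketch; plain Python lists stand in for matrices.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

# Hypothetical example: x ~ N2(mu, Sigma), y1 = x1 + x2, y2 = x1 - x2
mu = [[1.0], [2.0]]
sigma = [[4.0, 1.0], [1.0, 9.0]]
A = [[1.0, 1.0], [1.0, -1.0]]

mean_y = matmul(A, mu)                          # A mu
cov_y = matmul(matmul(A, sigma), transpose(A))  # A Sigma A'
print(mean_y, cov_y)
```

The diagonal of `cov_y` can be cross-checked by hand: Var(x1 + x2) = 4 + 9 + 2(1) and Var(x1 - x2) = 4 + 9 - 2(1).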

Defn (Expectation)

Let x = (x1 ,x2 ,x3 , ... , xn) denote a vector of continuous random variables with joint density function

f(x) = f(x1 ,x2 ,x3 , ... , xn).

Let U = h(x) = h(x1 ,x2 ,x3 , ... , xn)

Then

$E[U] = E[h(\mathbf{x})] = \int h(\mathbf{x})\,f(\mathbf{x})\,d\mathbf{x}$

Defn (Conditional Expectation)

Let x = (x1 ,x2 ,x3 , ... , xn) = (x1 , x2 ) denote a vector of continuous random variables with joint density function

f(x) = f(x1 ,x2 ,x3 , ... , xn) = f(x1 , x2 ).

Let U = h(x1) = h(x1 ,x2 ,x3 , ... , xp)

Then the conditional expectation of U given $\mathbf{x}_2$ is

$E[U | \mathbf{x}_2] = E[h(\mathbf{x}_1) | \mathbf{x}_2] = \int h(\mathbf{x}_1)\,f_{1|2}(\mathbf{x}_1 | \mathbf{x}_2)\,d\mathbf{x}_1$

Defn (Variance)

Let x = (x1 ,x2 ,x3 , ... , xn) denote a vector of continuous random variables with joint density function

f(x) = f(x1 ,x2 ,x3 , ... , xn).

Let U = h(x) = h(x1 ,x2 ,x3 , ... , xn)

Then

$\sigma_U^2 = Var[U] = E\left[(U - E[U])^2\right] = E\left[(h(\mathbf{x}) - E[h(\mathbf{x})])^2\right]$

Defn (Conditional Variance)

Let x = (x1 ,x2 ,x3 , ... , xn) = (x1 , x2 ) denote a vector of continuous random variables with joint density function

f(x) = f(x1 ,x2 ,x3 , ... , xn) = f(x1 , x2 ).

Let U = h(x1) = h(x1 ,x2 ,x3 , ... , xp)

Then the conditional variance of U given $\mathbf{x}_2$ is

$Var[U | \mathbf{x}_2] = E\left[\left(h(\mathbf{x}_1) - E[h(\mathbf{x}_1) | \mathbf{x}_2]\right)^2 \,\middle|\, \mathbf{x}_2\right]$

Defn (Covariance, Correlation)

Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables with joint density function

f(x) = f(x1, x2, x3, ..., xn).

Let U = h(x) = h(x1, x2, x3, ..., xn) and

V = g(x) = g(x1, x2, x3, ..., xn)

Then the covariance of U and V is

$Cov[U, V] = E\left[(U - E[U])(V - E[V])\right] = E\left[(h(\mathbf{x}) - E[h(\mathbf{x})])(g(\mathbf{x}) - E[g(\mathbf{x})])\right]$

and the correlation is

$\rho_{UV} = \dfrac{Cov[U, V]}{\sqrt{Var[U]\,Var[V]}}$

Properties

• Expectation

• Variance

• Covariance

• Correlation

1. E[a1x1 + a2x2 + a3x3 + ... + anxn]

= a1E[x1] + a2E[x2] + a3E[x3] + ... + anE[xn]

or E[a'x] = a'E[x]


2. E[UV] = E[h(x1)g(x2)]

= E[U]E[V] = E[h(x1)]E[g(x2)]

if x1 and x2 are independent


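3. Var[a1x1 + a2x2 + a3x3 + ... + anxn]

$= \sum_{i=1}^{n} a_i^2\,Var[x_i] + 2\sum_{i<j} a_i a_j\,Cov[x_i, x_j]$

or Var[a'x] = a′Σa

where

$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{nn} \end{pmatrix} = \begin{pmatrix} Var(x_1) & Cov(x_1, x_2) & \cdots & Cov(x_1, x_n) \\ Cov(x_1, x_2) & Var(x_2) & \cdots & Cov(x_2, x_n) \\ \vdots & \vdots & & \vdots \\ Cov(x_1, x_n) & Cov(x_2, x_n) & \cdots & Var(x_n) \end{pmatrix}$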

4. Cov[a1x1 + a2x2 + ... + anxn, b1x1 + b2x2 + ... + bnxn]

$= \sum_{i=1}^{n} a_i b_i\,Var[x_i] + \sum_{i \ne j} a_i b_j\,Cov[x_i, x_j]$

or Cov[a'x, b'x] = a′Σb

5. $E[U] = E_{\mathbf{x}_2}\left[E[U | \mathbf{x}_2]\right]$

6. $Var[U] = E_{\mathbf{x}_2}\left[Var[U | \mathbf{x}_2]\right] + Var_{\mathbf{x}_2}\left[E[U | \mathbf{x}_2]\right]$

Statistical Inference

Making decisions from data


There are two main areas of Statistical Inference

• Estimation – deciding on the value of a parameter
  – Point estimation
  – Confidence Interval, Confidence region Estimation

• Hypothesis testing
  – Deciding if a statement (hypothesis) about a parameter is True or False

The general statistical model

Most data fits this situation

Defn (The Classical Statistical Model)

The data vector

x = (x1 ,x2 ,x3 , ... , xn)

The model

Let f(x|θ) = f(x1, x2, ..., xn | θ1, θ2, ..., θp) denote the joint density of the data vector x = (x1, x2, x3, ..., xn) of observations, where the unknown parameter vector θ ∈ Ω (a subset of p-dimensional space).

An Example

The data vector

x = (x1, x2, x3, ..., xn), a sample from the normal distribution with mean μ and variance σ²

The model

Then f(x|μ, σ²) = f(x1, x2, ..., xn | μ, σ²), the joint density of x = (x1, x2, x3, ..., xn), takes on the form:

$f(\mathbf{x} | \mu, \sigma^2) = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$

where the unknown parameter vector θ = (μ, σ²) ∈ Ω = {(x, y) | -∞ < x < ∞, 0 ≤ y < ∞}.

Defn (Sufficient Statistics)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then S = (S1(x), S2(x), S3(x), ..., Sk(x)) is called a set of sufficient statistics for the parameter vector θ if the conditional distribution of x given S = (S1(x), S2(x), S3(x), ..., Sk(x)) is not functionally dependent on the parameter vector θ.

A set of sufficient statistics contains all of the information concerning the unknown parameter vector θ.

A Simple Example illustrating Sufficiency

Suppose that we observe a Success-Failure experiment n = 3 times. Let θ denote the probability of Success. Suppose that the data collected are x1, x2, x3, where xi takes on the value 1 if the ith trial is a Success and 0 if the ith trial is a Failure.

The following table gives possible values of (x1, x2, x3).

(x1, x2, x3)   f(x1, x2, x3|θ)   S = Σxi   g(S|θ)        f(x1, x2, x3|S)
(0, 0, 0)      (1 - θ)³          0         (1 - θ)³      1
(1, 0, 0)      θ(1 - θ)²         1         3θ(1 - θ)²    1/3
(0, 1, 0)      θ(1 - θ)²         1         3θ(1 - θ)²    1/3
(0, 0, 1)      θ(1 - θ)²         1         3θ(1 - θ)²    1/3
(1, 1, 0)      θ²(1 - θ)         2         3θ²(1 - θ)    1/3
(1, 0, 1)      θ²(1 - θ)         2         3θ²(1 - θ)    1/3
(0, 1, 1)      θ²(1 - θ)         2         3θ²(1 - θ)    1/3
(1, 1, 1)      θ³                3         θ³            1

The data can be generated in two equivalent ways:

1. Generating (x1, x2, x3) directly from f(x1, x2, x3|θ), or

2. Generating S from g(S|θ), then generating (x1, x2, x3) from f(x1, x2, x3|S). Since the second step does not involve θ, no additional information will be obtained by knowing (x1, x2, x3) once S is determined.
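The last column of the table can be reproduced by brute force. The sketch below enumerates all outcomes of the n = 3 experiment and computes f(x1, x2, x3|S) for two different values of θ; the function names are made up for this illustration.

```python
from itertools import product

def joint(x, theta):
    """f(x | theta) for independent Success-Failure trials."""
    s = sum(x)
    return theta ** s * (1 - theta) ** (len(x) - s)

def conditional_given_s(theta, n=3):
    """Return {x: f(x | S = sum(x))} over all 2**n outcomes."""
    outcomes = list(product([0, 1], repeat=n))
    g = {}  # g(S | theta), accumulated over outcomes sharing the same S
    for x in outcomes:
        g[sum(x)] = g.get(sum(x), 0.0) + joint(x, theta)
    return {x: joint(x, theta) / g[sum(x)] for x in outcomes}

cond_a = conditional_given_s(0.2)
cond_b = conditional_given_s(0.7)
for x in cond_a:
    print(x, round(cond_a[x], 4), round(cond_b[x], 4))
```

The two columns printed for θ = 0.2 and θ = 0.7 coincide, which is exactly the statement that the conditional distribution of x given S does not depend on θ.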

The Sufficiency Principle

Any decision regarding the parameter θ should be based on a set of Sufficient statistics S1(x), S2(x), ..., Sk(x) and not otherwise on the value of x.

A useful approach in developing a statistical procedure

1. Find sufficient statistics

2. Develop estimators, tests of hypotheses, etc. using only these statistics

Defn (Minimal Sufficient Statistics)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Minimal Sufficient statistics for the parameter vector θ if S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Sufficient statistics and can be calculated from any other set of Sufficient statistics.

Theorem (The Factorization Criterion)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Sufficient statistics for the parameter vector θ if

f(x|θ) = h(x)g(S, θ)

= h(x)g(S1(x), S2(x), S3(x), ..., Sk(x), θ).

This is useful for finding Sufficient statistics

i.e. if you can factor out the θ-dependence with a set of statistics, then these statistics are a set of Sufficient statistics.
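As a short worked illustration of the criterion, consider n independent Success-Failure trials with success probability θ (the setting of the earlier n = 3 example). The choice h(x) = 1 below is one possible factorization, not the only one.

```latex
f(\mathbf{x} \mid \theta)
  = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
  = \theta^{S}(1-\theta)^{n-S},
  \qquad S = \sum_{i=1}^{n} x_i ,
\]
\[
\text{so } f(\mathbf{x} \mid \theta) = h(\mathbf{x})\, g(S, \theta)
\text{ with } h(\mathbf{x}) = 1, \quad g(S, \theta) = \theta^{S}(1-\theta)^{n-S}.
```

The θ-dependence enters only through S, so S = Σxi is a sufficient statistic for θ.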

Defn (Completeness)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Complete Sufficient statistics for the parameter vector θ if S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Sufficient statistics and whenever

E[ψ(S1(x), S2(x), S3(x), ..., Sk(x))] = 0

then

P[ψ(S1(x), S2(x), S3(x), ..., Sk(x)) = 0] = 1

Defn (The Exponential Family)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then f(x|θ) is said to be a member of the exponential family of distributions if:

$f(\mathbf{x} | \boldsymbol{\theta}) = \begin{cases} h(\mathbf{x})\,g(\boldsymbol{\theta}) \exp\left[\sum_{i=1}^{k} S_i(\mathbf{x})\,p_i(\boldsymbol{\theta})\right] & a_i \le x_i \le b_i \\ 0 & \text{otherwise} \end{cases}$

for θ ∈ Ω, where

1) -∞ < ai < bi < ∞ are not dependent on θ.

2) Ω contains a nondegenerate k-dimensional rectangle.

3) g(θ), ai, bi and pi(θ) are not dependent on x.

4) h(x), ai, bi and Si(x) are not dependent on θ.

If in addition:

5) The Si(x) are functionally independent for i = 1, 2, ..., k.

6) ∂[Si(x)]/∂xj exists and is continuous for all i = 1, 2, ..., k; j = 1, 2, ..., n.

7) pi(θ) is a continuous function of θ for all i = 1, 2, ..., k.

8) R = {[p1(θ), p2(θ), ..., pk(θ)] | θ ∈ Ω} contains a nondegenerate k-dimensional rectangle.

Then

the set of statistics S1(x), S2(x), ..., Sk(x) forms a Minimal Complete set of Sufficient statistics.

Defn (The Likelihood function)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then for a given value of the observation vector x, the Likelihood function, Lx(θ), is defined by:

Lx(θ) = f(x|θ) with θ ∈ Ω

The log Likelihood function lx(θ) is defined by:

lx(θ) = ln Lx(θ) = ln f(x|θ) with θ ∈ Ω

The Likelihood Principle

Any decision regarding the parameter θ should be based on the likelihood function Lx(θ) and not otherwise on the value of x.

If two data sets result in the same likelihood function, the decision regarding θ should be the same.

Some statisticians find it useful to plot the likelihood function Lx(θ) given the value of x.

It summarizes the information contained in x regarding the parameter vector θ.

An Example

The data vector

x = (x1, x2, x3, ..., xn), a sample from the normal distribution with mean μ and variance σ²

The joint distribution of x

Then f(x|μ, σ²) = f(x1, x2, ..., xn | μ, σ²), the joint density of x = (x1, x2, x3, ..., xn), takes on the form:

$f(\mathbf{x} | \mu, \sigma^2) = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$

where the unknown parameter vector θ = (μ, σ²) ∈ Ω = {(x, y) | -∞ < x < ∞, 0 ≤ y < ∞}.

The Likelihood function

Assume the data vector x = (x1, x2, x3, ..., xn) is known.

Then L(μ, σ) = f(x|μ, σ²) = f(x1, x2, ..., xn | μ, σ²),

$L(\mu, \sigma) = \prod_{i=1}^{n} \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$

$= \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$

$= \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(x_i^2 - 2\mu x_i + \mu^2\right)}$

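or

$L(\mu, \sigma) = \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(x_i^2 - 2\mu x_i + \mu^2\right)}$

$= \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\right]}$

$= \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n\bar{x}^2 - 2\mu n\bar{x} + n\mu^2\right]}$

since $s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$, or $\sum_{i=1}^{n} x_i^2 = (n-1)s^2 + n\bar{x}^2$,

and since $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, then $\sum_{i=1}^{n} x_i = n\bar{x}$.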

hence

$L(\mu, \sigma) = \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n\bar{x}^2 - 2\mu n\bar{x} + n\mu^2\right]}$

$= \dfrac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n(\bar{x}-\mu)^2\right]}$

Now consider the following data: (n = 10)

57.1 72.3 75.0 57.8 50.3 48.0 49.6 53.1 58.5 53.7

mean 57.54, s 9.2185

$L(\mu, \sigma) = \dfrac{1}{6.2832^{5}\,\sigma^{10}}\, e^{-\frac{1}{2\sigma^2}\left[9(9.2185)^2 + 10(57.54-\mu)^2\right]}$
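The n = 10 likelihood can be evaluated either directly from the data or through the summary statistics (x̄, s²) alone. The sketch below compares the two; the function names are made up for this illustration and the evaluation point (μ = 60, σ = 10) is arbitrary.

```python
import math

data = [57.1, 72.3, 75.0, 57.8, 50.3, 48.0, 49.6, 53.1, 58.5, 53.7]
n = len(data)
xbar = sum(data) / n                                 # sample mean
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)    # sample variance s^2

def likelihood_direct(mu, sigma):
    """Product of the n normal densities evaluated at the data."""
    return math.prod(
        math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        for x in data
    )

def likelihood_sufficient(mu, sigma):
    """Same likelihood written with (xbar, s^2) only."""
    expo = -((n - 1) * s2 + n * (xbar - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(expo) / ((2 * math.pi) ** (n / 2) * sigma ** n)

print(xbar, math.sqrt(s2))
print(likelihood_direct(60.0, 10.0), likelihood_sufficient(60.0, 10.0))
```

The agreement of the two functions illustrates that the likelihood depends on the data only through x̄ and s.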

[Figure: Likelihood, n = 10 (surface plot; vertical axis from 0 to 3E-16)]

[Figure: Contour Map of Likelihood n = 100]

Now consider the following data: (n = 100)

57.1 72.3 75.0 57.8 50.3 48.0 49.6 53.1 58.5 53.7

77.8 43.0 69.8 65.1 71.1 44.4 64.4 52.9 56.4 43.9

49.0 37.6 65.5 50.4 40.7 66.9 51.5 55.8 49.1 59.5

64.5 67.6 79.9 48.0 68.1 68.0 65.8 61.3 75.0 78.0

61.8 69.0 56.2 77.2 57.5 84.0 45.5 64.4 58.7 77.5

81.9 77.1 58.7 71.2 58.1 50.3 53.2 47.6 53.3 76.4

69.8 57.8 65.9 63.0 43.5 70.7 85.2 57.2 78.9 72.9

78.6 53.9 61.9 75.2 62.2 53.2 73.0 38.9 75.4 69.7

68.8 77.0 51.2 65.6 44.7 40.4 72.1 68.1 82.2 64.7

83.1 71.9 65.4 45.0 51.6 48.3 58.5 65.3 65.9 59.6

mean 62.02, s 11.8571

$L(\mu, \sigma) = \dfrac{1}{6.2832^{50}\,\sigma^{100}}\, e^{-\frac{1}{2\sigma^2}\left[99(11.8571)^2 + 100(62.02-\mu)^2\right]}$

[Figure: Likelihood, n = 100 (surface plot; vertical axis from 0 to 1.6E-169)]

[Figure: Contour Map of Likelihood n = 100]

The Sufficiency Principle

Any decision regarding the parameter θ should be based on a set of Sufficient statistics S1(x), S2(x), ..., Sk(x) and not otherwise on the value of x.

If two data sets result in the same values for the set of Sufficient statistics, the decision regarding θ should be the same.

Theorem (Birnbaum - Equivalency of the Likelihood Principle and Sufficiency Principle)

$L_{\mathbf{x}_1}(\boldsymbol{\theta}) \propto L_{\mathbf{x}_2}(\boldsymbol{\theta})$

if and only if

S1(x1) = S1(x2), ..., and Sk(x1) = Sk(x2)

The following table gives possible values of (x1, x2, x3).

(x1, x2, x3)   f(x1, x2, x3|θ)   S = Σxi   g(S|θ)        f(x1, x2, x3|S)
(0, 0, 0)      (1 - θ)³          0         (1 - θ)³      1
(1, 0, 0)      θ(1 - θ)²         1         3θ(1 - θ)²    1/3
(0, 1, 0)      θ(1 - θ)²         1         3θ(1 - θ)²    1/3
(0, 0, 1)      θ(1 - θ)²         1         3θ(1 - θ)²    1/3
(1, 1, 0)      θ²(1 - θ)         2         3θ²(1 - θ)    1/3
(1, 0, 1)      θ²(1 - θ)         2         3θ²(1 - θ)    1/3
(0, 1, 1)      θ²(1 - θ)         2         3θ²(1 - θ)    1/3
(1, 1, 1)      θ³                3         θ³            1

[Figure: The Likelihood function, four panels (S = 0, S = 1, S = 2, S = 3), each plotted over the range 0 to 1 on the horizontal axis]

Estimation Theory

Point Estimation


Defn (Estimator)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then an estimator of the parameter φ(θ) = φ(θ1, θ2, ..., θk) is any function T(x) = T(x1, x2, x3, ..., xn) of the observation vector.

Defn (Mean Square Error)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) be an estimator of the parameter φ(θ). Then the Mean Square Error of T(x) is defined to be:

$M.S.E._{T(\mathbf{x})}(\boldsymbol{\theta}) = E\left[(T(\mathbf{x}) - \phi(\boldsymbol{\theta}))^2\right] = \int (T(\mathbf{x}) - \phi(\boldsymbol{\theta}))^2 f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x}$

Defn (Uniformly Better)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) and T*(x) be estimators of the parameter φ(θ). Then T(x) is said to be uniformly better than T*(x) if:

$M.S.E._{T(\mathbf{x})}(\boldsymbol{\theta}) \le M.S.E._{T^*(\mathbf{x})}(\boldsymbol{\theta})$ whenever θ ∈ Ω

Defn (Unbiased)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) be an estimator of the parameter φ(θ). Then T(x) is said to be an unbiased estimator of the parameter φ(θ) if:

$E[T(\mathbf{x})] = \int T(\mathbf{x})\,f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x} = \phi(\boldsymbol{\theta})$

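Theorem (Cramér-Rao Lower bound)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Suppose that:

i) $\dfrac{\partial f(\mathbf{x} | \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}$ exists for all x and for all θ ∈ Ω.

ii) $\dfrac{\partial}{\partial \boldsymbol{\theta}} \int f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x} = \int \dfrac{\partial f(\mathbf{x} | \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\,d\mathbf{x}$

iii) $\dfrac{\partial}{\partial \boldsymbol{\theta}} \int t(\mathbf{x})\,f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x} = \int t(\mathbf{x})\,\dfrac{\partial f(\mathbf{x} | \boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\,d\mathbf{x}$

iv) $0 < E\left[\left(\dfrac{\partial f(\mathbf{x} | \boldsymbol{\theta})}{\partial \theta_i}\right)^2\right] < \infty$ for all θ ∈ Ω.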

Let M denote the p × p matrix with ijth element

$m_{ij} = -E\left[\dfrac{\partial^2 \ln f(\mathbf{x} | \boldsymbol{\theta})}{\partial \theta_i\,\partial \theta_j}\right], \quad i, j = 1, 2, \ldots, p$

Then V = M⁻¹ is the lower bound for the covariance matrix of unbiased estimators of θ.

That is, var(c'θ̂) = c'var(θ̂)c ≥ c'M⁻¹c = c'Vc, where θ̂ is a vector of unbiased estimators of θ.
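A short worked case may help make the bound concrete. Assume, for illustration only, a sample x1, ..., xn from the normal distribution with unknown mean μ and known variance σ², so that p = 1 and M is a scalar.

```latex
\ln f(\mathbf{x} \mid \mu)
  = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 ,
\qquad
\frac{\partial^2 \ln f(\mathbf{x} \mid \mu)}{\partial \mu^2} = -\frac{n}{\sigma^2},
\]
\[
M = -E\!\left[\frac{\partial^2 \ln f(\mathbf{x} \mid \mu)}{\partial \mu^2}\right] = \frac{n}{\sigma^2},
\qquad
V = M^{-1} = \frac{\sigma^2}{n} = \operatorname{Var}[\bar{x}] .
```

In this case the sample mean is unbiased for μ and its variance equals the lower bound.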

Defn (Uniformly Minimum Variance Unbiased Estimator)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then T*(x) is said to be the UMVU (Uniformly minimum variance unbiased) estimator of φ(θ) if:

1) E[T*(x)] = φ(θ) for all θ ∈ Ω.

2) Var[T*(x)] ≤ Var[T(x)] for all θ ∈ Ω whenever E[T(x)] = φ(θ).

Theorem (Rao-Blackwell)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let S1(x), S2(x), ..., Sk(x) denote a set of sufficient statistics. Let T(x) be any unbiased estimator of φ(θ). Then T*[S1(x), S2(x), ..., Sk(x)] = E[T(x)|S1(x), S2(x), ..., Sk(x)] is an unbiased estimator of φ(θ) such that:

Var[T*(S1(x), S2(x), ..., Sk(x))] ≤ Var[T(x)] for all θ ∈ Ω.
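The theorem can be illustrated with the Success-Failure experiment (n = 3) used earlier. In the sketch below, T(x) = x1 is a crude unbiased estimator of θ, and conditioning on S = Σxi gives T* = E[x1|S] = S/n by symmetry. The function names and the value θ = 0.3 are assumptions made for this example; means and variances are computed by exact enumeration, not simulation.

```python
from itertools import product

def moments(estimator, theta, n=3):
    """Exact mean and variance of an estimator over all 2**n outcomes."""
    mean = second = 0.0
    for x in product([0, 1], repeat=n):
        p = theta ** sum(x) * (1 - theta) ** (n - sum(x))  # f(x | theta)
        t = estimator(x)
        mean += p * t
        second += p * t * t
    return mean, second - mean ** 2

def crude(x):
    """T(x) = x1, unbiased for theta but ignores x2 and x3."""
    return x[0]

def improved(x):
    """T*(S) = E[x1 | S] = S / n, a function of the sufficient statistic only."""
    return sum(x) / len(x)

theta = 0.3  # hypothetical value
print(moments(crude, theta))     # mean theta, variance theta(1 - theta)
print(moments(improved, theta))  # mean theta, variance theta(1 - theta)/3
```

Both estimators are unbiased, and the conditioned estimator has one third of the variance of the crude one.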

Theorem (Lehmann-Scheffé)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Let S1(x), S2(x), ..., Sk(x) denote a set of complete sufficient statistics.

Let T*[S1(x), S2(x), ..., Sk(x)] be an unbiased estimator of φ(θ). Then:

T*(S1(x), S2(x), ..., Sk(x)) is the UMVU estimator of φ(θ).

Defn (Consistency)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let Tn(x) be an estimator of φ(θ). Then Tn(x) is called a consistent estimator of φ(θ) if for any ε > 0:

$\lim_{n \to \infty} P\left[\,|T_n(\mathbf{x}) - \phi(\boldsymbol{\theta})| > \varepsilon\,\right] = 0$ for all θ ∈ Ω

Defn (M. S. E. Consistency)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let Tn(x) be an estimator of φ(θ). Then Tn(x) is called an M. S. E. consistent estimator of φ(θ) if:

$\lim_{n \to \infty} M.S.E._{T_n}(\boldsymbol{\theta}) = \lim_{n \to \infty} E\left[(T_n(\mathbf{x}) - \phi(\boldsymbol{\theta}))^2\right] = 0$ for all θ ∈ Ω

Methods for Finding Estimators

1. The Method of Moments

2. Maximum Likelihood Estimation


Method of Moments

Let x1, …, xn denote a sample from the density function

f(x; θ1, …, θp) = f(x; θ)

The kth moment of the distribution being sampled is defined to be:

$\mu_k(\theta_1, \ldots, \theta_p) = E[x^k] = \int_{-\infty}^{\infty} x^k f(x; \theta_1, \ldots, \theta_p)\,dx$

The kth sample moment is defined to be:

$m_k = \dfrac{1}{n}\sum_{i=1}^{n} x_i^k$

To find the method of moments estimators of θ1, …, θp we set up the equations:

$\mu_1(\theta_1, \ldots, \theta_p) = m_1$

$\mu_2(\theta_1, \ldots, \theta_p) = m_2$

...

$\mu_p(\theta_1, \ldots, \theta_p) = m_p$

We then solve these equations for θ1, …, θp.

The solutions $\hat{\theta}_1, \ldots, \hat{\theta}_p$ are called the method of moments estimators.
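As a sketch of the procedure, take the normal distribution with parameters (μ, σ²), for which the first two moments are μ1 = μ and μ2 = σ² + μ². Solving μ1 = m1 and μ2 = m2 gives μ̂ = m1 and σ̂² = m2 - m1². The variable names below are made up for this illustration, and the data are the n = 10 observations used in the likelihood example.

```python
data = [57.1, 72.3, 75.0, 57.8, 50.3, 48.0, 49.6, 53.1, 58.5, 53.7]
n = len(data)

# Sample moments m_k = (1/n) * sum of x_i**k
m1 = sum(data) / n
m2 = sum(x * x for x in data) / n

# Solve mu_1 = m1 and mu_2 = sigma^2 + mu^2 = m2 for (mu, sigma^2)
mu_hat = m1
sigma2_hat = m2 - m1 ** 2

print(mu_hat, sigma2_hat)
```

Note that σ̂² here divides by n, so it is slightly smaller than the sample variance s², which divides by n - 1.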

The Method of Maximum Likelihood

Suppose that the data x1, …, xn has joint density function

f(x1, …, xn; θ1, …, θp)

where θ = (θ1, …, θp) are unknown parameters assumed to lie in Ω (a subset of p-dimensional space).

We want to estimate the parameters θ1, …, θp

Definition: Maximum Likelihood Estimation

Suppose that the data x1, …, xn has joint density function

f(x1, …, xn; θ1, …, θp)

Then the Likelihood function is defined to be

L(θ) = L(θ1, …, θp)

= f(x1, …, xn; θ1, …, θp)

The Maximum Likelihood estimators of the parameters θ1, …, θp are the values that maximize

L(θ) = L(θ1, …, θp)

That is, the Maximum Likelihood estimators of the parameters θ1, …, θp are the values $\hat{\theta}_1, \ldots, \hat{\theta}_p$ such that

$L(\hat{\theta}_1, \ldots, \hat{\theta}_p) = \max_{\theta_1, \ldots, \theta_p} L(\theta_1, \ldots, \theta_p)$

Note: maximizing L(θ1, …, θp) is equivalent to maximizing

$l(\theta_1, \ldots, \theta_p) = \ln L(\theta_1, \ldots, \theta_p)$

the log-likelihood function
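A minimal numerical sketch of maximum likelihood, using the Success-Failure model with S successes in n trials, where the log-likelihood is S ln θ + (n - S) ln(1 - θ). The grid search below is only one simple way to locate the maximum and is an assumption of this example; setting the derivative to zero gives the closed form θ̂ = S/n.

```python
import math

def log_likelihood(theta, s, n):
    """l(theta) = ln L(theta) for s successes in n Success-Failure trials."""
    return s * math.log(theta) + (n - s) * math.log(1 - theta)

def grid_mle(s, n, steps=9999):
    """Maximize l(theta) over an evenly spaced grid of interior points of (0, 1)."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, s, n))

theta_hat = grid_mle(s=2, n=3)
print(theta_hat, 2 / 3)  # the grid maximizer should sit next to S/n
```

Working with the log-likelihood avoids the very small numbers that the likelihood itself produces for larger n.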

Application

The General Linear Model


Consider the random variable Y with

1. E[Y] = g(U1, U2, ..., Uk)

= β1φ1(U1, U2, ..., Uk) + β2φ2(U1, U2, ..., Uk) + ... + βpφp(U1, U2, ..., Uk)

$= \sum_{i=1}^{p} \beta_i \phi_i(U_1, U_2, \ldots, U_k)$

and

2. var(Y) = σ²

• where β1, β2, ..., βp are unknown parameters

• and φ1, φ2, ..., φp are known functions of the nonrandom variables U1, U2, ..., Uk.

• Assume further that Y is normally distributed.

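Thus the density of Y is:

f(Y|β1, β2, ..., βp, σ²) = f(Y|β, σ²)

$= \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\dfrac{1}{2\sigma^2}\left(Y - g(U_1, U_2, \ldots, U_k)\right)^2\right]$

$= \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\dfrac{1}{2\sigma^2}\left(Y - \sum_{i=1}^{p} \beta_i \phi_i(U_1, U_2, \ldots, U_k)\right)^2\right]$

$= \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\dfrac{1}{2\sigma^2}\left(Y - \beta_1 X_1 - \beta_2 X_2 - \cdots - \beta_p X_p\right)^2\right]$

where $X_i = \phi_i(U_1, U_2, \ldots, U_k)$, i = 1, 2, …, p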

Now suppose that n independent observations of Y,

(y1, y2, ..., yn), are made

corresponding to n sets of values of (U1, U2, ..., Uk): (u11, u12, ..., u1k),

(u21, u22, ..., u2k), ...

(un1, un2, ..., unk).

Let xij = φj(ui1, ui2, ..., uik), j = 1, 2, ..., p; i = 1, 2, ..., n.

Then the joint density of y = (y1, y2, ..., yn) is:

f(y1, y2, ..., yn|β1, β2, ..., βp, σ²) = f(y|β, σ²)

$= \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - g(u_{i1}, u_{i2}, \ldots, u_{ik})\right)^2\right]$

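$= \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} \beta_j \phi_j(u_{i1}, u_{i2}, \ldots, u_{ik})\right)^2\right]$

$= \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\dfrac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\right)^2\right]$

$= \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\dfrac{1}{2\sigma^2}(\mathbf{y} - X\boldsymbol{\beta})'(\mathbf{y} - X\boldsymbol{\beta})\right]$

$= \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\dfrac{1}{2\sigma^2}\left(\mathbf{y}'\mathbf{y} - 2\mathbf{y}'X\boldsymbol{\beta} + \boldsymbol{\beta}'X'X\boldsymbol{\beta}\right)\right]$

$= \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\dfrac{1}{2\sigma^2}\boldsymbol{\beta}'X'X\boldsymbol{\beta}\right] \exp\left[-\dfrac{1}{2\sigma^2}\left(\mathbf{y}'\mathbf{y} - 2\mathbf{y}'X\boldsymbol{\beta}\right)\right]$

$= h(\mathbf{y})\,g(\boldsymbol{\beta}, \sigma^2) \exp\left[-\dfrac{1}{2\sigma^2}\left(\mathbf{y}'\mathbf{y} - 2\mathbf{y}'X\boldsymbol{\beta}\right)\right]$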

Thus f(y|β, σ²) is a member of the exponential family of distributions

and S = (y'y, X'y) is a Minimal Complete set of Sufficient Statistics.
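The role of S = (y'y, X'y) can be seen numerically: the quadratic form (y - Xβ)'(y - Xβ) that appears in the exponent can be rebuilt from y'y and X'y together with the known matrix X'X, without going back to the individual observations. The design matrix, observations and β below are hypothetical values chosen only for this sketch.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical design matrix (n = 4, p = 2), observations and trial beta
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 1.9, 3.2, 3.8]
beta = [0.9, 1.0]
p = len(beta)

# Quadratic form computed from the raw observations
fitted = [dot(row, beta) for row in X]
rss_direct = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Same quantity from the sufficient statistics y'y and X'y (X'X is known)
yty = dot(y, y)
Xty = [dot(col, y) for col in zip(*X)]
XtX = [[dot(ci, cj) for cj in zip(*X)] for ci in zip(*X)]
quad = sum(beta[i] * XtX[i][j] * beta[j] for i in range(p) for j in range(p))
rss_from_S = yty - 2 * dot(beta, Xty) + quad

print(rss_direct, rss_from_S)
```

Since the density depends on y only through this quadratic form, any two data sets with the same y'y and X'y give the same likelihood for every (β, σ²).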

Hypothesis Testing


Defn (Test of size α)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Let ω be any subset of Ω.

Consider testing the Null Hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω.

Let A denote the acceptance region for the test (all values x = (x1, x2, x3, ..., xn) such that the decision to accept H0 is made),

and let C denote the critical region for the test (all values x = (x1, x2, x3, ..., xn) such that the decision to reject H0 is made).

Then the test is said to be of size α if

$P[\mathbf{x} \in C] = \int_C f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x} \le \alpha$ for all θ ∈ ω, and

$P[\mathbf{x} \in C] = \int_C f(\mathbf{x} | \boldsymbol{\theta}_0)\,d\mathbf{x} = \alpha$ for at least one θ0 ∈ ω.

Defn (Power)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Consider testing the Null Hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω,

where ω is any subset of Ω. Then the Power of the test for θ ∉ ω is defined to be:

$\beta_C(\boldsymbol{\theta}) = P[\mathbf{x} \in C] = \int_C f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x}$

Defn (Uniformly Most Powerful (UMP) test of size α)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω,

where ω is any subset of Ω. Let C denote the critical region for the test. Then the test is called the UMP test of size α if:

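$P[\mathbf{x} \in C] = \int_C f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x} \le \alpha$ for all θ ∈ ω, and

$P[\mathbf{x} \in C] = \int_C f(\mathbf{x} | \boldsymbol{\theta}_0)\,d\mathbf{x} = \alpha$ for at least one θ0 ∈ ω,

and for any other critical region C* such that:

$P[\mathbf{x} \in C^*] = \int_{C^*} f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x} \le \alpha$ for all θ ∈ ω, and

$P[\mathbf{x} \in C^*] = \int_{C^*} f(\mathbf{x} | \boldsymbol{\theta}_0)\,d\mathbf{x} = \alpha$ for at least one θ0 ∈ ω,

then

$\int_C f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x} \ge \int_{C^*} f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x}$ for all θ ∉ ω.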

Theorem (Neyman-Pearson Lemma)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω = {θ0, θ1}.

Consider testing the Null Hypothesis

H0: θ = θ0

against the alternative hypothesis

H1: θ = θ1.

Then the UMP test of size α has critical region:

$C = \left\{\mathbf{x} \,:\, \dfrac{f(\mathbf{x} | \boldsymbol{\theta}_0)}{f(\mathbf{x} | \boldsymbol{\theta}_1)} \le K\right\}$

where K is chosen so that $\int_C f(\mathbf{x} | \boldsymbol{\theta}_0)\,d\mathbf{x} = \alpha$
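A sketch of how the lemma is used in a standard textbook case, which is an assumption of this example and is not taken from the slides: a normal sample with known σ, testing H0: μ = μ0 against H1: μ = μ1 with μ1 > μ0. Here the ratio f(x|θ0)/f(x|θ1) is a decreasing function of the sample mean, so the critical region {ratio ≤ K} has the form {x̄ ≥ c}, and c is fixed by the size requirement. The function name and all numerical values are hypothetical.

```python
import math
from statistics import NormalDist

def np_test_normal_mean(mu0, mu1, sigma, n, alpha):
    """Critical value c for rejecting H0 when xbar >= c, with size and power.

    Assumes a normal sample with known sigma and mu1 > mu0.
    """
    se = sigma / math.sqrt(n)                    # standard deviation of xbar
    c = NormalDist(mu0, se).inv_cdf(1 - alpha)   # P[xbar >= c | mu0] = alpha
    size = 1 - NormalDist(mu0, se).cdf(c)
    power = 1 - NormalDist(mu1, se).cdf(c)       # P[xbar >= c | mu1]
    return c, size, power

c, size, power = np_test_normal_mean(mu0=50.0, mu1=55.0, sigma=10.0, n=25, alpha=0.05)
print(c, size, power)
```

Choosing c is the same as choosing K in the lemma, since each value of K corresponds to exactly one cut-off for x̄ in this setting.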

Defn (Likelihood Ratio Test of size α)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Consider testing the Null Hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω,

where ω is any subset of Ω. Then the Likelihood Ratio (LR) test of size α has critical region:

$C = \left\{\mathbf{x} \,:\, \lambda(\mathbf{x}) = \dfrac{\max_{\boldsymbol{\theta} \in \omega} f(\mathbf{x} | \boldsymbol{\theta})}{\max_{\boldsymbol{\theta} \in \Omega} f(\mathbf{x} | \boldsymbol{\theta})} \le K\right\}$

where K is chosen so that

$P[\mathbf{x} \in C] = \int_C f(\mathbf{x} | \boldsymbol{\theta})\,d\mathbf{x} \le \alpha$ for all θ ∈ ω, and

$P[\mathbf{x} \in C] = \int_C f(\mathbf{x} | \boldsymbol{\theta}_0)\,d\mathbf{x} = \alpha$ for at least one θ0 ∈ ω.

Theorem (Asymptotic distribution of Likelihood ratio test criterion)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Consider testing the Null Hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω,

where ω is any subset of Ω.

Let

$\lambda(\mathbf{x}) = \dfrac{\max_{\boldsymbol{\theta} \in \omega} f(\mathbf{x} | \boldsymbol{\theta})}{\max_{\boldsymbol{\theta} \in \Omega} f(\mathbf{x} | \boldsymbol{\theta})}$

Then under proper regularity conditions, U = -2 ln λ(x) possesses an asymptotic Chi-square distribution with degrees of freedom equal to the difference between the number of independent parameters in Ω and ω.
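A small sketch of the criterion for a normal sample with both μ and σ² unknown, testing H0: μ = μ0. Maximizing the likelihood over Ω gives σ̂² = Σ(xi - x̄)²/n, and maximizing under H0 gives σ̂0² = Σ(xi - μ0)²/n, so that λ(x) = (σ̂²/σ̂0²)^(n/2) and U = n ln(σ̂0²/σ̂²). The value μ0 = 60 is hypothetical, and the data are the n = 10 observations from the likelihood example.

```python
import math

data = [57.1, 72.3, 75.0, 57.8, 50.3, 48.0, 49.6, 53.1, 58.5, 53.7]
n = len(data)
xbar = sum(data) / n
mu0 = 60.0  # hypothetical null value of the mean

# Maximum likelihood estimates of sigma^2 over Omega and under H0
sigma2_full = sum((x - xbar) ** 2 for x in data) / n
sigma2_null = sum((x - mu0) ** 2 for x in data) / n

# U = -2 ln(lambda), with lambda = (sigma2_full / sigma2_null) ** (n / 2)
U = n * math.log(sigma2_null / sigma2_full)
print(U)
```

Here Ω has two independent parameters and ω has one, so U would be compared with a Chi-square distribution with 1 degree of freedom, whose upper 5% point is about 3.841. With n = 10 the asymptotic approximation is rough.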