Statistical Decision Making


• Almost all problems in statistics can be formulated as a problem of making a decision.

• That is, given some data observed from some phenomenon, a decision will have to be made about that phenomenon.


Decisions are generally broken into two types:

• Estimation decisions

and

• Hypothesis Testing decisions.


Probability theory plays a very important role in these decisions and in the assessment of the errors made by these decisions.


Definition:

A random variable X is a numerical quantity that is determined by the outcome of a random experiment


Example:

An individual is selected at random from a population

and

X = the weight of the individual

The probability distribution of a continuous random variable is described by:

its probability density curve f(x).

i.e. a curve which has the following properties:

• 1. f(x) is never negative.

• 2. The total area under the curve f(x) is one.

• 3. The area under the curve f(x) between a and b is the probability that X lies between the two values.
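
The second and third properties can be checked numerically. The sketch below is only an illustration: it assumes X is normal with mean 50 and standard deviation 15 (values chosen for the example), and `area_under` is a hypothetical helper that applies the trapezoid rule.

```python
from statistics import NormalDist

def area_under(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi with the trapezoid rule."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * width)
    return total * width

# Assumed example distribution: normal with mean 50 and standard deviation 15.
X = NormalDist(mu=50, sigma=15)

total_area = area_under(X.pdf, -100, 200)   # spans 10 standard deviations each side
prob_40_to_70 = area_under(X.pdf, 40, 70)   # P(40 <= X <= 70) computed as an area
exact_40_to_70 = X.cdf(70) - X.cdf(40)      # the same probability from the cdf

print(round(total_area, 4), round(prob_40_to_70, 4), round(exact_40_to_70, 4))
```

The two ways of computing P(40 ≤ X ≤ 70) should agree to several decimal places, and the total area should be very close to one.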

[Figure: a probability density curve f(x), plotted for x from 0 to 120.]

Examples of some important Univariate distributions


1. The Normal distribution

A common probability density curve is the “Normal” density curve, which is symmetric and bell shaped.

Comment: If μ = 0 and σ = 1 the distribution is called the standard normal distribution.

[Figure: two Normal density curves plotted for x from 0 to 120: a Normal distribution with μ = 50 and σ = 15, and a Normal distribution with μ = 70 and σ = 20.]

The Normal density curve with mean μ and standard deviation σ is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

2. The Chi-squared distribution with ν degrees of freedom

$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu-2)/2}\, e^{-x/2} \quad \text{if } x \ge 0$$

[Figure: density curve of the chi-squared distribution, plotted for x up to 14.]

Comment: If z1, z2, ..., zν are independent random variables each having a standard normal distribution then

$$U = z_1^2 + z_2^2 + \cdots + z_\nu^2$$

has a chi-squared distribution with ν degrees of freedom.
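
This construction can be checked by simulation. The sketch below is an illustration only: the degrees of freedom, the number of repetitions and the seed are arbitrary choices. It relies on the standard result that a chi-squared variable with ν degrees of freedom has mean ν and variance 2ν.

```python
import random
from statistics import fmean, variance

random.seed(1)   # fixed seed so the run is repeatable
nu = 5           # assumed degrees of freedom for the illustration
reps = 20_000    # number of simulated values of U

def draw_u(nu):
    """Sum of squares of nu independent standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))

u_values = [draw_u(nu) for _ in range(reps)]

print("simulated mean:    ", round(fmean(u_values), 2), " theory:", nu)
print("simulated variance:", round(variance(u_values), 2), " theory:", 2 * nu)
```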

3. The F distribution with ν1 degrees of freedom in the numerator and ν2 degrees of freedom in the denominator

$$f(x) = K\, x^{(\nu_1-2)/2} \left(1 + \frac{\nu_1}{\nu_2}\,x\right)^{-(\nu_1+\nu_2)/2} \quad \text{if } x \ge 0$$

where

$$K = \frac{\Gamma\!\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}$$

[Figure: density curve of the F distribution, plotted for x from 0 to 6.]

Comment: If U1 and U2 are independent random variables each having a Chi-squared distribution, with ν1 and ν2 degrees of freedom respectively, then

$$F = \frac{U_1/\nu_1}{U_2/\nu_2}$$

has an F distribution with ν1 degrees of freedom in the numerator and ν2 degrees of freedom in the denominator.

4. The t distribution with ν degrees of freedom

$$f(x) = K \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$$

where

$$K = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\!\left(\frac{\nu}{2}\right)}$$

[Figure: density curve of the t distribution, plotted for x from -4 to 4.]

Comment: If z and U are independent random variables, and z has a standard Normal distribution while U has a Chi-squared distribution with ν degrees of freedom then

$$t = \frac{z}{\sqrt{U/\nu}}$$

has a t distribution with ν degrees of freedom.
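
The same kind of simulation check works for this ratio. The sketch below is an illustration with arbitrary choices of ν, repetitions and seed. It uses the standard result that a t variable with ν > 2 degrees of freedom has mean 0 and variance ν/(ν - 2), so its spread is somewhat larger than that of a standard normal variable.

```python
import math
import random
from statistics import fmean, variance

random.seed(2)   # fixed seed so the run is repeatable
nu = 10          # assumed degrees of freedom for the illustration
reps = 20_000

def draw_t(nu):
    """Build one t value from a standard normal z and an independent chi-squared U."""
    z = random.gauss(0.0, 1.0)
    u = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(u / nu)

t_values = [draw_t(nu) for _ in range(reps)]

print("simulated mean:    ", round(fmean(t_values), 3), " theory: 0")
print("simulated variance:", round(variance(t_values), 3), " theory:", nu / (nu - 2))
```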


The Sampling distribution of a statistic

A random sample from a probability distribution with density function f(x) is a collection of n independent random variables, x1, x2, ..., xn, each with the probability distribution described by f(x).

If for example we collect a random sample of individuals from a population and

– measure some variable X for each of those individuals,

– the n measurements x1, x2, ..., xn will form a set of n independent random variables with a probability distribution equivalent to the distribution of X across the population.


A statistic T is any quantity computed from the random observations x1, x2, ...,xn.


• Any statistic will necessarily be also a random variable and therefore will have a probability distribution described by some probability density function fT(t).

• This distribution is called the sampling distribution of the statistic T.


• This distribution is very important if one is using this statistic in a statistical analysis.

• It is used to assess the accuracy of a statistic if it is used as an estimator.

• It is used to determine thresholds for acceptance and rejection if it is used for Hypothesis testing.


Some examples of Sampling distributions of statistics

Distribution of the sample mean for a sample from a Normal population

Let x1, x2, ..., xn be a sample from a normal population with mean μ and standard deviation σ.

Let

$$\bar{x} = \frac{\sum_i x_i}{n}$$

Then x̄ has a normal sampling distribution with mean

$$\mu_{\bar{x}} = \mu$$

and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
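
A short simulation illustrates this result. The population values (μ = 50, σ = 15), the sample size n = 25 and the number of repeated samples are assumptions chosen for the example; with them the theory gives a standard deviation of 15/√25 = 3 for the sample mean.

```python
import math
import random
from statistics import fmean, stdev

random.seed(3)                    # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 15.0, 25     # assumed population and sample size
reps = 10_000                     # number of repeated samples

def one_sample_mean():
    """Draw one random sample of size n and return its mean."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    return fmean(sample)

sample_means = [one_sample_mean() for _ in range(reps)]
theory_sd = sigma / math.sqrt(n)

print("mean of sample means:", round(fmean(sample_means), 2), " theory:", mu)
print("sd of sample means:  ", round(stdev(sample_means), 2), " theory:", theory_sd)
```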

[Figure: normal sampling distribution of the sample mean, horizontal axis from 20 to 100.]

Distribution of the z statistic

Let x1, x2, ..., xn be a sample from a normal population with mean μ and standard deviation σ.

Let

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x} - \mu)}{\sigma}$$

Then z has a standard normal distribution.

Comment:

Many statistics T have a normal distribution with mean μ_T and standard deviation σ_T. Then

$$z = \frac{T - \mu_T}{\sigma_T}$$

will have a standard normal distribution.

Distribution of the χ² statistic for sample variance

Let x1, x2, ..., xn be a sample from a normal population with mean μ and standard deviation σ. Let

$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n-1}$$

= sample variance

and

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}}$$

= sample standard deviation

Let

$$\chi^2 = \frac{\sum_i (x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2}$$

Then χ² has a chi-squared distribution with ν = n-1 degrees of freedom.
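
As a rough check, the statistic (n-1)s²/σ² can be simulated and its mean and variance compared with those of a chi-squared distribution with n-1 degrees of freedom, namely n-1 and 2(n-1). The population values, sample size and seed below are assumptions made for the illustration.

```python
import random
from statistics import fmean, variance

random.seed(4)                  # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 15.0, 8    # assumed population and sample size
reps = 20_000

def chi2_statistic():
    """Compute (n - 1) s^2 / sigma^2 for one random sample of size n."""
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance
    return (n - 1) * s2 / sigma ** 2

chi2_values = [chi2_statistic() for _ in range(reps)]

print("simulated mean:    ", round(fmean(chi2_values), 2), " theory:", n - 1)
print("simulated variance:", round(variance(chi2_values), 2), " theory:", 2 * (n - 1))
```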

[Figure: The chi-squared distribution, plotted for x from 0 to 24.]

Distribution of the t statistic

Let x1, x2, ..., xn be a sample from a normal population with mean μ and standard deviation σ. Let

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Then t has Student’s t distribution with ν = n-1 degrees of freedom.

Comment:

If an estimator T has a normal distribution with mean μ_T and standard deviation σ_T, and if s_T is an estimator of σ_T based on ν degrees of freedom, then

$$t = \frac{T - \mu_T}{s_T}$$

will have Student’s t distribution with ν degrees of freedom.

[Figure: density curve of the t distribution compared with the standard normal distribution.]

Point estimation

• A statistic T is called an estimator of the parameter θ if its value is used as an estimate of the parameter θ.

• The performance of an estimator T will be determined by how “close” the sampling distribution of T is to the parameter, θ, being estimated.

• An estimator T is called an unbiased estimator of θ if μ_T, the mean of the sampling distribution of T, satisfies μ_T = θ.

• This implies that in the long run the average value of T is θ.
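
The "long run average" idea can be illustrated by simulation. The sketch below compares the sample variance s² (divisor n-1) with an alternative estimator that divides by n; the second estimator is introduced here only for contrast and is not part of the slides. The population values and seed are assumptions for the example. For a normal population the first should average close to σ², the second close to (n-1)σ²/n.

```python
import random
from statistics import fmean

random.seed(5)                 # fixed seed so the run is repeatable
mu, sigma, n = 0.0, 2.0, 5     # assumed population, so the true variance is 4
reps = 40_000

def two_variance_estimates():
    """Return (divisor n-1 estimate, divisor n estimate) for one sample."""
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / (n - 1), ss / n

pairs = [two_variance_estimates() for _ in range(reps)]
mean_unbiased = fmean([p[0] for p in pairs])
mean_biased = fmean([p[1] for p in pairs])

print("average of s^2 (divisor n-1):", round(mean_unbiased, 2), " target:", sigma ** 2)
print("average with divisor n:      ", round(mean_biased, 2),
      " theory:", (n - 1) * sigma ** 2 / n)
```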

• An estimator T is called the Minimum Variance Unbiased estimator of θ if T is an unbiased estimator and it has the smallest standard error σ_T amongst all unbiased estimators of θ.

• If the sampling distribution of T is normal, the standard error of T is extremely important. It completely describes the variability of the estimator T.


Interval Estimation

• Point estimators give only single values as an estimate. There is no indication of the accuracy of the estimate.

• The accuracy can sometimes be measured and shown by displaying the standard error of the estimate.

• There is, however, a better way: using the idea of confidence interval estimates.

• The unknown parameter is estimated with a range of values that has a given probability of capturing the parameter being estimated.

• The interval TL to TU is called a (1 - α) 100 % confidence interval for the parameter θ, if the probability that θ lies in the range TL to TU is equal to 1 - α.

• Here TL and TU are statistics: random numerical quantities calculated from the data.

Examples

Confidence interval for the mean of a Normal population (based on the z statistic).

$$T_L = \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad \text{to} \quad T_U = \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

is a (1 - α) 100 % confidence interval for μ, the mean of a normal population.

Here z_{α/2} is the upper α/2 100 % percentage point of the standard normal distribution.
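
A minimal sketch of this interval in code, assuming σ is known. The function name `z_confidence_interval`, the data values and σ = 4 are hypothetical and chosen only for illustration; the upper α/2 point comes from the standard library's `NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(xs, sigma, alpha=0.05):
    """Return (T_L, T_U) for the mean of a normal population with known sigma."""
    n = len(xs)
    xbar = fmean(xs)
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point
    half_width = z_half_alpha * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical weights, with an assumed known population standard deviation of 4.
weights = [68.2, 71.5, 64.9, 77.3, 70.1, 66.8, 73.4, 69.0]
t_lower, t_upper = z_confidence_interval(weights, sigma=4.0, alpha=0.05)

print("95 % confidence interval:", round(t_lower, 2), "to", round(t_upper, 2))
```

The interval is centred on the sample mean, and its half width shrinks in proportion to 1/√n.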

More generally, if T is an unbiased estimator of the parameter θ and has a normal sampling distribution with known standard error σ_T, then

$$T_L = T - z_{\alpha/2}\,\sigma_T \quad \text{to} \quad T_U = T + z_{\alpha/2}\,\sigma_T$$

is a (1 - α) 100 % confidence interval for θ.

Confidence interval for the mean of a Normal population (based on the t statistic).

$$T_L = \bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \quad \text{to} \quad T_U = \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

is a (1 - α) 100 % confidence interval for μ, the mean of a normal population.

Here t_{α/2} is the upper α/2 100 % percentage point of the Student’s t distribution with ν = n-1 degrees of freedom.

More generally, if T is an unbiased estimator of the parameter θ and has a normal sampling distribution with estimated standard error s_T, based on ν degrees of freedom, then

$$T_L = T - t_{\alpha/2}\,s_T \quad \text{to} \quad T_U = T + t_{\alpha/2}\,s_T$$

is a (1 - α) 100 % confidence interval for θ.

Multiple Confidence intervals

In many situations one is interested in estimating not only a single parameter, θ, but a collection of parameters, θ1, θ2, θ3, ... .

A collection of intervals, TL1 to TU1, TL2 to TU2, TL3 to TU3, ... is called a set of (1 - α) 100 % multiple confidence intervals if the probability that all the intervals capture their respective parameters is 1 - α.


Hypothesis Testing

• Another important area of statistical inference is that of Hypothesis Testing.

• In this situation one has a statement (Hypothesis) about the parameter(s) of the distributions being sampled and one is interested in deciding whether the statement is true or false.

• In fact there are two hypotheses

– the Null Hypothesis (H0) and

– the Alternative Hypothesis (HA).

• A decision will be made either to

– Accept H0 (Reject HA) or to

– Reject H0 (Accept HA).

• The following table gives the different possibilities for the decision and the different possibilities for the correctness of the decision

| | Accept H0 | Reject H0 |
|---|---|---|
| H0 is true | Correct Decision | Type I error |
| H0 is false | Type II error | Correct Decision |

• Type I error - The Null Hypothesis H0 is rejected when it is true.

• The probability that a decision procedure makes a type I error is denoted by α, and is sometimes called the significance level of the test.

• Common significance levels that are used are α = .05 and α = .01

• Type II error - The Null Hypothesis H0 is accepted when it is false.

• The probability that a decision procedure makes a type II error is denoted by β.

• The probability 1 - β is called the Power of the test and is the probability that the decision procedure correctly rejects a false Null Hypothesis.
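
To make α, β and the power concrete, the sketch below computes the power of one particular test: a one-sided z test of H0: μ = μ0 against HA: μ > μ0 with known σ, which rejects H0 when z exceeds the upper α point. This specific test and all numerical values are assumptions chosen for illustration. When the true mean is μ1, the power is 1 - Φ(z_α - (μ1 - μ0)√n/σ), where Φ is the standard normal cumulative distribution function.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the one-sided z test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)       # critical threshold for z
    shift = (mu1 - mu0) * math.sqrt(n) / sigma    # mean of z when mu = mu1
    return 1 - std_normal.cdf(z_alpha - shift)

# Assumed example: mu0 = 50, true mean 53, sigma = 15, several sample sizes.
for n in (10, 25, 50, 100):
    power = power_one_sided_z(50, 53, 15, n)
    print("n =", n, " power =", round(power, 3), " beta =", round(1 - power, 3))
```

When μ1 equals μ0 the function returns α, the type I error rate, and for a fixed alternative the power grows with the sample size.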

A statistical test is defined by

• 1. Choosing a statistic for making the decision to Accept or Reject H0. This statistic is called the test statistic.

• 2. Dividing the set of possible values of the test statistic into two regions - an Acceptance Region and a Critical Region.


• If upon collection of the data and evaluation of the test statistic, its value lies in the Acceptance Region, a decision is made to accept the Null Hypothesis H0.

• If upon collection of the data and evaluation of the test statistic, its value lies in the Critical Region, a decision is made to reject the Null Hypothesis H0.
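
As an illustration of this decision rule, the sketch below uses the z statistic as the test statistic for H0: μ = μ0 against a two-sided alternative, with known σ. The Critical Region is taken to be |z| greater than the upper α/2 point of the standard normal distribution. The function name, the data and the values of μ0 and σ are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def two_sided_z_test(xs, mu0, sigma, alpha=0.05):
    """Return (z, critical threshold, decision) for H0: mu = mu0 with known sigma."""
    n = len(xs)
    z = (fmean(xs) - mu0) / (sigma / math.sqrt(n))     # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)     # boundary between the regions
    decision = "reject H0" if abs(z) > critical else "accept H0"
    return z, critical, decision

# Hypothetical weights with an assumed known standard deviation of 4.
weights = [68.2, 71.5, 64.9, 77.3, 70.1, 66.8, 73.4, 69.0]

for mu0 in (70.0, 60.0):
    z, critical, decision = two_sided_z_test(weights, mu0, sigma=4.0)
    print("mu0 =", mu0, " z =", round(z, 2), " threshold =", round(critical, 2), "->", decision)
```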

• The probability of a type I error, α, is usually set at a predefined level by choosing the critical thresholds (boundaries between the Acceptance and Critical Regions) appropriately.

• The probability of a type II error, β, is decreased (and the power of the test, 1 - β, is increased) by

1. Choosing the “best” test statistic.

2. Selecting the most efficient experimental design.

3. Increasing the amount of information (usually by increasing the sample sizes involved) on which the decision is based.

Multiple testing

Quite often one is interested in performing a collection (family) of tests of hypotheses.

1.     H0,1 versus HA,1.

2.     H0,2 versus HA,2.

3.     H0,3 versus HA,3.

etc.

• Let α* denote the probability that at least one type I error is made in the collection of tests that are performed.

• The value of α*, the family type I error rate, can be considerably larger than α, the type I error rate of each individual test.

• The value of the family error rate, α*, can be controlled by altering the thresholds of each individual test appropriately.

• A testing procedure of this nature is called a Multiple testing procedure.
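
A small calculation shows how quickly α* can grow. The sketch assumes the k tests are independent, in which case α* = 1 - (1 - α)^k. It also shows one simple way of altering the individual thresholds, the Bonferroni adjustment (each test run at level α*/k); that method is named here as an example and is not taken from the slides.

```python
def family_error_rate(alpha, k):
    """P(at least one type I error) for k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_level(alpha_star, k):
    """Per-test level that keeps the family error rate at or below alpha_star."""
    return alpha_star / k

alpha = 0.05
for k in (1, 5, 10, 20):
    unadjusted = family_error_rate(alpha, k)
    adjusted = family_error_rate(bonferroni_level(alpha, k), k)
    print("k =", k, " family rate at alpha = 0.05:", round(unadjusted, 3),
          " with Bonferroni levels:", round(adjusted, 3))
```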

Statistical methods by type of dependent variable (rows) and type of independent variables (columns)

| Dependent variables | Independent variables: Categorical | Independent variables: Continuous | Independent variables: Continuous & Categorical |
|---|---|---|---|
| Categorical | Multiway frequency analysis (Log Linear Model) | Discriminant Analysis | Discriminant Analysis |
| Continuous | ANOVA (single dep var); MANOVA (mult dep var) | Multiple regression (single dependent variable); Multivariate multiple regression (multiple dependent variables) | ANACOVA (single dep var); MANACOVA (mult dep var) |
| Continuous & Categorical | ?? | ?? | ?? |