
Lecture 3: Inference in SLR - Purdue Universityghobbs/STAT_512/Lecture_Notes/... · Review of hypothesis testing Inference about C 1 Inference about C 0 Confidence Intervals Prediction


3-1

Lecture 3: Inference in SLR

STAT 512

Spring 2011

Background Reading

KNNL: 2.1 – 2.6


3-2

Topic Overview

This topic will cover:

Review of hypothesis testing

Inference about $\beta_1$

Inference about $\beta_0$

Confidence Intervals

Prediction Intervals


3-3

Review: Significance Tests

One Sample T-test

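Take a sample of size n from some (normal) population and test

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \neq \mu_0, \qquad t = \frac{\bar{Y} - \mu_0}{s\{\bar{Y}\}}$$

Compare t to a critical value from the Student's t distribution (Table B.2) with (typically) $\alpha = 0.05$.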


3-4

Review: Significance Tests (2)

One Sample T-test: Can turn the test statistic into a confidence interval for $\mu$:

$$\bar{Y} \pm t_{1-\alpha/2,\,n-1}\; s\{\bar{Y}\}$$

Generally a confidence interval takes the

form

Point Est. ± Crit. Value * SE

Two Sample T-test: Compares the means of

two samples.


3-5

Significance Levels

The significance level $\alpha$ is the probability of making a Type I error, that is, of rejecting the null hypothesis when it is in fact true (false positive).

The most common significance level that we will use is $\alpha = 0.05$.

The corresponding confidence level is $1 - \alpha$. So for $\alpha = 0.05$ our confidence level will be 95%.


3-6

P-Values

The p-value for a test is the probability (under the null hypothesis) of observing a test statistic that is at least as extreme as the one that is actually observed. We reject the null if P-value $< \alpha$.

Mathematically, the p-value is

$$\Pr_{H_0}\left(|T| \geq |t|\right), \quad \text{where } T \sim t_{n-1}$$

Graphically, the p-value is twice the area in the upper tail of the $t_{n-1}$ distribution (above the observed $|t|$).
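The "twice the upper-tail area" description can be checked numerically. The following is a minimal sketch that uses only the Python standard library and integrates the t density with Simpson's rule; the function names and the step count are illustrative choices, not part of the lecture. In practice SAS (or any statistics package) reports this value directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_obs, df, steps=20000):
    """Pr(|T| >= |t_obs|) for T ~ t(df), using Simpson's rule on [0, |t_obs|]."""
    a = abs(t_obs)
    if a == 0:
        return 1.0
    h = a / steps                      # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3       # Pr(0 <= T <= |t_obs|)
    upper_tail = 0.5 - central_area    # area above |t_obs|
    return 2 * upper_tail              # twice the upper-tail area

# 2.262 is the tabled critical value t(0.975; 9), so the p-value is about 0.05.
p = two_sided_p_value(2.262, df=9)
print(round(p, 3))
```

Because 2.262 is the tabled 0.975 quantile for 9 degrees of freedom, the printed p-value should sit right at the usual 0.05 cutoff.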


3-7

Conclusions

“Conclude Ha” means “there is sufficient

evidence in the data to conclude that H0 is

false, and hence we can assume Ha is true”.

“Fail to Reject H0” means “there is

insufficient evidence in the data to

conclude that either H0 or Ha is true or

false, so we default to assuming that H0 is

true”. Unless prepared to make further

justification (power) it is not appropriate to

“conclude H0”.


3-8

Power of a Test

The probability of a Type II error (failing to reject H0 when Ha is in fact true, or a false negative) is often denoted $\beta$ (not to be confused with regression coefficients).

The power of a test is $1 - \beta$. This is the probability that H0 will be rejected given that Ha is true.

Power calculations involve the non-central t-distribution (generally use a computer).


3-9

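$\beta_1$ Inference

Recall that

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{SS_{XY}}{SS_X}$$

X's are constant, Y's are normally distributed. Using probability theory it can thus be shown that (page 42-43)

$$b_1 \sim \text{Normal}\left(\beta_1,\ \sigma^2\{b_1\}\right) \quad \text{where} \quad \sigma^2\{b_1\} = \frac{\sigma^2}{SS_X}$$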


3-10

Test for $H_0: \beta_1 = 0$

As in the case of the one-sample t-test, we can develop the test statistic for testing H0: $\beta_1 = 0$ vs. Ha: $\beta_1 \neq 0$:

$$t = \frac{b_1 - 0}{s\{b_1\}} \quad \text{where} \quad s\{b_1\} = \sqrt{\frac{MSE}{SS_X}}$$

This statistic has a t-distribution with n – 2 degrees of freedom (not n – 1 because we are also estimating $\beta_0$).
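As a rough illustration of how the pieces of this statistic fit together, the sketch below computes $b_1$, MSE, $s\{b_1\}$ and $t$ by hand. The six data points are hypothetical values made up for this example (they are not the diamonds data), and the critical value 2.776 is the tabled $t(0.975; 4)$ for $n - 2 = 4$ degrees of freedom.

```python
import math

# Hypothetical data, for illustration only (not the diamonds data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)                        # SS_X
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # SS_XY

b1 = ss_xy / ss_x                     # slope estimate
b0 = y_bar - b1 * x_bar               # intercept estimate
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)                   # two parameters estimated, so df = n - 2
s_b1 = math.sqrt(mse / ss_x)          # standard error of the slope
t_stat = (b1 - 0) / s_b1              # test statistic for H0: beta1 = 0

T_CRIT = 2.776                        # tabled t(0.975; 4)
print(round(b1, 4), round(s_b1, 4), round(t_stat, 2), abs(t_stat) > T_CRIT)
```

With data this close to a straight line the statistic is far beyond the critical value, so H0 would be rejected.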


3-11

Test for $H_0: \beta_1 = 0$

Reject H0 if $|t| > t_{crit}$, where $t_{crit} = t(1 - \alpha/2;\ n - 2)$.

SAS will give us both the value of the t-statistic and the P-value. If the P-value is smaller than $\alpha$, reject in favor of $H_a: \beta_1 \neq 0$.


3-12

Confidence Interval for $\beta_1$

The $100(1 - \alpha)\%$ CI for $\beta_1$ is

$$b_1 \pm t_{crit}\, s\{b_1\}$$

where $t_{crit} = t(1 - \alpha/2;\ n - 2)$.

In terms of hypothesis testing, if the CI does not contain 0, then we reject $H_0: \beta_1 = 0$ and conclude that $H_a: \beta_1 \neq 0$ is true.


3-13

Power

In cases where we fail to reject, it is important to know the power of the test for $H_0: \beta_1 = 0$. There are two important questions we must answer before we can determine power:

1. What size difference is important?

2. Guess for the variance $\sigma^2$?

Note that power calculations should be done

prior to collection of data if possible.


3-14

Power (2)

The power to detect a difference of size d is calculated using the non-central t distribution. In addition to $\alpha$ and the degrees of freedom, we need the non-centrality parameter:

$$\delta = \frac{|\beta_1|}{\sigma\{b_1\}}, \qquad \sigma^2\{b_1\} = \frac{\sigma^2}{SS_X}$$

Power for some values of $\delta$ can be looked up in Table B5. SAS also has a procedure for computing power (for any values).


3-15

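$\beta_0$ Inference

Similar to inference for $\beta_1$:

$$b_0 \sim \text{Normal}\left(\beta_0,\ \sigma^2\{b_0\}\right) \quad \text{where} \quad \sigma^2\{b_0\} = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{SS_X}\right]$$

To test $H_0: \beta_0 = k$:

$$t = \frac{b_0 - k}{s\{b_0\}} \quad \text{where} \quad s\{b_0\} = \sqrt{MSE\left[\frac{1}{n} + \frac{\bar{X}^2}{SS_X}\right]}$$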


3-16

Test for $H_0: \beta_0 = k$

The statistic has a t-distribution with n – 2 degrees of freedom; compare it with the appropriate t-critical value.

SAS gives both statistic and p-value for testing $\beta_0 = 0$; to test $\beta_0 = k$, obtain and use a confidence interval.

The $100(1 - \alpha)\%$ CI for $\beta_0$ is

$$b_0 \pm t_{crit}\, s\{b_0\}$$

Remember: If X = 0 is not within the scope of

the model, inference may be meaningless!!


3-17

Robustness

In cases where the errors are not quite normal, the CIs and significance tests for $\beta_1$ and $\beta_0$ are still generally reasonable approximations.

We say that these tests are robust with

respect to minor violations of the normality

assumption.


3-18

SAS Coding

PROC REG data=diamonds;

model price=weight /clb;

RUN;

The 'clb' option in PROC REG requests the confidence limits for $\beta_1$ and $\beta_0$.

You can also specify alpha=0.xxx to change

the significance level (default = 0.05)


3-19

SAS Output

                 Parameter         Std
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1    -259.625      17.318     -14.99      <.0001
weight       1    3721.024      81.785      45.50      <.0001

Variable    DF      95% Confidence Limits
Intercept    1    -294.48696    -224.76486
weight       1    3556.39841    3885.65129
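The columns of this output are tied together by the formulas from the earlier slides, which can be verified with a few lines of arithmetic. The sketch below uses only the numbers printed in the weight row; the variable names are illustrative.

```python
# Numbers copied from the weight row of the SAS output above.
estimate, std_err = 3721.024, 81.785
lower, upper = 3556.39841, 3885.65129

t_value = estimate / std_err        # t = (b1 - 0) / s{b1}
half_width = (upper - lower) / 2    # equals t_crit * s{b1}
t_crit = half_width / std_err       # critical value implied by the limits
midpoint = (upper + lower) / 2      # the interval is centered at b1

print(round(t_value, 2), round(t_crit, 4), round(midpoint, 3))
```

The ratio reproduces the reported t Value of 45.50, and the implied critical value is close to 2.013, roughly what a t table gives for about 46 degrees of freedom. That would correspond to n = 48 usable observations, which is an inference from the output rather than something stated on the slide.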


3-20

Summary of Inference

SLR Model

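$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

$\varepsilon_i \sim \text{Normal}(0, \sigma^2)$ are independent, random errors

$$Y_i \sim \text{Normal}\left(\beta_0 + \beta_1 X_i,\ \sigma^2\right)$$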


3-21

Summary of Inference

Parameter Estimates

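For $\beta_1$:

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{SS_{XY}}{SS_X}$$

For $\beta_0$: $b_0 = \bar{Y} - b_1 \bar{X}$

For $\sigma^2$:

$$s^2 = \frac{\sum e_i^2}{n - 2} = \frac{SSE}{df_E} = MSE$$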


3-22

Summary of Inference

$100(1 - \alpha)\%$ Confidence Intervals

$$b_1 \pm t_{crit}\, s\{b_1\}$$

$$b_0 \pm t_{crit}\, s\{b_0\}$$

where $t_{crit} = t(1 - \alpha/2;\ n - 2)$.


3-23

Summary of Inference

Significance tests

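H0: $\beta_1 = 0$ vs. Ha: $\beta_1 \neq 0$:

$$t = \frac{b_1 - 0}{s\{b_1\}} \sim t(n - 2) \quad \text{under } H_0$$

H0: $\beta_0 = 0$ vs. Ha: $\beta_0 \neq 0$:

$$t = \frac{b_0 - 0}{s\{b_0\}} \sim t(n - 2) \quad \text{under } H_0$$

Reject H0 if the P-value is small ($< \alpha$)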


3-24

CI for the Mean Response

The estimated mean response when $X = X_h$ is

$$\hat{Y}_h = b_0 + b_1 X_h$$

$\hat{Y}_h$ is a normal random variable (since the parameter estimates are linear combos of the $Y_i$ and these are normal).

To develop a confidence interval we can obtain a formula for the standard error from $\sigma^2\{b_0\}$ and $\sigma^2\{b_1\}$.


3-25

Standard Error

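The variance associated to $\hat{Y}_h$ is

$$\text{Var}(\hat{Y}_h) = \text{Var}(b_0) + X_h^2\,\text{Var}(b_1) + 2 X_h\,\text{Cov}(b_0, b_1) = \sigma^2\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X}\right]$$

where $\text{Cov}(b_0, b_1) = -\bar{X}\sigma^2 / SS_X$ (the estimates $b_0$ and $b_1$ are correlated, so the covariance term is needed to reach the right-hand side).

Substitute MSE for $\sigma^2$ to get the estimated variance. Take the square root to get the standard error $s\{\hat{Y}_h\}$.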


3-26

Confidence Interval for $E\{Y_h\}$

Recall: Point Est. ± Crit. Value * SE

Confidence Limits are

$$\hat{Y}_h \pm t_{crit}\, s\{\hat{Y}_h\}$$

where $t_{crit} = t(1 - \alpha/2;\ n - 2)$
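A small sketch of these confidence limits, computed from raw data, may help make the formula concrete. The function name `mean_response_ci` and the six data points are hypothetical (not the diamonds data), and the caller supplies the critical value; 2.776 is the tabled $t(0.975; 4)$ for this sample size.

```python
import math

def mean_response_ci(x, y, x_h, t_crit):
    """Confidence limits for E{Y_h} at X = x_h in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat_h = b0 + b1 * x_h                                   # point estimate
    s_y_hat = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / ss_x))
    return y_hat_h - t_crit * s_y_hat, y_hat_h + t_crit * s_y_hat

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
T_CRIT = 2.776  # tabled t(0.975; 4), since n - 2 = 4

for x_h in (3.5, 6.0):
    lower, upper = mean_response_ci(x, y, x_h, T_CRIT)
    print(x_h, round(lower, 3), round(upper, 3))
```

The interval at $X_h = 3.5$ (the sample mean of X) is the narrowest one possible; it widens as $X_h$ moves away from $\bar{X}$.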


3-27

Prediction Intervals

Predicting a new observation for $X = X_h$ is different from estimating the mean response in that there is additional variation associated to the normal curve that is centered at $E\{Y_h\}$.

Hence two components to $s\{\hat{Y}_{h,new}\}$:

Variance associated to the estimated mean response.

Variance associated to the new obs.


3-28

Prediction Intervals (2)

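The variance associated to $\hat{Y}_{h,new}$ is

$$\text{Var}(\hat{Y}_{h,new}) = \text{Var}(\hat{Y}_h) + \sigma^2 = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X}\right]$$

As before, substitute MSE for $\sigma^2$ and take the square root to get $s\{\hat{Y}_{h,new}\}$, or equivalently, $s\{pred\}$.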


3-29

Prediction Intervals (3)

The $100(1 - \alpha)\%$ prediction interval for a new observation at $X = X_h$ is given by

$$\hat{Y}_h \pm t_{crit}\, s\{pred\}$$

where $t_{crit} = t(1 - \alpha/2;\ n - 2)$
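To see how much wider a prediction interval is than the corresponding interval for the mean response, the two standard errors can be computed side by side from summary quantities. This is a sketch only: the function name and every numeric input (MSE, n, the mean of X, $SS_X$, the fitted value) are hypothetical, and 2.069 is the tabled $t(0.975; 23)$ for the assumed n = 25.

```python
import math

def s_mean_and_s_pred(mse, n, x_bar, ss_x, x_h):
    """Return (s{Y_hat_h}, s{pred}) from regression summary quantities."""
    spread = 1 / n + (x_h - x_bar) ** 2 / ss_x
    s_mean = math.sqrt(mse * spread)         # SE of the estimated mean response
    s_pred = math.sqrt(mse * (1 + spread))   # adds the new observation's variance
    return s_mean, s_pred

# Hypothetical summary values, for illustration only.
mse, n, x_bar, ss_x = 4.0, 25, 10.0, 50.0
y_hat_h, t_crit = 30.0, 2.069                # t(0.975; 23) from a t table

s_mean, s_pred = s_mean_and_s_pred(mse, n, x_bar, ss_x, x_h=12.0)
ci = (y_hat_h - t_crit * s_mean, y_hat_h + t_crit * s_mean)
pi = (y_hat_h - t_crit * s_pred, y_hat_h + t_crit * s_pred)
print([round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

The squared standard errors differ by exactly MSE, which is why the prediction interval never shrinks to zero width no matter how large n becomes.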


3-30

CI’s and PI’s in SAS

PROC REG data=diamonds;

model price=weight

/clm cli;

RUN;

'clm' produces CI's for the mean response

'cli' produces prediction intervals

Intervals are produced for each data point, including those with a missing response value (such as observation 49 in the output)


3-31

SAS Output

                        Predicted      Std Error
Obs     Wt     Price        Value   Mean Predict       95% CL Mean
  1    0.12   223.00      186.897         8.2768    170.237    203.558
  2    0.15   323.00      298.528         6.3833    285.679    311.377
 49    0.43        .         1340         19.033       1302       1379

Obs     Wt       95% CL Predict      Residual
  1    0.12   120.6754    253.1187    36.1029
  2    0.15   233.1609    363.8947    23.4722
 49    0.43       1266        1415          .
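The relationship $s^2\{pred\} = MSE + s^2\{\hat{Y}_h\}$ can be checked against this output. The sketch below copies the rows for observations 1 and 2 from the table above, backs out the critical value from the mean-response limits, and then recovers $s\{pred\}$ and the implied MSE; the function and variable names are illustrative, and observation 49 is skipped because its printed values are rounded too coarsely.

```python
# Rows copied from the SAS output above:
# obs -> (Std Error Mean Predict, 95% CL Mean, 95% CL Predict)
rows = {
    1: (8.2768, (170.237, 203.558), (120.6754, 253.1187)),
    2: (6.3833, (285.679, 311.377), (233.1609, 363.8947)),
}

def implied_mse(s_mean, cl_mean, cl_pred):
    """Back out s{pred} and MSE from the printed confidence/prediction limits."""
    t_crit = (cl_mean[1] - cl_mean[0]) / (2 * s_mean)   # half-width / SE
    s_pred = (cl_pred[1] - cl_pred[0]) / (2 * t_crit)   # half-width / t_crit
    return s_pred ** 2 - s_mean ** 2, s_pred            # MSE = s{pred}^2 - s{Yhat}^2

for obs, row in rows.items():
    mse, s_pred = implied_mse(*row)
    print(obs, round(s_pred, 2), round(mse, 1))
```

Both rows should imply essentially the same MSE (a little over 1000, so a root MSE of roughly 32), even though their standard errors for the mean differ.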


3-32

Comparing Standard Errors

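$$s\{b_1\} = \sqrt{\frac{MSE}{SS_X}}$$

$$s\{b_0\} = \sqrt{MSE\left[\frac{1}{n} + \frac{\bar{X}^2}{SS_X}\right]}$$

$$s\{\hat{Y}_h\} = \sqrt{MSE\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X}\right]}$$

$$s\{pred\} = \sqrt{MSE\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X}\right]}$$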


3-33

Minimizing Standard Errors

Can sometimes design experiments to minimize standard errors

Increase sample size

Increase $SS_X$ by spreading out the values of the predictor variable

Arrange for the predictor value of interest to be $X_h = \bar{X}$
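The effect of spreading out the predictor values can be illustrated with two hypothetical designs of the same size. The sketch assumes the same MSE (set to 1.0 here purely for illustration) under both designs, so that only $SS_X$ changes; the design points are made up.

```python
import math

def ss_x(xs):
    """Sum of squared deviations of the predictor values about their mean."""
    x_bar = sum(xs) / len(xs)
    return sum((xi - x_bar) ** 2 for xi in xs)

mse = 1.0                               # hypothetical, assumed equal for both designs
narrow = [4.0, 4.5, 5.0, 5.5, 6.0]      # predictor values bunched together
wide = [1.0, 3.0, 5.0, 7.0, 9.0]        # same n and mean, spread out

s_b1_narrow = math.sqrt(mse / ss_x(narrow))
s_b1_wide = math.sqrt(mse / ss_x(wide))
print(round(s_b1_narrow, 3), round(s_b1_wide, 3), round(s_b1_narrow / s_b1_wide, 1))
```

Quadrupling the spacing multiplies $SS_X$ by 16 and therefore cuts $s\{b_1\}$ by a factor of 4, with no change in sample size.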


3-34

Upcoming in Lecture 4...

We will look at one more example

illustrating the use of SAS.

We'll discuss the Working-Hotelling

Confidence Band (2.6), details of the

ANOVA table (2.7 – 2.9) and clean up a

few details in 2.10.