Chapter 5 Inferences Regarding Population Central Values


Inferential Methods for Parameters

• Parameter: Numeric description of a population

• Statistic: Numeric description of a sample

• Statistical Inference: Use of observed statistics to make statements regarding parameters

– Estimation: Predicting the unknown parameter based on sample data. Can be either a single number (point estimate) or a range (interval estimate)

– Testing: Using sample data to see whether we can rule out specific values of an unknown parameter with a certain level of confidence

Estimating with Confidence

• Goal: Estimate a population mean based on sample mean

• Unknown: Parameter ($\mu$)

• Known: Approximate Sampling Distribution of Statistic

$\bar{Y} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

• Recall: For a random variable that is normally distributed, the probability that it will fall within 2 standard deviations of mean is approximately 0.95

$P\left(\mu - \frac{2\sigma}{\sqrt{n}} \le \bar{Y} \le \mu + \frac{2\sigma}{\sqrt{n}}\right) \approx 0.95$
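As a quick numerical check of the "within 2 standard deviations" statement, the following sketch uses Python's standard-library `statistics.NormalDist`. The variable names are illustrative only and are not part of the original slides.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
Z = NormalDist()

# P(-2 <= Z <= 2): chance a normal variable falls within
# 2 standard deviations of its mean.
within_two = Z.cdf(2.0) - Z.cdf(-2.0)

# P(-1.96 <= Z <= 1.96): the multiplier that gives 0.95 more exactly.
within_196 = Z.cdf(1.96) - Z.cdf(-1.96)

print(f"within 2 standard deviations:    {within_two:.4f}")
print(f"within 1.96 standard deviations: {within_196:.4f}")
```

The first probability is slightly above 0.95, which is why 2 is used as a convenient approximation to 1.96.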

Estimating with Confidence

• Although the parameter is unknown, it’s highly likely that our sample mean (estimate) will lie within 2 standard deviations (aka standard errors) of the population mean (parameter)

• Margin of Error: Measure of the upper bound in sampling error with a fixed level (we will typically use 95%) of confidence. That will correspond to 2 standard errors:

Margin of Error (95% Confidence): $2\frac{\sigma}{\sqrt{n}}$

Confidence Interval: estimate $\pm$ margin of error

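Confidence Interval for a Mean

• Confidence Coefficient ($1-\alpha$): Probability (based on repeated samples and construction of intervals) that a confidence interval will contain the true mean $\mu$

• Common choices of $1-\alpha$ and resulting intervals:

90% Confidence: $\bar{y} \pm 1.645\,\frac{\sigma}{\sqrt{n}}$

95% Confidence: $\bar{y} \pm 1.960\,\frac{\sigma}{\sqrt{n}}$

99% Confidence: $\bar{y} \pm 2.576\,\frac{\sigma}{\sqrt{n}}$

$(1-\alpha)100\%$ Confidence: $\bar{y} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

$1-\alpha$   $\alpha/2$   $1-\alpha/2$   $z_{\alpha/2}$
0.90     0.050    0.950      1.645
0.95     0.025    0.975      1.960
0.99     0.005    0.995      2.576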
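The critical values in the table can be reproduced from the inverse normal CDF. This is a minimal sketch using the standard library; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_critical(conf_level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - conf_level
    # z_{alpha/2} leaves area alpha/2 in the upper tail,
    # so it is the (1 - alpha/2) quantile of the standard normal.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"1-alpha = {level:.2f}   z = {z_critical(level):.3f}")
```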

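[Figure: Standard Normal Distribution, with central area $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area $\alpha/2$ in each tail]

[Figure: Normal Distribution of the sample mean, with central area $1-\alpha$ between $\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and $\mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and area $\alpha/2$ in each tail]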

Philadelphia Monthly Rainfall (1825-1869)

[Figure: Histogram of monthly rainfall. Vertical axis: Frequency (0 to 140). Horizontal axis ticks: 1 to 15]

$\mu = 3.68$, $\sigma = 1.92$

Margin of error (n = 20, C = 95%): $1.96\,\frac{1.92}{\sqrt{20}} = 0.84$

4 Random Samples of Size n=20, 95% CIs

      Sample 1              Sample 2              Sample 3              Sample 4
Month  Rain  Ran#     Month  Rain  Ran#     Month  Rain  Ran#     Month  Rain  Ran#
156    2.56  0.0028   349    2.33  0.0007   185    2.69  0.0005   171    1.50  0.0011
51     2.87  0.0050   149    4.86  0.0013   527    5.28  0.0029   175    2.52  0.0048
176    4.64  0.0052   227    4.15  0.0054   114    3.99  0.0048   130    1.22  0.0085
364    2.05  0.0082   336    5.17  0.0073   312    4.51  0.0084   167    3.35  0.0094
271    2.76  0.0142   124    4.33  0.0081   49     5.37  0.0085   101    5.88  0.0133
7      2.06  0.0145   330    4.03  0.0101   398    2.29  0.0166   33     0.79  0.0148
312    4.51  0.0153   468    4.63  0.0132   396    5.55  0.0187   299    2.60  0.0164
219    4.41  0.0160   293    3.99  0.0145   99     2.22  0.0233   337    1.85  0.0191
16     3.87  0.0171   511    2.39  0.0149   181    1.84  0.0235   447    3.55  0.0193
484    2.83  0.0190   235    5.28  0.0172   364    2.05  0.0244   78     3.53  0.0213
316    4.56  0.0202   314    3.11  0.0190   392    7.59  0.0253   117    3.57  0.0224
318    3.44  0.0257   372    5.42  0.0260   477    7.16  0.0283   399    1.09  0.0227
517    3.62  0.0272   164    2.78  0.0272   434    2.07  0.0290   52     4.99  0.0240
249    2.16  0.0301   48     0.26  0.0281   229    4.05  0.0318   162    6.60  0.0261
445    4.79  0.0320   236    2.40  0.0284   223    4.54  0.0320   95     2.59  0.0296
13     1.11  0.0324   50     3.75  0.0319   279    2.76  0.0364   479    3.93  0.0296
479    3.93  0.0325   39     3.35  0.0325   520    5.44  0.0374   51     2.87  0.0303
370    4.11  0.0345   417    7.68  0.0333   245    1.60  0.0374   380    6.00  0.0311
348    2.17  0.0374   503    1.76  0.0359   183    2.63  0.0391   61     1.63  0.0324
89     5.40  0.0380   151    5.89  0.0361   41     3.49  0.0395   302    2.87  0.0339

           Sample 1   Sample 2   Sample 3   Sample 4
Mean       3.39       3.88       3.86       3.15
Mean-me    2.55       3.04       3.02       2.31
Mean+me    4.23       4.72       4.70       3.99

$\mu = 3.68$, $\sigma = 1.92$; margin of error (n = 20, C = 95%): $1.96\,\frac{1.92}{\sqrt{20}} = 0.84$
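The interval for Sample 1 can be recomputed from its 20 rainfall values. The sketch below assumes the known population standard deviation of 1.92 and the multiplier 1.96 from the slide; the function name `z_interval` is illustrative.

```python
from math import sqrt
from statistics import mean

SIGMA = 1.92   # population standard deviation, taken from the slide
Z_95 = 1.96    # z_{.025} for a 95% interval

# Rain column of Sample 1 in the table above.
sample_1 = [2.56, 2.87, 4.64, 2.05, 2.76, 2.06, 4.51, 4.41, 3.87, 2.83,
            4.56, 3.44, 3.62, 2.16, 4.79, 1.11, 3.93, 4.11, 2.17, 5.40]


def z_interval(data, sigma, z):
    """Return (lower, upper) for mean +/- z * sigma / sqrt(n)."""
    y_bar = mean(data)
    margin = z * sigma / sqrt(len(data))
    return y_bar - margin, y_bar + margin


lower, upper = z_interval(sample_1, SIGMA, Z_95)
print(f"mean = {mean(sample_1):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval agrees with the Mean-me and Mean+me rows for Sample 1 and contains the population mean 3.68.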

Factors Affecting Confidence Interval Width

• Goal: Have precise (narrow) confidence intervals

– Confidence Level ($1-\alpha$): Increasing $1-\alpha$ increases the probability that an interval contains the parameter, which implies a wider confidence interval. Reducing $1-\alpha$ will shorten the interval (at a cost in confidence)

– Sample size (n): Increasing n decreases standard error of estimate, margin of error, and width of interval (Quadrupling n cuts width in half)

– Standard Deviation ($\sigma$): The more variable the individual measurements, the wider the interval. Potential ways to reduce $\sigma$ are to focus on a more precise target population or use a more precise measuring instrument. Often nothing can be done, as nature determines $\sigma$

Precautions

• Data should be simple random sample from population (or at least can be treated as independent observations)

• More complex sampling designs have adjustments made to formulas (see texts such as Elementary Survey Sampling by Scheaffer, Mendenhall, Ott)

• Biased sampling designs give meaningless results

• Small sample sizes from nonnormal distributions will have coverage probabilities ($1-\alpha$) typically below the nominal level

• Typically $\sigma$ is unknown. Replacing it with the sample standard deviation s works as a good approximation in large samples

Selecting the Sample Size

• Before collecting sample data, usually have a goal for how large the margin of error should be to have useful estimate of unknown parameter (particularly when comparing two populations)

• Let E be the desired level of the margin of error and $\sigma$ be the standard deviation of the population of measurements (typically will be unknown and must be estimated based on previous research or pilot study)

• The sample size giving this margin of error is:

$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
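A minimal sketch of the sample size formula follows. The target margin E = 0.5 is a hypothetical value chosen only for illustration, paired with the rainfall standard deviation 1.92 used earlier; rounding up to the next whole observation is the usual convention and is assumed here.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(sigma, margin, conf_level=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # n = (z * sigma / E)^2, rounded up to a whole observation.
    return ceil((z * sigma / margin) ** 2)


# Hypothetical target: margin of error 0.5 with sigma = 1.92.
n_needed = sample_size_for_margin(sigma=1.92, margin=0.5)
print(n_needed)
```

Halving the target margin roughly quadruples the required n, which mirrors the earlier remark that quadrupling n cuts the interval width in half.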

Hypothesis Tests

• Method of using sample (observed) data to challenge a hypothesis regarding a state of nature (represented as particular parameter value(s))

• Begin by stating a research hypothesis that challenges a statement of “status quo” (or equality of 2 populations)

• State the current state or “status quo” as a statement regarding population parameter(s)

• Obtain sample data and see to what extent it agrees/disagrees with the “status quo”

• Conclude that the “status quo” is not true if observed data are highly unlikely (low probability) if it were true

Elements of a Hypothesis Test (I)

• Null hypothesis (H0): Statement or theory being tested. Stated in terms of parameter(s) and contains an equality. Test is set up under the assumption of its truth.

• Alternative Hypothesis (Ha): Statement contradicting H0. Stated in terms of parameter(s) and contains an inequality. Will only be accepted if strong evidence refutes H0 based on sample data. May be 1-sided or 2-sided, depending on theory being tested.

• Test Statistic (T.S.): Quantity measuring discrepancy between sample statistic (estimate) and parameter value under H0

• Rejection Region (R.R.): Values of test statistic for which we reject H0 in favor of Ha

• P-value: Probability (assuming H0 true) that we would observe sample data (test statistic) this extreme or more extreme in favor of the alternative hypothesis (Ha)

Example: Interference Effect

• Does the way items are presented affect task time?

– Subjects shown list of color names in 2 colors: different/black

– $y_i$ is the difference in times to read lists for subject i: diff-blk

– H0: No interference effect: mean difference is 0 ($\mu = 0$)

– Ha: Interference effect exists: mean difference > 0 ($\mu > 0$)

– Assume standard deviation in differences is $\sigma = 8$ (unrealistic*)

– Experiment to be based on n=70 subjects

Parameter value under H0: $\mu = 0$

Approximate distribution of sample mean under H0: $\bar{Y} \sim N\left(0, \frac{8}{\sqrt{70}} = 0.96\right)$

Observed sample mean: $\bar{y} = 2.39$

How likely to observe sample mean difference 2.39 if $\mu = 0$?

[Figure: Sampling distribution of the sample mean under H0, centered at 0, with the P-value shown as the area above the observed mean 2.39]
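The question above can be answered numerically under the slide's assumption that the standard deviation is known to be 8. This is a sketch with illustrative variable names, using only the standard library.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 8.0, 70     # assumed known standard deviation and sample size
y_bar = 2.39           # observed sample mean difference
mu_0 = 0.0             # mean under H0

# Standard error of the sample mean under H0.
se = sigma / sqrt(n)

# Standardize the observed mean, then take the upper-tail area.
z_obs = (y_bar - mu_0) / se
p_value = 1 - NormalDist().cdf(z_obs)

print(f"SE = {se:.2f}, z = {z_obs:.2f}, P-value = {p_value:.4f}")
```

A sample mean this large would be quite unlikely if there were truly no interference effect.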

Elements of a Hypothesis Test (II)

• Type I Error: Test resulting in rejection of H0 in favor of Ha when H0 is in fact true
– P(Type I error) = $\alpha$ (typically .10, .05, or .01)

• Type II Error: Test resulting in failure to reject H0 in favor of Ha when in fact Ha is true (H0 is false)
– P(Type II error) = $\beta$ (depends on true parameter value)

• 1-Tailed Test: Test where the alternative hypothesis states specifically that the parameter is strictly above (below) the null value

• 2-Tailed Test: Test where the alternative hypothesis is that the parameter is not equal to null value (simultaneously tests “greater than” and “less than”)

Test Statistic

• Parameter: Population mean ($\mu$) under H0 is $\mu_0$

• Statistic (Estimator): Sample mean obtained from sample measurements is $\bar{y}$

• Standard Error of Estimator: $\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$

• Sampling Distribution of Estimator:
– Normal if shape of distribution of individual measurements is normal
– Approximately normal regardless of shape for large samples

• Test Statistic: $z_{obs} = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$ (labeled simply as z in text)

Note: Typically $\sigma$ is unknown and is replaced by s in large samples

Decision Rules and Rejection Regions

• Once a significance ($\alpha$) level has been chosen a decision rule can be stated, based on a critical value:

• 2-sided tests: H0: $\mu = \mu_0$  Ha: $\mu \ne \mu_0$

– If test statistic (zobs) > $z_{\alpha/2}$ Reject H0 and conclude $\mu > \mu_0$

– If test statistic (zobs) < $-z_{\alpha/2}$ Reject H0 and conclude $\mu < \mu_0$

– If $-z_{\alpha/2}$ < zobs < $z_{\alpha/2}$ Do not reject H0: $\mu = \mu_0$

• 1-sided tests (Upper Tail): H0: $\mu \le \mu_0$  Ha: $\mu > \mu_0$

– If test statistic (zobs) > $z_{\alpha}$ Reject H0 and conclude $\mu > \mu_0$

– If zobs < $z_{\alpha}$ Do not reject H0: $\mu \le \mu_0$

• 1-sided tests (Lower Tail): H0: $\mu \ge \mu_0$  Ha: $\mu < \mu_0$

– If test statistic (zobs) < $-z_{\alpha}$ Reject H0 and conclude $\mu < \mu_0$

– If zobs > $-z_{\alpha}$ Do not reject H0: $\mu \ge \mu_0$

Computing the P-Value

• 2-sided Tests: How likely is it to observe a sample mean as far or farther from the value of the parameter under the null hypothesis? (H0: $\mu = \mu_0$  Ha: $\mu \ne \mu_0$)

Under H0: $\bar{Y} \sim N\left(\mu_0, \frac{\sigma}{\sqrt{n}}\right)$, $Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$

After obtaining the sample data, compute the mean and convert it to a z-score (zobs) and find the area above |zobs| and below -|zobs| from the standard normal (z) table

• 1-sided Tests: Obtain the area above zobs for upper tail tests (Ha: $\mu > \mu_0$) or below zobs for lower tail tests (Ha: $\mu < \mu_0$)

Interference Effect (1-sided Test)

• Testing whether population mean time to read list of colors is higher when color is written in different color

• Data: $y_i$: difference score for subject i (Different-Black)

• Null hypothesis (H0): No interference effect (H0: $\mu \le 0$)

• Alternative hypothesis (Ha): Interference effect (Ha: $\mu > 0$)

• n = 70 subjects in experiment, reasonably large sample

Sample Data: $\bar{y} = 2.39$, $s = 7.81$, $n = 70$

Test Statistic (based on s = 7.81): $z_{obs} = \frac{2.39 - 0}{7.81/\sqrt{70}} = \frac{2.39}{0.93} = 2.57$

Rejection Region ($\alpha = 0.05$): Reject H0 if $z_{obs} \ge z_{.05} = 1.645$

P-value (based on s = 7.81): $P(Z \ge 2.57) = 1 - .9949 = .0051$

Conclude there is evidence of an interference effect ($\mu > 0$)
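The large-sample z-test can be sketched as a small function. The name `z_test` and its `alternative` argument are assumptions made for illustration; the inputs are the sample values from this example, with s standing in for the unknown standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def z_test(y_bar, mu_0, sd, n, alternative="greater"):
    """Return (z_obs, p_value) for a large-sample z-test of a mean."""
    z_obs = (y_bar - mu_0) / (sd / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":        # Ha: mu > mu_0
        p_value = 1 - std_normal.cdf(z_obs)
    elif alternative == "less":         # Ha: mu < mu_0
        p_value = std_normal.cdf(z_obs)
    elif alternative == "two-sided":    # Ha: mu != mu_0
        p_value = 2 * (1 - std_normal.cdf(abs(z_obs)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z_obs, p_value


z_obs, p_upper = z_test(2.39, 0.0, 7.81, 70, alternative="greater")
_, p_two_sided = z_test(2.39, 0.0, 7.81, 70, alternative="two-sided")
print(f"z = {z_obs:.2f}, upper-tail P = {p_upper:.4f}, two-sided P = {p_two_sided:.4f}")
```

Carrying the standard error unrounded gives z of about 2.56 rather than 2.57; the hand calculation rounds the standard error to 0.93 first. The conclusion is the same either way.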

Interference Effect (2-sided Test)

• Testing whether population mean time to read list of colors is affected (higher or lower) when color is written in different color

• Data: $y_i$: difference score for subject i (Different-Black)

• Null hypothesis (H0): No interference effect (H0: $\mu = 0$)

• Alternative hypothesis (Ha): Interference effect (+ or -) (Ha: $\mu \ne 0$)

Sample Data: $\bar{y} = 2.39$, $s = 7.81$, $n = 70$

Test Statistic (based on s = 7.81): $z_{obs} = \frac{2.39 - 0}{7.81/\sqrt{70}} = \frac{2.39}{0.93} = 2.57$

Rejection Region ($\alpha = 0.05$): Reject H0 if $|z_{obs}| \ge z_{.05/2} = z_{.025} = 1.96$

P-value (based on s = 7.81): $2P(Z \ge |2.57|) = 2(1 - .9949) = .0102$

Again, evidence of an interference effect ($\mu > 0$)

Equivalence of 2-sided Tests and CI’s

• For given $\alpha$, a 2-sided test conducted at $\alpha$ significance level will give equivalent results to a ($1-\alpha$) level confidence interval:

– If entire interval > $\mu_0$, P-value < $\alpha$, zobs > $z_{\alpha/2}$ (conclude $\mu > \mu_0$)

– If entire interval < $\mu_0$, P-value < $\alpha$, zobs < $-z_{\alpha/2}$ (conclude $\mu < \mu_0$)

– If interval contains $\mu_0$, P-value > $\alpha$, $-z_{\alpha/2}$ < zobs < $z_{\alpha/2}$ (don’t conclude $\mu \ne \mu_0$)

• Confidence interval is the set of parameter values that we would fail to reject the null hypothesis for (based on a 2-sided test)
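The equivalence can be checked on the interference data (sample mean 2.39, s = 7.81, n = 70). In this sketch the variable names are illustrative; the test decision and the interval decision are computed separately and then compared.

```python
from math import sqrt
from statistics import NormalDist

y_bar, s, n = 2.39, 7.81, 70   # interference sample summary
mu_0 = 0.0                     # null value
alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)
se = s / sqrt(n)

# (1 - alpha) confidence interval for the mean.
lower = y_bar - z_crit * se
upper = y_bar + z_crit * se

# 2-sided test decision at level alpha.
z_obs = (y_bar - mu_0) / se
reject_by_test = abs(z_obs) > z_crit

# Interval decision: reject when mu_0 lies outside the interval.
reject_by_interval = not (lower <= mu_0 <= upper)

print(f"95% CI = ({lower:.2f}, {upper:.2f})")
print(reject_by_test, reject_by_interval)
```

Both decisions agree: the whole interval lies above 0, matching the rejection of H0 by the 2-sided test.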

Power of a Test

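• Power - Probability a test rejects H0 (depends on $\mu$)

– H0 True: Power = P(Type I error) = $\alpha$
– H0 False: Power = 1-P(Type II error) = $1-\beta$

· Example (Using context of interference data):
· H0: $\mu = 0$  HA: $\mu > 0$, with $\sigma = 8$ and n=16, so $\sigma/\sqrt{n} = 8/4 = 2$

· Decision Rule: Reject H0 (at $\alpha$=0.05 significance level) if:

$z_{obs} = \frac{\bar{y} - 0}{\sigma/\sqrt{n}} = \frac{\bar{y}}{2} \ge 1.645 \;\Rightarrow\; \bar{y} \ge 3.29$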

Power of a Test

• Now suppose in reality that $\mu = 3.0$ (HA is true)

• Power now refers to the probability we (correctly) reject the null hypothesis. Note that the sampling distribution of the sample mean is approximately normal, with mean 3.0 and standard deviation (standard error) 2.0.

• Decision Rule (from last slide): Conclude population mean interference effect is positive (greater than 0) if the sample mean difference score is above 3.29

• Power for this case can be computed as:

$P(\bar{y} \ge 3.29)$ when $\bar{y} \sim N(3.0, 2.0)$

Power of a Test

$\text{Power} = P(\bar{y} \ge 3.29) = P\left(Z \ge \frac{3.29 - 3.0}{2.0} = 0.145\right) = 1 - .5576 = .4424$

• All else being equal:

• As sample size increases, power increases

• As population variance decreases, power increases

• As the true mean gets further from 0, power increases
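The power calculation for the upper-tail z-test can be sketched as follows. The function name `power_upper_tail` is illustrative; the inputs (null mean 0, standard deviation 8, n = 16, true mean 3.0) are the values used in this example.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu_true, mu_0, sigma, n, alpha=0.05):
    """P(reject H0) for the upper-tail z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu_0 + std_normal.inv_cdf(1 - alpha) * se
    # Probability the sample mean exceeds the cutoff under the true mean.
    return 1 - std_normal.cdf((cutoff - mu_true) / se)


power_16 = power_upper_tail(mu_true=3.0, mu_0=0.0, sigma=8.0, n=16)
print(f"n=16: power = {power_16:.4f}")

# Power rises with sample size for the same true mean.
for n in (16, 32, 64, 80):
    print(n, round(power_upper_tail(3.0, 0.0, 8.0, n), 3))
```

When the true mean equals the null value, the same function returns the significance level, which matches the "H0 True" case.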

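Power of a Test

[Figure: Power of Z-test. Sampling distributions of Y-bar under H0 ($\mu$=0) and HA ($\mu$=3); horizontal axis Y-bar from -5 to 9, vertical axis density of sampling distribution. The cutoff splits the axis into "Fail to reject H0" and "Reject H0" regions. Areas under H0: .95 and .05. Areas under HA: .5576 and .4424.]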

• Power Curves for sample sizes of 16, 32, 64, 80 and varying true values $\mu$ from 0 to 5 with $\sigma = 8$.

• For given $\mu$, power increases with sample size

• For given sample size, power increases with $\mu$

[Figure: Power of Z-test. Curves of P(Reject H0) (0 to 1) against Mu (0 to 5) for n=16, n=32, n=64, n=80]

Sample Size Calculations for Fixed Power

• Goal - Choose sample size to have a favorable chance of detecting important difference from $\mu_0$ in 2-sided test: H0: $\mu = \mu_0$ vs Ha: $\mu \ne \mu_0$

• Step 1 - Define an important difference to be detected ($\Delta$)

– Case 1: $\sigma$ approximated from prior experience or pilot study - difference can be stated in units of the data

– Case 2: $\sigma$ unknown - difference must be stated in units of standard deviations of the data

Case 1: $\Delta = |\mu - \mu_0|$    Case 2: $\Delta = \frac{|\mu - \mu_0|}{\sigma}$

• Step 2 - Choose the desired power to detect the desired important difference ($1-\beta$, typically at least .80). For 2-sided test:

Case 1: $n = \frac{\sigma^2 (z_{\alpha/2} + z_{\beta})^2}{\Delta^2}$    Case 2: $n = \frac{(z_{\alpha/2} + z_{\beta})^2}{\Delta^2}$

Example - Interference Data

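• 2-Sided Test: H0: $\mu = 0$ vs Ha: $\mu \ne 0$

• Set $\alpha$ = P(Type I Error) = 0.05

• Choose important difference of $|\mu - \mu_0| = \Delta = 2.0$

• Choose Power = P(Reject H0 | $\mu = 2.0$) = .90

• Set $\beta$ = P(Type II Error) = 1-Power = 1-.90 = .10

• From study, we know $\sigma \approx 8$

$z_{\alpha/2} = z_{.05/2} = z_{.025} = 1.96$,  $z_{\beta} = z_{.10} = 1.282$

$n = \frac{\sigma^2 (z_{\alpha/2} + z_{\beta})^2}{\Delta^2} = \frac{(8)^2 (1.96 + 1.282)^2}{(2)^2} = 168.2 \approx 169$

Would need 169 subjects to have a .90 probability of detecting effect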
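The Case 1 sample size formula can be sketched in a few lines. The function name `sample_size_for_power` is an assumption for illustration; the call uses the values from this example (difference 2.0, standard deviation 8, significance level 0.05, power .90), and the result is rounded up to a whole subject.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_power(delta, sigma, alpha=0.05, power=0.90):
    """n for a 2-sided z-test to detect a difference delta with the given power."""
    std_normal = NormalDist()
    z_alpha2 = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    z_beta = std_normal.inv_cdf(power)             # z_{beta}, beta = 1 - power
    return ceil(sigma ** 2 * (z_alpha2 + z_beta) ** 2 / delta ** 2)


n_required = sample_size_for_power(delta=2.0, sigma=8.0, alpha=0.05, power=0.90)
print(n_required)
```

Lowering the desired power, or looking for a larger difference, reduces the required n.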

Potential for Abuse of Tests

• Should choose a significance ($\alpha$) level in advance and report test conclusion (significant/nonsignificant) as well as the P-value. Significance level of 0.05 is widely used in the academic literature

• Very large sample sizes can detect very small differences for a parameter value. A clinically meaningful effect should be determined, and confidence interval reported when possible

• A nonsignificant test result does not imply no effect (that H0 is true).

• Many studies test many variables simultaneously. This can increase overall type I error rates

Family of t-distributions

• Symmetric, mound-shaped, centered at 0 (like the standard normal (z) distribution)

• Indexed by degrees of freedom (df), the number of independent observations (deviations) comprising the estimated standard deviation. For one-sample problems df = n-1

• Have heavier tails (more probability over extreme ranges) than the z-distribution

• Converge to the z-distribution as df gets large

• Tables of critical values for certain upper tail probabilities are available (Table 3, p. 679)

Inference for Population Mean

• Practical Problem: Sample mean has sampling distribution that is Normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ (when the data are normal, and approximately so for large samples). $\sigma$ is unknown.

• Have an estimate of $\sigma$, s, obtained from sample data. Estimated standard error of the sample mean is:

$SE_{\bar{y}} = \frac{s}{\sqrt{n}}$,   $t = \frac{\bar{y} - \mu}{s/\sqrt{n}} \sim t(n-1)$

When the sample is SRS from $N(\mu, \sigma)$ then the t-statistic (same as z- with estimated standard deviation) is distributed t with n-1 degrees of freedom

Critical Values of the t-distribution (rows: degrees of freedom; columns: upper tail probability)

df     0.25   0.2    0.15   0.1    0.05   0.025  0.02   0.01   0.005  0.0025 0.001  0.0005
1      1.000  1.376  1.963  3.078  6.314  12.71  15.89  31.82  63.66  127.3  318.3  636.6
2      0.816  1.061  1.386  1.886  2.920  4.303  4.849  6.965  9.925  14.09  22.33  31.60
3      0.765  0.978  1.250  1.638  2.353  3.182  3.482  4.541  5.841  7.453  10.21  12.92
4      0.741  0.941  1.190  1.533  2.132  2.776  2.999  3.747  4.604  5.598  7.173  8.610
5      0.727  0.920  1.156  1.476  2.015  2.571  2.757  3.365  4.032  4.773  5.894  6.869
6      0.718  0.906  1.134  1.440  1.943  2.447  2.612  3.143  3.707  4.317  5.208  5.959
7      0.711  0.896  1.119  1.415  1.895  2.365  2.517  2.998  3.499  4.029  4.785  5.408
8      0.706  0.889  1.108  1.397  1.860  2.306  2.449  2.896  3.355  3.833  4.501  5.041
9      0.703  0.883  1.100  1.383  1.833  2.262  2.398  2.821  3.250  3.690  4.297  4.781
10     0.700  0.879  1.093  1.372  1.812  2.228  2.359  2.764  3.169  3.581  4.144  4.587
11     0.697  0.876  1.088  1.363  1.796  2.201  2.328  2.718  3.106  3.497  4.025  4.437
12     0.695  0.873  1.083  1.356  1.782  2.179  2.303  2.681  3.055  3.428  3.930  4.318
13     0.694  0.870  1.079  1.350  1.771  2.160  2.282  2.650  3.012  3.372  3.852  4.221
14     0.692  0.868  1.076  1.345  1.761  2.145  2.264  2.624  2.977  3.326  3.787  4.140
15     0.691  0.866  1.074  1.341  1.753  2.131  2.249  2.602  2.947  3.286  3.733  4.073
16     0.690  0.865  1.071  1.337  1.746  2.120  2.235  2.583  2.921  3.252  3.686  4.015
17     0.689  0.863  1.069  1.333  1.740  2.110  2.224  2.567  2.898  3.222  3.646  3.965
18     0.688  0.862  1.067  1.330  1.734  2.101  2.214  2.552  2.878  3.197  3.610  3.922
19     0.688  0.861  1.066  1.328  1.729  2.093  2.205  2.539  2.861  3.174  3.579  3.883
20     0.687  0.860  1.064  1.325  1.725  2.086  2.197  2.528  2.845  3.153  3.552  3.850
21     0.686  0.859  1.063  1.323  1.721  2.080  2.189  2.518  2.831  3.135  3.527  3.819
22     0.686  0.858  1.061  1.321  1.717  2.074  2.183  2.508  2.819  3.119  3.505  3.792
23     0.685  0.858  1.060  1.319  1.714  2.069  2.177  2.500  2.807  3.104  3.485  3.768
24     0.685  0.857  1.059  1.318  1.711  2.064  2.172  2.492  2.797  3.091  3.467  3.745
25     0.684  0.856  1.058  1.316  1.708  2.060  2.167  2.485  2.787  3.078  3.450  3.725
26     0.684  0.856  1.058  1.315  1.706  2.056  2.162  2.479  2.779  3.067  3.435  3.707
27     0.684  0.855  1.057  1.314  1.703  2.052  2.158  2.473  2.771  3.057  3.421  3.689
28     0.683  0.855  1.056  1.313  1.701  2.048  2.154  2.467  2.763  3.047  3.408  3.674
29     0.683  0.854  1.055  1.311  1.699  2.045  2.150  2.462  2.756  3.038  3.396  3.660
30     0.683  0.854  1.055  1.310  1.697  2.042  2.147  2.457  2.750  3.030  3.385  3.646
40     0.681  0.851  1.050  1.303  1.684  2.021  2.123  2.423  2.704  2.971  3.307  3.551
50     0.679  0.849  1.047  1.299  1.676  2.009  2.109  2.403  2.678  2.937  3.261  3.496
60     0.679  0.848  1.045  1.296  1.671  2.000  2.099  2.390  2.660  2.915  3.232  3.460
80     0.678  0.846  1.043  1.292  1.664  1.990  2.088  2.374  2.639  2.887  3.195  3.416
100    0.677  0.845  1.042  1.290  1.660  1.984  2.081  2.364  2.626  2.871  3.174  3.390
1000   0.675  0.842  1.037  1.282  1.646  1.962  2.056  2.330  2.581  2.813  3.098  3.300
z*     0.674  0.842  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.090  3.290

[Figure: Densities of the t(5), t(15), t(25), and z distributions; horizontal axis from -4 to 4, vertical axis density]

One-Sample Confidence Interval for $\mu$

• SRS from a population with mean $\mu$ is obtained.

• Sample mean, sample standard deviation are obtained

• Degrees of freedom are df = n-1, and confidence level ($1-\alpha$) are selected

• Level ($1-\alpha$) confidence interval of form:

$\bar{y} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$, with $t_{\alpha/2}$ selected from the t-table so that $P(-t_{\alpha/2} \le t(n-1) \le t_{\alpha/2}) = 1-\alpha$

Procedure is theoretically derived based on normally distributed data, but has been found to work well regardless for large n

1-Sample t-test (2-tailed alternative)

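• 2-sided Test: H0: $\mu = \mu_0$  Ha: $\mu \ne \mu_0$

• Decision Rule ($t_{\alpha/2}$ such that $P(t(n-1) \ge t_{\alpha/2}) = \alpha/2$):

– Conclude $\mu > \mu_0$ if Test Statistic (tobs) is greater than $t_{\alpha/2}$

– Conclude $\mu < \mu_0$ if Test Statistic (tobs) is less than $-t_{\alpha/2}$

– Do not conclude $\mu \ne \mu_0$ otherwise

• P-value: $2P(t(n-1) \ge |t_{obs}|)$

• Test Statistic: $t_{obs} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$

[Figure: P-value (2-tailed test). t(n-1) density on the axis from -4 to 4, with the P-value shown as the combined area below -|tobs| and above |tobs|]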

1-Sample t-test (1-tailed (upper) alternative)

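• 1-sided Test: H0: $\mu = \mu_0$  Ha: $\mu > \mu_0$

• Decision Rule ($t_{\alpha}$ such that $P(t(n-1) \ge t_{\alpha}) = \alpha$):

– Conclude $\mu > \mu_0$ if Test Statistic (tobs) is greater than $t_{\alpha}$
– Do not conclude $\mu > \mu_0$ otherwise

• P-value: $P(t(n-1) \ge t_{obs})$

• Test Statistic: $t_{obs} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$

[Figure: P-value (Upper Tail Test). t(n-1) density on the axis from -4 to 4, with the P-value shown as the area above tobs]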

1-Sample t-test (1-tailed (lower) alternative)

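• 1-sided Test: H0: $\mu = \mu_0$  Ha: $\mu < \mu_0$

• Decision Rule ($t_{\alpha}$ obtained such that $P(t(n-1) \ge t_{\alpha}) = \alpha$):

– Conclude $\mu < \mu_0$ if Test Statistic (tobs) is less than $-t_{\alpha}$
– Do not conclude $\mu < \mu_0$ otherwise

• P-value: $P(t(n-1) \le t_{obs})$

• Test Statistic: $t_{obs} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$

[Figure: P-value (Lower Tail Test). t(n-1) density on the axis from -4 to 4, with the P-value shown as the area below tobs]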

Example: Mean Flight Time ATL/Honolulu

• Scheduled flight time: 580 minutes

• Sample: n=31 flights 10/2004 (treating as SRS from all possible flights)

• Test whether population mean flight time differs from scheduled time

• H0: $\mu = 580$  Ha: $\mu \ne 580$

• Critical value (2-sided test, $\alpha$ = 0.05, n-1=30 df): $t_{.025} = 2.042$

• Sample data, Test Statistic, P-value:

$\bar{y} = 574.1$, $s = 19.7$, $n = 31$

$t_{obs} = \frac{574.1 - 580}{19.7/\sqrt{31}} = \frac{-5.9}{3.54} = -1.67$

P-value $= 2P(t(30) \ge |-1.67|) > 2P(t(30) \ge 1.697) = 2(.05) = .10$
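The test statistic can be recomputed from the 31 recorded flight times listed in the ATL/HNL example later in this chapter. This sketch uses only the standard library, so the critical value 2.042 is taken from the t-table (30 df, upper tail probability 0.025) rather than computed; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Flight times (minutes) for days 1 to 31, from the ATL/HNL data table.
flight_times = [567, 582, 592, 601, 567, 585, 569, 568, 569, 553, 531,
                538, 545, 542, 558, 558, 579, 584, 583, 582, 577, 582,
                583, 596, 589, 586, 567, 582, 627, 580, 576]

MU_0 = 580       # scheduled flight time under H0
T_CRIT = 2.042   # t_{.025} with 30 df, read from the t-table

n = len(flight_times)
y_bar = mean(flight_times)
s = stdev(flight_times)            # sample standard deviation (n-1 divisor)
t_obs = (y_bar - MU_0) / (s / sqrt(n))

# 2-sided decision: reject H0 only if |t_obs| exceeds the critical value.
reject_h0 = abs(t_obs) > T_CRIT

print(f"n = {n}, mean = {y_bar:.1f}, s = {s:.1f}, t = {t_obs:.2f}, reject = {reject_h0}")
```

Working from unrounded values gives t of about -1.66, versus -1.67 from the rounded summary statistics. Either way |t| is below 2.042, so H0 is not rejected at the 0.05 level.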

Inference on a Population Median

• Median: “Middle” of a distribution (50th Percentile)

– Equal to Mean for symmetric distribution

– Below Mean for Right-skewed distribution

– Above Mean for Left-skewed distribution

• Confidence Interval for Population Median:

– Sort observations from smallest to largest ($y_{(1)} \le \dots \le y_{(n)}$)

– Obtain Lower ($L_{\alpha/2}$) and Upper ($U_{\alpha/2}$) “Bounds of Ranks”

– Small Samples: Obtain $C_{\alpha(2),n}$ from Table 5 (p. 682)

– Large Samples: $C_{\alpha(2),n} \approx \frac{n}{2} - z_{\alpha/2}\sqrt{\frac{n}{4}}$

$L_{\alpha/2} = C_{\alpha(2),n} + 1$,   $U_{\alpha/2} = n - C_{\alpha(2),n}$

$(1-\alpha)100\%$ CI for Median (M): $(M_L, M_U) = \left(y_{(L_{\alpha/2})},\; y_{(U_{\alpha/2})}\right)$

Example - ATL/HNL Flight Times

• n=31, $\alpha = 0.05$

• Small-Sample: $C_{.05(2),31} = 9$

• Large-Sample: $C_{.05(2),31} \approx \frac{31}{2} - 1.96\sqrt{\frac{31}{4}} = 15.5 - 5.5 = 10$

Day   Time   Ordered
1     567    531
2     582    538
3     592    542
4     601    545
5     567    553
6     585    558
7     569    558
8     568    567
9     569    567
10    553    567
11    531    568
12    538    569
13    545    569
14    542    576
15    558    577
16    558    579
17    579    580
18    584    582
19    583    582
20    582    582
21    577    582
22    582    583
23    583    583
24    596    584
25    589    585
26    586    586
27    567    589
28    582    592
29    627    596
30    580    601
31    576    627

Sample Size   $L_{\alpha/2}$   $U_{\alpha/2}$   $M_L$   $M_U$
Small         10      22      567    583
Large         11      21      568    582
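The large-sample interval for the median can be sketched as below. The function name is illustrative, and truncating C to a whole number is an assumption made here (the slide only shows C of approximately 10); the small-sample version would instead need the tabled value from Table 5.

```python
from math import sqrt
from statistics import NormalDist


def median_ci_large_sample(data, conf_level=0.95):
    """Large-sample CI for the median from order statistics."""
    ordered = sorted(data)
    n = len(ordered)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # C ~ n/2 - z * sqrt(n/4); truncation to an integer is an assumption.
    c = int(n / 2 - z * sqrt(n / 4))
    lower_rank = c + 1        # L = C + 1
    upper_rank = n - c        # U = n - C
    # Ranks are 1-based; Python lists are 0-based.
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


flight_times = [567, 582, 592, 601, 567, 585, 569, 568, 569, 553, 531,
                538, 545, 542, 558, 558, 579, 584, 583, 582, 577, 582,
                583, 596, 589, 586, 567, 582, 627, 580, 576]

m_lower, m_upper = median_ci_large_sample(flight_times)
print(m_lower, m_upper)
```

With the flight data this reproduces the "Large" row of the summary: ranks 11 and 21, giving the interval from 568 to 582 minutes.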
