
7.1

Using Statistics To Make Inferences 7

Summary

Single sample test of variance.
Comparison of two variances.

Tuesday 18 April 2023 09:31 PM

7.2

Goals
To perform and interpret χ² and F tests.

These tests are not available individually within SPSS, but are embedded within more complex procedures.

Practical
Revert to the data from practical 5 and employ the Mann-Whitney test where you previously employed a t test.
Perform a t test on the reading ability data.

Chi-squared

7.3

Recall

In lecture 4 we compared the means of two samples using an appropriate t test.

What assumption did we make about the variances of the two samples?
It was assumed that the two sample variances were effectively equal.

7.4

Recall

What notation do we employ for a population mean and for a sample mean?

Population mean μ
Sample mean x̄
Typically assessed with a t test

7.5

Recall

What notation do we employ for a population variance and for a sample variance?

Population variance σ²
Sample variance s²
Typically assessed with a χ² or F test

7.6

Examination of the Variance

Equality of means does not imply equality of variances.

It is often important to control (minimise) the variance.

7.7

Examination of the Variance

How do we compare a sample variance against the expected population value?

7.8

Single Sample Variance Test

The null hypothesis is that the population standard deviation is equal to a particular value σ. Assuming that the data are normally distributed, a sample of size n is obtained from the population and its standard deviation, s, is calculated. The test statistic is

$$\chi^2_{calc} = \frac{(n-1)s^2}{\sigma^2}$$

7.9

Conclusion

This statistic follows a chi-squared distribution with ν = n − 1 degrees of freedom; at significance level α it is compared with the tabulated value $\chi^2_{\nu}(\alpha)$.
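As a minimal sketch (Python, standard library only), the statistic and a two-tailed decision can be written as below. The function names are hypothetical, and the critical values are not computed: they are the tabulated χ² points you would read from tables and pass in by hand. The usage line applies it to the student-marks example that follows.

```python
def chi2_variance_statistic(n, s, sigma0):
    """(n - 1) s^2 / sigma0^2 for testing H0: sigma = sigma0."""
    return (n - 1) * s**2 / sigma0**2


def two_tailed_decision(stat, lower_point, upper_point):
    """True (reject H0) if stat lies outside [lower_point, upper_point]."""
    return stat < lower_point or stat > upper_point


# Student-marks example: n = 20, s = 7.6, sigma0 = 10, so nu = 19.
stat = chi2_variance_statistic(20, 7.6, 10)
# 10% two-tailed test: 95% and 5% points of chi-squared with 19 df, from tables.
reject_10pct = two_tailed_decision(stat, 10.117, 30.144)
print(round(stat, 2), reject_10pct)
```

Keeping the critical values as arguments mirrors the by-hand procedure: the same function serves a 5% test by passing the 97.5% and 2.5% points instead.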

7.10

Example

From past records, students' marks have a standard deviation of 10. A group of 20 students is taught by a new method, and the standard deviation of their marks is 7.6. Is the group significantly more, or less, variable?

H0 is that σ = 10
H1 is that σ ≠ 10

7.11

Conclusion

n = 20, s = 7.6, σ = 10, ν = n − 1 = 19

$$\chi^2_{calc} = \frac{(n-1)s^2}{\sigma^2} = \frac{(20-1)\times 7.6^2}{10^2} = 10.97$$

7.12

Use of Tables

ν    p=0.1   p=0.05  p=0.025  p=0.01  p=0.005  p=0.002
19   27.204  30.144  32.852   36.191  38.582   41.610

ν    p=0.9   p=0.95  p=0.975  p=0.99  p=0.995  p=0.998
19   11.651  10.117  8.907    7.633   6.844    5.970

$$\chi^2_{19}(0.05) = 30.14, \qquad \chi^2_{19}(0.95) = 10.12, \qquad \chi^2_{calc} = 10.97, \qquad \nu = 19$$

Here the 5% and 95% points are 30.14 and 10.12. Since 10.12 < 10.97 < 30.14, the value is not significant at the 10% level (two-tailed test): the null hypothesis is accepted, and there is no evidence that the new method affects the variability of the marks. For a 5% test (the 2.5% and 97.5% points) the corresponding values are 32.85 and 8.91, and the conclusion is unchanged.

7.13

Example

The following data are obtained.

36.2 38.1 35.3 34.8 39.6 39.3 31.4 34.6 40.2 32.2 35.2 37.2

Past experiments suggest that the standard deviation is never more than 2.

H0 is that σ = 2
H1 is that σ > 2 (a one-sided test)


7.15

Calculation

You might find the following sums useful: Σx = 434.1 and Σx² = 15790.71.

$$s^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right] = \frac{1}{11}\left[15790.71 - \frac{434.1^2}{12}\right] = 7.922, \qquad s = 2.815$$
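A quick Python check of the shortcut formula against a direct calculation, using the twelve observations from the example. The standard library's `statistics.stdev` uses the n − 1 divisor, so it matches s here; the variable names are illustrative only.

```python
import math
import statistics

data = [36.2, 38.1, 35.3, 34.8, 39.6, 39.3, 31.4, 34.6, 40.2, 32.2, 35.2, 37.2]
n = len(data)
sum_x = sum(data)                   # should reproduce 434.1
sum_x2 = sum(x * x for x in data)   # should reproduce 15790.71

# Shortcut: s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)
s_shortcut = math.sqrt((sum_x2 - sum_x**2 / n) / (n - 1))

# Direct calculation from the deviations (sample standard deviation)
s_direct = statistics.stdev(data)

print(round(s_shortcut, 3), round(s_direct, 3))
```

By hand the shortcut is quicker, but it subtracts two large, nearly equal numbers, so keep plenty of digits in Σx² and (Σx)²/n before rounding.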

7.16

Calculation

n = 12, s = 2.815 (direct calculation), σ = 2, ν = n − 1 = 11

$$\chi^2_{calc} = \frac{(n-1)s^2}{\sigma^2} = \frac{(12-1)\times 2.815^2}{2^2} = 21.79$$

ν    p=0.1   p=0.05  p=0.025  p=0.01  p=0.005  p=0.002
11   17.275  19.675  21.920   24.725  26.757   29.354

$$\chi^2_{11}(0.05) = 19.68$$

Since 21.79 > 19.68, the result is significant at the 5% level (one-tailed): you can be 95% confident in rejecting the null hypothesis, and the variability is significantly higher.
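A sketch of the one-sided version of the test in Python, assuming the upper-tail points for ν = 11 are read from the χ² table (only the p = 0.05 and p = 0.025 columns are used). Because H1 is σ > 2, only the upper tail is checked.

```python
import statistics

data = [36.2, 38.1, 35.3, 34.8, 39.6, 39.3, 31.4, 34.6, 40.2, 32.2, 35.2, 37.2]
sigma0 = 2.0
n = len(data)
s = statistics.stdev(data)
chi2_calc = (n - 1) * s**2 / sigma0**2

# One-sided test (H1: sigma > 2): compare with upper-tail points for nu = 11.
upper_points = {0.05: 19.675, 0.025: 21.920}
for alpha, crit in upper_points.items():
    verdict = "reject H0" if chi2_calc > crit else "do not reject H0"
    print(f"alpha={alpha}: chi2={chi2_calc:.2f}, critical={crit} -> {verdict}")
```

Running both levels shows the result is significant at 5% but not at 2.5%, since 21.79 lies between 19.675 and 21.920.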

7.17

Confidence Interval

A confidence interval for the variance, with confidence level 1 − α, is

$$\frac{(n-1)s^2}{\chi^2_{n-1}(\alpha/2)} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1}(1-\alpha/2)}$$

7.18

Confidence Interval

For example, if n = 31 and s = 27.63, the degrees of freedom are n − 1 = 30, so if the confidence level is 95% (α = 0.05), we look up

$$\chi^2_{n-1}(\alpha/2) = \chi^2_{30}(0.025) = 46.979$$

ν    P=0.1   P=0.05  P=0.025  P=0.01  P=0.005  P=0.002
30   40.256  43.773  46.979   50.892  53.672   57.167

7.19

Confidence Interval

For the lower point we also look up

$$\chi^2_{n-1}(1-\alpha/2) = \chi^2_{30}(0.975) = 16.791$$

ν    P=0.9   P=0.95  P=0.975  P=0.99  P=0.995  P=0.998
30   20.599  18.493  16.791   14.953  13.787   12.461

7.20

Confidence Interval

Recall n = 31 and s = 27.63, with χ²₃₀(0.025) = 46.979 and χ²₃₀(0.975) = 16.791.

$$\frac{(31-1)\times 27.63^2}{46.979} = 487.51 \le \sigma^2 \le \frac{(31-1)\times 27.63^2}{16.791} = 1363.98$$

7.21

Confidence Interval

So we can be 95% confident that σ lies in the interval [22.08, 36.93] (on taking square roots).
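The interval above can be reproduced with a few lines of Python. This is a sketch: the function name is hypothetical, and the two χ² points are passed in from the tables rather than computed.

```python
import math


def variance_ci(n, s, chi2_upper_half_alpha, chi2_lower_half_alpha):
    """Confidence interval for sigma^2 from two tabulated chi-squared points.

    chi2_upper_half_alpha: point with upper-tail area alpha/2 (46.979 for nu = 30, alpha = 0.05)
    chi2_lower_half_alpha: point with upper-tail area 1 - alpha/2 (16.791 for nu = 30)
    """
    ss = (n - 1) * s**2
    # Dividing by the larger chi-squared point gives the lower bound.
    return ss / chi2_upper_half_alpha, ss / chi2_lower_half_alpha


lo, hi = variance_ci(31, 27.63, 46.979, 16.791)
print(f"sigma^2 in [{lo:.2f}, {hi:.2f}]")
print(f"sigma   in [{math.sqrt(lo):.2f}, {math.sqrt(hi):.2f}]")
```

Note that the interval is not symmetric about s² = 763.42, because the χ² distribution is skewed.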

7.22

Aside

Consider boys and girls and the desire to predict gender based on some simple test. Assume that 50% of births are boys, so that Prob(Boy) = Prob(Girl) = ½.

A simple, inexpensive, non-invasive gender testing procedure is "perfect" for boys: Prob(Test Boy|Boy) = 1, implying Prob(Test Girl|Boy) = 0. Unfortunately, for girls this simple procedure is a "coin toss": Prob(Test Girl|Girl) = Prob(Test Boy|Girl) = ½.

By evaluating Prob(Boy|Test Boy) and Prob(Girl|Test Girl), assess which is the more likely.

7.24

Aside

What approach is appropriate (simplest)?

7.25

Solution

What is the grand total of the probabilities?

7.26

Prob(girl|test says girl)

Test says girl

7.27

Prob(boy|test says boy)

Test says boy

7.28

Solution

The tree diagram, or an application of Bayes' theorem, yields what seems to be a strange inversion: Prob(Boy|Test Boy) = ⅔ and Prob(Girl|Test Girl) = 1.

That is, somehow, "perfection" switched from Boy to Girl. The test itself was perfect in "confirming" that a Boy was a Boy, yet had a 50% error rate in confirming that a Girl was a Girl.
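The Bayes' theorem route can be checked exactly with Python's `fractions.Fraction`. This is an illustrative sketch; the dictionary layout and the `posterior` helper are hypothetical, and the probabilities are those stated in the aside.

```python
from fractions import Fraction

prior = {"Boy": Fraction(1, 2), "Girl": Fraction(1, 2)}
# likelihood[truth][test result] = P(test result | truth)
likelihood = {
    "Boy": {"Boy": Fraction(1), "Girl": Fraction(0)},
    "Girl": {"Boy": Fraction(1, 2), "Girl": Fraction(1, 2)},
}


def posterior(truth, result):
    """Bayes' theorem: P(truth | result) = P(result | truth) P(truth) / P(result)."""
    evidence = sum(likelihood[t][result] * prior[t] for t in prior)
    return likelihood[truth][result] * prior[truth] / evidence


print(posterior("Boy", "Boy"), posterior("Girl", "Girl"))
```

The denominator P(result) is where the inversion comes from: a "test says boy" result is shared by all boys and half of the girls.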

7.29

Alternate Approach

What if we tested 100 boys and 100 girls?

Complete the following table.

7.30

Alternate Approach

        Test says boy   Test says girl   Total
Boy                                      100
Girl                                     100
Total                                    200

Complete the table, given Prob(Test Boy|Boy) = 1 and Prob(Test Girl|Girl) = Prob(Test Boy|Girl) = ½.

7.31

Alternate Approach

        Test says boy   Test says girl   Total
Boy     100             0                100
Girl    50              50               100
Total   150             50               200

7.32

Prob(girl|test says girl)

Prob(Girl|Test says girl) = 50/50 = 1

7.33

Prob(boy|test says boy)

Prob(Boy|Test says boy) = 100/150 = ⅔
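The frequency-table approach can also be sketched in Python: build the expected counts for 100 boys and 100 girls from the stated test behaviour, then condition on a column of the table. Names such as `group_size`, `p_says_boy` and `column_total` are hypothetical.

```python
from fractions import Fraction

# 100 boys and 100 girls, as in the table.
group_size = {"Boy": 100, "Girl": 100}
# P(test says boy | truth): 1 for boys, 1/2 for girls (the "coin toss").
p_says_boy = {"Boy": Fraction(1), "Girl": Fraction(1, 2)}

# Expected counts in each cell of the 2 x 2 table.
counts = {}
for truth, size in group_size.items():
    counts[(truth, "boy")] = size * p_says_boy[truth]
    counts[(truth, "girl")] = size * (1 - p_says_boy[truth])


def column_total(result):
    """Number of children for whom the test gives `result`."""
    return sum(c for (truth, r), c in counts.items() if r == result)


# Conditioning on the test result: divide a cell by its column total.
p_boy_given_test_boy = counts[("Boy", "boy")] / column_total("boy")
p_girl_given_test_girl = counts[("Girl", "girl")] / column_total("girl")
print(p_boy_given_test_boy, p_girl_given_test_girl)
```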

7.34

Conclusion

The previous result follows.

Of course!

More of this next week.

7.35

Comparison of Two Sample Variances

We know that a t test may be used to compare two sample means.

We now compare two sample variances, assuming that the data are normally distributed.

7.36

Two Sample Variance Test

Note that the tables only give upper-tail significance levels, so the larger sample variance must be placed in the numerator.

$$F_{calc} = \frac{s_1^2}{s_2^2}, \qquad s_1^2 > s_2^2$$

From tables, $F_{\nu_1,\nu_2}(\alpha)$ with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

Significant if $F_{calc} > F_{\nu_1,\nu_2}(\alpha)$.
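A small Python sketch of the "larger variance on top" rule, which also returns the matching degrees of freedom for the table lookup. The function name is hypothetical; the usage line applies it to the 8.4 / 5.2 example that appears later in this lecture.

```python
def f_test_statistic(s_a, n_a, s_b, n_b):
    """Return (F, nu1, nu2) with the larger sample variance in the numerator."""
    var_a, var_b = s_a**2, s_b**2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    # Swap so that F >= 1 and the upper-tail table applies.
    return var_b / var_a, n_b - 1, n_a - 1


f_calc, nu1, nu2 = f_test_statistic(8.4, 16, 5.2, 20)
print(round(f_calc, 2), nu1, nu2)
```

Swapping the samples gives the same F and degrees of freedom, so the order in which the data are entered does not matter.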

7.37

Two Sample Variance Test

The tables only give upper-tail significance levels. What if the lower tail is required?

$$F_{\nu_1,\nu_2}(1-\alpha) = \frac{1}{F_{\nu_2,\nu_1}(\alpha)}$$

So, swap the degrees of freedom and reciprocate.

7.38

Two Sample Variance Test

To illustrate, swapping the degrees of freedom and reciprocating:

ν1   ν2   α       F_crit
5    4    0.025   9.36
4    5    0.025   7.39
5    4    0.975   0.14 (reciprocal of 7.39)

$$F_{5,4}(0.975) = \frac{1}{F_{4,5}(0.025)}$$

Calculator
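The reciprocal rule can be sketched in Python to build a two-tailed non-rejection region from upper-tail table values alone. The helper names are hypothetical; 9.36 and 7.39 are the tabulated values in the illustration.

```python
def lower_tail_point(upper_point_swapped_df):
    """F_{nu1,nu2}(1 - alpha) from the tabulated F_{nu2,nu1}(alpha)."""
    return 1.0 / upper_point_swapped_df


def two_tailed_region(upper_point, upper_point_swapped_df):
    """Non-rejection region [F(1 - alpha), F(alpha)] for a test at level 2 * alpha."""
    return lower_tail_point(upper_point_swapped_df), upper_point


# Tabulated values: F_{5,4}(0.025) = 9.36 and F_{4,5}(0.025) = 7.39.
lower = lower_tail_point(7.39)           # F_{5,4}(0.975)
region = two_tailed_region(9.36, 7.39)   # 5% two-tailed test with nu1 = 5, nu2 = 4
print(round(lower, 2), region)
```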

7.39

Two Sample Variance Test

$$F_{calc} = \frac{s_1^2}{s_2^2}, \qquad s_1^2 > s_2^2$$

7.40

Example

Two samples are taken to check for equality of their variances.

Sample 1: 16 observations with standard deviation 8.4
Sample 2: 20 observations with standard deviation 5.2

7.41

Hypothesis

H0 is that σ1² = σ2²

Note that s1² > s2², so

$$F_{calc} = \frac{s_1^2}{s_2^2} = \frac{8.4^2}{5.2^2} = 2.61$$

So n1 = 16, n2 = 20
and ν1 = n1 − 1 = 15, ν2 = n2 − 1 = 19

7.42

Tables

Upper 5% points, $F_{\nu_1,\nu_2}(0.05)$:

ν2 \ ν1  11    12    13    14    15    16    17    18    19    20    40    60    100
19       2.34  2.31  2.28  2.26  2.23  2.22  2.22  2.18  2.17  2.16  2.03  1.98  1.94

$$F_{15,19}(0.05) = 2.23$$

Upper 2.5% points, $F_{\nu_1,\nu_2}(0.025)$:

ν2 \ ν1  11    12    13    14    15    16    17    18    19    20    40    60    100
19       2.77  2.72  2.68  2.65  2.62  2.59  2.59  2.55  2.53  2.51  2.33  2.27  2.22

$$F_{15,19}(0.025) = 2.62$$

7.43

Conclusion

$F_{15,19}(0.05) = 2.23$, $F_{15,19}(0.025) = 2.62$, $F_{calc} = 2.61$

At 90% the upper cut-off is 2.23 (2.23 < 2.61).

The result is significant at the 10% level (two-tailed), so you can be 90% confident in rejecting H0: the variances are probably inconsistent. However, since 2.61 < 2.62, the result is not significant at the 5% level, so further work is probably required.

7.44

Example

In a clinical test the following scores were obtained for “normal” and “diseased” patients.

Normal     10.3  11.8  12.6   8.6   9.2  10.1  10.2   7.4
Diseased   10.1  12.7  14.3  13.6   9.8  15.0  11.2  11.4

Is there a significant difference between the mean test scores for the two groups?

A t test was performed previously (see lecture 4, example 1). This assumed “equality” of the variances!

7.45

Previously

H0 is that μ1 = μ2
H1 is that μ1 ≠ μ2 under a two-tailed test

n1 = 8, x̄1 = 10.025, s1 = 1.669
n2 = 8, x̄2 = 12.262, s2 = 1.936

Because s1 and s2 are similar we assumed that σ1 = σ2 (chapter 4). Was this justified?

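7.46

Conclusion

H0 is that σ1² = σ2²

Here s2 > s1, so the larger variance goes in the numerator:

$$F_{calc} = \frac{s_2^2}{s_1^2} = \frac{1.936^2}{1.669^2} = 1.345$$

Upper 5% points, $F_{\nu_1,\nu_2}(0.05)$:

ν2 \ ν1  1     2     3     4     5     6     7     8     9     10
7        5.59  4.74  4.35  4.12  3.97  3.87  3.79  3.73  3.68  3.64

$$F_{7,7}(0.05) = 3.79$$

Since 1.345 < 3.79, there would appear to be no significant difference and the original assumption was justified.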
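The same check can be run directly on the clinical scores in Python. This sketch uses `statistics.variance` (sample variance, n − 1 divisor) and the tabulated F₇,₇(0.05) = 3.79. It works from unrounded standard deviations, so its ratio may differ from the slide's 1.345 in the third decimal place.

```python
import statistics

normal = [10.3, 11.8, 12.6, 8.6, 9.2, 10.1, 10.2, 7.4]
diseased = [10.1, 12.7, 14.3, 13.6, 9.8, 15.0, 11.2, 11.4]

var_n = statistics.variance(normal)     # sample variance, n - 1 divisor
var_d = statistics.variance(diseased)

# Larger variance on top so the upper-tail table applies (nu1 = nu2 = 7 here).
f_calc = max(var_n, var_d) / min(var_n, var_d)
f_crit = 3.79   # F_{7,7}(0.05), read from the table
print(f"F = {f_calc:.3f}, significant: {f_calc > f_crit}")
```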

7.47

Confidence Interval

The confidence interval for the ratio of the two variances is

$$\frac{s_1^2}{s_2^2}\,\frac{1}{F_{n_1-1,\,n_2-1}(\alpha/2)} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\,F_{n_2-1,\,n_1-1}(\alpha/2)$$

Note the change in the degrees of freedom for the two choices of F. In fact one gives the upper tail and the other the lower.

It is not necessary that $s_1^2 > s_2^2$.

7.48

Confidence Interval

It is not necessary that $s_1^2 > s_2^2$. The two bounds are always and .

7.49

Confidence Interval (α = 0.05)

For example, if n1 = 6, s1² = 0.056 and n2 = 5, s2² = 1.797:

$$F_{n_1-1,\,n_2-1}(\alpha/2) = F_{5,4}(0.025) = 9.36$$

First value from table

7.50

Confidence Interval (α = 0.05)

$$F_{n_2-1,\,n_1-1}(\alpha/2) = F_{4,5}(0.025) = 7.39$$

Second value from table

7.51

Confidence Interval (α = 0.05)

With n1 = 6, s1² = 0.056, n2 = 5, s2² = 1.797, $F_{5,4}(0.025) = 9.36$ and $F_{4,5}(0.025) = 7.39$:

$$\frac{0.056}{1.797}\times\frac{1}{9.36} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{0.056}{1.797}\times 7.39$$

$$0.0033 \le \frac{\sigma_1^2}{\sigma_2^2} \le 0.2303$$

Switching the roles of the groups gives bounds 4.342 and 300.356, the reciprocals of the values reported above.
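A Python sketch of the variance-ratio interval, assuming both upper α/2 points of F are read from tables as in the example. The function name is hypothetical. The second call swaps the groups, which also swaps which F value is used for which bound.

```python
def variance_ratio_ci(var1, var2, f_n1_n2, f_n2_n1):
    """Confidence interval for sigma1^2 / sigma2^2.

    f_n1_n2: upper alpha/2 point of F with (n1 - 1, n2 - 1) degrees of freedom
    f_n2_n1: upper alpha/2 point of F with (n2 - 1, n1 - 1) degrees of freedom
    """
    ratio = var1 / var2
    return ratio / f_n1_n2, ratio * f_n2_n1


# n1 = 6, s1^2 = 0.056; n2 = 5, s2^2 = 1.797; F_{5,4}(0.025) = 9.36, F_{4,5}(0.025) = 7.39
lo, hi = variance_ratio_ci(0.056, 1.797, 9.36, 7.39)
print(f"[{lo:.4f}, {hi:.4f}]")

# Groups swapped: (n1, n2) = (5, 6), so the F values change places.
lo_sw, hi_sw = variance_ratio_ci(1.797, 0.056, 7.39, 9.36)
print(f"[{lo_sw:.3f}, {hi_sw:.3f}]")
```

The swapped interval is exactly the reciprocal of the original one, which is why it does not matter which sample is labelled 1.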

7.52

What if I have lost my statistical tables?

Most tabulated statistical values may be obtained from Excel.

Excel Statistical Calculator
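In Excel itself, functions such as CHISQ.INV.RT and F.INV.RT return upper-tail points. As an alternative without Excel, here is a standard-library Python sketch that reproduces the χ² tables: it evaluates the chi-squared CDF through the series for the regularised lower incomplete gamma function and then finds the upper-tail point by bisection. The function names are hypothetical, and the code is meant for table-sized ν and p, not tuned for extreme tails.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-squared(df), i.e. the regularised gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    # Series: P(a, y) = y^a e^-y / Gamma(a + 1) * sum_k y^k / ((a + 1)...(a + k))
    while term > 1e-15 * total:
        k += 1
        term *= y / (a + k)
        total += term
    return min(1.0, math.exp(a * math.log(y) - y - math.lgamma(a + 1.0)) * total)


def chi2_upper_point(p, df):
    """Value with upper-tail area p (the quantity tabulated in the lecture), by bisection."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > p:
            lo = mid   # too much area above mid: the point lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (0.05, 0.95):
    print(p, round(chi2_upper_point(p, 19), 3))
```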

7.53

Next Week

Bring your calculators next week

7.54

Read

Read Howitt and Cramer pages 181-186

Read Davis and Smith pages 434-448

7.55

Solution To The First Assignment

The individual solutions to the first assignment should now be available on the module web page.

Please access the “SPSS Verification”, which employs the syntax window. You will find this particularly useful at Stage III.

7.56

Practical 7

This material is available from the module web page.

http://www.staff.ncl.ac.uk/mike.cox

Module Web Page

7.57

Practical 7

This material for the practical is available.

Instructions for the practical

Practical 7

Material for the practical

Practical 7

7.58

Whoops!

Last week, a formatting error led to us inadvertently suggesting that there was a one in 1,019 chance of the world ending before this edition. That should have read, er, one in 10¹⁹, rather less likely. Sorry. Feel free to remove the crash helmet.

Independent

13/09/08

7.59

Whoops!

Yeah... that's not the quadratic formula.