Hypothesis Testing and Sampling Distributions 2011


Test Statistics

A statistic for which the frequency of particular values is known.

Observed values can be used to test hypotheses.



Single case: used z score and normal distribution.

But usually have scores from a sample.

Example:

n = 25

M = 106



To determine the probability of obtaining a given sample mean, the mean has to be compared to the appropriate distribution, known as the sampling distribution.

Distribution of sample means

The distribution of sample means is defined as the set of means from all the possible random samples of a specific size (n) selected from a specific population.



Characteristics of the Sampling Distribution of the mean

The sampling distribution of means will have the same mean as the population:

$\mu_M = \mu$

The sampling distribution of means has a smaller variance. This is because the means of samples are less likely to be extreme compared to individual scores.

$\sigma_M$ = standard deviation of the means = standard error

The shape of the sampling distribution approximates a normal curve if either

the population of individual cases is normally distributed, or

the sample size being considered is 30 or more.

$\sigma_M^2 = \frac{\sigma^2}{n}$

$\sigma_M = \frac{\sigma}{\sqrt{n}}$

Demonstration: http://onlinestatbook.com/stat_sim/sampling_dist/index.html
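The three characteristics above can also be checked with a small simulation. The following is a minimal sketch, not part of the original slides: it assumes a normal population with mean 100 and standard deviation 15, and the sample size and number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

MU, SIGMA = 100, 15      # assumed population parameters (IQ-like scale)
N = 25                   # size of each sample (arbitrary choice)
N_SAMPLES = 20_000       # number of samples drawn (arbitrary choice)

# Draw many samples of size N and keep the mean of each one.
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = SIGMA / N ** 0.5

print(f"mean of sample means: {mean_of_means:.2f}")   # should be close to MU
print(f"SD of sample means:   {sd_of_means:.2f}")     # should be close to sigma / sqrt(n)
print(f"sigma / sqrt(n):      {theoretical_se:.2f}")
```

The simulated standard deviation of the means should land near $\sigma / \sqrt{n}$, well below the population standard deviation of 15.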



IQ Scores - Ranked

71, 76, 76, 77, 79, 80, 81, 82, 83, 83, 84, 84, 84, 85, 85, 86,
87, 88, 88, 88, 89, 90, 90, 91, 91, 91, 92, 92, 92, 93, 93, 93,
93, 93, 93, 94, 94, 94, 94, 95, 95, 95, 96, 96, 97, 97, 97, 97,
97, 97, 97, 98, 99, 99, 99, 99, 100, 100, 100, 100, 101, 101, 101,
102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 104, 104, 104,
105, 105, 106, 106, 107, 107, 107, 107, 107, 108, 108, 108, 108,
109, 109, 110, 111, 111, 112, 112, 112, 113, 113, 113, 113, 114, 114, 115,
115, 115, 117, 118, 120, 121, 121, 121, 123, 123, 125, 125, 126, 131, 136

Sample size   Lowest extreme scores   M      Highest extreme scores   M
N = 1         71                      71     136                      136
N = 2         71 + 76                 73.5   131 + 136                133.5
N = 3         71 + 76 + 76            74.3   126 + 131 + 136          131.0
N = 4         71 + 76 + 76 + 77       75.0   125 + 126 + 131 + 136    129.5
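The pattern in the table, where the most extreme possible mean moves toward the centre as the sample size grows, can be reproduced directly. This is a small sketch using only the four lowest and four highest scores from the ranked list; the helper name `extreme_means` is made up for illustration.

```python
from statistics import fmean

# The four lowest and four highest scores from the ranked IQ list.
lowest = [71, 76, 76, 77]
highest = [136, 131, 126, 125]  # ordered from the most extreme inward

def extreme_means(scores):
    """Mean of the k most extreme scores, for k = 1 .. len(scores)."""
    return [round(fmean(scores[:k]), 1) for k in range(1, len(scores) + 1)]

low_means = extreme_means(lowest)
high_means = extreme_means(highest)

print(low_means)   # [71.0, 73.5, 74.3, 75.0]
print(high_means)  # [136.0, 133.5, 131.0, 129.5]
```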



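Using a sampling distribution

Sample: n = 25, M = 106

IQ test: $\mu = 100$, $\sigma = 15$

Appropriate sampling distribution:

$\mu_M = \mu = 100$

$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = \frac{15}{5} = 3$

The z score for a single case, $z = \frac{X - \mu}{\sigma}$, becomes for a sample mean:

$z = \frac{M - \mu_M}{\sigma_M} = \frac{106 - 100}{3} = \frac{6}{3} = 2.00$

Area = .0228 one-tailed, .0456 two-tailed

Reject the null hypothesis.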
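The calculation above can be sketched in a few lines of Python. The function name `z_test` is hypothetical; the tail area is obtained from the standard library's complementary error function rather than from a printed z table.

```python
import math

def z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, one_tailed_p, two_tailed_p) for a sample mean."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Area of the standard normal distribution beyond |z| in one tail.
    one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, one_tailed, 2 * one_tailed

z, p_one, p_two = z_test(sample_mean=106, pop_mean=100, pop_sd=15, n=25)
print(f"z = {z:.2f}, one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

Computed without intermediate rounding, the two-tailed area comes out at about .0455; the .0456 quoted above results from doubling the rounded table value .0228.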



Z Test

$z = \frac{M - \mu}{\sigma_M}$



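t Distribution

What happens if we do not have the population standard deviation?

We can use the sample standard deviation as an estimate.

Problem: we cannot use z and the normal distribution to estimate the probability.

The sample variance tends to underestimate the population variance: it falls below the population variance more often than above it, especially in small samples.

We have to use a slightly different distribution: Student's t distribution.

The z formula, which requires the population standard deviation:

$z = \frac{M - \mu}{\sigma_M}$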
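One way to see why the sample variance tends to underestimate the population variance is to simulate it. The sketch below is an illustration under assumed values (a normal population with $\sigma = 15$ and samples of n = 5); it counts how often the sample variance falls below the population variance.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

SIGMA = 15          # assumed population standard deviation
N = 5               # a deliberately small sample size
TRIALS = 20_000     # number of simulated samples (arbitrary choice)

below = 0
for _ in range(TRIALS):
    sample = [random.gauss(100, SIGMA) for _ in range(N)]
    # statistics.variance uses the n - 1 denominator.
    if statistics.variance(sample) < SIGMA ** 2:
        below += 1

proportion_below = below / TRIALS
print(f"s^2 < sigma^2 in {proportion_below:.1%} of samples")
```

Under these assumptions the proportion should come out clearly above one half, because the sampling distribution of the variance is positively skewed.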



t Distribution cont.

Not a single distribution but a family of distributions.

One for each degree of freedom (df).

df = n - 1



One-sample t test

Psychomotor abilities of low-birthweight infants (PDI scores) (Nurcombe et al., 1984)

Sample: n = 56, M = 104.3, s = 12.58

Norms: $\mu = 100$



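Sample: n = 56, M = 104.3, s = 12.58

Norms: $\mu = 100$

Hypotheses:

$H_1: \mu_1 \neq \mu_2$

$H_0: \mu_1 = \mu_2 = 100$

Replacing $\sigma_M$ in the z formula with its sample estimate $s_M$ gives t:

$z = \frac{M - \mu}{\sigma_M}$

$t = \frac{M - \mu}{s_M}$

$s_M = \frac{s}{\sqrt{n}}$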



$t = \frac{M - \mu}{s_M} = \frac{M - \mu}{s / \sqrt{n}} = \frac{104.3 - 100}{12.58 / \sqrt{56}} = \frac{4.3}{1.681} = 2.56$

df = 56 - 1 = 55
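The one-sample t statistic can be computed from the summary statistics alone. This is a minimal sketch; the function name `one_sample_t` is hypothetical, and the inputs are the sample values as stated for this example (M = 104.3, s = 12.58, n = 56, norm mean 100).

```python
import math

def one_sample_t(sample_mean, sample_sd, n, pop_mean):
    """Return (t, df) for a one-sample t test from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)   # s_M = s / sqrt(n)
    t = (sample_mean - pop_mean) / standard_error
    return t, n - 1

t, df = one_sample_t(sample_mean=104.3, sample_sd=12.58, n=56, pop_mean=100)
print(f"t({df}) = {t:.2f}")
```

The obtained t is then compared with the critical value of the t distribution for the same df.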



Potential Problem

What happens if we do not have a population mean?

Two ways of dealing with it:

Use a repeated measures design

Use two independent samples

$t = \frac{M - \mu}{s_M}$



Repeated-Measures t test

(In SPSS called the Paired-Samples t-test)

Use one sample, but test at two different times.

Occasionally matched samples are used.

If the treatment has no effect, what will the outcome be? There will be little difference between the scores on the first and second testing.

This allows us to hypothesise what the population mean will be.

The mean of the differences will be 0 (zero). The observed mean difference will not be exactly 0, due to sampling error.

So we have to test whether the mean difference observed is significantly different from 0.

This needs a sampling distribution.



Example:

     MDI 6 months (X1)   MDI 24 months (X2)   Difference (D)
     124                 114                  -10
     94                  88                   -6
     115                 102                  -13
     110                 127                  17
     .                   .                    .
     .                   .                    .
     126                 114                  -12
     123                 132                  9
M    111.0               106.71               -4.29
s    13.85               12.95                16.04
n    31                  31                   31



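$H_0: \mu_D = \mu_1 - \mu_2 = 0$

$\bar{D} = -4.29$, $s_D = 16.04$, n = 31

The general form $t = \frac{M - \mu}{s_M}$ becomes

$t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}}$

$t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} = \frac{\bar{D} - 0}{s_D / \sqrt{n}} = \frac{-4.29 - 0}{16.04 / \sqrt{31}} = \frac{-4.29}{2.88} = -1.49$

df = n - 1 (n is the number of pairs of observations)

df = 31 - 1 = 30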
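The repeated-measures calculation follows the same pattern as the one-sample test, applied to the difference scores. A short sketch, with a hypothetical function name `paired_t` and the summary values from the MDI example:

```python
import math

def paired_t(mean_diff, sd_diff, n_pairs, hypothesized_diff=0.0):
    """Return (t, standard_error, df) for a repeated-measures t test."""
    standard_error = sd_diff / math.sqrt(n_pairs)   # s_D / sqrt(n)
    t = (mean_diff - hypothesized_diff) / standard_error
    return t, standard_error, n_pairs - 1

t, se, df = paired_t(mean_diff=-4.29, sd_diff=16.04, n_pairs=31)
print(f"standard error = {se:.2f}, t({df}) = {t:.2f}")
# standard error = 2.88, t(30) = -1.49
```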



Independent samples t test

When we do not have population parameters, we can use an independent samples (between subjects) design to get sample data that allows us to evaluate the difference between two populations using the independent samples t test.

As with all hypothesis tests, the general purpose of the independent-measures t test is to determine whether the sample mean difference obtained indicates a real mean difference between the two populations (or treatments) or whether the obtained difference is simply the result of sampling error.



Distribution of Differences between Means



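Variance of the two Distributions of the Means

$\frac{\sigma_1^2}{N_1}$ and $\frac{\sigma_2^2}{N_2}$

Standard Error of the Distributions of the Means

$\frac{\sigma_1}{\sqrt{N_1}}$ and $\frac{\sigma_2}{\sqrt{N_2}}$

Mean of the Distribution of Mean Differences

$\mu_1 - \mu_2 = 0$

Variance of the Distribution of Mean Differences

$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}$

Standard Error of the Distribution of Mean Differences

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}$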



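The general form $t = \frac{M - \mu}{s_M}$ becomes

$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{M_1 - M_2}}$

$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{M_1 - M_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Degrees of Freedom: df = (n1 - 1) + (n2 - 1) = n1 + n2 - 2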
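As an illustration of the independent samples formula, the sketch below computes t from raw scores with $\mu_1 - \mu_2 = 0$ under the null hypothesis. The function name and both data sets are made up for this example; they are not from the slides.

```python
import math
from statistics import fmean, variance

def independent_t(sample1, sample2):
    """Return (t, df), with the null hypothesis that mu1 - mu2 = 0."""
    n1, n2 = len(sample1), len(sample2)
    # Standard error of the difference: sqrt(s1^2 / n1 + s2^2 / n2).
    se_diff = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    t = (fmean(sample1) - fmean(sample2)) / se_diff
    return t, n1 + n2 - 2

group_a = [12, 15, 11, 14, 13]   # hypothetical scores, M = 13, s^2 = 2.5
group_b = [9, 10, 8, 12, 11]     # hypothetical scores, M = 10, s^2 = 2.5

t, df = independent_t(group_a, group_b)
print(f"t({df}) = {t:.2f}")      # t(8) = 3.00
```

With equal group sizes, as here, this standard error is identical to the one based on the pooled variance.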



Summary



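Tests

Z score: $z = \frac{X - \mu}{\sigma}$

Z test: $z = \frac{M - \mu}{\sigma_M}$

t test - single sample: $t = \frac{M - \mu}{s_M}$

t test - repeated measures: $t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} = \frac{\bar{D} - 0}{s_{\bar{D}}}$

t test - two independent samples: $t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{M_1 - M_2}} = \frac{(M_1 - M_2) - 0}{s_{M_1 - M_2}}$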



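Standard Errors

Z test: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$

t test - single sample: $s_M = \frac{s}{\sqrt{n}}$

t test - two matched samples: $s_{\bar{D}} = \frac{s_D}{\sqrt{n}}$

t test - two independent samples: $s_{M_1 - M_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$

$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$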
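The pooled variance weights each sample variance by its degrees of freedom, which matters when the two samples differ in size. A brief sketch with hypothetical function names and made-up summary values:

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of two sample variances, weighted by df."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_standard_error(s1_sq, n1, s2_sq, n2):
    """Standard error of M1 - M2 based on the pooled variance."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    return math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Hypothetical summary values: s1^2 = 4.0 with n1 = 10, s2^2 = 9.0 with n2 = 20.
sp_sq = pooled_variance(4.0, 10, 9.0, 20)
se = pooled_standard_error(4.0, 10, 9.0, 20)
print(f"pooled variance = {sp_sq:.2f}, standard error = {se:.2f}")
# pooled variance = 7.39, standard error = 1.05
```

The pooled variance (7.39) sits closer to 9.0 than to 4.0 because the second sample is larger.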



Assumptions for the t test

Repeated measures:

Random sampling

The data are measured on at least an interval scale

The data are matched

The distribution of the population scores is normal

Independent samples:

Random sampling

The data are measured on at least an interval scale

The participants in the two samples are independent

The distributions of the population scores are normal

The variances of the two populations are the same (homogeneity of variance)

Robust: the t test generally tolerates moderate violations of the normality and homogeneity-of-variance assumptions.



Analysis of Variance

What happens when we have more than two groups?



First Method

$\frac{\sum s_j^2}{k} = \frac{s_1^2 + s_2^2 + s_3^2 + s_4^2 + s_5^2}{5}$



Second Method

$\frac{\sum (\bar{X}_j - \bar{X}_G)^2}{k - 1}$



$F = \frac{\text{variance between treatments}}{\text{variance within treatments}} = \frac{MS_{between}}{MS_{within}}$

$F = \frac{\text{systematic variance (treatment effect)} + \text{unsystematic variance}}{\text{unsystematic variance}} = \frac{MS_{between}}{MS_{within}}$

$F = \frac{\text{obtained mean differences (including treatment effects)}}{\text{differences expected by chance (without treatment effects)}} = \frac{MS_{between}}{MS_{within}}$
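The F ratio can be sketched for the simple case of equal group sizes. Here $MS_{within}$ is taken as the average of the group variances and $MS_{between}$ as n times the variance of the group means. The function name and the three small groups are hypothetical and chosen so the arithmetic is easy to follow.

```python
from statistics import fmean, variance

def one_way_f(groups):
    """Return (F, df_between, df_within) for equal-sized groups."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    k = len(groups)
    # Within-treatments estimate: the average of the group variances.
    ms_within = fmean(variance(g) for g in groups)
    # Between-treatments estimate: n times the variance of the group means.
    ms_between = n * variance([fmean(g) for g in groups])
    return ms_between / ms_within, k - 1, k * (n - 1)

# Hypothetical data: group means 2, 3 and 7, each group variance 1.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_ratio, df_between, df_within = one_way_f(groups)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")   # F(2, 6) = 21.00
```

If the treatments had no effect, the two estimates would be similar and F would be close to 1.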
