COMMON STATISTICAL TEST PROBLEMS:

Tests dealing with the mean of data samples
Tests dealing with the variance of the samples
Tests dealing with correlation coefficients
Tests dealing with regression parameters

Testing a sample mean: Is it equal to, larger than, or smaller than a prescribed value?

Comparing two sample sets: Are the mean values different?

Comparing paired samples: Are the differences equal to, larger than, or smaller than a certain value?

Testing the correlation coefficient obtained from two paired samples: Is the correlation equal to 0, larger than 0, or smaller than 0?

Testing a single sample variance: Is the variance equal to, greater than, or smaller than a prescribed value?

Testing the ratio between the estimated variances from two sample sets: Are the variances equal? Is the ratio between the variances equal to 1, greater than 1, or smaller than 1?

Testing a simple linear regression model:
(a) Is the regression coefficient different from 0, greater than 0, or smaller than 0?
(b) With multiple predictors: Are all regression coefficients as a whole significantly different from 0? Which individual regression parameters are different from 0?




Testing the significance of the differences in the speed (of the Starling bird flying through a corridor with striped walls)

Experiment            Average speed    Sample size n    Standard deviation (guessed)
Horizontal stripes    16.5 ft/s        10               1.5
Vertical stripes      15.3 ft/s        10               1

Step 1: Identifying the type of statistical test:

We want to test the difference in the two mean values: The test compares two estimated means. [Both are random variables with an underlying Probability Density Function (PDF)]

The variances of the samples (and the variances of the means) are also unknown and must be estimated from the data.

The samples are not paired (the experiments were all done independently).


The appropriate test is: “A test for the differences of means under independence” (or “Comparing two independent population means with unknown population standard deviations”)

The null hypothesis is H0: The average speed is the same in both experiments.

If H0 is true, then the test variable z is a realization from a population with an approximately standard Gaussian distribution.*

*Note: This holds only for large sample sizes n1 and n2.
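The formula for z is not preserved in this transcript. A minimal sketch, assuming the usual large-sample form z = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2), applied to the Starling table values, could look like this. The function name `two_sample_z` is hypothetical, and the standard deviations are the guessed ones from the table. With n = 10 per experiment, the Gaussian approximation is rough, so the p-value is indicative only.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """z statistic for the difference of two independent sample means.

    Assumes the standard error of the difference is
    sqrt(s1^2/n1 + s2^2/n2), with variances estimated from the data.
    """
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / standard_error


# Values from the Starling table (standard deviations were guessed).
z = two_sample_z(16.5, 1.5, 10, 15.3, 1.0, 10)

# Two-sided p-value under the standard Gaussian approximation
# (strictly justified only for large n1 and n2).
p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
```

With these inputs z comes out a little above 2, so under the Gaussian approximation the difference would be judged significant at the two-sided 5% level.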


The classical Student+ t-test*

*'Student' (1908a). The probable error of a mean. Biometrika, 6, 1-25.
+ William S. Gosset: 'He received a degree from Oxford University in Chemistry and went to work as a "brewer" in 1899 at Arthur Guinness Son and Co. Ltd. in Dublin, Ireland' (Steve Fienberg. "William Sealy Gosset" (version 4). StatProb: The Encyclopedia Sponsored by Statistics and Probability Societies. Freely available at http://statprob.com/encyclopedia/WilliamSealyGOSSET.html)

Testing whether Albany temperature anomalies from 1950-1980 were different from 0: January 1950-1980 anomalies with respect to the 1981-2010 climatological mean.

Dashed line: Theoretical probability density function of our test variable. If H0 were true, our test value would be a random sample from this distribution. That means we would expect it to be close to zero. The farther our test value lies in the tails of the distribution, the less likely it is to be part of the distribution.

Figure annotation: The test value calculated from the sample.


Testing whether Albany temperature anomalies from 1950-1980 were different from zero: Annual mean 1950-1980 anomalies with respect to the 1981-2010 climatological mean.

Figure annotation: The test value calculated from the sample.

Test variable:

$$ t = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}} $$

$\bar{x}$ : sample mean
$n$ : sample size
$\mu_0$ : population mean (here $\mu_0 = 0$)
$s^2$ : sample variance

The test variable t is calculated from a random sample. Like any other quantity estimated from random samples, it is a random variable drawn from a theoretical population with
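As a minimal sketch of how the test variable could be computed, assuming the usual one-sample form t = (sample mean - μ0) / sqrt(sample variance / n): the function name `one_sample_t` and the anomaly values below are hypothetical illustrations, not the Albany data.

```python
import math
import statistics


def one_sample_t(sample, mu0=0.0):
    """One-sample t statistic: (mean - mu0) / sqrt(s^2 / n).

    s^2 is the sample variance with the n-1 denominator.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    variance = statistics.variance(sample)  # unbiased (n-1) estimate
    return (mean - mu0) / math.sqrt(variance / n)


# Hypothetical temperature anomalies, for illustration only.
anomalies = [-1.2, 0.4, -0.8, -2.1, 0.3, -0.5, -1.7, 0.9, -0.6, -1.1]

t_value = one_sample_t(anomalies, mu0=0.0)
print(f"t = {t_value:.2f} with {len(anomalies) - 1} degrees of freedom")
```

A negative t here means that the sample mean lies below μ0. Whether that is significant depends on where t falls in the distribution expected under H0.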


Testing H0: Albany (New York Central Park) temperature anomalies from 1950-1980 are not different from 0.

Figure panels: NYC, January 1950-1980; Albany, January 1950-1980.

Solid lines: Cumulative distribution function (for the test variable if H0 is true).


Alternative hypothesis: the mean anomaly was less than 0 (i.e. it was colder in 1950-1980 than in 1981-2010).

Choose a significance level of 5% for a one-sided t-test (marked as 0.05 in both panels).


Figure annotations: "Reject H0! Accept alternative!" and "Accept H0!"


The single-sided t-test

Null hypothesis H0: Albany temperature anomalies from 1950-1980 are not different from 0.

Alternative hypothesis Ha: Temperature anomalies were negative.*

*Note that we formed anomalies with respect to the 1981-2010 climatology. Thus we test whether 1950-1980 was significantly cooler than 1981-2010.

Figure (distribution of t, with tcrit marked on the t axis): The area under the curve gives the probability P(t < tcrit).


We reject the null hypothesis if the calculated t-value falls into the tail of the distribution. The significance level is usually chosen to be small: 0.1, 0.05, and 0.01 are typical values. We then say: “We reject the null hypothesis at the level of significance of 10% (5%) (1%).”

Calculated t
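The one-sided decision rule can be sketched in a few lines. This is an illustration under stated assumptions: 31 yearly values (1950-1980 inclusive) give 30 degrees of freedom, the critical value is the standard tabulated one for that case, and the t-values tried are hypothetical. The function name `reject_h0_lower_tail` is also hypothetical.

```python
def reject_h0_lower_tail(t_value, t_crit):
    """One-sided test with Ha: mean < 0.

    t_crit is the (negative) lower-tail critical value, chosen so that
    P(t < t_crit) equals the significance level when H0 is true.
    """
    return t_value < t_crit


# Assumption: 30 degrees of freedom (31 yearly values, 1950-1980).
# Tabulated lower-tail 5% critical value of Student's t for 30 d.o.f.
T_CRIT_LOWER_5PCT_DF30 = -1.697

for t_value in (-2.4, -1.0):  # hypothetical calculated t-values
    if reject_h0_lower_tail(t_value, T_CRIT_LOWER_5PCT_DF30):
        decision = "reject H0 at the 5% level"
    else:
        decision = "cannot reject H0 at the 5% level"
    print(f"t = {t_value:+.1f}: {decision}")
```

Only values in the lower tail lead to rejection. A large positive t would not, because the alternative hypothesis is that the anomalies were negative.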


The two-sided t-test

Null hypothesis H0: Albany temperature anomalies from 1950-1980 are not different from 0.

Alternative hypothesis Ha: Temperature anomalies were different from zero.

Note that we formed anomalies with respect to the 1981-2010 climatology. Thus we test whether 1950-1980 differed significantly from 1981-2010.

Figure (distribution of t, with -tcrit and +tcrit marked on the t axis): The area under the curve to the left of -tcrit gives the probability P(t < -tcrit). The area under the curve to the right of +tcrit gives the probability P(t > +tcrit).


Figure: distribution of t, with tcrit and the calculated t marked on the t axis.

We cannot reject H0 at the two-sided significance level of ‘p’ percent (e.g. 5%).
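A short sketch can show why the same calculated t may fail a two-sided test yet pass a one-sided one. It assumes 30 degrees of freedom (31 yearly values) and uses the standard tabulated 5% critical values for that case. The t-value and the function name `reject_h0_two_sided` are hypothetical.

```python
def reject_h0_two_sided(t_value, t_crit):
    """Two-sided test: reject H0 if t falls in either tail,
    i.e. t < -t_crit or t > +t_crit."""
    return abs(t_value) > t_crit


# Assumption: 30 degrees of freedom; tabulated Student's t values.
T_CRIT_TWO_SIDED_5PCT_DF30 = 2.042  # 2.5% in each tail
T_CRIT_ONE_SIDED_5PCT_DF30 = 1.697  # 5% in one tail

t_value = -1.9  # hypothetical calculated t

two_sided = reject_h0_two_sided(t_value, T_CRIT_TWO_SIDED_5PCT_DF30)
one_sided = t_value < -T_CRIT_ONE_SIDED_5PCT_DF30  # Ha: mean < 0

print(f"two-sided 5% test rejects H0: {two_sided}")
print(f"one-sided 5% test rejects H0: {one_sided}")
```

The two-sided test splits the 5% between both tails, so each critical value lies farther from zero than in the one-sided case.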


TESTING A NULL HYPOTHESIS

Hypothesis / Conclusion     Null hypothesis H0 true          Null hypothesis H0 false
Null hypothesis accepted    Correct decision                 False decision (Type II error)
Null hypothesis rejected    False decision (Type I error)    Correct decision


TEST FOR DIFFERENCES IN THE MEAN H0 :

Here we would reject H0 for the given significance level (α = 0.05).

Figure 5.1 from Wilks “Statistical Methods in the Atmospheric Sciences” (2006)

Figure annotation: Calculated test value


TEST FOR DIFFERENCES IN THE MEAN H0 :

Here we would accept H0 for the given significance level (α = 0.05).

Figure 5.1 from Wilks “Statistical Methods in the Atmospheric Sciences” (2006)

Figure annotation: Calculated test value


TESTING A NULL HYPOTHESIS

Hypothesis / Conclusion     Null hypothesis H0 true          Null hypothesis H0 false
Null hypothesis accepted    Correct decision                 False decision (Type II error)
Null hypothesis rejected    False decision (Type I error)    Correct decision

Type II error: The probability of this type of error (β, ‘beta’) is usually hard to quantify.
Type I error: The probability of this error is given by the significance level (α, ‘alpha’).
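The two error probabilities can be illustrated with a small simulation. This sketch swaps in a simpler setting than the slides use: a two-sided z-test on Gaussian data with known standard deviation 1, so that only the standard library is needed. The function name `rejection_rate`, the sample size, and the true mean of 0.3 are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n=30, alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated samples for which a two-sided z-test
    rejects H0: mean = 0. Assumes Gaussian data with known sigma = 1."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) * math.sqrt(n)  # (mean - 0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# H0 true: every rejection is a Type I error, so the rate estimates alpha.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 false (hypothetical true mean 0.3): every non-rejection is a
# Type II error, so one minus the rejection rate estimates beta.
type_2_rate = 1.0 - rejection_rate(true_mean=0.3)

print(f"estimated Type I error rate:  {type_1_rate:.3f}")
print(f"estimated Type II error rate: {type_2_rate:.3f}")
```

The Type I rate is fixed by the chosen α. The Type II rate depends on the true mean, which is unknown in practice, and this is why β is usually hard to quantify.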