
Page 1: Descriptive Statistics: The mean, and the number of the measurements (n) that it was based on; The standard deviation: A measure of the variability

Descriptive Statistics:

To describe your sample, you need to state:

The mean, and the number of the measurements (n) that it was based on

The standard deviation: a measure of the variability of the data about the mean

Other useful information derived from the mean and standard deviation, such as:

1) The confidence interval: the range within which 95% or 99% or 99.9% of measurements of this sort would be expected to fall

2) The standard error: the range of means that we could expect 95% or 99% or 99.9% of the time if we were to repeat the same type of measurement again and again on different samples.

Page 2:

Descriptive Statistics II:

Degrees of freedom: n − 1

For each observation (x), the deviation (d) from the mean (x̄) is:

d = x − x̄

The sum of squares: Σd² = Σ(x − x̄)²

Sample variance: S² = Σd² / (n − 1)

Standard deviation: S = √( Σd² / (n − 1) )
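The definitions above can be checked numerically; here is a minimal Python sketch using a small made-up sample (the values are illustrative, not from the slides):

```python
import math

# Hypothetical sample of n = 5 measurements (illustrative values only)
x = [4.2, 5.1, 4.8, 5.5, 4.4]

n = len(x)
mean = sum(x) / n                             # the mean, x-bar
deviations = [xi - mean for xi in x]          # d = x - x-bar
ss = sum(d ** 2 for d in deviations)          # sum of squares
variance = ss / (n - 1)                       # sample variance, n - 1 degrees of freedom
sd = math.sqrt(variance)                      # standard deviation

print(n, round(mean, 3), round(ss, 3), round(variance, 3), round(sd, 3))
```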

Page 3:

Descriptive Statistics III:

Standard error (SE) or Standard error of the mean (SEM):

To show how good our estimate of the mean is: the SE is actually the standard deviation of the mean. To calculate it, use SE = S / √n, or equivalently SE = √( S² / n ).

Since SE = S / √n, S and SE coincide only when n = 1; otherwise SE is smaller than S, and the difference between them grows as n increases.

Many people prefer to cite the SE rather than S, because it makes the range of variation appear to be smaller!

Another advantage of using SE is that if anyone were to repeat an experiment, the mean would be likely to fall within x̄ ± t × SE, which is the confidence interval.
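Continuing the same sketch, the standard error follows directly from S and n (same illustrative sample as before):

```python
import math

x = [4.2, 5.1, 4.8, 5.5, 4.4]             # hypothetical sample, as before
n = len(x)
mean = sum(x) / n
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))   # standard deviation S

se = s / math.sqrt(n)                      # SE = S / sqrt(n)

# SE is smaller than S for any n > 1, which is why citing SE
# makes the variation look smaller.
assert se < s
```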

Page 4:

Descriptive Statistics IV:

Confidence intervals of a mean: the range within which, in 95% or 99% or 99.9% of cases (the level of confidence), the mean would be expected to fall if we repeated the experiment over and over again. The limiting values are the confidence limits.

The t value can be obtained from the t table.

The number of degrees of freedom determines the t value. So, when designing experiments we have to find a compromise between the level of confidence we want and the amount of work involved.

The t values fall off rapidly from 2 to 5 df (3 to 6 replicate observations) but more slowly thereafter. For a complicated or costly experiment there would be little justification for using more than, say, 4 or 5 replicates.

CI limits: x̄ ± t × SE
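Putting the pieces together, a 95% confidence interval for the mean of the same illustrative sample; the t value of 2.776 for df = 4 is taken from a standard t table:

```python
import math

x = [4.2, 5.1, 4.8, 5.5, 4.4]             # hypothetical sample
n = len(x)
mean = sum(x) / n
se = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1)) / math.sqrt(n)

t_95 = 2.776                               # two-tailed t for df = n - 1 = 4, p = 0.05 (t table)
half_width = t_95 * se
ci = (mean - half_width, mean + half_width)

print(round(ci[0], 2), round(ci[1], 2))
```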

Page 5:

Statistical tests – hypothesis testing

The statistical hypothesis is sometimes called the Null Hypothesis. The null hypothesis is often the reverse of what the experimenter actually believes;

it is put forward to allow the data to contradict it. (Science proceeds conservatively, always assuming that something interesting is not happening unless convincing evidence suggests, for the moment, that it might be.)

Errors in hypothesis testing:

                               Null true (no effect;        Null false (real effect;
                               exp. hypothesis wrong)       exp. hypothesis correct)

Reject null (decide there      Type I ERROR,                Correct,
are real effects; accept       prob = α (the                prob = 1 − β
the exp. hypothesis)           "significance level")        (the "power")

Accept null (decide there      Correct,                     Type II ERROR,
are no real effects;           prob = 1 − α                 prob = β
reject the exp. hypothesis)

The same logic applied to an example:

Exp. hypothesis: there is a difference between OFF and NF

Null hypothesis: there is no difference between OFF and NF

                               Truth: there is              Truth: there is
                               no difference                a difference

Decide to believe our          Type I ERROR,                Correct,
hypothesis                     prob = α (the                prob = 1 − β
                               "significance level")        (the "power")

Decide that our                Correct,                     Type II ERROR,
hypothesis is wrong            prob = 1 − α                 prob = β

Page 6:

Statistical Tests – What test to do?

Do the values fall into categories?

  Yes → Chi-squared test

  No → Is the number of sample groups < 3?

    Yes → Can the two sets of replicate data be arranged in pairs?

      Yes → Paired-samples test

      No → Student’s t-test

    No → ANOVA. Are the treatments separate (a single factor)?

      Yes → One-way ANOVA

      No → Two-way ANOVA
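The decision tree above can be sketched as a small function; the branch order follows my reading of the flowchart, and the argument names are my own:

```python
def choose_test(categorical, n_groups=0, paired=False, single_factor=True):
    """Pick a statistical test following the slide's flowchart (a sketch)."""
    if categorical:                       # values fall into categories
        return "Chi-squared test"
    if n_groups < 3:                      # two sample groups
        return "Paired-samples test" if paired else "Student's t-test"
    # three or more groups -> ANOVA
    return "One-way ANOVA" if single_factor else "Two-way ANOVA"

print(choose_test(False, n_groups=2, paired=True))   # Paired-samples test
```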

Page 7:

Student’s t-test (I):

Usually a minimum of 2, but ideally at least 3, replicates of each sample or treatment. You DON’T need the same number of replicates for each treatment.

The t-test compares the actual difference between two means (x̄1 − x̄2) in relation to the variation in the data (√(SE1² + SE2²)):

One-tailed t-test: test only that one particular mean is larger (or smaller) than the other.

For most purposes, use a two-tailed test

t = (x̄1 − x̄2) / √(SE1² + SE2²)
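The formula above translates directly to code; a sketch with made-up replicate data (values illustrative):

```python
import math

def two_sample_t(x1, x2):
    """t = (mean1 - mean2) / sqrt(SE1^2 + SE2^2), per the slide's formula."""
    def mean_and_se_sq(x):
        n = len(x)
        m = sum(x) / n
        var = sum((xi - m) ** 2 for xi in x) / (n - 1)   # sample variance
        return m, var / n                                 # SE^2 = S^2 / n
    m1, se1_sq = mean_and_se_sq(x1)
    m2, se2_sq = mean_and_se_sq(x2)
    return (m1 - m2) / math.sqrt(se1_sq + se2_sq)

# Unequal replicate numbers are fine, as the slide notes
t = two_sample_t([10, 12, 14], [7, 8, 9, 8])
```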

Page 8:

Student’s t-test (II):

Usually the variances of the two treatments are assumed similar.

To test whether this is true, divide the larger variance by the smaller, and compare this variance ratio with a value from a table of ‘F’ (variance ratio) for p = 0.05. If the calculated variance ratio is less than the tabulated F value, the variances do not differ significantly, and doing a t-test with the assumption of equal variances is valid.

In Excel: Tools → Data Analysis → t-Test: Two-Sample Assuming Equal Variances → OK. For ‘Output Range’, choose the top-left cell of the area where you want the results of the analysis to be displayed.
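The variance-ratio check itself is one line; the tabulated F value must still come from an F table (not computed here):

```python
def variance_ratio(var_a, var_b):
    """Larger sample variance divided by the smaller, per the slide."""
    return max(var_a, var_b) / min(var_a, var_b)

# Compare the result against the tabulated F value for p = 0.05;
# if the ratio is smaller, an equal-variance t-test is valid.
ratio = variance_ratio(2.5, 1.0)
```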

Page 9:

Student’s t-test (III) – Work flow

1) Null hypothesis: there is no difference between the means in the data.

2) Question: if the null hypothesis is true, what is the probability (p-value) of obtaining our given result (a Type I error)? The test compares the size of the difference between the two means in relation to the amount of inherent variability (the random error, not related to treatment differences) in the data.

3) Calculate the index of our given result – the calculated t value (t, the ratio between the difference of the two means and the standard error of the difference between the two means).

4) Decide the probability level (significance level) below which we would accept that the hypothesis is wrong – usually 0.05 in biology.

5) Calculate: 1) the probability (p-value) of getting the calculated t value by chance alone; or 2) using the t table, get the t value corresponding to the desired significance level (Tt). In other words, find the value that t has to exceed for the difference between the means to be significant at the desired level.

6) Test: p < 0.05? Or t > Tt?

Yes → the null hypothesis is wrong: the difference is significant. We can be 95% confident that the means differ significantly.

No → the null hypothesis is right: there is no significant difference.

Page 10:

Paired-samples test (I):

Use this test as an alternative to the t-test, for cases where data can be paired to reduce incidental variation - i.e. variation that you expect to be present but that is irrelevant to the hypothesis you want to test.

In general, more replicates are needed for this test than for a t-test (say, a minimum of 5 for each treatment), and you will need the same number of replicates for each treatment.

This test compares the mean difference between the pairs of treatments in relation to the standard error of this mean difference, testing whether the mean difference is significantly different from zero. Use a one-tailed test (p = 0.1).

In Excel: Tools → Data Analysis → t-Test: Paired Two Sample for Means → OK. For ‘Output Range’, choose the top-left cell of the area where you want the results of the analysis to be displayed.
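The paired test reduces to a one-sample t on the differences; a sketch with hypothetical paired data:

```python
import math

def paired_t(before, after):
    """Mean of the paired differences over the SE of that mean (slide's description)."""
    assert len(before) == len(after), "paired test needs the same number of replicates"
    d = [a - b for a, b in zip(after, before)]           # pairwise differences
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((di - mean_d) ** 2 for di in d) / (n - 1)
    se_d = math.sqrt(var_d / n)                          # SE of the mean difference
    return mean_d / se_d                                 # is the mean difference != 0?

t = paired_t([10, 11, 12, 13, 14], [12, 12, 14, 15, 15])   # hypothetical pairs
```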

Page 11:

Paired-samples test (II) – Work flow

1) Hypothesis: there is no difference between the means in the data.

2) Question: if this hypothesis is true, what is the probability (p-value) of obtaining our given result? Compare the mean difference between the pairs of treatments in relation to the standard error of that mean difference.

3) Calculate the index of our given result – the calculated t value (Tc, the ratio between the mean difference of the pairs of treatments and the standard error of this mean).

4) Decide the probability level (significance level) below which we would accept that the hypothesis is wrong – usually 0.1 for a one-tailed test.

5) Calculate: 1) the probability (p-value) of getting the calculated t value (Tc) by chance alone; or 2) using the t table, get the t value corresponding to the desired significance level (Tt). In other words, find the value that Tc has to exceed for the difference between the treatments to be significant at the desired level.

6) Test: p < 0.1? Or Tc > Tt?

Yes → the hypothesis is wrong: the difference is significant. We can be confident at the chosen level that the means differ significantly.

No → the hypothesis is right: there is no significant difference.

Page 12:

Analysis of variance (ANOVA) or F-test:

Use this test to compare several treatments. With only two treatments, it is equivalent to a t-test.

One-way ANOVA: the simplest form of ANOVA, used to compare several separate treatments. One-way ANOVA needs at least 2 replicates of each treatment. It tells you whether there are differences between the treatments as a whole, but it can also be used, with caution, like a multiple t-test, to tell you which of the treatments differ from each other.

Two-way ANOVA: compares combinations of treatments. You can get useful information even if you don’t have replication, but you get much more information if you have 2 (or more) replicates of each combination of treatments; then the test can also tell you whether there is a significant interaction.

Page 13:

One-way ANOVA (I):

Use this test for comparing the means of 3 or more samples/treatments, to avoid the error inherent in performing multiple t-tests (in each t-test we accept a 5% chance of our conclusion being wrong; with many separate t-tests, say 21 for 7 treatments, we would expect by probability alone that one test would give us a false result).

Ideally, for this test we would have the same number of replicates for each treatment, but this is not essential. Advanced computer programmes can overcome the problem of unequal replicates by entering ‘missing values’.

Page 14:

One-way ANOVA (II):

An important assumption underlies the ANOVA: all treatments have similar variance.

To test whether this is true, divide the highest variance by the lowest to obtain a variance ratio, and compare it with a value from a table of ‘Fmax’ for p = 0.05. If the calculated variance ratio is less than the tabulated Fmax value, the variances do not differ significantly. If not, the data might need to be transformed.

When ANOVA is applied to 2 treatments, F = t².

Degrees of freedom (df): df of the between-treatments mean square = number of treatments − 1; df of the residual mean square = number of treatments × (number of replicates − 1).

Page 15:

One-way ANOVA (III):

ANOVA involves the partitioning of the total variance into (1) variance associated with the different treatments/samples and (2) random variance, evidenced by the variability within the treatments. In this test, we ask, in effect, is there a large amount of variance associated with the different treatments compared with the amount of random variance?

ANOVA tells you only whether there are differences between treatments in the experiment as a whole; it doesn’t give any information on which treatment differs from which. To solve this problem, we need a multiple comparison test.

Although Excel can run an ANOVA test, it doesn’t have the multiple comparison tests that usually follow the analysis; it is therefore recommended to use StatView instead for ANOVA and the tests that follow it.
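The variance partitioning described above can be sketched in a few lines (illustrative data, not from the slides):

```python
def one_way_anova_f(groups):
    """Partition variance into between-treatment and residual parts; return F."""
    all_x = [x for g in groups for x in g]
    grand_mean = sum(all_x) / len(all_x)
    k = len(groups)
    # variance associated with the different treatments
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # random variance: variability within the treatments
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (len(all_x) - k)
    return ms_between / ms_within

f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])   # made-up treatments
```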

Page 16:

One-way ANOVA (IV) – Work flow

1) Hypothesis: there is no difference between the means in the data.

2) Question: if this hypothesis is true, what is the probability (p-value) of obtaining our given result? Is there a large amount of variance associated with the different treatments compared with the amount of random variance?

3) Calculate the index of our given result – the calculated F value (Fc, the ratio between the amount of variance associated with the different treatments and the amount of random variance).

4) Decide the probability level (significance level) below which we would accept that the hypothesis is wrong – usually 0.05 in biology.

5) Calculate: 1) the probability (p-value) of getting the calculated F value (Fc) by chance alone; or 2) using the F table, get the F value corresponding to the desired significance level (Ft). In other words, find the value that Fc has to exceed for the difference between the treatments to be significant at the desired level.

6) Test: p < 0.05? Or Fc > Ft?

Yes → the hypothesis is wrong: the difference is significant. We can be 95% confident that the means differ significantly.

No → the hypothesis is right: there is no significant difference.

Page 17:

After ANOVA: Analytical Comparisons

The F ratio we have calculated from ANOVA cannot tell us which group is different from which group.

The procedure for finding out the differences between groups is known as making analytical comparisons.

With a non-significant omnibus F we are prepared to assert that there are no real differences among the means – we can stop the analysis there.

A significant omnibus F demands further analysis of the data – which differences between the means are real and which are not?

– Exception: a factor with only two levels, where the omnibus F already answers the question.

Page 18:

Multiple comparison tests – Background

After one-way ANOVA, the follow up tests can be called "planned comparisons",  "post-hoc tests", "multiple comparison tests" or "post tests".  Several ways to make multiple comparisons:

1. All possible comparisons, including averages of groups. So you might compare the average of groups A and B with the average of groups C, D and E. Or compare group A to the average of B–F. Scheffe’s test does this.

2. All possible pairwise comparisons. Compare the mean of every group with the mean of every other group. Use Tukey or Newman-Keuls comparisons.

3. All against a control. If group A is the control, you may only want to compare A with B, A with C, A with D... but not compare B with C or C with D. Dunnett's test does this.

4. Only a few comparisons based on your scientific goals. So you might want to compare A with B and B with C and that's it. Bonferroni's test does this.

Page 19:

Multiple comparison tests – Terminology

Multiple comparison test applies whenever you make several comparisons at once.

Post test is generally used interchangeably with multiple comparison test.

Post-hoc test is used for situations where you can decide which comparisons you want to make after looking at the data. You don't need to plan ahead. 

Planned comparison tests  require that you focus in on a few scientifically sensible comparisons. You can't decide which comparisons to do after looking at the data. The choice must be based on the scientific questions you are asking, and be chosen when you design the experiment.

Page 20:

Multiple comparison tests – Basic Tests Type

Planned comparisons: a priori; few in number; theoretically motivated.

Post hoc comparisons: based on looking at the data; exploratory; risky business.

Page 21:

Multiple comparison tests – Familywise alpha

The probability that we will have one or more false alarms: α familywise (F.A.) = α experimentwise. It increases as the number of possible comparisons increases; this is termed α-inflation.

e.g. for 7 levels (21 pairwise comparisons), we have a .66 chance of making at least one F.A.

Post hoc tests were created to deal with α-inflation. Post hoc tests should be used when you are examining more than two categories in an ANOVA.
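The α-inflation figure quoted above can be reproduced with the usual 1 − (1 − α)^m formula, assuming m independent comparisons:

```python
def familywise_alpha(alpha, m):
    """Probability of at least one false alarm across m independent comparisons."""
    return 1 - (1 - alpha) ** m

# 7 levels -> 7 * 6 / 2 = 21 pairwise comparisons
fw = familywise_alpha(0.05, 21)
print(round(fw, 2))   # matches the slide's .66
```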

Page 22:

Multiple comparison tests – Post Hoc Test – Fisher’s Protected LSD

Basic strategy: only perform pairwise t-tests if the ANOVA provides a significant F.

Insight: requiring significant F reduces the number of opportunities for false alarms.

Problem: opportunities remain, especially with many levels.

Solution: Don’t use it!

(Use it for 3-group situations only if you like)

Page 23:

Multiple comparison tests – Post Hoc Test – Commonly used post hoc tests

Bonferroni (for pairwise comparisons in an ANOVA design, the Dunn test is identical to Bonferroni)

Tukey’s Honestly Significant Difference (Tukey’s HSD)

Page 24:

Multiple comparison tests – Post Hoc Test – Bonferroni

It calculates a new pairwise alpha to keep the familywise alpha at .05. e.g., for all possible pairwise comparisons among five means, the new pairwise α would be 0.05/10 = 0.005.

Pros: the Bonferroni is probably the most commonly used post hoc test. It is highly flexible, very simple to compute, and can be used with any type of statistical test, not just post hoc tests with ANOVA.

Cons: it tends to lack power:

– the familywise error calculation depends on the assumption that the null hypothesis is true for all tests, which is unlikely, especially after a significant omnibus test;

– all tests are assumed to be orthogonal (i.e., independent or nonoverlapping) when calculating the familywise error, and this is usually not the case when all pairwise comparisons are made;

– the test does not take into account whether the findings are consistent with theory and past research (if consistent with previous findings and theory, an individual test should be less likely to be a Type I error);

– Type II error rates are too high for individual tests: it overcorrects for Type I error.
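The Bonferroni correction on the slide is a one-line computation:

```python
from math import comb

def bonferroni_alpha(familywise, n_groups):
    """New pairwise alpha for all pairwise comparisons among n_groups means."""
    n_comparisons = comb(n_groups, 2)        # e.g. 10 comparisons for five means
    return familywise / n_comparisons

print(round(bonferroni_alpha(0.05, 5), 3))   # 0.005, as on the slide
```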

Page 25:

Multiple comparison tests – Post Hoc Test – Modified Bonferroni Approaches

Several alternatives to the traditional Bonferroni have been developed, including the Dunn-Bonferroni, the Dunn-Sidak method, Hochberg’s sequential method, and Keppel’s modified Bonferroni, among several others.

These tests have greater power than the Bonferroni while retaining the general nature of the Bonferroni.

Page 26:

Multiple comparison tests – Post Hoc Test – Tukey’s Honestly Significant Difference (HSD) Test

Mechanism: It calculates a new critical value that can be used to evaluate whether differences between any two pairs of means are significant. One simply calculates one critical value and then the difference between all possible pairs of means. Each difference is then compared to the Tukey critical value. If the difference is larger than the Tukey value, the comparison is significant.

Insight: the greatest chance of making a Type I error (F.A.) arises in comparing the largest mean with the smallest. If we can protect against an F.A. in this case, all other comparisons are also protected: if this comparison is not significant, neither is any other!

Advantage: provides protection of the familywise alpha.

Page 27:

Suppose we do an experiment with one factor, 5 levels. There is the ANOVA table:

Source           SS     df   MS    F      p

Between          2942    4   736   4.13   <.05

Within (error)   9801   55   178

Here are the group means:

Group:   I    II   III   IV   V

Mean:    63   82   80    77   70

Multiple comparison tests – Post Hoc Test

– Tukey’s Honestly Significant Difference (HSD) Test– Doing Tukey HSD tests (equal n)

Page 28:

Here are the differences between the means. Which are significant?

          I (63)   V (70)   IV (77)   III (80)   II (82)

I   63      0        7        14        17         19

V   70               0         7        10         12

IV  77                         0         3          5

III 80                                   0          2

II  82                                              0

Need to find a critical value for the difference between group means…

Multiple comparison tests – Post Hoc Test

– Tukey’s Honestly Significant Difference (HSD) Test – Doing Tukey HSD tests (equal n) (II)

Page 29:

The studentized range statistic, q

This is a statistic (just like F or t) for which an expected distribution is known under the null hypothesis of no differences:

q = (largest mean − smallest mean) / √( MS_within / n )

However, we don’t compute this for our data. Rather, we use this formula to figure out a critical value for the numerator, thus:

set alpha to the desired level (.05)

use tables to obtain the critical value of q for the appropriate degrees of freedom

work backward to get the critical difference between means

Multiple comparison tests – Post Hoc Test

– Tukey’s Honestly Significant Difference (HSD) Test – Doing Tukey HSD tests (equal n) (III)

Page 30:

Get the critical q using the correct degrees of freedom:

• Numerator df = number of groups

• Denominator df = d.f. for MS_within (use the closest available)

Here, d.f. = 5, 55, so q_crit = 3.98 (Table A11 in Cohen).

Critical difference = q_crit × √( MS_within / n ) = 15.34 in this case.

Which pairs of means exceed this? I & II (19) and I & III (17).

Multiple comparison tests – Post Hoc Test

– Tukey’s Honestly Significant Difference (HSD) Test – Doing Tukey HSD tests (equal n) (IV)
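The slide's critical difference can be reproduced numerically. n = 12 replicates per group is an assumption implied by the error df (60 observations in 5 groups gives df = 55); the computed value lands near 15.33, with the slide's 15.34 presumably a rounding difference:

```python
import math

q_crit = 3.98        # studentized range critical value from the slide (df = 5, ~55)
ms_within = 178      # error mean square from the slide's ANOVA table
n_per_group = 12     # implied: 60 observations / 5 groups

hsd = q_crit * math.sqrt(ms_within / n_per_group)   # critical difference between means
print(round(hsd, 2))

# Differences of 19 (I vs II) and 17 (I vs III) exceed this; 14 (I vs IV) does not.
```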

Page 31:

has greater power than the other tests under most circumstances

readily available in computer packages.

Fine if all groups have equal n

The Tukey-Kramer test is used by SPSS when the group sizes are unequal.

Used only for post hoc testing: stronger tests are available if only a small number of comparisons are to be made (planned comparison)

Do not use if group sizes are very unequal.

When not all possible comparisons are needed, other tests, such as the Dunnett or a modified Bonferroni method should be considered because they may have power advantages.

Multiple comparison tests – Post Hoc Test

– Tukey’s Honestly Significant Difference (HSD) Test – Practical use of Tukey’s HSD

Page 32:

Multiple comparison tests – Post Hoc Test – Other methods

Scheffe: computes a new critical value for an F test conducted when comparing two groups from the larger ANOVA (i.e., a correction for a standard t-test). The formula simply modifies the F-critical value by taking into account the number of groups being compared: (a − 1) × F_crit. The new critical value represents the critical value for the maximum possible familywise error rate. It results in a higher than desired Type II error rate by imposing a severe correction.

Dunn: identical to the Bonferroni correction.

Dunnett: similar to the Tukey test, but used only if a set of comparisons is being made against one particular group (e.g. several treatment groups are compared to one control group).

Games-Howell: used when variances are unequal; it also takes into account unequal group sizes. It appears to do better than the Tukey HSD if variances are very unequal (or moderately so in combination with small sample size), and can be used if the sample size per cell is very small (e.g., < 6).

Page 33:

Multiple Comparison Tests – Post Hoc Test– What test to do?

Are all possible pairwise comparisons needed?

  No → Dunnett or modified Bonferroni

  Yes → Are the group sizes equal?

    Yes → Tukey HSD

    No → Games-Howell or Tukey-Kramer

Page 34:

Post hoc comparisons are fine for exploratory studies.

Ideally, however, we know in advance which differences we expect to find in our data….

… and these may or may not be simple differences between two means.

Unlike post hoc tests, these can be done whether or not the ANOVA is significant.

Multiple Comparison Tests – Planned comparisons

Page 35:

The biggest problem with post hoc comparisons was the proliferation of possible comparisons.

Planned comparisons put a lid on that. For a single factor with 6 levels, there are at most 5 (i.e., the number of levels minus 1) independent comparisons among means possible.

Independent comparisons are also called orthogonal contrasts.

For a comparison between two treatments, the planned comparison is an unadjusted t-test.

Multiple Comparison Tests – Planned comparisons (II)

Page 36:

Multiple Comparison Tests – Planned comparisons – Rationale that underpins planned contrasts

The rationale that underpins planned comparisons (planned contrasts) differs from that of post hoc tests on three important attributes. Specifically:

First, planned contrasts are undertaken in lieu of the initial ANOVA. In contrast, post hoc tests are conducted after the initial ANOVA.

Second, the level of alpha associated with each planned contrast is 0.05 and does not need to be adjusted, which optimizes power.

Third, unlike most – but not all – post hoc procedures, planned contrasts can compare amalgams or combinations of groups with one another.

Page 37:

A study looked at the influence of the medium used (lecture, movie) on the tendency of subjects to change their attitude towards the Bush Administration. The media used were:

A movie, favorable to BA

A lecture, also favorable

A combination of lecture and movie.

Subjects were assigned at random to groups, each having been given a preliminary attitude test. After the treatment, each was tested again, and the change in attitude was the dependent variable.

Multiple Comparison Tests – Planned comparisons – an Example

Page 38:

Problem: mere repetition of the test may affect a subject’s score, so a control group without any exposure to lecture or movie was included.

Problem: Perhaps seeing any movie or hearing any lecture would cause a change in score.

…so 2 more control groups were introduced:

Experimental Groups                    Control Groups

I        II        III                 IV         V          VI

Movie    Lecture   Mov + Lec           Nothing    Neutral    Neutral
                                                  Movie      Lecture

Multiple Comparison Tests – Planned comparisons – an Example (II)

Page 39:

The investigators now had the following specific questions:

[1] Do the experimental groups (as a whole) differ from the control groups?

[2] Among Experimental groups, is the Movie +Lecture different from average effect of either Movie alone or Lecture alone?

[3] Is the Experimental Lecture different from the Experimental Movie?

[4] Among control groups, does “Nothing” differ from either “Movie” or “Lecture”?

Each specific question can be expressed as a comparison among sample means…..

Multiple Comparison Tests – Planned comparisons – an Example (III)

Page 40:

Experimental Groups                    Control Groups

I        II        III                 IV         V          VI

Movie    Lecture   Mov + Lec           Nothing    Neutral    Neutral
                                                  Movie      Lecture

[1] Do the experimental groups (as a whole) differ from the control groups?

Comparison weights:

+1       +1        +1                  -1         -1         -1

Multiple Comparison Tests – Planned comparisons – an Example (IV)

Experimental Groups                      Control Groups
I        II        III          IV        V               VI
Movie    Lecture   Mov + Lec    Nothing   Neutral Movie   Neutral Lecture

Comparison weights:
+1       +1        -2            0         0               0

[2] Among the experimental groups, does the Movie + Lecture treatment differ from the average of the Movie-alone and Lecture-alone treatments?

Multiple Comparison Tests – Planned comparisons – an Example (V)

Experimental Groups                      Control Groups
I        II        III          IV        V               VI
Movie    Lecture   Mov + Lec    Nothing   Neutral Movie   Neutral Lecture

Comparison weights:
+1       -1         0            0         0               0

[3] Is the Experimental Lecture different from the Experimental Movie?

Multiple Comparison Tests – Planned comparisons – an Example (VI)

Experimental Groups                      Control Groups
I        II        III          IV        V               VI
Movie    Lecture   Mov + Lec    Nothing   Neutral Movie   Neutral Lecture

Comparison weights:
 0        0         0           +2        -1              -1

[4] Among the control groups, does “Nothing” differ from the average of the “Neutral Movie” and “Neutral Lecture” groups?
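The four sets of comparison weights can be checked mechanically: each contrast's weights must sum to zero, and (as here) planned contrasts are often chosen to be mutually orthogonal, meaning the dot product of any two weight vectors is zero. A minimal sketch in plain Python (the contrast names are ours, for illustration only):

```python
# Contrast weights for the six groups, in the order:
# I Movie, II Lecture, III Mov+Lec, IV Nothing, V Neutral Movie, VI Neutral Lecture
contrasts = {
    "exp vs control":     [+1, +1, +1, -1, -1, -1],  # question [1]
    "combined vs single": [+1, +1, -2,  0,  0,  0],  # question [2]
    "movie vs lecture":   [+1, -1,  0,  0,  0,  0],  # question [3]
    "nothing vs neutral": [ 0,  0,  0, +2, -1, -1],  # question [4]
}

# Every contrast's weights must sum to zero.
for name, c in contrasts.items():
    assert sum(c) == 0, name

# Pairwise orthogonality: the dot product of any two weight vectors is zero.
ws = list(contrasts.values())
for i in range(len(ws)):
    for j in range(i + 1, len(ws)):
        assert sum(a * b for a, b in zip(ws[i], ws[j])) == 0

print("all contrasts sum to zero and are mutually orthogonal")
```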

Multiple Comparison Tests – Planned comparisons – an Example (VII)

For example, assume the three levels of the independent variable are three different lecture styles:

From previous research, we anticipate that A1 > A2, A1 > A3 (we are making no predictions about A2 vs. A3).

Do the data support this?

Multiple Comparison Tests – Planned comparisons – another Example

A1    A2    A3

16     4     2
18     6    10
10     8     9
12    10    13
19     2    11

Mean  15     6     9

Residual mean square: 14.17 (within-groups SS 170 / 12 df)

Multiple Comparison Tests – Planned comparisons – another Example (II)

Let us adopt the symbol ψ to represent the difference we are interested in between A1 and A2:

ψ = X̄1 − X̄2

We can rewrite this as:

ψ = (+1)X̄1 + (−1)X̄2

Including all the means in the experiment:

ψ = (+1)X̄1 + (−1)X̄2 + (0)X̄3

Planned comparisons are based on the calculation of an F-ratio.

As in ANOVA (the F-test), the F-ratio is the ratio of the between-treatments variance (Vb) to the random variance (Va, the residual mean square). The difference is that in ANOVA, Vb is calculated as the total variance minus the random variance and is therefore associated with all the treatments, whereas in a planned comparison Vb captures only the difference between the specific treatments we are interested in.

To calculate Vb, the sum of squares associated with the comparison (SSAcomp) has to be calculated first:

SSAcomp = n(ψ̂)² / Σci²

ψ̂ = the difference between the compared means (ψ̂ = Σci X̄i)
n = the number of subjects that contribute to each mean
ci = the coefficient with which we weight each mean

In our case,

ψ̂ = (+1)(15) + (−1)(6) + (0)(9) = 9

SSAcomp = 5(9)² / ((+1)² + (−1)² + (0)²) = 405 / 2 = 202.5
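As a check on the arithmetic, ψ̂ and SSAcomp can be computed directly from the raw data above. A minimal sketch in plain Python (variable names are ours):

```python
# Raw scores for the three groups, from the table above
a1 = [16, 18, 10, 12, 19]
a2 = [4, 6, 8, 10, 2]
a3 = [2, 10, 9, 13, 11]

weights = [1, -1, 0]   # comparison: A1 vs A2, A3 ignored
n = len(a1)            # subjects per group (equal n assumed)

means = [sum(g) / n for g in (a1, a2, a3)]             # [15.0, 6.0, 9.0]
psi_hat = sum(c * m for c, m in zip(weights, means))   # (+1)(15) + (-1)(6) = 9.0
ss_comp = n * psi_hat**2 / sum(c**2 for c in weights)  # 5 * 81 / 2 = 202.5

print(psi_hat, ss_comp)  # → 9.0 202.5
```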

The F-ratio is calculated by:

FAcomp = MSAcomp / MSS/A

MSS/A: the residual mean square calculated in the ANOVA
MSAcomp: the mean square for the comparison, which is given by

MSAcomp = SSAcomp / dfAcomp = SSAcomp / 1 = SSAcomp

All planned comparisons have 1 degree of freedom.

In this case,

FAcomp = 202.5 / 14.17 = 14.29

Get the critical F value from the F-table using the same method as in ANOVA. In this case, df is (1, 12), so Fcritical = 4.75 for p < 0.05.

If FAcomp > Fcritical, as in this case, the null hypothesis is rejected.
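Continuing the sketch, the residual mean square and the comparison F can be computed from the same raw data in plain Python (the critical value 4.75 for F(1, 12) at p < .05 is taken from a standard F-table, not computed here):

```python
# Same raw scores as before
a1 = [16, 18, 10, 12, 19]
a2 = [4, 6, 8, 10, 2]
a3 = [2, 10, 9, 13, 11]
groups = [a1, a2, a3]
n = 5

# Residual (within-groups) mean square: pooled squared deviations / error df
ss_within = sum(sum((x - sum(g) / n) ** 2 for x in g) for g in groups)
df_within = sum(len(g) - 1 for g in groups)   # 3 * (5 - 1) = 12
ms_error = ss_within / df_within              # 170 / 12 ≈ 14.17

ss_comp = 202.5         # from the previous step
ms_comp = ss_comp / 1   # planned comparisons have 1 df
f_comp = ms_comp / ms_error                   # ≈ 14.29

F_CRITICAL = 4.75       # F(1, 12), p < .05, from an F-table
print(round(ms_error, 2), round(f_comp, 2), f_comp > F_CRITICAL)
# → 14.17 14.29 True
```

Since 14.29 exceeds the critical value, the comparison is significant at p < .05.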

If you have strong and precise theoretical questions before you run the experiment, planned comparisons are considerably more powerful than post hoc tests.

Multiple Comparison Tests – Planned comparisons

Power Analysis

Power analysis answers questions such as: "How precise will my parameter estimates tend to be if I select a particular sample size?" and "How big a sample do I need to attain a desirable level of precision?"

                          Null True                    Null False
                          (exp. hypothesis wrong)      (exp. hypothesis correct)
                          No Effect                    Real Effect

Reject Null               Type I ERROR                 Correct
(decides: there are       prob = α (the                prob = 1 − β (the
real effects; accepts     “significance level”)        “power”)
the exp. hypothesis)

Accept Null               Correct                      Type II ERROR
(decides: there are no    prob = 1 − α                 prob = β
real effects; rejects
the exp. hypothesis)

Ideally, power should be at least 0.80 to detect a reasonable departure from the null hypothesis.
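Power can also be estimated by simulation when no calculator is at hand. A rough sketch, assuming a two-group design with normal data, a true effect of one standard deviation, n = 20 per group, and α = .05 (the critical t of 2.02 for df = 38 is taken from a t-table; all names here are ours):

```python
import random

random.seed(1)  # fixed seed so the estimate is reproducible

def two_sample_t(x, y):
    """Equal-n two-sample t statistic with pooled variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / (n - 1)
    vy = sum((v - my) ** 2 for v in y) / (n - 1)
    pooled = ((n - 1) * vx + (n - 1) * vy) / (2 * n - 2)
    return (mx - my) / (2 * pooled / n) ** 0.5

N, EFFECT, T_CRIT = 20, 1.0, 2.02  # n per group, effect in SD units, t(38) at alpha = .05
sims = 2000
hits = 0
for _ in range(sims):
    control = [random.gauss(0, 1) for _ in range(N)]
    treated = [random.gauss(EFFECT, 1) for _ in range(N)]
    if abs(two_sample_t(treated, control)) > T_CRIT:
        hits += 1  # the null was (correctly) rejected in this simulated experiment

power = hits / sims  # the textbook power for this design is about 0.87
print(round(power, 2))
```

The proportion of simulated experiments that reject the null estimates the power; increasing `sims` tightens the estimate.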

Online power calculator

Calculate power

Calculate sample size
StatView


References:

1. ‘Asking Questions in Biology’ Chris Barnard, Francis Gilbert & Peter McGregor. Prentice Hall, 2nd edition, 2001. (A fun book to read :)

2. The Really Easy Statistics Site (Strongly recommend!) http://helios.bto.ed.ac.uk/bto/statistics/tress1.html

3. Post Hoc Test: www.ioa.pdx.edu/newsom/da1/ho_posthoc.doc

4. http://www.une.edu.au/WebStat/unit_materials/c7_anova/oneway_post_hoc.htm

5. Power analysis: http://www.math.yorku.ca/SCS/Online/power/; http://calculators.stat.ucla.edu/powercalc/