Analysis of Variance (ANOVA) Peter Shaw RU

# 1 Way Analysis of Variance (ANOVA) Peter Shaw RU


1 way ANOVA – What is it?

This is a parametric test, examining whether the means differ between 2 or more populations.

Two groups (Males, Females): do males differ from females?

Three groups (Site 1, Site 2, Site 3): do results differ between these sites?

This is not in itself so unusual, indeed we are spoiled for choice:

| | Parametric | Non-parametric |
|---|---|---|
| 2 classes only | t test, anova | Mann-Whitney U |
| 2 or more classes | anova | Kruskal-Wallis test |

So why am I spending so much time on anova?

1: Because anova is the definitive analytical tool: it allows one to ask questions that cannot be asked any other way.

2: You need to be familiar with the layout of anova tables.

3: Because I want you to understand the degrees of freedom associated with anova models. There are deep pitfalls associated with allocation of dfs, and inspection of the dfs in an anova table allows one to understand immediately what model another researcher has used.

What anova actually does:

It partitions the variation in the data into components, some of which can be explained by the experimenter (such as the difference between two treatments), and some of which is unexplained.

The unexplained variation is called “error”, but is in fact essential to performing the anova.

It generates a test statistic F, which is the ratio of explained to unexplained variation. This can be thought of as a signal:noise ratio. Thus large values of F indicate a high degree of pattern within the data and imply rejection of H0.

It is thus similar to the t test - in fact ANOVA on 2 groups is equivalent to a t test [$F = t^2$; formally $F_{1,n-2} = (t_{n-2})^2$]
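The equivalence with the t test can be checked numerically. The sketch below is only an illustration: the two groups are made-up values, and the function names `pooled_t` and `one_way_f` are invented for this example. It assumes the equal-variance (pooled) form of the t test.

```python
from statistics import mean

def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """F ratio for a 1-way anova: treatment mean square / error mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_trt = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_err = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_trt = len(groups) - 1
    df_err = len(values) - len(groups)
    return (ss_trt / df_trt) / (ss_err / df_err)

# Made-up illustrative values, not data from the lecture.
group_a = [4, 6, 5, 7, 8]
group_b = [9, 7, 10, 12, 11]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(f"t squared = {t ** 2:.6f}, F = {f:.6f}")  # the two numbers agree
```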

The core of anova is to partition the sum of squares of a dataset. This is the summed values of $(X - \text{mean})^2$, otherwise known as the sum of squared residuals.

[Figure: Value against Datapoint number (1 to 8), showing the overall mean (μ) and the residuals.]

Linear model: each observation is the mean plus a random error: $X_i = \mu + e_i$

Total sum of squares: $SS_{tot} = \sum_i (X_i - \mu)^2 = \sum_i e_i^2$

Now we split the data up into treatments:

[Figure: the same 8 datapoints divided into Treatment 1 and Treatment 2, showing the overall mean (μ), the mean of treatment 2, and the new residuals.]

Linear model: each observation is the mean plus a treatment effect plus random error: $X_{ti} = \mu + T_t + e_{ti}$

Total sum of squares: $\sum_{t,i} (X_{ti} - \mu)^2 = \sum_{t,i} e_{ti}^2 + \sum_{t,i} T_t^2$

= error sum of squares + treatment sum of squares

(This is how variation is partitioned. Notice that it only works if $\sum_{t,i} e_{ti} = \sum_{t,i} T_t = 0$.)

Now we have one sum of squares which has been partitioned into two sources, explained and unexplained.
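The partition can be verified directly from the residuals. This is a minimal sketch on made-up numbers (two treatments, four observations each). It assumes each treatment effect is taken as the treatment mean minus the overall mean.

```python
from statistics import mean

# Made-up illustrative data, not taken from the lecture.
data = {
    "Treatment 1": [3.0, 4.0, 5.0, 4.0],
    "Treatment 2": [7.0, 9.0, 8.0, 8.0],
}

all_values = [x for xs in data.values() for x in xs]
mu = mean(all_values)  # overall mean

# Total sum of squares: squared distance of every point from the overall mean.
ss_total = sum((x - mu) ** 2 for x in all_values)

ss_treatment = 0.0
ss_error = 0.0
for xs in data.values():
    effect = mean(xs) - mu              # treatment effect T_t
    for x in xs:
        residual = x - mu - effect      # random error e_ti
        ss_treatment += effect ** 2
        ss_error += residual ** 2

print("total SS            :", ss_total)
print("treatment + error SS:", ss_treatment + ss_error)
```

With these numbers the total SS is 36, made up of 32 for treatment and 4 for error.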

The null hypothesis H0 says that these two sources of variation should be equally unimportant, both unexplained random noise. To test this we cannot simply compare the sums of squares (because the more samples you collect the more variation you may find). Instead we first divide each by its degrees of freedom to convert SS into a variance:

Total variance = total SS / total df – true but not used in most anova tables

treatment variance = treatment SS / treatment df

error variance = error SS / error df

F ratio (signal/noise) = treatment variance /error variance.

Anova tables:

Learn this layout parrot-fashion! It is correct for a 1-way anova with N observations and T treatments.

| Source | df | SS | MS | F |
|---|---|---|---|---|
| treatment | T-1 | SStrt | SStrt/(T-1) | MStrt/MSerr |
| error | by subtraction: (N-1) - (T-1) = N-T | SSerr | SSerr/dferr | |
| Total | N-1 | | | |

Finally, you (or the PC) consult tables or otherwise obtain a probability of obtaining this F value given dfs for treatment and error.

Exact layout varies somewhat - I dislike SPSS’s version!

It is formally possible to perform an anova by calculating the values of treatment and error for each observation in turn – I have a handout showing this.

In practice no-one does it this way because there is a labour-saving shortcut that is easily learned and implemented, which I intend to show you now.

How to do an ANOVA by hand:

1: Calculate N, $\sum x$, $\sum x^2$ for the whole dataset.

2: Find the correction factor: $CF = (\sum x)^2 / N$

3: Find the total sum of squares for the data: $SS_{tot} = \sum x_i^2 - CF$

4: Add up the totals for each treatment in turn ($X_{t\cdot}$), then calculate the treatment sum of squares: $SS_{trt} = \sum_t (X_{t\cdot}^2 / r) - CF$, where $X_{t\cdot}$ is the sum of all values within treatment t, and r is the number of observations that went into that total.

5: Draw up the ANOVA table, getting error terms by subtraction.

One way ANOVA’s limitations

This technique is only applicable when there is one treatment used.

Note that the one treatment can be at 3, 4,… many levels. Thus fertiliser trials with 10 concentrations of fertiliser could be analysed this way, but a trial of BOTH fertiliser and insecticide could not.

| | T1 | T2 | T3 |
|---|---|---|---|
| | 7 | 14 | 20 |
| | 8 | 16 | 18 |
| | 11 | 19 | 22 |
| | 15 | 18 | 19 |
| | 12 | 15 | 16 |
| Totals (to be nice to you!) | 53 | 82 | 95 |
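The by-hand shortcut can be followed step by step on the T1, T2, T3 data above. The sketch below is one way to lay it out in Python. The variable names are my own and the printed table layout is only an approximation of a standard anova table.

```python
# Data from the T1, T2, T3 columns above: five observations per treatment.
treatments = {
    "T1": [7, 8, 11, 15, 12],
    "T2": [14, 16, 19, 18, 15],
    "T3": [20, 18, 22, 19, 16],
}

# Step 1: N, sum of x and sum of x squared for the whole dataset.
values = [x for xs in treatments.values() for x in xs]
n = len(values)
sum_x = sum(values)
sum_x2 = sum(x * x for x in values)

# Step 2: correction factor.
cf = sum_x * sum_x / n

# Step 3: total sum of squares.
ss_total = sum_x2 - cf

# Step 4: treatment sum of squares from the treatment totals.
ss_trt = sum(sum(xs) ** 2 / len(xs) for xs in treatments.values()) - cf

# Step 5: anova table, with the error terms obtained by subtraction.
ss_err = ss_total - ss_trt
df_trt = len(treatments) - 1
df_total = n - 1
df_err = df_total - df_trt
ms_trt = ss_trt / df_trt
ms_err = ss_err / df_err
f_ratio = ms_trt / ms_err

print(f"{'Source':<10}{'df':>4}{'SS':>10}{'MS':>10}{'F':>8}")
print(f"{'treatment':<10}{df_trt:>4}{ss_trt:>10.2f}{ms_trt:>10.2f}{f_ratio:>8.2f}")
print(f"{'error':<10}{df_err:>4}{ss_err:>10.2f}{ms_err:>10.2f}")
print(f"{'Total':<10}{df_total:>4}{ss_total:>10.2f}")
```

For these data the treatment SS is about 184.93 on 2 df and the error SS is 78.40 on 12 df, giving F of about 14.15.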

What to do when you want to test H0: group means are the same, when the data are clearly not normally distributed?

If you have 2 groups, you can fall back on the Mann-Whitney U test.

BUT: with 3 or more groups you can't do multiple U tests, just as you can't do multiple t tests in place of a 1-way anova. (Why not?)

There are 2 good alternatives, one of which is supplied in SPSS, one of which needs special code (I have some home-written).

1: Kruskal-Wallis non-parametric anova (good and safe)

2: use normal anova but use a Monte-Carlo approach to empirically estimate p values. (This is a perfect, safe and reliable way to generate p values, but is not widely available).
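One common way to do the Monte-Carlo version is a permutation test: shuffle the observations between groups many times and see how often the shuffled F is at least as large as the observed F. The sketch below assumes that approach and is not the home-written code mentioned above. The function names, the number of shuffles and the example data are all invented for illustration.

```python
import random
from statistics import mean

def f_ratio(groups):
    """Ordinary 1-way anova F ratio (treatment MS / error MS)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_trt = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_err = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_err == 0:
        return float("inf")  # no within-group variation at all
    return (ss_trt / (len(groups) - 1)) / (ss_err / (len(values) - len(groups)))

def monte_carlo_p(groups, n_shuffles=2000, seed=0):
    """Estimate p as the share of random reshuffles with F >= observed F."""
    rng = random.Random(seed)
    observed = f_ratio(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_ratio(shuffled) >= observed:
            hits += 1
    # Count the observed arrangement itself so that p is never exactly 0.
    return (hits + 1) / (n_shuffles + 1)

# Made-up, strongly skewed data for three groups.
skewed = [[1, 2, 2, 3, 9], [4, 5, 7, 8, 30], [10, 12, 15, 40, 41]]
p_skewed = monte_carlo_p(skewed)
print("Monte-Carlo p value:", round(p_skewed, 4))
```

The estimate depends on the number of shuffles and on the random seed, so it will vary slightly from run to run unless the seed is fixed.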

Post-hoc tests

Often one runs an ANOVA on a dataset where the “treatment” variable comes at 3 or more levels. If p>0.05 you simply assume that the groups do not differ. If however p<0.05, students often ask whether this proves some specific difference, such as showing that site 1 differs from site 2.

The simple answer is “NO”. The p value tests the classification as a whole, and you can’t infer specific differences from it. If you do want to ask about a specific division within your classification you need to explore the world of post-hoc tests (= “after the event”).

There is a plethora of these, and you can run them by hand, but you need to be careful in handling your significance levels.

Why you don’t do multiple t tests. Or any other test, unless you have your eyes open….

Take random data and assemble into 2 piles, then test H0: no difference between them. Using p = 0.05 you know that you will reject this H0 1 time in 20. That is what p = 0.05 means.

Now assemble into 3 piles (p1, p2, p3), then test H0: no difference between each pair: P1-P2, P1-P3, P2-P3.

1 time in 20, p1-p2 is significant (*)

1 time in 20, p1-p3 is significant (*)

1 time in 20, p2-p3 is significant (*)

Now we ask what the probability is that we will end up accepting H0. This involves accepting H0 in test 1 (P1-P2), AND in P1-P3, AND in P2-P3. In each case the probability of accepting H0 is 0.95 (= 1-p), but, treating the three tests as independent, the probability of accepting all 3 together is 0.95*0.95*0.95 = 0.857375 (nearly, but not quite, 1-3*p).

But if p(accepting H0) = 0.86, then p(rejecting H0) = 0.14. So in random data you will reject H0 1 time in 7, not 1 in 20. So if you claim in your write-up that you used p=0.05 you are lying, albeit probably unwittingly.
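The same arithmetic extends to any number of tests. This is a small sketch, assuming as above that the tests are independent. The function name `familywise_error` is my own.

```python
def familywise_error(alpha, n_tests):
    """Chance of rejecting H0 at least once in n_tests independent tests
    on random data, when each test is run at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 3, 6):
    rate = familywise_error(0.05, n_tests)
    print(f"{n_tests} test(s): p(at least one false rejection) = {rate:.4f}")
```

With 3 tests this gives the 0.14 worked out above. With 6 tests (every pair among 4 groups) it rises to about 0.26.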

It is OK to do this PROVIDING you know what you are doing, and you apply a more stringent criterion to each individual test. If you are doing N different tests on subsets of the same data, each one should run at a significance level of

$p = 1 - (1-\alpha)^{1/N} = 1 - \sqrt[N]{1-\alpha}$

where α is the final significance level.

Example: 3 tests, α = 0.05, adjusted p = $1 - 0.95^{1/3}$ = 0.017.
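This adjustment (often called the Šidák or Dunn-Šidák correction) is easy to compute and to check. In the sketch below the function name `per_test_level` is my own. The check runs the formula backwards to confirm that 3 tests at the adjusted level give an overall level of 0.05.

```python
def per_test_level(alpha, n_tests):
    """Significance level for each individual test so that n_tests
    independent tests together have overall level alpha."""
    return 1 - (1 - alpha) ** (1 / n_tests)

level = per_test_level(0.05, 3)
overall = 1 - (1 - level) ** 3   # run the formula backwards as a check

print(f"per-test level for 3 tests: {level:.3f}")
print(f"overall level recovered   : {overall:.3f}")
```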

Post-hoc tests in SPSS are hidden under “Compare means – 1 way anova”.

[Figure: Fe (ppm) against SITE for sites 1 to 7, N = 8 at each site. Dissolved Fe in water draining Pelenna mine, Swansea. $F_{6,49}$ = 72.9, p<0.001]

But which sites differ from each other?

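Duncan’s multiple range test:

FE, Duncan (a). Subset for alpha = .05

| NUMSITE | N | Subset 1 | Subset 2 | Subset 3 |
|---|---|---|---|---|
| 1.00 | 8 | 1.0000 | | |
| 7.00 | 8 | 1.1250 | | |
| 6.00 | 8 | 1.2500 | | |
| 5.00 | 8 | 1.3750 | | |
| 4.00 | 8 | 2.3750 | | |
| 3.00 | 8 | | 19.0000 | |
| 2.00 | 8 | | | 62.5000 |
| Sig. | | .752 | 1.000 | 1.000 |

Means for groups in homogeneous subsets are displayed.

a. Uses Harmonic Mean Sample Size = 8.000.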

Note

1: Means are sorted into ascending order.

2: All bar 2 are in one homogeneous subgroup: site 3 is in a group by itself, as is site 2.

Presentation methods:

1: Leave means sorted into order and underline those that do not differ

[Figure: FE against site, with sites sorted into ascending order of mean (1, 7, 6, 5, 4, 3, 2; N = 8 at each site) and the groups that do not differ marked A, B and C.]

2: the ABC method

Leave the means in their original order but indicate which group they are in by giving a letter of the alphabet to each line in the graph just presented. Then you add the text “means followed by the same letter do not differ at p<0.05”.

| Site | Mean Fe | Group |
|---|---|---|
| 1.00 | 1.00 | A |
| 2.00 | 62.50 | C |
| 3.00 | 19.00 | B |
| 4.00 | 2.38 | A |
| 5.00 | 1.38 | A |
| 6.00 | 1.25 | A |
| 7.00 | 1.13 | A |

And if the data are very non-normal?

You always have a non-parametric anova available, known as the Kruskal-Wallis test. This does not have a post-hoc test, but you can create one with care.

1: Compare every group with every other by a U or K-W test, but apply a more stringent significance test as explained earlier.

2: Sort means (or better medians) into ascending order, and underline those which do not differ significantly as before.

[Figure: MAYFLY against SITE for sites 1 to 4 (N = 18, 15, 12 and 10 respectively). Mayflies on Pelenna stream (4 sites only). P<0.05 by Kruskal-Wallis test.]

P values for each pairwise comparison in turn:

| Site | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | - | NS | NS | NS |
| 2 | | - | 0.036 | 0.006 |
| 3 | | | - | NS |
| 4 | | | | - |

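Adjust significance to $1 - 0.95^{1/6}$ = 0.0085 (4 sites give 6 pairwise comparisons), and underline sites that do not differ at this level.

[Figure: MAYFLY against site, with sites sorted into order (2, 1, 3, 4; N = 15, 18, 12 and 10 respectively) and the groups that do not differ underlined as A and B.]

Or list as follows:

Site 1: AB

Site 2: A

Site 3: AB

Site 4: B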
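The adjusted comparison for the mayfly data can be sketched as follows. The pairwise p values are those given above. The "NS" entries have no reported value, so they are stored as `None` and simply treated as not significant, which is an assumption of this sketch. The variable names are my own.

```python
from itertools import combinations

alpha = 0.05
sites = [1, 2, 3, 4]

# 4 sites give 6 pairwise comparisons.
n_tests = len(list(combinations(sites, 2)))
level = 1 - (1 - alpha) ** (1 / n_tests)

# Pairwise p values from the comparison above; None marks "NS"
# (no numerical value reported).
p_values = {
    (1, 2): None, (1, 3): None, (1, 4): None,
    (2, 3): 0.036, (2, 4): 0.006, (3, 4): None,
}

# A pair differs only if its p value is below the adjusted level.
differs = {pair for pair, p in p_values.items() if p is not None and p < level}

print(f"adjusted level for {n_tests} tests: {level:.4f}")
print("pairs that differ:", sorted(differs))
```

Only sites 2 and 4 differ at the adjusted level. The 2 versus 3 comparison (p = 0.036) would have counted at an unadjusted 0.05 but does not survive the correction, which is why sites 1 and 3 share a letter with both site 2 and site 4.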