13 Nested ANOVA 2012

Nested (hierarchical) ANOVA

what it is

how you do it

variance components

power and optimal allocation of replication

Nested ANOVA

designs with subsamples nested within replicates

if the nesting is not acknowledged, these designs are pseudoreplicated

nesting is usually spatial, but can be temporal

variation is partitioned among hierarchical levels

all levels of one factor are not present in all levels of another factor

some levels are uniquely present within some levels of another factor, but not other levels

nested factors are usually random factors

nested fixed factors require justification

What is a nested factor?

Nested: B(A)                Not nested (crossed): A x B

Factor A    Factor B        Factor A    Factor B
A           1               A           1
A           2               A           2
A           3               A           3
B           4               B           1
B           5               B           2
B           6               B           3
C           7               C           1
C           8               C           2
C           9               C           3
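One way to check whether a data set is coded as nested or crossed is to ask whether every level of factor B occurs under exactly one level of factor A. The following is a minimal Python sketch of that check; the function name `is_nested` and the representation of the design as (A, B) pairs are assumptions made for illustration.

```python
from collections import defaultdict

def is_nested(pairs):
    """Return True if every level of B occurs within exactly one level of A.

    pairs: iterable of (a_level, b_level) tuples, one per observation or cell.
    """
    parents = defaultdict(set)
    for a_level, b_level in pairs:
        parents[b_level].add(a_level)
    return all(len(a_levels) == 1 for a_levels in parents.values())

# First layout: B levels 1-9 are each unique to one level of A.
nested = [("A", 1), ("A", 2), ("A", 3),
          ("B", 4), ("B", 5), ("B", 6),
          ("C", 7), ("C", 8), ("C", 9)]

# Second layout: the same B levels 1-3 appear in every level of A.
crossed = [(a, b) for a in "ABC" for b in (1, 2, 3)]

print(is_nested(nested))   # True
print(is_nested(crossed))  # False
```

Levels that are physically distinct but share a label (for example, "patch 1" reused in every treatment) would look crossed to this check, so they need unique codes first.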

creeks (tributaries) are unique to each river

multiple samples of a single tissue type within a rat

subsamples in time (if sampled without replacement): a unit can only be sampled at one time and not at another

replicates are always nested within treatments, but we don't consider this nesting when we construct the ANOVA model

Examples of Nesting

factors A & B

factor A with p groups or levels

factor B with q groups or levels within each level of A

nested design: different (randomly chosen) levels of Factor B in each level of Factor A

often one or more levels of subsampling

Two factor nested ANOVA design

Factor A - fixed: sea urchin density, four levels: 100% of original (control), 66%, 33%, 0%

Factor B - random: randomly chosen patches, 4 within each treatment

n = 5 quadrats / patch

Example: sea urchin grazing on reefs, Andrew & Underwood (1993)

effect of sea urchin density on the % cover of filamentous algae

Density: 100%, 66%, etc.

Patch: 1 2 3 4 (within 100%), 5 6 7 8 (within 66%), etc.

Reps: n = 5 quadrats in each of 16 cells

p = 4 densities, q = 4 patches within each density

layout: sea urchin grazing on reefs

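$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}$, where

$\mu$: overall mean

$\alpha_i$: effect of factor A ($\mu_i - \mu$)

$\beta_{j(i)}$: effect of factor B within each level of A ($\mu_{ij} - \mu_i$)

$\varepsilon_{ijk}$: unexplained variation (error term), i.e., variation within each cell

(% cover algae)$_{ijk}$ = $\mu$ + (sea urchin density)$_i$ + (patch within sea urchin density)$_{j(i)}$ + $\varepsilon_{ijk}$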

Linear model
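To make the roles of the terms concrete, the model can be simulated: each level of A gets a fixed effect, each level of B within A gets one random draw shared by all of its replicates, and each replicate gets its own error. The sketch below assumes normally distributed random effects and errors; the function name `simulate_nested` and all parameter values are hypothetical, with p = 4, q = 4, n = 5 chosen to mirror the urchin layout.

```python
import random

def simulate_nested(mu, alpha, sigma_b, sigma_e, q, n, seed=0):
    """Simulate y_ijk = mu + alpha_i + beta_j(i) + eps_ijk for a balanced design.

    alpha: list of fixed effects, one per level of A.
    sigma_b, sigma_e: standard deviations of the B(A) effects and of the errors.
    Returns a list of (i, j, k, y) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for i, a_i in enumerate(alpha):
        for j in range(q):
            # One random effect per level of B within this level of A.
            b_ji = rng.gauss(0.0, sigma_b)
            for k in range(n):
                y = mu + a_i + b_ji + rng.gauss(0.0, sigma_e)
                rows.append((i, j, k, y))
    return rows

# Hypothetical parameter values, not estimates from any data set.
data = simulate_nested(mu=40.0, alpha=[10.0, 0.0, -5.0, -5.0],
                       sigma_b=8.0, sigma_e=6.0, q=4, n=5)
print(len(data))  # 80 observations = 4 * 4 * 5
```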

Main effect: effect of factor A, i.e., variation among factor A group means

Nested (random) effect: effect of factor B within each level of factor A, i.e., variation among means of factor B within each level of A

Effects

Factor A:

H0: no difference among means of factor A (= no difference among means of urchin density treatments)

$\mu_1 = \mu_2 = \dots = \mu_i = \mu$

is equivalent to

H0: no main effect of factor A (no effect of urchin density):

$\alpha_1 = \alpha_2 = \dots = \alpha_i = 0$, where $\alpha_i = (\mu_i - \mu) = 0$

Null hypotheses

H0: Factor A: No difference in mean amount of filamentous algae among the four sea urchin density treatments

H0: Factor B: No difference in the mean amount of filamentous algae among all possible patches in any of the treatments

Factor B(A)

H0: no difference among means of factor B within any level of factor A (no difference among patches in mean filamentous algae cover within any urchin density treatment)

$\mu_{11} = \mu_{12} = \dots = \mu_{1j}$, $\mu_{21} = \mu_{22} = \dots = \mu_{2j}$, etc.

H0: no variance among levels of nested random factor B within any level of factor A (no variance among patches within each density treatment):

$\sigma_\beta^2 = 0$

Null hypotheses

SSTotal = SSA + SSB(A) + SSResidual

SSA: variation among A means

SSB(A): variation among B means within each level of A

SSResidual: variation among replicates within each cell (each B(A))

Partitioning total variation
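The partition can be verified numerically for a balanced design. The sketch below computes the three sums of squares directly from their definitions and confirms that they add up to the total; the function name `nested_ss`, the nested-list data layout, and the toy numbers are all made up for illustration.

```python
def _mean(values):
    return sum(values) / len(values)

def nested_ss(data):
    """Partition sums of squares for a balanced two-factor nested design.

    data[i][j] is the list of n replicate values in level j of B within
    level i of A. Returns (ss_a, ss_b_in_a, ss_residual, ss_total).
    """
    p = len(data)
    q = len(data[0])
    n = len(data[0][0])

    cell_means = [[_mean(cell) for cell in a_level] for a_level in data]
    # Averaging cell means gives the A means only because the design is balanced.
    a_means = [_mean(row) for row in cell_means]
    grand = _mean(a_means)

    ss_a = n * q * sum((m - grand) ** 2 for m in a_means)
    ss_b = n * sum((cell_means[i][j] - a_means[i]) ** 2
                   for i in range(p) for j in range(q))
    ss_res = sum((y - cell_means[i][j]) ** 2
                 for i in range(p) for j in range(q) for y in data[i][j])
    ss_tot = sum((y - grand) ** 2
                 for a_level in data for cell in a_level for y in cell)
    return ss_a, ss_b, ss_res, ss_tot

# Toy data: p = 2 levels of A, q = 2 levels of B within each, n = 2 replicates.
toy = [[[1.0, 3.0], [5.0, 7.0]],
       [[2.0, 4.0], [10.0, 12.0]]]
ss_a, ss_b, ss_res, ss_tot = nested_ss(toy)
print(ss_a, ss_b, ss_res, ss_tot)  # 18.0 80.0 8.0 106.0
```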

Nested ANOVA table

Source         SS            df         MS
Factor A       SSA           p-1        SSA/(p-1)
Factor B(A)    SSB(A)        p(q-1)     SSB(A)/(p(q-1))
Residual       SSResidual    pq(n-1)    SSResidual/(pq(n-1))

A fixed, B random:

MSA

MSB(A)

MSResidual

Expected Mean Squares
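For reference, the standard textbook expected mean squares for a balanced design with A fixed and B random can be written as follows. The notation is assumed here: $\sigma_\varepsilon^2$ is the residual variance, $\sigma_\beta^2$ the variance among levels of B within A, $\alpha_i$ the fixed effects of A, and p, q, n are as in the ANOVA table.

```latex
\begin{aligned}
E[MS_A] &= \sigma_\varepsilon^2 + n\,\sigma_\beta^2 + \frac{nq \sum_{i=1}^{p} \alpha_i^2}{p-1} \\
E[MS_{B(A)}] &= \sigma_\varepsilon^2 + n\,\sigma_\beta^2 \\
E[MS_{Residual}] &= \sigma_\varepsilon^2
\end{aligned}
```

The expectations of MSA and MSB(A) differ only by the term in $\alpha_i$, and those of MSB(A) and MSResidual only by the term in $\sigma_\beta^2$, which is what determines the denominator of each F-ratio.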

if no main effect of factor A, i.e., H0: $\mu_1 = \mu_2 = \dots = \mu_i = \mu$ ($\alpha_i = 0$) is true:

F-ratio: MSA / MSB(A) ≈ 1

if no effect of nested random factor B(A), i.e., H0: $\sigma_\beta^2 = 0$ is true:

F-ratio: MSB(A) / MSResidual ≈ 1

MSA

MSB(A)

MSResidual

Testing null hypotheses estimate parameters

H0 for A true; no variation among A (Ai's = 0):

    patches don't differ (Bj's = 0)
    patches do differ (Bj's ≠ 0)

H0 for A false; there is variation among A (Ai's ≠ 0):

    patches don't differ (Bj's = 0)
    patches do differ (Bj's ≠ 0)

4 possible outcomes

it doesn't matter whether the nested factor varies or not; you can still look for differences among treatments

e.g., compared to the 1-way design, the nested design un-confounds subsamples from true replicates

nested designs separate confounded additive factors

Treatment effects in nested designs

no effect of urchin density on percentage cover of filamentous algae

filamentous algal cover varies significantly from patch to patch

about 50% of the variance in percentage cover of algae is explained by differences between patches

remaining 50% is explained by differences at the scale of quadrats within patches

Results: Andrew & Underwood 1993

Source              df    MS      F       p       var. comp.    %
Density              3    4810    2.72    0.09    -             -
Patches(Density)    12    1770    5.93

Main effect:

planned contrasts & trend analyses as part of design

unplanned multiple comparisons if main F-ratio test significant

Nested effect:

usually random factor

usually of little interest in further tests

often can provide information on the characteristic spatial signal of a population

Additional Tests

what is the effect of schools on standardized tests? (i.e., do scores differ among schools?)

is the effect of school driven in part by differences in teachers?

Another worked example

Data: three schools, two teachers at each school, two scores per teacher

True data matrix: accounts for teachers not being the same at each school

Data format for statistics

Analysis of Variance

Source              Sum-of-Squares    df    Mean-Square    F-ratio     P
SCHOOL$                  156.50000     2       78.25000    11.17857    0.00947
TEACHER(SCHOOL$)         567.50000     3      189.16667    27.02381    0.00070
Error                     42.00000     6        7.00000

What does this mean???

ANOVA output

SYSTAT and other stats software generally will not automatically construct the F-ratio correctly

the correct F-ratio for school is: MSschool / MSteacher(school)
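Both versions of the school test can be recomputed by hand from the mean squares in the ANOVA output. The sketch below does this in Python; the p-value uses a closed form for the upper tail of the F distribution that holds only when the numerator df is 2 (as it is for school), which is a shortcut assumed for this example rather than anything the software reports. The function name `f_upper_tail_df1_2` is hypothetical.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with numerator df = 2, denominator df = df2.

    This closed form is valid only when the numerator df is exactly 2.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

ms_school, ms_teacher, ms_error = 78.25, 189.16667, 7.0

# Default output: school tested against the residual (6 df).
f_default = ms_school / ms_error
p_default = f_upper_tail_df1_2(f_default, 6)

# Nested test: school tested against teacher(school) (3 df).
f_nested = ms_school / ms_teacher
p_nested = f_upper_tail_df1_2(f_nested, 3)

print(f"default: F = {f_default:.5f}, p = {p_default:.5f}")
print(f"nested:  F = {f_nested:.5f}, p = {p_nested:.5f}")
```

The first line should reproduce the F and P shown for SCHOOL$ in the output (about 11.18 and 0.009); the second gives a much smaller F (about 0.41) with a large p-value (about 0.69), because the denominator is now the variation among teachers rather than among scores.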

Big effect of teacher!

What about effect of school?

Before (default output, school tested against the residual error):

Analysis of Variance

Source              Sum-of-Squares    df    Mean-Square    F-ratio     P
SCHOOL$                  156.50000     2       78.25000    11.17857    0.00947
TEACHER(SCHOOL$)         567.50000     3      189.16667    27.02381    0.00070
Error                     42.00000     6        7.00000

After (accounting for teacher effect, school tested against teacher(school)):

Test of Hypothesis

Source        SS           df    MS           F          P
Hypothesis    156.50000     2     78.25000    0.41366    0.69397
Error         567.50000     3    189.16667

No effect of school!

Power

more replication always gives you more power

but in nested ANOVA, there is replication at various levels

where does your power come from?

if you have nested factors within your treatments, you need to replicate the nested factor, not the subsamples
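The reason can be sketched from the variance of a treatment mean in a balanced nested design, $\sigma_\beta^2/q + \sigma_\varepsilon^2/(qn)$, a standard result with q units of the nested factor per treatment and n subsamples per unit. Only q reduces the first term. The Python example below compares doubling n with doubling q; the variance components are hypothetical values chosen for illustration.

```python
def var_treatment_mean(var_b, var_e, q, n):
    """Variance of a treatment mean with q nested units and n subsamples per unit."""
    return var_b / q + var_e / (q * n)

# Hypothetical variance components, equal at both levels.
var_b, var_e = 100.0, 100.0

base = var_treatment_mean(var_b, var_e, q=4, n=5)
more_subsamples = var_treatment_mean(var_b, var_e, q=4, n=10)
more_units = var_treatment_mean(var_b, var_e, q=8, n=5)

print(base, more_subsamples, more_units)  # 30.0 27.5 15.0
```

Under these assumed values, doubling the subsamples barely changes the precision of the treatment mean, while doubling the nested units halves its variance.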

used to provide information on the characteristic spatial signal of populations

other techniques (geostatistical models) also can do this, but nested models are very efficient

variance component models (part of nested) can provide the percent of variation that is associated with particular spatial scales

Spatially nested designs

[Figure: three regions (Region 1, Region 2, Region 3), each containing locations, sites within locations, and transects within sites]

Regions

Locations(Regions)

Sites(Locations(Regions))

Transects(Sites(Locations(Regions)))

What spatial scale is most of the variance associated with?

At what scale is most of the variance?

Source                        df    MS      F       Var. comp. (%)
Region                         2    6658    10.4    247 (42)
Location(Region)               6     638    2.43     71 (12)
Site(Location(Region))        18     263    1.40     88 (14)
Transect = Residual, Error    54     187            187 (32)
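The percentage column is each variance component divided by the sum of all components. A short Python sketch of that arithmetic, using the component values listed in the table (the dictionary keys are just labels chosen for this example):

```python
# Variance components as listed in the table.
components = {
    "Region": 247.0,
    "Location(Region)": 71.0,
    "Site(Location(Region))": 88.0,
    "Transect (residual)": 187.0,
}

total = sum(components.values())
percent = {name: 100.0 * value / total for name, value in components.items()}

for name, pct in percent.items():
    print(f"{name}: {pct:.1f}%")

# Scale with the largest share of the variance.
largest = max(percent, key=percent.get)
print(largest)  # Region
```

Percentages recomputed this way can differ from the tabulated ones by about a point, since the tabulated components are themselves rounded.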

Optimal Allocation of Replication at different levels

can calculate at what level it is best to spend your time or money on replication

must know:

variance at each level (var. components)

cost / effort to obtain replication at each hierarchical level
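For a two-level design (units of B within treatments, subsamples within units), the standard textbook result is that the number of subsamples per unit minimising the variance of a treatment mean for a fixed total cost is the square root of (cost per unit × residual variance) / (cost per subsample × variance among units). A minimal Python sketch follows; the function name `optimal_subsamples` and the cost and variance values are hypothetical.

```python
import math

def optimal_subsamples(var_b, var_e, cost_unit, cost_subsample):
    """Subsamples per nested unit that minimise the variance of a treatment
    mean for a fixed total cost, in a balanced two-level nested design."""
    return math.sqrt((cost_unit * var_e) / (cost_subsample * var_b))

# Hypothetical inputs: equal variance components, and setting up a new patch
# costs 9 times as much as taking one more quadrat.
n_opt = optimal_subsamples(var_b=100.0, var_e=100.0,
                           cost_unit=9.0, cost_subsample=1.0)
print(n_opt)  # 3.0
```

In practice the result is rounded to a whole number, and the remaining budget then sets how many nested units can be afforded.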
