13 Nested ANOVA 2012

  • Nested (hierarchical) ANOVA

    what it is

    how you do it

    variance components

    power and optimal allocation of replication

    Nested ANOVA

    designs with subsamples nested within replicates

    if the nesting is not acknowledged, these designs are pseudoreplicated

    nesting is usually spatial, but can be temporal

    variation is partitioned among hierarchical levels

  • not all levels of one factor are present in every level of another factor

    some levels occur only within particular levels of the other factor, and not in the others

    nested factors are usually random factors

    nested fixed factors require justification

    What is a nested factor?

    Nested: B(A)              Not nested: A x B

    Factor A   Factor B       Factor A   Factor B
    A          1              A          1
    A          2              A          2
    A          3              A          3
    B          4              B          1
    B          5              B          2
    B          6              B          3
    C          7              C          1
    C          8              C          2
    C          9              C          3
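The distinction between the two layouts above can be checked mechanically. The following is a minimal sketch, assuming the design is recorded as (Factor A level, Factor B level) pairs; the function name `is_nested` and the variable names are hypothetical and not part of the original slides.

```python
from collections import defaultdict

def is_nested(pairs):
    """Return True if every level of factor B occurs within exactly one level of factor A."""
    parents = defaultdict(set)
    for a_level, b_level in pairs:
        parents[b_level].add(a_level)
    return all(len(a_levels) == 1 for a_levels in parents.values())

# Nested layout: B levels 1-9, each unique to one level of A.
nested_layout = [("A", 1), ("A", 2), ("A", 3),
                 ("B", 4), ("B", 5), ("B", 6),
                 ("C", 7), ("C", 8), ("C", 9)]

# Crossed layout: the same B levels 1-3 recur in every level of A.
crossed_layout = [(a, b) for a in "ABC" for b in (1, 2, 3)]

print(is_nested(nested_layout))   # True
print(is_nested(crossed_layout))  # False
```

Note that the check relies on unique labels: if patches were numbered 1-4 within every treatment, a nested design would be coded like the crossed one, so nested levels need distinct identifiers in the data file.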

    creeks (tributaries) are unique to each river

    multiple samples of a single tissue type within a rat

    subsamples in time (if sampled without replacement) can be sampled at only one time and not another

    replicates are always nested within treatments, but we don't consider this nesting when we construct the ANOVA model

    Examples of Nesting

  • factors A & B

    factor A with p groups or levels

    factor B with q groups or levels within each level of A

    nested design: different (randomly chosen) levels of Factor B in each level of Factor A

    often one or more levels of subsampling

    Two factor nested ANOVA design

    Factor A (fixed): sea urchin density, four levels: 100% of original (control), 66%, 33%, 0%

    Factor B (random): randomly chosen patches, 4 within each treatment

    n = 5 quadrats / patch

    Example: sea urchin grazing on reefs, Andrew & Underwood (1993)

    effect of sea urchin density on the % cover of filamentous algae

  • Density: 100%, 66%, etc.

    Patch: 1 2 3 4 | 5 6 7 8 | etc. (4 patches within each density)

    Reps: n = 5 in each of 16 cells

    p = 4 densities, q = 4 patches

    layout: sea urchin grazing on reefs

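    $y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}$, where

    $\mu$: overall mean

    $\alpha_i$: effect of factor A ($\mu_i - \mu$)

    $\beta_{j(i)}$: effect of factor B within each level of A ($\mu_{ij} - \mu_i$)

    $\varepsilon_{ijk}$: unexplained variation (error term), the variation within each cell

    (% cover algae)$_{ijk}$ = $\mu$ + (sea urchin density)$_i$ + (patch within sea urchin density)$_{j(i)}$ + $\varepsilon_{ijk}$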

    Linear model

  • Main effect: effect of factor A, i.e., variation among factor A group means

    Nested (random) effect: effect of factor B within each level of factor A, i.e., variation among means of factor B within each level of A

    Effects

    Factor A:

    H0: no difference among means of factor A (= no difference among means of urchin density treatments)

    $\mu_1 = \mu_2 = \dots = \mu_i = \mu$

    is equivalent to

    H0: no main effect of factor A (no effect of urchin density): $\alpha_1 = \alpha_2 = \dots = \alpha_i = 0$, i.e., $\alpha_i = (\mu_i - \mu) = 0$

    Null hypotheses

    H0: Factor A: No difference in the mean amount of filamentous algae among the four sea urchin density treatments

    H0: Factor B: No difference in the mean amount of filamentous algae among all possible patches within any of the treatments

  • Factor B(A)

    H0: no difference among means of factor B within any level of factor A (no difference among patches in mean filamentous algae cover within any urchin density treatment)

    $\mu_{11} = \mu_{12} = \dots = \mu_{1j}$, $\mu_{21} = \mu_{22} = \dots = \mu_{2j}$, etc.

    H0: no variance among levels of nested random factor B within any level of factor A (no variance among patches within each density treatment):

    $\sigma_\beta^2 = 0$

    Null hypotheses

    SSTotal = SSA + SSB(A) + SSResidual

    SSA variation among A means

    SSB(A) variation among B means within each level of A

    SSResidual variation among replicates within each cell (each B(A))

    Partitioning total variation
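The partition can be verified numerically. The sketch below assumes a balanced design stored as a nested dictionary (A level, then B level, then replicates); the function name `nested_ss` and the tiny data set are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def nested_ss(data):
    """Partition sums of squares for a balanced design with B nested within A.

    `data` maps each level of A to a dict that maps each (unique) level of B
    to its list of replicate observations.
    """
    p = len(data)                                   # levels of A
    first_cells = next(iter(data.values()))
    q = len(first_cells)                            # levels of B within each A
    n = len(next(iter(first_cells.values())))       # replicates per cell

    values = [y for cells in data.values() for ys in cells.values() for y in ys]
    grand_mean = sum(values) / len(values)

    ss_a = ss_b_in_a = ss_residual = 0.0
    for cells in data.values():
        a_values = [y for ys in cells.values() for y in ys]
        a_mean = sum(a_values) / len(a_values)
        ss_a += q * n * (a_mean - grand_mean) ** 2            # among A means
        for ys in cells.values():
            cell_mean = sum(ys) / n
            ss_b_in_a += n * (cell_mean - a_mean) ** 2        # among B means within A
            ss_residual += sum((y - cell_mean) ** 2 for y in ys)  # within cells

    ss_total = sum((y - grand_mean) ** 2 for y in values)
    return {"SSA": ss_a, "SSB(A)": ss_b_in_a, "SSResidual": ss_residual,
            "SSTotal": ss_total, "df": (p - 1, p * (q - 1), p * q * (n - 1))}

# Hypothetical data: 2 treatments, 2 patches in each, 2 quadrats per patch.
example = {
    "low":  {"patch1": [10, 12], "patch2": [14, 16]},
    "high": {"patch3": [20, 22], "patch4": [18, 16]},
}
result = nested_ss(example)
print(result["SSA"], result["SSB(A)"], result["SSResidual"], result["SSTotal"])
# 72.0 32.0 8.0 112.0
```

Here 72 + 32 + 8 = 112, so the three components account for all of the total variation.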

  • Nested ANOVA table

    Source        SS           df        MS
    Factor A      SSA          p-1       SSA/(p-1)
    Factor B(A)   SSB(A)       p(q-1)    SSB(A)/(p(q-1))
    Residual      SSResidual   pq(n-1)   SSResidual/(pq(n-1))

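    A fixed, B random:

    MSA: $\sigma_\varepsilon^2 + n\sigma_\beta^2 + nq\sum_i \alpha_i^2/(p-1)$

    MSB(A): $\sigma_\varepsilon^2 + n\sigma_\beta^2$

    MSResidual: $\sigma_\varepsilon^2$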

    Expected Mean Squares

  • if there is no main effect of factor A, i.e., H0: $\mu_1 = \mu_2 = \dots = \mu_i = \mu$ ($\alpha_i = 0$) is true:

    F-ratio: MSA / MSB(A) ≈ 1

    if there is no effect of nested random factor B(A), i.e., H0: $\sigma_\beta^2 = 0$ is true:

    F-ratio: MSB(A) / MSResidual ≈ 1


    Testing null hypotheses estimate parameters
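As a small worked check of the degrees of freedom and of the choice of denominator, the sketch below uses the sea urchin design (p = 4, q = 4, n = 5) and the mean squares reported for it in the results table of this deck. The function names are hypothetical.

```python
def nested_df(p, q, n):
    """Degrees of freedom for A, B(A) and Residual in a balanced nested design."""
    return {"A": p - 1, "B(A)": p * (q - 1), "Residual": p * q * (n - 1)}

def f_ratio_for_a(ms_a, ms_b_in_a):
    """A fixed, B random: A is tested over the nested term, not over the residual."""
    return ms_a / ms_b_in_a

df = nested_df(p=4, q=4, n=5)
f_density = f_ratio_for_a(ms_a=4810, ms_b_in_a=1770)

print(df)                   # {'A': 3, 'B(A)': 12, 'Residual': 64}
print(round(f_density, 2))  # 2.72
```

The test of density therefore has 3 and 12 degrees of freedom: the 64 residual degrees of freedom from the quadrats do not enter it.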

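    [Figure: patch means B1, B2, ..., Bj within treatments A1, ..., Ai under each outcome]

    Factor A                               Nested factor B(A)     Effects
    H0 true: no variation among A          patches don't differ   Ai's = 0, Bj's = 0
    H0 true: no variation among A          patches do differ      Ai's = 0, Bj's ≠ 0
    H0 false: there is variation among A   patches don't differ   Ai's ≠ 0, Bj's = 0
    H0 false: there is variation among A   patches do differ      Ai's ≠ 0, Bj's ≠ 0

    4 possible outcomes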

  • it doesn't matter whether the nested factor varies or not; you can still look for differences among treatments

    e.g., compared to the 1-way design, the nested design un-confounds subsamples from true replicates

    nested designs separate confounded additive factors

    Treatment effects in nested designs

    no effect of urchin density on percentage cover of filamentous algae

    filamentous algal cover varies significantly from patch to patch

    about 50% of the variance in percentage cover of algae is explained by differences between patches

    remaining 50% is explained by differences at the scale of quadrats within patches

    Results: Andrew & Underwood 1993

    Source             df   MS     F      p      var. comp.   %
    Density            3    4810   2.72   0.09   -            -
    Patches(Density)   12   1770   5.93
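The "about 50%" figure can be roughly reproduced from the mean squares. This is a sketch under a stated assumption: MSResidual does not appear in the table as shown here, so it is back-calculated from the reported F-ratio (MSResidual = MSPatches / 5.93) and is therefore only approximate. The function name is hypothetical; n = 5 quadrats per patch comes from the design described earlier.

```python
def variance_components(ms_b_in_a, ms_residual, n):
    """Variance components for a random nested factor B(A) and the residual.

    Returns {source: (component, percent of total)}. Negative estimates
    for B(A) are truncated to zero.
    """
    var_b = max((ms_b_in_a - ms_residual) / n, 0.0)
    var_e = ms_residual
    total = var_b + var_e
    return {"B(A)": (var_b, 100 * var_b / total),
            "Residual": (var_e, 100 * var_e / total)}

ms_patches = 1770.0
ms_residual = ms_patches / 5.93   # ASSUMPTION: recovered from the reported F = 5.93

components = variance_components(ms_patches, ms_residual, n=5)
for source, (component, percent) in components.items():
    print(f"{source}: {component:.1f} ({percent:.0f}%)")
```

Under that assumption the patch and quadrat components each come out close to half of the total, in line with the interpretation given on the slide.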

  • Main effect:

    planned contrasts & trend analyses as part of design

    unplanned multiple comparisons if main F-ratio test is significant

    Nested effect:

    usually a random factor

    usually of little interest in further tests

    often can provide information on the characteristic spatial signal of a population

    Additional Tests

    what is the effect of schools on standardized tests? (i.e., do scores differ among schools?)

    is the effect of school driven in part by differences in teachers?

    Another worked example

  • Data: three schools, two teachers at each school, two scores per teacher

    True data matrix: accounts for teachers not being the same at each school

    Data format for statistics

    Analysis of Variance

    Source             Sum-of-Squares   df   Mean-Square   F-ratio    P
    SCHOOL$            156.50000        2    78.25000      11.17857   0.00947
    TEACHER(SCHOOL$)   567.50000        3    189.16667     27.02381   0.00070
    Error              42.00000         6    7.00000

    What does this mean???

    ANOVA output

  • SYSTAT and other stats software generally will not automatically construct the F-ratio correctly

    the correct F-ratio for school is: MSschool / MSteacher(school)

    Big effect of teacher!

    What about effect of school?

    Before:

    Analysis of Variance

    Source             Sum-of-Squares   df   Mean-Square   F-ratio    P
    SCHOOL$            156.50000        2    78.25000      11.17857   0.00947
    TEACHER(SCHOOL$)   567.50000        3    189.16667     27.02381   0.00070
    Error              42.00000         6    7.00000

    After (accounting for teacher effect):

    Test of Hypothesis

    Source       SS          df   MS          F         P
    Hypothesis   156.50000   2    78.25000    0.41366   0.69397
    Error        567.50000   3    189.16667

    No effect of school!
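The two versions of the school test differ only in the denominator of the F-ratio, which can be confirmed from the sums of squares and degrees of freedom in the output above. The sketch below is just arithmetic on those reported values; the dictionary keys and variable names are hypothetical.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA output.
ss = {"school": 156.5, "teacher(school)": 567.5, "error": 42.0}
df = {"school": 2, "teacher(school)": 3, "error": 6}
ms = {source: ss[source] / df[source] for source in ss}

# Software default: every term is tested over the error mean square.
f_school_default = ms["school"] / ms["error"]

# Nested test: school is tested over teacher(school).
f_school_nested = ms["school"] / ms["teacher(school)"]

# The nested random term is still tested over the error mean square.
f_teacher = ms["teacher(school)"] / ms["error"]

print(round(f_school_default, 5))  # 11.17857
print(round(f_school_nested, 5))   # 0.41366
print(round(f_teacher, 5))         # 27.02381
```

The nested test for school also has only 2 and 3 degrees of freedom, rather than 2 and 6, because teachers (not scores) are the replicates for schools.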

  • Power

    more replication always gives you more power

    but in nested ANOVA, there is replication at various levels

    where does your power come from?

    if you have nested factors within your treatments, you need to replicate the nested factor, not the subsamples
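To illustrate why replicating the nested factor matters more than adding subsamples, the sketch below uses the standard result that, in a balanced design with B random, the variance of a treatment mean is $\sigma_\beta^2/q + \sigma_\varepsilon^2/(qn)$. The variance values are hypothetical and the function names are illustrative only.

```python
def var_treatment_mean(var_b, var_e, q, n):
    """Variance of a treatment (factor A) mean: q nested units, n subsamples each."""
    return var_b / q + var_e / (q * n)

def df_for_test_of_a(p, q):
    """Numerator and denominator df of the F-test for A (tested over B(A))."""
    return (p - 1, p * (q - 1))

# Hypothetical variance components for patches and quadrats.
var_b, var_e = 300.0, 300.0

base = var_treatment_mean(var_b, var_e, q=4, n=5)
double_subsamples = var_treatment_mean(var_b, var_e, q=4, n=10)
double_nested = var_treatment_mean(var_b, var_e, q=8, n=5)

print(base, double_subsamples, double_nested)  # 90.0 82.5 45.0
print(df_for_test_of_a(p=4, q=4))              # (3, 12)
print(df_for_test_of_a(p=4, q=8))              # (3, 28)
```

With these values, doubling the quadrats per patch barely changes the precision of a treatment mean and leaves the test degrees of freedom untouched, whereas doubling the number of patches halves the variance and raises the denominator degrees of freedom from 12 to 28.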

    used to provide information on the characteristic spatial signal of populations

    other techniques (geostatistical models) also can do this, but nested models are very efficient

    variance component models (part of nested) can provide the percent of variation that is associated with particular spatial scales

    Spatially nested designs

  • [Figure: hierarchical sampling layout with Regions 1-3, each containing Locations, Sites, and Transects]

    Regions

    Locations(Regions)

    Sites(Locations(Regions))

    Transects(Sites(Locations(Regions)))

    What spatial scale is most of the variance associated with?

    At what scale is most of the variance?

    Source                         df   MS     F      Var. comp. (%)
    Region                         2    6658   10.4   247 (42)
    Location(Region)               6    638    2.43   71 (12)
    Site(Location(Region))         18   263    1.40   88 (14)
    Transect (= Residual, Error)   54   187           187 (32)

  • Optimal Allocation of Replication at different levels

    can calculate at what level it is best to spend your time or money on replication

    must know:

    variance at each level (var. components)

    cost / effort to obtain replication at each hierarchical level
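One way to carry out this calculation for a two-level design is the standard textbook cost-function result: the optimal number of subsamples per nested unit is $n = \sqrt{(c_{B(A)}\, s_\varepsilon^2) / (c_\varepsilon\, s_\beta^2)}$. This formula is not given in the slides and is used here as an assumption; the variance components, costs, budget, and function names below are all hypothetical.

```python
import math

def optimal_subsamples(var_b, var_e, cost_b, cost_e):
    """Optimal subsamples per nested unit given variance components and unit costs.

    var_b:  variance among nested units (e.g., patches)
    var_e:  variance among subsamples within units (e.g., quadrats)
    cost_b: cost of adding one nested unit
    cost_e: cost of taking one subsample
    """
    return math.sqrt((cost_b * var_e) / (cost_e * var_b))

def affordable_units(budget, n, cost_b, cost_e):
    """How many nested units fit in the budget when each holds n subsamples."""
    return int(budget // (cost_b + n * cost_e))

# Hypothetical inputs: equal variance components, patches 16x as costly as quadrats.
n_opt = optimal_subsamples(var_b=300, var_e=300, cost_b=16, cost_e=1)
n_used = max(1, round(n_opt))
units = affordable_units(budget=400, n=n_used, cost_b=16, cost_e=1)

print(n_opt)   # 4.0
print(units)   # 20
```

The general pattern is that more subsamples per unit are worthwhile only when units are expensive relative to subsamples or when within-unit variance is large relative to among-unit variance.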