IntroductionANOVA
One-Way ANOVATwo-Way ANOVA
Further ExtensionsUseful R-commands
Analysis of Variance
Janette Walde
Department of Statistics, University of Innsbruck
Janette Walde Analysis of Variance
Outline I
1 Introduction
   Problems
   What is Analysis of Variance
   Some Terminology
2 ANOVA
   Object of Investigation
   Exploratory Analysis
   Notation
   Assumptions
3 One-Way ANOVA
   Area of Application
   Hypothesis Testing
   Example
Outline II
   Post-Hoc Analysis
   Power Analysis
4 Two-Way ANOVA
   Terminology
   Assumptions
   Results
   Exploratory Analysis
   Example
5 Further Extensions
6 Useful R-commands
Problems/Questions
Do fertilizers have different effects on different kinds of wheat?
Do females react differently to anti-cancer drugs than males?
Does water evaporation from soil depend on the kind of vegetation growing on it, controlling for climate conditions?
A new treatment meant to help those with chronic arthritis pain was developed and tested for its long-term effectiveness. Participants in the experiment rated their level of pain on a scale from 0 (no pain) to 9 (extreme pain) at three-month intervals. Was the treatment effective?
Does the exposure of plants to various amounts of CO2 affect characteristics of the plant?
What is ANOVA?
ANalysis Of VAriance.
Partitions the observed variance based on explanatory (independent) variables.
Compares the partitions to test the significance of the explanatory variables.
Some Terminology
Between-subject design - each subject participates in one and only one group.
Within-subject design - the same group of subjects serves in more than one treatment. Subject is now a factor.
Mixed design - a study which has both between- and within-subject factors.
Repeated measures - general term for any study in which multiple measurements are taken on the same subject. These can be either multiple treatments or several measurements over time.
Object of investigation
Use variances and variance-like quantities to study the equality or non-equality of population means.
So, although it is called analysis of variance, we are actually analyzing means, not variances.
There are other methods which analyze the variances between groups.
Typical exploratory analyses include
Tabulation of the number of subjects in each experimental group.
Side-by-side box plots.
Statistics about each group.
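The three exploratory steps above can be sketched in R as follows. This is only an illustration: the data are the three rows per group printed in the pea example table of these slides, not the full data set of 10 measurements per group, and the names `peas`, `length` and `group` are chosen here for the sketch.

```r
# Only the first three measurements per group, as printed in the pea example table.
peas <- data.frame(
  length = c(71, 68, 70,  57, 58, 60,  58, 61, 56,  58, 59, 58,  62, 66, 65),
  group  = factor(rep(c("control", "2% glucose", "2% fructose",
                        "1% glu + 2% sac", "1% fructose"), each = 3))
)

# 1. Tabulate the number of subjects per group.
group_n <- table(peas$group)

# 2. Side-by-side box plots.
boxplot(length ~ group, data = peas, ylab = "length (ocular units)")

# 3. Statistics about each group (mean and variance).
group_mean <- tapply(peas$length, peas$group, mean)
group_var  <- tapply(peas$length, peas$group, var)

print(data.frame(n        = as.vector(group_n),
                 mean     = as.vector(group_mean),
                 variance = as.vector(group_var),
                 row.names = names(group_n)))
```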
Notation
If we have K groups, denote the means of the groups as $\mu_1, \mu_2, \ldots, \mu_K$.
Subject i in group j has observation $y_{ij}$:
$$y_{ij} = \mu_j + \varepsilon_{ij}$$
where the $\varepsilon_{ij}$ are independently distributed $N(0, \sigma^2)$.
Combining these, subjects from group j have distribution $N(\mu_j, \sigma^2)$.
With random assignment, the sample mean for any treatment group is representative of the population mean for that group.
Assumptions
1 The errors $\varepsilon_{ij}$ are normally distributed.
2 Across the conditions the errors have equal spread, often referred to as equal variances.
   Rule of thumb: the assumption is met if the largest variance is less than twice the smallest variance.
   If the variances are unequal, a correction is needed. This is usually done by testing at α/2.
3 The errors are independent of each other.
Checking the assumptions
Use the residuals, which are the estimates of $\varepsilon_{ij}$.
1 Normality: look at a normal probability plot of the residuals.
2 Equal spread: look at a plot of residuals versus fitted values.
3 Independence: hard to check, often assumed from the study design.
For mild violations of the assumptions there are options for correction.
When the assumptions are NOT met, the p-values are simply wrong.
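A minimal sketch of these checks in R, again using only the three rows per group printed in the pea example table (so the plots are illustrative and will not match the figures of the full analysis). The variable names are assumptions made for this sketch.

```r
# First three measurements per group from the pea example table (not the full data).
peas <- data.frame(
  length = c(71, 68, 70,  57, 58, 60,  58, 61, 56,  58, 59, 58,  62, 66, 65),
  group  = factor(rep(c("control", "2% glucose", "2% fructose",
                        "1% glu + 2% sac", "1% fructose"), each = 3))
)

fit <- aov(length ~ group, data = peas)
res <- residuals(fit)               # estimates of the errors

# 1. Normality: normal probability (Q-Q) plot of the residuals.
qqnorm(res)
qqline(res)

# 2. Equal spread: residuals versus fitted values.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Rule of thumb from the slides: largest variance < 2 * smallest variance.
group_var <- tapply(peas$length, peas$group, var)
var_ratio <- max(group_var) / min(group_var)
print(var_ratio)
```

With only three observations per group the variance ratio is very noisy, so the rule of thumb is better judged on the full data.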
Basics
One-way ANOVA is used when
* only the effect of one explanatory variable is tested;
* each subject has only one treatment or condition, i.e. a between-subject design.
Used to test for differences among two or more independent groups.
Gives the same results as the two-sample t-test (with equal variances) if the explanatory variable has two levels.
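The equivalence with the two-sample t-test can be checked numerically: for two groups, the ANOVA F-statistic equals the square of the pooled-variance t-statistic, and the p-values agree. The sketch below uses the three printed values of the control and 1% fructose groups from the pea example; the object names are chosen for illustration.

```r
# Two groups only: first three printed values of "control" and "1% fructose".
two <- data.frame(
  length = c(71, 68, 70, 62, 66, 65),
  group  = factor(rep(c("control", "1% fructose"), each = 3))
)

# Two-sample t-test with pooled (equal) variances.
t_res <- t.test(length ~ group, data = two, var.equal = TRUE)

# One-way ANOVA table for the same data.
f_tab <- anova(lm(length ~ group, data = two))

t_sq  <- unname(t_res$statistic)^2   # squared t-statistic
f_val <- f_tab[["F value"]][1]       # F-statistic for the group effect
p_t   <- t_res$p.value
p_f   <- f_tab[["Pr(>F)"]][1]

print(c(t_squared = t_sq, F = f_val, p_t = p_t, p_F = p_f))
```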
Hypothesis
H0 : µ1 = µ2 = ... = µK
H1: The µ’s are not all equal.
The null hypothesis is called the overall null and is the hypothesis tested by ANOVA.
If the overall null is rejected, you must do more specific hypothesis testing to determine which means are different, often referred to as contrasts.
Terminology
The sample variance is the sum of the squared deviations from the mean divided by the number of observations minus 1:
$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$
A mean square (MS) is a variance-like quantity calculated as the sum of the squared deviations (SS) divided by the degrees of freedom (df):
$$MS = \frac{SS}{df}$$
Within versus Between
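In one-way ANOVA we work with two mean square quantities:
* $MS_{within}$ ... the mean square within groups
* $MS_{between}$ ... the mean square between groups
For each individual group i (here i indexes the group and j the observation within the group) we have
$$\frac{SS_i}{df_i} = \frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1}$$
So the estimate of $MS_{within}$ is
$$MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{\sum_{i=1}^{K} SS_i}{N - K}$$
And the estimate of $MS_{between}$ is
$$MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{\sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2}{K - 1}$$
where $\bar{x}$ is the overall mean of all N observations.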
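As a sanity check on these formulas, the two mean squares can be computed by hand and compared with R's own ANOVA table. The sketch uses the three printed rows per group of the pea example (not the full data), and the variable names are chosen for illustration.

```r
# First three measurements per group from the pea example table.
x <- c(71, 68, 70,  57, 58, 60,  58, 61, 56,  58, 59, 58,  62, 66, 65)
g <- factor(rep(c("control", "2% glucose", "2% fructose",
                  "1% glu + 2% sac", "1% fructose"), each = 3))

N <- length(x)     # total number of observations
K <- nlevels(g)    # number of groups

n_i    <- tapply(x, g, length)                           # group sizes
xbar_i <- tapply(x, g, mean)                             # group means
xbar   <- mean(x)                                        # overall mean
ss_i   <- tapply(x, g, function(v) sum((v - mean(v))^2)) # SS within each group

ms_within  <- sum(ss_i) / (N - K)
ms_between <- sum(n_i * (xbar_i - xbar)^2) / (K - 1)
f_by_hand  <- ms_between / ms_within

# Reference: the ANOVA table computed by R.
tab <- anova(lm(x ~ g))
print(tab)
print(c(ms_between = ms_between, ms_within = ms_within, F = f_by_hand))
```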
Mean Squares
What do these values mean?
$MS_{within}$ is considered a true estimate of $\sigma^2$ that is unaffected by whether the null or alternative hypothesis is true.
$MS_{between}$ is considered a good estimate of $\sigma^2$ only when the null hypothesis is true. If the alternative is true, values of $MS_{between}$ tend to be inflated.
Thus, we can look at the ratio of the two mean square values to evaluate the null hypothesis.
Testing the Hypothesis
The F-test looks at the variation among the group means relative to the variation within the groups:
$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/df_{between}}{SS_{within}/df_{within}} = \frac{SS_{between}/(K-1)}{SS_{within}/(N-K)}$$
The F-statistic tends to be larger if the alternative hypothesis is true than if the null hypothesis is true.
Under the null hypothesis, the test statistic F has an $F(K-1, N-K)$ distribution.
What does the F ratio tell us?
F = MSbetween/MSwithin
The denominator is always an estimate of $\sigma^2$ (under both the null and alternative hypotheses).
The numerator is either another estimate of $\sigma^2$ (under the null) or is inflated (under the alternative).
If the null is true, values of F are close to 1.
If the alternative is true, values of F are larger.
How large F must be to count as large depends on the degrees of freedom.
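To illustrate the last point, the 5% critical value of the F distribution can be looked up with `qf()` for different denominator degrees of freedom. The numerator df of 4 and denominator df of 45 correspond to the pea example (K = 5, N = 50); the other two denominator values are arbitrary choices for comparison.

```r
alpha <- 0.05

# Critical values: reject H0 at level alpha if the observed F exceeds them.
crit_small <- qf(1 - alpha, df1 = 4, df2 = 10)    # few observations
crit_peas  <- qf(1 - alpha, df1 = 4, df2 = 45)    # pea example: K - 1 = 4, N - K = 45
crit_large <- qf(1 - alpha, df1 = 4, df2 = 1000)  # many observations

print(c(df2_10 = crit_small, df2_45 = crit_peas, df2_1000 = crit_large))
```

The threshold shrinks as the denominator degrees of freedom grow, so the same F value can be significant in a large study and not in a small one.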
The ANOVA table
When running an ANOVA, statistical packages will return an ANOVA table summarizing the SS, MS, df, F-statistic, and p-value:

                             SS                      df                      MS          F                     Sig.
Group (Treatment, between)   SS_between              df_between              MS_between  MS_between/MS_within  p-value
Residual (Error, within)     SS_within               df_within               MS_within
Total                        SS_between + SS_within  df_between + df_within
Example
The data are gathered from a plant physiology experiment which investigated the effect of various sugars on the growth of peas. Growth or length is measured in 'ocular units'. Five groups are analyzed: a control group and four groups varying in the kind of sugar and its amount. In each group there are 10 measurements.

subject  control  2%       2%        1% glucose +   1%
         group    glucose  fructose  2% saccharose  fructose
1        71       57       58        58             62
2        68       58       61        59             66
3        70       60       56        58             65
...      ...      ...      ...       ...            ...
Example
We want to know whether the means of the variable 'length' differ significantly across the groups, i.e. does the supply of sugar influence the growth of the peas?
Use 5 groups: control group, group 1, group 2, group 3, and group 4.
H0: Growth is independent of the sugar supply.
H0: $\mu_{control group} = \mu_{group 1} = \mu_{group 2} = \mu_{group 3} = \mu_{group 4}$
H1: Growth varies across groups.
H1: At least one of the means is different.
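Box plots
[Figure: side-by-side box plots of length by group. Group labels visible on the x-axis include "1% fructose", "2% fructose", and "control group"; the y-axis runs from about 60 to 70.]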
Summary
The largest variance is less than twice the smallest variance (2.2 < 2 · 1.4 = 2.8). Use α = 0.05.

Groups         n_i  Mean  Variance
Control group  10   70.1  2.2
Group 1        10   64.1  1.8
Group 2        10   58.0  1.4
Group 3        10   58.2  1.9
Group 4        10   59.3  1.6
Degrees of Freedom
How many groups do we have? There are K = 5 groups.
What is the sample size? There are N = 50 peas.
Using these values:
What is $df_{between}$? K − 1 = 5 − 1 = 4
What is $df_{within}$? N − K = 50 − 5 = 45
Sample Output
                             SS      df  MS       F       Sig.
Group (Treatment, between)   1077.3  4   269.330  82.168  0.000
Residual (Error, within)     147.5   45  3.278
Total                        1224.8  49

Our estimate of $\sigma^2$ is approximately 3.3.
The numerator MS = 269.330 appears to be highly inflated.
Results
F-statistic = 82.17.
p-value: < 0.001, well below α = 0.05.
Conclusion - the growth differs for at least one of the groups.
To make stronger statements, further testing is needed.
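The F-statistic and p-value can be recomputed from the sums of squares and degrees of freedom reported in the sample output, and the between-groups sum of squares can be recovered from the group means in the summary table. This is a sketch for checking the arithmetic; the variable names are chosen here.

```r
# Values taken from the sample output table.
ss_between <- 1077.3; df_between <- 4
ss_within  <- 147.5;  df_within  <- 45

ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat     <- ms_between / ms_within

# Upper-tail probability of the F(4, 45) distribution.
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# SS_between recomputed from the group means (n_i = 10 in every group).
group_means     <- c(70.1, 64.1, 58.0, 58.2, 59.3)
ss_between_chk  <- 10 * sum((group_means - mean(group_means))^2)

print(c(MS_between = ms_between, MS_within = ms_within, F = f_stat))
print(p_value)
print(ss_between_chk)
```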
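Checking the assumptions
[Figure: two diagnostic plots for the one-way ANOVA fit. Left panel "Normal Q-Q": standardized residuals against theoretical quantiles. Right panel "Residuals vs Fitted": residuals against fitted values (about 58 to 70). Observations 4, 8, and 34 are labeled in both panels.]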
Further Analysis
If H0 is rejected, we conclude that not all the µ’s are equal.
We would like to make statements about where the differences are.
Can use planned or unplanned comparisons (or contrasts).
* Planned comparisons are interesting comparisons decided on before the analysis.
* Unplanned comparisons occur after seeing the results. Be careful not to go fishing for results!
Contrasts
A simple contrast hypothesis compares two population means:
* H0 : µ1 = µ5
A complex contrast hypothesis has multiple population means on either side:
* H0 : (µ1 + µ2)/2 = µ3
* H0 : (µ1 + µ2)/2 = (µ3 + µ4 + µ5)/3
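A complex contrast can be tested with the usual textbook t-statistic: the weighted sum of group means divided by its standard error, which is based on $MS_{within}$. The sketch below is an assumption-laden illustration, not the slides' own procedure: it uses the three printed rows per group of the pea example, and the choice of contrast (average of the two 2% single-sugar groups versus control) is made up for the example.

```r
# First three measurements per group from the pea example table.
x <- c(71, 68, 70,  57, 58, 60,  58, 61, 56,  58, 59, 58,  62, 66, 65)
lev <- c("control", "2% glucose", "2% fructose", "1% glu + 2% sac", "1% fructose")
g <- factor(rep(lev, each = 3), levels = lev)

# Contrast weights in the order of levels(g); they must sum to zero.
# H0: (mu_glucose + mu_fructose) / 2 = mu_control
w <- c(-1, 0.5, 0.5, 0, 0)

n_i       <- as.vector(tapply(x, g, length))
xbar_i    <- as.vector(tapply(x, g, mean))
df_within <- length(x) - nlevels(g)
ms_within <- sum(tapply(x, g, function(v) sum((v - mean(v))^2))) / df_within

estimate <- sum(w * xbar_i)                     # estimated contrast
std_err  <- sqrt(ms_within * sum(w^2 / n_i))    # its standard error
t_stat   <- estimate / std_err
p_value  <- 2 * pt(-abs(t_stat), df = df_within)  # two-sided p-value

print(c(estimate = estimate, se = std_err, t = t_stat, p = p_value))
```

If several such contrasts are tested, the p-values still need a multiple-comparison adjustment.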
Post-Hoc Analysis
Coefficients:
                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        64.1      0.5725      111.961  < 2e-16
"1% glu, 2% sac"   -6.1      0.8097      -7.534   1.66e-09
"2% fructose"      -5.9      0.8097      -7.287   3.83e-09
"2% glucose"       -4.8      0.8097      -5.928   3.99e-07
"control group"     6.0      0.8097       7.410   2.52e-09

Residual standard error: 1.81 on 45 degrees of freedom
Multiple R-squared: 0.8796, Adjusted R-squared: 0.8689
F-statistic: 82.17 on 4 and 45 DF, p-value: < 2.2e-16
Post-Hoc Analysis, cont.
What if we notice a possibly interesting difference when looking at the results?
Can do comparisons, but need to adjust the α-level to control the Type I error.
Bonferroni correction for the number of comparisons done:
$$\alpha^* = \frac{\alpha}{\text{number of comparisons}}$$
The Bonferroni-Holm correction is a stepwise variant of this adjustment.
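In R, both corrections are available through `p.adjust()`, which adjusts the p-values instead of the α-level (the decision is the same). The raw p-values below are made up purely for illustration.

```r
alpha <- 0.05
p_raw <- c(0.01, 0.02, 0.03, 0.20)   # hypothetical raw p-values of 4 comparisons
m     <- length(p_raw)

# Bonferroni: compare each raw p-value with alpha / m ...
alpha_star   <- alpha / m
reject_bonf1 <- p_raw < alpha_star

# ... or, equivalently, multiply the p-values by m and compare with alpha.
p_bonf       <- p.adjust(p_raw, method = "bonferroni")
reject_bonf2 <- p_bonf < alpha

# Holm: stepwise version, never more conservative than Bonferroni.
p_holm <- p.adjust(p_raw, method = "holm")

print(rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm))
```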
Other Options
One common method is to use Tukey's simultaneous confidence intervals to compare any and all pairs of group population means. This procedure takes multiple comparisons into consideration to preserve the α-level.
Dunnett's test.
Scheffé procedure.
Bonferroni-Holm correction for previous example
Pairwise comparisons using t tests with pooled SD
data: length and group_name

                   "1% fru"  "1% glu, 2% sac"  "2% fru"  "2% glu"
"1% glu, 2% sac"   1.2e-08   -                 -         -
"2% fructose"      1.9e-08   0.81              -         -
"2% glucose"       1.6e-06   0.35              0.36      -
"control group"    1.5e-08   < 2e-16           < 2e-16   2.4e-16

P value adjustment method: holm
Comparison to Regression Analysis
The conclusions about the overall null hypothesis will be the same.
In regression we can make statements comparing groups to the baseline.
To make more conclusive statements, more analysis is needed.
ANOVA with either planned or post-hoc comparisons achieves the same and is often easier.
One-way ANOVA Power
Two different TOEFL prep courses charge $1200 for a two-month course. An (unethical) experiment would be to randomize students into one of the two courses or no course.
What information is needed to calculate power for this one-way ANOVA?
* Sample size
* Within-group variance ($\sigma^2$)
* Estimated or minimally interesting outcome means for each group.
Estimate of σ2
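Based on previous years, we know that 95% of the student scores on TOEFL fall between 900 and 1500. About 95% of a normal distribution lies within two standard deviations of the mean, so this range spans roughly 4σ:
$$\sigma = (1500 - 900)/4 = 150$$
$$\sigma^2 = 150^2$$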
Minimally interesting outcome
What is the minimal average benefit, in points gained, that would justify the program?
The minimally interesting outcome is based on previous knowledge.
For this example we'll try several different values.
Computing the Power
Different applets define things slightly differently (http://www.epibiostat.ucsf.edu/biostat/sampsize.html).
The applet I used ('nQuery') requires 'sd[treatment]'. From their definition this is calculated as:
$$sd[treatment] = \sqrt{\frac{\sum_{i=1}^{K} (\mu_i - \mu)^2}{K}}$$
$\mu_i$ ... mean of group i
$\mu$ ... overall mean
K ... number of groups
Ready to go to the power applet.
Computing the Power, Cont. I
σ     n     effect (points)  Power
150   50    50               38%
150   100   50               68%
150   50    100              94%
150   50    25               12%
100   50    50               73%
Computing the Power, Cont. II
σ     n     effect (points)  Power
100   100   50               96%
100   50    100              99%
100   50    25               23%
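A comparable calculation can be sketched in R with `power.anova.test()` from the stats package. Several things here are assumptions, since the slides do not state them: that n is the sample size per group, that both courses raise the mean by `effect` points over no course, and that α = 0.05. The helper `anova_power()` is a hypothetical name. Because the applet's settings are unknown, the resulting values need not reproduce the percentages listed above; the sketch is meant to show how power reacts to σ, n, and the effect.

```r
# Hypothetical helper: power of a one-way ANOVA with three groups.
# Assumed group means: no course = 0, course A = effect, course B = effect.
anova_power <- function(sigma, n, effect, sig_level = 0.05) {
  mu <- c(0, effect, effect)
  power.anova.test(groups      = length(mu),
                   n           = n,          # observations per group (assumption)
                   between.var = var(mu),    # variance of the group means (divisor K - 1)
                   within.var  = sigma^2,
                   sig.level   = sig_level)$power
}

# sd[treatment] as defined on the slide (divisor K), for the assumed means.
mu <- c(0, 50, 50)
sd_treatment <- sqrt(sum((mu - mean(mu))^2) / length(mu))
print(sd_treatment)

# Power for a few of the settings tried on the slides.
settings <- data.frame(sigma  = c(150, 150, 150, 150, 100),
                       n      = c(50, 100, 50, 50, 50),
                       effect = c(50, 50, 100, 25, 50))
settings$power <- mapply(anova_power, settings$sigma, settings$n, settings$effect)
print(settings)
```

Note that `between.var` uses the divisor K − 1, whereas the slide's sd[treatment] uses K, which is one reason different tools expect slightly different inputs.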
Moving past One-way ANOVA
What if we have two categorical explanatory variables?
What if we have categorical and quantitative explanatory variables?
What if subjects have more than one treatment?
What if there is more than one response variable?
And many other combinations...
Two-way ANOVA
Two-way (or multi-way) ANOVA is an appropriate analysis method for a study with a quantitative outcome and two (or more) categorical explanatory variables.
Suppose we now have two categorical explanatory variables:
Is there a significant X1 effect?
Is there a significant X2 effect?
Are there significant interaction effects?
If X1 has k levels and X2 has m levels, then the analysis is often referred to as a 'k by m ANOVA' or 'k × m ANOVA'.
Terminology
If the interaction is significant, the model is called an interaction model.
If the interaction is not significant, the model is called an additive model.
Explanatory variables are often referred to as factors.
Assumptions
The assumptions are the same as in one-way ANOVA:
1 The errors $\varepsilon_{ij}$ are normally distributed.
2 Across the conditions, the errors have equal spread, often referred to as equal variances.
3 The errors are independent of each other.
Results
Results are again displayed in an ANOVA table.
The table has one line for each term in the model. For a model with two factors, we will have one line for each factor and one line for the interaction. We will also have a line for the error and the total.
See next page.
The ANOVA table
              SS  df              MS  F  Sig.
Factor 1          k − 1
Factor 2          m − 1
Interaction       (k − 1)(m − 1)
Error             N − k · m       ?
Total             N − 1

The MS(error), denoted by ? in the above table, is the true estimate of $\sigma^2$.
The MS in each row is that row's SS/df.
The F-statistic is the MS/MS(error).
Exploratory Analysis
Table of means
Interaction or profile plots
* An interaction plot is a way to look at outcome means for two factors simultaneously.
* A plot with parallel lines suggests an additive model.
* A plot with non-parallel lines suggests an interaction model.
* Note that an interaction plot should NOT be the deciding factor in whether or not to run an interaction model.
Example
Do anti-cancer drugs have different effects on males and females? Three different types of drugs are given to patients with cancer. The diameter of the tumor is measured.
X1: Kind of drug - 3 levels
X2: Gender - 2 levels
Response: Tumor diameter
We will fit a 3 by 2 ANOVA.
Table of means and counts
                Male    Female  Overall
Cisplatin       66.875  60.0    63.4375
Vinblastine     66.875  62.5    64.6875
5-fluorouracil  40.625  57.5    49.0625
Overall         58.125  60.0    59.0625

Note: this table should also include the standard error of each of the means.

                Male  Female
Cisplatin       8     8
Vinblastine     8     8
5-fluorouracil  8     8
Interaction plots
[Figure: two interaction plots of mean tumor diameter (y-axis, about 40 to 65). Left: drug_name on the x-axis ("5-fluorouracil", "cisplatin", "vinblastine") with one line per gender_name ("male", "female"). Right: gender_name on the x-axis ("female", "male") with one line per drug_name ("cisplatin", "vinblastine", "5-fluorouracil").]
Interaction plots
There are two ways to do an interaction plot. Both are legitimate. Ease of interpretation is the final criterion for choosing between them.
If one explanatory variable has more levels than the other, interpretation is often easier if the explanatory variable with more levels defines the x-axis.
If one explanatory variable is quantitative but has been categorized and the other is categorical, interpretation is often easier if the categorized quantitative variable defines the x-axis. Example: age, 20 − 29, 30 − 39, 40 − 49, etc.
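Both versions of the plot can be sketched from the cell means in the table of means alone. The raw tumor diameters are not reproduced in these slides, so the sketch feeds one value per cell (the cell mean) into `interaction.plot()`; the object names are chosen for illustration. Because every cell has 8 patients, the marginal means are simple averages of the cell means.

```r
# Cell means from the table of means (rows: drug, columns: gender).
cell_means <- matrix(c(66.875, 60.0,
                       66.875, 62.5,
                       40.625, 57.5),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(drug   = c("cisplatin", "vinblastine", "5-fluorouracil"),
                                     gender = c("male", "female")))

# Marginal means (valid as plain averages because the design is balanced).
drug_means   <- rowMeans(cell_means)
gender_means <- colMeans(cell_means)
print(drug_means)
print(gender_means)

# Long format: one row per cell, columns drug, gender, Freq (= cell mean).
cells <- as.data.frame(as.table(cell_means))

# Version 1: drug (more levels) on the x-axis, one line per gender.
interaction.plot(cells$drug, cells$gender, cells$Freq,
                 xlab = "drug", trace.label = "gender", ylab = "mean of diameter")

# Version 2: gender on the x-axis, one line per drug.
interaction.plot(cells$gender, cells$drug, cells$Freq,
                 xlab = "gender", trace.label = "drug", ylab = "mean of diameter")
```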
Results
Output:
                                   Df  Sum Sq   Mean Sq  F value  Pr(>F)
as.factor(drug)                    2   2412.50  1206.25  16.5260  5.077e-06
as.factor(gender)                  1   42.19    42.19    0.5780   0.4513514
as.factor(drug):as.factor(gender)  2   1362.50  681.25   9.3333   0.0004429
Residuals                          42  3065.62  72.99

The last column contains the p-values.
* Always check the interaction first!
* If the interaction is not significant, rerun without it.
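The entries of this table follow the generic two-way layout: each MS is SS/df, each F is MS/MS(error), and the degrees of freedom are k − 1, m − 1, (k − 1)(m − 1) and N − k · m with k = 3 drugs, m = 2 genders and N = 48 patients. A short sketch that recomputes them from the printed sums of squares (variable names chosen here):

```r
k <- 3; m <- 2; N <- 48   # 3 drugs, 2 genders, 8 patients in each of the 6 cells

# Sums of squares copied from the output above.
ss       <- c(drug = 2412.50, gender = 42.19, interaction = 1362.50)
ss_error <- 3065.62

# Degrees of freedom from the generic two-way ANOVA table.
df       <- c(drug = k - 1, gender = m - 1, interaction = (k - 1) * (m - 1))
df_error <- N - k * m

ms       <- ss / df
ms_error <- ss_error / df_error
f_stat   <- ms / ms_error
p_value  <- pf(f_stat, df, df_error, lower.tail = FALSE)

print(data.frame(Df = df, SumSq = ss, MeanSq = ms, F = f_stat, p = p_value))
```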
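Checking the assumptions
[Figure: two diagnostic plots for the two-way ANOVA fit. Left panel "Normal Q-Q": standardized residuals against theoretical quantiles. Right panel "Residuals vs Fitted": residuals (about −20 to 20) against fitted values (about 40 to 65). Observations 25, 27, and 9 are labeled in both panels.]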
Notes
The main effects should always be kept if the interaction is significant.
Note that because the subjects fall into groups, you will see vertical lines in the residual versus predicted plot. This is because all subjects with a particular combination of the factors have the same predicted value.
Post-hoc Comparisons
You can use Tukey HSD (Tukey Honestly Significant Differences) tests to calculate post-hoc comparisons on each factor in the model. You can specify specific factors as an option.

                                                 diff     lwr      upr     p adj
"cisplatin":female - "5-fluorouracil":female     2.500    -10.252  15.252  0.991
"vinblastine":female - "5-fluorouracil":female   5.000    -7.752   17.752  0.848
"5-fluorouracil":male - "5-fluorouracil":female  -16.875  -29.627  -4.123  0.004
...
Extensions
Analysis of Covariance
* At least one quantitative and one categorical explanatory variable are included in the model.
* In general, the main interest is in the effects of the categorical variable, and the quantitative variable is considered to be a control variable.
* It is a blending of regression and ANOVA.
Multivariate designs: MANOVA/MANCOVA
R-commands I
boxplot(length ~ group_name)   # side-by-side box plots
peas.aov <- aov(length ~ group_name, data = peas.data)   # one-way ANOVA
pairwise.t.test(length, group_name, p.adj = "holm")   # pairwise t-tests, Holm-adjusted
tapply(diameter, interaction(drug_name, gender_name), mean)   # table of cell means
interaction.plot(as.factor(drug_name), as.factor(gender_name), diameter)   # interaction plot
R-commands II
cancer.aovfit1 <- aov(diameter ~ as.factor(drug_name) * as.factor(gender_name))
summary(cancer.aovfit1)
plot(cancer.aovfit1, which = 2)
TukeyHSD(cancer.aovfit1)

cancer.aovfit2 <- aov(diameter ~ as.factor(drug_name) + as.factor(gender_name) + as.factor(drug_name):as.factor(gender_name))
summary(cancer.aovfit2)