An introduction to Models
ANOVA, Regression and Models
fertil  yield
1       6.27
1       5.36
1       6.39
1       4.85
1       5.99
1       7.14
1       5.08
1       4.07
1       4.35
1       4.95
2       3.07
2       3.29
2       4.04
2       4.19
2       3.41
2       3.75
2       4.87
2       3.94
2       6.28
2       3.15
3       4.04
3       3.79
3       4.56
3       4.55
3       4.55
3       4.53
3       3.53
3       3.71
3       7.00
Error variation is caused by forces other than fertiliser
Treatment variation is caused by fertiliser
Plot  Fertil     M     F     Y     MY     MF     FY
   1       1  4.64  5.45  6.27   1.63   0.80   0.82
   2       1  4.64  5.45  5.36   0.72   0.80  -0.09
   3       1  4.64  5.45  6.39   1.75   0.80   0.94
   4       1  4.64  5.45  4.85   0.21   0.80  -0.60
   5       1  4.64  5.45  5.99   1.35   0.80   0.54
   6       1  4.64  5.45  7.14   2.50   0.80   1.69
   7       1  4.64  5.45  5.08   0.44   0.80  -0.37
   8       1  4.64  5.45  4.07  -0.57   0.80  -1.38
   9       1  4.64  5.45  4.35  -0.29   0.80  -1.10
  10       1  4.64  5.45  4.95   0.31   0.80  -0.50
  11       2  4.64  4.00  3.07  -1.57  -0.64  -0.93
  12       2  4.64  4.00  3.29  -1.35  -0.64  -0.71
  13       2  4.64  4.00  4.04  -0.60  -0.64   0.04
  14       2  4.64  4.00  4.19  -0.45  -0.64   0.19
  15       2  4.64  4.00  3.41  -1.23  -0.64  -0.59
  16       2  4.64  4.00  3.75  -0.89  -0.64  -0.25
  17       2  4.64  4.00  4.87   0.23  -0.64   0.87
  18       2  4.64  4.00  3.94  -0.70  -0.64  -0.06
  19       2  4.64  4.00  6.28   1.64  -0.64   2.28
  20       2  4.64  4.00  3.15  -1.49  -0.64  -0.85
  21       3  4.64  4.49  4.04  -0.60  -0.16  -0.45
  22       3  4.64  4.49  3.79  -0.85  -0.16  -0.70
  23       3  4.64  4.49  4.56  -0.08  -0.16   0.07
  24       3  4.64  4.49  4.55  -0.09  -0.16   0.06
  25       3  4.64  4.49  4.55  -0.09  -0.16   0.06
  26       3  4.64  4.49  4.53  -0.11  -0.16   0.04
  27       3  4.64  4.49  3.53  -1.11  -0.16  -0.96
  28       3  4.64  4.49  3.71  -0.93  -0.16  -0.78
  29       3  4.64  4.49  7.00   2.36  -0.16   2.51
  30       3  4.64  4.49  4.61  -0.03  -0.16   0.12
[Figure: YIELD PER PLOT (0 to 8) plotted against FERTILISER (1, 2, 3)]
[Figure: YIELD PER PLOT (0 to 8) plotted against PLOT NUMBER (0 to 35), with the mean yield marked]
[Figures: four panels, each with a vertical axis from 0 to 10 plotted against PLOT NUMBER (0 to 30). Panels 1 and 3 are labelled "Mean of Y"; panels 2 and 4 are labelled "Means of A, B & C"]
MS (fertil) estimates (Error variation + Treatment variation)
MS (error) estimates (Error variation)
So if there is no effect of treatment, both MSs estimate the same thing, and their ratio should be about one.
In other words, the F-ratio should be about one.
If there is an effect of treatment, then MS (fertil) estimates something bigger than MS(error), so the F-ratio should be bigger than one.
Question: is the F-ratio bigger than one, to a greater extent than would occur just by chance?
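The partition described above can be sketched in a few lines of Python. This is an illustrative calculation only, using the 30 yields from the Fertil/M/F/Y table; the variable names (ss_fertil, ms_error and so on) are chosen here and do not come from any statistics package.

```python
# Yields per plot for fertilisers 1, 2 and 3, copied from the table above.
yields = {
    1: [6.27, 5.36, 6.39, 4.85, 5.99, 7.14, 5.08, 4.07, 4.35, 4.95],
    2: [3.07, 3.29, 4.04, 4.19, 3.41, 3.75, 4.87, 3.94, 6.28, 3.15],
    3: [4.04, 3.79, 4.56, 4.55, 4.55, 4.53, 3.53, 3.71, 7.00, 4.61],
}

all_y = [y for group in yields.values() for y in group]
grand_mean = sum(all_y) / len(all_y)                            # column M
group_means = {k: sum(v) / len(v) for k, v in yields.items()}   # column F

# Treatment sum of squares: the squared MF deviations, summed over plots.
ss_fertil = sum(len(v) * (group_means[k] - grand_mean) ** 2
                for k, v in yields.items())
# Error sum of squares: the squared FY deviations.
ss_error = sum((y - group_means[k]) ** 2
               for k, v in yields.items() for y in v)
# Total sum of squares: the squared MY deviations.
ss_total = sum((y - grand_mean) ** 2 for y in all_y)

df_fertil = len(yields) - 1           # number of treatments minus one
df_error = len(all_y) - len(yields)   # number of plots minus number of treatments

ms_fertil = ss_fertil / df_fertil     # estimates error + treatment variation
ms_error = ss_error / df_error        # estimates error variation only
f_ratio = ms_fertil / ms_error

print(f"SS(fertil) = {ss_fertil:.3f} on {df_fertil} df")
print(f"SS(error)  = {ss_error:.3f} on {df_error} df")
print(f"F-ratio    = {f_ratio:.2f}")
```

Whether the resulting F-ratio is larger than chance alone would produce is judged by comparing it with the F distribution on (df_fertil, df_error) degrees of freedom, which this sketch does not do.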
[Figure: Volume of timber plotted against height of a tree. Axes: Height (feet), 62.5 to 87.5; Volume (cubic feet), 0 to 80]
DIAM  HT   VOL
 8.3  70  10.3
 8.6  65  10.3
 8.8  63  10.2
10.5  72  16.4
10.7  81  18.8
10.8  83  19.7
11.0  66  15.6
11.0  75  18.2
11.1  80  22.6
11.2  75  19.9
11.3  79  24.2
11.4  76  21.0
11.4  76  21.4
11.7  69  21.3
12.0  75  19.1
12.9  74  22.2
12.9  85  33.8
13.3  86  27.4
13.7  71  25.7
13.8  64  24.9
14.0  78  34.5
14.2  80  31.7
14.5  74  36.3
16.0  72  38.3
16.3  77  42.6
17.3  81  55.4
17.5  82  55.7
17.9  80  58.3
18.0  80  51.5
18.0  80  51.0
20.6  87  77.0
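[Figure: Y plotted against X, with labels "y", "Positive deviation" and "Negative deviation"]
[Figure: Y plotted against X, with labels "y", "x" and "Residual deviation"]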
[Figure: Volume of timber plotted against height of a tree. Axis ranges: 62.5 to 87.5 (horizontal), 0 to 80 (vertical)]
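As a rough illustration of fitting a line to these data, the sketch below regresses VOL on HT by ordinary least squares and splits the total variation into a regression part and a residual part, in the same spirit as the ANOVA partition. The HT and VOL values are copied from the DIAM/HT/VOL table; the variable names and the choice to compute everything by hand (rather than with a statistics package) are assumptions made for this example.

```python
# HT (feet) and VOL (cubic feet) for the 31 trees in the table above.
height = [70, 65, 63, 72, 81, 83, 66, 75, 80, 75, 79, 76, 76, 69, 75, 74,
          85, 86, 71, 64, 78, 80, 74, 72, 77, 81, 82, 80, 80, 80, 87]
volume = [10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9, 24.2,
          21.0, 21.4, 21.3, 19.1, 22.2, 33.8, 27.4, 25.7, 24.9, 34.5, 31.7,
          36.3, 38.3, 42.6, 55.4, 55.7, 58.3, 51.5, 51.0, 77.0]

n = len(height)
x_bar = sum(height) / n
y_bar = sum(volume) / n

# Least-squares slope and intercept for VOL = intercept + slope * HT.
sxx = sum((x - x_bar) ** 2 for x in height)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(height, volume))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in height]
residuals = [y - f for y, f in zip(volume, fitted)]

# Partition of variation: total = regression + residual.
ss_total = sum((y - y_bar) ** 2 for y in volume)
ss_regression = sum((f - y_bar) ** 2 for f in fitted)
ss_residual = sum(r ** 2 for r in residuals)

# F-ratio on 1 and n - 2 degrees of freedom.
f_ratio = (ss_regression / 1) / (ss_residual / (n - 2))

print(f"VOL = {intercept:.2f} + {slope:.3f} * HT")
print(f"SS(regression) = {ss_regression:.1f}, SS(residual) = {ss_residual:.1f}")
print(f"F-ratio = {f_ratio:.2f} on 1 and {n - 2} df")
```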
Model formulae aid communication
General Linear Model
• categorical and/or continuous
• as many x-variables as we like
• interactions
• does hypothesis testing (whether) and estimation (what)
• covers many existing tests with separate names…
Example                                                                    Traditional test                      GLM model formula
Comparison of YIELD between two TREATments                                 Independent samples t-test            YIELD=TREAT
Comparison of YIELD between three TREATments                               One way analysis of variance          YIELD=TREAT
Comparison of YIELD between two TREATments on matched pairs of SITEs       Paired samples t-test                 YIELD=SITE+TREAT
Comparison of YIELD between three TREATments in a BLOCked experiment       One way blocked analysis of variance  YIELD=BLOC+TREAT
Comparison of YIELD according to ELEVation of field                        Bivariate regression                  YIELD=ELEV
Comparison of YIELD between three TREATments controlling for ELEVation     Analysis of covariance                YIELD=ELEV+TREAT
Comparison of YIELD according to ELEVation and mean TEMPerature            Multiple regression                   YIELD=ELEV+TEMP
Comparison of YIELD according to manipulated PHOSPHate and NITRate levels  Two way analysis of variance          YIELD=PHOSPH|NITR
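The first two rows share the formula YIELD=TREAT, which suggests that the independent samples t-test is a special case of one way analysis of variance. The sketch below checks this numerically: with two groups, the F-ratio equals the square of the pooled-variance t statistic. The yields for fertilisers 1 and 2 from the earlier table are borrowed purely as illustrative data, and the variable names are chosen for this example.

```python
from statistics import mean, variance

# Illustrative data: yields for fertilisers 1 and 2 from the earlier table.
treat_1 = [6.27, 5.36, 6.39, 4.85, 5.99, 7.14, 5.08, 4.07, 4.35, 4.95]
treat_2 = [3.07, 3.29, 4.04, 4.19, 3.41, 3.75, 4.87, 3.94, 6.28, 3.15]

n1, n2 = len(treat_1), len(treat_2)
m1, m2 = mean(treat_1), mean(treat_2)

# Traditional test: independent samples t-test with pooled variance.
pooled_var = ((n1 - 1) * variance(treat_1)
              + (n2 - 1) * variance(treat_2)) / (n1 + n2 - 2)
t_stat = (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

# GLM view: YIELD=TREAT fitted as a one way analysis of variance.
grand = mean(treat_1 + treat_2)
ss_treat = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ss_error = (sum((y - m1) ** 2 for y in treat_1)
            + sum((y - m2) ** 2 for y in treat_2))
f_ratio = (ss_treat / 1) / (ss_error / (n1 + n2 - 2))

print(f"t = {t_stat:.4f}, t squared = {t_stat ** 2:.4f}, F = {f_ratio:.4f}")
```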
Minitab Model Syntax
+ addition operator is (optionally) placed between terms in a list. e.g. A+B+C is factors A, B and C
* interaction operator placed between terms. e.g. A*B is the interaction of the factors A and B
( ) brackets indicate nesting. When B is nested within A, it is expressed as B(A). When C is nested within both A and B, it is expressed as C(A B).
| a model may be abbreviated using a | or ! to indicate factors and their interaction terms. e.g. A|B is equivalent to A+B+A*B.
- operator to exclude some of the higher level interactions. e.g. if you want A+B+C+A*B+A*C+B*C, you could use A|B|C-A*B*C.
Examples of Model Specifications
• Two factors crossed: A B A*B or A|B
• Three factors crossed: A B C A*B A*C B*C A*B*C or A|B|C
• Three factors nested: A B(A) C(A B)
• Crossed and nested (B nested within A, and both crossed with C): A B(A) C A*C B*C(A)
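To make the | abbreviation concrete, the sketch below expands a fully crossed specification into its main effects and interactions, and then drops one term in the way the - operator does. The functions expand_crossed and exclude are hypothetical helpers written for this illustration; they are not part of Minitab and do not handle nesting.

```python
from itertools import combinations


def expand_crossed(formula: str) -> list[str]:
    """Expand e.g. 'A|B|C' into main effects and all interactions."""
    factors = [f.strip() for f in formula.split("|")]
    terms = []
    # Order 1 gives main effects, order 2 two-way interactions, and so on.
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append("*".join(combo))
    return terms


def exclude(terms: list[str], unwanted: str) -> list[str]:
    """Drop a single term, mimicking the '-' operator."""
    return [t for t in terms if t != unwanted]


print(" ".join(expand_crossed("A|B")))
print(" ".join(expand_crossed("A|B|C")))
print(" ".join(exclude(expand_crossed("A|B|C"), "A*B*C")))
```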
Generality is good because
• one set of principles covers a diversity of tests
• the relationships between the tests are clear
• you can learn about assumptions and model criticism once, and it covers all the tests
• you can construct tests that don’t have separate names, now that you can use GLMs
Four assumptions of GLM
If an assumption is not met, then all the results of the GLM are in question.
• Independence
• Homogeneity of variance
• Linearity/Additivity
• Normality of Error
How can we tell if assumptions are met?
• Normality of error: histogram of residuals
• Homogeneity of variance: fitted values vs residuals
• Linearity/additivity: fitted values vs residuals, and continuous x-variable vs residuals
• Independence: no easy answer
These techniques are called model criticism
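A minimal sketch of the first two checks, assuming the one way model YIELD=FERTIL fitted to the fertiliser data from the earlier table. Fitted values are the fertiliser means and residuals are the FY deviations. A plotted histogram and scatter plot would normally be used; the text histogram and the per-group variances here are crude stand-ins chosen to keep the example self-contained.

```python
import math
from collections import Counter
from statistics import mean, variance

# Yields per plot for fertilisers 1, 2 and 3, from the earlier table.
yields = {
    1: [6.27, 5.36, 6.39, 4.85, 5.99, 7.14, 5.08, 4.07, 4.35, 4.95],
    2: [3.07, 3.29, 4.04, 4.19, 3.41, 3.75, 4.87, 3.94, 6.28, 3.15],
    3: [4.04, 3.79, 4.56, 4.55, 4.55, 4.53, 3.53, 3.71, 7.00, 4.61],
}

# Under YIELD=FERTIL the fitted value for a plot is its fertiliser mean.
fitted = []
residuals = []
for level, ys in yields.items():
    level_mean = mean(ys)
    for y in ys:
        fitted.append(level_mean)
        residuals.append(y - level_mean)

# Normality of error: text histogram of residuals in bins one unit wide.
bins = Counter(math.floor(r) for r in residuals)
for lower in range(min(bins), max(bins) + 1):
    print(f"[{lower:+d}, {lower + 1:+d})  {'#' * bins[lower]}")

# Homogeneity of variance: spread of residuals at each fitted value.
for level, ys in yields.items():
    print(f"fertiliser {level}: fitted = {mean(ys):.2f}, "
          f"residual variance = {variance(ys):.3f}")
```

Roughly equal variances across the fitted values and a roughly symmetric, bell-shaped histogram are what these two checks look for; neither output is a formal test.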