Orthogonal Linear Contrasts
A technique for partitioning the ANOVA sum of squares into individual degrees of freedom
Definition
Let $\mu_1, \mu_2, \ldots, \mu_p$ denote $p$ means and $c_1, c_2, \ldots, c_p$ denote $p$ coefficients such that:
$$c_1 + c_2 + \cdots + c_p = 0.$$
Then the linear combination
$$L = c_1\mu_1 + c_2\mu_2 + \cdots + c_p\mu_p$$
is called a linear contrast of the $p$ means $\mu_1, \mu_2, \ldots, \mu_p$.
Definition
Let
$$A = a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p \quad\text{and}\quad B = b_1\mu_1 + b_2\mu_2 + \cdots + b_p\mu_p$$
be two linear contrasts of the $p$ means $\mu_1, \mu_2, \ldots, \mu_p$. Then $A$ and $B$ are called Orthogonal Linear Contrasts if, in addition to
$$a_1 + a_2 + \cdots + a_p = 0 \quad\text{and}\quad b_1 + b_2 + \cdots + b_p = 0,$$
it is also true that:
$$a_1b_1 + a_2b_2 + \cdots + a_pb_p = 0.$$
Definition
Let
$$A = a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p,\quad B = b_1\mu_1 + b_2\mu_2 + \cdots + b_p\mu_p,\quad \ldots,\quad L = l_1\mu_1 + l_2\mu_2 + \cdots + l_p\mu_p$$
be a set of linear contrasts of the $p$ means $\mu_1, \mu_2, \ldots, \mu_p$.
Then the set is called a set of Mutually Orthogonal Linear Contrasts if each linear contrast in the set is orthogonal to every other linear contrast in the set.
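Both conditions (coefficients summing to zero, and pairwise products summing to zero) are easy to check mechanically. The following is a minimal Python sketch; the helper names `is_contrast`, `are_orthogonal` and `mutually_orthogonal` are hypothetical, and exact `Fraction` arithmetic is used so that the zero tests are not disturbed by rounding.

```python
from fractions import Fraction
from itertools import combinations


def is_contrast(coeffs):
    """A linear contrast needs coefficients that sum to zero."""
    return sum(Fraction(c) for c in coeffs) == 0


def are_orthogonal(a, b):
    """Two contrasts are orthogonal when a_1*b_1 + ... + a_p*b_p == 0."""
    if len(a) != len(b):
        raise ValueError("both contrasts must have p coefficients")
    return sum(Fraction(x) * Fraction(y) for x, y in zip(a, b)) == 0


def mutually_orthogonal(contrasts):
    """Every vector is a contrast and every pair is orthogonal."""
    return all(is_contrast(c) for c in contrasts) and all(
        are_orthogonal(a, b) for a, b in combinations(contrasts, 2)
    )


# Helmert coefficients for p = 5 means, as tabulated in these notes
helmert = [
    [-1, 1, 0, 0, 0],
    [-1, -1, 2, 0, 0],
    [-1, -1, -1, 3, 0],
    [-1, -1, -1, -1, 4],
]
print(mutually_orthogonal(helmert))                        # True
print(are_orthogonal([1, -1, 0, 0, 0], [1, 0, -1, 0, 0]))  # False
```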
Theorem:
The maximum number of linear contrasts in a set of Mutually Orthogonal Linear Contrasts of the quantities $\mu_1, \mu_2, \ldots, \mu_p$ is $p - 1$.
$p - 1$ is called the degrees of freedom (d.f.) for comparing the quantities $\mu_1, \mu_2, \ldots, \mu_p$.
Comments
1. Linear contrasts make comparisons among the $p$ values $\mu_1, \mu_2, \ldots, \mu_p$.
2. Orthogonal linear contrasts make independent comparisons among the $p$ values $\mu_1, \mu_2, \ldots, \mu_p$.
3. The number of independent comparisons among the $p$ values $\mu_1, \mu_2, \ldots, \mu_p$ is $p - 1$.
Definition
Let
$$L = a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p$$
denote a linear contrast of the $p$ means $\mu_1, \mu_2, \ldots, \mu_p$. Let
$$\hat{L} = a_1\bar{x}_1 + a_2\bar{x}_2 + \cdots + a_p\bar{x}_p,$$
where each mean $\bar{x}_i$ is calculated from $n$ observations.
Then the Sum of Squares for testing the Linear Contrast $L$, i.e.
$$H_0: L = 0 \quad\text{against}\quad H_A: L \neq 0,$$
is defined to be:
$$SS_L = \frac{n\hat{L}^2}{a_1^2 + a_2^2 + \cdots + a_p^2},$$
the degrees of freedom (df) for testing the Linear Contrast $L$ is defined to be
$$df_L = 1,$$
and the F-ratio for testing the Linear Contrast $L$ is defined to be:
$$F = \frac{MS_L}{MS_{Error}} = \frac{SS_L/1}{s^2}.$$
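The formulas for $SS_L$ and $F$ translate directly into code. This is a sketch assuming equal group sizes $n$; the function names are hypothetical. The usage line borrows the six diet means and $MS_{Error} = 214.556$ from the Diet-Weight Gain example in these notes, with $n = 10$ animals per diet assumed (consistent with its 54 error df, 60 - 6).

```python
def contrast_ss(means, coeffs, n):
    """SS_L = n * Lhat^2 / (a_1^2 + ... + a_p^2), each mean based on n obs."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("coefficients of a contrast must sum to zero")
    lhat = sum(a * m for a, m in zip(coeffs, means))
    return n * lhat ** 2 / sum(a * a for a in coeffs)


def contrast_f(means, coeffs, n, mse):
    """F = (SS_L / 1) / s^2, referred to F with 1 and df_Error d.f."""
    return contrast_ss(means, coeffs, n) / mse


diet_means = [100.0, 85.9, 99.5, 79.2, 83.9, 78.7]
high_vs_low = [1, 1, 1, -1, -1, -1]
print(round(contrast_ss(diet_means, high_vs_low, n=10), 3))             # 3168.267
print(round(contrast_f(diet_means, high_vs_low, n=10, mse=214.556), 3))  # 14.767
```

Because $SS_L$ divides by $\sum a_i^2$, rescaling the coefficients (for example using 1/3 instead of 1) leaves the sum of squares unchanged.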
To test if a set of mutually orthogonal linear contrasts are all zero, i.e.
$$H_0: L_1 = 0, L_2 = 0, \ldots, L_k = 0,$$
the Sum of Squares is:
$$SS_{H_0} = SS_{L_1} + SS_{L_2} + \cdots + SS_{L_k},$$
the degrees of freedom (df) is
$$df_{H_0} = k,$$
and the F-ratio is:
$$F = \frac{MS_{H_0}}{MS_{Error}} = \frac{SS_{H_0}/k}{s^2}.$$
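A sketch of the joint test, assuming equal group sizes and a set of contrasts already known to be mutually orthogonal (the function name is hypothetical). As an illustration it pools the two interaction contrasts L4 and L5 from the first Diet-Weight Gain contrast set in these notes, again assuming $n = 10$ per diet.

```python
def joint_contrast_test(means, contrasts, n, mse):
    """Test H0: L_1 = ... = L_k = 0 for k mutually orthogonal contrasts.

    Returns (SS_H0, df_H0, F); F is referred to F with k and df_Error d.f.
    """
    ss_h0 = 0.0
    for coeffs in contrasts:
        lhat = sum(a * m for a, m in zip(coeffs, means))
        ss_h0 += n * lhat ** 2 / sum(a * a for a in coeffs)
    k = len(contrasts)
    return ss_h0, k, (ss_h0 / k) / mse


diet_means = [100.0, 85.9, 99.5, 79.2, 83.9, 78.7]
interaction = [[1, 0, -1, -1, 0, 1], [-1, 2, -1, 1, -2, 1]]  # L4 and L5
ss, df, f = joint_contrast_test(diet_means, interaction, n=10, mse=214.556)
print(round(ss, 3), df, round(f, 3))
```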
Theorem:
Let $L_1, L_2, \ldots, L_{p-1}$ denote $p - 1$ mutually orthogonal linear contrasts for comparing the $p$ means. Then the Sum of Squares for comparing the $p$ means based on $p - 1$ degrees of freedom, $SS_{Between}$, satisfies:
$$SS_{Between} = n\sum_{i=1}^{p}\left(\bar{x}_i - \bar{x}\right)^2 = SS_{L_1} + SS_{L_2} + \cdots + SS_{L_{p-1}}.$$
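A quick numerical check of the theorem, using hypothetical means for $p = 3$ groups of $n = 4$ observations and the $p - 1 = 2$ orthogonal contrasts $(1, -1, 0)$ and $(1, 1, -2)$; the numbers are made up purely for illustration.

```python
def ss_between(means, n):
    """SS_Between = n * sum((xbar_i - xbar)^2) for equal group sizes."""
    grand = sum(means) / len(means)
    return n * sum((m - grand) ** 2 for m in means)


def contrast_ss(means, coeffs, n):
    lhat = sum(a * m for a, m in zip(coeffs, means))
    return n * lhat ** 2 / sum(a * a for a in coeffs)


means = [10.0, 14.0, 9.0]            # hypothetical group means
contrasts = [[1, -1, 0], [1, 1, -2]]  # p - 1 = 2 mutually orthogonal contrasts
parts = [contrast_ss(means, a, 4) for a in contrasts]
print(parts, sum(parts), ss_between(means, 4))  # [32.0, 24.0] 56.0 56.0
```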
Comment
Defining a set of Orthogonal Linear Contrasts for comparing the $p$ means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p$ allows the researcher to "break apart" the Sum of Squares for comparing the $p$ means, $SS_{Between}$, and to test each linear contrast individually.
Helmert contrasts
Contrast   Contrast coefficients   Contrast explanation
L1         -1   1   0   0   0      2nd versus 1st
L2         -1  -1   2   0   0      3rd versus 1st and 2nd
L3         -1  -1  -1   3   0      4th versus 1st, 2nd and 3rd
L4         -1  -1  -1  -1   4      5th versus 1st, 2nd, 3rd and 4th
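The Helmert pattern generalises to any number of means $p$: contrast $k$ gives $-1$ to each of the first $k$ means and $k$ to mean $k + 1$. A minimal sketch (the helper names are hypothetical) that builds the coefficients and confirms they are mutually orthogonal:

```python
from itertools import combinations


def helmert_contrasts(p):
    """Contrast k (k = 1, ..., p - 1) compares mean k + 1 with means 1..k."""
    return [[-1] * k + [k] + [0] * (p - k - 1) for k in range(1, p)]


def check_mutually_orthogonal(rows):
    sums_ok = all(sum(r) == 0 for r in rows)
    pairs_ok = all(
        sum(a * b for a, b in zip(r, s)) == 0 for r, s in combinations(rows, 2)
    )
    return sums_ok and pairs_ok


for row in helmert_contrasts(5):
    print(row)
print(check_mutually_orthogonal(helmert_contrasts(5)))  # True
```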
The Diet-Weight Gain example
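The sum of squares for comparing the 6 means is given in the ANOVA table. The six diet means (diets 1 to 3: High protein with Beef, Cereal and Pork; diets 4 to 6: Low protein with Beef, Cereal and Pork) are:
$\bar{x}_1 = 100.0,\ \bar{x}_2 = 85.9,\ \bar{x}_3 = 99.5,\ \bar{x}_4 = 79.2,\ \bar{x}_5 = 83.9,\ \bar{x}_6 = 78.7$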
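Five mutually orthogonal contrasts are given below, together with a description of the purpose of each:
$$L_1 = \tfrac{1}{3}(\bar{x}_1 + \bar{x}_2 + \bar{x}_3) - \tfrac{1}{3}(\bar{x}_4 + \bar{x}_5 + \bar{x}_6)$$
(A comparison of the High protein diets with the Low protein diets)
$$L_2 = \tfrac{1}{2}(\bar{x}_1 + \bar{x}_4) - \tfrac{1}{2}(\bar{x}_3 + \bar{x}_6)$$
(A comparison of the Beef source of protein with the Pork source of protein)
$$L_3 = \tfrac{1}{2}(\bar{x}_2 + \bar{x}_5) - \tfrac{1}{4}(\bar{x}_1 + \bar{x}_3 + \bar{x}_4 + \bar{x}_6)$$
(A comparison of the Meat (Beef and Pork) sources of protein with the Cereal source of protein)
$$L_4 = \tfrac{1}{2}(\bar{x}_1 + \bar{x}_6) - \tfrac{1}{2}(\bar{x}_3 + \bar{x}_4)$$
(A comparison representing the interaction between Level of protein and Source of protein for the Meat sources of protein)
$$L_5 = \tfrac{1}{2}(\bar{x}_2 - \bar{x}_5) - \tfrac{1}{4}(\bar{x}_1 + \bar{x}_3 - \bar{x}_4 - \bar{x}_6)$$
(A comparison representing the interaction between Level of protein and the Meat versus Cereal source of protein)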
Table of Coefficients
Contrast   Diet:  1    2    3    4    5    6
L1                1    1    1   -1   -1   -1
L2                1    0   -1    1    0   -1
L3               -1    2   -1   -1    2   -1
L4                1    0   -1   -1    0    1
L5               -1    2   -1    1   -2    1
Note: L4 = L1 × L2 and L5 = L1 × L3 (element-wise products of the coefficients).
L1 is the 1 df for the Level main effect.
L2 and L3 are the 2 df for the Source main effect.
L4 and L5 are the 2 df for the Source-Level interaction.
The Anova Table for Testing these contrasts is given below:
Source: DF: Sum Squares: Mean Square: F-test:
Contrast L1 1 3168.267 3168.267 14.767
Contrast L2 1 2.500 2.500 0.012
Contrast L3 1 264.033 264.033 1.231
Contrast L4 1 0.000 0.000 0.000
Contrast L5 1 1178.133 1178.133 5.491
Error 54 11586.000 214.556
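The contrast rows of this table can be recomputed from the six diet means alone. A sketch assuming $n = 10$ animals per diet (consistent with the 54 error df) and taking $MS_{Error} = 214.556$ from the table:

```python
means = [100.0, 85.9, 99.5, 79.2, 83.9, 78.7]  # diets 1-6
n = 10          # assumed: 10 animals per diet (60 - 6 = 54 error df)
mse = 214.556   # MS_Error from the ANOVA table

contrasts = {
    "L1": [1, 1, 1, -1, -1, -1],
    "L2": [1, 0, -1, 1, 0, -1],
    "L3": [-1, 2, -1, -1, 2, -1],
    "L4": [1, 0, -1, -1, 0, 1],
    "L5": [-1, 2, -1, 1, -2, 1],
}

rows = {}
for name, a in contrasts.items():
    lhat = sum(ai * xi for ai, xi in zip(a, means))
    ss = n * lhat ** 2 / sum(ai * ai for ai in a)  # SS_L, 1 df
    rows[name] = (round(ss, 3), round(ss / mse, 3))  # (SS, F)
    print(name, *rows[name])
```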
The mutually orthogonal contrasts that are eventually selected should be determined prior to observing the data and should be based on the objectives of the experiment.
Another five mutually orthogonal contrasts are given below, together with a description of the purpose of each:
$$L_1 = \tfrac{1}{2}(\bar{x}_1 + \bar{x}_4) - \tfrac{1}{2}(\bar{x}_3 + \bar{x}_6)$$
(A comparison of the Beef source of protein with the Pork source of protein)
$$L_2 = \tfrac{1}{4}(\bar{x}_1 + \bar{x}_3 + \bar{x}_4 + \bar{x}_6) - \tfrac{1}{2}(\bar{x}_2 + \bar{x}_5)$$
(A comparison of the Meat (Beef and Pork) sources of protein with the Cereal source of protein)
$$L_3 = \bar{x}_1 - \bar{x}_4$$
(A comparison of the high and low protein diets for the Beef source of protein)
$$L_4 = \bar{x}_2 - \bar{x}_5$$
(A comparison of the high and low protein diets for the Cereal source of protein)
$$L_5 = \bar{x}_3 - \bar{x}_6$$
(A comparison of the high and low protein diets for the Pork source of protein)
Table of Coefficients
Contrast   Diet:  1    2    3    4    5    6
L1                1    0   -1    1    0   -1
L2                1   -2    1    1   -2    1
L3                1    0    0   -1    0    0
L4                0    1    0    0   -1    0
L5                0    0    1    0    0   -1
Note:
L1 and L2 are the 2 df for the Source main effect.
L3, L4 and L5 are the 3 df comparing the Levels (High vs Low) within each Source.
The Anova Table for Testing these contrasts is given below:
Source: DF: Sum Squares: Mean Square: F-test:
Beef vs Pork ( L1) 1 2.500 2.500 0.012
Meat vs Cereal ( L2) 1 264.033 264.033 1.231
High vs Low for Beef ( L3) 1 2163.200 2163.200 10.082
High vs Low for Cereal ( L4) 1 20.000 20.000 0.093
High vs Low for Pork ( L5) 1 2163.200 2163.200 10.082
Error 54 11586.000 214.556
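Both contrast sets are complete sets of $p - 1 = 5$ mutually orthogonal contrasts, so by the theorem above each must partition the same $SS_{Between}$; they simply break it apart differently. A sketch checking this from the diet means, again assuming $n = 10$ per diet:

```python
means = [100.0, 85.9, 99.5, 79.2, 83.9, 78.7]  # diets 1-6
n = 10  # assumed: 10 animals per diet


def contrast_ss(a):
    lhat = sum(ai * xi for ai, xi in zip(a, means))
    return n * lhat ** 2 / sum(ai * ai for ai in a)


first_set = [[1, 1, 1, -1, -1, -1], [1, 0, -1, 1, 0, -1], [-1, 2, -1, -1, 2, -1],
             [1, 0, -1, -1, 0, 1], [-1, 2, -1, 1, -2, 1]]
second_set = [[1, 0, -1, 1, 0, -1], [1, -2, 1, 1, -2, 1], [1, 0, 0, -1, 0, 0],
              [0, 1, 0, 0, -1, 0], [0, 0, 1, 0, 0, -1]]

grand = sum(means) / len(means)
ss_between = n * sum((x - grand) ** 2 for x in means)
total_first = sum(contrast_ss(a) for a in first_set)
total_second = sum(contrast_ss(a) for a in second_set)
print(round(ss_between, 3), round(total_first, 3), round(total_second, 3))
# 4612.933 4612.933 4612.933
```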
Orthogonal Linear Contrasts: Polynomial Regression
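Let $\mu_1, \mu_2, \ldots, \mu_p$ denote $p$ means and consider the first differences
$$\Delta\mu_i = \mu_i - \mu_{i-1}.$$
If $\mu_1 = \mu_2 = \cdots = \mu_p$, then
$$\Delta\mu_i = \mu_i - \mu_{i-1} = 0.$$
If the points $(1, \mu_1), (2, \mu_2), \ldots, (p, \mu_p)$ lie on a straight line with non-zero slope, then
$$\Delta\mu_i = \mu_i - \mu_{i-1} \neq 0,$$
but the first differences are all equal.
Consider the 2nd differences
$$\Delta^2\mu_i = (\mu_i - \mu_{i-1}) - (\mu_{i-1} - \mu_{i-2}) = \mu_i - 2\mu_{i-1} + \mu_{i-2}.$$
If the points $(1, \mu_1), (2, \mu_2), \ldots, (p, \mu_p)$ lie on a straight line, then
$$\Delta^2\mu_i = \mu_i - 2\mu_{i-1} + \mu_{i-2} = 0.$$
If the points $(1, \mu_1), (2, \mu_2), \ldots, (p, \mu_p)$ lie on a quadratic curve, then
$$\Delta^2\mu_i = \mu_i - 2\mu_{i-1} + \mu_{i-2} \neq 0,$$
but the 2nd differences are all equal.
Consider the 3rd differences
$$\Delta^3\mu_i = \mu_i - 3\mu_{i-1} + 3\mu_{i-2} - \mu_{i-3}.$$
If the points $(1, \mu_1), (2, \mu_2), \ldots, (p, \mu_p)$ lie on a quadratic curve, then
$$\Delta^3\mu_i = \mu_i - 3\mu_{i-1} + 3\mu_{i-2} - \mu_{i-3} = 0.$$
If the points $(1, \mu_1), (2, \mu_2), \ldots, (p, \mu_p)$ lie on a cubic curve, then
$$\Delta^3\mu_i = \mu_i - 3\mu_{i-1} + 3\mu_{i-2} - \mu_{i-3} \neq 0,$$
but the 3rd differences are all equal.
Continuing,
4th differences, $\Delta^4\mu_i$, will be non-zero but equal if the points $(1, \mu_1), (2, \mu_2), \ldots, (p, \mu_p)$ lie on a quartic curve (4th degree).
5th differences, $\Delta^5\mu_i$, will be non-zero but equal if the points $(1, \mu_1), (2, \mu_2), \ldots, (p, \mu_p)$ lie on a quintic curve (5th degree).
etc.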
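The pattern of successive differences is easy to see numerically. A small sketch with a hypothetical helper `differences` applied to made-up points on a straight line and on a quadratic:

```python
def differences(values, order=1):
    """Return the order-th successive differences of a sequence."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


xs = [1, 2, 3, 4, 5]
linear = [3 + 2 * x for x in xs]         # hypothetical straight line
quadratic = [x * x - x + 1 for x in xs]  # hypothetical quadratic
print(differences(linear, 1))     # [2, 2, 2, 2]  non-zero but equal
print(differences(linear, 2))     # [0, 0, 0]
print(differences(quadratic, 2))  # [2, 2, 2]     non-zero but equal
print(differences(quadratic, 3))  # [0, 0]
```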
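Let
$$L = a_2\Delta\mu_2 + a_3\Delta\mu_3 + \cdots + a_p\Delta\mu_p$$
$$Q_2 = b_3\Delta^2\mu_3 + \cdots + b_p\Delta^2\mu_p$$
$$C = c_4\Delta^3\mu_4 + \cdots + c_p\Delta^3\mu_p$$
$$Q_4 = d_5\Delta^4\mu_5 + \cdots + d_p\Delta^4\mu_p$$
etc., where $a_2, \ldots, a_p$, $b_3, \ldots, b_p$, $c_4, \ldots, c_p$, etc. are chosen so that $L, Q_2, C, Q_4, \ldots$ are mutually orthogonal contrasts.
If the means are equal, then
$L = Q_2 = C = Q_4 = \cdots = 0$.
If the means are linear, then
$L \neq 0$ but $Q_2 = C = Q_4 = \cdots = 0$.
If the means are quadratic, then
$Q_2 \neq 0$ but $C = Q_4 = \cdots = 0$.
If the means are cubic, then
$C \neq 0$ but $Q_4 = \cdots = 0$.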
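A sketch of this behaviour using the $k = 5$ orthogonal polynomial coefficients tabulated in these notes. The two sets of means are hypothetical points placed exactly on a line and on a quadratic, so the higher-order contrasts vanish exactly.

```python
# k = 5 orthogonal polynomial coefficients (from the table in these notes)
POLY5 = {
    "L": [-2, -1, 0, 1, 2],
    "Q2": [2, -1, -2, -1, 2],
    "C": [-1, 2, 0, -2, 1],
    "Q4": [1, -4, 6, -4, 1],
}


def polynomial_contrasts(means):
    """Value of each orthogonal polynomial contrast for five means."""
    return {name: sum(a * m for a, m in zip(coeffs, means))
            for name, coeffs in POLY5.items()}


xs = [1, 2, 3, 4, 5]
print(polynomial_contrasts([4 + 3 * x for x in xs]))
# {'L': 30, 'Q2': 0, 'C': 0, 'Q4': 0}
print(polynomial_contrasts([10 + 3 * x - x * x for x in xs]))
# {'L': -30, 'Q2': -14, 'C': 0, 'Q4': 0}
```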
Orthogonal Linear Contrasts for Polynomial Regression
(k = number of equally spaced levels; coefficients for levels 1 to k; last column is Σa_i²)
k    Polynomial   Coefficients                                   Σa_i²
3    Linear       -1    0    1                                    2
     Quadratic     1   -2    1                                    6
4    Linear       -3   -1    1    3                               20
     Quadratic     1   -1   -1    1                               4
     Cubic        -1    3   -3    1                               20
5    Linear       -2   -1    0    1    2                          10
     Quadratic     2   -1   -2   -1    2                          14
     Cubic        -1    2    0   -2    1                          10
     Quartic       1   -4    6   -4    1                          70
6    Linear       -5   -3   -1    1    3    5                     70
     Quadratic     5   -1   -4   -4   -1    5                     84
     Cubic        -5    7    4   -4   -7    5                     180
     Quartic       1   -3    2    2   -3    1                     28
7    Linear       -3   -2   -1    0    1    2    3                28
     Quadratic     5    0   -3   -4   -3    0    5                84
     Cubic        -1    1    1    0   -1   -1    1                6
     Quartic       3   -7    1    6    1   -7    3                154
8    Linear       -7   -5   -3   -1    1    3    5    7           168
     Quadratic     7    1   -3   -5   -5   -3    1    7           168
     Cubic        -7    5    7    3   -3   -7   -5    7           264
     Quartic       7  -13   -3    9    9   -3  -13    7           616
     Quintic      -7   23  -17  -15   15   17  -23    7           2184
9    Linear       -4   -3   -2   -1    0    1    2    3    4      60
     Quadratic    28    7   -8  -17  -20  -17   -8    7   28      2772
     Cubic       -14    7   13    9    0   -9  -13   -7   14      990
     Quartic      14  -21  -11    9   18    9  -11  -21   14      2002
     Quintic      -4   11   -4   -9    0    9    4  -11    4      468
10   Linear       -9   -7   -5   -3   -1    1    3    5    7    9   330
     Quadratic     6    2   -1   -3   -4   -4   -3   -1    2    6   132
     Cubic       -42   14   35   31   12  -12  -31  -35  -14   42   8580
     Quartic      18  -22  -17    3   18   18    3  -17  -22   18   2860
     Quintic      -6   14   -1  -11   -6    6   11    1  -14    6   780
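The tabulated coefficients can be regenerated rather than looked up. The sketch below is one possible construction, not necessarily how the table was originally produced: it applies Gram-Schmidt in exact rational arithmetic to the vectors $1, x, x^2, \ldots$ evaluated at $x = 1, \ldots, k$, then rescales each result to the smallest integers with a positive last entry (the sign convention of the table). The function name is hypothetical; `math.lcm` and multi-argument `math.gcd` need Python 3.9 or later.

```python
from fractions import Fraction
from math import gcd, lcm


def orthogonal_polynomial_coefficients(k):
    """Integer orthogonal-polynomial contrasts for k equally spaced levels."""
    xs = [Fraction(x) for x in range(1, k + 1)]
    basis = [[Fraction(1)] * k]  # the constant vector (degree 0)
    rows = []
    for degree in range(1, k):
        v = [x ** degree for x in xs]
        for b in basis:  # remove the projection onto each lower-degree vector
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
        scale = lcm(*(c.denominator for c in v))  # clear the denominators
        ints = [int(c * scale) for c in v]
        g = gcd(*ints)                            # reduce to smallest integers
        ints = [c // g for c in ints]
        if ints[-1] < 0:                          # positive last entry
            ints = [-c for c in ints]
        rows.append(ints)
    return rows


for row in orthogonal_polynomial_coefficients(5):
    print(row, sum(c * c for c in row))  # coefficients and their sum of squares
```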
Example
Table: Life of the component at each Activation Temperature (three observations per temperature)
Activation Temperature   0     25    50    75    100
                         53    60    67    65    58
                         50    62    70    68    62
                         47    58    73    62    60
Ti.                      150   180   210   195   180    T.. = 915
Mean                     50    60    70    65    60
Σyij² = 56545    ΣTi.²/n = 56475    T..²/nt = 55815
In this example we are measuring the "Life" of an electronic component and how it depends on the temperature of activation.
The Anova Table
Source        SS       df   MS       F
Treat         660      4    165.0    23.57
  Linear      187.50   1    187.50   26.79
  Quadratic   433.93   1    433.93   61.99
  Cubic       0.00     1    0.00     0.00
  Quartic     38.57    1    38.57    5.51
Error         70       10   7.00
Total         730      14
Contrast values: L = 25.00, Q2 = -45.00, C = 0.00, Q4 = 30.00
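The whole table can be reproduced from the raw data. A sketch in plain Python: the sums of squares use the computing formulas shown under the data table, and each polynomial contrast uses the $k = 5$ coefficients with $SS = n\hat{L}^2/\sum a_i^2$ and $n = 3$.

```python
temps = [0, 25, 50, 75, 100]
life = {0: [53, 50, 47], 25: [60, 62, 58], 50: [67, 70, 73],
        75: [65, 68, 62], 100: [58, 62, 60]}
n, t = 3, len(temps)

all_y = [y for x in temps for y in life[x]]
grand_total = sum(all_y)
totals = [sum(life[x]) for x in temps]
means = [tot / n for tot in totals]

correction = grand_total ** 2 / (n * t)                  # T..^2 / nt
ss_total = sum(y * y for y in all_y) - correction
ss_treat = sum(T * T for T in totals) / n - correction
ss_error = ss_total - ss_treat
mse = ss_error / (n * t - t)                             # 10 error df

poly = {"Linear": [-2, -1, 0, 1, 2], "Quadratic": [2, -1, -2, -1, 2],
        "Cubic": [-1, 2, 0, -2, 1], "Quartic": [1, -4, 6, -4, 1]}
results = {}
for name, a in poly.items():
    value = sum(ai * m for ai, m in zip(a, means))       # contrast estimate
    ss = n * value ** 2 / sum(ai * ai for ai in a)       # 1 df each
    results[name] = (value, ss, ss / mse)
    print(f"{name:9s} contrast={value:7.2f} SS={ss:7.2f} F={ss / mse:6.2f}")
print(f"Treat SS={ss_treat:.2f}  Error SS={ss_error:.2f}  MSE={mse:.2f}")
```

The four single-df sums of squares add up exactly to the treatment SS of 660, as the theorem on orthogonal contrasts requires.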
The Anova Tables for Determining the Degree of the Polynomial

Testing for effect of the factor
Source   SS    df   MS    F
Treat    660   4    165   23.57
Error    70    10   7
Total    730   14

Testing for departure from Linear
Source                  SS       df   MS       F
Linear                  187.50   1    187.50   26.79
Departure from Linear   472.50   3    157.50   22.50
Error                   70.00    10   7.00
(SS for Departure from Linear = SS(Q2) + SS(C) + SS(Q4))

Testing for departure from Quadratic
Source                     SS       df   MS       F
Linear                     187.50   1    187.50   26.79
Quadratic                  433.93   1    433.93   61.99
Departure from Quadratic   38.57    2    19.29    2.76
Error                      70.00    10   7.00
(SS for Departure from Quadratic = SS(C) + SS(Q4))
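Each "departure" row pools the single-df sums of squares of every polynomial term not yet in the model. A sketch with a hypothetical helper, taking the SS values and $MS_{Error} = 7.00$ (10 df) from the tables above:

```python
ss = {"Linear": 187.50, "Quadratic": 433.93, "Cubic": 0.00, "Quartic": 38.57}
mse = 7.00  # MS_Error on 10 df


def departure_test(ss, fitted, mse):
    """Pool the SS of every polynomial term not in `fitted`.

    Returns (SS, df, F); F is referred to F with (df, 10) d.f. here.
    """
    remaining = [name for name in ss if name not in fitted]
    ss_dep = sum(ss[name] for name in remaining)
    df_dep = len(remaining)
    return ss_dep, df_dep, (ss_dep / df_dep) / mse


print(departure_test(ss, ["Linear"], mse))               # about (472.5, 3, 22.5)
print(departure_test(ss, ["Linear", "Quadratic"], mse))  # about (38.57, 2, 2.755)
```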
[Figure: Life plotted against Act. Temp, with the fitted quadratic y = 49.571 + 0.61429x - 0.0051429x^2]
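The fitted quadratic can be recovered by ordinary least squares on the 15 observations. This sketch assumes NumPy is available and uses `numpy.polyfit`, which returns coefficients from the highest power down. The printed intercept is about 49.571.

```python
import numpy as np

temps = np.repeat([0, 25, 50, 75, 100], 3)  # three observations per temperature
life = np.array([53, 50, 47, 60, 62, 58, 67, 70, 73, 65, 68, 62, 58, 62, 60])

b2, b1, b0 = np.polyfit(temps, life, deg=2)  # highest power first
print(f"y = {b0:.3f} {b1:+.5f} x {b2:+.7f} x^2")
```

Because the design is balanced, fitting the five temperature means gives the same curve as fitting all 15 observations.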
The Analysis of Covariance
ANACOVA
Multiple Regression
1. Dependent variable Y (continuous)
2. Continuous independent variables X1, X2, …, Xp
The continuous independent variables X1, X2, …, Xp are quite often measured and observed (not set at specific values or levels).
Analysis of Variance
1. Dependent variable Y (continuous)
2. Categorical independent variables (Factors) A, B, C, …
The categorical independent variables A, B, C, … are set at specific values or levels.
Analysis of Covariance
1. Dependent variable Y (continuous)
2. Categorical independent variables (Factors) A, B, C, …
3. Continuous independent variables (covariates) X1, X2, …, Xp
Example
1. Dependent variable Y – weight gain
2. Categorical independent variables (Factors)
   i. A = level of protein in the diet (High, Low)
   ii. B = source of protein (Beef, Cereal, Pork)
3. Continuous independent variables (covariates)
   i. X1 = initial wt. of animal.
Statistical Technique   Independent variables
                        continuous   categorical
Multiple Regression     ×
ANOVA                                ×
ANACOVA                 ×            ×
Dependent variable is continuous
It is possible to treat categorical independent variables in Multiple Regression using Dummy variables.
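A sketch of how the weight-gain example could be coded for a regression package. It assumes treatment (reference-level) coding with High and Beef as the reference levels, an arbitrary choice; the helper names are hypothetical. A factor with a levels gets a - 1 columns, and interaction columns are products of the main-effect columns, so the column counts match the degrees of freedom in the tables below.

```python
LEVELS = ["High", "Low"]              # first entry is the reference level
SOURCES = ["Beef", "Cereal", "Pork"]  # first entry is the reference level


def dummies(value, levels):
    """0/1 indicators for every level except the reference (first) one."""
    return [1 if value == lev else 0 for lev in levels[1:]]


def design_row(level, source, initial_wt):
    """One row of the design matrix for the ANACOVA model."""
    a = dummies(level, LEVELS)          # a - 1 = 1 column
    b = dummies(source, SOURCES)        # b - 1 = 2 columns
    ab = [x * y for x in a for y in b]  # (a - 1)(b - 1) = 2 columns
    return [1] + a + b + ab + [initial_wt]  # intercept ... covariate


print(design_row("High", "Beef", 1031))  # [1, 0, 0, 0, 0, 0, 1031]
print(design_row("Low", "Pork", 965))    # [1, 1, 0, 1, 0, 1, 965]
```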
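The Multiple Regression Model
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$$
The ANOVA Model
$$Y = \mu + \underbrace{\alpha_i + \beta_j}_{\text{Main Effects}} + \underbrace{(\alpha\beta)_{ij}}_{\text{Interactions}} + \varepsilon$$
The ANACOVA Model
$$Y = \mu + \underbrace{\alpha_i + \beta_j}_{\text{Main Effects}} + \underbrace{(\alpha\beta)_{ij}}_{\text{Interactions}} + \underbrace{\gamma_1 X_1 + \cdots + \gamma_p X_p}_{\text{Covariate Effects}} + \varepsilon$$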
ANOVA Tables
The Multiple Regression Model
Source       S.S.       d.f.
Regression   SSReg      p
Error        SSError    n – p – 1
Total        SSTotal    n – 1

The ANOVA Model
Source         S.S.       d.f.
Main Effects
  A            SSA        a – 1
  B            SSB        b – 1
Interactions
  AB           SSAB       (a – 1)(b – 1)
Error          SSError    n – ab
Total          SSTotal    n – 1

The ANACOVA Model
Source         S.S.           d.f.
Covariates     SSCovariates   p
Main Effects
  A            SSA            a – 1
  B            SSB            b – 1
Interactions
  AB           SSAB           (a – 1)(b – 1)
Error          SSError        n – ab – p
Total          SSTotal        n – 1
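The degrees-of-freedom bookkeeping for a two-factor ANACOVA, as a small sketch assuming a full factorial in which all $ab$ cells are observed (the function name is hypothetical). Setting $p = 0$ gives the ANOVA table.

```python
def anacova_df(n, a, b, p):
    """d.f. for a two-factor ANACOVA with p covariates (p = 0 gives ANOVA)."""
    df = {
        "Covariates": p,
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": n - a * b - p,
    }
    assert sum(df.values()) == n - 1  # the pieces add up to the total d.f.
    return df


print(anacova_df(n=60, a=2, b=3, p=1))
# {'Covariates': 1, 'A': 1, 'B': 2, 'AB': 2, 'Error': 53}
```

With the weight-gain data (60 animals, 2 levels, 3 sources, 1 covariate) this gives the 53 error df reported in the ANACOVA table; with $p = 0$ it gives 54, the error df of the diet ANOVA.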
Example
1. Dependent variable Y – weight gain
2. Categorical independent variables (Factors)
   i. A = level of protein in the diet (High, Low)
   ii. B = source of protein (Beef, Cereal, Pork)
3. Continuous independent variables (covariates)
   X = initial wt. of animal.
The data
wtgn   initial wt   Level   Source   |   wtgn   initial wt   Level   Source
112    1031         High    Beef     |   56     1044         Low     Beef
126    1087         High    Beef     |   86     1025         Low     Beef
88     890          High    Beef     |   78     878          Low     Beef
97     1089         High    Beef     |   69     1193         Low     Beef
91     894          High    Beef     |   76     1024         Low     Beef
78     917          High    Beef     |   65     1078         Low     Beef
86     972          High    Beef     |   60     965          Low     Beef
83     899          High    Beef     |   80     958          Low     Beef
108    821          High    Beef     |   78     1135         Low     Beef
104    846          High    Beef     |   41     847          Low     Beef
42     1041         High    Cereal   |   68     986          Low     Cereal
93     1108         High    Cereal   |   67     1003         Low     Cereal
102    1132         High    Cereal   |   71     968          Low     Cereal
77     1023         High    Cereal   |   76     1035         Low     Cereal
85     1090         High    Cereal   |   85     1018         Low     Cereal
88     921          High    Cereal   |   37     882          Low     Cereal
82     909          High    Cereal   |   119    1053         Low     Cereal
41     1091         High    Cereal   |   91     978          Low     Cereal
63     838          High    Cereal   |   51     1057         Low     Cereal
88     935          High    Cereal   |   57     1035         Low     Cereal
104    1098         High    Pork     |   96     965          Low     Pork
114    888          High    Pork     |   67     1025         Low     Pork
78     1000         High    Pork     |   85     970          Low     Pork
111    993          High    Pork     |   17     836          Low     Pork
109    1043         High    Pork     |   67     961          Low     Pork
115    992          High    Pork     |   54     931          Low     Pork
47     834          High    Pork     |   105    1017         Low     Pork
124    1005         High    Pork     |   64     845          Low     Pork
80     905          High    Pork     |   92     1092         Low     Pork
97     1059         High    Pork     |   62     932          Low     Pork
The ANOVA Table
Source                Sum of Squares   df   Mean Square   F        Sig.
Initial (Covariate)   3357.8165        1    3357.82       9.075    0.00397
LEVEL                 6523.4815        1    6523.48       17.631   0.0001
SOURCE                2013.6469        2    1006.82       2.721    0.07499
LEVEL * SOURCE        2528.0163        2    1264.01       3.416    0.04022
Error                 19609.4835       53   369.99
Total                 31966.8500       59
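Because LEVEL, SOURCE and LEVEL * SOURCE together fit a separate mean to each of the six cells, the model amounts to six cell means plus one common slope on the covariate. Under that reading, the covariate sum of squares should be the pooled within-cell reduction $W_{xy}^2/W_{xx}$, and the error sum of squares $W_{yy} - W_{xy}^2/W_{xx}$. The sketch below computes both from the data table; the function name is hypothetical, and the printed values are not asserted here, so compare them with the INITIAL and Error rows yourself.

```python
# (wtgn, initial wt) pairs for each (Level, Source) cell, from the data table
data = {
    ("High", "Beef"): [(112, 1031), (126, 1087), (88, 890), (97, 1089), (91, 894),
                       (78, 917), (86, 972), (83, 899), (108, 821), (104, 846)],
    ("Low", "Beef"): [(56, 1044), (86, 1025), (78, 878), (69, 1193), (76, 1024),
                      (65, 1078), (60, 965), (80, 958), (78, 1135), (41, 847)],
    ("High", "Cereal"): [(42, 1041), (93, 1108), (102, 1132), (77, 1023), (85, 1090),
                         (88, 921), (82, 909), (41, 1091), (63, 838), (88, 935)],
    ("Low", "Cereal"): [(68, 986), (67, 1003), (71, 968), (76, 1035), (85, 1018),
                        (37, 882), (119, 1053), (91, 978), (51, 1057), (57, 1035)],
    ("High", "Pork"): [(104, 1098), (114, 888), (78, 1000), (111, 993), (109, 1043),
                       (115, 992), (47, 834), (124, 1005), (80, 905), (97, 1059)],
    ("Low", "Pork"): [(96, 965), (67, 1025), (85, 970), (17, 836), (67, 961),
                      (54, 931), (105, 1017), (64, 845), (92, 1092), (62, 932)],
}


def within_cell_regression(cells):
    """Pooled within-cell slope, covariate SS, error SS and error d.f.

    Each cell is a list of (y, x) pairs; the model is cell means + common slope.
    """
    wxx = wxy = wyy = 0.0
    df_within = 0
    for obs in cells.values():
        m = len(obs)
        ybar = sum(y for y, _ in obs) / m
        xbar = sum(x for _, x in obs) / m
        wxx += sum((x - xbar) ** 2 for _, x in obs)
        wxy += sum((x - xbar) * (y - ybar) for y, x in obs)
        wyy += sum((y - ybar) ** 2 for y, _ in obs)
        df_within += m - 1
    ss_covariate = wxy ** 2 / wxx
    return wxy / wxx, ss_covariate, wyy - ss_covariate, df_within - 1


slope, ss_cov, ss_err, df_err = within_cell_regression(data)
print(f"slope={slope:.4f} SS(covariate)={ss_cov:.2f} SS(error)={ss_err:.2f} df={df_err}")
```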
Using SPSS to perform ANACOVA
The data file
Select Analyze > General Linear Model > Univariate
Choose the Dependent Variable, the Fixed Factor(s) and the Covariates
The following ANOVA table appears
Tests of Between-Subjects Effects
Dependent Variable: WTGN
Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   12357.366(a)              6    2059.561      5.567    .000
Intercept         24.883                    1    24.883        .067     .796
INITIAL           3357.816                  1    3357.816      9.075    .004
LEVEL             6523.482                  1    6523.482      17.631   .000
SOURCE            2013.647                  2    1006.823      2.721    .075
LEVEL * SOURCE    2528.016                  2    1264.008      3.416    .040
Error             19609.484                 53   369.990
Total             421265.0                  60
Corrected Total   31966.850                 59
a. R Squared = .387 (Adjusted R Squared = .317)
The Process of Analysis of Covariance
[Figure: the dependent variable plotted against the covariate]
[Figure: the adjusted dependent variable plotted against the covariate]
• The dependent variable (Y) is adjusted so that the covariate takes on its average value for each case.
• The effects of the factors (A, B, etc.) are determined using the adjusted values of the dependent variable.
• ANOVA and ANACOVA can be handled by a multiple regression package by using dummy variables for the categorical independent variables.
• The results would be the same.
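A sketch of the adjustment described in the first two bullets, assuming a single covariate and a common within-cell slope $b_w$: each response is moved to the overall covariate mean, $y_{adj} = y - b_w(x - \bar{x})$, so each cell's adjusted mean is $\bar{y}_{cell} - b_w(\bar{x}_{cell} - \bar{x})$. The function name and the two-group data are hypothetical, chosen so the arithmetic is easy to follow.

```python
def adjusted_means(cells):
    """Covariate-adjusted cell means: ybar_cell - b_w * (xbar_cell - xbar_all).

    Each cell is a list of (x, y) pairs: x = covariate, y = response.
    """
    wxx = wxy = 0.0
    stats, all_x = {}, []
    for name, obs in cells.items():
        m = len(obs)
        xbar = sum(x for x, _ in obs) / m
        ybar = sum(y for _, y in obs) / m
        stats[name] = (xbar, ybar)
        wxx += sum((x - xbar) ** 2 for x, _ in obs)
        wxy += sum((x - xbar) * (y - ybar) for x, y in obs)
        all_x.extend(x for x, _ in obs)
    b_w = wxy / wxx                     # pooled within-cell slope
    xbar_all = sum(all_x) / len(all_x)  # overall covariate mean
    return {name: ybar - b_w * (xbar - xbar_all)
            for name, (xbar, ybar) in stats.items()}


# Hypothetical toy data: group B starts with larger covariate values
toy = {"A": [(1, 2), (2, 4), (3, 6)], "B": [(3, 5), (4, 7), (5, 9)]}
print(adjusted_means(toy))  # {'A': 6.0, 'B': 5.0}
```

The raw means favour B (7 against 4), but once both groups are placed at the common covariate value the order reverses, which is exactly why the factor effects are judged on the adjusted values.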