Clinical Research Training Program 2021
ANOVA and ANCOVA
Fall 2004
www.edc.gsph.pitt.edu/faculty/dodge/clres2021.html

ANOVA vs. REGRESSION
• ANOVA can be regarded as a special case of linear regression.
• By using dummy coding, we obtain regression coefficients that represent differences in means between groups.

REGRESSION METHODS
Regression
ANOVA
ANCOVA/Regression

Analysis of Variance
Hypotheses: whether all group means are equal versus at least two group means are different.
The null and alternative hypotheses are
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_1: \mu_i \neq \mu_j$ for some $i, j \in \{1, \ldots, k\}$,
where $\mu_i$ represents the mean of population $i$.

Analysis of Variance
Assumptions:
• k independent random samples from k normal populations with distributions $N(\mu_1, \sigma^2), \ldots, N(\mu_k, \sigma^2)$, respectively.
• The outcome variable should be continuous.
• All the populations have the same unknown variance $\sigma^2$ (homogeneous variance).

[Figure: dot plots of two pairs of samples, each marked with the sample means $\bar{x}_1$ and $\bar{x}_2$]

Analysis of Variance
The main idea for comparing means: what matters is not how far apart the sample means are, but how far apart they are relative to the variability of individual observations.
ANOVA compares the variation due to specific sources with the variation among individuals who should be similar. In particular, ANOVA tests whether several populations have the same mean by comparing how far apart the sample means are with how much variation there is within the samples.
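
Analysis of Variance
Between Sum of Squares (SSB): $\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i - \bar{x})^2$
Within Sum of Squares (SSW): $\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$
Total Sum of Squares (SST): $\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2$
SST = SSB + SSW
Here $x_{ij}$ is observation $j$ in group $i$, $\bar{x}_i$ is the mean of group $i$, and $\bar{x}$ is the overall mean.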
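The decomposition SST = SSB + SSW can be checked numerically. The sketch below is only an illustration: it uses the outcome values from the 30-observation, five-group example listed later in these slides, and the function name `sums_of_squares` is a hypothetical helper introduced here.

```python
# Outcome values for the five groups in the slides' 30-observation example.
groups = {
    1: [5, 8, 7, 7, 10, 8],
    2: [4, 6, 6, 3, 5, 6],
    3: [6, 4, 4, 5, 4, 3],
    4: [7, 4, 6, 6, 3, 5],
    5: [9, 3, 5, 7, 7, 6],
}


def sums_of_squares(groups):
    """Return (SSB, SSW, SST) for a dict mapping group label -> observations."""
    all_obs = [x for obs in groups.values() for x in obs]
    grand_mean = sum(all_obs) / len(all_obs)
    ssb = 0.0
    ssw = 0.0
    for obs in groups.values():
        group_mean = sum(obs) / len(obs)
        # Every observation in the group contributes (group mean - grand mean)^2.
        ssb += len(obs) * (group_mean - grand_mean) ** 2
        # Deviations of individual observations from their own group mean.
        ssw += sum((x - group_mean) ** 2 for x in obs)
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    return ssb, ssw, sst


ssb, ssw, sst = sums_of_squares(groups)
print(f"SSB={ssb:.4f}  SSW={ssw:.4f}  SST={sst:.4f}")
print("SST - (SSB + SSW) =", round(sst - (ssb + ssw), 10))
```

The three values should agree with the Model, Residual, and Total sums of squares in the Stata output for this example.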
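
Analysis of Variance
Hypotheses: $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_1: \mu_i \neq \mu_j$ for some $i, j \in \{1, \ldots, k\}$
Test statistic:
$F = \dfrac{\text{Variation among the sample means}}{\text{Variation among individuals within samples}} = \dfrac{MSB}{MSW} = \dfrac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i - \bar{x})^2 / (k-1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2 / (n-k)} \sim F_{k-1,\, n-k}$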

Analysis of Variance
Between Mean Squares (MSB): $\dfrac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i - \bar{x})^2}{k-1}$
Within Mean Squares (MSW): $\dfrac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2}{n-k}$
where $n = n_1 + n_2 + \cdots + n_k$.
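
Analysis of Variance: Summary Table (ANOVA Table)
Source of variation | Sum of Squares (SS) | df | Mean Squares (MS) | F Statistic | P-value
Between | $SSB = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i - \bar{x})^2$ | $k-1$ | $MSB = SSB/(k-1)$ | $F = MSB/MSE$ | $\Pr(F > F_{\text{calculated}})$
Within (Errors) | $SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$ | $n-k$ | $MSE = SSE/(n-k)$ | |
Total | $SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2$ | $n-1$ | | |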
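As a small illustration of how the table is filled in, the sketch below derives the degrees of freedom, mean squares, and F statistic from the two sums of squares. The function name `anova_table` is hypothetical; the inputs are the sums of squares reported in the Stata output for the five-group example in these slides (k = 5, n = 30). The P-value is left out because the Python standard library has no F distribution.

```python
def anova_table(ssb, sse, k, n):
    """Fill in df, mean squares, and F from the between and within sums of squares."""
    df_between = k - 1
    df_within = n - k
    msb = ssb / df_between  # Between Mean Squares
    mse = sse / df_within   # Within (error) Mean Squares
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n - 1,
        "MSB": msb,
        "MSE": mse,
        "F": msb / mse,
    }


# Sums of squares taken from the slides' Stata output.
table = anova_table(ssb=36.4666667, sse=58.50, k=5, n=30)
for name, value in table.items():
    print(f"{name:>10}: {value:.4f}")
```

The resulting F can be compared with the value reported by `regress` and `anova` for the same data.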

ANOVA vs. REGRESSION
SSR: Between Sum of Squares (SSB) $= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i - \bar{x})^2$
SSE: Within Sum of Squares (SSW) $= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$
SST: Total Sum of Squares (SST) $= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2$
SST = SSB + SSW

ANOVA vs. REGRESSION
• By using reference coding, we get the same overall F test as ANOVA, plus additional information.
• Additional information:
– Difference in means between the reference group and each of the other groups
– Difference in means between each group and the overall mean (when each group has equal N)

REFERENCE CODING
• Each variable takes on only the values 1 and 0.
• 3 groups (groups 1, 2, and 3) need 2 dummy variables:
X2 = 1 if group 2, 0 otherwise
X3 = 1 if group 3, 0 otherwise
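
REFERENCE CODING
$\hat{Y} = \hat{\mu} + \hat{\alpha}_2 X_2 + \hat{\alpha}_3 X_3$
For Group 1: X2=0, X3=0: $\hat{Y} = \hat{\mu}_1 = \hat{\mu}$
For Group 2: X2=1, X3=0: $\hat{Y} = \hat{\mu}_2 = \hat{\mu} + \hat{\alpha}_2$
For Group 3: X2=0, X3=1: $\hat{Y} = \hat{\mu}_3 = \hat{\mu} + \hat{\alpha}_3$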
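
REFERENCE CODING
$\hat{\mu}_1 = \hat{\mu}$
$\hat{\mu}_2 = \hat{\mu} + \hat{\alpha}_2$
$\hat{\mu}_3 = \hat{\mu} + \hat{\alpha}_3$
Solving for the coefficients:
$\hat{\mu} = \hat{\mu}_1$
$\hat{\alpha}_2 = \hat{\mu}_2 - \hat{\mu} = \hat{\mu}_2 - \hat{\mu}_1$
$\hat{\alpha}_3 = \hat{\mu}_3 - \hat{\mu} = \hat{\mu}_3 - \hat{\mu}_1$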

REFERENCE CODING
• Intercept μ is the mean of group 1 (the reference group).
• α2 is the difference in means between group 2 and group 1 (the reference group): mean of group 2 minus mean of group 1.
• α3 is the difference in means between group 3 and group 1 (the reference group): mean of group 3 minus mean of group 1.

. list y x x2 x3 x4 x5

        y   x  x2  x3  x4  x5
  1.    5   1   0   0   0   0
  2.    8   1   0   0   0   0
  3.    7   1   0   0   0   0
  4.    7   1   0   0   0   0
  5.   10   1   0   0   0   0
  6.    8   1   0   0   0   0
  7.    4   2   1   0   0   0
  8.    6   2   1   0   0   0
  9.    6   2   1   0   0   0
 10.    3   2   1   0   0   0
 11.    5   2   1   0   0   0
 12.    6   2   1   0   0   0
 13.    6   3   0   1   0   0
 14.    4   3   0   1   0   0
 15.    4   3   0   1   0   0
 16.    5   3   0   1   0   0
 17.    4   3   0   1   0   0
 18.    3   3   0   1   0   0
 19.    7   4   0   0   1   0
 20.    4   4   0   0   1   0
 21.    6   4   0   0   1   0
 22.    6   4   0   0   1   0
 23.    3   4   0   0   1   0
 24.    5   4   0   0   1   0
 25.    9   5   0   0   0   1
 26.    3   5   0   0   0   1
 27.    5   5   0   0   0   1
 28.    7   5   0   0   0   1
 29.    7   5   0   0   0   1
 30.    6   5   0   0   0   1

. sort x
. by x: summarize y

-> x = 1
    Variable |  Obs       Mean   Std. Dev.   Min   Max
-------------+-----------------------------------------
           y |    6        7.5   1.643168      5    10

-> x = 2
    Variable |  Obs       Mean   Std. Dev.   Min   Max
-------------+-----------------------------------------
           y |    6          5   1.264911      3     6

-> x = 3
    Variable |  Obs       Mean   Std. Dev.   Min   Max
-------------+-----------------------------------------
           y |    6   4.333333   1.032796      3     6

-> x = 4
    Variable |  Obs       Mean   Std. Dev.   Min   Max
-------------+-----------------------------------------
           y |    6   5.166667    1.47196      3     7

-> x = 5
    Variable |  Obs       Mean   Std. Dev.   Min   Max
-------------+-----------------------------------------
           y |    6   6.166667   2.041241      3     9

. regress y x2 x3 x4 x5

      Source |         SS    df          MS        Number of obs =      30
-------------+--------------------------------     F(4, 25)      =    3.90
       Model |  36.4666667     4  9.11666667       Prob > F      =  0.0136
    Residual |       58.50    25        2.34       R-squared     =  0.3840
-------------+--------------------------------     Adj R-squared =  0.2854
       Total |  94.9666667    29  3.27471264       Root MSE      =  1.5297

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |       -2.5   .8831761    -2.83    0.009   -4.318935   -.6810648
          x3 |  -3.166667   .8831761    -3.59    0.001   -4.985602   -1.347731
          x4 |  -2.333333   .8831761    -2.64    0.014   -4.152269   -.5143981
          x5 |  -1.333333   .8831761    -1.51    0.144   -3.152269    .4856019
       _cons |        7.5   .6244998    12.01    0.000    6.213819    8.786181
------------------------------------------------------------------------------

Mean
Group1 = 7.5
Group2 = 5      α2 = 5 - 7.5 = -2.5
Group3 = 4.3    α3 = 4.3 - 7.5 = -3.2
Group4 = 5.2    α4 = 5.2 - 7.5 = -2.3
Group5 = 6.2    α5 = 6.2 - 7.5 = -1.3
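The link between the reference-coded regression and the group means can be cross-checked outside Stata. The following is a minimal sketch, not the method Stata uses internally: it builds the 0/1 dummy variables with group 1 as the reference and solves the least-squares normal equations by plain Gaussian elimination. The helper names `reference_row`, `solve`, and `ols` are hypothetical.

```python
# Outcome and group membership from the slides' 30-observation listing.
y = [5, 8, 7, 7, 10, 8, 4, 6, 6, 3, 5, 6, 6, 4, 4, 5, 4, 3,
     7, 4, 6, 6, 3, 5, 9, 3, 5, 7, 7, 6]
group = [g for g in range(1, 6) for _ in range(6)]


def reference_row(g, levels, reference):
    """Intercept plus one 0/1 dummy for each non-reference level."""
    return [1.0] + [1.0 if g == lev else 0.0 for lev in levels if lev != reference]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


X = [reference_row(g, levels=[1, 2, 3, 4, 5], reference=1) for g in group]
coef = ols(X, y)  # order: _cons, x2, x3, x4, x5
for name, c in zip(["_cons", "x2", "x3", "x4", "x5"], coef):
    print(f"{name:>6}: {c:.6f}")
```

The printed coefficients can be compared with the `Coef.` column of the regression output: the intercept should equal the group 1 mean and each slope the difference from that mean.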

ANOVA vs. REGRESSION
. anova y x

                 Number of obs =      30     R-squared     =  0.3840
                 Root MSE      = 1.52971     Adj R-squared =  0.2854

    Source |  Partial SS    df          MS         F    Prob > F
-----------+----------------------------------------------------
     Model |  36.4666667     4  9.11666667      3.90      0.0136
           |
         x |  36.4666667     4  9.11666667      3.90      0.0136
           |
  Residual |       58.50    25        2.34
-----------+----------------------------------------------------
     Total |  94.9666667    29  3.27471264

EFFECT CODING
• 3 groups (groups 1, 2, and 3) need 2 dummy variables:
X2 = 1 if group 2, 0 if group 3, -1 if group 1
X3 = 1 if group 3, 0 if group 2, -1 if group 1
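
EFFECT CODING
$\hat{Y} = \hat{\mu} + \hat{\alpha}_2 X_2 + \hat{\alpha}_3 X_3$
For Group 1: X2=-1, X3=-1: $\hat{Y} = \hat{\mu}_1 = \hat{\mu} - \hat{\alpha}_2 - \hat{\alpha}_3$
For Group 2: X2=1, X3=0: $\hat{Y} = \hat{\mu}_2 = \hat{\mu} + \hat{\alpha}_2$
For Group 3: X2=0, X3=1: $\hat{Y} = \hat{\mu}_3 = \hat{\mu} + \hat{\alpha}_3$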
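
EFFECT CODING
$\hat{\mu}_1 = \hat{\mu} - \hat{\alpha}_2 - \hat{\alpha}_3$
$\hat{\mu}_2 = \hat{\mu} + \hat{\alpha}_2$
$\hat{\mu}_3 = \hat{\mu} + \hat{\alpha}_3$
Solving for the coefficients:
$\hat{\mu} = \dfrac{\hat{\mu}_1 + \hat{\mu}_2 + \hat{\mu}_3}{3}$
$\hat{\alpha}_2 = \hat{\mu}_2 - \dfrac{\hat{\mu}_1 + \hat{\mu}_2 + \hat{\mu}_3}{3}$
$\hat{\alpha}_3 = \hat{\mu}_3 - \dfrac{\hat{\mu}_1 + \hat{\mu}_2 + \hat{\mu}_3}{3}$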

EFFECT CODING
• Intercept μ is the unweighted average of the K group means (K = 3 in the example here). If all groups have equal sample size, this is the grand mean.
• α2 is the difference between the mean of group 2 and the unweighted average of the K group means.
• α3 is the difference between the mean of group 3 and the unweighted average of the K group means.

. list y x x2 x3 x4 x5

        y   x  x2  x3  x4  x5
  1.    5   1  -1  -1  -1  -1
  2.    8   1  -1  -1  -1  -1
  3.    7   1  -1  -1  -1  -1
  4.    7   1  -1  -1  -1  -1
  5.   10   1  -1  -1  -1  -1
  6.    8   1  -1  -1  -1  -1
  7.    4   2   1   0   0   0
  8.    6   2   1   0   0   0
  9.    6   2   1   0   0   0
 10.    3   2   1   0   0   0
 11.    5   2   1   0   0   0
 12.    6   2   1   0   0   0
 13.    6   3   0   1   0   0
 14.    4   3   0   1   0   0
 15.    4   3   0   1   0   0
 16.    5   3   0   1   0   0
 17.    4   3   0   1   0   0
 18.    3   3   0   1   0   0
 19.    7   4   0   0   1   0
 20.    4   4   0   0   1   0
 21.    6   4   0   0   1   0
 22.    6   4   0   0   1   0
 23.    3   4   0   0   1   0
 24.    5   4   0   0   1   0
 25.    9   5   0   0   0   1
 26.    3   5   0   0   0   1
 27.    5   5   0   0   0   1
 28.    7   5   0   0   0   1
 29.    7   5   0   0   0   1
 30.    6   5   0   0   0   1

. regress y x2 x3 x4 x5

      Source |         SS    df          MS        Number of obs =      30
-------------+--------------------------------     F(4, 25)      =    3.90
       Model |  36.4666667     4  9.11666667       Prob > F      =  0.0136
    Residual |       58.50    25        2.34       R-squared     =  0.3840
-------------+--------------------------------     Adj R-squared =  0.2854
       Total |  94.9666667    29  3.27471264       Root MSE      =  1.5297

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.6333333   .5585696    -1.13    0.268   -1.783729    .5170623
          x3 |       -1.3   .5585696    -2.33    0.028   -2.450396   -.1496044
          x4 |  -.4666667   .5585696    -0.84    0.411   -1.617062     .683729
          x5 |   .5333333   .5585696     0.95    0.349   -.6170623    1.683729
       _cons |   5.633333   .2792848    20.17    0.000    5.058136    6.208531
------------------------------------------------------------------------------

Mean
Group1 = 7.5    intercept = (7.5 + 5 + 4.3 + 5.2 + 6.2)/5 = 5.6
Group2 = 5      α2 = 5 - 5.6 = -0.6
Group3 = 4.3    α3 = 4.3 - 5.6 = -1.3
Group4 = 5.2    α4 = 5.2 - 5.6 = -0.4
Group5 = 6.2    α5 = 6.2 - 5.6 = 0.6
The group means are rounded to one decimal here, so the hand-computed α4 and α5 differ slightly from the regression coefficients (-0.47 and 0.53).
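Because a one-factor model with a full set of group codes reproduces the group means exactly, the effect-coded coefficients can be recovered directly from the means without refitting. The sketch below applies that relationship to the unrounded group means from the `summarize` output; `effect_coefficients` is a hypothetical helper, and group 1 is treated as the group coded -1, as in the listing.

```python
# Group means as reported by "by x: summarize y" in the slides.
group_means = {1: 7.5, 2: 5.0, 3: 4.333333, 4: 5.166667, 5: 6.166667}


def effect_coefficients(group_means, omitted):
    """Return (intercept, coded effects, implied effect of the omitted group)."""
    # Intercept: unweighted average of the group means.
    intercept = sum(group_means.values()) / len(group_means)
    # Each coded group's coefficient is its mean minus that average.
    coded = {g: m - intercept for g, m in group_means.items() if g != omitted}
    # The group coded -1 gets minus the sum of the other effects.
    implied = -sum(coded.values())
    return intercept, coded, implied


intercept, coded, implied = effect_coefficients(group_means, omitted=1)
print(f"_cons: {intercept:.6f}")
for g, effect in coded.items():
    print(f"   x{g}: {effect:.6f}")
print(f"implied effect for group 1: {implied:.6f}")
```

The implied effect for group 1 equals its mean minus the intercept, which is why no separate variable is needed for that group.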

Analysis of Covariance (ANACOVA)
Why do we need to consider control variables?
• To produce accurate estimates of coefficients:
– interaction
– confounding
– increased precision

Analysis of Covariance (ANACOVA)
A question to be answered by using ANACOVA:
• If each control variable had the same distribution in groups A and B, what would the mean response be for group A and for group B?

Analysis of Covariance (ANACOVA)
• Outcome: continuous
• Covariates
– Nominal (study factors of interest)
– Control variables may involve any level of measurement

Analysis of Covariance (ANACOVA)
• Most importantly: this method is applicable only when there is no interaction effect between the variable of interest and the covariates.
$Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + E$
First check whether $H_0: \beta_3 = 0$ is supported.
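To see why $\beta_3$ is the term to test, it may help to write the model separately for each group. The lines below are a sketch that assumes Z is a 0/1 indicator (as with sex in the blood pressure example) and uses $E(Y \mid X)$ for the mean response at a given X, a notation not used elsewhere in these slides.

```latex
\begin{align*}
Z = 0:&\quad E(Y \mid X) = \beta_0 + \beta_1 X \\
Z = 1:&\quad E(Y \mid X) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X \\
\text{group difference at a given } X:&\quad \beta_2 + \beta_3 X
\end{align*}
```

When $\beta_3 = 0$ the two lines are parallel and the group difference is the constant $\beta_2$ at every value of X, so a single adjusted comparison is meaningful. When $\beta_3 \neq 0$ the difference changes with X and no single adjusted difference summarizes it.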
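
Analysis of Covariance (ANACOVA)
• Blood pressure data example (X = age, Z = sex)
$\hat{Y} = 110.29 + 0.96X - 13.51Z$
Male (Z = 0): $\hat{Y}_M = 110.29 + 0.96X$
Female (Z = 1): $\hat{Y}_F = 96.78 + 0.96X$
$\bar{X} = 46.14$
$\hat{Y}_M(\text{adj}) = 110.29 + 0.96(46.14) = 154.40$
$\hat{Y}_F(\text{adj}) = 96.78 + 0.96(46.14) = 140.89$
The slope is shown rounded to 0.96; the adjusted means are consistent with an unrounded slope of about 0.956.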

Analysis of Covariance (ANACOVA)
Sex       Unadjusted mean BP    Adjusted mean BP
Male      155.15                154.40
Female    139.86                140.89
Using the adjusted means removes the influence of age on the comparison of mean blood pressures by considering what the mean BP in the two groups would be if both groups had the same mean age.
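The effect of the adjustment can be made explicit with a short calculation. This sketch assumes the fitted equation from the blood pressure example is read as the no-interaction model with $\hat{\beta}_0 = 110.29$, age slope $\hat{\beta}_1$, and $\hat{\beta}_2 = -13.51$; the hatted $\beta$ labels are introduced here for illustration.

```latex
\begin{align*}
\hat{Y}_M(\text{adj}) - \hat{Y}_F(\text{adj})
  &= (\hat{\beta}_0 + \hat{\beta}_1 \bar{X}) - (\hat{\beta}_0 + \hat{\beta}_2 + \hat{\beta}_1 \bar{X})
   = -\hat{\beta}_2 \\
  &= 154.40 - 140.89 = 13.51 \\
\text{unadjusted difference} &= 155.15 - 139.86 = 15.29
\end{align*}
```

Because both adjusted means are evaluated at the same age $\bar{X}$, the age term cancels and the adjusted difference equals the coefficient of Z in absolute value. The gap between 15.29 and 13.51 is the part of the unadjusted difference attributable to the groups' differing mean ages.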