8/7/2019 Anova 1way
One-way ANOVA (Independent Group and Repeated Measures)
BINF 5210 (Web-based)
Spring 2011
Analysis of Variance (ANOVA)
Used to compare means across groups
To compare 3 or more means of repeated measures
In SAS there are 3 main procedures for analysis of variance (PROC ANOVA, PROC GLM and PROC MIXED). We will cover the first two.
(In this lecture we will cover one-way ANOVA only. We will cover two-way ANOVA after Correlation and Regression.)
One-way ANOVA
It is an extension of the independent-group t-test to more than 2 groups.
It is also known as between-subjects analysis of variance.
It tests for differences among means by comparing the variance between groups with the variance within groups.
One-way ANOVA Assumptions
Data in the sample groups are normally distributed
Variances for the groups are equal
Samples are random and independent (no matching observations in the groups). This is the most critical assumption and must be met.
For a large dataset (sample size), ANOVA can still be used despite moderate departures from normality and from equal variances across the sample groups (robustness of ANOVA)
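A minimal sketch of how the first two assumptions might be checked in SAS. The dataset and variable names (MYDATA, GROUP, HEALTH_SCORE) are placeholders that match the example used later in this lecture. PROC GLM is used for the variance check only because its MEANS statement offers the HOVTEST= option.

```sas
/* Sketch only: MYDATA, GROUP and HEALTH_SCORE are placeholder names. */

/* Normality: the NORMAL option requests tests of normality,
   reported separately for each level of the CLASS variable. */
PROC UNIVARIATE DATA=MYDATA NORMAL;
    CLASS GROUP;
    VAR HEALTH_SCORE;
RUN;

/* Equal variances: HOVTEST=LEVENE requests Levene's test
   of homogeneity of variance for the one-way model. */
PROC GLM DATA=MYDATA;
    CLASS GROUP;
    MODEL HEALTH_SCORE = GROUP;
    MEANS GROUP / HOVTEST=LEVENE;
RUN;
QUIT;
```

Independence of the samples cannot be tested this way; it has to follow from how the data were collected.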
One-way ANOVA
Let's say we have k independent random samples that we want to compare.
Our hypotheses for the comparison would be:
Null hypothesis H0: There is no difference among the means of the groups. In other words, all means are equal ($H_0: \mu_1 = \mu_2 = \dots = \mu_k$).
Alternative hypothesis Ha: At least 2 means are not equal ($H_a: \mu_i \neq \mu_j$ for some $i \neq j$).
One-way ANOVA
The test for equality of means produces an F-test statistic (with k-1 and N-k degrees of freedom, where N is the total number of subjects).
If the p-value for the F statistic is low (for example, below .05), we reject the null hypothesis.
In that case we can conclude from the F-test that at least two means are not equal. We then perform a multiple comparison test to identify the significant differences.
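For reference, the F statistic can be written as the ratio of the between-group mean square to the within-group mean square. The notation below is introduced here for illustration and does not come from the slides: $n_i$ is the size of group $i$, $\bar{y}_i$ is the mean of group $i$, $\bar{y}$ is the overall mean, and $y_{ij}$ is observation $j$ in group $i$.

```latex
F = \frac{SS_{\mathrm{between}} / (k-1)}{SS_{\mathrm{within}} / (N-k)},
\qquad
SS_{\mathrm{between}} = \sum_{i=1}^{k} n_i \,(\bar{y}_i - \bar{y})^2,
\qquad
SS_{\mathrm{within}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
```

A large F means the group means vary more than the spread within the groups would explain, which is what produces a low p-value.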
Determining Significant Differences
Once we know (from the F-test) that at least 2 means of the independent samples are not equal, we need to identify which mean differences are significant.
To determine this, a number of multiple comparison tests are available in statistics.
SAS supports a list of multiple comparison tests such as the Bonferroni t-test, Duncan's multiple range test, Tukey's studentized range test and so on (please check the SAS manual and the textbook for more test names; do not get scared of the names!).
You can apply any of these tests by specifying it as an option in the ANOVA procedure in SAS.
Note: Just remember that you need a multiple comparison test for independent samples only when the F-test shows that the means of the samples are not all equal.
SAS Procedure ANOVA
SAS provides PROC ANOVA to compare more than 2 groups of independent random samples (by comparing means)
PROC ANOVA takes the following form:
PROC ANOVA <options>;
CLASS variable_name; /* the independent variable (also the group variable) */
MODEL dependent_variable = independent_variable(s);
MEANS independent_variable(s) / type_of_multiple_comparison <options>; /* same variable(s) as in CLASS */
PROC ANOVA
Some of the options in PROC ANOVA can be DATA=, ORDER=, etc.
The CLASS statement is required to name the grouping variable
The MODEL statement is required to indicate the dependent variable (quantitative outcome variable) and the independent variable(s) (grouping)
In the MEANS statement you can specify which multiple comparison test to use for pairwise comparison of means
The MEANS options can also specify a significance level (ALPHA= a given significance level). By default SAS uses .05. You can also request confidence limits in the output (see the SAS manual)
PROC ANOVA
For example, you can specify Tukey's studentized range test as the multiple comparison test and ALPHA=.05 as the significance level in the MEANS statement:
MEANS INDEPENDENT / TUKEY ALPHA=.05;
Please see the textbook and the SAS manual for a list of the multiple comparison tests and MEANS statement options available in SAS.
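As a hedged illustration of the MEANS options described above, the sketch below requests Bonferroni t-tests at a stricter significance level. MYDATA, GROUP and HEALTH_SCORE are placeholder names (they match the example in this lecture). BON, ALPHA= and CLDIFF are standard MEANS options, but check the SAS manual for your release.

```sas
/* Sketch only: dataset and variable names are placeholders. */
PROC ANOVA DATA=MYDATA;
    CLASS GROUP;
    MODEL HEALTH_SCORE = GROUP;
    /* BON    = Bonferroni t-tests of differences between means
       ALPHA= = significance level (default is .05)
       CLDIFF = confidence limits for each pairwise difference */
    MEANS GROUP / BON ALPHA=.01 CLDIFF;
RUN;
QUIT;
```

Swapping BON for another keyword such as TUKEY or DUNCAN changes the test without changing the rest of the code.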
PROC ANOVA Example
As part of a health status maintenance program, 36 individuals were randomly assigned to 3 groups. A health score was given to each individual based on that individual's physical functioning ability. As a researcher, you want to compare the health scores of the 3 groups, whose members were randomly selected and are independent of each other. You want to see if there is any difference in the health scores among the groups. You have an ID for each individual, the group that individual belongs to, and a health score for the individual.
PROC ANOVA Example
First we read the data into SAS.
You could also read it into SAS as an external file (if it was given as a text or .dat file).
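A minimal sketch of reading such an external file, assuming a space-delimited text file with one line per individual. The file name and the column order are assumptions made for illustration. MYDATA, GROUP and HEALTH_SCORE match the names used in the PROC ANOVA code in this lecture.

```sas
/* Sketch only: 'health_scores.txt' is a hypothetical file name and the
   column order (ID, GROUP, HEALTH_SCORE) is assumed. */
DATA MYDATA;
    INFILE 'health_scores.txt';
    INPUT ID GROUP HEALTH_SCORE;
RUN;

/* Print the dataset to confirm that all records were read correctly. */
PROC PRINT DATA=MYDATA;
RUN;
```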
PROC ANOVA Example
Let's run the PROC ANOVA procedure in SAS:
ODS HTML;
PROC ANOVA DATA=MYDATA;
CLASS GROUP; /* GROUP is the independent variable */
MODEL HEALTH_SCORE = GROUP; /* HEALTH_SCORE is the dependent variable */
MEANS GROUP / TUKEY CLDIFF;
RUN;
Here you are requesting SAS to use Tukey's multiple comparison test and to add confidence limits for the differences (CLDIFF) to the output.
PROC ANOVA Example
This table describes the analysis of variance for the whole model.
Check the F statistic and its p-value. Since the p-value for F is not low enough (it is not less than .05), we cannot reject the null hypothesis that there are no mean differences among the groups. We stop our analysis here. We do not need to go on to the multiple comparison test (Tukey's test in this case, next slide).
Notice that these two tables report the same values because we have only one factor (GROUP). Otherwise they would be different.
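To make the decision rule concrete, the sketch below computes the critical F value for this example's degrees of freedom (3 groups and 36 individuals give k-1 = 2 and N-k = 33). The variable names are made up for illustration. FINV is the SAS quantile function for the F distribution.

```sas
/* Sketch only: K, N and F_CRIT are illustrative variable names. */
DATA _NULL_;
    K = 3;                               /* number of groups */
    N = 36;                              /* total number of subjects */
    F_CRIT = FINV(0.95, K - 1, N - K);   /* 95th percentile of F(2, 33) */
    PUT 'Critical F at alpha = .05: ' F_CRIT 8.3;
RUN;
```

An observed F below this critical value corresponds to a p-value above .05, so the null hypothesis is not rejected.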
PROC ANOVA Example
SAS generated this output because we requested Tukey's multiple comparison test in the MEANS statement, and it reports 95% confidence limits for each difference.
Notice that there is no significant mean difference, since the last column is empty (it would show *** in the corresponding row if there were a significant difference in means between groups).
We do not report this table, because the first step of PROC ANOVA could not reject the null hypothesis (so we have no evidence of a difference among the means).
PROC ANOVA Example
Let's consider the same problem as in the previous example, but for 15 individuals in 3 randomly selected independent groups. Let's read the data into SAS.
PROC ANOVA Example
Now we run the exact same PROC ANOVA with the same SAS code as before:
ODS HTML;
PROC ANOVA DATA=MYDATA;
CLASS GROUP; /* GROUP is the independent variable */
MODEL HEALTH_SCORE = GROUP;
MEANS GROUP / TUKEY CLDIFF;
RUN;
Notice the output (next slide)
PROC ANOVA Example
In this case we can reject the null hypothesis, since the analysis of variance for the full model shows that the p-value for the F statistic is significant at the .05 level (0.0188). Therefore, we conclude that the means across the groups are not all equal. Now we go on to the second step and examine the multiple comparison test to see which means are different.
PROC ANOVA Example
We can see that the mean difference between group 1 and group 2 is significant at the .05 level (indicated by ***). You can set the significance level by specifying the ALPHA= option in the MEANS statement. The table indicates that the mean HEALTH_SCORE does not differ for any pair of groups except group 1 versus group 2. In other words, the mean health score for group 2 is significantly different from (higher than) the mean health score for group 1.
The 95% confidence limits indicate how small or how large the difference between the means is likely to be. For example, the difference between group 2 and group 1 is likely to be as small as 0.731 or as large as 9.269. If a difference of this size is meaningful in your experimental setting, you can conclude that group 2's mean health score is higher than that of group 1.
Analysis of variance (Repeated measures of 3 or more)
Measures (observations) are repeated on the same subjects or related subjects over a time period or in different settings (before and after, or observations on the same subjects for various different drug effects, etc.)
If we have 2 repeated measures, then a paired t-test is appropriate for comparison of means.
If we have 3 or more repeated measures, then repeated measures analysis of variance is appropriate.
It is different from independent-group one-way ANOVA (described earlier) because the subjects for repeated measures are not independent (the same or related subjects for all measures)
Repeated Measures ANOVA-Assumptions
Dependent variable is normally distributed
Variances across the repeated measures are
equal
Moderate deviations from both normality of the dependent variable and equality of variances can be tolerated, because the procedure is robust
Note: This procedure is also known as a within-subject, treatment-by-subject, or single-factor design with repeated measures on the same subjects.
Repeated Measures ANOVA
Like the one-way independent ANOVA, this procedure is also a two-step process:
First step: analysis of variance to check whether the means across the repeated measures (time) are different.
Second step: if the means are different in step one, then a multiple comparison test is performed to identify the differences.
PROC ANOVA cannot be used for this analysis because of the complexity (repeated measures, not independent groups); instead, PROC GLM is appropriate for this type of one-way analysis.
Note: GLM stands for General Linear Model
PROC GLM for One-way ANOVA
PROC GLM has the exact same structure as the one-way PROC ANOVA we discussed earlier. It just uses the word GLM instead of the word ANOVA:
PROC GLM <options>;
CLASS variable_name; /* the independent variable (also the group variable) */
MODEL dependent_variable = independent_variable(s);
MEANS independent_variable(s) / type_of_multiple_comparison; /* same variable(s) as in CLASS */
Repeated Measure One-way ANOVA
(Example)
Let's say that, as an experimenter, you want to compare 4 treatment plans (treatment plans 1, 2, 3 and 4) to lower blood sugar in Type 2 diabetes patients. You administered these 4 treatment plans to 5 subjects (patients) in a randomized order and recorded the time (days) it took to lower each subject's blood sugar under each of the treatment plans. Your goal is to find out which treatment plan reduces blood sugar in the shortest time period (days).
Repeated Measure One-way ANOVA
(Example)
So your raw data look like this:
Subject_ID  Plan1  Plan2  Plan3  Plan4
1           21     45     35     34
2           19     32     21     25
3           18     35     29     31
4           11     23     17     15
5           15     32     28     27
PROC GLM is appropriate for analysis of this dataset.
PROC GLM for One-way ANOVA
(Example)
First let's read the data into SAS:
Notice how the data are rearranged in the SAS coding: we created one group variable for subjects and one group variable for treatment plans, and recorded the corresponding time in another variable.
Output dataset in SAS
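One way the rearrangement described above could be coded is sketched below, assuming the raw table is typed in as DATALINES. The dataset and variable names (GLMdata, SUBJECT_ID, PLAN, TIME_DAYS) follow the PROC GLM code in this lecture. The array name P and the input variables PLAN1-PLAN4 are illustrative choices, not taken from the slides.

```sas
/* Sketch only: reads one row per subject (wide layout) and writes
   one row per subject-plan combination (long layout). */
DATA GLMdata;
    INPUT SUBJECT_ID PLAN1-PLAN4;     /* wide layout, as in the raw table */
    ARRAY P{4} PLAN1-PLAN4;           /* P is an illustrative array name */
    DO PLAN = 1 TO 4;
        TIME_DAYS = P{PLAN};          /* time in days for this plan */
        OUTPUT;                       /* write one long-format row */
    END;
    KEEP SUBJECT_ID PLAN TIME_DAYS;
    DATALINES;
1 21 45 35 34
2 19 32 21 25
3 18 35 29 31
4 11 23 17 15
5 15 32 28 27
;
RUN;

PROC PRINT DATA=GLMdata;
RUN;
```

The resulting dataset has 20 rows (5 subjects by 4 plans), which is the layout the CLASS and MODEL statements in PROC GLM expect.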
PROC GLM for One-way ANOVA
ODS HTML;
ODS GRAPHICS ON; /* turn on the graphics option to produce all the graphs and plots that SAS generates for PROC GLM */
TITLE "GLM PROCEDURE FOR ONE-WAY ANALYSIS OF VARIANCE";
PROC GLM DATA=GLMdata;
CLASS SUBJECT_ID PLAN;
MODEL TIME_DAYS = SUBJECT_ID PLAN; /* to test the effects of both subjects and plan type */
MEANS PLAN / DUNCAN;
RUN;
ODS HTML CLOSE;
ODS GRAPHICS OFF; /* turn off the graphics option */
PROC GLM Output
[Slide 28, figures 1 to 4: PROC GLM output]
The p-value is less than .05 for the overall model of the analysis of variance.
The Type I SS and Type III SS tables are the same in our case (simple case).
PROC GLM Output
[Figures 5 and 6: PROC GLM output]
PROC GLM Output Note
PROC GLM reports Type I SS and Type III SS (SS stands for sum of squares)
In our example, as a simple case, they are essentially the same. In many cases with more complicated settings, they will differ. It is advisable to report the Type III SS in such situations.
(Just keep this in mind for your analysis. Don't worry about the details, and please consult a statistics book if you are interested in knowing more about this.)
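If only the Type III table is wanted in the output, the MODEL statement in PROC GLM accepts an SS3 option. The sketch below reuses the dataset and variable names from the PROC GLM code in this lecture and is meant as an illustration rather than a required step.

```sas
/* Sketch only: same model as before, restricted to Type III SS. */
PROC GLM DATA=GLMdata;
    CLASS SUBJECT_ID PLAN;
    MODEL TIME_DAYS = SUBJECT_ID PLAN / SS3;   /* print only the Type III SS table */
    MEANS PLAN / DUNCAN;
RUN;
QUIT;
```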
PROC GLM Output Interpretation
On slide 28, table 2 in figure 2 describes the full (overall) model test to see if there is any statistically significant difference across subjects or treatment plans. Since the p-value is
PROC GLM Output Interpretation
In the Type III SS (sum of squares) table, the p-value for the F-value in the row for Plan is
PROC GLM Output Interpretation
The multiple comparison test (figure 6) is the same as the multiple comparison test we discussed in the previous example, except that we used Duncan's multiple range test instead of Tukey's.
We can see from this table (figure 6) that treatment plan 2 (Duncan group A) takes a significantly different (higher) time (mean 33.4 days) to lower blood sugar than the other plans. There is no significant difference between plan 4 and plan 3 (they are in the same Duncan group B), and treatment plan 1 (in Duncan group C) takes a significantly lower time (the shortest time; lowest mean time 16.6 days) to lower blood sugar.
Therefore, you can conclude that treatment plan 1 is preferable for lowering blood sugar.
PROC GLM Output Interpretation
Figures 4 and 5 are the graphs that SAS produces for PROC GLM and displays through the ODS GRAPHICS option. If you do not include this option, SAS will not generate the graphs.
Figure 4 is the graph of the distribution of time in days for subjects (interesting to see, but not our focus here).
Figure 5 is the distribution of time in days for treatment plans as a box plot (this is what we are interested in). You can see that treatment plan 1 has a lower mean time.
Conclusion
It is easy to determine when to use the ANOVA procedure and when to use the GLM procedure for one-way analysis of variance.
One-way analysis of variance is very useful for comparing data based on multiple means (more than 2 means).
Assignment
When should you use PROC ANOVA and when PROC GLM? Learn the differences and similarities between them.
Change the multiple comparison test by using other options in the MEANS statement instead of Duncan's and Tukey's tests (check the Little SAS Book, pages 228-231) and see the output.
Read the Little SAS Book, pages 228-231, for more about output interpretation and options.
Do not submit this assignment, but practice on your own. You will be responsible for this in the final exam.