Transcript
• 8/7/2019 Anova 1way

1/36

One-way ANOVA (Independent Groups and Repeated Measures)

BINF 5210 (Web-based)

Spring 2011


Analysis of Variance (ANOVA)

Used to compare means across groups

Also used to compare 3 or more means of repeated measures

In SAS there are 3 main procedures for analysis of variance (PROC ANOVA, PROC GLM and PROC MIXED). We will cover the first two.

(In this lecture we will cover ONE-WAY ANOVA only. We will cover Two-way ANOVA after Correlation and Regression.)


One-way ANOVA

It is an extension of the t-test (independent groups) to more than 2 groups.

This is also known as between-subjects analysis of variance.

It tests for a difference in means by comparing the variance between groups with the variance within groups.


One-way ANOVA Assumptions

Data in the sample groups are normally distributed

Variances for the groups are equal

Samples are random and independent (no matching observations in the groups). This is the most critical assumption (this criterion must be met).

For a large dataset (sample size), ANOVA can still be used under moderate departure from normality and unequal variances across sample groups (robustness of ANOVA).
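The first two assumptions can be checked in SAS before running the analysis. The sketch below is one possible way to do it, not part of the original lecture code; it assumes the dataset and variable names (MYDATA, GROUP, HEALTH_SCORE) used in the worked example of this lecture.

```sas
/* Normality of the outcome within each group
   (the NORMAL option requests tests for normality) */
PROC UNIVARIATE DATA=MYDATA NORMAL;
    CLASS GROUP;
    VAR HEALTH_SCORE;
RUN;

/* Levene's test for equality of variances across groups.
   HOVTEST= is available for one-way models only. */
PROC GLM DATA=MYDATA;
    CLASS GROUP;
    MODEL HEALTH_SCORE = GROUP;
    MEANS GROUP / HOVTEST=LEVENE;
RUN;
QUIT; /* PROC GLM is interactive; QUIT ends it */
```

A small p-value in either test suggests the corresponding assumption may not hold. Independence cannot be tested this way; it has to come from the study design.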


One-way ANOVA

Let's say we have k independent random samples that we want to compare.

Our hypotheses for comparison would be:

Null hypothesis H0: There is no difference among the means of the groups. In other words, all means are equal (H0: $\mu_1 = \mu_2 = \dots = \mu_k$).

Alternative hypothesis Ha: At least 2 means are not equal (Ha: $\mu_i \neq \mu_j$ for some $i \neq j$).


One-way ANOVA

The test for equality of means produces an F test statistic (with k-1 and N-k degrees of freedom, where N is the total number of subjects).

If the p-value for the F statistic is low (for example, below .05), we reject the null hypothesis.

In that case we can conclude from the F test that at least two means are not equal. We then perform a multiple comparison test to identify the significant differences.
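For reference, the F statistic can be written out as the ratio of the between-group mean square to the within-group mean square. The notation below is introduced here for illustration and is not from the slides: $n_i$ is the size of group $i$, $\bar{y}_i$ is the mean of group $i$, $\bar{y}$ is the overall mean, and $y_{ij}$ is observation $j$ in group $i$.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 \,/\, (k-1)}
         {\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \,/\, (N-k)}
```

The divisors k-1 and N-k are the two degrees of freedom mentioned above. A large F means the group means vary more than the within-group scatter would explain.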


Determining Significant Differences

Once we know that at least 2 means of the independent samples are not equal (from the F test), we need to identify which mean differences are significant.

To determine this, a number of multiple comparison tests are available in statistics.

SAS supports a list of multiple comparison tests, such as the Bonferroni t-test, Duncan's multiple range test, Tukey's studentized range test and so on (please check the SAS manual and the text book for more test names) (do not get scared of the names!!!)

You can apply any of these tests by specifying it in the options of the ANOVA procedure in SAS.

Note: Just remember that multiple comparison tests for independent samples are needed only when the F test shows that the means of the samples are not all equal.


SAS Procedure ANOVA

SAS provides PROC ANOVA to compare more than 2 groups of independent random samples (by comparing means).

PROC ANOVA takes the following form:

PROC ANOVA <options>;
    CLASS variable_name; /* the independent variable, also the group variable */
    MODEL dependent_variable = independent_variable(s);
    MEANS independent_variable(s) / type_of_multiple_comparison; /* same as the CLASS variable */
RUN;


PROC ANOVA

Some of the options in PROC ANOVA are DATA=, ORDER=, etc.

The CLASS variable is required for grouping.

The MODEL statement is required to indicate the dependent variable (quantitative, outcome variable) and the independent variables (grouping).

In the MEANS statement you can specify which multiple comparison test to use for pairwise comparison of means.

MEANS options can specify a significance level (ALPHA= a given value). By default SAS uses .05. You can also request confidence limits in the output (see the SAS manual).


PROC ANOVA

For example, you can specify Tukey's studentized range test as the multiple comparison test and ALPHA=.05 as the significance level in the MEANS statement:

MEANS INDEPENDENT / TUKEY ALPHA=.05;

Please see the text book and the SAS manual for a list of the multiple comparison tests and options available for the MEANS statement in SAS.
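As a further illustration, the sketch below swaps in a different test and significance level. It is an example only; it assumes the dataset and variable names (MYDATA, GROUP, HEALTH_SCORE) from the worked example in this lecture.

```sas
PROC ANOVA DATA=MYDATA;
    CLASS GROUP;
    MODEL HEALTH_SCORE = GROUP;
    /* BON    : Bonferroni t tests for all pairwise differences
       ALPHA= : significance level (.01 instead of the default .05)
       CLDIFF : confidence limits for each pairwise difference */
    MEANS GROUP / BON ALPHA=0.01 CLDIFF;
RUN;
QUIT; /* PROC ANOVA is interactive; QUIT ends it */
```

Other keywords such as DUNCAN, SCHEFFE or SNK can be used in place of BON in the same position.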


PROC ANOVA Example

As part of a health status maintenance program, 36 individuals were randomly put into 3 groups. A health score was given to each individual based on the physical functioning ability of the individual. As a researcher, you want to examine the health scores given to the 3 groups of individuals, which were randomly selected and are independent of each other. You want to see if there is any difference in the health scores among the groups. You have an ID for each individual, the corresponding group for that individual and a health score for the individual.


PROC ANOVA Example

First we read the data into SAS.

You could also read it into SAS as an external file (if it was given as a text or dat file).
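Reading from an external file could look like the sketch below. The file name and layout are hypothetical (one space-delimited line per individual, in the order ID, group, health score); only the dataset and variable names follow the rest of this example.

```sas
/* 'health_scores.txt' is a placeholder path: replace it with
   the location of your own text or dat file */
DATA MYDATA;
    INFILE 'health_scores.txt';
    INPUT ID GROUP HEALTH_SCORE;
RUN;

/* Print the dataset to confirm it was read correctly */
PROC PRINT DATA=MYDATA;
RUN;
```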


PROC ANOVA Example

Let's run the PROC ANOVA procedure in SAS:

ODS HTML;
PROC ANOVA DATA=MYDATA;
    CLASS GROUP; /* GROUP is the independent variable */
    MODEL HEALTH_SCORE = GROUP; /* HEALTH_SCORE is the dependent variable */
    MEANS GROUP / TUKEY CLDIFF;
RUN;

Here you are requesting SAS to use Tukey's multiple comparison test (TUKEY) and to report confidence limits for the differences (CLDIFF) in the output.


PROC ANOVA Example

This table describes the analysis of variance for the whole model.

Check the F statistic. Since the p-value for F is not low enough (it is not less than .05), we cannot reject the null hypothesis that there are no mean differences among groups. We stop our analysis here. We do not need to go any further and examine the multiple comparison test (Tukey's test in this case, next slide).

Notice that these two tables report the same values because we have only one factor (GROUP). Otherwise they would be different.


PROC ANOVA Example

SAS generated this output because we requested Tukey's multiple comparison test in the MEANS statement, and it reports 95% confidence limits for each difference. Since we cannot reject the null hypothesis, we do not report this.

Notice there is no significant mean difference, since the last column is empty (it would show *** in the last column of the corresponding row if there were a significant difference in means across groups).

We do not report this table since our first step for PROC ANOVA could not reject the null hypothesis (so we have no evidence of a difference among means).


PROC ANOVA Example

Let's consider the same problem as in the previous example, but for 15 individuals in 3 randomly selected independent groups.


PROC ANOVA Example

Now we run the exact same PROC ANOVA with the same SAS code as before:

ODS HTML;
PROC ANOVA DATA=MYDATA;
    CLASS GROUP; /* GROUP is the independent variable */
    MODEL HEALTH_SCORE = GROUP;
    MEANS GROUP / TUKEY CLDIFF;
RUN;

Notice the output (next slide).


PROC ANOVA Example

In this case we can reject the null hypothesis, since the analysis of variance for the full model shows that the p-value for the F statistic is significant (0.0188, which is less than .05). Therefore, we conclude that the means across the groups are not all equal. Now we go further to the second step and examine the multiple comparison test to see which means are different.


PROC ANOVA Example

We can see that the mean difference between group 1 and group 2 is significant at the .05 level (indicated by ***). You can set the significance level by specifying the ALPHA= option in the MEANS statement. The table indicates that the mean HEALTH_SCORE does not differ significantly for any pair of groups except group 1 versus group 2. In other words, the mean health score for group 2 is significantly different from (higher than) the mean health score for group 1.

The 95% confidence limits indicate how small or large the difference between means is likely to be. For example, the difference between group 2 and group 1 is likely to be as small as 0.731 and as large as 9.269. If a difference of this size is meaningful in your experimental setting, you can conclude that group 2's mean health score is higher than that of group 1.


Analysis of Variance (Repeated Measures of 3 or More)

Measures (observations) are repeated on the same subjects or related subjects over a time period or in different settings (before and after, or observations on the same subjects for various drug effects, etc.)

If we have 2 repeated measures, then the paired t-test is appropriate for comparison of means.

If we have 3 or more repeated measures, then repeated measures analysis of variance is appropriate.

It is different from the independent group one-way ANOVA (described earlier) because the subjects for repeated measures are not independent (the same or related subjects are used for all measures).


Repeated Measures ANOVA: Assumptions

The dependent variable is normally distributed

Variances across the repeated measures are equal

The procedure is robust to moderate deviations from both normality of the dependent variable and equality of variances

Note: This procedure is also known as a within-subject, treatment-by-subject, or single-factor design with repeated measures on the same subjects.


Repeated Measures ANOVA

Like the one-way independent ANOVA, this procedure is also a two-step process:

First step: analysis of variance to check if the means across repeated measures (time) are different.

Second step: if the means are different in step one, then a multiple comparison test is performed to identify the differences.

PROC ANOVA cannot be used for this analysis because of the complexity (repeated measures, not independent groups). Instead, PROC GLM is appropriate for this type of one-way analysis.

Note: GLM stands for General Linear Model


PROC GLM for One-way ANOVA

PROC GLM has exactly the same structure as the one-way PROC ANOVA we discussed earlier. It just uses the word GLM instead of the word ANOVA.

PROC GLM <options>;
    CLASS variable_name; /* the independent variable, also the group variable */
    MODEL dependent_variable = independent_variable(s);
    MEANS independent_variable(s) / type_of_multiple_comparison; /* same as the CLASS variable */
RUN;


Repeated Measure One-way ANOVA

(Example)

Let's say that, as an experimenter, you want to compare 4 treatment plans (treatment plans 1, 2, 3, and 4) to lower blood sugar in Type 2 diabetes patients. You administered these 4 treatment plans to 5 subjects (patients) in a randomized order and recorded the time (days) it took to lower the blood sugar of each subject under each of the treatment plans. Your goal is to find out which treatment plan reduces the blood sugar in the shortest time period (days).


Repeated Measure One-way ANOVA

(Example)

So your raw data looks like this (time in days):

Subject_ID   Plan1   Plan2   Plan3   Plan4
1            21      45      35      34
2            19      32      21      25
3            18      35      29      31
4            11      23      17      15
5            15      32      28      27

PROC GLM is appropriate for analysis of this dataset.


PROC GLM for One-way ANOVA

(Example)

First, let's read the data into SAS:

Notice how the data is rearranged in the SAS coding: we created one group variable for subjects and one group variable for treatment plans, and recorded the corresponding time in another variable.

Output dataset in SAS
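One possible way to read and rearrange the raw data is sketched below. The values are those of the raw data table for this example, and the names GLMdata, SUBJECT_ID, PLAN and TIME_DAYS match the PROC GLM code in this lecture; the ARRAY and DO loop approach is an assumption and not necessarily the code used on the slide.

```sas
DATA GLMdata;
    /* Read one wide row per subject: ID followed by the 4 plan times */
    INPUT SUBJECT_ID PLAN1-PLAN4;
    ARRAY P{4} PLAN1-PLAN4;
    /* Write one long row per subject and plan */
    DO PLAN = 1 TO 4;
        TIME_DAYS = P{PLAN};
        OUTPUT;
    END;
    KEEP SUBJECT_ID PLAN TIME_DAYS;
    DATALINES;
1 21 45 35 34
2 19 32 21 25
3 18 35 29 31
4 11 23 17 15
5 15 32 28 27
;
RUN;

PROC PRINT DATA=GLMdata;
RUN;
```

The resulting dataset has 20 rows (5 subjects by 4 plans), which is the layout the CLASS and MODEL statements of PROC GLM expect.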


PROC GLM for One-way ANOVA

ODS HTML;
ODS GRAPHICS ON; /* Turn on the graphics option to produce all the graphs and plots that SAS generates for PROC GLM */
TITLE "GLM PROCEDURE FOR ONE-WAY ANALYSIS OF VARIANCE";
PROC GLM DATA=GLMdata;
    CLASS SUBJECT_ID PLAN;
    MODEL TIME_DAYS = SUBJECT_ID PLAN; /* to verify effects of both subjects and plan type */
    MEANS PLAN / DUNCAN;
RUN;
ODS HTML CLOSE;
ODS GRAPHICS OFF; /* Turn off the graphics option */


PROC GLM Output

[Figures 1-4: PROC GLM output]

The p-value is less than .05 for the overall model in the analysis of variance.

The Type I SS and Type III SS tables are the same in our case (a simple case).


PROC GLM Output

[Figures 5-6: PROC GLM output]


PROC GLM Output Note

PROC GLM reports Type I SS and Type III SS (SS stands for sum of squares).

In our example, as a simple case, they are essentially the same. In many cases with complicated settings, they will differ. It is advisable to report the Type III SS in such situations.
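If only the Type III table is wanted in the output, the MODEL statement accepts an option for it. The sketch below reuses the GLMdata example from this lecture and is an illustration, not part of the original slide code.

```sas
PROC GLM DATA=GLMdata;
    CLASS SUBJECT_ID PLAN;
    /* SS3 prints only the Type III sums of squares
       (SS1 would print only Type I) */
    MODEL TIME_DAYS = SUBJECT_ID PLAN / SS3;
    MEANS PLAN / DUNCAN;
RUN;
QUIT; /* PROC GLM is interactive; QUIT ends it */
```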


PROC GLM Output Interpretation

In the PROC GLM output shown earlier, table 2 (figure 2) describes the full model test (overall) to see if there is any statistically significant difference across subjects or treatment plans. Since the p-value is


PROC GLM Output Interpretation

In the Type III SS (sum of squares) table, the p-value for the F value in the row for PLAN is


PROC GLM Output Interpretation

The multiple comparison test (figure 6) works the same way as the multiple comparison test we discussed in the previous example, except that we used Duncan's multiple range test instead of Tukey's.

We can see from this table (figure 6) that treatment plan 2 (Duncan group A) takes a significantly different (higher) time (mean 33.4 days) to lower blood sugar than the other plans. There is no significant difference between plan 4 and plan 3 (they are in the same Duncan group B), and treatment plan 1 (in Duncan group C) takes a significantly lower time (the shortest time, lowest mean time 16.6 days) to lower blood sugar.

Therefore, you can conclude that treatment plan 1 is preferable for lowering blood sugar.
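The plan means reported in the Duncan table can be cross-checked with a simple descriptive summary. This is an optional sketch that assumes the GLMdata dataset from this example; it is not part of the original slide code.

```sas
/* Mean, standard deviation and count of TIME_DAYS for each plan */
PROC MEANS DATA=GLMdata MEAN STD N;
    CLASS PLAN;
    VAR TIME_DAYS;
RUN;
```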


PROC GLM Output Interpretation

Figures 4 and 5 are the graphs that SAS produced for PROC GLM and displayed through the ODS graphics option. If you do not include this option, SAS will not generate the graphs.

Figure 4 is the graph of the distribution of time in days for subjects (interesting to see, but we are not interested in this).

Figure 5 is the distribution of time in days for treatment plans as a box plot (this is our interest). You can see that treatment plan 1 has a lower mean time.


Conclusion

It is easy to determine when to use the ANOVA procedure and when to use the GLM procedure for one-way analysis of variance: PROC ANOVA for independent groups, PROC GLM for repeated measures.

One-way analysis of variance is very useful for comparing data based on multiple means (more than 2 means).


Assignment

When should you use PROC ANOVA and when should you use PROC GLM? Learn the differences and similarities between them.

Change the multiple comparison test by using other options in the MEANS statement instead of Duncan's and Tukey's tests (check the Little SAS Book, pages 228-231) and see the output.