  • 8/7/2019 Anova 1way

    1/36

    One-way ANOVA (Independent Groups and Repeated Measures)

    BINF 5210 (Web-based)

    Spring 2011

  • 2/36

    Analysis of Variance (ANOVA)

    Used to compare means across groups

    Used to compare 3 or more means of repeated measures

    In SAS there are 3 main procedures for analysis of variance (PROC ANOVA, PROC GLM and PROC MIXED). We will cover the first two.

    (In this lecture we will cover ONE-WAY ANOVA only. We will cover two-way ANOVA after Correlation and Regression.)

  • 3/36

    One-way ANOVA

    It is an extension of the independent-group t-test to more than 2 groups.

    This is also known as between-subjects analysis of variance.

    It compares the variance between groups with the variance within groups to test whether the group means differ.

  • 4/36

    One-way ANOVA Assumptions

    The data in each sample group are normally distributed.

    The variances of the groups are equal.

    The samples are random and independent (no matching observations across the groups). This is the most critical assumption and must be met.

    For a large dataset (sample size), ANOVA can still be used despite moderate departures from normality and unequal variances across the sample groups (ANOVA is robust).
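    One way to check the first two assumptions in SAS is sketched below. It borrows the data set and variable names (MYDATA, GROUP, HEALTH_SCORE) from the PROC ANOVA example later in this lecture, so treat them as placeholders for your own data. The NORMAL option of PROC UNIVARIATE prints normality tests for each group, and HOVTEST=LEVENE in the MEANS statement of PROC GLM prints Levene's test for equal variances (available for one-way models).

```sas
/* Normality: normality tests and descriptive statistics of HEALTH_SCORE within each group */
PROC UNIVARIATE DATA=MYDATA NORMAL;
   CLASS GROUP;
   VAR HEALTH_SCORE;
RUN;

/* Equal variances: Levene's test across the groups (one-way model only) */
PROC GLM DATA=MYDATA;
   CLASS GROUP;
   MODEL HEALTH_SCORE = GROUP;
   MEANS GROUP / HOVTEST=LEVENE;
RUN;
QUIT;
```

    A small p-value for Levene's test suggests unequal variances; the robustness note above is what lets you proceed anyway with a large sample.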

  • 5/36

    One-way ANOVA

    Let's say we have k independent random samples that we want to compare.

    Our hypotheses for the comparison would be:

    Null hypothesis, H0: there is no difference among the means of the groups. In other words, all means are equal ($H_0: \mu_1 = \mu_2 = \cdots = \mu_k$).

    Alternative hypothesis, Ha: at least 2 means are not equal ($H_a: \mu_i \neq \mu_j$ for some $i \neq j$).

  • 6/36

    One-way ANOVA

    The test for equality of means produces an F test statistic with k-1 and N-k degrees of freedom, where N is the total number of subjects.

    If the p-value for the F statistic is low (for example, below .05), we reject the null hypothesis.

    In that case (low p-value), we can conclude from the F test that at least two means are not equal. We then perform a multiple comparison test to identify which differences are significant.
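    For reference, the F statistic described above can be written as follows, with $n_i$ observations in group $i$, group means $\bar{y}_i$, grand mean $\bar{y}$ and $N = n_1 + \cdots + n_k$ (standard textbook notation, not taken from the slides):

$$
SS_{between} = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2, \qquad
SS_{within} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
$$

$$
F = \frac{SS_{between}/(k-1)}{SS_{within}/(N-k)} = \frac{MS_{between}}{MS_{within}}, \qquad F \sim F_{k-1,\,N-k} \text{ under } H_0
$$

    A large F means the group means vary more than the within-group variation alone would explain, which is what produces a small p-value.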

  • 7/36

    Determining Significant Differences

    Once we know from the F test that at least 2 means of the independent samples are not equal, we need to identify which mean differences are significant.

    To determine this, statistics offers a number of multiple comparison tests.

    SAS supports many multiple comparison tests, such as the Bonferroni t test, Duncan's multiple range test, Tukey's studentized range test and so on (please check the SAS manual and the textbook for more test names). (Do not get scared by the names!!!)

    You can apply any of these tests by specifying it as an option in the MEANS statement of PROC ANOVA.

    Note: Just remember that you need a multiple comparison test for independent samples only when the F test shows that the means of the samples are not all equal.

  • 8/36

    SAS Procedure ANOVA

    SAS provides PROC ANOVA to compare the means of more than 2 groups of independent random samples.

    PROC ANOVA takes the following form:

    PROC ANOVA <options>;
       CLASS variable_name;   /* independent (grouping) variable */
       MODEL dependent_variable = independent_variable(s);
       MEANS independent_variable(s) / type_of_multiple_comparison;   /* same variable(s) as in CLASS */
    RUN;

  • 9/36

    PROC ANOVA

    Some of the <options> in PROC ANOVA are DATA=, ORDER=, etc.

    The CLASS statement is required; it names the grouping variable.

    The MODEL statement is required to indicate the dependent (quantitative, outcome) variable and the independent (grouping) variables.

    In the MEANS statement you can specify which multiple comparison test to use for pairwise comparisons of means.

    In the MEANS statement you can also specify a significance level (ALPHA= a given value). By default SAS uses .05. You can also request confidence limits in the output (see the SAS manual).

  • 10/36

    PROC ANOVA

    For example, in the MEANS statement you can specify Tukey's studentized range test for the multiple comparison of means and ALPHA=.05 as the significance level:

    MEANS INDEPENDENT / TUKEY ALPHA=.05;   /* INDEPENDENT stands for your grouping variable */

    Please see the textbook and the SAS manual for a list of the multiple comparison tests and other MEANS statement options available in SAS.
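    As a sketch of combining MEANS options, the statement below requests three of the multiple comparison tests SAS supports (Tukey, Bonferroni and Scheffé) at a 1% significance level, with confidence limits for the pairwise differences. The data set and variable names are borrowed from the health-score example that follows and are placeholders.

```sas
PROC ANOVA DATA=MYDATA;
   CLASS GROUP;                       /* grouping (independent) variable */
   MODEL HEALTH_SCORE = GROUP;        /* outcome = group */
   /* Tukey, Bonferroni and Scheffe comparisons at alpha = 0.01, with confidence limits */
   MEANS GROUP / TUKEY BON SCHEFFE ALPHA=0.01 CLDIFF;
RUN;
QUIT;                                 /* PROC ANOVA is interactive; QUIT ends it */
```

    SAS prints a separate comparison table for each requested test, so you can see how the choice of test changes the width of the confidence limits.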

  • 11/36

    PROC ANOVA Example

    As part of a health status maintenance program, 36 individuals were randomly assigned to 3 groups. A health score was given to each individual based on his or her physical functioning ability. As a researcher, you want to compare the health scores of the 3 groups, whose members were selected randomly and independently of each other. You want to see whether there is any difference in health scores among the groups. For each individual you have an ID, the corresponding group and a health score.

  • 12/36

    PROC ANOVA Example

    First, we read the data into SAS.

    You could also read it into SAS from an external file (if it were given as a text or .dat file).
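    If the data were in an external file, a DATA step along these lines would read them. The file name health_scores.dat and its layout (one individual per line: ID, group and health score separated by blanks) are assumptions; the data set and variable names match the PROC ANOVA code used below.

```sas
DATA MYDATA;
   INFILE 'health_scores.dat';        /* hypothetical file name; use your own path */
   INPUT ID GROUP HEALTH_SCORE;       /* list input: three numeric values per line */
RUN;

PROC PRINT DATA=MYDATA;               /* check that the values were read correctly */
RUN;
```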

  • 13/36

    PROC ANOVA Example

    Let's run the PROC ANOVA procedure in SAS:

    ODS HTML;
    PROC ANOVA DATA=MYDATA;
       CLASS GROUP;                  /* GROUP is the independent variable */
       MODEL HEALTH_SCORE = GROUP;
       MEANS GROUP / TUKEY CLDIFF;
    RUN;

    HEALTH_SCORE is the dependent variable. In the MEANS statement you are requesting SAS to use Tukey's multiple comparison test and to add confidence limits for the differences (CLDIFF) to the output.

  • 14/36

    PROC ANOVA Example

    This table describes the analysis of variance for the whole model.

    Check the F statistic (a ratio of variances). Since the p-value for F is not low enough (not less than .05), we cannot reject the null hypothesis that there are no mean differences among the groups. We stop our analysis here. We do not need to go any further to check the multiple comparison test (Tukey's test in this case, next slide).

    Notice that these two tables report the same result because we have only one factor (group). Otherwise they would be different.

  • 15/36

    PROC ANOVA Example

    Although SAS generated this output, based on our request for Tukey's multiple comparison test in the MEANS statement, and reported 95% confidence limits for each difference, we do not report this table: the first step of PROC ANOVA could not reject the null hypothesis (so there is no evidence of a difference among the means).

    Notice that there is no significant mean difference, since the last column is empty. It would show *** in the corresponding row if there were a significant difference in means between groups.

  • 16/36

    PROC ANOVA Example

    Let's consider the same problem as in the previous example, but for 15 individuals in 3 randomly selected independent groups. Let's read the data into SAS.

  • 17/36

    PROC ANOVA Example

    Now we run exactly the same PROC ANOVA, with the same SAS code as before:

    ODS HTML;
    PROC ANOVA DATA=MYDATA;
       CLASS GROUP;                  /* GROUP is the independent variable */
       MODEL HEALTH_SCORE = GROUP;
       MEANS GROUP / TUKEY CLDIFF;
    RUN;

    Notice the output (next slide).

  • 18/36

    PROC ANOVA Example

    In this case we can reject the null hypothesis, since the analysis of variance for the full model shows that the p-value for the F statistic is significant (0.0188, below .05). Therefore, we conclude that the means across the groups are not all equal. Now we go on to the second step and check the multiple comparison test to see which means are different.

  • 19/36

    PROC ANOVA Example

    We can see that the mean difference between group 1 and group 2 is significant at the .05 level (indicated by ***). You can set the significance level with the ALPHA= option in the MEANS statement. The table indicates that, apart from group 1 versus group 2, the mean HEALTH_SCORE values of the groups are not significantly different. In other words, the mean health score for group 2 is significantly higher than the mean health score for group 1.

    The 95% confidence limits indicate how small or large the difference between means is likely to be. For example, the difference between group 2 and group 1 is likely to be as small as 0.731 and as large as 9.269. If a difference of this size is meaningful in your experimental setting, you can conclude that group 2's mean health score is higher than that of group 1.
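    The Tukey confidence limits above follow the standard Tukey-Kramer form, sketched here in textbook notation (the slides do not show the formula): $q_{\alpha;\,k,\,N-k}$ is the studentized range critical value for $k$ groups and $N-k$ error degrees of freedom, and $MS_{within}$ is the error mean square from the ANOVA table.

$$
(\bar{y}_i - \bar{y}_j) \pm \frac{q_{\alpha;\,k,\,N-k}}{\sqrt{2}} \sqrt{MS_{within}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}
$$

    The interval is centered at the observed difference in means: for group 2 versus group 1 that is $(0.731 + 9.269)/2 = 5.0$, with a half-width of $9.269 - 5.0 = 4.269$. Because the whole interval lies above 0, the difference is significant at the chosen level.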

  • 20/36

    Analysis of Variance (Repeated Measures of 3 or More)

    Measures (observations) are repeated on the same subjects or related subjects over a time period or in different settings (before and after, or observations on the same subjects for various different drug effects, etc.).

    If we have 2 repeated measures, then a paired t-test is appropriate for comparing the means.

    If we have 3 or more repeated measures, then repeated measures analysis of variance is appropriate.

    It differs from the independent-groups one-way ANOVA (described earlier) because the subjects for repeated measures are not independent (the same or related subjects supply all the measures).

  • 21/36

    Repeated Measures ANOVA: Assumptions

    The dependent variable is normally distributed.

    The variances across the repeated measures are equal.

    The procedure is still robust to moderate deviations from both normality of the dependent variable and equality of variances.

    Note: This procedure is also known as a within-subjects, treatment-by-subjects, or single-factor design with repeated measures on the same subjects.

  • 22/36

    Repeated Measures ANOVA

    Like the independent-groups one-way ANOVA, this procedure is a two-step process:

    First step: analysis of variance to check whether the means across the repeated measures (time) are different.

    Second step: if the means are different in step one, a multiple comparison test is performed to identify which means differ.

    We use PROC GLM rather than PROC ANOVA for this analysis because of the added complexity (the repeated measures are not independent groups); PROC GLM is appropriate for this type of one-way analysis.

    Note: GLM stands for General Linear Model.

  • 23/36

    PROC GLM for One-way ANOVA

    PROC GLM has exactly the same structure as the one-way PROC ANOVA we discussed earlier. It just uses the word GLM instead of the word ANOVA:

    PROC GLM <options>;
       CLASS variable_name;   /* independent (grouping) variable */
       MODEL dependent_variable = independent_variable(s);
       MEANS independent_variable(s) / type_of_multiple_comparison;   /* same variable(s) as in CLASS */
    RUN;

  • 24/36

    Repeated Measures One-way ANOVA (Example)

    Let's say that, as an experimenter, you want to compare 4 treatment plans (treatment plans 1, 2, 3 and 4) for lowering blood sugar in patients with type 2 diabetes. You administered these 4 treatment plans to 5 subjects (patients) in randomized order and recorded the time (days) it took to lower the blood sugar of each subject under each treatment plan. Your goal is to find out which treatment plan reduces blood sugar in the shortest time (days).

  • 25/36

    Repeated Measures One-way ANOVA (Example)

    So your raw data (time in days) look like this:

    Subject_ID  Plan1  Plan2  Plan3  Plan4
    1           21     45     35     34
    2           19     32     21     25
    3           18     35     29     31
    4           11     23     17     15
    5           15     32     28     27

    PROC GLM is appropriate for analysis of this dataset.
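    As a hand check on what PROC GLM will compute, the plan means and the sums of squares for this subjects-by-plans layout can be worked out directly from the table (standard textbook formulas; the symbols are ours, not the slides'). The plan (column) totals are 84, 167, 130 and 132, the subject (row) totals are 135, 97, 113, 66 and 102, the grand total of all $N = 20$ times is $G = 513$, and $C = G^2/N$:

$$
\bar{y}_{1} = \tfrac{84}{5} = 16.8,\quad \bar{y}_{2} = \tfrac{167}{5} = 33.4,\quad \bar{y}_{3} = \tfrac{130}{5} = 26.0,\quad \bar{y}_{4} = \tfrac{132}{5} = 26.4
$$

$$
\begin{aligned}
C &= 513^2/20 = 13158.45\\
SS_{plan} &= (84^2 + 167^2 + 130^2 + 132^2)/5 - C = 13853.8 - 13158.45 = 695.35\\
SS_{subject} &= (135^2 + 97^2 + 113^2 + 66^2 + 102^2)/4 - C = 13790.75 - 13158.45 = 632.3\\
SS_{total} &= \textstyle\sum y^2 - C = 14575 - 13158.45 = 1416.55\\
SS_{error} &= 1416.55 - 695.35 - 632.3 = 88.9\\
F_{plan} &= \frac{695.35/3}{88.9/12} \approx \frac{231.78}{7.41} \approx 31.3 \quad (3 \text{ and } 12 \text{ df})
\end{aligned}
$$

    This F is far above the 5% critical value $F_{0.05;\,3,\,12} \approx 3.49$, and because the design is balanced, the Type I and Type III sums of squares for PLAN coincide.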

  • 26/36

    PROC GLM for One-way ANOVA (Example)

    First, let's read the data into SAS.

    Notice how the data are rearranged in the SAS code: we created one group variable for subjects and one group variable for treatment plans, and recorded the corresponding time in another variable.

    [Figure: output dataset in SAS]
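    The slide's own DATA step is not reproduced in this transcript. A minimal sketch that produces the same layout (one observation per subject and plan) from the raw table could look like this; the array approach and the numeric coding of PLAN as 1 to 4 are assumptions, while the data set name GLMdata and the variables SUBJECT_ID, PLAN and TIME_DAYS match the PROC GLM code that follows.

```sas
DATA GLMdata;
   INPUT SUBJECT_ID PLAN1-PLAN4;      /* one row per subject, as in the raw table */
   ARRAY P{4} PLAN1-PLAN4;            /* the four treatment-plan columns */
   DO PLAN = 1 TO 4;                  /* PLAN becomes the treatment-plan group variable */
      TIME_DAYS = P{PLAN};            /* days to lower blood sugar under this plan */
      OUTPUT;                         /* write one observation per subject-plan pair */
   END;
   KEEP SUBJECT_ID PLAN TIME_DAYS;
   DATALINES;
1 21 45 35 34
2 19 32 21 25
3 18 35 29 31
4 11 23 17 15
5 15 32 28 27
;
RUN;

PROC PRINT DATA=GLMdata;              /* 20 observations: 5 subjects x 4 plans */
RUN;
```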

  • 27/36

    PROC GLM for One-way ANOVA

    ODS HTML;
    ODS GRAPHICS ON;   /* Turn on ODS Graphics to produce all the graphs and plots that SAS generates for PROC GLM */
    TITLE "GLM PROCEDURE FOR ONE-WAY ANALYSIS OF VARIANCE";
    PROC GLM DATA=GLMdata;
       CLASS SUBJECT_ID PLAN;
       MODEL TIME_DAYS = SUBJECT_ID PLAN;   /* test the effects of both subjects and plan type */
       MEANS PLAN / DUNCAN;
    RUN;
    QUIT;              /* end the interactive PROC GLM step */
    ODS HTML CLOSE;
    ODS GRAPHICS OFF;  /* Turn off the graphics option */
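    The code above treats subjects as a second factor in the model. PROC GLM also has a REPEATED statement for the same design with the data in wide layout (one row per subject); a sketch follows. It assumes GLMdata codes PLAN numerically as 1 to 4, so that PROC TRANSPOSE with PREFIX=PLAN creates the variables PLAN1 to PLAN4. The PRINTE option adds sphericity tests, which check the repeated-measures variance assumption more precisely.

```sas
/* Put the long data back into one row per subject: PLAN1-PLAN4 hold TIME_DAYS */
PROC SORT DATA=GLMdata OUT=SORTED;
   BY SUBJECT_ID PLAN;
RUN;

PROC TRANSPOSE DATA=SORTED OUT=WIDE(DROP=_NAME_) PREFIX=PLAN;
   BY SUBJECT_ID;
   ID PLAN;                           /* assumes PLAN is numeric 1-4, giving PLAN1-PLAN4 */
   VAR TIME_DAYS;
RUN;

/* Repeated-measures ANOVA: PLAN is the within-subject factor with 4 levels */
PROC GLM DATA=WIDE;
   MODEL PLAN1-PLAN4 = / NOUNI;       /* no between-subject effects; skip the univariate ANOVAs */
   REPEATED PLAN 4 / PRINTE;          /* PRINTE also prints sphericity tests */
RUN;
QUIT;
```

    In the univariate within-subject test for PLAN, the F value should match the PLAN row of the Type III table from the two-factor model, since both divide the plan mean square by the same residual (subject-by-plan) mean square.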

  • 28/36

    PROC GLM Output

    [Figures 1 to 4: PROC GLM output]

    Callout: the p-value is less than .05 for the overall analysis of variance model.

    Callout: the Type I and Type III SS tables are the same in our (simple) case.

  • 29/36

    PROC GLM Output

    [Figures 5 and 6: PROC GLM output]

  • 30/36

    PROC GLM Output Note

    PROC GLM reports Type I SS and Type III SS (SS stands for sum of squares).

    In our example, a simple (balanced) case, they are the same. In many cases with more complicated settings, they will differ. It is advisable to report the Type III SS in such situations.

    (Just keep this in mind for your analysis. Don't worry about the details, and please consult a statistics book if you are interested in learning more about this.)
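    If you want PROC GLM to print only the Type III table, the SS3 option on the MODEL statement requests it. A sketch using the lecture's data set and variables:

```sas
PROC GLM DATA=GLMdata;
   CLASS SUBJECT_ID PLAN;
   MODEL TIME_DAYS = SUBJECT_ID PLAN / SS3;   /* print Type III sums of squares only */
RUN;
QUIT;
```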

  • 31/36

    PROC GLM Output Interpretation

    On page 28, table 2 in figure 2 describes the full-model (overall) test of whether there is any statistically significant difference across subjects or treatment plans. Since the p-value is less than .05, we reject the null hypothesis for the overall model.

  • 32/36

    PROC GLM Output Interpretation

    In the Type III SS (sum of squares) table, the p-value for the F value in the row for Plan is less than .05, so the mean times differ among the treatment plans.

  • 33/36

    PROC GLM Output Interpretation

    The multiple comparison test (figure 6) is the same kind of test we discussed in the previous example, except that we used Duncan's multiple range test instead of Tukey's.

    We can see from this table (figure 6) that treatment plan 2 (Duncan group A) takes a significantly different (longer) time (mean 33.4 days) to lower blood sugar than the other plans. There is no significant difference between plan 4 and plan 3 (they are in the same Duncan group B), and treatment plan 1 (Duncan group C) takes significantly less time (the shortest, with the lowest mean time of 16.8 days) to lower blood sugar.

    Therefore, you can conclude that treatment plan 1 is preferable for lowering blood sugar.

  • 34/36

    PROC GLM Output Interpretation

    Figures 4 and 5 are the graphs that SAS produced for PROC GLM and displayed through the ODS Graphics option. If you do not include this option, SAS will not generate the graphs.

    Figure 5 is the graph of the distribution of time in days for subjects (interesting to see, but not our focus).

    Figure 6 is the distribution of time in days for treatment plans, shown as a box plot (this is our interest). You can see that treatment plan 1 has a lower mean time.

  • 35/36

    Conclusion

    It is easy to determine when to use ANOVA and when to use the GLM procedure for one-way analysis of variance.

    One-way analysis of variance is very useful for comparing data based on multiple means (more than 2 means).

  • 36/36

    Assignment

    When should you use PROC ANOVA and when PROC GLM? Learn the differences and similarities between them.

    Change the multiple comparison test by using other options in the MEANS statement instead of Duncan's and Tukey's tests (check The Little SAS Book, pages 228-231) and look at the output.

    Read The Little SAS Book, pages 228-231, for more about output interpretation and options. Do not submit this assignment, but practice on your own. You will be responsible for this material in the final exam.
