Topic Anova


  • 7/29/2019 Topic Anova


    MTE3105 Statistics

    Topic 5: Analysis of Variance (ANOVA)

    1.1 Synopsis

    In this course, students will revisit the concepts of probability and explore inferential statistics

    such as analysis of variance (ANOVA) in hypothesis testing. The importance of using the appropriate

    statistical methods in solving real-life problems is emphasized.

    1.2 Learning Outcomes

    1. Understand the theoretical and empirical concepts of ANOVA

    2. Use inferential statistics such as ANOVA in hypothesis testing

    3. Calculate ANOVA by hand

    4. Calculate ANOVA using Excel

    1.3 Conceptual Framework

    [Figure: conceptual framework. Hypothesis testing branches into one-way ANOVA, two-way ANOVA, chi-square, and linear regression.]


    1.4 ANOVA

    Analysis of variance compares two or more populations of interval data. Specifically, we

    are interested in determining whether differences exist between the population

    means. The procedure works by analyzing the sample variances.

    1.4.1. Definitions

    F-distribution

    The ratio of two independent chi-square variables divided by their respective degrees of

    freedom. If the population variances are equal, this simplifies to be the ratio of the

    sample variances.

    Analysis of Variance (ANOVA)

    A technique used to test a hypothesis concerning the means of three or more populations.

    One-Way Analysis of Variance

    Analysis of Variance when there is only one independent variable. The null hypothesis

    is that all population means are equal; the alternative hypothesis is that at least one

    mean is different.

    Between Group Variation

    The variation due to the interaction between the samples, denoted SS(B) for Sum of

    Squares Between groups. If the sample means are close to each other (and therefore

    to the Grand Mean) this will be small. There are k samples involved with one data value for

    each sample (the sample mean), so there are k-1 degrees of freedom.

    Between Group Variance

    The variance due to the interaction between the samples, denoted MS(B) for Mean

    Square Between groups. This is the between group variation divided by its degrees of

    freedom.

    Within Group Variation

    The variation due to differences within individual samples, denoted SS(W) for Sum of

    Squares Within groups. Each sample is considered independently, no interaction


    Interaction Effect

    The effect one factor has on the other factor

    Main Effect

    The effects of the independent variables.

    1.4.2. ONE WAY ANOVA

    In the analysis of variance, the approach is conceptually similar to the t-test, although

    the method differs. When you want to compare more than two means, the one-way

    Analysis of Variance (ANOVA) is used. Say, for example, you conducted an experiment

    in which you compared the effectiveness of three teaching methods in enhancing

    reading comprehension.

    A One-Way Analysis of Variance is a way to test the equality of three or more means at

    one time by using variance.

    Assumptions

    The populations from which the samples were obtained must be normally or

    approximately normally distributed.

    The samples must be independent.

    The variances of the populations must be equal.

    1.4.3 How ANOVA works

    ANOVA measures two sources of variation in the data and compares their relative sizes:

    variation BETWEEN groups

        for each data value, look at the squared difference between its group mean and

        the overall mean: $(\bar{x}_i - \bar{x})^2$

    variation WITHIN groups

        for each data value, look at the squared difference between that value and the

        mean of its group: $(x_{ij} - \bar{x}_i)^2$


    The ANOVA F-statistic is the ratio of the between-group variance (mean square) to the

    within-group variance (mean square).

    A large F is evidence against H0, since it indicates that there is more difference between

    groups than within groups.

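    We want to measure the amount of variation due to BETWEEN group variation and

    WITHIN group variation.

    For each data value $x_{ij}$ (value j in group i, with group mean $\bar{x}_i$ and overall

    mean $\bar{x}$), we calculate its contribution to:

    BETWEEN group variation: $(\bar{x}_i - \bar{x})^2$

    WITHIN group variation: $(x_{ij} - \bar{x}_i)^2$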

    1.4.4. Example problem using One-way Analysis of Variance

    Three groups of students, 5 in each group, were receiving therapy for severe test

    anxiety. Group 1 received 5 hours of therapy, group 2 - 10 hours and group 3 - 15

    hours. At the end of therapy each subject completed an evaluation of test anxiety (the

    dependent variable in the study). Did the amount of therapy have an effect on the level

    of test anxiety?

    The three groups of students received the following scores on the Test Anxiety Index

    (TAI) at the end of treatment.

    TAI Scores for Three Groups of Students

    Group 1 (5 hours)    Group 2 (10 hours)    Group 3 (15 hours)
    48                   55                    51
    50                   52                    52
    53                   53                    50
    52                   55                    53
    50                   53                    50

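    $$F = \frac{\text{Between}}{\text{Within}} = \frac{MSG}{MSE}$$

    Here MSG is the mean square between groups (written MSB elsewhere in these notes) and

    MSE is the mean square within groups (written MSW elsewhere in these notes).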
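    The per-value contributions described above can be accumulated directly for the TAI data. The following is a minimal Python sketch, not part of the original notes; the variable names (`groups`, `ssb`, `ssw`) are illustrative assumptions.

```python
# TAI scores from the table above, one list per therapy group.
groups = {
    "Group 1 (5 hours)": [48, 50, 53, 52, 50],
    "Group 2 (10 hours)": [55, 52, 53, 55, 53],
    "Group 3 (15 hours)": [51, 52, 50, 53, 50],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

ssb = 0.0  # between-group sum of squares
ssw = 0.0  # within-group sum of squares
for name, scores in groups.items():
    group_mean = sum(scores) / len(scores)
    for x in scores:
        ssb += (group_mean - grand_mean) ** 2  # contribution to BETWEEN
        ssw += (x - group_mean) ** 2           # contribution to WITHIN
    print(f"{name}: mean = {group_mean:.1f}")

print(f"grand mean = {grand_mean:.1f}")
print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}")
```

    Every data value adds one term to each sum, which is why the between-group term is repeated once per member of the group.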


    The degrees of freedom between groups is:

    dfB = K - 1 = 3 - 1 = 2

    Where K is the number of groups.

    Next we calculate SSW, the sum of squares within groups.

    The degrees of freedom within groups is:

    dfW = NT - K = 15 - 3 = 12

    Where NT is the total number of subjects.
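    The SSW formula itself did not survive in this copy. Assuming the usual definition, which matches the within-group contribution described in section 1.4.3, and using the group means 50.6, 53.6 and 51.2 computed from the TAI table, the calculation can be written as:

```latex
SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} \left( x_{ij} - \bar{x}_i \right)^2
    = \underbrace{15.2}_{\text{Group 1}}
    + \underbrace{7.2}_{\text{Group 2}}
    + \underbrace{6.8}_{\text{Group 3}}
    = 29.2
```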


    Finally, we will calculate SST, the total sum of squares.

    As a check SST = SSB + SSW

    54.4 = 25.2 + 29.2

    We can now calculate MSB, the mean square between groups, MSW, the mean square

    within groups, and F, the F ratio.
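    The worked formulas are missing from this copy. Assuming the standard definitions (each sum of squares divided by its degrees of freedom), the values follow from SSB = 25.2, SSW = 29.2, dfB = 2 and dfW = 12:

```latex
MSB = \frac{SSB}{df_B} = \frac{25.2}{2} = 12.60
\qquad
MSW = \frac{SSW}{df_W} = \frac{29.2}{12} \approx 2.433
\qquad
F = \frac{MSB}{MSW} = \frac{12.60}{2.433} \approx 5.178
```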

    To test the significance of the F value we obtained, we need to compare it with the

    critical F value with an alpha level of .05, 2 degrees of freedom between groups (or

    degrees of freedom in the numerator of the F ratio), and 12 degrees of freedom within

    groups (or degrees of freedom in the denominator of the F ratio). We can look up the

    critical value of F in Appendix Table D of the text book (The 5 percent (Lightface Type)

    and 1 percent (Boldface Type) points for the Distribution of F), pages 319-326. Look in

    the table under column 2 (2 degrees of freedom for the numerator) and row 12 (12

    degrees of freedom for the denominator) and read the non-boldfaced entry (for .05 level)

    of 3.88 - this is the critical value for F.


    One way of indicating this critical value of F at the .05 level, with 2 degrees of freedom

    between groups and 12 degrees of freedom within groups is

    F.05(2,12) = 3.88
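    As a cross-check on the table lookup, the critical value can also be computed. The sketch below is an illustration, not part of the original notes: it relies on a closed form that holds only when the numerator degrees of freedom equal 2, and the function name `f_critical_dfn2` is hypothetical. For other degrees of freedom a statistics library (for example `scipy.stats.f.ppf`) would be needed.

```python
def f_critical_dfn2(alpha: float, dfd: int) -> float:
    """Upper-tail critical value of the F(2, dfd) distribution.

    ASSUMPTION: numerator df is exactly 2, where the upper-tail
    probability has the closed form P(F > x) = (1 + 2x/dfd) ** (-dfd/2).
    Solving that expression for x at probability alpha gives the result.
    """
    return (dfd / 2) * (alpha ** (-2 / dfd) - 1)


f_crit = f_critical_dfn2(alpha=0.05, dfd=12)
print(f"F.05(2,12) = {f_crit:.4f}")
```

    The computed value is slightly above the tabled 3.88 because printed tables carry only two decimal places.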

    When using analysis of variance, it is common practice to present the results of the

    analysis in an analysis of variance table. This table, which shows the source of variation,

    the sum of squares, the degrees of freedom, the mean squares, the F ratio, and the probability, is

    sometimes presented in a research article. The analysis of variance table for our

    problem would appear as follows:

    Analysis of Variance Table

    Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F Ratio    p
    Between Groups         25.20             2                     12.60          5.178      < .05
    Within Groups          29.20             12                    2.43
    Total                  54.40             14


    Text Book A    Text Book B    Text Book C
    54             53             49
    49             56             53
    52             57             47
    55             51             50
    48             59             54

    With α = .05, test if the means of the three populations are equal.

    1. State the independent variable and the dependent variable in this study

    2. State the assumptions for using a one-way ANOVA

    3. State the null hypothesis and the alternative hypothesis

    4. Compute SSB, SSw and SST

    5. Compute the between and within samples variances

    6. Indicate the value of Fcritical.

    7. Compute the F value

    8. Create an ANOVA table and fill in the above information

    9. Describe the conclusion.

    Solution:

    Text Book A          Text Book B          Text Book C
    54                   53                   49
    49                   56                   53
    52                   57                   47
    55                   51                   50
    48                   59                   54

    T1 = 258             T2 = 276             T3 = 253
    ΣX1² = 13350         ΣX2² = 15276         ΣX3² = 12835
    n1 = 5               n2 = 5               n3 = 5
    x̄1 = 51.6            x̄2 = 55.2            x̄3 = 50.6

    1) Independent variable: textbook (three different textbooks)

    Dependent variable: scores of mathematics achievement

    2) The assumptions for using one-way ANOVA:

    1. The distributions of the populations are normal

    2. The variances of the populations are equal

    3. Scores are independent


    4. Samples are independent

    5. Samples are random

    3) Null hypothesis, H0: $\mu_1 = \mu_2 = \mu_3$ (the three group means are equal)

    Alternative hypothesis, Ha: at least one of the means differs from the others

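    4) a) Sum of Squares Between Groups (SSB)

    SSB = $\sum_i \frac{T_i^2}{n_i} - \frac{(\sum X)^2}{N}$

    SSB = $\frac{(258)^2}{5} + \frac{(276)^2}{5} + \frac{(253)^2}{5} - \frac{(787)^2}{15}$

    = 58.5333

    b) Sum of Squares Within Groups (SSw)

    SSw = $\sum X^2 - \sum_i \frac{T_i^2}{n_i}$

    = 41,461 - $\left[ \frac{(258)^2}{5} + \frac{(276)^2}{5} + \frac{(253)^2}{5} \right]$

    = 111.2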

    c) Sum of Squares Total (SST)

    SST = SSB + SSw = 58.5333 + 111.2 = 169.7333

    5) Between Group Variance

    MSB = $\frac{SSB}{k - 1} = \frac{58.5333}{2} = 29.2667$

    Within Group Variance

    MSw = $\frac{SSw}{N - k} = \frac{111.2}{12} = 9.2667$

    6) The value of Fcritical

    Fcritical = F(0.05, 2, 12) = 3.89

    Decision rule: Reject H0 if F > 3.89

    7) The value of F

    F = $\frac{MSB}{MSw} = \frac{29.2667}{9.2667} = 3.16$


    8) One-Way ANOVA Table

    Sources of Variation    Sum of Squares (SS)    Degrees of Freedom (df)    Mean Square (MS)    Test Statistic Value (F)    F critical
    Between                 58.5333                2                          29.2667             3.16                        3.89
    Within                  111.2000               12                         9.2667
    Total                   169.7333               14

    9) Conclusions

    F = 3.16 is less than Fcritical = 3.89, so we fail to reject H0. The data do not provide

    sufficient evidence that the population means differ (F(2,12) = 3.16, α = 0.05). The differences

    among the three sample means can be attributed to sampling error.
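    The hand calculation for the textbook exercise can be checked with a short script. This is a minimal Python sketch using the same computational formulas (group totals T and the sum of squared scores); the variable names are illustrative assumptions, not part of the original solution.

```python
# Scores for the three textbooks, as given in the exercise.
books = {
    "A": [54, 49, 52, 55, 48],
    "B": [53, 56, 57, 51, 59],
    "C": [49, 53, 47, 50, 54],
}

k = len(books)                                    # number of groups
n_total = sum(len(v) for v in books.values())     # N
grand_total = sum(sum(v) for v in books.values()) # sum of all scores
sum_sq = sum(x * x for v in books.values() for x in v)          # sum of X^2
sum_t2_n = sum(sum(v) ** 2 / len(v) for v in books.values())    # sum of T^2/n
correction = grand_total ** 2 / n_total                         # (sum X)^2 / N

ssb = sum_t2_n - correction   # between groups
ssw = sum_sq - sum_t2_n       # within groups
sst = sum_sq - correction     # total

msb = ssb / (k - 1)
msw = ssw / (n_total - k)
f_ratio = msb / msw

print(f"SSB = {ssb:.4f}, SSw = {ssw:.4f}, SST = {sst:.4f}")
print(f"MSB = {msb:.4f}, MSw = {msw:.4f}, F = {f_ratio:.2f}")
```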

    4.6 Using the Excel Spreadsheet Program to Calculate One-Way Analysis of

    Variance

    The Excel spreadsheet program has a tool to calculate One-Way Analysis of

    Variance, which simplifies our computational task considerably. Let's use the same

    research problem we already considered, but use the spreadsheet program to do the

    calculations.

    Research Problem:

    Three groups of students, 5 in each group, were receiving therapy for severe test

    anxiety. Group 1 received 5 hours of therapy, group 2 - 10 hours and group 3 - 15

    hours. At the end of therapy each subject completed an evaluation of test anxiety (the

    dependent variable in the study). Did the amount of therapy have an effect on the

    level of test anxiety?


    In the Excel Worksheet select Data Analysis under the Tools menu. If Data Analysis is

    not available you must install the Data Analysis Tools.

    If you need to you can install the data analysis tools as follows:

    1. Select Add-Ins from the Tools menu.

    2. In the Add-Ins window click on the box next to Analysis ToolPak to select it.

    3. Click OK. You have now installed the ToolPak.

    With the Data Analysis Tools installed, select Data Analysis under the Tools menu.

    In the Data Analysis window scroll down and select Anova: Single Factor. Complete

    the Anova: Single Factor window as follows:

    1. Enter $A$2:$C$7 in the Input Range: box (or you can enter that value

    automatically by clicking in the box and then selecting the range of cells A2

    through C7). Note that we have included the labels, Group 1, Group 2, and

    Group 3, in the range of cells we selected.

    2. Click the Columns button to indicate that our data is grouped by

    columns.

    3. Click the Labels in first row box to indicate that we are using labels (Group

    1, Group 2, and Group 3).

    4. Enter .05 in the Alpha: box.

    5. Under Output Options click the button for Output range: and enter $A$9 in the

    Output range: box (or click in the box and then click on the cell A9 to cause it to

    appear in the box).

    6. Click OK.


    Your spreadsheet should now appear as follows:

    The results of the one-way analysis of variance can be seen in the resultant tables. The

    means for the three groups (as well as the count, sum, and variance for each group) can

    be seen in the SUMMARY table.
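    For readers without the screenshot, the quantities reported in the SUMMARY table (count, sum, mean, and sample variance per group) can be reproduced with Python's standard library. This is a hedged sketch, not Excel's own procedure; the dictionary layout and column labels are assumptions made for illustration.

```python
from statistics import mean, variance

# TAI scores for the three therapy groups.
groups = {
    "Group 1": [48, 50, 53, 52, 50],
    "Group 2": [55, 52, 53, 55, 53],
    "Group 3": [51, 52, 50, 53, 50],
}

# One summary row per group; variance() is the sample variance (n - 1).
summary = {
    name: {
        "count": len(scores),
        "sum": sum(scores),
        "average": mean(scores),
        "variance": variance(scores),
    }
    for name, scores in groups.items()
}

print(f"{'Groups':<10}{'Count':>6}{'Sum':>6}{'Average':>9}{'Variance':>10}")
for name, row in summary.items():
    print(f"{name:<10}{row['count']:>6}{row['sum']:>6}"
          f"{row['average']:>9.1f}{row['variance']:>10.1f}")
```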

    The ANOVA table shows the same results as we put in the Analysis of Variance table

    when we calculated the results ourselves. The value of F is shown to be 5.178082192,

    which rounded to 5.18 is the same value as we obtained when we calculated F. The

    P-value is shown as .02391684, which is below our alpha level of .05, so we will simply

    indicate that p < .05. There is an additional entry in the table showing the critical value

    of F at the .05 level (F crit), which is 3.88529031; this agrees with the value (3.88) we

    looked up in Appendix Table D in the textbook.
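    The P-value reported by Excel can be reproduced by hand in this particular case. The sketch below is illustrative only: it uses a closed form that is valid only when the numerator degrees of freedom equal 2, and the function name `f_pvalue_dfn2` is hypothetical.

```python
def f_pvalue_dfn2(f_value: float, dfd: int) -> float:
    """Upper-tail probability P(F > f_value) for the F(2, dfd) distribution.

    ASSUMPTION: numerator df is exactly 2, where the tail probability
    reduces to (1 + 2*f/dfd) ** (-dfd/2).
    """
    return (1 + 2 * f_value / dfd) ** (-dfd / 2)


msb = 25.2 / 2     # mean square between groups
msw = 29.2 / 12    # mean square within groups
f_ratio = msb / msw
p_value = f_pvalue_dfn2(f_ratio, dfd=12)

print(f"F = {f_ratio:.4f}, p = {p_value:.5f}")
```

    Because the P-value is below .05, the calculated F exceeds the critical value, so both comparisons lead to the same decision.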

    Unfortunately, the spreadsheet program does not have a tool to calculate the

    Scheffe test, so we will have to calculate those values the way we did before. The results of

    our Scheffe tests were:


    Summary of Scheffe Test Results

    Group One versus Group Two 4.62

    Group One versus Group Three 0.18

    Group Two versus Group Three 2.96
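    The Scheffe formula is not shown in this copy of the notes. As a hedged illustration, the sketch below assumes the common pairwise form F = (mean_i - mean_j)^2 / [MSW (1/n_i + 1/n_j)(K - 1)], which reproduces the three values in the summary above; the function name `scheffe_f` and the data layout are assumptions.

```python
from itertools import combinations

# TAI scores keyed by group number.
groups = {
    1: [48, 50, 53, 52, 50],
    2: [55, 52, 53, 55, 53],
    3: [51, 52, 50, 53, 50],
}

k = len(groups)
n_total = sum(len(s) for s in groups.values())
means = {g: sum(s) / len(s) for g, s in groups.items()}

# Mean square within groups: SSW divided by (N - K).
ssw = sum((x - means[g]) ** 2 for g, s in groups.items() for x in s)
msw = ssw / (n_total - k)


def scheffe_f(i: int, j: int) -> float:
    """Scheffe F for comparing groups i and j (ASSUMED formula, see lead-in)."""
    diff_sq = (means[i] - means[j]) ** 2
    return diff_sq / (msw * (1 / len(groups[i]) + 1 / len(groups[j])) * (k - 1))


scheffe = {(i, j): scheffe_f(i, j) for i, j in combinations(groups, 2)}
for (i, j), value in scheffe.items():
    print(f"Group {i} versus Group {j}: {value:.2f}")
```

    Each value is compared with the same critical F used for the overall test; only the Group 1 versus Group 2 comparison exceeds 3.88.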

    We now have all the information we need to complete the six step statistical inference

    process:

    1. State the null hypothesis and the alternative hypothesis based on your

    research question.

    Note: Our null hypothesis, for the F test, states that there are no differences

    among the three means. The alternate hypothesis states that there are significant

    differences among some or all of the individual means. An unequivocal way of

    stating this is not H0.

    2. Set the alpha level.

    Note: As usual we will set our alpha level at .05, which means we have 5 chances in 100 of

    making a type I error.

    3. Calculate the value of the appropriate statistic. Also indicate the degrees of

    freedom for the statistical test if necessary and the results of any post hoc

    test, if they were conducted.

    F(2,12) = 5.178, value of the F ratio

    F.05(2,12) = 3.88, critical value of F

    F12 = 4.630, Scheffe test value for comparing means 1 and 2

    F13 = 0.185, Scheffe test value for comparing means 1 and 3

    F23 = 2.963, Scheffe test value for comparing means 2 and 3

    4. Write the decision rule for rejecting the null hypothesis.

    Reject H0 if F is >= 3.88

    Note: To write the decision rule we had to know the critical value for F, with an

    alpha level of .05, 2 degrees of freedom in the numerator (df between groups)

    and 12 degrees of freedom in the denominator (df within groups). We can do this


    by looking at Appendix Table D and noting the tabled value for the .05 level in the

    column for 2 df and the row for 12 df.

    5. Write a summary statement based on the decision.

    Reject H0, p < .05

    Note: Since our calculated value of F (5.178) is greater than 3.88, we reject the

    null hypothesis and accept the alternative hypothesis.

    6. Write a statement of results in standard English.

    There is a significant difference among the scores that the three groups of students

    received on the Test Anxiety Index.

    Group 1 (the five hour therapy group) has a significantly lower score on the TAI

    than does Group 2 (the ten hour therapy group).

    We can see that the Excel spreadsheet program gives us an easy way to calculate the F

    ratio. It also provides us with an analysis of variance table which shows, among other

    things, the critical value of F for the alpha level we specified, and the probability level (p)

    of the result.

    http://f/ANOVA/ed602lesson13.htm

    Question:

    (1) State the assumptions of ANOVA.

    (2) Describe the rationale of ANOVA, stating the ANOVA table.

    (3) Solve the following problem using one-way ANOVA.

    Four types of advertising displays were set up in 12 retail outlets, with three

    outlets randomly assigned to each of the displays, for the purpose of studying

    the point-of-sale impact of the displays. The relevant information is given in the

    following table.

    Type of Display    Sales
    A1                 40    44    43
    A2                 53    54    59
    A3                 48    38    46
    A4                 48    61    47

    Carry out the Analysis of Variance to test the differences among the mean sales

    values for the four types of displays, using the 5 percent level of significance.

    (i) State the null hypothesis and alternative hypothesis.

    Give a step-by-step solution using all the required formulas. Give the

    ANOVA table and comment on the conclusion.

    (ii) Use Excel to solve the above problem using the data given in the above

    table and comment on the conclusion.