DTC Quantitative Research Methods: Comparing Means II: Nonparametric Tests and Bivariate and Multivariate Analysis of Variance (ANOVA), Thursday 20th November

DTC Quantitative Research Methods

Comparing Means II: Nonparametric Tests and

Bivariate and Multivariate Analysis of Variance (ANOVA)

Thursday 20th November 2014

Two-sample t-tests: Limitations

• In Week 6 we looked at two-sample t-tests, which are used to test the (null) hypothesis that the population means for two groups are the same.

• But t-tests make an assumption of homogeneity of variance (i.e. that the spread of values is the same in each of the groups).

• Furthermore, the assumption that the difference between sample means has a t-distribution is only reasonable for small samples if the variable has (approximately) a normal distribution.

• And, of course, very often we are interested in comparing more than two groups…

Nonparametric alternatives

• Where the assumptions of a t-test are seriously violated, an alternative approach is to use a nonparametric test.

• Nonparametric tests are also referred to as distribution-free tests, as they have the advantage of not requiring the same assumptions about distributions of values.

• In practice (when using SPSS), such tests work in a similar way to parametric tests, with the same processes of selecting variables and of assessing statistical significance, based on the p-value that is calculated for the test statistic.

The weakness of the nonparametric alternative…

• However, where their assumptions are met, parametric tests such as t-tests are to be preferred because, in general, for the same sample size(s), they are less likely to generate Type II errors (i.e. failing to reject a null hypothesis that is in fact false).

• Nonparametric tests are thus less powerful.

• This lack of power results from the loss of information when interval-level data are converted to ranked data (i.e. merely ordering the values from lowest to highest).

The Mann-Whitney U-test

• This is a nonparametric alternative to the two-sample t-test for comparing two independent samples. In effect, it focuses on average ranks of values rather than on average values.

• U is calculated by, first, ranking all the values in the two samples taken together.

• The ranked values for each sample are then added up, and, if the sample size for a sample is n, then n(n+1)/2 is subtracted from the sum of the ranks.

• The smaller of the numbers generated for the two samples becomes the U-statistic.
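The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names are hypothetical, tied values are given the average of the ranks they occupy (a common convention that the slide does not spell out), and the data are the second pair of samples from the ages-at-marriage example used later in this session.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share their mean rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 hold the same value and occupy ranks i+1..j
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return rank_of


def mann_whitney_u(sample_a, sample_b):
    """Return the U-statistic: the smaller of the two rank-sum-based values."""
    # Step 1: rank all the values in the two samples taken together
    rank_of = average_ranks(list(sample_a) + list(sample_b))
    u_values = []
    for sample in (sample_a, sample_b):
        n = len(sample)
        # Step 2: add up the ranks for the sample and subtract n(n+1)/2
        rank_sum = sum(rank_of[v] for v in sample)
        u_values.append(rank_sum - n * (n + 1) / 2)
    # Step 3: the smaller of the two numbers is U
    return min(u_values)


left_school_at_16 = [16, 17, 18, 19, 20, 21, 22, 23, 24]
stayed_on_at_school = [21, 22, 23, 24, 25, 26, 27, 28, 29]
u_statistic = mann_whitney_u(left_school_at_16, stayed_on_at_school)
print(u_statistic)
```

A value of U close to zero indicates that nearly all the values in one sample rank below those in the other; where the two samples do not overlap at all, U is exactly zero.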

Mann-Whitney (continued)

• The U-test can represent a better way of comparing an ordinal measure between two groups than assuming the measure can be treated as interval-level.

• Since it is based on ranks, it is more robust than the t-test with respect to the impact of outliers.

• However, it is less appropriate where there are more than a small number of ‘tied’ values.

Another alternative…

• Where there are a substantial number of tied values, the Kolmogorov-Smirnov Two Sample Test may be more appropriate.

• This is (yet) another nonparametric test, focusing on whether the two groups have the same distribution of values, and based on the maximum absolute difference between the observed cumulative frequency distributions for the two samples.

• However, this is a broader hypothesis than one focusing on the level of the values.

• It has also been noted in the technical literature that this test has limited power and hence gives a high chance of a Type II Error, i.e. not identifying a difference when one exists.
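The test statistic described above, the maximum absolute difference between the two observed cumulative distributions, can be computed directly. The sketch below is illustrative only: the function name is hypothetical, it returns just the statistic (not a p-value), and it borrows the second pair of samples from the ages-at-marriage example used later in this session.

```python
def ks_two_sample_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two cumulative proportions."""

    def cumulative_proportion(sample, x):
        # Proportion of the sample with a value less than or equal to x
        return sum(1 for v in sample if v <= x) / len(sample)

    # The difference can only change at an observed value, so check each one
    points = sorted(set(sample_a) | set(sample_b))
    return max(
        abs(cumulative_proportion(sample_a, x) - cumulative_proportion(sample_b, x))
        for x in points
    )


left_school_at_16 = [16, 17, 18, 19, 20, 21, 22, 23, 24]
stayed_on_at_school = [21, 22, 23, 24, 25, 26, 27, 28, 29]
d_statistic = ks_two_sample_statistic(left_school_at_16, stayed_on_at_school)
print(round(d_statistic, 3))
```

The statistic ranges from 0 (identical cumulative distributions) to 1 (no overlap between the samples).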

Rethinking the difference between two sample means: An example

Women’s ages at marriage (in years)

First pair of samples:

Education              Ages at marriage              Mean
Left school at 16:     19 19 19 20 20 20 21 21 21    20.0
Stayed on at school:   24 24 24 25 25 25 26 26 26    25.0

Second pair of samples:

Education              Ages at marriage              Mean
Left school at 16:     16 17 18 19 20 21 22 23 24    20.0
Stayed on at school:   21 22 23 24 25 26 27 28 29    25.0

Which pair of samples provides stronger evidence of a difference?

Question: Within each pair of samples on the preceding slide the difference between the sample means is the same (25.0 - 20.0 = 5.0 years). Given this similarity, which pair of samples provides stronger evidence that there is a difference between the mean ages at marriage, in the population, of women who left school at 16 and of women who stayed on at school?

Answer: It seems intuitively obvious that the first pair of samples provides stronger evidence of a difference, since in this case the ages at marriage in each of the two groups are quite homogeneous, and as a consequence there is no overlap between the two groups. It seems implausible that a set of values that is so homogeneous within groups but different between groups could have arisen by chance, rather than as a consequence of some underlying difference between the groups.

Comparing types of variation

• Another way of looking at the above is to say that the difference between the means in the first pair of samples is large when compared with the differences between individuals within either of the groups.

• The difference between the group means can be labelled as between-groups variation and the differences between individuals within each of the groups can be labelled as within-group variation.

• It is the comparison of between-groups variation and within-group variation that is at the heart of the statistical technique labelled analysis of variance (ANOVA).

Quantifying variation

• As in the first pair of samples in the example, a high level of between-groups variation relative to within-group variation gives one more confidence that there is an underlying difference between the groups.

• But how can one quantify the between-groups variation and the within-group variation?

• Typically, when we want to summarise the spread of a set of values we calculate the standard deviation corresponding to those values. A similar approach is used to quantify the two forms of variation.

Sums of squares

• Recall that the standard deviation is based on the squared differences between each of a set of individual values and a mean value.

• Between-groups variation is thus quantified as the sum of the squared differences between the group means and the overall mean, with each squared difference being weighted by the number of cases in the group in question (since larger groups are obviously of greater empirical importance).

• Thus, in the example, the between-groups variation can be calculated as:

[9 × (20.0 - 22.5)²] + [9 × (25.0 - 22.5)²] = 112.5

Sums of squares (continued)

• The within-group variation can be calculated by taking each of the groups in turn, and calculating the sum of squared differences between the individual values in that group and the mean for that group.

• Thus, in the first of the groups in the second pair of samples:

(16 - 20)² + (17 - 20)² + (18 - 20)² + (19 - 20)² + (20 - 20)² + (21 - 20)² + (22 - 20)² + (23 - 20)² + (24 - 20)² = 60.0

• The second of the groups in the second pair of samples also generates a sum of squared differences of 60.0, so the total value for the within-group variation is 60.0 + 60.0 = 120.0

Partitioning variation

• Note that the overall amount of variation within the data can be measured by calculating the sum of squared differences between each of the individual values (i.e. all the values in both of the groups) and the overall mean.

• This calculation results in a figure of 232.5.

• Note that 232.5 = 112.5 + 120.0!

• In other words, the technique of Analysis of Variance involves breaking down (‘partitioning’) the overall variation in a set of values into its between-groups and within-group components.
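The partitioning described above can be checked numerically. The following sketch (with hypothetical function names) computes the three sums of squares for the second pair of samples in the example and confirms that the between-groups and within-group components add up to the overall variation.

```python
def mean(values):
    return sum(values) / len(values)


def sums_of_squares(groups):
    """Return (between-groups, within-group, total) sums of squares."""
    all_values = [v for group in groups for v in group]
    overall_mean = mean(all_values)
    # Between groups: squared distance of each group mean from the overall
    # mean, weighted by the number of cases in the group
    between = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)
    # Within groups: squared distance of each value from its own group mean
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Total: squared distance of each value from the overall mean
    total = sum((v - overall_mean) ** 2 for v in all_values)
    return between, within, total


second_pair = [
    [16, 17, 18, 19, 20, 21, 22, 23, 24],  # left school at 16
    [21, 22, 23, 24, 25, 26, 27, 28, 29],  # stayed on at school
]
between, within, total = sums_of_squares(second_pair)
print(between, within, total)
```

Applying the same function to the first pair of samples gives the same between-groups figure but a much smaller within-group figure, which is the intuition from the earlier question expressed in numbers.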

Accounting for sources of variation

• Now that the two forms of variation have been quantified, the next step is to compare the two values that have been obtained with each other.

• However, when doing this it makes sense to take account of: (a) the number of groups being considered, and (b) the number of individuals in each group.

Degrees of freedom

• In this case there are only two groups, hence we are only making one comparison between groups. In fact, the number of degrees of freedom (sources of variation) attached to the between-groups variation is always equal to the number of groups less one.

• The number of degrees of freedom (sources of variation) for the within-group variation is the total number of individuals in all the groups, less the number of groups (or, to put it another way, the sum across all the groups of the number of individuals in each group minus one). Thus, in this case:

Degrees of freedom of between-groups variation = 2 - 1 = 1

Degrees of freedom of within-group variation = 18 - 2 = 16

Calculating the F-statistic

• We now divide the two amounts of variation by their respective degrees of freedom, i.e.:

Between-groups variation = 112.5/1 = 112.5

Within-group variation = 120.0/16 = 7.5

• Finally we compare the amounts of the two forms of variation by dividing the first amount by the second amount, giving 112.5/7.5 = 15.0.

• Thus, in a sense, the between-groups variation is 15 times as great as the within-group variation.
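The calculation on this slide can be written as a small function. This is a sketch with hypothetical names; the inputs are the sums of squares and group counts from the example (second pair of samples), and the first pair is added for comparison, using the within-group sum of squares of 12.0 that follows from its values.

```python
def f_statistic(ss_between, ss_within, n_groups, n_individuals):
    """Divide each sum of squares by its degrees of freedom, then take the ratio."""
    df_between = n_groups - 1
    df_within = n_individuals - n_groups
    mean_square_between = ss_between / df_between
    mean_square_within = ss_within / df_within
    return mean_square_between / mean_square_within


# Second pair of samples: 2 groups, 18 individuals in total
f_second_pair = f_statistic(112.5, 120.0, n_groups=2, n_individuals=18)

# First pair of samples: same between-groups variation, far less within-group variation
f_first_pair = f_statistic(112.5, 12.0, n_groups=2, n_individuals=18)

print(f_second_pair, f_first_pair)
```

The much larger F-statistic for the first pair of samples matches the earlier intuition that it provides the stronger evidence of a difference.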

Evaluating the F-statistic

• Note that an F-statistic has associated with it two sets of degrees of freedom (corresponding to the between-groups variation and the within-group variation). Hence here we have an F-statistic of 15.0 with 1 degree of freedom and 16 degrees of freedom.

• Differences between sample means that occur simply as a consequence of sampling error result, on average, in the same amount of between-groups variation per degree of freedom as within-group variation per degree of freedom. Hence the average F-statistic where the null hypothesis of equal means is correct will be close to 1.

• How rarely, then, would an F-statistic of 15.0 occur simply as a consequence of sampling error?

The usual p-value…

• For an F-statistic of 15.0 with 1 degree of freedom and 16 degrees of freedom, the p-value is 0.0013.

• Since p < 0.05, we can reject the (null) hypothesis that the population means for the two groups are the same.

• However, ANOVA makes the same assumptions about homogeneity of variance and normally distributed values as t-tests do!

• And, if we are comparing more than two groups, the question arises as to whether the means for particular pairs of groups differ from each other.
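The p-value of 0.0013 quoted above is the upper-tail area of the F-distribution beyond 15.0. In practice it is read from software output, but as a rough check it can be approximated with the Python standard library alone. The sketch below relies on the standard identity that this tail area equals a regularised incomplete beta function, evaluated here by Simpson's rule; the function name and the number of integration steps are arbitrary choices, and the code assumes at least 2 within-group degrees of freedom.

```python
import math


def f_test_p_value(f_stat, df_between, df_within, steps=20000):
    """Approximate P(F >= f_stat) by numerical integration (steps must be even)."""
    a = df_within / 2
    b = df_between / 2
    x = df_within / (df_within + df_between * f_stat)

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Simpson's rule for the integral of the beta density kernel from 0 to x
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    integral = total * h / 3

    # Normalise by the complete beta function B(a, b)
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


p_value = f_test_p_value(15.0, df_between=1, df_within=16)
print(round(p_value, 4))
```

Because there are only two groups here, the same p-value would be obtained from a two-sample t-test with pooled variance, since an F-statistic with 1 and 16 degrees of freedom is the square of a t-statistic with 16 degrees of freedom.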

Post-hoc tests

• Rather than carrying out a large number of t-tests for pairs of groups, which involves a substantially increased chance of one or more Type I Errors (i.e. false positives), there are a number of alternative ways of comparing the groups more appropriately in a pair-wise way.

• If the assumptions of homogeneity of variance and normal distribution of values are met, then Tukey’s HSD test corrects for the increased chance of Type I Errors when groups are compared in a pair-wise way.

• Another common post-hoc procedure is Scheffé’s test. However, because this allows for more complex forms of comparisons (i.e. of three or more means), it is unnecessarily low in power for pair-wise comparisons, i.e. the chance of Type II Errors is increased when it is used to look at these.
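To see why a large number of separate t-tests is a problem, it helps to count the pair-wise comparisons and estimate the chance of at least one false positive. The sketch below makes the simplifying assumption that the tests are independent, which is not strictly true for comparisons sharing a group, so the figure is indicative only. The function name is hypothetical, and seven groups are used because that is the number of occupational classes in the example later in this session.

```python
from math import comb


def chance_of_any_type_i_error(n_groups, alpha=0.05):
    """Number of pair-wise tests and the chance that at least one is a false positive."""
    n_pairs = comb(n_groups, 2)
    # Assumes independent tests, each with a false positive rate of alpha
    return n_pairs, 1 - (1 - alpha) ** n_pairs


n_pairs, familywise_rate = chance_of_any_type_i_error(7)
print(n_pairs, round(familywise_rate, 3))
```

With seven groups there are 21 possible pairs, so under these assumptions the chance of one or more Type I Errors is roughly two in three rather than one in twenty.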

…and the nonparametric alternative?

• The Kruskal-Wallis H Test is the nonparametric test equivalent to (one-way) ANOVA, being an extension of the Mann-Whitney U-test to allow the comparison of more than two (independent) samples.

• The above comment refers to one-way ANOVA because the technique can be generalised to versions which involve two or more independent variables at the same time…

Multivariate analysis

As noted last week, we can use multivariate analysis to elaborate bivariate relationships, in order to answer the following types of questions:

1. Why does the relationship [between two variables] exist?

Spurious relationships, intervening variables

2. How general is the relationship? Does it vary in existence/intensity between subgroups?

The replication of/specification of relationships

These objectives can be achieved via an elaboration of ANOVA

Starting with some means…

BSA 2006: At what age did you retire work? (Q296)

NS-SEC class                                    N      Mean
Employers in large org.; higher manag. & pr.    64     60.84
Lower profess & manag; higher techn. & su.      183    58.01
Intermediate occupations                        88     56.18
Employers in small org.; own account work       72     61.39
Lower supervisory & technical occupation        96     60.04
Semi-routine occupations                        144    58.53
Routine occupations                             111    57.60

Total                                           758    58.65

… and then a One-Way ANOVA

BSA 2006: At what age did you retire work? (Q296)

                  Sum of Squares    df     Mean Square    F        Sig.
Between Groups    1769.833          6      294.972        3.845    .001
Within Groups     57609.915         751    76.711
Total             59379.748         757

Since p=0.001 < 0.05, there is a significant relationship between occupational class (NS-SEC) and retirement age.

… but we need to remember to reflect on whether the assumptions of ANOVA are met in this case!
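The figures in the One-Way ANOVA table can be related back to the table of class means. The sketch below recomputes the overall mean and the between-groups sum of squares from the published group sizes and means, and then the mean squares and F-statistic from the published sums of squares. Because the means are rounded to two decimal places, the recomputed between-groups figure only approximately matches the table; the variable names are illustrative.

```python
# (N, mean retirement age) for each NS-SEC class, as published in the table of means
class_means = [
    (64, 60.84), (183, 58.01), (88, 56.18), (72, 61.39),
    (96, 60.04), (144, 58.53), (111, 57.60),
]

n_total = sum(n for n, _ in class_means)
overall_mean = sum(n * m for n, m in class_means) / n_total

# Between-groups sum of squares: weighted squared distances of the class
# means from the overall mean (approximate, since the means are rounded)
ss_between_from_means = sum(n * (m - overall_mean) ** 2 for n, m in class_means)

# Mean squares and F from the sums of squares and df in the ANOVA table
mean_square_between = 1769.833 / 6
mean_square_within = 57609.915 / 751
f_stat = mean_square_between / mean_square_within

print(n_total, round(overall_mean, 2), round(ss_between_from_means, 1), round(f_stat, 3))
```

The within-group degrees of freedom (751) are the 758 respondents less the 7 classes, following the same rule as in the two-group example.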

Assumptions: a reminder

• ANOVA makes an assumption of homogeneity of variance (i.e. that the spread of values is the same in each of the groups).

• Furthermore, ANOVA assumes that the variable has (approximately) a normal distribution within each of the groups.

• Levene’s test of the former assumption results in p<0.001, i.e. the assumption is not plausible.

• … and it is also not self-evident that retirement ages would have a normal distribution!

Nevertheless…

• We might ask ourselves the question whether some of the class difference in retirement ages reflects gender.

• And hence there is a motivation to carry out a Two-way ANOVA to look at the effects of class and gender simultaneously.

Two-way ANOVA results

BSA 2006: At what age did you retire work? (Q296) (Type III sums of squares)

Source             Sum of Sq.    df     Mean Sq.    F         Sig.
Corrected Model    4739.996      13     364.615     4.965     .000
RClass             619.086       6      103.181     1.405     .210
RSex               2188.093      1      2188.093    29.794    .000
RClass * RSex      506.510       6      84.418      1.149     .332
Error              54639.752     744    73.441
Corrected Total    59379.748     757

… so what do the results mean?

• The overall variation explained by the two variables is greater (4739.996 compared to 1769.833).

• But the between-groups variation which is unique to class is no longer significant (p=0.210 > 0.05).

• Whereas the between-groups variation which is unique to sex is significant (p<0.001).

• … but sex and class do not have interacting effects (p=0.332).

• Note that the class, sex and interaction sums of squares don’t add up to the overall ‘explained’ sum of squares because some of the effects of class and sex overlap.
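The arithmetic behind these points can be checked from the Two-way ANOVA table. The sketch below, using illustrative variable names, recomputes each F-statistic as the effect's mean square divided by the error mean square, and compares the sum of the three unique (Type III) sums of squares with the 'explained' sum of squares for the corrected model.

```python
# (sum of squares, degrees of freedom) for each effect, from the Two-way ANOVA table
effects = {
    "RClass": (619.086, 6),
    "RSex": (2188.093, 1),
    "RClass * RSex": (506.510, 6),
}
model_ss, error_ss, error_df, total_ss = 4739.996, 54639.752, 744, 59379.748

# Each F-statistic is the effect's mean square divided by the error mean square
error_mean_square = error_ss / error_df
f_stats = {name: (ss / df) / error_mean_square for name, (ss, df) in effects.items()}

# The unique contributions do not add up to the explained sum of squares
sum_of_unique_ss = sum(ss for ss, _ in effects.values())
overlap = model_ss - sum_of_unique_ss

for name, f in f_stats.items():
    print(name, round(f, 3))
print(round(sum_of_unique_ss, 3), round(overlap, 3))
```

The explained and error sums of squares still add up to the corrected total, so the partitioning of the overall variation holds; it is only the split of the explained part between class, sex and their interaction that leaves a shared remainder.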

A multivariate conclusion!

• The class differences in retirement age observed in the One-way ANOVA are shown by the Two-way ANOVA to be a spurious consequence of the relationships between gender and class and between gender and retirement age!
