Wilcoxon Rank-Sum, Mann Whitney U, Kolmogorov-Smirnov 1-2 Sample Test

STAT162 / AC YR 2014 1


Wilcoxon Rank-Sum Test



The Wilcoxon Rank Sum test is used to test for a difference between two samples. It is the nonparametric counterpart to the two-sample Z or t test. Instead of comparing two population means, we compare two population medians.


The problem characteristics of this test are:

– two groups being tested are independent of each other
– two groups should have approximately similar distributions
– numeric and ordinal data


Wilcoxon with ni < 10 (small sample)

Wilcoxon with ni ≥ 10 (large sample, use normal approximation)


Step 1: List the data values from both samples in a single list arranged from smallest to largest

Step 2: In the next column, assign the numbers 1 to N (where N = n1+n2). These are the ranks of the observations. When N is equal to our total sample size, our smallest observation receives a rank of 1, and the largest observation receives a rank of N. If there are ties, assign the average of the ranks the values would receive to each of the tied values.

Step 3: The sum of the ranks of the first sample is W, the Wilcoxon Rank Sum test statistic. If one sample is truly bigger than the other, we’d expect its ranks to be higher than the others. So after we have ranked all of the observations, we sum up the ranks for each of the two samples and we can then compare the two rank sums.
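The three steps above can be sketched in Python. This is only an illustrative sketch: the helper name average_ranks and the two small samples are hypothetical and are not taken from the lecture data. It shows how tied values share the average of the ranks they would otherwise receive.

```python
def average_ranks(values):
    """Return ranks 1..N for values; tied values get the mean of their ranks."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values equal to the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; average them.
        mean_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical data: the two 15s tie and share ranks 2 and 3 (2.5 each).
sample_1 = [12, 15, 20]
sample_2 = [15, 18]

ranks = average_ranks(sample_1 + sample_2)   # Steps 1 and 2
W = sum(ranks[:len(sample_1)])               # Step 3: rank sum of sample 1
print(ranks, W)
```

Because tied values receive average ranks, the total of all ranks is still N(N + 1)/2.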



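Note the following:

The ranks 1 to N always sum to [N(N + 1)]/2, and assigning average ranks to ties does not change this total.

If the two samples are the same size and there is no difference between the groups, then we’d expect W to be roughly half of [N(N + 1)]/2.

In general, if there is no difference between the groups, then we’d expect W to be roughly equal to its mean/expected value, n1(N + 1)/2, where n1 is the size of the sample whose ranks are summed.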


The hypothesis statements function the same way as in the two-sample t test, but we are focused on the medians rather than on the means:

H0: η1 – η2 = 0

H1: η1 – η2 ≠ 0

(These could also be expressed as one tailed tests.)


The following data measures the reaction times of two samples of people – one set drank alcohol, one set drank a placebo.

Small sample

From this dataset, the hypothesis statements will be:

H0: The median reaction time for the placebo group is the same as or slower than the median reaction time for the alcohol group.

H1: The median reaction time for the placebo group is faster than the median reaction time for the alcohol group.

Alcohol   Placebo
1.45      .90
1.46      .37
1.76      1.63
1.44      .83
1.11      .95
.98       .78
1.27      .86
2.56      .61
1.32      .38


If we sum the ranks of the Placebo group, we get

W = 1+2+3+4+5+6+7+8+16 = 52.

Since the midpoint of the rank sums is [N(N + 1)/2]/2 = [18(19)/2]/2 = 85.5, and the placebo rank sum is much lower, we have initial evidence to conclude that the placebo group had quicker reaction times than did the alcohol group.

Data   Rank   Alcohol or Placebo
.37    1      Placebo
.38    2      Placebo
.61    3      Placebo
.78    4      Placebo
.83    5      Placebo
.86    6      Placebo
.90    7      Placebo
.95    8      Placebo
.98    9      Alcohol
1.11   10     Alcohol
1.27   11     Alcohol
1.32   12     Alcohol
1.44   13     Alcohol
1.45   14     Alcohol
1.46   15     Alcohol
1.63   16     Placebo
1.76   17     Alcohol
2.56   18     Alcohol
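As a check on the ranking table, the rank sums for the reaction-time data can be recomputed with a short Python sketch. The variable names are illustrative. Since this data set has no ties, the rank of each observation is simply its position in the sorted pooled list.

```python
alcohol = [1.45, 1.46, 1.76, 1.44, 1.11, 0.98, 1.27, 2.56, 1.32]
placebo = [0.90, 0.37, 1.63, 0.83, 0.95, 0.78, 0.86, 0.61, 0.38]

# Pool the observations, remembering which group each one came from.
pooled = sorted([(x, "Alcohol") for x in alcohol] +
                [(x, "Placebo") for x in placebo])

# No ties occur here, so rank = 1-based position in the sorted list.
W_placebo = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == "Placebo")
W_alcohol = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == "Alcohol")

N = len(pooled)
total_rank_sum = N * (N + 1) / 2            # sum of the ranks 1..N
expected_W = len(placebo) * (N + 1) / 2     # expected W if the groups do not differ

print(W_placebo, W_alcohol, total_rank_sum, expected_W)
```

The placebo rank sum falls well below the value expected when the groups do not differ, which matches the informal reading of the table.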


Large Sample using Normal Approximation


Two independent samples of army and marine recruits are selected, and the time in minutes it takes each recruit to complete an obstacle course is recorded, as shown in the table. At α = 0.05, is there a difference in the times it takes the recruits to complete the course?

Army 15 18 16 17 13 22 24 17 19 21 26 28 Mean = 19.67

Marines 14 9 16 19 10 12 11 8 15 18 25 Mean = 14.27


nA = 12; nM = 11


Step 1: State the hypothesis and identify the claim.

H0: There is no difference in the times it takes the recruits to complete the obstacle course.

H1: There is a difference in the times it takes the recruits to complete the obstacle course (claim).

Step 2: Find the critical value.

Since α = 0.05 and this test is a two-tailed test, use the critical z values of +1.96 and -1.96.
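The critical values quoted here can be reproduced from the standard normal distribution. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05

# Two-tailed test: put alpha/2 in each tail of the standard normal curve.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_crit, 2), round(-z_crit, 2))
```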


Combine the data from the two samples, arrange the combined data in order, and rank each value. Be sure to indicate the group.


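Step 3: Compute the test value

Sum the ranks of the group with the smaller sample size. In this case, the sample size for the marines is smaller.

R = 1+2+3+…+14.5+16.5+21 = 93

Substitute in the formulas to find the test value, where n1 = 11 is the smaller sample size and n2 = 12 is the larger:

μR = n1(n1 + n2 + 1)/2 = 11(11 + 12 + 1)/2 = 132

σR = √[n1n2(n1 + n2 + 1)/12] = √[(11)(12)(24)/12] = √264 ≈ 16.2

z = (R – μR)/σR = (93 – 132)/16.2 ≈ -2.41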


Step 5: Make the decision

The decision is to reject the null hypothesis, since -2.41 < -1.96.

Step 6: Interpretation

There is enough evidence to support the claim that there is a difference in the times it takes the recruits to complete the course.
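The whole army/marine example can be retraced with a short Python sketch. The helper avg_rank is a hypothetical name introduced for illustration, and the standard deviation uses the usual large-sample formula without a tie correction, which is an assumption about what the slides intend. With σR left unrounded the sketch gives z of about -2.40; the -2.41 quoted above presumably reflects rounding σR to 16.2 before dividing.

```python
from math import sqrt

army = [15, 18, 16, 17, 13, 22, 24, 17, 19, 21, 26, 28]
marines = [14, 9, 16, 19, 10, 12, 11, 8, 15, 18, 25]

pooled = sorted(army + marines)


def avg_rank(x):
    """Average rank of value x in the pooled, sorted data (handles ties)."""
    first = pooled.index(x) + 1      # rank of the first occurrence of x
    count = pooled.count(x)          # number of observations tied at x
    return first + (count - 1) / 2   # mean of ranks first .. first+count-1


# Rank sum of the smaller sample (marines).
R = sum(avg_rank(x) for x in marines)

n1, n2 = len(marines), len(army)                  # n1 is the smaller sample
mu_R = n1 * (n1 + n2 + 1) / 2                     # mean of R under H0
sigma_R = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)      # standard deviation under H0
z = (R - mu_R) / sigma_R

reject_H0 = abs(z) > 1.96                         # two-tailed test at alpha = 0.05
print(R, mu_R, round(sigma_R, 2), round(z, 2), reject_H0)
```

Either way, the test value lies beyond -1.96, so the decision to reject the null hypothesis is unchanged.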


Mann – Whitney U


The Mann-Whitney U test is commonly portrayed as the non-parametric substitute for Student's t-test when samples are not normally distributed.


To compute the Mann Whitney U:

– Rank the scores in both groups (together) from lowest to highest.
– Sum the ranks of the scores for each group.
– The sums of ranks for the two groups are used to make the statistical comparison.


1. The null hypothesis states that there is no difference in the scores of the populations from which the samples were drawn.

2. The Mann Whitney U is sensitive to both the central tendency of the scores and the distribution of the scores.

3. The Mann Whitney U statistic is defined as the smaller of U1 and U2.

U1 = n1n2 + [n1(n1 + 1) / 2] - R1
U2 = n1n2 + [n2(n2 + 1) / 2] - R2

where:
n1 = number of observations in group 1
n2 = number of observations in group 2
R1 = sum of the ranks assigned to group 1
R2 = sum of the ranks assigned to group 2
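The two formulas translate directly into a small Python function. This is a sketch only: the function name mann_whitney_u and the tiny set of ranks used to exercise it are hypothetical and are not lecture data.

```python
def mann_whitney_u(n1, n2, r1, r2):
    """Return (U1, U2, U) from the group sizes and the group rank sums."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)   # U is the smaller of U1 and U2


# Hypothetical ranking of six scores:
# group 1 holds ranks 1, 2, 4 and group 2 holds ranks 3, 5, 6.
u1, u2, u = mann_whitney_u(3, 3, 1 + 2 + 4, 3 + 5 + 6)
print(u1, u2, u)
```

A convenient arithmetic check is that U1 + U2 always equals n1n2, because the two rank sums together must total N(N + 1)/2.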


Null Hypothesis: There is no difference in scores of the two groups (i.e. the sum of ranks for group 1 is no different than the sum of ranks for group 2).

Alternative Hypothesis: There is a difference between the scores of the two groups (i.e. the sum of ranks for group 1 is significantly different from the sum of ranks for group 2).

Sum of ranks: R1 = 85, R2 = 125


We are to perform the test at the 5% significance level, so α = 0.05.

Critical value

Test Statistic
U1 = n1n2 + [n1(n1 + 1) / 2] - R1 = 10(10) + [10(10 + 1)/2 – 85] = 70
U2 = n1n2 + [n2(n2 + 1) / 2] - R2 = 10(10) + [10(10 + 1)/2 – 125] = 30

Reject null. Since


Suppose you wished to determine if there was a difference in the biomass of male and female Juniper trees. Thus,

H0: Bmale = Bfemale (medians are equal)
H1: Bmale ≠ Bfemale (medians not equal)

You randomly select 6 individuals of each gender from the field, dry them to constant moisture, chip them, and then weigh them to the nearest kg (


Raw Data:
Male     74 77 78 75 72 71
Female   80 83 73 84 82 79

Raw Data with assigned rank:
Male     74 77 78 75 72 71
Rank      4  6  7  5  2  1
Female   80 83 73 84 82 79
Rank      9 11  3 12 10  8


Critical Value

Test Statistic
U1 = n1n2 + [n1(n1 + 1) / 2] - R1
U2 = n1n2 + [n2(n2 + 1) / 2] - R2

Reject Since
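The numbers for the Juniper example can be worked out from the raw data with the following Python sketch. Variable names are illustrative, and treating the males as group 1 is an assumption. The critical value is not reproduced here because it has to be read from a table. The last lines cross-check U for the males by counting pairs directly, which agrees with the rank-sum formula when there are no ties.

```python
male = [74, 77, 78, 75, 72, 71]
female = [80, 83, 73, 84, 82, 79]

# Rank the pooled data from lowest to highest (this data set has no ties).
pooled = sorted(male + female)
rank = {x: i for i, x in enumerate(pooled, start=1)}

R_male = sum(rank[x] for x in male)
R_female = sum(rank[x] for x in female)

n1, n2 = len(male), len(female)
U_male = n1 * n2 + n1 * (n1 + 1) // 2 - R_male
U_female = n1 * n2 + n2 * (n2 + 1) // 2 - R_female
U = min(U_male, U_female)    # test statistic: the smaller of the two

# Cross-check: U_male equals the number of (male, female) pairs
# in which the male tree weighs less than the female tree.
pairs_male_lighter = sum(1 for m in male for f in female if m < f)

print(R_male, R_female, U_male, U_female, U, pairs_male_lighter)
```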


Kolmogorov – Smirnov One Sample Test


Concerned with the degree of agreement between the distribution of a set of sample values (observed scores) and some specified theoretical distribution

Determines whether the scores in a sample can reasonably be thought to have come from a population having the theoretical distribution


Assumptions:

– The data consist of independent observations constituting a random sample of size n from some unknown distribution function designated by F(x).
– The variables being measured are on at least an ordinal scale.



(1) Specify the theoretical cumulative distribution, i.e., the cumulative frequency distribution expected, then state the null hypothesis and the corresponding alternative.

(2) Arrange the observed scores into a cumulative distribution and convert the cumulative frequencies into cumulative relative frequencies [S(x)]. For each interval find the expected cumulative relative frequency F0(x).

(3) Compute the test statistic given by D = max |F0(x) – S(x)|, where F0(x) is the expected cumulative relative frequency and S(x) is the observed cumulative relative frequency.

(4) Refer to Appendix Table F to find the probability (two-tailed) associated with the occurrence under H0 of values as large as the observed value of D. If that probability is equal to or less than α, reject H0.
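Step (3) can be illustrated with a short Python sketch. Note the assumption: instead of grouping the scores into intervals as in step (2), the sketch evaluates the observed cumulative distribution at every individual observation, which is the usual ungrouped form of the statistic. The function name and the three data points are hypothetical.

```python
from statistics import NormalDist


def ks_one_sample_D(sample, cdf):
    """Largest gap between the observed and theoretical cumulative distributions."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # The observed cumulative relative frequency jumps from (i-1)/n
        # to i/n at x, so compare F0(x) with both sides of the jump.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


# Hypothetical observations tested against a Uniform(0, 1) distribution,
# whose cumulative distribution is simply F0(x) = x on [0, 1].
hypothetical = [0.1, 0.4, 0.7]
D_uniform = ks_one_sample_D(hypothetical, lambda x: x)
print(round(D_uniform, 2))

# For a normal hypothesis, the standard library can supply F0(x), e.g.
# a normal distribution with mean 85 and standard deviation 15.
normal_cdf = NormalDist(mu=85, sigma=15).cdf
```

The resulting D is then compared with the tabled value as described in step (4).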


Example

Grundman et al. reported the weights of the kidneys of 36 mongrel dogs before they were used in an experiment. We wish to test the null hypothesis that these data are from a normally distributed population with a mean of 85 grams and a standard deviation of 15 grams.


Assumption
The Kolmogorov-Smirnov one sample test is chosen because the researcher wishes to compare an observed distribution of scores from an ordinal scale with a theoretical distribution of scores.

Hypotheses
H0: The sampled population is normally distributed with mean µ = 85 and standard deviation σ = 15.
H1: The sampled population is not normally distributed.

Significance level
Let α = 0.05 and N = 36


Test Statistic:

Critical Region: Reject H0 if D ≥ 0.23 (refer to Table F)


Decision: Entering Table A.18 with N = 36, and keeping in mind that the test is two-sided, we find that the probability of obtaining a value of D as extreme as or more extreme than 0.15 is greater than 0.23. Hence these data do not provide sufficient evidence to warrant the conclusion that the weights of mongrel dog kidneys are not normally distributed.

Conclusion: Thus, the weights of the kidneys of mongrel dogs can reasonably be treated as normally distributed.


Kolmogorov – Smirnov Two Sample Test


Applications:

Concerned with the agreement between two sets of sample values

Determines whether the two independent samples have been drawn from the same population (or from populations with the same distribution)


Arrange each of the two groups of scores in a cumulative frequency distribution using the same intervals. Let Sm(X) be the observed cumulative distribution for one sample (of size m) and Sn(X) for the other sample (of size n).

Compute the test statistic given by

For one-tailed test, D = max [Sm(X) – Sn(X)], and

For two-tailed test, D = max |Sm(X) – Sn(X)|

By subtraction, determine the difference between the two-sample cumulative distributions. Determine the largest of these differences, D.



Refer to the appropriate Appendix Table when m and n are both ≤ 25: one table for a one-tailed test, another for a two-tailed test. In either table, the entry mnD is used.

Refer to the large-sample Appendix Table when either m or n is larger than 25 for a two-tailed test. For a one-tailed test, the value computed by using χ² = 4D²[mn/(m + n)] is approximately chi-square distributed with df = 2. Use Appendix Table C.

If the observed value is equal to or larger than that given in the appropriate table for a particular level of significance, H0 may be rejected.
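The large-sample one-tailed rule can be sketched in Python. The function name and the input values (D = 0.2 with m = n = 50) are hypothetical, chosen only to show the arithmetic. The sketch relies on the fact that the upper-tail probability of a chi-square variable with 2 degrees of freedom is exp(-χ²/2), so no table lookup is needed for the p-value.

```python
from math import exp


def ks_two_sample_chi2(D, m, n):
    """Chi-square approximation for the one-tailed two-sample K-S test."""
    chi2 = 4 * D ** 2 * (m * n) / (m + n)
    # Upper-tail probability of a chi-square distribution with df = 2.
    p_value = exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical large samples: D = 0.2 observed with m = n = 50.
chi2, p = ks_two_sample_chi2(D=0.2, m=50, n=50)
print(round(chi2, 2), round(p, 4))
```

In this illustration the p-value exceeds 0.05, so H0 would not be rejected at that level.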


Example

Lepley compared the serial learning of 10 seventh-grade students with the serial learning of 9 eleventh-grade students. His hypothesis was that the primacy effect should be less prominent in the learning of the younger subjects. The primacy effect is the tendency for the material learned early in a series to be remembered more efficiently than the material learned later in the series. He tested this hypothesis by comparing the percentage of errors made by the two groups in the first half of the series of learned material, predicting that the older group would make relatively fewer errors in repeating the first half of the series than would the younger group.


Percentage of total errors in first half of series

Eleventh-grade subjects   Seventh-grade subjects
35.2                      39.1
39.2                      41.2
40.9                      45.2
38.1                      46.2
34.4                      48.4
29.1                      48.7
41.8                      55.0
24.3                      40.6
32.4                      52.1
---                       47.2


Assumptions
Since two small independent samples are being compared and the alternative hypothesis is one-tailed, the Kolmogorov-Smirnov two-sample one-tailed test will be applied to the data.

Hypotheses
H0: There is no difference in the proportion of errors made in recalling the first half of a learned series between eleventh-grade students and seventh-grade students.
H1: Eleventh-graders make proportionally fewer errors than seventh-graders in recalling the first half of a learned series.

Significance level
α = 0.01 with m = 9 and n = 10


Critical Region
Since m and n ≤ 25 (one-tailed), we reject H0 if mnD exceeds the tabled critical value. That is, we reject H0 if mnD > 61.

Computation


Decision
Observe that the largest discrepancy between the two cumulative distributions is D = .70. Thus, mnD = (9)(10)(.70) = 63 > 61. So, since the observed value exceeds the critical value, we reject H0 in favor of H1.

Conclusion
We conclude that the eleventh-graders make proportionally fewer errors than seventh-graders in recalling the first half of a learned series.
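The computation behind this decision can be retraced with a Python sketch. One assumption to flag: the sketch compares the two cumulative distributions at every individual observation rather than over class intervals, which for these data gives the same largest discrepancy. The helper name ecdf is hypothetical.

```python
eleventh = [35.2, 39.2, 40.9, 38.1, 34.4, 29.1, 41.8, 24.3, 32.4]
seventh = [39.1, 41.2, 45.2, 46.2, 48.4, 48.7, 55.0, 40.6, 52.1, 47.2]


def ecdf(sample, x):
    """Observed cumulative relative frequency of `sample` at x."""
    return sum(1 for v in sample if v <= x) / len(sample)


# One-tailed statistic in the predicted direction: eleventh-graders are
# expected to have lower error percentages, so their cumulative
# distribution should lie above that of the seventh-graders.
D = max(ecdf(eleventh, x) - ecdf(seventh, x) for x in eleventh + seventh)

m, n = len(eleventh), len(seventh)
mnD = m * n * D

print(round(D, 2), round(mnD), mnD > 61)
```

The largest gap occurs at 41.8, where all nine eleventh-grade scores but only three of the ten seventh-grade scores have been accumulated.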
