BMI 541/699 Lecture 17

Where we are:

1. Introduction and Experimental Design

2. Exploratory Data Analysis

3. Probability

4. T-based methods for continuous variables

5. Proportions & contingency tables

6. Non-parametric hypothesis tests

- one sample, continuous variable

- two samples, continuous variable

- contingency tables


Assumptions for hypothesis tests and confidence intervals

Assumptions in addition to simple random sample(s).

Continuous variables: one-sample t-test, two-sample t-test, paired t-test and all t-based confidence intervals.

• The mean of the sample(s) is approximately normally distributed (the central limit theorem works)

This will be true if:

1. the distribution of the data in each sample is approximately symmetric, or

2. the sample size is large enough to overcome any lack of symmetry

• For 2-sample t-test that assumes equal variances

- The data in the two samples have the same standard deviation.


Assumptions in addition to simple random sample(s).

Categorical variables

• χ2 test of goodness of fit

- The expected count in each cell is at least 1 and no more than 20% of the cells have expected counts less than 5.

• Exact test of one proportion: None

• CI for one proportion:

- Wald (Approximate Normal): The distribution of the number of successes is approximately normal.

- Adjusted Wald (Agresti-Coull): None.

• χ2 test of independence in a contingency table (also test of equality for two proportions).

- The expected count in each cell is at least 1 and no more than 20% of the cells have expected counts less than 5.

• CI for the difference between two proportions:

- Adjusted Wald (Agresti-Coull): None


What if we are not willing to make these assumptions?

Continuous variables

When samples are small and you do not wish to assume approximate normality, non-parametric hypothesis testing methods may be used:

Parametric test     Nonparametric test           Assumptions for npar. test in addition to SRS
One-sample t-test   One-sample sign test         None
                    Wilcoxon signed-rank test    RV is continuous and symmetric*
Two-sample t-test   Mann-Whitney test (AKA       Independent samples, variables are continuous,
                    Wilcoxon rank sum)           the distribution of the variable in the two
                                                 populations is the same except for a shift.
                    Permutation test             Same as Mann-Whitney

* Since we have to assume symmetry we might as well use a t-test unless n is very small.

Sign Test

Example 1: Assume we have results from a survey administered to a SRS of 12 patients who rated their nursing care on a scale from 1 to 5.

Subject 1 2 3 4 5 6 7 8 9 10 11 12

Rating 5 4 4 1 3 5 2 4 5 3 4 5

A histogram of the ratings:

[Figure: histogram of the ratings; x-axis: Rating (1 to 5), y-axis: Frequency]


We wish to test the null hypothesis that the median rating is equal to 3.

H0 : median = med0 = 3 vs. HA : median ≠ 3

First we calculate the adjusted rating = rating − med0 for each subject and then the sign of those values (a zero has no sign):

Subject           1   2   3   4   5   6   7   8   9  10  11  12
Rating            5   4   4   1   3   5   2   4   5   3   4   5
Adjusted rating   2   1   1  -2   0   2  -1   1   2   0   1   2
Sign              +   +   +   –       +   –   +   +       +   +

Define:

X = the number of adjusted ratings > 0 = the number of pluses

n = the number of adjusted ratings ≠ 0.

For this example x = 8, n = 10

If H0 is true then pluses and minuses are equally likely.


Under the null hypothesis

X ∼ Binom(n = 10, 0.5)

We wish to test

H0 : p = 0.5 vs. HA : p ≠ 0.5

To calculate the p-value use binom.test() in R (there is no easy way to do this in R Commander)

> binom.test(x=8,n=10)

Exact binomial test

data: 8 and 10

number of successes = 8, number of trials = 10, p-value = 0.1094

alternative hypothesis: true probability of success is not equal to 0.5

95 percent confidence interval:

0.4439045 0.9747893

sample estimates:

probability of success

0.8

No evidence against the null hypothesis H0 : median = 3
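The counts x and n used in Example 1 can also be derived from the raw ratings rather than counted by hand. The following is a minimal sketch; the variable names (rating, med0, adj) are illustrative choices, not part of the lecture.

```r
# Example 1 data: ratings from the 12 patients
rating <- c(5, 4, 4, 1, 3, 5, 2, 4, 5, 3, 4, 5)
med0 <- 3                      # hypothesized median

adj <- rating - med0           # adjusted ratings
x <- sum(adj > 0)              # number of pluses
n <- sum(adj != 0)             # number of non-zero adjusted ratings

# Two-sided exact binomial test of H0: p = 0.5
res <- binom.test(x, n, p = 0.5)
res$p.value
```

With these data x = 8 and n = 10, so the call is the same as binom.test(x=8,n=10) above.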

Example 2: A SRS of 10 subjects rates the effectiveness of each of two treatments.

Subject         1   2   3   4   5   6   7   8   9  10
tmt 1 rating    5   5   4   0   4   4   7   5   6   4
tmt 2 rating    8   8   4   5   9   8   6   9   3   9
difference     -3  -3   0  -5  -5  -4   1  -4   3  -5
sign            –   –       –   –   –   +   –   +   –

These are paired data. We approach them the same way as in a paired t-test: take the differences and treat them like one sample.

We wish to test

H0 : median difference = 0 vs. HA : median difference ≠ 0

Define X = number of positive differences and

n = number of nonzero differences

For this example x = 2 and n = 9


If H0 is true then positive and negative differences are equally likely, so under the null hypothesis

X ∼ Binom(n = 9, 0.5)

and the hypotheses can be written:

H0 : p = 0.5 vs. HA : p ≠ 0.5


To calculate the p-value use binom.test() in R.

> binom.test(x = 2,n = 9)

Exact binomial test

data: 2 and 9

number of successes = 2, number of trials = 9, p-value = 0.1797

alternative hypothesis: true probability of success is not equal to 0.5

95 percent confidence interval:

0.02814497 0.60009357

sample estimates:

probability of success

0.2222222

No evidence against the null hypothesis that the median difference between the two treatments is 0.

The sign test has the advantage that it only assumes a simple random sample. However, it is not very powerful because it throws away a lot of information: it uses only the sign of each difference, not its size.
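For paired data the same steps can be wrapped in a small helper. This is a sketch only: sign_test() is a hypothetical name introduced here for illustration (base R has no function by that name), and it simply passes the counts to binom.test().

```r
# Hypothetical helper: sign test of H0: median = med0
sign_test <- function(obs, med0 = 0) {
  adj <- obs - med0
  x <- sum(adj > 0)        # positive adjusted observations
  n <- sum(adj != 0)       # zeros are dropped
  binom.test(x, n, p = 0.5)
}

# Example 2: ratings of two treatments by the same 10 subjects
tmt1 <- c(5, 5, 4, 0, 4, 4, 7, 5, 6, 4)
tmt2 <- c(8, 8, 4, 5, 9, 8, 6, 9, 3, 9)
d <- tmt1 - tmt2           # paired differences

paired_res <- sign_test(d) # H0: median difference = 0
paired_res$p.value
```

Only the signs of d enter the test, which is why the sizes of the differences (mostly −3 to −5) have no effect on the p-value.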


Summary of the sign test

One sample

• H0 : median = med0 vs. HA : median ≠ med0

• calculate adjusted observations = observed value − med0

• let X = number of positive adjusted observations

• let n = the number of non-zero adjusted observations

• test H0 : p = 0.5, where X ∼ Binom(n, p).

• using R

> binom.test(x,n)

where x is the observed number of positive adjusted observations.

R Commander does not offer the sign test.

Paired observations

• calculate the paired differences and conduct a one-sample sign test on the differences.


Sign test: advantages and disadvantages

• No assumptions on the distribution of the data (but need a SRS)

• Not very powerful (will fail to reject when H0 is false more often than we would like)

• When there are zeros the sign test may underestimate the p-value.

An alternative is to use the conservative sign test:

- Count 1/2 the zeros as positive and 1/2 the zeros as negative.

- If there is an odd number of zeros, add one zero to the data (and increase n by 1).
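One way to read the conservative sign test rule is sketched below. It assumes that zeros are kept in n (rather than dropped) and split evenly between pluses and minuses; conservative_sign_test() is a hypothetical name used only for this illustration.

```r
# Hypothetical helper: conservative sign test of H0: median = med0.
# Assumption: zeros stay in n and are split half positive, half negative.
conservative_sign_test <- function(obs, med0 = 0) {
  adj <- obs - med0
  n_pos  <- sum(adj > 0)
  n_neg  <- sum(adj < 0)
  n_zero <- sum(adj == 0)
  # With an odd number of zeros, add one extra zero so they split evenly
  if (n_zero %% 2 == 1) n_zero <- n_zero + 1
  x <- n_pos + n_zero / 2          # half of the zeros count as positive
  n <- n_pos + n_neg + n_zero      # zeros are included in n
  binom.test(x, n, p = 0.5)
}

# Example 1 (two zeros): x goes from 8 to 9 and n from 10 to 12
rating <- c(5, 4, 4, 1, 3, 5, 2, 4, 5, 3, 4, 5)
cons1 <- conservative_sign_test(rating, med0 = 3)

# Example 2 (one zero): a zero is added, so x = 3 and n = 11
d <- c(-3, -3, 0, -5, -5, -4, 1, -4, 3, -5)
cons2 <- conservative_sign_test(d)

c(cons1$p.value, cons2$p.value)
```

In both examples the p-value is larger than the one from the ordinary sign test, which is the sense in which this version is conservative.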


One-sample Wilcoxon signed-rank test

Example: Assume we have results from a survey administered to a SRS of 10 patients who rated their doctors. The summary scores (one per patient) are:

Patient   1    2    3    4    5    6    7    8    9    10
Rating    4.4  4.3  4.1  1.0  3.0  4.9  2.2  4.8  3.1  3.2

The null and alternative hypotheses are the same as for the sign test

H0 : median = med0 vs. HA : median ≠ med0

In this example

H0 : median = 3 vs. HA : median ≠ 3

We will not go through the calculations since the application is very limited.

It requires almost the same assumptions as the t-test.

One-sample Wilcoxon signed-rank test in R

R Commander can do a one-sample Wilcoxon test for paired samples but not for a single sample.

Menus: Statistics → Nonparametric tests → Single-sample Wilcoxon test

To use R directly

> x <- c(4.4, 4.3, 4.1, 1.0, 3.0, 4.9, 2.2, 4.8, 3.1, 3.2)

> median(x)

[1] 3.65

> wilcox.test(x,mu=3)

Wilcoxon signed rank test with continuity correction

data: x

V = 33, p-value = 0.2361

alternative hypothesis: true location is not equal to 3

Warning message:

In wilcox.test.default(x, mu = 3) :

cannot compute exact p-value with zeroes

>

No evidence against the null hypothesis that the median score is 3.
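Although the lecture skips the calculation, the statistic V reported by wilcox.test() is short to reproduce. The sketch below uses the standard definition of the signed-rank statistic (drop zero differences, rank the absolute differences, sum the ranks of the positive ones); the variable names are illustrative.

```r
x <- c(4.4, 4.3, 4.1, 1.0, 3.0, 4.9, 2.2, 4.8, 3.1, 3.2)
med0 <- 3

d <- x - med0          # differences from the hypothesized median
d <- d[d != 0]         # the zero difference (patient 5) is dropped
r <- rank(abs(d))      # rank the absolute differences
V <- sum(r[d > 0])     # sum of the ranks of the positive differences
V
```

The dropped zero is also the reason for the warning in the output above: with zeros present R falls back to a normal approximation instead of the exact p-value.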

Non-parametric methods for two samples

The first test we will discuss is the Mann-Whitney test (AKA two-sample Wilcoxon test, AKA Wilcoxon rank sum test)

Assumptions:

• Simple random samples from two populations

• The variables are continuous.

• The population distributions are the same shape. (the second population distribution is a shifted copy of the first)

The Mann-Whitney test is based on the ranks (sorted order) of the data.

The null and alternative hypotheses are:

H0 : med1 − med2 = med0 vs. HA : med1 − med2 ≠ med0

where medi is the population median of the variable in the i-th population and usually med0 = 0.

Example: In an experiment the dopamine concentrations in the brains of 6 rats on toluene treatment and 8 control rats were recorded.

We want to know if there is a difference between the two groups.

Group   Response
tmt     3120
tmt     2014
tmt     1664
tmt     2481
tmt     2603
tmt     2301
cntl    1620
cntl    1743
cntl    1397
cntl    1503
cntl    2339
cntl    2090
cntl    2231
cntl    1225


Distribution of dopamine by group

[Figure: distribution of dopamine for the cntl and tmt groups; axis from 1500 to 3000]


First we sort the observations and assign ranks

Group   Response   Rank
cntl    1225        1
cntl    1397        2
cntl    1503        3
cntl    1620        4
tmt     1664        5
cntl    1743        6
tmt     2014        7
cntl    2090        8
cntl    2231        9
tmt     2301       10
cntl    2339       11
tmt     2481       12
tmt     2603       13
tmt     3120       14

Then we sum up the ranks of one of the groups. We will use the controls:

scntl = 1 + 2 + 3 + 4 + 6 + 8 + 9 + 11 = 44


To obtain the Mann-Whitney statistic we subtract scntl from

ncntl ntmt + ncntl(ncntl + 1)/2 = 8 × 6 + 8 × 9/2 = 48 + 36 = 84

where ncntl is the size of the control group and ntmt is the size of the tmt group.

This gives us the Mann-Whitney U test statistic:

Ucntl = 84 − 44 = 40

Find Utmt as

Utmt = ncntl ntmt − Ucntl = 48 − 40 = 8

Choose the larger of the two Us. This is the Mann-Whitney statistic U = 40.

This statistic is compared to all possible values of U given a control group of size 8 and a treatment group of size 6.

You can look up the cutoff values for U in Whitlock.
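The rank sum and the two U statistics can be checked directly in R. This is a sketch of the hand calculation above; the variable names are illustrative, and wilcox.test() is called at the end only for comparison.

```r
cntl <- c(1620, 1743, 1397, 1503, 2339, 2090, 2231, 1225)
tmt  <- c(3120, 2014, 1664, 2481, 2603, 2301)
n_cntl <- length(cntl)
n_tmt  <- length(tmt)

# Rank all 14 observations together (cntl occupies the first 8 positions)
r <- rank(c(cntl, tmt))
s_cntl <- sum(r[seq_len(n_cntl)])       # rank sum of the controls

U_cntl <- n_cntl * n_tmt + n_cntl * (n_cntl + 1) / 2 - s_cntl
U_tmt  <- n_cntl * n_tmt - U_cntl
U <- max(U_cntl, U_tmt)
c(s_cntl = s_cntl, U_cntl = U_cntl, U_tmt = U_tmt, U = U)

# For comparison: R reports W for the first sample passed to wilcox.test()
w <- wilcox.test(cntl, tmt)
w$statistic
```

With cntl passed first, the W that R reports equals Utmt = 8 as defined above; swapping the arguments gives 40 instead, with the same two-sided p-value.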

In R Commander with the EZR plugin loaded, use menus: Statistical analysis → Nonparametric tests → Mann-Whitney U test

R Commander without EZR can now do this test as well. Use menus: Statistics → Nonparametric tests → Two-sample Wilcoxon test ...

> #####Mann-Whitney U test#####

Wilcoxon rank sum test

data: response by factor(Group)

W = 8, p-value = 0.04262

alternative hypothesis: true location shift is not equal to 0

## and a bunch of other stuff

Note that R Commander uses a slightly different but equivalent version of this test.

In this output W = 8, which equals the smaller of the two U statistics (Utmt = 8 = W)


Permutation test for two samples

An alternative non-parametric test for two samples is the permutation test.

It requires the same assumptions as the Mann-Whitney test:

• Simple random samples from two populations

• The variables are continuous.

• The population distributions are the same shape. (the second population distribution is a shifted copy of the first)


A permutation is one possible reordering of a variable.

For instance, if our original variable is the numbers 1 to 10

1 2 3 4 5 6 7 8 9 10

Two possible permutations of this variable are:

9 5 2 10 4 7 1 3 8 6

and

2 6 1 10 3 9 5 4 7 8


Recall the dopamine example where treatment is toluene.

Group   Response
tmt     3120
tmt     2014
tmt     1664
tmt     2481
tmt     2603
tmt     2301
cntl    1620
cntl    1743
cntl    1397
cntl    1503
cntl    2339
cntl    2090
cntl    2231
cntl    1225


For a permutation test we assume we have two independent SRS from two populations and we measure X1 on population 1 and X2 on population 2.

The null and alternative hypotheses for the permutation test are:

H0 : X1 and X2 have the same distribution

HA : X1 has a distribution that is a shifted version of the distribution of X2

We can define any test statistic that we like as long as it measures a shift in the location of the distributions that we are interested in.

Two possible test statistics based on the sample means are:

• the unscaled difference in the sample means (x̄1 − x̄2)

• the two-sample t-test statistic

t = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2))


Since we do not want to assume that the shared distribution under the null hypothesis is symmetric we will use the test statistic

M = sample median group 1 − sample median group 2

For the dopamine example the observed value of the test statistic is:

Mobs = median(tmt) − median(cntl) = 2391.0 − 1681.5 = 709.5


To do the permutation test, first create a table with columns for the group variable and the response variable.

group obs

1 tmt 3120

2 tmt 2014

3 tmt 1664

4 tmt 2481

5 tmt 2603

6 tmt 2301

7 cntl 1620

8 cntl 1743

9 cntl 1397

10 cntl 1503

11 cntl 2339

12 cntl 2090

13 cntl 2231

14 cntl 1225


Next, permute (rearrange) the group labels but not the observations.

group2 obs

1 tmt 3120

2 cntl 2014

3 cntl 1664

4 tmt 2481

5 cntl 2603

6 cntl 2301

7 tmt 1620

8 cntl 1743

9 tmt 1397

10 cntl 1503

11 cntl 2339

12 tmt 2090

13 tmt 2231

14 cntl 1225


Then, calculate the permutation test statistic Mp using the permuted group assignments:

For the permuted data the medians are

cntl tmt

1878.5 2160.5

and Mp = 2160.5− 1878.5 = 282
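The single permutation shown in the table can be reproduced as follows. This is a sketch: the vector group2 copies the permuted labels from the table, and the last lines show one way to draw a fresh random permutation with sample().

```r
obs <- c(3120, 2014, 1664, 2481, 2603, 2301,
         1620, 1743, 1397, 1503, 2339, 2090, 2231, 1225)

# Permuted labels, copied from the table (rows 1 to 14)
group2 <- c("tmt", "cntl", "cntl", "tmt", "cntl", "cntl", "tmt",
            "cntl", "tmt", "cntl", "cntl", "tmt", "tmt", "cntl")

meds <- tapply(obs, group2, median)        # median within each permuted group
M_p <- unname(meds["tmt"] - meds["cntl"])  # permutation test statistic
M_p

# A new random permutation: shuffle the original labels, keep obs fixed
group <- rep(c("tmt", "cntl"), times = c(6, 8))
group_new <- sample(group)
```

sample() applied to a vector returns its elements in random order, so group_new always contains 6 tmt and 8 cntl labels.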

Next, repeat this process for all possible permutations (reorderings) of the group variable.


Let M̄p be the mean permutation test statistic.

Then the p-value is the proportion of the permutations where we get an as extreme or more extreme test statistic:

P-value = (Number of permutations where |Mp − M̄p| ≥ |Mobs − M̄p|) / (Number of permutations)

That is, the proportion of permutation test statistics where the distance from the permutation test statistic to the mean permutation test statistic, |Mp − M̄p|, is at least as large as the distance from the observed test statistic to the mean permutation test statistic, |Mobs − M̄p|.
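For a data set this small, every relabelling can be enumerated, which gives the p-value defined above without any simulation. The sketch below assumes that each choice of 6 observations to label tmt is equally likely under H0; combn() lists all such choices, and the variable names are illustrative.

```r
cntl <- c(1620, 1743, 1397, 1503, 2339, 2090, 2231, 1225)
tmt  <- c(3120, 2014, 1664, 2481, 2603, 2301)
obs  <- c(tmt, cntl)
n_tmt <- length(tmt)

M_obs <- median(tmt) - median(cntl)      # observed statistic

# Each column of idx is one way to choose which 6 observations are "tmt"
idx <- combn(length(obs), n_tmt)
M_p <- apply(idx, 2, function(i) median(obs[i]) - median(obs[-i]))

M_bar <- mean(M_p)                       # mean permutation statistic
p_value <- mean(abs(M_p - M_bar) >= abs(M_obs - M_bar))
c(n_relabellings = ncol(idx), M_bar = M_bar, p_value = p_value)
```

Different orderings within a group give the same medians, so enumerating the choose(14, 6) = 3003 distinct groupings is equivalent to enumerating all permutations of the labels.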


Here is a histogram of 50,000 permutation test statistics calculated from 50,000 random permutations of the group assignments.

This is called the permutation distribution of the test statistic.

[Figure: histogram of the permutation distribution; x-axis: x, with marks at −724.7, −500.0, −7.6, 300.0 and 709.5; y-axis: Pr(X = x)]

The mean permutation statistic is −7.6.

The observed statistic is 709.5.

The area outside the red lines is the p-value.


In this case there are 669 permutations that are as far or further from the mean than 709.5.

We did 50,000 permutations so the estimated p-value is

669/50000 = 0.01338


The number of possible permutations gets large fast as the sample size increases.

Instead of using all possible permutations we typically create a large number of random permutations and calculate the test statistic for each of them.

This gives us a list of permutation test statistics from random permutations of the group variable.
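The random-permutation approach can be sketched in a few lines of base R. The function name perm_median_test() and the seed are assumptions made for this illustration; this is not the course's perm.test() function, only a stand-in that follows the same p-value definition with the difference in medians as the statistic.

```r
# Hypothetical helper: Monte Carlo permutation test on the difference in medians
perm_median_test <- function(x1, x2, nperm = 10000) {
  obs <- c(x1, x2)
  n1 <- length(x1)
  M_obs <- median(x1) - median(x2)
  M_p <- replicate(nperm, {
    i <- sample(length(obs), n1)         # random set of rows relabelled "group 1"
    median(obs[i]) - median(obs[-i])
  })
  M_bar <- mean(M_p)
  n_extreme <- sum(abs(M_p - M_bar) >= abs(M_obs - M_bar))
  list(M_obs = M_obs, M_bar = M_bar,
       n_extreme = n_extreme, p_value = n_extreme / nperm)
}

cntl <- c(1620, 1743, 1397, 1503, 2339, 2090, 2231, 1225)
tmt  <- c(3120, 2014, 1664, 2481, 2603, 2301)

set.seed(541)                            # arbitrary seed, for reproducibility only
out <- perm_median_test(tmt, cntl, nperm = 10000)
out$p_value
```

Without set.seed() the estimate changes slightly from run to run, because a different set of random permutations is drawn each time.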


There is an R function perm.test() that will do the test for you. You can download the file “perm.test.q” from the web site and then source it into R.

> source("perm.test.q")

Define variables for the results from the two groups.

> cntl = c(1620, 1743, 1397, 1503, 2339, 2090, 2231, 1225)

> tmt = c(3120, 2014, 1664, 2481, 2603, 2301)

And run the test.

> perm.test(tmt,cntl)

nperm = 10000 , number as extreme or more extreme = 128

p-value = 0.0128, 95% CI for the p-value = (0.010776, 0.015214)

Note the confidence interval on the p-value. The P-value is a proportion. perm.test() calculates the Agresti-Coull confidence interval for the P-value.
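The interval can be checked by hand. The sketch below assumes the "add two successes and two failures" form of the Agresti-Coull interval; that form reproduces the limits printed above, but whether perm.test() computes it in exactly this way is an assumption.

```r
x <- 128        # number of permutations as extreme or more extreme
n <- 10000      # number of random permutations

z <- qnorm(0.975)                  # about 1.96 for a 95% interval
p_tilde <- (x + 2) / (n + 4)       # add 2 successes and 2 failures
half_width <- z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))

ci <- c(lower = p_tilde - half_width, upper = p_tilde + half_width)
round(ci, 6)
```

Because the half-width shrinks roughly like 1/√n, four times as many permutations are needed to halve the width of the interval.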


Each time it is run you will get slightly different answers

> perm.test(tmt,cntl)



More permutations give a tighter confidence interval but the calculation takes a little longer.

> perm.test.out <- perm.test(tmt,cntl,nperm=50000)




Summary: Two-sample non-parametric tests

We have covered two nonparametric tests for two independent samples:

• The two-sample Mann-Whitney based on ranks

• The permutation test based on the original data

Both require the assumptions

• Two populations

• Two independent simple random samples, one from each population.

• The variable has the same shape distribution in each population.

Why use one over the other?

• The permutation test can be based on any statistic you like

- difference in the medians

- difference in the means

- etc.


Fisher’s Exact Test for contingency tables

Example: A SRS of 15 students who were treated for flu in February at the university health service were asked whether or not they had a flu shot since the previous September.

22 matching controls were found who had not had the flu in February and were asked whether they had a flu shot since the previous September.

The following table shows the data.

               Flu   No Flu   Total
Flu shot? Yes    3      9      12
          No    12     13      25
Total           15     22      37

• 3 of 15 subjects (20.0%) who got the flu did have a flu shot

• 9 of 22 subjects (40.9%) who did not get the flu did have a flu shot.


We wish to test the null hypothesis that having had a flu shot and contracting the flu are independent.

Typically we would use a χ2 test of independence. In R:

> x <- matrix(c(3,9,12,13),byrow=T,ncol=2,nrow=2)

> x

[,1] [,2]

[1,] 3 9

[2,] 12 13

> chisq.test(x)

Pearson’s Chi-squared test with Yates’ continuity correction

data: x

X-squared = 0.9531, df = 1, p-value = 0.3289

Warning message:

In chisq.test(x) : Chi-squared approximation may be incorrect

If you do this in R Commander the warning message appears in the Messages window at the bottom.


The warning is because the expected value of the upper left cell is below 5: (12 × 15)/37 = 4.86

25% of the cells have an expected value below 5 (the maximum allowed is 20% of cells).
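The expected counts behind the warning can be computed from the row and column totals. This is a small sketch; the dimnames and the variable rule_ok are added here for readability and are not part of the lecture code.

```r
x <- matrix(c(3, 9, 12, 13), byrow = TRUE, ncol = 2,
            dimnames = list(shot = c("Yes", "No"), flu = c("Flu", "No Flu")))

# Expected count under independence = (row total * column total) / grand total
expected <- outer(rowSums(x), colSums(x)) / sum(x)
round(expected, 2)

# The rule for the chi-squared test: all expected counts at least 1,
# and no more than 20% of the cells with expected counts below 5
prop_below_5 <- mean(expected < 5)
rule_ok <- all(expected >= 1) && prop_below_5 <= 0.20
c(prop_below_5 = prop_below_5, rule_ok = rule_ok)
```

The same matrix of expected counts is stored in the expected component of the object returned by chisq.test(x).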

Fisher’s exact test for 2 × 2 tables is an alternative test to the χ2 test for independence when the assumptions for the χ2 test are not met.

As in the χ2 test for independence the hypotheses are

H0 : the row and column variables are independent

HA : the row and column variables are not independent

The p-value is the probability of obtaining a table as extreme as or more extreme than the one that actually occurred.


The more extreme tables are those where there is a larger disparity between the sample percent with a flu shot among those who got the flu and among those who did not, given the same row and column totals.

In the data table

• 3 of 15 subjects (20.0%) who got the flu did have a flu shot

• 9 of 22 subjects (40.9%) who did not get the flu did have a flu shot.

We can create more extreme tables by keeping the row and column totals the same and decreasing the number of subjects with flu and a flu shot.


Those tables where the upper left entry is 2, 1, or 0 are more extreme.

               Flu   No Flu   Total
Flu shot? Yes    2     10      12
          No    13     12      25
Total           15     22      37

flu, shot: 2/15 = 13.3%    no flu, shot: 10/22 = 45.5%

               Flu   No Flu   Total
Flu shot? Yes    1     11      12
          No    14     11      25
Total           15     22      37

flu, shot: 1/15 = 6.7%    no flu, shot: 11/22 = 50%

               Flu   No Flu   Total
Flu shot? Yes    0     12      12
          No    15     10      25
Total           15     22      37

flu, shot: 0/15 = 0.0%    no flu, shot: 12/22 = 54.5%

The p-value calculation is exact and is based on thehypergeometric distribution (we will not cover the details).

The function fisher.test() in R will compute the P-value.

> fisher.test(x)

Fisher’s Exact Test for Count Data

data: x

p-value = 0.2863

alternative hypothesis: true odds ratio is not equal to 1

95 percent confidence interval:

0.05210989 1.97509758

sample estimates:

odds ratio

0.3710016

Note that Fisher’s test is based on the Odds Ratio.

There is no evidence against the null hypothesis.

Even though the percentages are quite different the sample size is not sufficient to reject the null hypothesis.

Larger sample sizes are called for.
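Although the lecture does not cover the details, the hypergeometric calculation is short enough to sketch. With the row and column totals fixed, the upper left count follows a hypergeometric distribution, available in R as dhyper(). The two-sided rule used below (sum the probabilities of all tables that are no more likely than the observed one) is the rule fisher.test() is documented to use for 2 × 2 tables; the variable names and the small tolerance factor are choices made for this illustration.

```r
x <- matrix(c(3, 9, 12, 13), byrow = TRUE, ncol = 2)

n_shot   <- sum(x[1, ])   # 12 subjects with a flu shot
n_noshot <- sum(x[2, ])   # 25 subjects without a flu shot
n_flu    <- sum(x[, 1])   # 15 subjects with flu
a_obs    <- x[1, 1]       # observed upper left count

# Probability of each possible upper left count, margins held fixed
a_vals <- 0:min(n_shot, n_flu)
probs  <- dhyper(a_vals, m = n_shot, n = n_noshot, k = n_flu)
p_obs  <- probs[a_vals == a_obs]

# One-sided: the observed table plus the tables with upper left entry 2, 1 or 0
p_one_sided <- sum(probs[a_vals <= a_obs])

# Two-sided: every table whose probability is no larger than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])
c(one_sided = p_one_sided, two_sided = p_two_sided)
```

The two-sided value also counts tables in the opposite direction (a large upper left count) that are as unlikely as the observed table, which is why it exceeds the one-sided sum over the tables shown earlier.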

Fisher’s exact test in R Commander

The odds ratio reported by the fisher.test() function is also calculated under the assumption that the row and column totals are fixed. It does not have a simple formula.

Fisher’s exact test can be found in two places.

• Statistics → Contingency tables → Two-way table

• Statistics → Contingency tables → Enter and analyze two-way table.

In both cases the “Statistics” tab has an option for “Fisher’s exact test” as well as the default “Chi-square test of independence”.

Note that R can do a Fisher’s exact test for tables up to about 4 × 4 (they don’t have to be square) depending on how large the counts are. After that your computer will run out of space.


Summary: Nonparametric tests

Parametric test     Nonparametric test           Assumptions for npar. test in addition to SRS
One-sample t-test   One-sample sign test         None.
                    Wilcoxon signed-rank test    RV is continuous and symmetric.*
Two-sample t-test   Mann-Whitney test (AKA       Independent samples, variables are continuous,
                    Wilcoxon rank sum)           the distribution of the variable in the two
                                                 populations is the same except for a shift.
                    Permutation test             Same as Mann-Whitney.
χ2 test of          Fisher’s exact test          Fixed row and column totals.
independence

* Since we have to assume symmetry we might as well use a t-test unless n is very small.

Advantage of Non-parametric tests:

• Require fewer assumptions about the underlying distribution of the data.

Note that Fisher’s exact test has the additional assumption of fixed row and column totals.

Disadvantage of Non-parametric tests:

• Usually less powerful

Power is the ability of a test to reject the null hypothesis when it is false.

Making fewer assumptions means you lose power.

