McNemar’s Test, Correlation, Regression
Arthur Berg, Pennsylvania State University
McNemar’s Test Correlation Linear Regression
Tonsillectomy & Hodgkin’s Lymphoma
Tonsillectomy rates in the US in children under the age of 15:

Year    # per 10,000
1965    166
1986    12
2000    <2
Hodgkin’s Lymphoma linked with Tonsillectomy?
S. Johnson and R. Johnson, “Tonsillectomy history in Hodgkin’s disease”, NEJM (1972)
Hodgkin’s data
A study involved 85 patients with Hodgkin’s disease, each of whom had a normal sibling. Unpaired data:

             Tonsillectomy   No Tonsillectomy
Hodgkin’s         41                44
Control           33                52
Arthur Berg McNemar’s Test, Correlation, Regression 2 / 22
Tonsillectomy & Hodgkin’s Lymphoma
> mat <- matrix(c(41, 33, 44, 52), 2, 2)
> prop.table(mat, 1)
[,1] [,2]
[1,] 0.4823529 0.5176471
[2,] 0.3882353 0.6117647
> chisq.test(mat)
Pearson's Chi-squared test with Yates' continuity
correction
data: mat
X-squared = 1.1726, df = 1, p-value = 0.2789
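As a check on the arithmetic, the statistic reported by chisq.test can be reproduced by hand. This is a minimal sketch that assumes the same 2 x 2 layout as mat; the names expected, x2 and p are introduced here for illustration only.

```r
# Unpaired 2 x 2 table: rows = Hodgkin's, control;
# columns = tonsillectomy, no tonsillectomy.
mat <- matrix(c(41, 33, 44, 52), 2, 2)

# Expected counts under independence: (row total * column total) / n.
expected <- outer(rowSums(mat), colSums(mat)) / sum(mat)

# Yates' correction subtracts 0.5 from each |observed - expected|.
x2 <- sum((abs(mat - expected) - 0.5)^2 / expected)
p  <- pchisq(x2, df = 1, lower.tail = FALSE)

round(c(statistic = x2, p.value = p), 4)
```

Both values should agree with the chisq.test output (X-squared = 1.1726, p-value = 0.2789).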
Tonsillectomy & Hodgkin’s Lymphoma
Hodgkin’s data
Paired data:

                                      Sibling
                            No Tonsillectomy   Tonsillectomy
Patient   No Tonsillectomy         37                 7
          Tonsillectomy            15                26

The row totals (44 and 41) and column totals (52 and 33) reproduce the unpaired counts.
> mat <- matrix(c(37, 15, 7, 26), 2, 2)
> mcnemar.test(mat)
McNemar's Chi-squared test with continuity correction
data: mat
McNemar's chi-squared = 2.2273, df = 1, p-value =
0.1356
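McNemar's test uses only the discordant pairs, that is, the two off-diagonal cells of the paired table. A minimal sketch of the continuity-corrected statistic follows; n12 and n21 are illustrative names, and the exact binomial test at the end is an alternative sometimes used when discordant pairs are few, not something the slides apply.

```r
mat <- matrix(c(37, 15, 7, 26), 2, 2)

# Discordant pairs: patient and sibling differ in tonsillectomy history.
n12 <- mat[1, 2]  # 7
n21 <- mat[2, 1]  # 15

# Continuity-corrected McNemar statistic, referred to chi-squared with 1 df.
stat <- (abs(n12 - n21) - 1)^2 / (n12 + n21)
p    <- pchisq(stat, df = 1, lower.tail = FALSE)
round(c(statistic = stat, p.value = p), 4)

# Exact alternative: under the null, each discordant pair is equally
# likely to fall in either off-diagonal cell.
exact <- binom.test(n12, n12 + n21, p = 0.5)
exact$p.value
```

The first two numbers should match the mcnemar.test output (2.2273 and 0.1356). The concordant cells (37 and 26) do not enter the calculation at all.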
Insulin Sensitivity (Y) vs. C20-22 Fatty Acids (X)
> y <- c(250, 220, 145, 115, 230, 200, 330, 400,
370, 260, 270, 530, 375)
> x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
21.8, 21.9, 22.1, 23.1, 24.2, 24.4)
> plot(x, y, pch = 16, cex = 2)
[Figure: scatter plot of y against x; x axis from 18 to 24, y axis from 100 to 500]
> cor(x, y)
[1] 0.7700025
> cor.test(x, y)
Pearson's product-moment correlation
data: x and y
t = 4.0026, df = 11, p-value = 0.002077
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.3804100 0.9274906
sample estimates:
cor
0.7700025
CI and p-value
- The CI is not symmetrical about r.
- CI interpretation: Assuming the data were randomly sampled from a larger population, there is a 95% chance that this range includes the population correlation coefficient.
- p-value interpretation: If the null hypothesis were true, what is the chance that 13 randomly picked subjects would have an r greater than 0.77 or less than -0.77?
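The asymmetry of the interval comes from the Fisher z-transformation: the interval is symmetric on the atanh(r) scale and is then mapped back with tanh. The sketch below reproduces the cor.test numbers by hand; the variable names (z, se, ci, t_stat, p_val) are illustrative.

```r
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

r <- cor(x, y)
n <- length(x)

# Confidence interval: symmetric on the z scale, asymmetric on the r scale.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Test of rho = 0: t statistic with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

round(c(lower = ci[1], upper = ci[2], t = t_stat, p = p_val), 4)
```

The interval runs from about 0.38 to 0.93, so the upper limit is much closer to r = 0.77 than the lower limit is, because r cannot exceed 1.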
Pearson’s correlation coefficient is not robust
> y2 <- y
> y2[12] <- 30
> plot(x, y2, pch = 16, cex = 2)
[Figure: scatter plot of y2 against x; x axis from 18 to 24, y axis from 100 to 400]
> cor.test(x, y2)
Pearson's product-moment correlation
data: x and y2
t = 0.8235, df = 11, p-value = 0.4277
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3574666 0.6991377
sample estimates:
cor
0.2409823
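One common response to this sensitivity is a rank-based measure such as Spearman's correlation. The slides do not use it, so the comparison below is only an illustrative sketch. Because x contains tied values, only the coefficients are computed, not p-values.

```r
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

# Same single altered observation as on the slide.
y2 <- y
y2[12] <- 30

pearson  <- c(original = cor(x, y),
              altered  = cor(x, y2))
spearman <- c(original = cor(x, y,  method = "spearman"),
              altered  = cor(x, y2, method = "spearman"))

round(rbind(pearson, spearman), 3)
```

Ranks limit how far a single point can move, so the rank-based coefficient typically falls less than Pearson's r. It is not immune, though: with only 13 observations, one altered point still changes it noticeably.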
Correlation vs Causation
Possible explanations for the observed correlation:
- The lipid content of the membranes determines insulin sensitivity.
- The insulin sensitivity of the membranes somehow affects lipid content.
- Both insulin sensitivity and lipid content are under the control of some other factor, perhaps a hormone.
- Lipid content, insulin sensitivity, and other factors are all part of a complex molecular/biochemical/physiological network, perhaps with positive and/or negative feedback components. In this case, the observed correlation is just a peek at a much more complicated set of relationships.
- The two variables don’t correlate in the population at all, and the observed correlation in this sample was a coincidence.
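The last possibility, coincidence, is the one the p-value addresses. A permutation test makes that concrete: shuffling y breaks any real link with x, so the shuffled correlations show what chance alone produces. This is an illustrative sketch and a different technique from the t test used by cor.test; the seed and the number of shuffles (10000) are arbitrary choices.

```r
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

set.seed(1)            # arbitrary seed, for reproducibility only
r_obs <- cor(x, y)

# Correlations obtained when y is randomly re-paired with x.
r_perm <- replicate(10000, cor(x, sample(y)))

# Two-sided permutation p-value: share of shuffles at least as extreme.
p_perm <- mean(abs(r_perm) >= abs(r_obs))
p_perm
```

A small p_perm says coincidence is an unlikely explanation. It says nothing about which of the other explanations is correct.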
Assumptions
- Random and independent data: (X_i, Y_i)
- X and Y are paired
- Both X and Y are stochastic, not experimentally controlled
- Normally distributed
- No outliers
- Linear relationship
R^2
R^2 is the fraction of the variance shared between the two variables.
0.77^2 = 0.59
- 59% of the variability in insulin sensitivity is associated with variability in lipid content.
- 59% of the variability in lipid content is associated with variability in insulin sensitivity.
- Knowing the lipid content of the membranes lets you explain 59% of the variance in the insulin sensitivity.
- The remaining 41% of the variance is explained by other factors.
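The "fraction of variance explained" reading can be checked directly. The sketch below computes R^2 as one minus the ratio of residual to total sum of squares, using the least-squares line from lm, and compares it with the squared correlation. The names ss_res, ss_tot and r2 are illustrative.

```r
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

fit <- lm(y ~ x)

ss_res <- sum(resid(fit)^2)        # variation left over after the fit
ss_tot <- sum((y - mean(y))^2)     # total variation in y around its mean

r2 <- 1 - ss_res / ss_tot
c(r2 = r2, r_squared = cor(x, y)^2)

# The squared correlation is symmetric in x and y.
all.equal(cor(x, y)^2, cor(y, x)^2)
```

Both numbers are about 0.59. The symmetry in the last line is why the first two bullets above can swap the roles of the two variables.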
> summary(lm(y ~ x))
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-102.96 -65.05 25.64 61.23 116.11
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -486.542 193.716 -2.512 0.02890 *
x 37.208 9.296 4.003 0.00208 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 75.9 on 11 degrees of freedom
Multiple R-squared: 0.5929, Adjusted R-squared: 0.5559
F-statistic: 16.02 on 1 and 11 DF, p-value: 0.002077
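The two coefficients in this table follow from simple summary statistics. As a minimal sketch (slope and intercept are illustrative names), the slope is the covariance of x and y divided by the variance of x, and the intercept forces the line through the point of means:

```r
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

# Least-squares slope and intercept from summary statistics.
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Equivalent form of the slope: r rescaled by the ratio of standard deviations.
slope_from_r <- cor(x, y) * sd(y) / sd(x)

round(c(intercept = intercept, slope = slope, slope_from_r = slope_from_r), 3)
```

These should agree with the Estimate column (-486.542 and 37.208). The t value for the slope (4.003) is the same statistic that cor.test reports for the correlation (t = 4.0026), and the p-values match as well.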
> fit <- lm(y ~ x)
> plot(x, y, pch = 16, cex = 2)
> abline(fit, lwd = 3)
[Figure: scatter plot of y against x with the fitted regression line; x axis from 18 to 24, y axis from 100 to 500]
> plot(fit, 1)
[Figure: Residuals vs Fitted; x axis: fitted values (200 to 400), y axis: residuals (-100 to 100); observations 4, 11 and 12 are labelled]
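By default, plot(fit, 1) labels the three observations with the largest absolute residuals. A short sketch that lists them numerically (worst is an illustrative name):

```r
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

fit <- lm(y ~ x)
res <- resid(fit)

# Indices of the three largest residuals in absolute value.
worst <- order(abs(res), decreasing = TRUE)[1:3]

data.frame(obs      = worst,
           x        = x[worst],
           y        = y[worst],
           fitted   = round(fitted(fit)[worst], 1),
           residual = round(res[worst], 2))
```

These are observations 12, 11 and 4. Their residuals include the Max (116.11) and Min (-102.96) shown in the regression summary.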
Things to Look Out For
- look at the residuals
- make sure you have a well-defined response variable
- consider the use of weighted regression
- be mindful of spurious regression
- be cautious of extrapolating beyond your data
- distinguish statistical significance from scientific or practical significance
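The extrapolation warning can be illustrated with the fitted model itself. The sketch below asks for predictions at one x inside the observed range (21) and two outside it (0 and 30); these three values are chosen here purely for illustration.

```r
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

fit <- lm(y ~ x)

# x was observed only between 17.9 and 24.4.
new_x <- data.frame(x = c(0, 21, 30))
pred  <- predict(fit, newdata = new_x, interval = "prediction")

# Width of each 95% prediction interval.
width <- pred[, "upr"] - pred[, "lwr"]

cbind(new_x, round(pred, 1), width = round(width, 1))
```

At x = 0 the predicted value is the intercept, about -486.5, which is not a meaningful insulin sensitivity. The prediction intervals are also wider at 0 and 30 than at 21, reflecting how little the data say about those regions.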