McNemar’s Test, Correlation, Regression
Arthur Berg, Pennsylvania State University


McNemar’s Test Correlation Linear Regression

Tonsillectomy & Hodgkin’s Lymphoma

Tonsillectomy rates in the US in children under the age of 15:

Year    # per 10,000
1965    166
1986    12
2000    <2

Hodgkin’s Lymphoma linked with Tonsillectomy?

S. Johnson and R. Johnson, “Tonsillectomy history in Hodgkin’s disease”, NEJM (1972)

Hodgkin’s data

A study involved 85 patients with Hodgkin’s disease, each of whom had a normal sibling. Unpaired data:

            Tonsillectomy   No Tonsillectomy
Hodgkins         41               44
Control          33               52


> mat <- matrix(c(41, 33, 44, 52), 2, 2)
> prop.table(mat, 1)
          [,1]      [,2]
[1,] 0.4823529 0.5176471
[2,] 0.3882353 0.6117647
> chisq.test(mat)

        Pearson's Chi-squared test with Yates' continuity correction

data:  mat
X-squared = 1.1726, df = 1, p-value = 0.2789
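As a rough check on where the statistic above comes from, the following sketch recomputes the Yates-corrected chi-squared value by hand and compares it with `chisq.test`. The variable names (`expected`, `x2_manual`, `p_manual`, `x2_builtin`) are illustrative and do not appear in the slides.

```r
# 2x2 table from the slides: rows = Hodgkin's / control,
# columns = tonsillectomy / no tonsillectomy
mat <- matrix(c(41, 33, 44, 52), 2, 2)

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(mat), colSums(mat)) / sum(mat)

# Yates' correction: shrink each |observed - expected| by 0.5 before squaring
x2_manual <- sum((abs(mat - expected) - 0.5)^2 / expected)
p_manual <- pchisq(x2_manual, df = 1, lower.tail = FALSE)

# Built-in result for comparison
x2_builtin <- unname(chisq.test(mat)$statistic)

print(c(manual = x2_manual, builtin = x2_builtin, p = p_manual))
```

This treats the 85 patients and 85 siblings as two independent samples, which ignores the sibling pairing.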


Hodgkin’s data

Paired data:

                                    Sibling
                           No Tonsillectomy   Tonsillectomy
Patient  No Tonsillectomy         37                7
         Tonsillectomy            15               26

> mat <- matrix(c(37, 15, 7, 26), 2, 2)
> mcnemar.test(mat)

        McNemar's Chi-squared test with continuity correction

data:  mat
McNemar's chi-squared = 2.2273, df = 1, p-value = 0.1356
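McNemar's statistic depends only on the two discordant cells (the pairs in which patient and sibling differ). A minimal sketch of the continuity-corrected calculation, with illustrative names (`n12`, `n21`, `stat_manual`) that are not from the slides:

```r
# Paired 2x2 table from the slides (rows = patient, columns = sibling)
mat <- matrix(c(37, 15, 7, 26), 2, 2)

# Discordant pairs: the off-diagonal cells
n12 <- mat[1, 2]  # 7
n21 <- mat[2, 1]  # 15

# Continuity-corrected McNemar statistic: (|n12 - n21| - 1)^2 / (n12 + n21)
stat_manual <- (abs(n12 - n21) - 1)^2 / (n12 + n21)
p_manual <- pchisq(stat_manual, df = 1, lower.tail = FALSE)

# Built-in result for comparison
stat_builtin <- unname(mcnemar.test(mat)$statistic)

print(c(manual = stat_manual, builtin = stat_builtin, p = p_manual))
```

The concordant cells (37 and 26) do not enter the statistic at all.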


Insulin Sensitivity (Y) vs. C20-22 Fatty Acids (X)


> y <- c(250, 220, 145, 115, 230, 200, 330, 400,
+     370, 260, 270, 530, 375)
> x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
+     21.8, 21.9, 22.1, 23.1, 24.2, 24.4)
> plot(x, y, pch = 16, cex = 2)

[Figure: scatter plot of y (axis 100 to 500) against x (axis 18 to 24)]


> cor(x, y)
[1] 0.7700025
> cor.test(x, y)

        Pearson's product-moment correlation

data:  x and y
t = 4.0026, df = 11, p-value = 0.002077
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3804100 0.9274906
sample estimates:
      cor
0.7700025


CI and p-value

• The CI is not symmetrical.

• CI interpretation: Assuming the data were randomly sampled from a larger population, there is a 95% chance that this range includes the population correlation coefficient.

• p-value interpretation: If the null hypothesis were true, what is the chance that 13 randomly picked subjects would have an r greater than 0.77 or less than -0.77?
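The asymmetry of the interval can be traced to the Fisher z-transformation, which is how `cor.test` builds its confidence interval for Pearson's r. The sketch below reproduces the interval and the t statistic by hand; the names `z`, `se`, `ci`, `t_stat` and `p_two_sided` are illustrative only.

```r
# Data from the slides
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

r <- cor(x, y)
n <- length(x)

# Fisher z-transformation: the interval is symmetric on the z scale ...
z <- atanh(r)
se <- 1 / sqrt(n - 3)
z_limits <- z + c(-1, 1) * qnorm(0.975) * se

# ... but becomes asymmetric around r after transforming back
ci <- tanh(z_limits)

# Two-sided p-value from the t statistic with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_two_sided <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

print(ci)
print(c(t = t_stat, p = p_two_sided))
```

Because r is bounded above by 1, the upper limit sits much closer to r than the lower limit does.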


Pearson’s correlation coefficient is not robust

> y2 <- y
> y2[12] <- 30
> plot(x, y2, pch = 16, cex = 2)

[Figure: scatter plot of y2 (axis 100 to 400) against x (axis 18 to 24)]


> cor.test(x, y2)

        Pearson's product-moment correlation

data:  x and y2
t = 0.8235, df = 11, p-value = 0.4277
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3574666  0.6991377
sample estimates:
      cor
0.2409823
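To see how much a single observation moves r, the sketch below puts the original and modified correlations side by side and also drops observation 12 entirely. The leave-one-out comparison and the names `r_original`, `r_modified` and `r_without_12` are additions for illustration, not part of the slides.

```r
# Data from the slides
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

# Same modification as in the slides: change one value out of 13
y2 <- y
y2[12] <- 30

r_original <- cor(x, y)     # all 13 original points
r_modified <- cor(x, y2)    # observation 12 changed from 530 to 30

# Illustrative extra: correlation with observation 12 left out
r_without_12 <- cor(x[-12], y[-12])

print(c(original = r_original,
        modified = r_modified,
        without_12 = r_without_12))
```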


Correlation vs Causation

Possible explanations for the observed correlation:

• The lipid content of the membranes determines insulin sensitivity.

• The insulin sensitivity of the membranes somehow affects lipid content.

• Both insulin sensitivity and lipid content are under the control of some other factor, perhaps a hormone.

• Lipid content, insulin sensitivity, and other factors are all part of a complex molecular/biochemical/physiological network, perhaps with positive and/or negative feedback components. In this case, the observed correlation is just a peek at a much more complicated set of relationships.

• The two variables don’t correlate in the population at all, and the observed correlation in this sample was a coincidence.


Assumptions

• Random and independent data: (X_i, Y_i)

• X and Y are paired

• Both X and Y are stochastic, not experimentally controlled

• Normally distributed

• No outliers

• Linear relationship


R²

R² is the fraction of the variance shared between the two variables.

0.77² = 0.59

• 59% of the variability in insulin sensitivity is associated with variability in lipid content.

• 59% of the variability in lipid content is associated with variability in insulin sensitivity.

• Knowing the lipid content of the membranes lets you explain 59% of the variance in the insulin sensitivity.

• The remaining 41% of the variance is explained by other factors.
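The 59% figure can be checked two ways: as the squared correlation, and as one minus the ratio of residual to total sum of squares from a straight-line fit. The sketch below does both; `ss_total`, `ss_resid`, `r2_decomp` and `r2_cor` are illustrative names.

```r
# Data from the slides
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

fit <- lm(y ~ x)

# Total variability of y, and the part left over after the linear fit
ss_total <- sum((y - mean(y))^2)
ss_resid <- sum(residuals(fit)^2)

# R^2 as the explained fraction of variance
r2_decomp <- 1 - ss_resid / ss_total

# R^2 as the squared Pearson correlation
r2_cor <- cor(x, y)^2

print(c(decomposition = r2_decomp, squared_r = r2_cor))
```

With a single predictor the two calculations agree, which is why R² is symmetric in X and Y here.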


> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-102.96  -65.05   25.64   61.23  116.11

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -486.542    193.716  -2.512  0.02890 *
x             37.208      9.296   4.003  0.00208 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 75.9 on 11 degrees of freedom
Multiple R-squared:  0.5929,    Adjusted R-squared:  0.5559
F-statistic: 16.02 on 1 and 11 DF,  p-value: 0.002077
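The slope's t value and p-value in this summary match those reported by `cor.test(x, y)`. The following sketch makes that link explicit and also recovers the slope from r; the names `coefs`, `slope_t`, `slope_p` and `slope_from_r` are illustrative.

```r
# Data from the slides
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

fit <- lm(y ~ x)
coefs <- summary(fit)$coefficients

# t statistic and p-value for the slope
slope_t <- coefs["x", "t value"]
slope_p <- coefs["x", "Pr(>|t|)"]

# The same test, seen from the correlation side
ct <- cor.test(x, y)

# In simple linear regression the slope equals r * sd(y) / sd(x)
slope_from_r <- cor(x, y) * sd(y) / sd(x)

print(c(slope_t = slope_t, cor_t = unname(ct$statistic)))
print(c(slope = unname(coef(fit)["x"]), slope_from_r = slope_from_r))
```

In simple linear regression, testing whether the slope is zero and testing whether the correlation is zero are the same test.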


> fit <- lm(y ~ x)
> plot(x, y, pch = 16, cex = 2)
> abline(fit, lwd = 3)

[Figure: scatter plot of y against x with the fitted regression line]


> plot(fit, 1)

[Figure: Residuals vs Fitted plot; x-axis: Fitted values (200 to 400); y-axis: Residuals (-100 to 100); labeled points: 12, 11, 4]
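The points labeled in the residual plot are the observations with the largest absolute residuals. A short sketch, using the illustrative names `res` and `top3`, that finds them without the plot:

```r
# Data from the slides
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

fit <- lm(y ~ x)

# Residual = observed y minus fitted y
res <- residuals(fit)

# Indices of the three observations furthest from the fitted line
top3 <- order(abs(res), decreasing = TRUE)[1:3]

print(top3)
print(round(res[top3], 2))
```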


Things to Look Out For

• Look at the residuals.

• Make sure you have a well-defined response variable.

• Consider the use of weighted regression.

• Be mindful of spurious regression.

• Be cautious of extrapolating beyond your data.

• Distinguish statistical significance from scientific or practical significance.
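The extrapolation warning can be illustrated with the fitted line from the insulin data. The sketch below predicts at one x value inside the observed range (17.9 to 24.4) and one well outside it. The chosen values 20 and 10 are arbitrary, and the reading that a negative insulin sensitivity is not meaningful is an assumption, not something stated in the slides.

```r
# Data from the slides
y <- c(250, 220, 145, 115, 230, 200, 330, 400,
       370, 260, 270, 530, 375)
x <- c(17.9, 18.3, 18.3, 18.4, 18.4, 20.2, 20.3,
       21.8, 21.9, 22.1, 23.1, 24.2, 24.4)

fit <- lm(y ~ x)

# Hypothetical new x values: 20 lies inside the data, 10 far below it
new_x <- data.frame(x = c(20, 10))
pred <- predict(fit, newdata = new_x)

# Flag which predictions are extrapolations beyond the observed x range
outside_range <- new_x$x < min(x) | new_x$x > max(x)

print(data.frame(x = new_x$x, predicted_y = pred, extrapolated = outside_range))
```

The prediction at x = 10 comes out negative, which the straight-line model produces without complaint even though no data support the fit there.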
