Multiple Regression

SPH 247 Statistical Analysis of Laboratory Data

April 23, 2010

Cystic Fibrosis Data

Lung function data for cystic fibrosis patients (7-23 years old). All variables are numeric vectors.

age      Age in years.
sex      Sex code. 0: male, 1: female.
height   Height (cm).
weight   Weight (kg).
bmp      Body mass (% of normal).
fev1     Forced expiratory volume.
rv       Residual volume.
frc      Functional residual capacity.
tlc      Total lung capacity.
pemax    Maximum expiratory pressure.

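cf <- read.csv("cystfibr.csv")
pairs(cf)                           # scatterplot matrix of all variables
attach(cf)                          # make the columns visible by name
cf.lm <- lm(pemax ~ age+sex+height+weight+bmp+fev1+rv+frc+tlc)
print(summary(cf.lm))               # coefficient table and overall F-test
print(anova(cf.lm))                 # sequential ANOVA
print(drop1(cf.lm,test="F"))        # F-test for deleting each term in turn
plot(cf.lm)                         # diagnostic plots
step(cf.lm)                         # stepwise selection by AIC
detach(cf)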
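The script relies on attach() so that lm() can find the columns by name. A common alternative is to pass the data frame through the data argument of lm(). The following is a minimal sketch on a small simulated stand-in; the data frame `toy` and its values are hypothetical and are not the cystic fibrosis data.

```r
# Hypothetical stand-in data; these are NOT the cystfibr.csv values.
set.seed(1)
toy <- data.frame(weight = rnorm(25, 40, 10), fev1 = rnorm(25, 35, 8))
toy$pemax <- 60 + 1.5 * toy$weight + 1.0 * toy$fev1 + rnorm(25, sd = 20)

# Fit using the data argument instead of attach()/detach().
toy.lm <- lm(pemax ~ weight + fev1, data = toy)

# Same fit via attach(), in the style of the script above.
attach(toy)
toy.lm.attached <- lm(pemax ~ weight + fev1)
detach(toy)

# Both approaches give the same coefficients.
print(all.equal(coef(toy.lm), coef(toy.lm.attached)))
```

With data = the model stays tied to one data frame, so a stray variable in the workspace with the same name cannot be picked up by accident.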

> source("cystfibr.r")
> cf.lm <- lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc)
> print(summary(cf.lm))
…

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  176.0582   225.8912   0.779    0.448
age           -2.5420     4.8017  -0.529    0.604
sex           -3.7368    15.4598  -0.242    0.812
height        -0.4463     0.9034  -0.494    0.628
weight         2.9928     2.0080   1.490    0.157
bmp           -1.7449     1.1552  -1.510    0.152
fev1           1.0807     1.0809   1.000    0.333
rv             0.1970     0.1962   1.004    0.331
frc           -0.3084     0.4924  -0.626    0.540
tlc            0.1886     0.4997   0.377    0.711

Residual standard error: 25.47 on 15 degrees of freedom
Multiple R-Squared: 0.6373,  Adjusted R-squared: 0.4197
F-statistic: 2.929 on 9 and 15 DF,  p-value: 0.03195
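The last two lines of the summary can be recovered from the multiple R-squared and the degrees of freedom. The sketch below uses only numbers printed in the output above; the variable names are illustrative, and small differences from the printed values come from using the rounded R-squared.

```r
# Values copied from the summary output above.
n  <- 25       # observations: 15 residual df + 9 predictors + intercept
p  <- 9        # number of predictors
r2 <- 0.6373   # Multiple R-squared (rounded as printed)

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained variation per predictor over
# unexplained variation per residual degree of freedom.
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

# Upper-tail p-value of F on (p, n - p - 1) degrees of freedom.
p_val <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

print(round(c(adj_r2 = adj_r2, f_stat = f_stat, p_val = p_val), 4))
```

The large gap between R-squared (0.6373) and adjusted R-squared (0.4197) reflects fitting 9 predictors to only 25 observations.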

> print(anova(cf.lm))
Analysis of Variance Table

Response: pemax
          Df  Sum Sq Mean Sq F value   Pr(>F)
age        1 10098.5 10098.5 15.5661 0.001296 **
sex        1   955.4   955.4  1.4727 0.243680
height     1   155.0   155.0  0.2389 0.632089
weight     1   632.3   632.3  0.9747 0.339170
bmp        1  2862.2  2862.2  4.4119 0.053010 .
fev1       1  1549.1  1549.1  2.3878 0.143120
rv         1   561.9   561.9  0.8662 0.366757
frc        1   194.6   194.6  0.2999 0.592007
tlc        1    92.4    92.4  0.1424 0.711160
Residuals 15  9731.2   648.7
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

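anova() performs a sequential ANOVA: each term is tested after adjusting only for the terms listed before it, so the results depend on the order of the terms.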
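Because the ANOVA is sequential, the sum of squares credited to a predictor changes with its position in the formula when predictors are correlated. A minimal sketch on simulated data follows; the variables x1, x2, and y are hypothetical and unrelated to the cystic fibrosis data.

```r
set.seed(2)
n  <- 25
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)   # deliberately correlated with x1
y  <- 1 + x1 + x2 + rnorm(n)

a12 <- anova(lm(y ~ x1 + x2))   # x1 entered first
a21 <- anova(lm(y ~ x2 + x1))   # x2 entered first

# The sum of squares for x1 depends on whether x2 is already in the model.
print(c(x1_first = a12["x1", "Sum Sq"], x1_second = a21["x1", "Sum Sq"]))

# The residual sum of squares does not depend on the order.
print(all.equal(a12["Residuals", "Sum Sq"], a21["Residuals", "Sum Sq"]))
```

This is why age looks highly significant in the table above (it is entered first) even though its t-test in the full model is not.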

> print(drop1(cf.lm, test = "F"))
Single term deletions

Model:
pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc
       Df Sum of Sq     RSS   AIC F value  Pr(F)
<none>               9731.2 169.1
age     1     181.8  9913.1 167.6  0.2803 0.6043
sex     1      37.9  9769.2 167.2  0.0584 0.8123
height  1     158.3  9889.6 167.5  0.2440 0.6285
weight  1    1441.2 11172.5 170.6  2.2215 0.1568
bmp     1    1480.1 11211.4 170.6  2.2815 0.1517
fev1    1     648.4 10379.7 168.7  0.9995 0.3333
rv      1     653.8 10385.0 168.7  1.0077 0.3314
frc     1     254.6  9985.8 167.8  0.3924 0.5405
tlc     1      92.4  9823.7 167.3  0.1424 0.7112

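drop1() performs a Type III ANOVA: each term is tested after adjusting for all the other terms in the model, so the order of the terms does not matter.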
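For single-degree-of-freedom terms, the F value from drop1() is the square of the t value in the coefficient table (for example, weight: 1.490 squared is about 2.22). A minimal sketch on simulated data, where the data frame `toy` and its columns are hypothetical:

```r
set.seed(3)
toy <- data.frame(x1 = rnorm(25), x2 = rnorm(25), x3 = rnorm(25))
toy$y <- 2 + toy$x1 - 0.5 * toy$x2 + rnorm(25)

fit <- lm(y ~ x1 + x2 + x3, data = toy)

# t statistics from the coefficient table, without the intercept row.
t_vals <- summary(fit)$coefficients[-1, "t value"]

# F statistics from single term deletions, without the <none> row.
f_vals <- drop1(fit, test = "F")[-1, "F value"]

print(cbind(t_squared = t_vals^2, drop1_F = f_vals))
```

The two columns agree, which is why the p-values in the drop1() table match those in the summary() output.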

[Figure: Residuals vs Fitted, for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x-axis: Fitted values; y-axis: Residuals. Labeled observations: 21, 24, 16.]

[Figure: Normal Q-Q, for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x-axis: Theoretical Quantiles; y-axis: Standardized residuals. Labeled observations: 24, 14, 16.]

[Figure: Scale-Location, for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x-axis: Fitted values; y-axis: Standardized residuals. Labeled observations: 24, 14, 16.]

[Figure: Residuals vs Leverage, for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x-axis: Leverage; y-axis: Standardized residuals; Cook's distance contours at 0.5. Labeled observations: 14, 24, 16.]
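The quantities behind the four diagnostic plots can also be extracted directly from a fitted model, which helps when checking a labeled observation by number. This is a sketch on simulated data; the data frame `toy` and the column names of `diag_tab` are hypothetical.

```r
set.seed(4)
toy <- data.frame(x1 = rnorm(25), x2 = rnorm(25))
toy$y <- 1 + toy$x1 + rnorm(25)
fit <- lm(y ~ x1 + x2, data = toy)

diag_tab <- data.frame(
  fitted   = fitted(fit),                 # x-axis: Residuals vs Fitted, Scale-Location
  resid    = resid(fit),                  # y-axis: Residuals vs Fitted
  rstd     = rstandard(fit),              # y-axis: Normal Q-Q, Residuals vs Leverage
  sqrt_abs = sqrt(abs(rstandard(fit))),   # square root of |standardized residual|
  leverage = hatvalues(fit),              # x-axis: Residuals vs Leverage
  cook     = cooks.distance(fit)          # Cook's distance for each observation
)

# The three observations with the largest Cook's distance.
print(head(diag_tab[order(-diag_tab$cook), ], 3))
```

The leverages sum to the number of estimated coefficients, so with 10 coefficients and 25 observations, as in the cystic fibrosis model, the average leverage is 0.4.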

> step(cf.lm)
Start:  AIC=169.11
pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc

         Df Sum of Sq     RSS   AIC
- sex     1      37.9  9769.2 167.2
- tlc     1      92.4  9823.7 167.3
- height  1     158.3  9889.6 167.5
- age     1     181.8  9913.1 167.6
- frc     1     254.6  9985.8 167.8
- fev1    1     648.4 10379.7 168.7
- rv      1     653.8 10385.0 168.7
<none>                 9731.2 169.1
- weight  1    1441.2 11172.5 170.6
- bmp     1    1480.1 11211.4 170.6

Step:  AIC=167.2
pemax ~ age + height + weight + bmp + fev1 + rv + frc + tlc

……………

Step:  AIC=160.66
pemax ~ weight + bmp + fev1 + rv

         Df Sum of Sq     RSS   AIC
<none>                10354.6 160.7
- rv      1    1183.6 11538.2 161.4
- bmp     1    3072.6 13427.2 165.2
- fev1    1    3717.1 14071.7 166.3
- weight  1   10930.2 21284.8 176.7

Call:
lm(formula = pemax ~ weight + bmp + fev1 + rv)

Coefficients:
(Intercept)       weight          bmp         fev1           rv
    63.9467       1.7489      -1.3772       1.5477       0.1257
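The AIC values that step() prints for a linear model can be checked by hand from the RSS column. Up to an additive constant, the value is n log(RSS/n) plus twice the number of estimated coefficients. The sketch below uses only RSS values from the output above; the helper name `step_aic` is hypothetical.

```r
# AIC on the scale used by step() for linear models:
# n * log(RSS / n) + 2 * (number of estimated coefficients).
step_aic <- function(rss, n, n_coef) n * log(rss / n) + 2 * n_coef

n <- 25
aic_full  <- step_aic(rss = 9731.2,  n = n, n_coef = 10)  # 9 predictors + intercept
aic_final <- step_aic(rss = 10354.6, n = n, n_coef = 5)   # 4 predictors + intercept

print(round(c(full = aic_full, final = aic_final), 2))
```

These should reproduce the 169.11 and 160.66 reported at the start and end of the search. Dropping five predictors raises the RSS only slightly, so the penalty saved (2 per coefficient) more than pays for the loss of fit.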

> cf.lm2 <- lm(pemax ~ rv+bmp+fev1+weight)
> summary(cf.lm2)

Call:
lm(formula = pemax ~ rv + bmp + fev1 + weight)

Residuals:
   Min     1Q Median     3Q    Max
-39.77 -11.74   4.33  15.66  35.07

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.94669   53.27673   1.200 0.244057
rv           0.12572    0.08315   1.512 0.146178
bmp         -1.37724    0.56534  -2.436 0.024322 *
fev1         1.54770    0.57761   2.679 0.014410 *
weight       1.74891    0.38063   4.595 0.000175 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.75 on 20 degrees of freedom
Multiple R-Squared: 0.6141,  Adjusted R-squared: 0.5369
F-statistic: 7.957 on 4 and 20 DF,  p-value: 0.000523

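Cautionary Notes

  • The significance levels are not necessarily believable after variable selection.
  • The original full-model F-statistic is significant, indicating that there is some relationship between pemax and the predictors: F(9,15) = 2.93, p = 0.0320.
  • After variable selection, a three-predictor model gives F(3,21) = 9.28, p = 0.0004 (the four-predictor model chosen by step() gives F(4,20) = 7.96, p = 0.0005). These p-values are biased toward significance because the same data were used to choose the predictors.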

set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    0.91
       Model |  12.3235639     9  1.36928488           Prob > F      =  0.5397
    Residual |  22.5105993    15  1.50070662           R-squared     =  0.3538
-------------+------------------------------           Adj R-squared = -0.0340
       Total |  34.8341632    24  1.45142347           Root MSE      =   1.225

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.0441858   .2998066    -0.15   0.885    -.6832085     .594837
          x2 |  -.9078136   .4347798    -2.09   0.054    -1.834525    .0188976
          x3 |   .2076754   .3789522     0.55   0.592    -.6000421    1.015393
          x4 |  -.0056383   .3319125    -0.02   0.987    -.7130931    .7018166
          x5 |   -.330546   .3854497    -0.86   0.405    -1.152113    .4910207
          x6 |   .0202964   .3470704     0.06   0.954    -.7194666    .7600594
          x7 |   -.073401   .3135234    -0.23   0.818    -.7416603    .5948583
          x8 |  -.0552909   .3026913    -0.18   0.858    -.7004621    .5898803
          x9 |  -.3190092   .3137931    -1.02   0.325    -.9878434     .349825
       _cons |  -.2490392   .3078424    -0.81   0.431    -.9051898    .4071113
------------------------------------------------------------------------------

. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p = 0.9867 >= 0.1000  removing x4
p = 0.9545 >= 0.1000  removing x6
p = 0.8456 >= 0.1000  removing x1
p = 0.8165 >= 0.1000  removing x7
p = 0.7506 >= 0.1000  removing x8
p = 0.5023 >= 0.1000  removing x3
p = 0.2866 >= 0.1000  removing x5
p = 0.2081 >= 0.1000  removing x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =    7.23
       Model |  8.33379862     1  8.33379862           Prob > F      =  0.0131
    Residual |  26.5003646    23  1.15218977           R-squared     =  0.2392
-------------+------------------------------           Adj R-squared =  0.2062
       Total |  34.8341632    24  1.45142347           Root MSE      =  1.0734

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.6644002   .2470417    -2.69   0.013    -1.175445   -.1533555
       _cons |  -.1523124    .214703    -0.71   0.485    -.5964594    .2918346
------------------------------------------------------------------------------
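The Stata run above is a single draw: nine pure-noise predictors, a non-significant full model (p = 0.5397), and a "significant" model (p = 0.0131) after backward elimination. The experiment can be repeated many times to see how often this happens. The following is a rough R port under stated assumptions: the function names `make_noise` and `backward_p`, the seed, and the 200 repetitions are all hypothetical choices, and the elimination rule is a simple imitation of pr(.1), not Stata's own implementation.

```r
set.seed(247)

# One pure-noise data set: 25 observations, 9 predictors, unrelated response.
make_noise <- function(n = 25, k = 9) {
  dat <- as.data.frame(matrix(rnorm(n * k), n, k))
  names(dat) <- paste0("x", 1:k)
  dat$y <- rnorm(n)
  dat
}

# Backward elimination: remove the least significant predictor while its
# p-value is >= p_remove. Returns the overall F-test p-value of the final
# model, or NA if no predictor survives.
backward_p <- function(dat, p_remove = 0.1) {
  vars <- setdiff(names(dat), "y")
  while (length(vars) > 0) {
    fit <- lm(reformulate(vars, response = "y"), data = dat)
    pv  <- summary(fit)$coefficients[-1, 4]   # predictor p-values, in the order of vars
    if (max(pv) < p_remove) {
      f <- summary(fit)$fstatistic
      return(unname(pf(f[1], f[2], f[3], lower.tail = FALSE)))
    }
    vars <- vars[-which.max(pv)]
  }
  NA_real_
}

final_p <- replicate(200, backward_p(make_noise()))

# Share of pure-noise data sets whose selected model looks "significant".
print(mean(!is.na(final_p) & final_p < 0.05))
```

If variable selection had no effect on the final F-test, this share would be close to 0.05; a much larger share illustrates why post-selection p-values are not believable.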