Multiple and complex regression

Extensions of simple linear regression

• Multiple regression models: predictor variables are continuous

• Analysis of variance: predictor variables are categorical (grouping variables)

• But general linear models can include both continuous and categorical predictors
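As a minimal illustration of the last point, the sketch below builds a design matrix that mixes a continuous predictor with a categorical one coded as a 0/1 dummy variable. It uses Python with NumPy (an assumption; the slides show SPSS and R output), and the three sites and their values are hypothetical.

```python
import numpy as np

# Hypothetical sites: mean annual precipitation (mm) is continuous,
# biome is categorical with two levels.
map_mm = np.array([350.0, 600.0, 900.0])
biome = ["shrubland", "grassland", "grassland"]

# Treatment (dummy) coding: shrubland is the reference level (0) and
# grassland gets an indicator column equal to 1.
is_grassland = np.array([1.0 if b == "grassland" else 0.0 for b in biome])

# Design matrix: intercept column, continuous predictor, dummy variable.
X = np.column_stack([np.ones_like(map_mm), map_mm, is_grassland])
print(X)
```

Treatment coding with a single reference level is only one choice; other contrasts (for example sum-to-zero coding) give the same fitted values but change how the coefficients are interpreted.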

Relative abundance of C3 and C4 plants

• Paruelo & Lauenroth (1996)

• Geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFTs): shrubs, forbs, succulents, C3 grasses and C4 grasses.

Data

73 sites across temperate central North America

Response variable:

• Relative abundance of PFTs (based on cover, biomass, and primary production) for each site

Predictor variables:

• Longitude (LONG)
• Latitude (LAT)
• Mean annual temperature (MAT)
• Mean annual precipitation (MAP)
• Winter (%) precipitation (DJFMAP)
• Summer (%) precipitation (JJAMAP)
• Biomes (grassland, shrubland)

Relative abundance was transformed as ln(dat + 1) because it is positively skewed.

[Figure: frequency histograms of C3 relative abundance, untransformed (C3) and transformed (log_10_C3, log_C3, SQRT_C3)]
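To check which transformation best reduces positive skew, one can compare sample skewness before and after transforming. The sketch below does this in Python with NumPy on hypothetical right-skewed proportions drawn from a Beta(0.5, 3) distribution, standing in for the C3 data, which are not reproduced here. `sample_skewness` is a hypothetical helper.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (hypothetical helper)."""
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    return m3 / m2 ** 1.5

# Hypothetical right-skewed proportions in [0, 1], not the real C3 data.
rng = np.random.default_rng(0)
c3 = rng.beta(0.5, 3.0, size=500)

transforms = {
    "raw": c3,
    "ln(x + 1)": np.log1p(c3),
    "sqrt(x)": np.sqrt(c3),
}
skews = {name: sample_skewness(v) for name, v in transforms.items()}
for name, g1 in skews.items():
    print(f"{name:10s} skewness = {g1:.2f}")
```

Both transformations are concave, so they pull in the long right tail; which one gets closest to symmetry depends on the data, which is why the histograms compare several candidates.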

Collinearity

• Causes computational problems: it makes the determinant of the X′X matrix of predictor variables close to zero, and matrix inversion essentially involves dividing by that determinant, so the result is very sensitive to small differences in the numbers

• Standard errors of the estimated regression slopes are inflated
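Both bullets can be seen in a small worked case. For two standardized (centered and scaled) predictors with correlation r, X′X/n is the correlation matrix [[1, r], [r, 1]]; its determinant is 1 − r², and each diagonal element of its inverse, 1/(1 − r²), is the factor by which a slope's sampling variance grows relative to uncorrelated predictors. The Python/NumPy sketch below simply evaluates these quantities for a few assumed values of r.

```python
import numpy as np

# For two standardized predictors with correlation r, X'X/n equals R below.
# det(R) = 1 - r**2; the diagonal of R^-1 is 1 / (1 - r**2), the variance
# inflation of each slope compared with uncorrelated predictors.
results = {}
for r in (0.0, 0.5, 0.9, 0.99):  # assumed correlations, for illustration
    R = np.array([[1.0, r], [r, 1.0]])
    det = np.linalg.det(R)
    var_inflation = np.linalg.inv(R)[0, 0]
    se_inflation = np.sqrt(var_inflation)  # standard errors scale by the square root
    results[r] = (det, var_inflation, se_inflation)
    print(f"r = {r:4.2f}  det = {det:.4f}  variance x{var_inflation:7.2f}  SE x{se_inflation:5.2f}")
```

At r = 0.99 the determinant is about 0.02 and each slope's standard error is roughly seven times larger than it would be with uncorrelated predictors.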

Detecting collinearity

• Check tolerance values (tolerance = 1 − r² from regressing each predictor on all the other predictors; the variance inflation factor, VIF, is 1/tolerance)

• Plot the variables

• Examine a matrix of correlation coefficients between predictor variables
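A minimal sketch of the tolerance check in Python with NumPy (an assumption; SPSS reports these statistics directly, as in the coefficient tables later in the deck). `tolerance_and_vif` is a hypothetical helper, and the predictors are simulated so that x3 is almost exactly x1 + x2.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, VIF) for each column of the predictor matrix X.

    Tolerance_j = 1 - R_j**2, where R_j**2 comes from an OLS regression
    (with intercept) of column j on all other columns; VIF_j = 1 / tolerance_j.
    """
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        tol[j] = 1.0 - r2
    return tol, 1.0 / tol

# Hypothetical predictors: x3 is nearly a linear combination of x1 and x2.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + x2 + rng.normal(scale=0.1, size=100)
tol, vif = tolerance_and_vif(np.column_stack([x1, x2, x3]))
print("tolerance:", np.round(tol, 3))
print("VIF:      ", np.round(vif, 1))
```

A VIF above about 10 (tolerance below 0.1) is a commonly quoted warning sign, though the cut-off is only a rule of thumb.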

Dealing with collinearity

• Omit predictor variables if they are highly correlated with other predictor variables that remain in the model

Correlations

[Figure: scatterplot matrix of LAT, LONG, MAP, MAT, JJAMAP and DJFMAP]

Correlations

Pearson correlation (N = 73 for every pair)

          LAT       LONG      MAP       MAT       JJAMAP    DJFMAP
LAT       1         .097      -.247*    -.839**   .074      -.065
LONG      .097      1         -.734**   -.213     -.492**   .771**
MAP       -.247*    -.734**   1         .355**    .112      -.405**
MAT       -.839**   -.213     .355**    1         -.081     .001
JJAMAP    .074      -.492**   .112      -.081     1         -.792**
DJFMAP    -.065     .771**    -.405**   .001      -.792**   1

Sig. (2-tailed)

          LAT       LONG      MAP       MAT       JJAMAP    DJFMAP
LAT       .         .416      .036      .000      .533      .584
LONG      .416      .         .000      .070      .000      .000
MAP       .036      .000      .         .002      .344      .000
MAT       .000      .070      .002      .         .497      .990
JJAMAP    .533      .000      .344      .497      .         .000
DJFMAP    .584      .000      .000      .990      .000      .

* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
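The correlation table can be screened mechanically for candidate pairs to drop. The Python sketch below transcribes the Pearson coefficients and lists pairs with |r| ≥ 0.7; that threshold is an assumption for illustration, not a value given in the slides.

```python
from itertools import combinations

names = ["LAT", "LONG", "MAP", "MAT", "JJAMAP", "DJFMAP"]
# Pearson correlations transcribed from the correlation table (N = 73).
r = [
    [1.000, 0.097, -0.247, -0.839, 0.074, -0.065],
    [0.097, 1.000, -0.734, -0.213, -0.492, 0.771],
    [-0.247, -0.734, 1.000, 0.355, 0.112, -0.405],
    [-0.839, -0.213, 0.355, 1.000, -0.081, 0.001],
    [0.074, -0.492, 0.112, -0.081, 1.000, -0.792],
    [-0.065, 0.771, -0.405, 0.001, -0.792, 1.000],
]

THRESHOLD = 0.7  # assumed cut-off for "highly correlated"; not from the slides

# Check each pair once (upper triangle of the symmetric matrix).
flagged = [
    (names[i], names[j], r[i][j])
    for i, j in combinations(range(len(names)), 2)
    if abs(r[i][j]) >= THRESHOLD
]
for a, b, value in sorted(flagged, key=lambda t: -abs(t[2])):
    print(f"{a:7s} {b:7s} r = {value:+.3f}")
```

Which member of a flagged pair to omit is a judgment call; the screen only shows where the problem lies.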

Coefficients (dependent variable: LC3)

(lnC3) = β0 + β1(lat) + β2(long) + β3(lat × long)

B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.

Uncentered lat and long (LOXLA = LONG × LAT):

Predictor   B        Std. Error   Beta     t         Sig.   Tolerance   VIF
(Constant)  7.391    3.625                 2.039     .045
LAT         -.191    .091         -3.095   -2.101    .039   .003        307.745
LONG        -.093    .035         -1.824   -2.659    .010   .015        66.784
LOXLA       .002     .001         4.323    2.572     .012   .002        400.939

After centering both lat and long (LONRE and LATRE are the centered longitude and latitude; RELALO is their product):

Predictor   B        Std. Error   Beta     t         Sig.   Tolerance   VIF
(Constant)  -.553    .027                  -20.131   .000
LONRE       -.003    .004         -.051    -.597     .552   .980        1.020
LATRE       .048     .006         .783     8.484     .000   .827        1.209
RELALO      .002     .001         .238     2.572     .012   .820        1.220
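The collapse in VIF after centering is a general property of models with a product term, not something specific to these data. The Python/NumPy sketch below reproduces the effect on hypothetical, independently drawn coordinates spanning latitude 30 to 50 and longitude 95 to 115; `vif` is a hypothetical helper, and the numbers will not match the SPSS tables.

```python
import numpy as np

def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the others (with intercept)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical site coordinates, not the study's data.
rng = np.random.default_rng(2)
lat = rng.uniform(30, 50, size=73)
lon = rng.uniform(95, 115, size=73)

# Raw main effects plus their product term.
raw = np.column_stack([lat, lon, lat * lon])
# Centered main effects plus the product of the centered variables.
c_lat, c_lon = lat - lat.mean(), lon - lon.mean()
centered = np.column_stack([c_lat, c_lon, c_lat * c_lon])

vif_raw = vif(raw)
vif_centered = vif(centered)
print("VIF, raw      (LAT, LONG, LAT x LONG):", np.round(vif_raw, 1))
print("VIF, centered (LAT, LONG, LAT x LONG):", np.round(vif_centered, 2))
```

Centering changes the meaning of the main-effect coefficients (slopes at the mean of the other variable rather than at zero) but leaves the interaction estimate and its t value unchanged, as the two SPSS tables show (t = 2.572 in both).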

Analysis of variance

Source of variation   SS                 df          MS
Regression            Σ(ŷ − ȳ)²          p           Σ(ŷ − ȳ)² / p
Residual              Σ(y_obs − ŷ)²      n − p − 1   Σ(y_obs − ŷ)² / (n − p − 1)
Total                 Σ(y_obs − ȳ)²      n − 1

(ŷ = fitted value, ȳ = mean of the observed y, p = number of predictors, n = number of observations)
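The sums of squares in this table and the F ratio (MS_Regression / MS_Residual, on p and n − p − 1 df) can be computed directly from an OLS fit. Below is a Python/NumPy sketch on hypothetical data with p = 2 predictors; for OLS with an intercept, SS_Regression + SS_Residual equals SS_Total.

```python
import numpy as np

# Hypothetical data: n = 50 observations, p = 2 predictors.
rng = np.random.default_rng(3)
n, p = 50, 2
X = rng.normal(size=(n, p))
y = 1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.7, size=n)

A = np.column_stack([np.ones(n), X])       # design matrix with intercept
b, *_ = np.linalg.lstsq(A, y, rcond=None)   # OLS estimates
y_hat = A @ b
y_bar = y.mean()

ss_reg = np.sum((y_hat - y_bar) ** 2)       # df = p
ss_res = np.sum((y - y_hat) ** 2)           # df = n - p - 1
ss_tot = np.sum((y - y_bar) ** 2)           # df = n - 1

ms_reg = ss_reg / p
ms_res = ss_res / (n - p - 1)
F = ms_reg / ms_res
print(f"SS_reg={ss_reg:.3f}  SS_res={ss_res:.3f}  SS_tot={ss_tot:.3f}  F={F:.2f}")
```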

Matrix algebra approach to OLS estimation of multiple regression models

• Y = Xβ + ε

• X′Xb = X′Y (the normal equations)

• b = (X′X)⁻¹X′Y
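A Python/NumPy sketch of the normal equations on hypothetical data (the first column of X is the intercept). It solves X′Xb = X′Y with an explicit inverse, as written above, and with two routines that avoid inverting X′X.

```python
import numpy as np

# Hypothetical data: n = 30 observations, two predictors plus an intercept.
rng = np.random.default_rng(4)
n = 30
predictors = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), predictors])   # first column = intercept
beta_true = np.array([2.0, 1.5, -0.75])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Textbook form: b = (X'X)^-1 X'Y.
b_normal = np.linalg.inv(X.T @ X) @ (X.T @ Y)
# Same normal equations, solved without forming the inverse explicitly.
b_solve = np.linalg.solve(X.T @ X, X.T @ Y)
# SVD-based least squares on X itself, which never forms X'X.
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.round(b_normal, 4))
```

When predictors are nearly collinear, X′X is close to singular and the explicit-inverse route is the least numerically stable of the three, which is the computational problem described under Collinearity.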

Criteria for the “best”-fitting model in multiple regression with p predictors

r²: $$r^2 = \frac{SS_{\mathrm{Regression}}}{SS_{\mathrm{Total}}} = 1 - \frac{SS_{\mathrm{Residual}}}{SS_{\mathrm{Total}}}$$

Adjusted r²: $$r^2_{\mathrm{adj}} = 1 - \left(1 - r^2\right)\frac{n - 1}{n - p - 1}$$

Akaike Information Criterion (AIC): $$\mathrm{AIC} = n\ln\left(\frac{SS_{\mathrm{Residual}}}{n}\right) + 2(p + 1)$$

Akaike Information Criterion (AIC), second form: [formula not recoverable from the slide]
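These criteria are simple functions of sums of squares. The Python sketch below codes them and, as a check, feeds the RSS values printed by R's step() at the end of the deck (n = 73) into the AIC formula; the results agree with the printed AIC values to two decimals. The helper names are hypothetical.

```python
import math

def r2_from_ss(ss_residual, ss_total):
    return 1.0 - ss_residual / ss_total

def adjusted_r2(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def aic(ss_residual, n, p):
    # n * ln(SS_Residual / n) + 2(p + 1): p predictors plus the intercept.
    return n * math.log(ss_residual / n) + 2 * (p + 1)

n = 73
# RSS values from the step() output (response SQRT_C3).
backward_aic = aic(2.8279, n, p=3)   # LAT + JJAMAP + DJFMAP
forward_aic = aic(2.7759, n, p=4)    # LAT + MAP + JJAMAP + DJFMAP
print(round(backward_aic, 2), round(forward_aic, 2))

# Adjusted r2 for the Lat-only model in the hierarchical-partitioning table (r2 = 0.47).
print(round(adjusted_r2(0.47, n, p=1), 2))
```

The penalty 2(p + 1) is why adding a term must reduce n ln(SS_Residual/n) by more than 2 before AIC improves.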

Hierarchical partitioning and model selection

No. of predictors   Model                    r²       Adj. r²   P        AIC (R)
1                   Lon                      0.0006   -0.013    0.84     30.15
1                   Lat                      0.47     0.46      <0.001   -16.16
2                   Lon + Lat                0.48     0.46      <0.001   -15.25
3                   Lon + Lat + Lon × Lat    0.54     0.52      <0.001   -22.55
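Hierarchical partitioning splits a model's r² into independent contributions by averaging each predictor's r² increment over the models it can be added to, level by level. A minimal Python sketch for the two main effects in this table (the interaction model is set aside); `independent_effects` is a hypothetical helper, and with only two predictors the averaging reduces to the mean of two increments.

```python
from itertools import combinations
from statistics import mean

def independent_effects(r2, predictors):
    """Hierarchical-partitioning independent effect of each predictor.

    r2 maps a frozenset of predictors to that model's r2 (empty set -> 0).
    For each predictor, average its r2 increment over all models of the
    same size that exclude it, then average those level means.
    """
    effects = {}
    for x in predictors:
        others = [q for q in predictors if q != x]
        level_means = []
        for k in range(len(others) + 1):
            gains = [
                r2[frozenset(s) | {x}] - r2[frozenset(s)]
                for s in combinations(others, k)
            ]
            level_means.append(mean(gains))
        effects[x] = mean(level_means)
    return effects

# Main-effect r2 values from the table (interaction model left out).
r2 = {
    frozenset(): 0.0,
    frozenset({"Lon"}): 0.0006,
    frozenset({"Lat"}): 0.47,
    frozenset({"Lon", "Lat"}): 0.48,
}
effects = independent_effects(r2, ["Lon", "Lat"])
print({k: round(v, 4) for k, v in effects.items()})
```

Lat's independent contribution is (0.47 + 0.4794)/2 = 0.4747 and Lon's is (0.0006 + 0.01)/2 = 0.0053; together they sum to the r² of 0.48 for Lon + Lat.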

[Figure: fitted C3 relative abundance against longitude and latitude, model Lat + Long (R² = 0.48)]

[Figure: perspective plots of fitted values Y_hats.long.lat and Y_hats.long.xlat against centered longitude (cLONG) and centered latitude (cLAT), each viewed from two angles]

[Figure: C3 grasses in North America. Relative abundance against longitude for the model Lat * Long, drawn at latitude 35 and latitude 45]

The final model from forward selection is:

Step:  AIC=-228.67
SQRT_C3 ~ LAT + MAP + JJAMAP + DJFMAP

         Df  Sum of Sq   RSS      AIC
<none>                   2.7759   -228.67
+ LONG    1  0.0209705   2.7549   -227.23
+ MAT     1  0.0001829   2.7757   -226.68

Call:
lm(formula = SQRT_C3 ~ LAT + MAP + JJAMAP + DJFMAP)

Coefficients:
(Intercept)         LAT         MAP      JJAMAP      DJFMAP
 -0.7892663   0.0391180   0.0001538  -0.8573419  -0.7503936
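Forward selection adds, at each step, the candidate term that lowers AIC most and stops when no addition lowers it, which is why `<none>` has the lowest AIC in the final step table. The Python/NumPy sketch below implements that greedy rule on hypothetical data; it is not R's step() implementation, although it uses the same AIC form, n ln(RSS/n) + 2 × (number of coefficients). `rss`, `aic` and `forward_select` are hypothetical helpers.

```python
import numpy as np

def rss(x_cols, y):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + x_cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def aic(rss_value, n, n_predictors):
    # n * ln(RSS / n) + 2 * (predictors + intercept)
    return n * np.log(rss_value / n) + 2 * (n_predictors + 1)

def forward_select(data, y):
    """Greedy forward selection: add the term that lowers AIC most; stop when none does."""
    n = len(y)
    chosen, remaining = [], list(data)
    current = aic(rss([], y), n, 0)          # intercept-only model
    while remaining:
        scores = {
            name: aic(rss([data[c] for c in chosen + [name]], y), n, len(chosen) + 1)
            for name in remaining
        }
        best = min(scores, key=scores.get)
        if scores[best] >= current:          # no addition improves AIC
            break
        chosen.append(best)
        remaining.remove(best)
        current = scores[best]
    return chosen, current

# Hypothetical data: y depends on x1 and x2; x3 is pure noise.
rng = np.random.default_rng(5)
n = 73
data = {name: rng.normal(size=n) for name in ("x1", "x2", "x3")}
y = 0.5 + 1.0 * data["x1"] - 0.6 * data["x2"] + rng.normal(scale=0.5, size=n)
chosen, final_aic = forward_select(data, y)
print(chosen, round(final_aic, 2))
```

A pure-noise term can still slip in when it happens to reduce RSS by enough to beat the penalty of 2, one reason forward and backward selection need not agree, as in this deck.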

The final backward selection model is:

Step:  AIC=-229.32
SQRT_C3 ~ LAT + JJAMAP + DJFMAP

          Df  Sum of Sq   RSS      AIC
<none>                    2.8279   -229.32
- DJFMAP   1  0.26190     3.0898   -224.85
- JJAMAP   1  0.31489     3.1428   -223.61
- LAT      1  2.82772     5.6556   -180.72

Call:
lm(formula = SQRT_C3 ~ LAT + JJAMAP + DJFMAP)

Coefficients:
(Intercept)        LAT     JJAMAP     DJFMAP
   -0.53148    0.03748   -1.02823   -1.05164
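The backward-step table can be checked by hand: each `- term` row is the model refitted without that term, its Sum of Sq is the resulting increase in RSS, and elimination stops because every deletion raises AIC above the `<none>` value. The Python sketch below recomputes those columns from the printed RSS values (n = 73), using the AIC form n ln(RSS/n) + 2 × (number of coefficients).

```python
import math

n = 73
full_terms = 3        # SQRT_C3 ~ LAT + JJAMAP + DJFMAP
full_rss = 2.8279
# RSS after dropping each term, copied from the step() table.
drop_rss = {"DJFMAP": 3.0898, "JJAMAP": 3.1428, "LAT": 5.6556}

def aic(rss, n_predictors):
    """n * ln(RSS / n) + 2 * (number of coefficients, intercept included)."""
    return n * math.log(rss / n) + 2 * (n_predictors + 1)

aic_none = aic(full_rss, full_terms)
aic_drop = {}
for term, rss in drop_rss.items():
    aic_drop[term] = aic(rss, full_terms - 1)
    sum_of_sq = rss - full_rss  # increase in RSS when the term is removed
    print(f"- {term:7s} Sum of Sq = {sum_of_sq:.5f}  AIC = {aic_drop[term]:.2f}")

# Backward elimination stops when no single deletion lowers AIC.
stop = all(value > aic_none for value in aic_drop.values())
print(f"<none> AIC = {aic_none:.2f}; stop = {stop}")
```

Small differences in the Sum of Sq column (for example 0.31490 versus the printed 0.31489) come from the RSS values being rounded in the printout.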