
Page 1: Regression analysis

Regression analysis

Linear regression
Logistic regression

Page 2: Regression analysis


Relationship and association

Page 3: Regression analysis

Straight line

[Figure: scatter plot of BMI against hip circumference (cm) with the fitted line BMI = 1000 - 0.0008·X; the annotation marks a 1 cm step along the hip axis against the corresponding -0.0008 step in BMI.]

Y = b_0 + b_1·X

b_1 = (Y_2 - Y_1) / (X_2 - X_1)   (the slope)

b_0 = intersection with the Y axis   (the intercept)

BMI = b_0 + b_1·HIP
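
To make the slope and intercept concrete, here is a minimal Python sketch that recovers b_1 and b_0 from two points on a line; the point values are invented for illustration:

    # Slope and intercept of the line through two points (invented values).
    x1, y1 = 95.0, 24.0    # hypothetical (hip, BMI) point
    x2, y2 = 105.0, 27.5   # a hypothetical second point

    b1 = (y2 - y1) / (x2 - x1)   # slope: change in Y per unit of X
    b0 = y1 - b1 * x1            # intercept: value of Y at X = 0

    print(f"Y = {b0:.2f} + {b1:.2f} * X")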

Page 4: Regression analysis


Best straight line?

Page 5: Regression analysis

Best straight line!

[Figure: scatter plot with a candidate line; the vertical distance from a data point such as (X_1, Y_1) to the line is its residual.]

e_1 = Y_1 - Ŷ_1

The best line is the one that minimizes the squared residuals summed over all N points:

Σ_{i=1}^{N} e_i² = Σ_{i=1}^{N} (Y_i - Ŷ_i)²

Least square estimation
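
Outside SPSS, least squares estimation is a one-liner; a minimal numpy sketch, with hypothetical hip/BMI arrays standing in for the slide's data:

    import numpy as np

    hip = np.array([94.0, 97.0, 99.0, 102.0, 106.0])   # hypothetical X
    bmi = np.array([19.5, 21.0, 22.8, 23.1, 25.4])     # hypothetical Y

    # np.polyfit(..., deg=1) minimizes the sum of squared residuals.
    b1, b0 = np.polyfit(hip, bmi, deg=1)

    residuals = bmi - (b0 + b1 * hip)
    print("b0 =", b0, "b1 =", b1)
    print("sum of squared residuals:", np.sum(residuals ** 2))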

Page 6: Regression analysis

Simple linear regression

1. Is the association linear?

[Figure: scatter plot used to judge whether the association is linear.]

Page 7: Regression analysis

Simple linear regression

1. Is the association linear?
2. Describe the association: what are b_0 and b_1?

BMI = -12.6 kg/m² + 0.35 kg/m³·Hip

b_1 = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)²

X̄ = ΣX_i / n

b_0 = Ȳ - b_1·X̄
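
The same estimates follow directly from the formulas above; a sketch reusing the hypothetical arrays from the previous example:

    import numpy as np

    hip = np.array([94.0, 97.0, 99.0, 102.0, 106.0])   # hypothetical X
    bmi = np.array([19.5, 21.0, 22.8, 23.1, 25.4])     # hypothetical Y

    x_bar = hip.sum() / len(hip)   # X̄ = ΣX_i / n
    y_bar = bmi.mean()

    # b1 = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)²
    b1 = np.sum((hip - x_bar) * (bmi - y_bar)) / np.sum((hip - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar        # b0 = Ȳ - b1·X̄

    print(f"BMI = {b0:.2f} + {b1:.2f} * Hip")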

Page 8: Regression analysis

Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0? Help SPSS!!!

Coefficients(a)

                  Unstandardized Coefficients   Standardized Coefficients
Model             B          Std. Error         Beta                        t        Sig.
1   (Constant)    -12.581    2.331                                          -5.396   .000
    Hip              .345     .023              .565                        15.266   .000

a. Dependent Variable: BMI
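
The equivalent of this SPSS coefficient table (B, standard error, t and Sig.) can be reproduced with the statsmodels package; a sketch assuming the hypothetical hip/bmi arrays hold the data:

    import numpy as np
    import statsmodels.api as sm

    hip = np.array([94.0, 97.0, 99.0, 102.0, 106.0])   # hypothetical data
    bmi = np.array([19.5, 21.0, 22.8, 23.1, 25.4])

    X = sm.add_constant(hip)          # adds the intercept column
    model = sm.OLS(bmi, X).fit()

    # The summary lists coefficient, std. error, t and p for (Constant) and Hip.
    print(model.summary())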

Page 9: Regression analysis

Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0?
4. How good is the fit?

How far are the data points from the line on average?

r = Σ(X_i - X̄)(Y_i - Ȳ) / √( Σ(X_i - X̄)² · Σ(Y_i - Ȳ)² )

-1 ≤ r ≤ 1
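
A quick way to check r: the direct formula agrees with numpy's built-in; the x/y arrays are again hypothetical:

    import numpy as np

    x = np.array([94.0, 97.0, 99.0, 102.0, 106.0])   # hypothetical
    y = np.array([19.5, 21.0, 22.8, 23.1, 25.4])

    # Direct implementation of the formula above.
    num = np.sum((x - x.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
    r = num / den

    assert np.isclose(r, np.corrcoef(x, y)[0, 1])    # same as numpy's built-in
    print("r =", r, "r^2 =", r ** 2)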

Page 10: Regression analysis

The Correlation Coefficient, r

[Figure: four example scatter plots with r = 0, r = 1, r = 0.7 and r = -0.5.]

Page 11: Regression analysis

r² – Goodness of fit

How much of the variation can be explained by the model?

[Figure: four example fits with r² = 0, r² = 1, r² = 0.5 and r² = 0.2.]

Page 12: Regression analysis

Multiple linear regression

Could waist measure describe some of the variation in BMI?

BMI = 1.3 kg/m² + 0.42 kg/m³·Waist

Or even better:

BMI = b_0 + b_1·HIP + b_2·WST

BMI = -12.2 + 0.25·HIP + 0.17·WST
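
A sketch of the two-predictor fit with numpy's least squares solver; the hip, waist and bmi arrays are hypothetical:

    import numpy as np

    hip   = np.array([94.0, 97.0, 99.0, 102.0, 106.0])   # hypothetical
    waist = np.array([78.0, 82.0, 85.0, 88.0, 95.0])     # hypothetical
    bmi   = np.array([19.5, 21.0, 22.8, 23.1, 25.4])

    # Design matrix with an intercept column: BMI = b0 + b1*HIP + b2*WST
    X = np.column_stack([np.ones_like(hip), hip, waist])
    (b0, b1, b2), *_ = np.linalg.lstsq(X, bmi, rcond=None)

    print(f"BMI = {b0:.2f} + {b1:.2f}*HIP + {b2:.2f}*WST")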

Page 13: Regression analysis

Multiple linear regression

If Y is linearly dependent on more than one independent variable:

Y_j = β_0 + β_1·X_1j + β_2·X_2j

β_0 is the intercept: the value of Y when X_1 and X_2 = 0.
β_1 and β_2 are termed partial regression coefficients.
β_1 expresses the change of Y for one unit of X_1 when X_2 is kept constant.

[Figure: a regression plane in three dimensions.]

Page 14: Regression analysis

Multiple linear regression – residual error and estimations

As the collected data are not expected to fall exactly in a plane, an error term must be added:

Y_j = β_0 + β_1·X_1j + β_2·X_2j + ε_j

The error terms sum to zero.

Estimating the dependent factor and the population parameters:

Ŷ_j = a + b_1·X_1j + b_2·X_2j

[Figure: data points scattered around the fitted regression plane in three dimensions.]

Page 15: Regression analysis

Multiple linear regression – general equations

In general, a finite number (m) of independent variables may be used to estimate the hyperplane:

Y_j = β_0 + Σ_{i=1}^{m} β_i·X_ij + ε_j

The number of sample points must be at least two more than the number of variables.

Page 16: Regression analysis

Multiple linear regression – collinearity

Adding age: adj. R² = 0.352
Adding thigh: adj. R² = 0.352?

Coefficients(a)

                                                                95.0% Confidence Interval for B
Model            B        Std. Error   Beta     t        Sig.   Lower Bound   Upper Bound
1   (Constant)   -9.001   2.449                 -3.676   .000   -13.813       -4.190
    Waist          .168    .043        .201      3.923   .000      .084         .252
    Hip            .252    .031        .411      8.012   .000      .190         .313
    Age           -.064    .018       -.126     -3.492   .001     -.101        -.028

a. Dependent Variable: BMI

Coefficients(a)

                                                                95.0% Confidence Interval for B
Model            B        Std. Error   Beta     t        Sig.   Lower Bound   Upper Bound
1   (Constant)    3.581   1.784                  2.007   .045      .075        7.086
    Waist          .168    .043        .201      3.923   .000      .084         .252
    Age           -.064    .018       -.126     -3.492   .001     -.101        -.028
    Thigh          .252    .031        .411      8.012   .000      .190         .313

a. Dependent Variable: BMI
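
The twin tables above are the signature of collinearity: thigh carries the same information as hip, so the fit cannot tell them apart. A common diagnostic is the variance inflation factor (VIF); a sketch with statsmodels on hypothetical predictors:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Hypothetical predictors; in the slides these would be waist, hip and age.
    waist = np.array([78.0, 82.0, 85.0, 88.0, 95.0, 91.0])
    hip   = np.array([94.0, 97.0, 99.0, 102.0, 106.0, 104.0])
    age   = np.array([34.0, 41.0, 29.0, 50.0, 45.0, 38.0])

    X = sm.add_constant(np.column_stack([waist, hip, age]))
    for i, name in zip(range(1, X.shape[1]), ["waist", "hip", "age"]):
        # VIF = 1 / (1 - R²) from regressing predictor i on the others;
        # values far above ~10 flag a nearly redundant predictor.
        print(name, variance_inflation_factor(X, i))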

Page 17: Regression analysis

Assumptions

1. Dependent variable must be metric continuous.
2. Independent variables must be continuous or ordinal.
3. Linear relationship between the dependent and all independent variables.
4. Residuals must have a constant spread.
5. Residuals are normally distributed.
6. Independent variables are not perfectly correlated with each other.

Page 18: Regression analysis

Multiple linear regression in SPSS

Page 19: Regression analysis

Multiple linear regression in SPSS

Page 20: Regression analysis

Non-parametric correlation


Page 21: Regression analysis

Ranked Correlation

Kendall's τ and Spearman's r_s

Correlation lies between -1 and 1, where -1 indicates perfect inverse correlation, 0 indicates no correlation, and 1 indicates perfect correlation.

Pearson is the correlation method for normal data. Remember the assumptions:

1. Dependent variable must be metric continuous.
2. Independent variables must be continuous or ordinal.
3. Linear relationship between the dependent and all independent variables.
4. Residuals must have a constant spread.
5. Residuals are normally distributed.

Page 22: Regression analysis

Kendall's τ – An example

Page 23: Regression analysis

Kendall's τ – An example

S = P - Q

τ = S / ( n(n-1) / 2 )
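
In Python, Kendall's τ is available in scipy; a sketch on two hypothetical rank variables:

    from scipy.stats import kendalltau

    # Hypothetical paired observations (e.g., two raters' rankings).
    a = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
    b = [2, 1, 4, 3, 6, 5, 7, 9, 8, 10]

    # In the absence of ties this equals (P - Q) / (n(n-1)/2).
    tau, p_value = kendalltau(a, b)
    print("tau =", tau, "p =", p_value)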

Page 24: Regression analysis

Spearman – the same example

d²:  1  4  9  1  1  1  9  9  1  16

r_s = 1 - 6·Σd² / (n³ - n) = 1 - (6 · 52) / (10³ - 10) = 0.6848
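
The slide's arithmetic is easy to verify from the listed d² values:

    # Squared rank differences from the slide (n = 10 pairs).
    d2 = [1, 4, 9, 1, 1, 1, 9, 9, 1, 16]
    n = len(d2)

    r_s = 1 - 6 * sum(d2) / (n ** 3 - n)   # r_s = 1 - 6·Σd² / (n³ - n)
    print(round(r_s, 4))                   # 0.6848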

Page 25: Regression analysis

Correlation in SPSS

Page 26: Regression analysis

Correlation in SPSS

Correlations

                              a        b
a    Pearson Correlation      1        .685*
     Sig. (2-tailed)                   .029
     N                        10       10
b    Pearson Correlation      .685*    1
     Sig. (2-tailed)          .029
     N                        10       10

*. Correlation is significant at the 0.05 level (2-tailed).

Correlations

                                                  a        b
Kendall's tau_b   a   Correlation Coefficient     1.000    .511*
                      Sig. (2-tailed)             .        .040
                      N                           10       10
                  b   Correlation Coefficient     .511*    1.000
                      Sig. (2-tailed)             .040     .
                      N                           10       10
Spearman's rho    a   Correlation Coefficient     1.000    .685*
                      Sig. (2-tailed)             .        .029
                      N                           10       10
                  b   Correlation Coefficient     .685*    1.000
                      Sig. (2-tailed)             .029     .
                      N                           10       10

*. Correlation is significant at the 0.05 level (2-tailed).

Page 27: Regression analysis

Logistic regression


Page 28: Regression analysis

Logistic Regression

• What if the dependent variable is categorical, and especially binary?

• Use some interpolation method?

• Linear regression cannot help us.

Page 29: Regression analysis

The sigmoidal curve

p = 1 / (1 + e^(-z))

z = β_0 + β_1·x_1 + ... + β_n·x_n

[Figure: sigmoidal curve of p against x for β_0 = 0, β_1 = 1.]
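
A minimal Python sketch of the sigmoidal (logistic) curve; it tabulates p over a few x values for β_0 = 0 and β_1 = 1:

    import numpy as np

    def sigmoid(x, b0=0.0, b1=1.0):
        """Logistic curve p = 1 / (1 + exp(-z)) with z = b0 + b1*x."""
        z = b0 + b1 * x
        return 1.0 / (1.0 + np.exp(-z))

    for x in (-6, -2, 0, 2, 6):
        print(x, round(sigmoid(x), 3))   # rises from ~0 to ~1, with p = 0.5 at x = 0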

Page 30: Regression analysis

The sigmoidal curve

• The intercept basically just 'scales' the input variable.

p = 1 / (1 + e^(-z))

z = β_0 + β_1·x_1 + ... + β_n·x_n

[Figure: sigmoidal curves for β_1 = 1 with β_0 = 0, β_0 = 2 and β_0 = -2.]

Page 31: Regression analysis

The sigmoidal curve

• The intercept basically just 'scales' the input variable.

• Large regression coefficient → the risk factor strongly influences the probability.

p = 1 / (1 + e^(-z))

z = β_0 + β_1·x_1 + ... + β_n·x_n

[Figure: sigmoidal curves for β_0 = 0 with β_1 = 1, β_1 = 2 and β_1 = 0.5.]

Page 32: Regression analysis

The sigmoidal curve

• The intercept basically just 'scales' the input variable.

• Large regression coefficient → the risk factor strongly influences the probability.

• Positive regression coefficient → the risk factor increases the probability.

• Logistic regression uses maximum likelihood estimation, not least squares estimation.

p = 1 / (1 + e^(-z))

z = β_0 + β_1·x_1 + ... + β_n·x_n

[Figure: sigmoidal curves for β_0 = 0 with β_1 = 1 and β_1 = -1.]

Page 33: Regression analysis

Does age influence the diagnosis? Continuous independent variable

Variables in the Equation

                                                                      95% C.I. for EXP(B)
                      B        S.E.    Wald      df   Sig.   Exp(B)   Lower    Upper
Step 1a   Age          .109    .010    108.745   1    .000   1.115    1.092    1.138
          Constant   -4.213    .423     99.097   1    .000    .015

a. Variable(s) entered on step 1: Age.

p = 1 / (1 + e^(-z)),  z = B_0 + B_1·age
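
Plugging the table's estimates into the model gives the predicted probability at any age; a sketch (the choice of ages is arbitrary):

    import math

    B0, B1 = -4.213, 0.109   # Constant and Age coefficients from the SPSS table

    def p_malignant(age):
        z = B0 + B1 * age
        return 1.0 / (1.0 + math.exp(-z))

    for age in (20, 40, 60):
        print(age, round(p_malignant(age), 3))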

Page 34: Regression analysis

Does previous intake of OCP influence the diagnosis? Categorical independent variable

Variables in the Equation

                                                                    95% C.I. for EXP(B)
                      B       S.E.    Wald    df   Sig.   Exp(B)    Lower    Upper
Step 1a   OCP(1)     -.311    .180    2.979   1    .084    .733     .515     1.043
          Constant    .233    .123    3.583   1    .058   1.263

a. Variable(s) entered on step 1: OCP.

p = 1 / (1 + e^(-z)),  z = B_0 + B_1·OCP

If OCP(1) = 1:  p(Y = 1) = 1 / (1 + e^(-(0.233 - 0.311))) = 0.4805
If OCP(1) = 0:  p(Y = 1) = 1 / (1 + e^(-0.233)) = 0.5580

Page 35: Regression analysis

Odds ratio

odds = p / (1 - p) = e^z

odds ratio = e^(B_0 + B_1) / e^(B_0) = e^(B_1) = e^(-0.311) = 0.7327
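
Both probabilities and the odds ratio can be verified in a few lines using the B values from the table:

    import math

    B0, B1 = 0.233, -0.311   # Constant and OCP(1) from the SPSS table

    p1 = 1 / (1 + math.exp(-(B0 + B1)))   # OCP(1) = 1 -> 0.4805
    p0 = 1 / (1 + math.exp(-B0))          # OCP(1) = 0 -> 0.5580

    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    print(round(p1, 4), round(p0, 4), round(odds_ratio, 4))  # odds ratio = e^B1 = 0.7327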

Page 36: Regression analysis

Multiple logistic regression

Variables in the Equation

                                                                      95% C.I. for EXP(B)
                      B        S.E.    Wald      df   Sig.   Exp(B)   Lower    Upper
Step 1a   Age          .123    .011    115.343   1    .000   1.131    1.106    1.157
          BMI          .083    .019     18.732   1    .000   1.087    1.046    1.128
          OCP          .528    .219      5.808   1    .016   1.695    1.104    2.603
          Constant   -6.974    .762     83.777   1    .000    .001

a. Variable(s) entered on step 1: Age, BMI, OCP.

p = 1 / (1 + e^(-z)),  z = B_0 + B_1·OCP + B_2·age + B_3·BMI

Page 37: Regression analysis

Predicting the diagnosis by logistic regression

What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant?

z = -6.974 + 0.123·50 + 0.083·26 + 0.528·1 = 1.862
p = 1 / (1 + e^(-1.862)) = 0.866

(Coefficients taken from the "Variables in the Equation" table on page 36.)
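
A few lines verify the prediction from the table's coefficients:

    import math

    # Coefficients from the SPSS table: Constant, Age, BMI, OCP.
    B0, B_age, B_bmi, B_ocp = -6.974, 0.123, 0.083, 0.528

    z = B0 + B_age * 50 + B_bmi * 26 + B_ocp * 1
    p = 1 / (1 + math.exp(-z))
    print(round(z, 3), round(p, 3))   # z = 1.862, p ≈ 0.866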

Page 38: Regression analysis


Logistic regression in SPSS

Page 39: Regression analysis


Logistic regression in SPSS