Regression analysis
• Linear regression
• Logistic regression

Relationship and association
Straight line

[Figure: a straight line relating Hip (cm) to BMI; a 1 cm step along the x-axis illustrates the slope, here −0.0008]

Y = b0 + b1·X
b1 = (Y2 − Y1) / (X2 − X1)   (slope)
b0 = intercept
BMI = b0 + b1·HIP
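The slope and intercept formulas above can be checked with a tiny sketch; the two points here are invented for illustration.

```python
# Slope b1 = (Y2 - Y1) / (X2 - X1) and intercept b0 of the line
# through two points; the points are made up for illustration.
x1, y1 = 95.0, 22.0
x2, y2 = 100.0, 24.0

b1 = (y2 - y1) / (x2 - x1)   # slope
b0 = y1 - b1 * x1            # intercept: Y = b0 + b1*X passes through (x1, y1)

print(b0, b1)
```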
Best straight line?
Best straight line!
[Figure: scatter plot with a fitted line; the vertical distance from the point (X1, Y1) to the line is its residual e1 = Y1 − Ŷ1]

The best line minimizes the sum of squared residuals:

Σ (i = 1..N) e_i² = Σ (i = 1..N) (Yi − Ŷi)²

Least squares estimation
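A quick sketch of least squares in Python, with made-up data: the fitted line has a smaller sum of squared residuals than any perturbed line.

```python
# Least squares picks the line minimizing SSE = sum (Yi - Ŷi)^2.
# Data are made up for illustration.
import numpy as np

x = np.array([90.0, 94.0, 98.0, 102.0, 106.0])
y = np.array([17.0, 19.5, 21.0, 24.0, 25.5])

b1, b0 = np.polyfit(x, y, 1)          # slope, intercept of the best line

def sse(intercept, slope):
    resid = y - (intercept + slope * x)
    return float(np.sum(resid ** 2))

best = sse(b0, b1)
# Perturbing the fitted line can only increase the SSE:
print(best <= sse(b0 + 0.1, b1), best <= sse(b0, b1 + 0.01))
```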
Simple linear regression
1. Is the association linear?
[Figure: scatter plot used to judge whether the association is linear]
Simple linear regression

1. Is the association linear?
2. Describe the association: what are b0 and b1?

BMI = −12.6 kg/m² + 0.35 kg/m³ · Hip

b1 = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²
X̄ = Σ Xi / n
b0 = Ȳ − b1·X̄
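The estimator formulas above can be sketched directly in Python; the hip/BMI values below are invented for illustration, not the course data.

```python
# Least squares estimates b1 and b0 for a simple linear regression,
# following the formulas above. The hip/BMI values are made up.
hip = [90, 95, 100, 105, 110]
bmi = [19.0, 21.0, 22.5, 24.0, 26.0]

n = len(hip)
x_bar = sum(hip) / n
y_bar = sum(bmi) / n

# b1 = sum((Xi - X̄)(Yi - Ȳ)) / sum((Xi - X̄)^2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(hip, bmi)) / \
     sum((x - x_bar) ** 2 for x in hip)
# b0 = Ȳ - b1 * X̄
b0 = y_bar - b1 * x_bar

print(b0, b1)
```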
Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0? Help, SPSS!

Coefficients (a)
Model         B        Std. Error   Beta    t        Sig.
(Constant)    -12.581  2.331                -5.396   .000
Hip           .345     .023         .565    15.266   .000
(B and Std. Error are unstandardized coefficients; Beta is standardized.)
a. Dependent Variable: BMI
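The slope test in the SPSS output can be reproduced with scipy, which returns the slope, its standard error, and the two-sided p-value; the data below are made up for illustration.

```python
# Testing whether the slope differs from 0, analogous to the SPSS
# Coefficients table. The hip/BMI values are invented for illustration.
from scipy.stats import linregress

hip = [90, 92, 95, 97, 100, 103, 105, 108, 110, 112]
bmi = [18.5, 19.2, 20.1, 21.0, 22.3, 23.1, 24.0, 25.2, 26.1, 27.0]

res = linregress(hip, bmi)
# A small p-value means the slope is significantly different from 0.
print(res.slope, res.intercept, res.pvalue)
```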
Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0?
4. How good is the fit? How far are the data points from the line on average?

r = Σ (Xi − X̄)(Yi − Ȳ) / √( Σ (Xi − X̄)² · Σ (Yi − Ȳ)² )

−1 ≤ r ≤ 1
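The correlation coefficient can be computed straight from the definition above, here with the same made-up hip/BMI values.

```python
# Pearson correlation r from the definition above; data are made up.
from math import sqrt

x = [90, 95, 100, 105, 110]
y = [19.0, 21.0, 22.5, 24.0, 26.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
den = sqrt(sum((xi - x_bar) ** 2 for xi in x) *
           sum((yi - y_bar) ** 2 for yi in y))
r = num / den
print(r)   # close to 1: a strong positive linear association
```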
The Correlation Coefficient, r

[Figure: four scatter plots illustrating r = 0, r = 1, r = 0.7, and r = −0.5]
r² – Goodness of fit

How much of the variation can be explained by the model?

[Figure: four scatter plots illustrating r² = 0, r² = 1, r² = 0.5, and r² = 0.2]
Multiple linear regression

Could waist measure describe some of the variation in BMI?

BMI = 1.3 kg/m² + 0.42 kg/m³ · Waist

Or even better:

BMI = b0 + b1·HIP + b2·WST
BMI = −12.2 + 0.25·HIP + 0.17·WST
Multiple linear regression

If Y is linearly dependent on more than one independent variable:

Yj = α + β1·X1j + β2·X2j

α is the intercept, the value of Y when X1 and X2 = 0. β1 and β2 are termed partial regression coefficients; β1 expresses the change of Y for one unit of X1 when X2 is kept constant.

[Figure: regression plane fitted in three dimensions]
14
Multiple linear regression – residual error and estimations
As the collected data is not expected to fall in a plane an error term must be added
The error term sums up to be zero.
Estimating the dependent factor and the population parameters:
jjjj XXY 2211
05
1015
2025
12
34
56
70
0.5
1
1.5
2
2.5
3
3.5
4
4.5
jjj XbXbaY 2211ˆ
Multiple linear regression – general equations

In general, a finite number (m) of independent variables may be used to estimate the hyperplane:

Yj = α + Σ (i = 1..m) βi·Xij + εj

The number of sample points must be at least two more than the number of variables.
Multiple linear regression – collinearity

Adding age: adj. R² = 0.352
Adding thigh: adj. R² = 0.352?

Coefficients (a)
Model        B       Std. Error  Beta    t       Sig.   95% CI for B (Lower, Upper)
(Constant)   -9.001  2.449               -3.676  .000   (-13.813, -4.190)
Waist        .168    .043        .201    3.923   .000   (.084, .252)
Hip          .252    .031        .411    8.012   .000   (.190, .313)
Age          -.064   .018        -.126   -3.492  .001   (-.101, -.028)
a. Dependent Variable: BMI

Coefficients (a)
Model        B       Std. Error  Beta    t       Sig.   95% CI for B (Lower, Upper)
(Constant)   3.581   1.784               2.007   .045   (.075, 7.086)
Waist        .168    .043        .201    3.923   .000   (.084, .252)
Age          -.064   .018        -.126   -3.492  .001   (-.101, -.028)
Thigh        .252    .031        .411    8.012   .000   (.190, .313)
a. Dependent Variable: BMI
Assumptions

1. Dependent variable must be metric continuous
2. Independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed
6. Independent variables are not perfectly correlated with each other
Multiple linear regression in SPSS

Multiple linear regression in SPSS
Non-parametric correlation

Ranked correlation: Kendall's τ and Spearman's rs

Correlation lies between −1 and 1, where −1 indicates perfect inverse correlation, 0 indicates no correlation, and 1 indicates perfect correlation.

Pearson is the correlation method for normal data. Remember the assumptions:
1. Dependent variable must be metric continuous
2. Independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed
Kendall's τ – An example

S = P − Q
τ = S / (½·n·(n − 1))

where P is the number of concordant pairs and Q the number of discordant pairs.
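Kendall's τ can be computed by counting concordant and discordant pairs, and checked against scipy; the ranks below are made up for illustration.

```python
# Kendall's tau from concordant (P) and discordant (Q) pairs,
# checked against scipy. The ranks are invented for illustration.
from itertools import combinations
from scipy.stats import kendalltau

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

P = Q = 0
for i, j in combinations(range(len(x)), 2):
    s = (x[i] - x[j]) * (y[i] - y[j])
    if s > 0:
        P += 1          # concordant pair
    elif s < 0:
        Q += 1          # discordant pair

n = len(x)
tau = (P - Q) / (n * (n - 1) / 2)   # tau = S / (½ n (n-1))
tau_scipy, _ = kendalltau(x, y)
print(tau, tau_scipy)
```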
Spearman – the same example

d²: 1 4 9 1 1 1 9 9 1 16   (Σd² = 52)

rs = 1 − 6·Σd² / (n·(n² − 1)) = 1 − 6·52 / (10·(10² − 1)) = 0.6848
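The Spearman calculation above can be reproduced directly from the d² values on the slide:

```python
# Spearman's rs from the squared rank differences d^2 shown above.
d_squared = [1, 4, 9, 1, 1, 1, 9, 9, 1, 16]   # from the slide; sum = 52
n = len(d_squared)                             # n = 10

rs = 1 - 6 * sum(d_squared) / (n * (n**2 - 1))
print(round(rs, 4))   # 0.6848
```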
Correlation in SPSS

Correlation in SPSS
Correlations
                          a       b
a  Pearson Correlation    1       .685*
   Sig. (2-tailed)                .029
   N                      10      10
b  Pearson Correlation    .685*   1
   Sig. (2-tailed)        .029
   N                      10      10
*. Correlation is significant at the 0.05 level (2-tailed).

Correlations
                                             a       b
Kendall's tau_b  a  Correlation Coefficient  1.000   .511*
                    Sig. (2-tailed)          .       .040
                    N                        10      10
                 b  Correlation Coefficient  .511*   1.000
                    Sig. (2-tailed)          .040    .
                    N                        10      10
Spearman's rho   a  Correlation Coefficient  1.000   .685*
                    Sig. (2-tailed)          .       .029
                    N                        10      10
                 b  Correlation Coefficient  .685*   1.000
                    Sig. (2-tailed)          .029    .
                    N                        10      10
*. Correlation is significant at the 0.05 level (2-tailed).
Logistic regression

Logistic Regression

• What if the dependent variable is categorical, and especially binary?
• Use some interpolation method
• Linear regression cannot help us.
The sigmoid curve

p = 1 / (1 + e^(−z))
z = β0 + β1·x1 + … + βn·xn

[Figure: sigmoid curve of p against x for β0 = 0, β1 = 1]
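The sigmoid function on the slide can be evaluated for the parameter settings the figures illustrate:

```python
# The sigmoid (logistic) curve p = 1 / (1 + e^(-z)) with z = b0 + b1*x,
# evaluated for the parameter settings shown on the slides.
from math import exp

def sigmoid(x, b0, b1):
    z = b0 + b1 * x
    return 1 / (1 + exp(-z))

# With b0 = 0, b1 = 1 the curve passes through p = 0.5 at x = 0.
print(sigmoid(0, 0, 1))    # 0.5
# Changing b0 shifts the curve along the x axis.
print(sigmoid(0, 2, 1))    # > 0.5
# A larger b1 makes the transition steeper.
print(sigmoid(1, 0, 2) > sigmoid(1, 0, 1))   # True
```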
The sigmoid curve

• The intercept basically just 'scales' the input variable

p = 1 / (1 + e^(−z))
z = β0 + β1·x1 + … + βn·xn

[Figure: sigmoid curves for β0 = 0, β0 = 2, and β0 = −2, all with β1 = 1]
The sigmoid curve

• The intercept basically just 'scales' the input variable
• Large regression coefficient → risk factor strongly influences the probability

p = 1 / (1 + e^(−z))
z = β0 + β1·x1 + … + βn·xn

[Figure: sigmoid curves for β1 = 1, β1 = 2, and β1 = 0.5, all with β0 = 0]
The sigmoid curve

• The intercept basically just 'scales' the input variable
• Large regression coefficient → risk factor strongly influences the probability
• Positive regression coefficient → risk factor increases the probability
• Logistic regression uses maximum likelihood estimation, not least squares estimation

p = 1 / (1 + e^(−z))
z = β0 + β1·x1 + … + βn·xn

[Figure: sigmoid curves for β1 = 1 and β1 = −1, with β0 = 0]
Does age influence the diagnosis? Continuous independent variable

Variables in the Equation
                  B       S.E.   Wald     df   Sig.   Exp(B)   95% C.I. for Exp(B) (Lower, Upper)
Step 1a  Age      .109    .010   108.745  1    .000   1.115    (1.092, 1.138)
         Constant -4.213  .423   99.097   1    .000   .015
a. Variable(s) entered on step 1: Age.

p = 1 / (1 + e^(−z))
z = B0 + B1·Age
Does previous intake of OCP influence the diagnosis? Categorical independent variable

Variables in the Equation
                  B      S.E.   Wald   df   Sig.   Exp(B)   95% C.I. for Exp(B) (Lower, Upper)
Step 1a  OCP(1)   -.311  .180   2.979  1    .084   .733     (.515, 1.043)
         Constant .233   .123   3.583  1    .058   1.263
a. Variable(s) entered on step 1: OCP.

p = 1 / (1 + e^(−z))
z = B0 + B1·OCP

If OCP(1) = 1:  p(Y = 1) = 1 / (1 + e^(−(0.233 − 0.311))) = 0.4805
If OCP(1) = 0:  p(Y = 1) = 1 / (1 + e^(−0.233)) = 0.5580
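The two fitted probabilities can be reproduced from the coefficients in the table (B0 = 0.233, B1 = −0.311):

```python
# Reproducing the fitted probabilities for the OCP example:
# z = B0 + B1*OCP with B0 = 0.233 and B1 = -0.311 from the table above.
from math import exp

B0, B1 = 0.233, -0.311

def prob(ocp):
    z = B0 + B1 * ocp
    return 1 / (1 + exp(-z))

print(f"{prob(1):.4f}")   # 0.4805
print(f"{prob(0):.4f}")   # 0.5580
```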
Odds ratio

odds = p / (1 − p) = e^z

odds ratio = e^(B0 + B1) / e^(B0) = e^(B1) = e^(−0.311) = 0.7327
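The odds ratio derivation above reduces to e^(B1), which matches Exp(B) in the SPSS table up to rounding:

```python
# Odds ratio for OCP: e^(B1) with B1 = -0.311 from the table;
# SPSS reports this as Exp(B) = .733.
from math import exp

B1 = -0.311
odds_ratio = exp(B1)
print(f"{odds_ratio:.4f}")   # 0.7327
```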
Multiple logistic regression

Variables in the Equation
                  B       S.E.   Wald     df   Sig.   Exp(B)   95% C.I. for Exp(B) (Lower, Upper)
Step 1a  Age      .123    .011   115.343  1    .000   1.131    (1.106, 1.157)
         BMI      .083    .019   18.732   1    .000   1.087    (1.046, 1.128)
         OCP      .528    .219   5.808    1    .016   1.695    (1.104, 2.603)
         Constant -6.974  .762   83.777   1    .000   .001
a. Variable(s) entered on step 1: Age, BMI, OCP.

p = 1 / (1 + e^(−z))
z = B0 + B1·OCP + B2·Age + B3·BMI
Predicting the diagnosis by logistic regression

What is the probability that the tumor of a 50 year old woman who has been using OCP and has a BMI of 26 is malignant?

z = −6.974 + 0.123·50 + 0.083·26 + 0.528·1 = 1.862
p = 1 / (1 + e^(−1.862)) = 0.8655

Variables in the Equation
                  B       S.E.   Wald     df   Sig.   Exp(B)   95% C.I. for Exp(B) (Lower, Upper)
Step 1a  Age      .123    .011   115.343  1    .000   1.131    (1.106, 1.157)
         BMI      .083    .019   18.732   1    .000   1.087    (1.046, 1.128)
         OCP      .528    .219   5.808    1    .016   1.695    (1.104, 2.603)
         Constant -6.974  .762   83.777   1    .000   .001
a. Variable(s) entered on step 1: Age, BMI, OCP.
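The prediction can be checked in a few lines, plugging the table's coefficients into the model:

```python
# Predicted probability of malignancy for a 50-year-old OCP user
# with BMI 26, using the coefficients from the table above.
from math import exp

z = -6.974 + 0.123 * 50 + 0.083 * 26 + 0.528 * 1
p = 1 / (1 + exp(-z))
print(f"{z:.3f} {p:.4f}")   # 1.862 0.8655
```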
Logistic regression in SPSS
Logistic regression in SPSS