Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support, Fall 2005Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo
6.873/HST.951 Medical Decision Support Fall 2005
Decision Analysis (part 2 of 2)
Review Linear Regression
Lucila Ohno-Machado
Outline
• Homework clarification • Sensitivity, specificity, prevalence• Cost-effectiveness analysis • Discounting cost and utilities • Review of Linear Regression
2 x 2 table(contingency table)
PPD+ PPD
2 10TB 8
no TB 3 87 90
11 89 100
Prevalence of TB = 10/100 Sensitivity of test = 8/11 Specificity of test = 87/89
threshold
normal Disease
FN
True Negative (TN)
FP
True Positive (TP)
0 3mm 10
Sens = TP/TP+FN
Spec = TN/TN+FP
PPV = TP/TP+FP
NPV = TN/TN+FN
Accuracy = TN +TP
nl D
“nl”
“D”
- +
TN
TPFP
FN
“nl” “D”
Sens = TP/TP+FN40/50 = .8
Spec = TN/TN+FP45/50 = .9
PPV = TP/TP+FP40/45 = .89
NPV = TN/TN+FN45/55 = .81
Accuracy = TN +TP85/100 = .85
nl D
“nl”
“D”
“nl” “D”
45
405
10
nl disease
threshold
TN
FP
TP
Sensitivity = 50/50 = 1 Specificity = 40/50 = 0.8
“D”
“nl”
nl D
40
5010
0
50 50
40
60
0.0 0.4 1.0
nl disease
threshold
FN
TN
FP
TP
“D”
“nl”
nl D
45
405
10
50 50
50
50
Sensitivity = 40/50 = .8 Specificity = 45/50 = .9
0.0 0.6 1.0
nl disease
threshold
FN
TN TP
Sensitivity = 30/50 = .6 Specificity = 1
“D”
“nl”
nl D
50
300
20
50 50
70
30
0.0 0.7 1.0
Cost-effectiveness analysis
• Comparison of costs with health effects– Cost per Down case syndrome averted
– Cost per year of life saved • Perspectives (society, insurer, patient)• Comparators
– Comparison with doing nothing – Comparison with “standard of care”
Discounting costs
• It is better to spend $10 next year than today (its value will be only $9.52, assuming 5% rate)
• Even better to spend it 2 years from now ($9.07)• For cost-effectiveness analysis spanning
multiple years, recommended rate is usually 5%
• C = C0 + C1/(1-r)1 + C2/(1-r)2 + ...• C0 are costs at time 0
Discounting utilities
• Value for full mobility is 10 today (is it only 9.52 next year?)
• Should the discount rate be the same as for costs?
• If smaller, then it would always be better to wait one more year…
Levels of Evidencein Evidence-Based Medicine
US Task Force
• Level 1: at least 1 randomized controlled trial (RCT)
• Level 2-I: controlled trials (CT) • Level 2-II: cohort or case-control study
• Level 2-III: multiple time series with or without the intervention
• Level 3: expert opinions
Examples
• Cost per year of life saved, Life years/US$1M
• By pass surgery middle-aged man– $11k/year, 93
• CCU for low risk patients – $435k/year, 2
Importance of good stratification
• Bypass surgery – Left main disease 93 – One vessel disease 12
• CCU for chest pain – 5% risk of MI 2 – 20% risk of MI 10
Intro to Modeling
Univariate Linear Model• Y is what we want to predict (dependent variable)• X are the predictors (independent variables) • Y=f(X), where f is a linear function
y
y = 1β0 + x1β1 1
y = 1β + x2β2 0 1
…
x
Univariate Linear Model
ββ1 is slope
0 is intercept
y = 1β0 + x11
y = 1β + x22 0…
1
1
β
β y
x
2
1
Multivariate Model
• Simple model: structure and parameters
• 3 predictors, 4 parameters β
• one of the parameters (β0) is a constant
+β0 +β1 2 +β ββx
3
23 3
1=y x11 x12 x13
ββββ
⎢⎢⎢⎢
⎥⎥⎥⎥
0
1
2
3
1 1⎡ ⎡⎤ 0 +β 1+β +β21=y x21 x22 ⎢
⎢⎢⎢
x11 x 21
x12 x 22
⎤ ⎥⎥⎥⎥
⎣ x13 x 23 ⎣⎦ ⎦
Notation and Terminology
• xi is vector of j inputs, covariates, independent
⎡⎤ 30⎡ ⎤ ⎡ ⎤age x1variables, or predictors for case i (i.e., what wex1
know for all cases)
⎢⎢⎢⎣
⎥⎥⎥⎦
⎢⎢⎢⎣
⎥⎥⎥⎦
⎢⎢⎢⎣
= 10
1
⎥⎥⎥⎦
salt smoke
= x2
x3
X is matrix of j x n T•⎡ ⎤x11x21xi column vectors (input
data for each case) ⎡x11x12 x13⎢
⎢⎢⎣
=⎥⎥⎥⎦
x12 x22 ⎢⎣x21x22 x23
⎤⎥⎦x13 x23
Prediction
• y is scalar: output,i dependent variable (i.e., what we want to predict)
e.g., mean blood pressure ⎡ predpat1 ⎤
⎥ = ⎡ 100⎤ ⎡ y1 ⎤
⎢ ⎢ ⎢ ⎥ ⎣ predpat2 ⎦ ⎣ 98 ⎦
⎥ = ⎣ y2 ⎦
• Y is vector of yi
Multivariate Linear Model
+β0 +β1 2 +β β3
βx23 3
1=y1 x11 x12 x13
0 +β 1+β 2 +β1=y2 x21 x22
Y = XTβ, where each x includes a term for 1 (constant) (x10=1,i x20=1, etc.) to be multiplied by the intercept β0
ββββ
⎢⎢⎢⎢
0
1
2
3
⎡ ⎤ ⎥⎥⎥⎥
⎡ ⎤ ⎡y1 ⎤x10 x11x12 x13⎢⎣
⎢⎣
⎥⎦
⎥⎦y2 x x x x20 21 22 23
⎣ ⎦
Regression and Classification
Regression: continuous outcome • E.g., blood pressure Y = f(X)
Classification: categorical outcome
• E.g., death (binary)
G = X G )ˆ (
Loss function
• Y and X random variables
• f(x) is the model • L(Y,f(X)) is the loss function (penalty for being
wrong) • It is a function of how much to pay for
discrepancies between Y (real observation) and f(X) estimated value for an observation)
• In several cases, we use only the error and leave the cost for the decision analysis model
Regression Problems
Let’s concentrate on simple errors for now:• Expected Prediction Error (EPE):
[Y-f(X)]2
|Y-f(X)| • These two error functions imply that errors
in both directions are considered the same way.
Univariate Linear Model
y1 = 1β0 + x1β1
y2 = 1β + x2β0 1 y
y = 1β + x1β + error1 0 1
y = 1β + x2β + error2 0 1
x
Squared Errors
n 2SSE = ∑(Y − Y )
i=1
)(ˆ
ˆ 10
XfY
xy ii
=
+= ββ
y
y
Linear regression
Conditioning on x y• x is a certain value
ˆ(x y ) = 1/ k ∑ yixi = x )
x f ) = Ave ( y | x = x )( i i
• Expectation isapproximated by average
• Conditioning is on x
x
k -Nearest Neighbors y• N is neighborhood
ˆ(x y ) = 1/ k ∑ yixi ∈Nk ( x )
x f ) = Ave ( y | x ∈ Nk ( x ))( i i x
• Expectation is approximated by average
• Conditioning is on neighborhood
Nearest Neighbors y• N is continuous
neighborhood
Y ( x) = 1/ n∑ w y i i
(f ( x) = y e WeightedAv | x)i x
w• Expectation isapproximated by weighted average
x
Minimize Sum of Squared Errors
y
yi = β 0 + β xi1
n n
SSE = ∑ ( y − y)2 2= ∑ [ y − (β + β x)] =i= 1 i=
0 11
n
= ∑ ( y2 2− 2 y(β + β x) + (β + β x) ) =i=
0 1 0 11
n2= ∑ ( y − 2 yβ − 2 yβ x + β 2 + 2 ββ x + β 2 2x )0 1 0 0 1 1
i= 1
(derivative wrt β 0) = 0
n 2 2 2∑ ( y − 2 yβ − 2 yβ x + β 2 + 2 β β x + β x )0 1 0 0 1 1
i= 1
n∂ SE = 2∑ ( + − β + β x) = 00 1∂β 0 i= 1
y
β n + β 1 ∑ x = ∑ y Normal equation 10
(derivative wrt β 1) = 0
n 2 2 2∑ ( y − 2 y β − 2 y β x + β 2 + 2 ββ x + β x )0 1 0 0 1 1
i = 1
n∂ SE = 2∑ (− x y + β x + β x 2 ) = 0
∂β 1 i = 1 0 1
Normal equation 2β 0 ∑ x + β 1 ∑ x 2 = ∑ x y
Solve system of normal equations
β n + β1 ∑ x = ∑ y Normal equation 10
β0 ∑ x + β1 ∑ x 2 = ∑ x y Normal equation 2
yβ0 = y − β x1
β1 =Σ( x − x )( y − y )
Σ( x − x )2
Limitations of linear regression
• Assumes conditional probability p(Y|X) is normal
• Assumes equal variance in every X
• It’s linear ☺
(but we can always use interaction or transformed terms)
Linear Regression for Classification?
y = p y=1
yi = β + β xi0 1
x
Linear Probability Model
y=1
x
ii xy 10ˆ ββ += for 0<= β0 + β1 xi <=1
for 0 >= β0 + β1 xi
for β0 + β1 xi >=1
0
1
Logit Model
∑=⎥⎦
⎤⎢⎣
⎡−
+=⎥⎦
⎤⎢⎣
⎡−
+=
+=
+
+
+−
ii
i
i
ii
i
x
x
i
xi
xp
p
xp
p
eep
ep
i
i
i
β
ββ
ββ
ββ
ββ
1log
1log
1
11
10
)(
10
10
10
logit
x
p=1
x
Two dimensions
x1
x2