Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur, L.J. Wei (Harvard University)




Outline

• Background and motivation
• Developing and evaluating prediction rules based on a set of markers for
– Continuous or binary outcome
– Censored event time outcome
• Evaluating the incremental value of a biomarker over
– the entire population
– various sub-populations
• Incorporating the patient level precision of the prediction
– Prediction intervals/sets
• Remarks


Background and Motivation

Diagnosis, Prognosis, Treatment

Personalized medicine: using information about a person’s biological and genetic makeup to tailor strategies for the prevention, detection and treatment of disease

Important step: develop prediction rules that can accurately predict health outcome or diagnosis of clinical phenotype


Background and Motivation

Predictor Z: subject characteristics, biomarkers, genetic markers

Outcome Y: disease status, time to event, treatment response

Accurate prediction of disease outcome and treatment response, however, is a complex and difficult task.

Developing prediction rules involves
• Identifying important predictors
• Evaluating the accuracy of the prediction
• Evaluating the incremental value of new markers


Background and Motivation: AIDS Clinical Trial ACTG320

Study objective: to compare
• 3-drug regimen (n=579): Zidovudine + Lamivudine + Indinavir
• 2-drug regimen (n=577): Zidovudine + Lamivudine

Identify biomarkers for predicting treatment response

How well can we predict the treatment response? Is RNA needed?

Predictor Z: Age, CD4 week 0, CD4 week 8, RNA week 0, RNA week 8

Outcome Y: CD4 week 24


Background and Motivation

Y = CD4 week 24, Z = Predictors

Regression analysis: $Y = \beta' Z + \varepsilon$

Association: are the coefficients for RNA significant?

Is RNA needed?


Background and Motivation: AIDS Clinical Trial

Regression coefficients

            Age     RNA week 0   RNA week 8   CD4 week 0   CD4 week 8
Estimate    -0.55   0.08         -12.06       0.03         0.68
SE          0.35    5.53         2.80         0.07         0.10
P-value     0.12    0.99         0.00         0.72         0.00

The coefficient for RNA week 8 is highly significant. Is RNA needed for a more precise prediction of responses?


Background and Motivation

Y = CD4 week 24, Z = Predictors. Is RNA needed?

Does adding RNA improve the prediction?

Is $\hat{Y}_2(Z, \mathrm{RNA})$ closer to $Y$ than $\hat{Y}_1(Z)$ is?

Prediction procedure: $\hat{Y}(Z)$

1. Prediction rule: based on regression models
2. The distance between $\hat{Y}(Z)$ and $Y$?


Developing Prediction Rules Based on a Set of Markers

Regression approach to approximate Y | Z
• Continuous or binary outcome: generalized linear regression
• Survival outcome:
– Proportional hazards model
– Time-specific prediction models: $P(T \le t \mid Z) = g(\beta_t' Z)$

Regression modeling as a vehicle: the procedure has to be valid when the imposed statistical model is not the true model!


Developing and Evaluating Prediction Rules

Predict Y with Z based on the prediction model $\hat{Y}(Z)$

Examples: $\hat{Y}(Z) = g(\hat{\beta}' Z)$ or $\hat{Y}(Z) = I\{g(\hat{\beta}' Z) \ge c\}$

Evaluate the performance of the prediction by the average “distance” between $\hat{Y}(Z)$ and $Y$
• The utility or cost of predicting $Y$ as $\hat{Y}(Z)$ is $d\{Y, \hat{Y}(Z)\}$
• The average “distance” is $D = E[d\{Y, \hat{Y}(Z)\}]$

Examples:
• Absolute prediction error: $d\{Y, \hat{Y}(Z)\} = |Y - \hat{Y}(Z)|$
• Total “cost” of risk stratification: $d\{\hat{Y}(Z) = k, Y = y\} = d_{ky}$

         Ŷ(Z) = 1   Ŷ(Z) = 2   Ŷ(Z) = 3
Y = 0    d_{10}     d_{20}     d_{30}
Y = 1    d_{11}     d_{21}     d_{31}
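The two example distances above can be sketched in a few lines of Python. This is only an illustration: the function names, the toy data and the cost values are hypothetical and do not come from the slides, and the cost table is indexed as `cost[k-1][y]` purely as a convention for this sketch.

```python
import numpy as np

def absolute_error(y, y_hat):
    """Per-subject distance d{Y, Yhat(Z)} = |Y - Yhat(Z)|."""
    return np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))

def stratification_cost(y, k, cost):
    """Per-subject cost d_{ky} of placing a subject with outcome y in risk stratum k.

    cost is a 3 x 2 table: rows are strata k = 1, 2, 3; columns are outcomes y = 0, 1.
    """
    cost = np.asarray(cost, dtype=float)
    return cost[np.asarray(k) - 1, np.asarray(y)]

def average_distance(d):
    """Empirical version of D = E[d{Y, Yhat(Z)}]: the mean per-subject distance."""
    return float(np.mean(d))

# Hypothetical toy data: binary outcomes and assigned risk strata.
y = np.array([0, 1, 1, 0])
k = np.array([1, 3, 2, 2])
# Illustrative costs only (not from the slides): rows k = 1..3, columns y = 0, 1.
cost = [[0.0, 2.0],
        [1.0, 1.0],
        [2.0, 0.0]]

print(average_distance(stratification_cost(y, k, cost)))
print(average_distance(absolute_error([10, 20], [12, 17])))
```

Any other utility or cost function can be plugged in the same way, as long as it returns one value per subject.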


Evaluating and Comparing Prediction Rules

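The performance of the prediction model/rule with $\hat{Y}(Z)$ can be estimated by

$$\hat{D} = n^{-1} \sum_{i=1}^{n} d\{Y_i, \hat{Y}(Z_i)\}$$

Prediction model/rule comparison:
• Prediction with E(Y | Z) = g1(a’Z) vs E(Y | W) = g2(b’W)
• Compare two models/rules by comparing $d\{Y, \hat{Y}_1(Z)\}$ and $d\{Y, \hat{Y}_2(Z)\}$

$$\hat{\Delta} = \hat{D}_1 - \hat{D}_2 = n^{-1} \sum_{i=1}^{n} \left[ d\{Y_i, \hat{Y}_1(Z_i)\} - d\{Y_i, \hat{Y}_2(Z_i)\} \right]$$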
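As a rough illustration of the apparent-error estimates and their difference, the sketch below fits two linear working models by least squares on simulated data and compares their mean absolute prediction errors. Everything here is an assumption made for the example: the simulated markers, the coefficient values, and the helper names `fit_linear`, `predict_linear` and `apparent_error`.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of the working model E(Y | X) = b0 + b'X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict_linear(beta, X):
    """Predicted values from a fitted working model."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def apparent_error(X, y):
    """Mean absolute error, fitting and evaluating on the same observations."""
    beta = fit_linear(X, y)
    return float(np.mean(np.abs(y - predict_linear(beta, X))))

# Simulated data (hypothetical): two routine markers z and one new marker w.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 2))
w = rng.normal(size=(n, 1))
y = 1.0 + z @ np.array([2.0, -1.0]) + 0.5 * w[:, 0] + rng.normal(size=n)

d1 = apparent_error(np.column_stack([z, w]), y)  # model including the new marker
d2 = apparent_error(z, y)                        # model without the new marker
delta_hat = d1 - d2                              # negative values favour the larger model

print(d1, d2, delta_hat)
```

The sign convention matches the difference $\hat{D}_1 - \hat{D}_2$: a negative value means the first rule has the smaller average distance.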


Variability in the Estimated Prediction Performance Measures

Variability in the prediction errors: Estimate = 50, SE = 1? SE = 50?

Inference about $D$ and $\Delta = D_1 - D_2$

Confidence intervals based on large sample approximations to the distribution of $n^{1/2}(\hat{D} - D)$ and $n^{1/2}(\hat{\Delta} - \Delta)$
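One simple way to attach a standard error and interval to an average distance is a normal approximation applied to the per-subject losses. The sketch below is a simplification and not necessarily the procedure used in the talk: it treats the fitted rule as fixed, so it ignores the extra variability from estimating the regression coefficients. The function name and the toy numbers are hypothetical.

```python
import math
import numpy as np

def mean_with_ci(values, z=1.96):
    """Mean, standard error and normal-approximation interval for per-subject values.

    Assumption: the values are treated as independent draws with the rule held fixed.
    """
    values = np.asarray(values, dtype=float)
    est = values.mean()
    se = values.std(ddof=1) / math.sqrt(len(values))
    return float(est), float(se), (float(est - z * se), float(est + z * se))

# Hypothetical per-subject losses under two rules evaluated on the same subjects.
loss_1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
loss_2 = np.array([1.5, 2.5, 2.5, 4.5, 6.0])

est, se, ci = mean_with_ci(loss_1)             # inference about D
d_est, d_se, d_ci = mean_with_ci(loss_1 - loss_2)  # inference about D1 - D2 via paired differences
print(est, se, ci)
print(d_est, d_se, d_ci)
```

Using paired differences for the comparison keeps the correlation between the two rules' losses on the same subject, which usually makes the interval for the difference much tighter than the intervals for each rule separately.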


Bias Correction

Bias issue in the apparent error type estimators

Bias correction via cross-validation:
• Data partition $T_k$, $V_k$
• For each partition
– Obtain $\hat{\beta}_{(-k)}$ based on observations in $T_k$
– Obtain $\hat{D}_{(k)}(\hat{\beta}_{(-k)})$ based on observations in $V_k$
• Obtain the cross-validated estimator

$$\tilde{D} = K^{-1} \sum_{k=1}^{K} \hat{D}_{(k)}(\hat{\beta}_{(-k)})$$

$n^{1/2}\{\hat{D}(\hat{\beta}) - D\}$ and $n^{1/2}(\tilde{D} - D)$ have the same limiting distribution
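The cross-validation steps can be sketched as follows, assuming a linear working model fitted by least squares and the absolute prediction error as the distance. The function name `cross_validated_error`, the random fold assignment and the simulated data are all choices made for this illustration.

```python
import numpy as np

def cross_validated_error(X, y, k_folds=10, seed=0):
    """K-fold cross-validated mean absolute prediction error for a linear working model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k_folds)  # validation sets V_k
    fold_errors = []
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False                       # training set T_k
        X_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        beta, *_ = np.linalg.lstsq(X_train, y[train_mask], rcond=None)
        X_val = np.column_stack([np.ones(len(val_idx)), X[val_idx]])
        # Error on V_k using coefficients estimated without V_k.
        fold_errors.append(np.mean(np.abs(y[val_idx] - X_val @ beta)))
    return float(np.mean(fold_errors))                    # average over the K folds

# Simulated data (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=200)

d_tilde = cross_validated_error(X, y, k_folds=10)
print(d_tilde)
```

Because each validation observation is predicted by a model that never saw it, the result avoids the optimism of the apparent error, at the cost of fitting the model K times.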


Example: AIDS Clinical Trial

Objective: identify biomarkers to predict the treatment response

Outcome: Y = CD4 week 24

Predictors Z: Age, CD4 week 0, CD4 week 8, RNA week 0, RNA week 8

Working model: $E(Y \mid Z) = \beta' Z$


Example: AIDS Clinical Trial, Incremental Value of RNA

Estimates

              Full Model   w/o RNA    Gain Due to RNA
Apparent      51 (2.7*)    52 (2.7)   -0.61 (0.61)
10-fold CV    52           53         -0.64
2n/3 CV       53           53         -0.28

95% C.I.

              Full Model   w/o RNA    Gain Due to RNA
Apparent      [46, 56]     [47, 57]   [-2.0, 0.4]
10-fold CV    [47, 57]     [48, 58]   [-2.0, 0.4]
2n/3 CV       [48, 58]     [48, 58]   [-1.5, 0.9]

* : Std Error Estimates


Incremental Value of RNA within Various Sub-populations


Trandolapril Cardiac Evaluation Study

(Kober et al 2005, NEJM)

• Prognostic importance of the left ventricular dysfunction
– Thune et al (2005): Diamond study
– Trace study (Kober et al 2005, NEJM)

• Designed to determine whether patients w/ left ventricular dysfunction soon after myocardial infarction benefit from long-term oral ACE inhibition

• Between 1990 and 1992, a total of 6676 patients with myocardial infarction were screened with echocardiography

• A total of 5921 subjects had available data


Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)

• Routine markers include:
– Age
– creatine (CRE)
– occurrence of heart failure (CHF)
– history of diabetes (DIA)
– history of hypertension (HYP)
– cardiogenic shock after MI (KS)

• We are interested in evaluating the incremental value of wall motion index (WMI)


Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)

          Age    CRE     CHF    DIA    HYP    KS      WMI
Est       .055   -.010   .759   .718   .187   1.153   -1.097
SE        .004   .002    .067   .101   .073   .163    .083
P-value   .000   .000    .000   .000   .010   .000    .000

• Does WMI improve the prediction of 5-year survival?


Population Average Incremental Value of WMI: Predicting 5-year Survival

5-year mortality rate = 42%

                                    OME
Routine markers w/o WMI             0.28
Markers including WMI               0.26
Population gain attributed to WMI   0.02


$$D(Y, \hat{Y}) = I(Y \ne \hat{Y}) = I(Y = 0, \hat{Y} = 1) + I(Y = 1, \hat{Y} = 0)$$

[Figure: $D_1$ and $D_2$]


$$D(Y, \hat{Y}) = I(Y \ne \hat{Y}) = I(Y = 0, \hat{Y} = 1) + I(Y = 1, \hat{Y} = 0)$$


[Figure: Gain due to WMI; panels: error of $Y = 0$ and $\hat{Y} = 1$, error of $Y = 1$ and $\hat{Y} = 0$]


[Figure: Gain due to WMI with respect to $D(Y, \hat{Y})$, a weighted combination of $I(Y = 0, \hat{Y} = 1)$ and $I(Y = 1, \hat{Y} = 0)$; panels for weight = 1, 4 and 9]


Example: Breast Cancer Gene Expression Study

Objective: construct a new classifier that can accurately predict future disease outcome

van’t Veer et al (2002) established a classifier based on a 70-gene profile
• Good- or poor-prognosis signature based on their correlation with the previously determined average profile in tumors from patients with good prognosis
• Classify subjects as
– Good prognosis if gene score > cut-off
– Poor prognosis if gene score < cut-off

van de Vijver et al (2002) evaluated the accuracy of this classifier by using hazard ratios and signature-specific Kaplan-Meier curves


Example: Breast Cancer Gene Expression Study

Data consist of 295 subjects
• Outcome T: time to death
• Predictors: lymph-node status, estrogen receptor status, gene score

We are interested in
• Constructing prediction rules for identifying subjects who would survive t years, $Y = I(T \ge t) = 1$
• Evaluating the incremental value of the gene score


Example: Breast Cancer Data, Predicting 10-year Survival

Model                   Apparent Error   10-fold CV   Random CV
Naïve                   0.30 (0.031)     0.29         0.30
Clinical only           0.28 (0.033)     0.30         0.28
Clinical + Gene Score   0.25 (0.036)     0.27         0.28
Van de Vijver           0.35 (0.050)


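Evaluating the Prediction Rule Based on Various Accuracy Measures

For a future patient with $T_0$ and $Z_0$, we predict
• $T_0 \le t$ if $g(\hat{\beta}' Z_0) \ge c$
• $T_0 > t$ if $g(\hat{\beta}' Z_0) < c$

Classification accuracy measures
• Sensitivity: $SE(c) = P\{g(\hat{\beta}' Z_0) \ge c \mid T_0 \le t\}$
• Specificity: $SP(c) = P\{g(\hat{\beta}' Z_0) < c \mid T_0 > t\}$

Prediction accuracy measures
• $PPV(c) = P\{T_0 \le t \mid g(\hat{\beta}' Z_0) \ge c\}$
• $NPV(c) = P\{T_0 > t \mid g(\hat{\beta}' Z_0) < c\}$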
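The four accuracy measures can be computed empirically once a risk score and a cut-off are chosen. The sketch below assumes that the event status by time t is fully observed for every subject, so it ignores censoring, which a real survival analysis would need to handle. It also assumes the score is a predicted risk and that "positive" means the event occurred by t. The function name and the toy data are hypothetical.

```python
import numpy as np

def accuracy_measures(score, event, c):
    """Empirical SE, SP, PPV and NPV of the rule 'predict event if score >= c'.

    score: predicted risk for each subject.
    event: True if the subject had the event by time t (assumed fully observed).
    Assumes every cell of the 2 x 2 table has at least one subject.
    """
    score = np.asarray(score, dtype=float)
    event = np.asarray(event, dtype=bool)
    pred = score >= c
    tp = np.sum(pred & event)
    fp = np.sum(pred & ~event)
    fn = np.sum(~pred & event)
    tn = np.sum(~pred & ~event)
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }

# Hypothetical toy data.
score = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
event = [1, 1, 0, 1, 0, 0]
measures = accuracy_measures(score, event, c=0.5)
print(measures)
```

Sensitivity and specificity condition on the true status, while PPV and NPV condition on the prediction, so the latter two also depend on how common the event is.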


Example: Breast Cancer Data, Predicting 10-year Survival

[Figure legend: Naïve, Clinical, Clinical + Gene, van de Vijver]


Example: Breast Cancer Data

To compare
• Model II: g(a + Node + ER)
• Model III: g(a + Node + ER + Gene)

Choosing cut-off values for each model to achieve SE = 69%, which is an attainable value for Model II, gives

                            SP             PPV            NPV
Model II                    0.45           0.35           0.77
Model III                   0.75           0.54           0.85
95% CI for the difference   [0.11, 0.45]   [0.01, 0.24]   [0.06, 0.19]
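Comparing models at a common sensitivity requires a cut-off per model. A minimal sketch of that step is shown below, assuming fully observed event status (no censoring) and a score where larger values mean higher predicted risk. The helper names and the toy scores are hypothetical.

```python
import numpy as np

def cutoff_for_sensitivity(score, event, target_se):
    """Largest cut-off c with empirical P(score >= c | event) of at least target_se.

    Assumes 0 < target_se <= 1 and at least one subject with the event.
    """
    score = np.asarray(score, dtype=float)
    event = np.asarray(event, dtype=bool)
    pos = np.sort(score[event])[::-1]                    # event scores, descending
    m = max(1, int(np.ceil(target_se * len(pos))))       # events that must be flagged
    return float(pos[m - 1])

def specificity(score, event, c):
    """Empirical P(score < c | no event)."""
    score = np.asarray(score, dtype=float)
    event = np.asarray(event, dtype=bool)
    return float(np.mean(score[~event] < c))

# Hypothetical toy data: four subjects with the event, four without.
score = [0.9, 0.7, 0.5, 0.3, 0.8, 0.4, 0.2, 0.1]
event = [1, 1, 1, 1, 0, 0, 0, 0]

c = cutoff_for_sensitivity(score, event, target_se=0.5)
sp = specificity(score, event, c)
print(c, sp)
```

Applying the same target sensitivity to each model's score and then comparing the resulting specificities gives a like-for-like comparison of the kind shown in the table above.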


Prediction Interval: Accounting for the Precision of the Prediction

Based on a prediction model
• predict the response $Y_0$ as $\hat{Y}(Z_0)$
• summarize the corresponding population average accuracy by $\hat{D}$, which estimates $D = E[d\{Y_0, \hat{Y}(Z_0)\}]$

What if the population average accuracy of 70% is not satisfactory? How to achieve 90% accuracy?

What if $\hat{Y}(Z_0)$ can predict $Y_0$ more precisely for certain $Z_0$, while for others it fails to predict $Y_0$ accurately?

Account for the precision of the prediction? Identify patients who would need further assessment?


Classic rule:
• Risk of death < 0.50: Survivor {Y=0}
• Risk of death ≥ 0.50: Non-survivor {Y=1}

Predicted risk = 0.04: {0}
Predicted risk = 0.51: {1}


Prediction Interval

To account for patient-level prediction error, one may instead predict $Y_0 \in \hat{K}(Z_0)$ such that $P\{Y_0 \in \hat{K}(Z_0) \mid Z_0\}$ reaches a prespecified level.

The optimal interval for the population with $Z_0$ is

$$\hat{K}(Z_0) = \{y : \hat{f}(y \mid Z_0) \ge c\}$$

$\hat{f}(y \mid Z_0)$: estimated conditional density function
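For a binary outcome the idea can be sketched very simply: collect outcome values in decreasing order of estimated probability until the set reaches the target coverage. This is equivalent to thresholding the estimated conditional probabilities at some cut-off. The sketch treats the estimated risk as if it were the true conditional probability, which is an assumption of this illustration, and `prediction_set` is a hypothetical helper name. It never returns the empty set.

```python
def prediction_set(risk, level=0.90):
    """Smallest set of binary outcomes whose estimated probability is at least `level`.

    risk: estimated P(Y = 1 | Z) for one patient, treated here as if it were exact.
    """
    probs = {1: risk, 0: 1.0 - risk}
    chosen, covered = set(), 0.0
    # Add the most probable outcome first, then the other one only if needed.
    for label in sorted(probs, key=probs.get, reverse=True):
        if covered >= level:
            break
        chosen.add(label)
        covered += probs[label]
    return chosen

print(prediction_set(0.04))  # low predicted risk
print(prediction_set(0.51))  # predicted risk close to one half
```

A patient whose set contains both outcomes is one for whom the model cannot commit at the requested level, which is a natural flag for further assessment.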


Example: Breast Cancer Study

Data: 295 patients

Response: 10-year survival

Predictors: lymph-node status, estrogen receptor status, gene score

Model: $P(T \le 10 \mid Z) = g(\beta' Z)$

Possible prediction sets: {}, {0}, {1}, {0,1}

Classic prediction: considers {0}, {1} only.


Predicted risk = 0.51: 90% prediction set {0,1}
Predicted risk = 0.04: 90% prediction set {0}


Example: Breast Cancer Study, Prediction Sets Based on Clinical + Gene Score

[Figure: percentages by prediction set; labels (0%), (63%), (37%) and 4%, 39%, 57%]


Remarks

• Proper choice of the accuracy/cost measure
– Classification accuracy vs predictive values
– Utility function: what is the consequence of predicting a subject with outcome $Y$ as $\hat{Y}(Z)$?

• With an expensive or invasive marker
– Should it be applied to the entire population?
– Is it helpful for a certain sub-population?
– Should the cost of the marker be considered when evaluating its value?