Logical Analysis of Data (LAD) Applied to Mass Spectrometry Data to Predict Rate of Decline of Kidney Function
M. Lipkowitz¹, M. Subasi², E. Subasi², V. Anbalagan¹, W. Zhang¹, P.L. Hammer², J. Roboz¹ and the AASK Investigators
¹Mount Sinai School of Medicine, NY, NY
²RUTCOR, Rutgers Center for Operations Research, Piscataway, NJ
DIMACS-RUTCOR Workshop on Boolean and Pseudo-Boolean Functions in Memory of Peter L. Hammer
January, 2009
Acknowledgements
• 1,094 Participants
• Investigators and Staff at 21 AASK Clinical Centers and Coordinating Center
• Sponsors
– NIDDK
– NIH Office on Research in Minority Health
– King Pharmaceuticals
Why worry about chronic kidney disease???
Prevalence of Renal Disease in US (Age > 20 yrs, NHANES III)
Stage          GFR      Prevalence
ESRD                    300,000
Severe CKD     15-29    400,000
Moderate CKD   30-59    7-12 million
Mild CKD       60-89    55 million
Normal         > 90     114 million
Creat > 1.3-1.4 (men), Creat > 1.1-1.2 (women)
Adapted from: Coresh et al, AJKD 41:1-12, 2003
Risk of Death and Cardiovascular Disease in CKD
Go et al. N Engl J Med 2004;351:1296-305.
Life Expectancy in ESRD
[Figure: life expectancy (yrs) at age 49 for Total, Prostate Cancer, Colon Cancer, ESRD, Lung Cancer]
African American Study of Kidney Disease and Hypertension (AASK)
Motivated by the high incidence of kidney disease in African Americans with hypertension
Extremely hard to recruit: 500,000 medical records screened to recruit 1,094 participants
Two Phases of AASK
• Phase 1: Randomized trial (completed Sept 2001)
– 1,094 African-Americans with non-diabetic, hypertensive CKD (baseline GFR of 20-65 ml/min/1.73 m²)
– Demonstrated that one class of BP medications, ACE inhibitors, slowed progression of kidney disease
• Phase 2: Observational cohort (completed June 2007)
– One Objective: document the long-term effects of trial interventions on CKD events
– Therapy: all participants received recommended BP therapy:
  • ACEi (or ARB)
  • BP goal < 130/80 mmHg
Main Results of Phase 1
• Trial results published in JAMA 2002
– ACEi was more effective than CCBs and BBs in slowing progression of hypertensive renal disease
– Largest difference seen in participants with UP/Cr > 0.22 (>300 mg/24h)
– No difference between participants randomized to lower MAP goal <92 mmHg vs 102-107 mmHg regardless of UP/Cr
Second Phase of AASK
Cohort study (completed 6/07)
– One Objective: document the long-term effects of trial interventions on CKD events
– Therapy: all participants received recommended BP therapy:
  • ACEi (or ARB)
  • BP goal < 130/80 mmHg
– Primary composite outcome:
  • doubling of serum Cr from the trial baseline, ESRD, or death across both trial and cohort phase
Event Rates: Trial and Cohort
Conclusion:
ACE inhibition does slow progression of CKD.
However, the residual progression rate on best therapy is unacceptable!
Heterogeneity of Progression of CKD
Glomerular Filtration Rate (GFR)
A measure of kidney function
Normal is 100 ml/min/1.73 m²
GFR slope
We use rate of decline of GFR as our main measure of progression
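To make "GFR slope" concrete, here is a minimal sketch that fits a straight line to serial GFR measurements and reports its slope. The ordinary least-squares fit, the function name `gfr_slope`, and the sample measurements are assumptions for illustration; the slides do not state how the AASK slopes were computed.

```python
def gfr_slope(times_yr, gfr_values):
    """Least-squares slope of GFR versus time (GFR units per year)."""
    n = len(times_yr)
    if n < 2 or n != len(gfr_values):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times_yr) / n
    mean_g = sum(gfr_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_yr)
    if sxx == 0:
        raise ValueError("measurement times must not all be equal")
    sxy = sum((t - mean_t) * (g - mean_g)
              for t, g in zip(times_yr, gfr_values))
    return sxy / sxx


# Hypothetical patient: GFR measured every six months for two years.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
gfr = [50.0, 47.0, 44.0, 41.0, 38.0]
slope = gfr_slope(times, gfr)
print(slope)  # prints -6.0
```

A negative slope means declining kidney function; a slope near zero or positive would correspond to a non-progressor.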
Clinical Case 1
ACEi, good BP control, 1 gm proteinuria
ACEi, sub-optimal BP control, Uprot 1.1 g/24 h
Clinical Case 2
[Figure panels: Blood Pressure; eGFR]
How do we find the “Rapid Progressors” and “Non-progressors”?
Heterogeneity in Chronic GFR Slope
Urine Protein, Our Current Best Predictor, Is Not Adequate
A Serum Proteomics Approach
• Use SELDI-TOF Mass Spectrometry to detect serum proteins
• Use Logical Analysis of Data (LAD), a special data analysis methodology which combines ideas and concepts from optimization, combinatorics, and Boolean functions
The Data Set
                 Rapid Progressors (n=60)   Slow Progressors (n=60)   p-value
Chronic Slope    -6.60 ± 1.36               +2.18 ± 1.12              <0.00001
GFR              45.05 ± 11.97              53.45 ± 11.50             <0.0001
Proteinuria      1.09 ± 1.35                0.09 ± 0.19               <0.00001
Age              50.85 ± 11.95              53.35 ± 9.51              NS
Weight           95.75 ± 22.75              86.23 ± 20.82             NS
*Matched for randomized drug class
SELDI-TOF
SELDI Data
[Figure: SELDI spectrum with a labeled insulin peak]
Logical Analysis of Data (LAD)
• Non-statistical method based on
– Combinatorics
– Optimization
– Logic
• Initiated by Peter L. Hammer in 1988.
• Has been applied to numerous disciplines: economics and business, seismology, oil exploration, medicine.
LAD Approximation
[Figure: Hidden Function; Dataset; LAD Approximation]
Main Components of LAD
• Discretization
• Support set
• Pattern generation
• Model
• Prediction
Discretization
Feasible set of cut-points
Minimum set of cut-points
Set covering
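As a rough illustration of the first discretization step, the sketch below lists candidate cut-points for one numerical attribute and binarizes the attribute at a chosen cut. Placing a candidate at the midpoint between consecutive distinct values that carry different classes is a common convention and an assumption here; the function names and sample intensities are hypothetical. Choosing a minimum subset of such cut-points is then the set-covering step named on the slide.

```python
def candidate_cut_points(values, labels):
    """Midpoints between consecutive distinct values that involve different classes."""
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        # Keep the cut only if it can separate observations of different classes.
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def binarize(values, cut):
    """Binary attribute: 1 if the value lies above the cut-point, else 0."""
    return [1 if v > cut else 0 for v in values]


# Hypothetical peak intensities; 1 = rapid progressor, 0 = slow progressor.
values = [1.0, 2.0, 3.0, 5.0, 6.0, 8.0]
labels = [1, 1, 1, 0, 0, 1]
cuts = candidate_cut_points(values, labels)
print(cuts)                       # prints [4.0, 7.0]
print(binarize(values, cuts[0]))  # prints [0, 0, 0, 1, 1, 1]
```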
Support Set
• Smallest (cardinality) subset of attributes which is sufficient to distinguish between the positive and negative observations.
• Finding a support set is a set-covering problem!
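The set-covering view can be sketched as follows: every (positive, negative) pair of observations must differ on at least one chosen attribute. The code uses a greedy heuristic, which is an assumption for illustration (it does not guarantee a minimum-size set, and the slides do not say which solver was used). The data are the six binary observations A to F from the pattern example in this deck.

```python
from itertools import product


def greedy_support_set(positives, negatives):
    """Greedily pick attribute indices until every positive/negative pair is separated."""
    n_attrs = len(positives[0])
    uncovered = set(product(range(len(positives)), range(len(negatives))))
    chosen = []

    def gain(a):
        # Number of still-unseparated pairs that attribute a distinguishes.
        return sum(1 for i, j in uncovered if positives[i][a] != negatives[j][a])

    while uncovered:
        best = max(range(n_attrs), key=gain)
        if gain(best) == 0:
            raise ValueError("a positive and a negative observation are identical")
        chosen.append(best)
        uncovered = {(i, j) for i, j in uncovered
                     if positives[i][best] == negatives[j][best]}
    return sorted(chosen)


positives = [(1, 0, 0, 1, 1, 1), (1, 1, 0, 1, 0, 0), (0, 0, 1, 0, 1, 0)]  # A, B, C
negatives = [(1, 0, 1, 0, 1, 1), (0, 0, 0, 1, 1, 1), (1, 1, 1, 0, 0, 0)]  # D, E, F
support = greedy_support_set(positives, negatives)
print([f"x{a + 1}" for a in support])  # prints ['x1', 'x3']
```

In this small table no single attribute separates all nine pairs, so the two attributes found by the heuristic are also a minimum support set.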
Patterns
[Figure: Positive Pattern; Negative Pattern]
Pattern Characteristics
               Attributes                 Outcome
Observation    x1  x2  x3  x4  x5  x6
A              1   0   0   1   1   1      1
B              1   1   0   1   0   0      1
C              0   0   1   0   1   0      1
D              1   0   1   0   1   1      0
E              0   0   0   1   1   1      0
F              1   1   1   0   0   0      0
Positive Pattern Covering A: i) covers A; ii) does not cover D, E, F
P: $x_1 = 1 \text{ AND } x_3 = 0$
Coverage(P) = number of observations covered by P
Degree(P) = number of conditions in P
Homogeneity(P) = proportion of positive observations among those P covers
Prevalence(P) = ratio of the number of positive observations covered by P to the total number of positive observations
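The four pattern characteristics can be checked on the six-observation table with a short sketch. It assumes the pattern P reads x1 = 1 AND x3 = 0; the dictionary layout and helper name `covers` are hypothetical.

```python
# Observations A-F: (x1..x6), outcome (1 = positive, 0 = negative).
observations = {
    "A": ((1, 0, 0, 1, 1, 1), 1),
    "B": ((1, 1, 0, 1, 0, 0), 1),
    "C": ((0, 0, 1, 0, 1, 0), 1),
    "D": ((1, 0, 1, 0, 1, 1), 0),
    "E": ((0, 0, 0, 1, 1, 1), 0),
    "F": ((1, 1, 1, 0, 0, 0), 0),
}

# Pattern as {attribute index: required value}; assumed to be x1 = 1 AND x3 = 0.
pattern = {0: 1, 2: 0}


def covers(pattern, x):
    """True if observation x satisfies every condition of the pattern."""
    return all(x[idx] == val for idx, val in pattern.items())


covered = [name for name, (x, _) in observations.items() if covers(pattern, x)]
positives_covered = sum(1 for name in covered if observations[name][1] == 1)
total_positives = sum(1 for _, y in observations.values() if y == 1)

coverage = len(covered)
degree = len(pattern)
homogeneity = positives_covered / coverage if coverage else 0.0
prevalence = positives_covered / total_positives

print(covered, coverage, degree, homogeneity, round(prevalence, 3))
```

Under this reading, P covers A and B only, so it is a pure positive pattern of degree 2 that accounts for two of the three positive observations.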
Theory
[Figure: Positive Theory; Negative Theory]
LAD Model
[Figure: Positive area; Negative area; Unexplained area; Discordant area]
A good LAD Model!
• Small # of features
• High quality patterns
– Small degree
– High prevalence
– High homogeneity
• Small # of patterns
LAD Prediction
• Model: $P_1, P_2, \dots, P_p$ ; $N_1, N_2, \dots, N_n$
• Discriminant:
$$\Delta(x) = \frac{1}{p}\sum_{i=1}^{p} P_i(x) - \frac{1}{n}\sum_{j=1}^{n} N_j(x)$$
• Prediction: based on the sign of the discriminant.
• The discriminant is used not only for prediction, but also as an effective risk score!
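A minimal sketch of sign-based prediction follows. It assumes the discriminant is the fraction of positive patterns that cover x minus the fraction of negative patterns that cover x; the toy patterns over binary attributes and the "unclassified" label for a zero discriminant are hypothetical choices, not taken from the slides.

```python
def make_pattern(conditions):
    """Return a function x -> 1 if x meets every {index: value} condition, else 0."""
    def pattern(x):
        return int(all(x[i] == v for i, v in conditions.items()))
    return pattern


def discriminant(x, positive_patterns, negative_patterns):
    """Share of positive patterns covering x minus share of negative patterns covering x."""
    pos = sum(p(x) for p in positive_patterns) / len(positive_patterns)
    neg = sum(n(x) for n in negative_patterns) / len(negative_patterns)
    return pos - neg


def predict(x, positive_patterns, negative_patterns):
    d = discriminant(x, positive_patterns, negative_patterns)
    if d > 0:
        return "positive"
    if d < 0:
        return "negative"
    return "unclassified"  # assumption: a zero discriminant gives no prediction


# Hypothetical patterns on attributes x1 (index 0) and x3 (index 2).
P = [make_pattern({0: 1, 2: 0}), make_pattern({0: 0, 2: 1})]
N = [make_pattern({0: 1, 2: 1}), make_pattern({0: 0, 2: 0})]

x_new = (1, 0, 0, 1, 0, 1)
print(discriminant(x_new, P, N), predict(x_new, P, N))  # prints 0.5 positive
```

Because the discriminant is a graded value rather than a yes/no answer, it can be ranked across patients, which is what allows it to serve as a risk score.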
LAD Software
• Sorin Alexe, Datascope
http://rutcor.rutgers.edu/~salexe/LAD_kit/SETUP-LAD-DS-SE20.zip
• Pierre Lemaire, Ladoscope
http://www.kamick.org/lemaire/LAD
LAD Applied to AASK Data
• Generates groups of “combinatorial biomarkers”
– Pairs of SELDI peak intensities that are either “positive” (predict rapid progression) or “negative” (predict slow progression) biomarkers
– Groups of these “combinatorial biomarkers” are combined to create a model that predicts outcomes
– The small number of pairs of peaks potentially provides targets for future research
The ‘Support Set’
• 5751 SELDI protein peaks
– 7 are enough to predict outcomes
Attribute   Correlation   Correlation rank
M2018       0.039         4115
M2756       0.260         16
M2780       0.252         28
M5266       0.065         3290
M9940       0.194         348
M11274      0.133         1565
M11752      0.192         378
The LAD Model
Pattern-defining conditions are on the attributes M2018, M2756, M2780, M5266, M9940, M11274, M11752 and are listed in left-to-right column order.
Pattern   Prevalence (positive)   Prevalence (negative)   Homogeneity   Hazard Ratio   Defining conditions
P1        33 (57.89%)             10 (16.95%)             78.57%        2.42           < 0.575; > 0.055
P2        32 (56.14%)             8 (13.56%)              80%           2.43           < 3.835; > 2.78
P3        32 (56.14%)             9 (15.25%)              78.05%        2.34           > 0.49; < 0.515
N1        11 (19.30%)             39 (66.10%)             78%           2.57           > 1.705; > 0.465
N2        6 (10.53%)              31 (52.54%)             85.71%        2.39           > 0.235; < 0.115
N3        8 (14.04%)              35 (59.32%)             81.4%         2.48           > 1.295; > 0.515
N4        7 (12.28%)              31 (52.54%)             83.33%        2.3            > 0.425; < 2.78
Validation of the LAD Model
                  Accuracy   Sensitivity   Specificity   Hazard Ratio
Basic LAD model   80.6%      78.4%         78.5%         2.72
“10-folding” experiments:
• patients randomly divided into 10 equal groups
• use data from 9 groups to predict outcomes in the 10th
• repeat for each group
• randomly re-divide and repeat × 10 (100 total runs)
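The "10-folding" procedure can be sketched as a generator of train/test index splits. This shows only the splitting logic; the function name `repeated_k_fold`, the fixed seed, and the use of 120 patients (60 rapid plus 60 slow progressors, as in the data set slide) are assumptions for illustration, and no LAD model is fitted here.

```python
import random


def repeated_k_fold(n_items, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    for _ in range(repeats):
        rng.shuffle(indices)                        # randomly re-divide the patients
        folds = [indices[f::k] for f in range(k)]   # k groups of (near-)equal size
        for f in range(k):
            test = sorted(folds[f])                 # held-out group
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


runs = list(repeated_k_fold(120))
print(len(runs))                         # prints 100
print(len(runs[0][0]), len(runs[0][1]))  # prints 108 12
```

Each of the 100 runs would train a model on the 108 training indices and score it on the 12 held-out ones; accuracy, sensitivity, and specificity are then averaged over the runs.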
Discriminants as Risk Scores
Group   # of observations   Percentage of Rapid Progressors   Average Risk Score
1       23                  0%                                0.087
2       23                  26.09%                            0.275
3       23                  56.52%                            0.498
4       23                  69.57%                            0.697
5       24                  91.67%                            0.924
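A table of this shape can be produced by sorting patients by risk score and summarizing each fifth. The sketch below is one way to do it; putting the leftover observations in the last group (which gives sizes 23, 23, 23, 23, 24 for 116 patients) is an assumption, and the scores and outcomes are synthetic placeholders, not AASK data.

```python
def summarize_by_quintile(scores, is_rapid, n_groups=5):
    """Sort by score, split into n_groups, and report size, % rapid, and mean score."""
    if len(scores) < n_groups or len(scores) != len(is_rapid):
        raise ValueError("need paired inputs with at least n_groups observations")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    base = len(order) // n_groups
    summary = []
    for g in range(n_groups):
        start = g * base
        # Assumption: any remainder goes into the last (highest-risk) group.
        end = (g + 1) * base if g < n_groups - 1 else len(order)
        members = order[start:end]
        summary.append({
            "group": g + 1,
            "n": len(members),
            "pct_rapid": 100.0 * sum(is_rapid[i] for i in members) / len(members),
            "mean_score": sum(scores[i] for i in members) / len(members),
        })
    return summary


# Synthetic placeholder data: 116 patients, higher score -> rapid progressor.
scores = [i / 115 for i in range(116)]
is_rapid = [1 if i >= 58 else 0 for i in range(116)]
summary = summarize_by_quintile(scores, is_rapid)
for row in summary:
    print(row["group"], row["n"], round(row["pct_rapid"], 2), round(row["mean_score"], 3))
```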
Outcomes by Quintile of “Risk Score”
[Figure: panels LAD and Upro/UCr; Proportion of Rapid Progressors vs Average Risk Index, plotted by Quintiles of Risk Index (1 to 5)]
LAD vs Proteinuria to Predict Progression
• Both work well to find rapid progressors
– >95% of patients with high risk or high protein progress
• LAD Risk Score better defines slow progressors
– None with lowest LAD risk score progress
– 16% with lowest protein progress
– In fact, the degree of proteinuria in the 3 lowest quintiles may not be distinguishable on repeated testing, so progression could be up to 40%
Future Studies
1. Expand this pilot SELDI study to the full AASK data set (800 samples).
• If data are reproducible this could lead to a clinical test for progression rate.
2. The ultimate goal: isolate and identify components of combinatorial biomarkers
• This will hopefully lead to new therapeutic targets for drug development
• Identification of proteins is difficult, and LAD limits the number to identify