
A decision support system to facilitate management of patients with acute gastrointestinal bleeding


Artificial Intelligence in Medicine (2008) 42, 247—259

http://www.intl.elsevierhealth.com/journals/aiim


Adrienne Chu a,1, Hongshik Ahn a,1, Bhawna Halwan b,1, Bruce Kalmin c, Everson L.A. Artifon d, Alan Barkun e, Michail G. Lagoudakis f, Atul Kumar g,*

a Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, United States
b SUNY Downstate, Brooklyn, NY 11203, United States
c Division of Gastroenterology, Medical University of South Carolina, Charleston, SC 29425, United States
d University of Sao Paulo School of Medicine, Sao Paulo, Brazil
e McGill University, Montreal, Canada H3A 2T5
f Intelligent Systems Laboratory, Department of Electronic and Computer Engineering, Technical University of Crete, Kounoupidiana, 73100 Chania Hellas, Greece
g United States Department of Veterans Affairs, Stony Brook University, Stony Brook, NY 11794, United States

Received 19 January 2007; received in revised form 25 September 2007; accepted 6 October 2007

KEYWORDS: Class prediction; Cross validation; Gastrointestinal bleeding; Machine learning

Summary

Objective: To develop a model to predict the bleeding source and identify the cohort amongst patients with acute gastrointestinal bleeding (GIB) who require urgent intervention, including endoscopy. Patients with acute GIB, an unpredictable event, are most commonly evaluated and managed by non-gastroenterologists. Rapid and consistently reliable risk stratification of patients with acute GIB for urgent endoscopy may potentially improve outcomes amongst such patients by targeting scarce healthcare resources to those who need them the most.

Design and methods: Using ICD-9 codes for acute GIB, 189 patients with acute GIB and all available data variables required to develop and test models were identified from a hospital medical records database. Data on 122 patients were used to develop the models and data on 67 patients were used to perform a comparative analysis of the models. Clinical data such as presenting signs and symptoms, demographic data, presence of co-morbidities, laboratory data and corresponding endoscopic diagnosis and outcomes were collected. The clinical data and endoscopic diagnosis collected for each patient were used to retrospectively ascertain optimal management for that patient. Clinical presentations and the corresponding treatment were used as training examples. Eight mathematical models, including artificial neural network (ANN), support vector machine (SVM), k-nearest neighbor, linear discriminant analysis (LDA), shrunken centroid (SC), random forest (RF), logistic regression, and boosting, were trained and tested. The performance of these models was compared using standard statistical analysis and ROC curves.

Results: Overall the random forest model best predicted the source, need for resuscitation, and disposition with accuracies of approximately 80% or higher (accuracy for endoscopy was greater than 75%). The area under the ROC curve for RF was greater than 0.85, indicating excellent performance by the random forest model.

Conclusion: While most mathematical models are effective as a decision support system for evaluation and management of patients with acute GIB, in our testing, the RF model consistently demonstrated the best performance. Amongst patients presenting with acute GIB, mathematical models may facilitate the identification of the source of GIB and the need for intervention, and allow optimization of care and healthcare resource allocation; these, however, require further validation.
© 2007 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +1 631 880 8510; fax: +1 631 486 6113. E-mail address: [email protected] (A. Kumar).

1 These authors contributed equally to this study.

0933-3657/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.artmed.2007.10.003

1. Introduction

Acute gastrointestinal bleeding (GIB) continues to be a significant healthcare problem due to rising NSAID use and an aging population [1]. Further reductions in mortality will most likely require introduction of novel strategies to aid identification of the cohort requiring aggressive resuscitation and endoscopic intervention to prevent complications and death from ongoing bleeding [2,3]. Delays in intervention usually result from failure to adequately recognize the source and severity of the bleed. Although several models and scores have been developed to risk-stratify patients, no single model aids the identification of the subgroup of patients presenting with acute upper or lower GIB that would most likely benefit from urgent intervention [4,5]. Our goal was to utilize mathematical models to formulate a decision support system utilizing clinical and laboratory information available within a few hours of patient presentation to predict the source, need for intervention and disposition in patients with acute upper, mid- and lower GIB.

There are a number of well-known classification tools for analyzing data, including artificial neural networks (ANN), k-nearest neighbor (kNN), decision trees, support vector machines (SVM), and shrunken centroid (SC) [6,7]. These classification tools are supervised learning methods in which the algorithm learns from a training set and establishes a prediction rule to classify new samples using statistical approaches for class prediction. Logistic regression, an alternative to these classification methods, and Fisher's linear discriminant analysis (LDA) are both parametric approaches. Logistic regression and LDA do not differ in functional form, and differ only in the estimation of coefficients; LDA assumes normal distribution of the explanatory variables, while logistic regression does not. If the Gaussian assumptions are met, then LDA is a more powerful and efficient model than logistic regression. Standard statistical model building relies on an a priori collection of predictor variables for identifying outcomes of clinical interest. Often it may be computationally impossible for standard models to overcome the complexities of problems with large dimensionality. Support vector machines (SVM), introduced by Vapnik [6], can overcome the high dimensionality problem computationally and are consistently good classifiers, and hence are widely utilized as a classification method. Following the recent introduction of ensemble-voting approaches, two ensemble voting methods, boosting and bagging, have also gained wide popularity [8,9]. An ensemble uses the predictions of multiple base classifiers through majority voting. Boosting, a meta-classifier, combines weak classifiers and takes a weighted majority vote of their predictions. Breiman [10] developed the random forest (RF) method by combining classification tree predictors. The bagging algorithm in RF uses bootstrap samples to build base trees. Each bootstrap sample is formed by randomly sampling, with replacement, the same number of observations as the training set. The final classification produced by the ensemble of these base classifiers is obtained using equal weight voting.
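The bagging step described above (bootstrap samples of the same size as the training set, followed by equal-weight voting) can be sketched as follows. This is only a minimal illustration, not the random forest implementation used in the study: the names `bootstrap_sample`, `majority_vote`, `bagged_predict` and the `fit` callable are hypothetical, and the per-node random selection of variables that RF adds on top of bagging is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) observations with replacement (one bootstrap sample)."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def majority_vote(predictions):
    """Equal-weight vote: return the most frequent predicted label."""
    return Counter(predictions).most_common(1)[0][0]


def bagged_predict(train_rows, x, fit, n_estimators=25, seed=0):
    """Fit one base classifier per bootstrap sample and vote on x.

    `fit` is a hypothetical callable supplied by the caller: it takes a list
    of training rows and returns a function mapping a feature vector to a
    class label (for example, a fitted classification tree).
    """
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        model = fit(bootstrap_sample(train_rows, rng))
        votes.append(model(x))
    return majority_vote(votes)
```

Any base learner can be plugged in through `fit`; a boosting variant would differ by reweighting the training examples between rounds and by weighting the votes.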

The study objective was to develop and compare the performance of eight classification models as described above to predict clinical outcomes in patients presenting with acute GIB.

Box 1. Independent variables

Presentation: Hematemesis/coffee grounds; hematochezia/melena; duration of symptoms; syncope/presyncope

Demographics: Age, gender

Past history: Prior GIB, unstable CAD (MI, CHF), COPD exacerbation, CRF, risk of stress ulcer, cirrhosis, ASA/NSAID use, PPI, prior history of GIB

Clinical exam: SBP/DBP, HR, orthostasis, NG lavage, rectal exam

Laboratory data: Hematocrit, drop in Hct., platelet count, creatinine, BUN, PT/INR

2. Methods

2.1. Definitions

Upper GIB refers to gastrointestinal blood loss whose origin is proximal to the ligament of Treitz; mid-GIB to blood loss whose source is distal to the ligament of Treitz but proximal to the ileocecal valve; and lower GIB to gastrointestinal blood loss emanating from a source distal to the ileocecal valve [11]. Acute GIB was defined as bleeding of less than 5 days duration. Acute upper GIB was diagnosed if there was hematemesis, "coffee ground" colored emesis, the return of red blood via a nasogastric tube, and/or melena with or without hemodynamic compromise or hematochezia (bright red blood per rectum in patients with brisk upper GIB) and the presence of a bleeding source in the upper GI tract at endoscopy. Patients with melena and/or hematochezia along with clinical evidence of bleeding (hemodynamic instability, anemia) and positive findings of a bleeding source at enteroscopy or capsule endoscopy were designated as mid-GIB. Lower GIB was defined as the identification of a source of bleeding in the lower GI tract in patients presenting with hematochezia. Hypotension was defined as systolic blood pressure of less than 100 mmHg, tachycardia as heart rate of greater than 100 beats/min, and orthostasis as a drop of systolic blood pressure of greater than 20 mmHg and an increase in heart rate of greater than 20 beats/min, 2 min after sitting upright from recumbency with legs dangling from the side of the bed. Urgent endoscopy was defined as a procedure that was performed within 24 h for upper GIB and within 48 h for lower GIB.
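The hemodynamic thresholds defined above lend themselves to a direct encoding. The sketch below is illustrative only: the function and parameter names are hypothetical, and the orthostasis check requires both the blood pressure drop and the heart rate rise, exactly as the definition is worded.

```python
def is_hypotensive(sbp_mmhg):
    """Hypotension: systolic blood pressure below 100 mmHg."""
    return sbp_mmhg < 100


def is_tachycardic(hr_bpm):
    """Tachycardia: heart rate above 100 beats/min."""
    return hr_bpm > 100


def is_orthostatic(sbp_supine, sbp_upright, hr_supine, hr_upright):
    """Orthostasis: SBP drop > 20 mmHg and HR rise > 20 beats/min,
    measured 2 min after sitting upright from recumbency."""
    sbp_drop = sbp_supine - sbp_upright
    hr_rise = hr_upright - hr_supine
    return sbp_drop > 20 and hr_rise > 20


# Time window (hours) within which endoscopy counts as "urgent".
# Only upper and lower GIB are given a window in the definitions.
URGENT_ENDOSCOPY_WINDOW_H = {"upper": 24, "lower": 48}


def is_urgent_endoscopy(source, hours_to_endoscopy):
    """True if the procedure fell within the window for the given source."""
    return hours_to_endoscopy <= URGENT_ENDOSCOPY_WINDOW_H[source]
```

A mid-GIB source raises `KeyError` here, because no urgency window is stated for it in the definitions.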

2.2. Patients

Patients with acute GIB were identified from the hospital medical records database using ICD-9 codes for GIB. The study was carried out in compliance with all institutional human investigations committee guidelines. Eligible patients were those presenting with clinical manifestations of acute upper or lower gastrointestinal bleeding and those who had undergone endoscopy within 24 h for suspected upper GIB and within 48 h for suspected lower GIB or if upper endoscopy was negative. If no obvious source of GIB was identified at either upper or lower endoscopy, the patient should have undergone small bowel enteroscopy or capsule endoscopy within 1 week of the initial episode of acute GIB. Records of patients for whom a definite source of bleeding could not be identified or those with missing clinical variables required for model building and testing were discarded. A total of 189 patients meeting inclusion and exclusion criteria were identified retrospectively from a review of hospital medical records. Clinical data on each patient were entered into a scannable data entry form, which was scanned into an SQL database and manually reviewed for errors. Variables to ascertain clinical outcomes corresponding to patient data included clinical and endoscopic data as listed in Box 1. Variables that correlate with and are predictive of clinical outcomes amongst patients with acute GIB were identified from a review of the literature [12,21]. Clinical outcomes that are relevant for the management of patients with acute GIB are listed in Box 2. Initially, data on only 70 patients were available for building and testing the models. This was followed by building and testing models using a total of 122 patients, which resulted in an insignificant improvement in performance of the models (data not shown). A sample size of 122 patients was therefore deemed adequate for building models. An additional database of 67 patients was utilized for testing and validation of the models.

2.3. Selection of independent variables to predict outcomes for test cases

Several studies in the past have evaluated variables predictive of adverse outcomes, predictive of source, severity and outcomes in patients with acute upper and lower GIB. Clinical correlates of source, severity and outcomes amongst patients with acute gastrointestinal bleeding were reviewed and listed as in Box 3 [17–29].

Box 2. Output variables

Source: Upper, mid, lower

Resuscitation: Yes, no

Endoscopy: Yes, no

Disposition: Not ICU (home, regular floor, monitor floor); ICU

2.3.1. Bleeding source

The definitive source of bleeding was the irrefutable identification of a bleeding source at upper endoscopy, colonoscopy, small bowel enteroscopy or capsule endoscopy. Variables utilized to predict the source of GIB included past medical history and presenting symptoms (prior history of GIB, hematochezia, hematemesis, melena, syncope/presyncope), risk factors (risk for stress ulceration, cirrhosis, ASA/NSAID use), physical exam and laboratory tests (blood pressure (SBP/DBP), heart rate (HR), orthostasis, NG lavage, rectal exam, platelet count (Plt), creatinine (Cr), BUN, and INR).

Box 3. Explanatory variables used for each output

Source of bleeding. Prior history of GIB, hematochezia, hematemesis, melena, syncope/presyncope, risk for stress ulcer, cirrhosis, ASA/NSAID use, blood pressure, heart rate, orthostasis, NG lavage, rectal exam, platelet count, creatinine, BUN, INR

Resuscitation. Hematochezia, hematemesis, melena, duration of symptoms, syncope/presyncope, unstable CAD, blood pressure, heart rate, orthostasis, NG lavage, rectal exam, hematocrit, drop in hematocrit, creatinine, BUN, INR

Endoscopy. Hematochezia, hematemesis, melena, duration of symptoms, syncope/presyncope, cirrhosis, ASA/NSAIDs, blood pressure, heart rate, orthostasis, NG lavage, rectal exam, hematocrit, drop in hematocrit, platelet count, creatinine, BUN, INR

Disposition. Age, hematochezia, hematemesis, melena, duration of symptoms, syncope/presyncope, unstable CAD, COPD, CRF, risk for stress ulcer, cirrhosis, blood pressure, heart rate, orthostasis, NG lavage, rectal exam, hematocrit, drop in hematocrit, platelet count, creatinine, BUN, INR

2.3.2. Blood resuscitation

Urgent blood resuscitation referred specifically to the administration of blood and blood products to correct loss of intravascular volume and presence of coagulopathy. Variables utilized to predict this outcome included symptoms (hematochezia, hematemesis, melena, duration of symptoms, syncope/presyncope), prior history (unstable CAD), physical exam and laboratory tests (blood pressure, heart rate, orthostasis, NG lavage, rectal exam, hematocrit (Hct), drop in hematocrit, creatinine, BUN, and INR).

2.3.3. Urgent endoscopy

Variables to predict need for urgent endoscopy included symptoms (hematochezia, hematemesis, melena, syncope/presyncope, duration of symptoms), risk factors (cirrhosis, ASA/NSAID use) and physical exam and laboratory tests (blood pressure, heart rate, orthostasis, nasogastric lavage, rectal exam, hematocrit, hematocrit drop, platelet count, creatinine, BUN, and INR).

2.3.4. Disposition

Variables utilized to predict disposition included patient demographics (age), presenting symptoms (hematochezia, hematemesis, melena, duration of symptoms, syncope/presyncope), comorbidities (unstable CAD, COPD, CRF, risk for stress ulcer, cirrhosis), clinical exam/blood tests (blood pressure, heart rate, orthostasis, NG lavage, rectal exam, hematocrit, drop in hematocrit, platelet count, creatinine, BUN, and INR).

2.4. Algorithm for analysis of clinical data to ascertain outcomes for training set

The algorithm to ascertain the source of bleeding and appropriate clinical management for each patient is illustrated in Fig. 1 and was determined jointly by the authors (AK, BH, BK) after careful analysis of the clinical and endoscopic data for each patient, in accordance with current evidence and standard of care [13–15]. Source was ascertained by review of endoscopic findings for each patient [16]. Patients with findings of active bleeding or other high-risk stigmata at endoscopy were reasoned as requiring intensive resuscitation, urgent endoscopy and admission to the medical intensive care unit. Patients with low-risk stigmata at endoscopy and hemodynamic instability (hypotension, tachycardia, orthostasis), elevated coagulation parameters and/or hematocrit <30, were reasoned to require intensive resuscitation but no urgent endoscopy. Conversely, patients under 60 years of age with no active bleeding or high-risk stigmata and no co-morbidities were reasoned as suitable for discharge to home. Patients with co-morbidities were categorized as requiring general inpatient admission. Example: A 60-year-old patient with acute onset of profuse hematemesis, melena, dizziness, tachycardia, hypotension, positive nasogastric lavage, and hematocrit of 24 is admitted to the regular floor and undergoes endoscopy more than 24 h after admission. Endoscopy reveals an oozing blood vessel within an ulcer bed. Since, in retrospect, the patient should have been resuscitated urgently with blood, undergone urgent endoscopy and been admitted to the ICU, the patient was classified as requiring urgent resuscitation, endoscopy and admission to the ICU. Each case was similarly analyzed to identify the corresponding optimal outcome, and the database thus developed was utilized to train the models.

Figure 1 Schematic of training examples for classification.

2.5. Importance of each variable

For RF and ANN, the variable importance option was selected to see if these variables were significant for ascertaining outcomes by the computer models. Variable importance for RF is given in terms of the mean decrease in accuracy; hence the higher the number, the greater the importance of the variable. ANN provides data about the importance of a variable without any ranking. Hence the variable importance feature for ANN was repeated ten times in order to obtain rankings for the variables, and the number of times a variable was shown to be important was counted; the closer the number was to 10, the more important the variable. In addition, distributions of the variables as well as correlation between variables were examined.
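To make "mean decrease in accuracy" concrete, the sketch below permutes one variable at a time and records how much the accuracy of an already fitted classifier drops. It is a simplified, model-agnostic illustration: the random forest software computes this quantity per tree on out-of-bag samples, which is not reproduced here, and the names `permutation_importance`, `count_flagged` and the `predict` callable are hypothetical. The second helper mirrors the counting scheme described for ANN.

```python
import random
from collections import Counter


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is shuffled in turn.

    X is a list of feature lists; `predict` is a hypothetical fitted
    classifier taking one feature list and returning a label.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between variable j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def count_flagged(runs):
    """Count how often each variable was flagged important across runs.

    `runs` is a list (e.g. ten entries) of collections of variable names.
    """
    counts = Counter()
    for flagged in runs:
        counts.update(set(flagged))
    return counts
```

A variable the classifier ignores gets an importance of exactly zero, since shuffling it cannot change any prediction.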

2.6. Models and statistical analysis

Eight predictive models including random forest (RF), support vector machines (SVM), shrunken centroid (SC), linear discriminant analysis (LDA), k-nearest neighbor (kNN), logistic regression (logistic), boosting, and artificial neural networks (ANN) were trained and their performances compared [8,9]. All models were run in R (version 2.3.0, downloadable from http://cran.cnr.berkeley.edu) except for ANN, which was run in STATISTICA (version 7.1, Statsoft, Inc., Tulsa, OK).

Model training was performed on a randomly selected subset of patients and testing on the remaining patients of the 122-patient database. The primary approach was to use the selected explanatory variables to predict the response variable, discarding any patients with missing data. In addition, for predicting resuscitation and endoscopy, an alternative strategy was adopted. The predicted value of "source of bleeding" and selected input variables (hematochezia, hematemesis, syncope/presyncope, blood pressure, heart rate, orthostasis, NG lavage, and INR) were utilized to predict the need for resuscitation and endoscopy. Categorical variables were converted to indicator variables.
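The conversion of a categorical variable into indicator variables can be illustrated with a short sketch. The helper name `to_indicators` is hypothetical, and the example emits one indicator per category; whether a reference level was dropped in the study is not stated, so that choice is an assumption.

```python
def to_indicators(values, categories=None):
    """Encode categorical values as 0/1 indicator columns.

    Returns (rows, categories): one row of indicators per input value,
    with columns ordered as in `categories` (sorted unique values if the
    caller does not supply an explicit order).
    """
    if categories is None:
        categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, list(categories)


# Example: encoding a predicted bleeding source for use as a model input.
rows, cats = to_indicators(["upper", "lower", "mid", "upper"])
```

Passing an explicit `categories` list keeps the column order identical between training and test data, even when a category is absent from one of them.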

2.6.1. Model evaluation step

Ten runs of 10-fold cross validation (CV) were performed for each iteration to obtain a reliable result with low mean square error (MSE) and bias [30]. The MSE is a function of bias and variance, and when the estimator is unbiased, MSE reduces to variance because the bias term in MSE becomes zero. Each run of cross validation is comprised of an independent training and testing database, where 90% of the data is put in the training set and the remaining 10% of the data is put into the test set. For every 10-fold CV, the following statistics were calculated: sensitivity (SN), specificity (SP), accuracy (ACC: the sum of correct predictions divided by total predictions), positive predictive value (PPV: probability that the patient is truly positive given a positive prediction), and negative predictive value (NPV: probability that a patient is truly negative given a negative prediction). For each classification model, statistical results of 10 repetitions of 10-fold CV were averaged and reported.
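The five statistics and the repeated 10-fold splitting can be sketched as below. This is a minimal illustration under stated assumptions, not the study's R code: the function names are hypothetical, the folds are formed by a plain random shuffle (no stratification is assumed), and undefined ratios are returned as NaN.

```python
import random


def classification_stats(y_true, y_pred, positive=1):
    """Return SN, SP, ACC, PPV and NPV for a two-class problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "SN": ratio(tp, tp + fn),    # sensitivity
        "SP": ratio(tn, tn + fp),    # specificity
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "PPV": ratio(tp, tp + fp),   # P(truly positive | predicted positive)
        "NPV": ratio(tn, tn + fn),   # P(truly negative | predicted negative)
    }


def repeated_kfold_indices(n, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` runs of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]          # roughly 10% of the data when k=10
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test
```

With n = 122 and the defaults, the generator yields 100 train/test splits; the statistics from each split would then be averaged per model.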

ROC curves were calculated and areas under the curve were compared for each of the eight models. The Mann–Whitney statistic was calculated, which is equivalent to the area under an ROC curve [31]. Additionally, performance measures between models were compared using McNemar's test. Using the Bonferroni correction to account for multiple comparisons of models, an appropriate alpha value was used for each test to control the error rate. For example, there were 8 models for the resuscitation response; hence there were 28 pair-wise comparisons. Thus for an original alpha value of 0.05, the new alpha used for a two-sided test was 0.05/28 = 0.0018. A normal approximation based on the Central Limit Theorem was used for the underlying assumptions of the test.
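The three calculations in this paragraph can be reproduced in a few lines. The sketch below is illustrative: function names are hypothetical, ties in the Mann–Whitney count are scored as one half, and the McNemar statistic uses a normal approximation without continuity correction, which is an assumption because the exact form of the test is not given.

```python
from math import comb, erfc, sqrt


def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the Mann-Whitney statistic: the fraction of (positive,
    negative) pairs in which the positive case scores higher."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bonferroni_alpha(n_models, alpha=0.05):
    """Number of pair-wise comparisons and the corrected alpha."""
    n_pairs = comb(n_models, 2)
    return n_pairs, alpha / n_pairs


def mcnemar_z_test(b, c):
    """McNemar's test from the discordant counts.

    b: cases model A classified correctly and model B incorrectly;
    c: the reverse. Returns (z, two-sided p) under a normal approximation.
    """
    if b + c == 0:
        return 0.0, 1.0
    z = (b - c) / sqrt(b + c)
    p_value = erfc(abs(z) / sqrt(2))  # P(|Z| > |z|) for a standard normal
    return z, p_value
```

For eight models, `bonferroni_alpha(8)` gives 28 comparisons and a per-test alpha of about 0.0018, matching the figure quoted above.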

2.6.2. Model validation step

After the models had been trained using the 122-patient database, a previously unseen 67-patient database was utilized as a test set to validate the models. Accuracies were compared for both the evaluation step and the validation step.

2.7. Individual model parameters

2.7.1. Random forest (package randomForest)

All default settings were used (500 trees were grown and the number of variables randomly sampled at each node was $\sqrt{p}$, where p is the number of explanatory variables). The variable importance option was set to true. To predict the outcome for a given patient, voting was done without normalizing.

2.7.2. Support vector machine (package e1071)

All default settings were used; variables were scaled, the tolerance value to terminate the algorithm was set to 0.001, and epsilon was set to 1 (for the insensitive-loss function). Both the radial basis (default) and linear kernels were considered, but only the radial basis function is included in this comparison because the two kernels gave similarly high accuracies.

2.7.3. Shrunken centroid (package pamr)

An optimal threshold value for each response was found by cross validation. The best threshold value was the one that gave the highest accuracy; a threshold of 0.2 was used for all responses.

2.7.4. Linear discriminant analysis (package sma), kNN (package class), and logistic regression (package stats) models

All default settings were used. Depending on what response variable was being classified, a different k was used for kNN. By performing CV on several different k values, the k that yielded the highest accuracy was chosen. The final values chosen were k = 7 for source and resuscitation, k = 11 for endoscopy, and k = 3 for disposition.
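Choosing k by cross validation, as described above, can be sketched in plain Python. The candidate values, the Euclidean distance, the deterministic interleaved folds and all function names are assumptions made for illustration; the study used the R `class` package and does not list the candidate k values it tried.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), i)
        for i, row in enumerate(train_X)
    )
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, n_folds=10):
    """Cross-validated accuracy of kNN for a given k.

    Folds are interleaved by index (no shuffling), so shuffle the data
    beforehand if it is ordered by class or by time.
    """
    correct = 0
    for fold in range(n_folds):
        test_idx = list(range(fold, len(X), n_folds))
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / len(X)


def choose_k(X, y, candidates=(3, 5, 7, 9, 11), n_folds=10):
    """Return the candidate k with the highest CV accuracy
    (the first such candidate in case of ties)."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k, n_folds))
```

Odd candidate values are used so that a two-class vote cannot tie; with three classes, as for bleeding source, ties are still possible and are broken by neighbor order.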

2.7.5. Boosting (package boost)

No features were pre-selected (presel = 0), and boosting ran for 10 iterations (mfinal = 10). The optimal settings were found by trying different combinations of presel and mfinal. The R Boosting package contains four different approaches: AdaBoost, LogitBoost, L2Boost and BagBoost. AdaBoost was excluded, as it often did not classify patients correctly. Performance of the other three boosting methods was similar. Among these three approaches, LogitBoost is included in the comparison.


2.7.6. Artificial neural network (Neural Networks package)

A multilayer perceptron trained with back propagation was used. The error function used was the cross entropy function. A linear synaptic function was used, and a combination of the following four activation functions was used: linear, hyperbolic, softmax, and logistic. The number of epochs to train the model was set to 100, although the network always converged in fewer epochs. The learning rate was set to 0.01 and there was one hidden layer in the network. For the source of bleeding response, there were 30 input neurons, 11 hidden neurons (neurons in the hidden layer), and 3 output neurons. For the resuscitation response, there were 28 input neurons, 10 hidden neurons, and 1 output neuron. For the endoscopy response, there were 31 input neurons, 12 hidden neurons, and 1 output neuron. For the disposition response, there were 34 input neurons, 34 hidden neurons, and 1 output neuron.

3. Results

3.1. Models

Tables 1–4 summarize the results for each outcome prediction variable for the evaluation step. Eight models were run using the primary approach; only six of the eight were used to predict source, since logistic regression and boosting can be used only for two-way classification problems, whereas source has three outcomes (upper, mid and lower). Figs. 2–5 depict the accuracies obtained from each individual model and each response variable for the evaluation step. Overall, accuracies obtained using SVM, RF, and LDA were numerically superior to the others, with these models correctly predicting the source of bleeding, need for resuscitation, and disposition 88–94% of the time. The need for endoscopy was correctly predicted about 80% of the time using SVM, SC, and LDA; accuracy using RF was just under 80%, at 79%. Logistic regression did well for predicting resuscitation and endoscopy; however, it did not do well for disposition. This is because the algorithm for obtaining the model rarely converged, that is, it was unstable (results not shown). Accuracies using boosting were the worst of all models. Accuracies obtained with the ANN model were inferior to those of SVM, RF and LDA. At a significance level of α = 0.05, kNN performed the worst for predicting source and resuscitation. The linear discriminant analysis model appeared to demonstrate good overall performance with regard to sensitivity, specificity, PPV, NPV and accuracy (Tables 1–4). Boosting, on the other hand, revealed an imbalance between sensitivity and specificity. The accuracies obtained using the alternative strategy for predicting resuscitation and endoscopy showed no significant improvement over the primary approach (Table 5).

Table 1 Evaluation step: performance of models for output source of bleeding (standard error)

            ACC            SN             SP             PPV            NPV            AUC
RF          0.943 (0.007)  0.980 (0.004)  0.932 (0.007)  0.967 (0.005)  0.959 (0.006)  0.998
SVM         0.930 (0.007)  0.965 (0.005)  0.945 (0.007)  0.973 (0.005)  0.932 (0.007)  0.979
SC          0.914 (0.008)  0.965 (0.005)  0.890 (0.009)  0.946 (0.007)  0.927 (0.008)  0.978
LDA         0.931 (0.007)  0.965 (0.005)  1.000 (0.000)  1.000 (0.000)  0.935 (0.007)  0.987
kNN         0.697 (0.013)  0.901 (0.009)  0.287 (0.013)  0.717 (0.013)  0.591 (0.014)  0.658
ANN         0.917 (0.008)  0.972 (0.005)  0.936 (0.007)  0.968 (0.005)  0.944 (0.007)  0.999

Table 2 Evaluation step: performance of models for output resuscitation (standard error)

            ACC            SN             SP             PPV            NPV            AUC
RF          0.932 (0.007)  0.937 (0.007)  0.923 (0.008)  0.954 (0.006)  0.894 (0.009)  0.982
SVM         0.941 (0.007)  0.938 (0.007)  0.945 (0.007)  0.968 (0.005)  0.899 (0.009)  0.964
SC          0.915 (0.008)  0.929 (0.007)  0.891 (0.009)  0.936 (0.007)  0.879 (0.009)  0.920
LDA         0.922 (0.008)  0.904 (0.009)  0.955 (0.006)  0.972 (0.005)  0.852 (0.010)  0.937
kNN         0.884 (0.009)  0.903 (0.009)  0.852 (0.010)  0.914 (0.008)  0.835 (0.011)  0.890
ANN         0.921 (0.008)  0.927 (0.008)  0.910 (0.008)  0.946 (0.007)  0.880 (0.009)  0.993
Logistic    0.923 (0.008)  0.939 (0.007)  0.895 (0.009)  0.940 (0.007)  0.897 (0.009)  0.985
LogitBoost  0.647 (0.014)  0.916 (0.008)  0.184 (0.011)  0.662 (0.014)  0.481 (0.014)  0.381

Table 3 Evaluation step: performance of models for output endoscopy (standard error)

            ACC            SN             SP             PPV            NPV            AUC
RF          0.790 (0.012)  0.854 (0.010)  0.671 (0.014)  0.828 (0.011)  0.712 (0.013)  0.871
SVM         0.803 (0.011)  0.859 (0.010)  0.700 (0.013)  0.842 (0.011)  0.728 (0.013)  0.820
SC          0.811 (0.011)  0.838 (0.011)  0.760 (0.012)  0.866 (0.010)  0.717 (0.013)  0.801
LDA         0.833 (0.011)  0.821 (0.011)  0.857 (0.010)  0.914 (0.008)  0.720 (0.013)  0.843
kNN         0.796 (0.012)  0.876 (0.010)  0.648 (0.014)  0.822 (0.011)  0.737 (0.013)  0.766
ANN         0.778 (0.012)  0.801 (0.012)  0.733 (0.013)  0.849 (0.010)  0.665 (0.014)  0.913
Logistic    0.787 (0.012)  0.871 (0.010)  0.831 (0.014)  0.815 (0.011)  0.726 (0.013)  0.853
LogitBoost  0.627 (0.014)  0.891 (0.009)  0.138 (0.010)  0.658 (0.014)  0.403 (0.014)  0.404

Table 4 Evaluation step: performance of models for output disposition (standard error)

            ACC            SN             SP             PPV            NPV            AUC
RF          0.883 (0.009)  0.907 (0.008)  0.843 (0.011)  0.908 (0.008)  0.841 (0.011)  0.967
SVM         0.887 (0.009)  0.929 (0.007)  0.816 (0.011)  0.896 (0.009)  0.872 (0.010)  0.922
SC          0.897 (0.009)  0.916 (0.008)  0.866 (0.010)  0.921 (0.008)  0.858 (0.010)  0.891
LDA         0.897 (0.009)  0.891 (0.009)  0.909 (0.008)  0.943 (0.007)  0.830 (0.011)  0.901
kNN         0.876 (0.010)  0.923 (0.008)  0.798 (0.012)  0.886 (0.009)  0.858 (0.010)  0.881
ANN         0.850 (0.010)  0.829 (0.011)  0.889 (0.009)  0.928 (0.007)  0.752 (0.013)  0.972
LogitBoost  0.584 (0.014)  0.819 (0.011)  0.184 (0.011)  0.629 (0.014)  0.377 (0.014)  0.324

Figure 2 Accuracies for output-bleeding source (evaluation step).

Figure 3 Accuracies for output-resuscitation (evaluation step).

Figure 4 Accuracies for output-endoscopy (evaluation step).

Figure 5 Accuracies for output-disposition (evaluation step).

Figure 6 ROC curves for predicting source of bleeding (evaluation step).

ROC curves were constructed (Figs. 6–9 and Tables 1–4), and overall RF and ANN had the highest AUC (area under the curve), followed by SVM and LDA. For predicting resuscitation and endoscopy, the logistic model had excellent AUC. In the validation step, RF consistently predicted responses with higher accuracies as compared to the other models (Table 6). Overall, results from the evaluation and validation steps suggest that the RF model consistently performs the best.
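For readers unfamiliar with AUC, it can be computed without drawing the curve: it equals the proportion of (positive, negative) case pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below illustrates that standard identity; it is not the procedure used in the study, and the function name, scores and labels are hypothetical.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores; label 1 marks the positive class.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_from_scores(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every positive case outranks every negative case, whereas 0.5 corresponds to random ranking.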

3.2. Importance of variables

The importance of each variable in predicting the relevant outcomes when using random forest and ANN, respectively, is shown in Table 7 (using the 122-patient database). About half the

Figure 7 ROC curves for predicting resuscitation (evaluation step).

Figure 8 ROC curves for predicting endoscopy (evaluation step).

Figure 9 ROC curves for predicting disposition (evaluation step).

variables to predict outcomes were of common importance to both RF and ANN models. For predicting source, explanatory variables hematemesis through HR (heart rate) had significant weights and were hence important. The remaining variables were of variable importance aside from ASA/NSAID, which did not appear to have great influence on the performance of the models. For predicting resuscitation, both models utilized the variables syncope, orthostasis, hematocrit, hematocrit drop, blood

pressure, heart rate, hematemesis and melena as important predictor variables. The remainder of the variables was of limited and variable importance in predicting outcomes.

Table 5 Evaluation step – accuracies of predicting resuscitation and endoscopy using source as a variable (standard error)

Model       Resuscitation  Endoscopy
RF          0.934 (0.007)  0.783 (0.012)
SVM         0.935 (0.007)  0.787 (0.012)
SC          0.909 (0.008)  0.809 (0.011)
LDA         0.922 (0.008)  0.818 (0.011)
kNN         0.883 (0.009)  0.794 (0.012)
ANN         0.915 (0.008)  0.783 (0.012)
Logistic    0.918 (0.008)  0.792 (0.012)
LogitBoost  0.631 (0.014)  0.631 (0.014)

Table 6 Validation step – predictive accuracies using a 67-patient database

Model       Source of bleeding  Resuscitation  Endoscopy  Disposition
RF          0.928               0.851          0.753      0.797
SVM         0.826               0.791          0.681      0.783
SC          0.855               0.866          0.696      0.768
LDA         0.768               0.821          0.681      0.797
kNN         0.783               0.821          0.667      0.783
ANN         0.884               0.821          0.638      0.754
Logistic    N/A                 0.791          0.710      N/A
LogitBoost  N/A                 0.567          0.551      0.362

4. Discussion and conclusion

Although superior healthcare outcomes may be expected if gastroenterologists manage all patients

Table 7 Variable importance using RF and ANN

Source
Variable               RF        ANN
Hematemesis            0.1388    10
NG lavage              0.0692    10
Hematochezia           0.0611    10
BUN                    0.0484    10
Rectal                 0.0374    10
Melena                 0.0221    9
Orthostasis            0.0116    10
Hx. of GIB             0.0088    8
Heart Rate             0.0066    10
Creatinine             0.0055    7
SBP                    0.0051    4
DBP                    0.0043    9
Syncope                0.0032    9
INR                    0.0031    5
Platelets              0.0010    6
Risk for stress ulcer  0.0008    10
Cirrhosis              0.0004    10
ASA/NSAID use          -0.0010   1

Resuscitation
Variable               RF        ANN
Syncope                0.1097    10
Orthostasis            0.0782    10
Hct. drop              0.0495    10
Diastolic BP           0.0276    10
Hct                    0.0232    10
HR                     0.0139    10
Systolic BP            0.0114    10
Hematemesis            0.0052    10
Melena                 0.0032    10
Creatinine             0.0019    1
NG lavage              0.0019    9
BUN                    0.0018    6
Hematochezia           0.0015    10
Duration               0.0011    6
INR                    0.0009    6
Rectal                 0.0006    7
Unstable CAD           -0.0016   5

Endoscopy
Variable               RF        ANN
Syncope                0.0507    10
Orthostasis            0.0259    10
Hct                    0.0223    10
Heart Rate             0.0213    9
Hct. drop              0.0188    10
Diastolic BP           0.0178    9
Hematemesis            0.0133    10
Rectal                 0.0054    9
BUN                    0.0051    5
Systolic BP            0.0046    9
INR                    0.0043    2
Cirrhosis              0.0039    8
Melena                 0.0022    9
Platelets              0.0020    3
Risk for stress ulcer  0.0015    9
Duration               0.0008    8
Creatinine             0.0003    2
ASA/NSAID use          0.00008   4
Hematochezia           0.00001   9
NG lavage              -0.0004   8

Disposition
Variable               RF        ANN
Orthostasis            0.0585    10
Heart Rate             0.0431    10
Hct                    0.0340    10
Systolic BP            0.0287    10
Syncope                0.0271    10
Diastolic BP           0.0217    10
Hct. drop              0.0189    10
Rectal                 0.0094    10
Age                    0.0070    10
NG lavage              0.0055    8
BUN                    0.0054    8
INR                    0.0051    4
Risk for stress ulcer  0.0037    8
Platelets              0.0036    6
Melena                 0.0026    9
Hematochezia           0.0022    6
Hematemesis            0.0010    7
Cirrhosis              0.0007    8
COPD                   0.0006    8
CRF                    0.00007   4
Duration               0.00001   4
Creatinine             -0.00006  3
Unstable CAD           -0.0002   5

with acute GIB [32], it is logistically impossible for every patient with acute GIB to be emergently evaluated and treated by a gastroenterologist, as the onset of acute GIB is unpredictable. It is also impractical and economically unjustifiable to subject every patient with acute GIB to intensive resuscitation and urgent endoscopy, as only 20% of patients with acute GIB may require urgent intervention and, furthermore, healthcare resources are expensive and limited. “Expert systems” may potentially substitute for a specialist amongst patients with acute GIB and facilitate triage of the cohort most likely to benefit from urgent resuscitation and endoscopy. “Machine learning” or computer-assisted predictive models have been successfully utilized to optimize treatment and predict clinical outcomes in a variety of other conditions [33–36], such as computerized interpretation of the electrocardiogram [37], to help streamline and optimize care of patients with acute myocardial infarction [38], especially in a busy practice or in the emergency room [39]. Our objective was to develop a system to provide diagnostic and specific treatment recommendations for patients presenting

with acute GIB. Such models have previously been shown to accurately predict the need for colonoscopy in patients with acute lower GIB [5]. However, they have not been used for all patients with acute GI bleeding. The recommendations were designed to be in agreement with current evidence-based guidelines for management of acute GIB. Our models successfully provided patient-specific recommendations with accuracies exceeding 70–80%. In the present study, RF performed well in classification of the four response variables, in agreement with previous studies that have demonstrated the robustness of such models. RF and SVM are designed for high-dimensional data with a large feature space (large number of predictor variables) compared to the sample size, and are likely to outperform other methods on such data; the current GIB data set, however, is not high-dimensional [40]. Logistic regression is a widely used standard regression model for binary data, and it can be extended to data with more than two classes. However, it often shows computational instability, such as failure to converge or predicted values extremely close to 1 or 0, due to the nature of the model. Our results also support the conclusion by Ahn et al. in previous studies that boosting strategies in general provide poor accuracies [40]. Furthermore, given their complexity, they are also cumbersome and unwieldy compared to other methods. Although not relevant to our problem, LDA and kNN, unlike RF or SVM, require variable pre-selection for optimal performance on high-dimensional data.
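The text does not state what caused the logistic fits to fail. One textbook cause, assumed here purely for illustration, is complete separation: when a predictor splits the two classes perfectly, the log-likelihood keeps rising as the coefficient grows, so no finite maximum exists and fitted probabilities drift towards 0 or 1. The sketch below uses a hypothetical one-variable, no-intercept model and made-up data.

```python
import math


def log_likelihood(beta, xs, ys):
    """Log-likelihood of a no-intercept logistic model P(y=1|x) = sigmoid(beta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        if y == 1:
            total += -math.log1p(math.exp(-z))   # log sigmoid(z)
        else:
            total += -math.log1p(math.exp(z))    # log (1 - sigmoid(z))
    return total


# Hypothetical, perfectly separated data: x < 0 is class 0, x > 0 is class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

betas = [1.0, 10.0, 100.0]
lls = [log_likelihood(b, xs, ys) for b in betas]
prob_at_x1 = 1.0 / (1.0 + math.exp(-betas[-1] * 1.0))

for b, ll in zip(betas, lls):
    print(f"beta={b:6.1f}  log-likelihood={ll:.3e}")
print(f"fitted P(y=1 | x=1) at beta={betas[-1]}: {prob_at_x1}")
```

Because every larger coefficient fits better, an iterative optimizer never settles, which matches the symptoms described above (non-convergence and predicted values extremely close to 1 or 0).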

Statistical variable selection is often dependent on the criteria and is computer intensive. With regard to the analysis of the importance of variables, we show that both the RF and ANN models considered half of the variables to be important, with the remaining half being of varied importance. This appears to be consistent with prior knowledge regarding the importance of variables identified to predict source and severity of acute GIB. Every pre-selected variable was important for one model or the other and is therefore consistent with its identification in prior multivariate analysis. Different models assigned varied importance to different variables due to the different methods for evaluating variable importance. In RF, for every tree grown in the forest, test samples are used to count the number of votes cast for the correct class. RF randomly permutes the values of a selected variable in the test set and puts these cases down the tree. It finds the number of votes for the correct class in the data with this permuted variable. It subtracts this number from the number of votes for the correct class in the original data without permutation. The average of this number over all trees in the forest is the raw importance score for the variable. ANN uses a

Connection Weight Approach to quantify variable importance. Given that a minimal variable set to predict outcomes is desirable, overall our dataset appears to be optimal to predict clinical outcomes relevant to management of patients with acute GIB.
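The RF raw importance score described above can be sketched in a few lines. This is an illustrative reading of that description, not the software used in the study: the function name, the representation of a tree as a (predict, test rows) pair, the one-split "stump" and the toy data are all hypothetical.

```python
import random


def raw_importance(trees, X, y, var, seed=0):
    """Average drop in correct votes when column `var` is permuted in each tree's test rows.

    `trees` is a list of (predict, test_idx) pairs: `predict` maps one row to a
    class label, and `test_idx` lists the rows held out as that tree's test sample.
    """
    rng = random.Random(seed)
    drops = []
    for predict, test_idx in trees:
        rows = [X[i] for i in test_idx]
        truth = [y[i] for i in test_idx]
        # Votes for the correct class on the original test rows.
        correct = sum(predict(r) == t for r, t in zip(rows, truth))
        # Randomly permute only the selected variable among the test rows.
        shuffled = [r[var] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:var] + [v] + r[var + 1:] for r, v in zip(rows, shuffled)]
        correct_permuted = sum(predict(r) == t for r, t in zip(permuted, truth))
        drops.append(correct - correct_permuted)
    # The raw score is the average difference over all trees.
    return sum(drops) / len(drops)


# Hypothetical toy data: column 0 decides the class, column 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.7, 2.0], [0.8, 6.0], [0.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]


def stump(row):
    """A one-split stand-in for a tree that looks only at column 0."""
    return int(row[0] > 0.5)


# Two stand-in trees, each with its own held-out test rows.
forest = [(stump, [0, 2, 3, 5]), (stump, [1, 2, 4, 5])]

imp_signal = raw_importance(forest, X, y, var=0)
imp_noise = raw_importance(forest, X, y, var=1)
print("importance of column 0:", imp_signal)
print("importance of column 1:", imp_noise)
```

A variable the trees never use scores exactly zero here. In a real forest the trees are imperfect, so permuting a weak variable can raise the vote count by chance, which is one way small negative scores such as those in Table 7 could arise.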

Since the output relies on training examples that are developed by individuals, bias can potentially be introduced through inaccurate training examples. The authors have the ability to influence the recommendations by modulating the outcomes associated with each training example. Since examples are utilized to train the models, flaws in examples may lead to flaws in the model output: garbage in, garbage out. Despite these flaws, such systems have the potential to facilitate and standardize the care of patients presenting with acute GIB. Given that computer-based tools are more likely to work if integrated with clinical care, prospective validation and integration of such a model into an electronic medical record may potentially enhance care of patients with acute GIB. It is also possible to train the model using fresh examples so as to adapt to changing guidelines and varied clinical scenarios, allowing these predictive models to be portable to a broad range of locales. Prospective development and testing of a model using a larger patient cohort, its implementation, and comparison of its performance with that of physicians are underway.

In summary, predictive models such as those shown above allow the identification of the high-risk cohort amongst patients presenting with acute GIB, enabling optimal allocation of resources to patients who may potentially benefit the most from urgent resuscitation, endoscopy and intensive clinical care.

Acknowledgement

The study was funded by 2005 Research and Outcomes Effectiveness Awards of the American Society of Gastrointestinal Endoscopy (ASGE) to Atul Kumar.

References

[1] Rockall TA, Logan RF, Devlin HB, Northfield TC. Incidence of and mortality from acute upper gastrointestinal haemorrhage in the United Kingdom Steering Committee and members of the National Audit of Acute Upper Gastrointestinal Haemorrhage. Br Med J 1995;311:222–6.

[2] Baradarian R, Ramdhaney S, Chapalamadugu R, Skoczylas L, Wang K, Rivilis S, et al. Early intensive resuscitation of patients with upper gastrointestinal bleeding decreases mortality. Am J Gastroenterol 2004;99:619–22.

[3] Elta GH. Urgent colonoscopy for acute lower-GI bleeding. Gastrointest Endosc 2004;59:402–8 [Review].

[4] Das A, Wong RC. Prediction of outcome of acute GI hemorrhage: a review of risk scores and predictive models. Gastrointest Endosc 2004;60:85–93.

[5] Das A, Ben-Menacehm T, Cooper GS, Chak A, Sivak Jr MV, Gonet JA, et al. Prediction of outcome in acute lower-gastrointestinal haemorrhage based on an artificial neural network: internal and external validation of a predictive model. Lancet 2003;362:1261–6.

[6] Vapnik V. The nature of statistical learning theory. New York, NY: Springer-Verlag; 1995.

[7] Tibshirani R, Hastie T, Narasimhan B, Chu G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci 2002;99:6567–72.

[8] Breiman L. Bagging predictors. Mach Learn 1996;24:123–40.

[9] Schapire RE. The strength of weak learnability. Mach Learn 1990;5:197–227.

[10] Breiman L. Random forest. Mach Learn 2001;45:5–32.

[11] Prakash C, Zuckerman GR. Gastrointest Endosc 2003;58:330–5; Prakash C, Zuckerman GR. Acute small bowel bleeding: a distinct entity with significantly different economic implications compared with GI bleeding from other locations. Gastrointest Endosc 2003;58:409–12.

[12] Barkun A, Bardou M, Marshall JK, Nonvariceal Upper GI Bleeding Consensus Conference Group. Consensus recommendations for managing patients with nonvariceal upper gastrointestinal bleeding. Ann Intern Med 2003;139(10):843–57.

[13] Palmer KR. Non-variceal upper gastrointestinal haemorrhage: guidelines. Gut 2002;51(Suppl. 4):iv1–6.

[14] Hay JA, Maldonado L, Weingarten SR, Ellrodt AG. Prospective evaluation of a clinical guideline recommending hospital length of stay in upper gastrointestinal tract hemorrhage. JAMA 1997;278(24):2151–6.

[15] Hay JA, Lyubashevsky E, Elashoff J, Maldonado L, Weingarten SR, Ellrodt AG. Upper gastrointestinal hemorrhage clinical-guideline determining the optimal hospital length of stay. Am J Med 1996;100:313–22.

[16] Adler DG, Leighton JA, Davila RE, Hirota WK, Jacobson BC, Quereshi WA, et al. ASGE guideline: the role of endoscopy in acute non-variceal upper-GI hemorrhage. Gastrointest Endosc 2004;60:497–504.

[17] Klebl F, Bregenzer N, Schofer L, Tamme W, Langgartner J, Scholmerich J, et al. Risk factors for mortality in severe upper gastrointestinal bleeding. Int J Colorectal Dis 2004;19.

[18] Rockall TA, Logan RF, Devlin HB, Northfield TC. Risk assessment after acute upper gastrointestinal haemorrhage. Gut 1996;38:316–21.

[19] Rockall TA, Logan RF, Devlin HB, Northfield TC. Selection of patients for early discharge or outpatient care after acute upper gastrointestinal haemorrhage National Audit of Acute Upper Gastrointestinal Haemorrhage. Lancet 1996;347(9009):1138–40.

[20] Velayos FS, Williamson A, Sousa KH, Lung E, Bostrom A, Weber EJ, et al. Early predictors of severe lower gastrointestinal bleeding and adverse outcomes: a prospective study. Clin Gastroenterol Hepatol 2004;2:485–90.

[21] Strate LL, Orav EJ, Syngal S. Early predictors of severity in acute lower intestinal tract bleeding. Arch Intern Med 2003;163(7):838–43.

[22] Kalula SZ, Swingler GH, Louw JA. Clinical predictors of outcome in acute upper gastrointestinal bleeding. South Afr Med J 2003;93:286–90.

[23] Bordley DR, Mushlin AI, Dolan JG, Richardson WS, Barry M, Polio J, et al. Early clinical signs identify low-risk patients with acute upper gastrointestinal hemorrhage. JAMA 1985;253(22):3282–5.

[24] Mortensen PB, Nohr M, Moller-Petersen JF, Balslev I. The diagnostic value of serum urea/creatinine ratio in distinguishing between upper and lower gastrointestinal bleeding. A prospective study. Danish Med Bull 1994;41:237–40.

[25] Zimmerman J, Siguencia J, Tsvang E, Beeri R, Arnon R. Predictors of mortality in patients admitted to hospital for acute upper gastrointestinal hemorrhage. Scand J Gastroenterol 1995;30:327–31.

[26] Terdiman JP, Ostroff JW. Risk of persistent or recurrent and intractable upper gastrointestinal bleeding in the era of therapeutic endoscopy. Am J Gastroenterol 1997;92:1805–11.

[27] Corley DA, Stefan AM, Wolf M, Cook EF, Lee TH. Early indicators of prognosis in upper gastrointestinal hemorrhage. Am J Gastroenterol 1998;93:336–40.

[28] Blatchford O, Murray WR, Blatchford M. A risk score to predict need for treatment for upper-gastrointestinal haemorrhage. Lancet 2000;356(9238):1318–21.

[29] Zaragoza A, Tenias JM, Llorente MJ. Pre-endoscopic prognostic factors in non-varicose upper gastrointestinal bleeding. Development of a predictive algorithm. Rev Esp Enferm Dig 2002;94:139–48.

[30] Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics 2005;21:3301–7.

[31] Bamber D. The area above the ordinal dominance graph and the area below the receiver operating graph. J Math Psychol 1975;12:387–415.

[32] Quirk DM, Barry MJ, Aserkoff B, Podolsky DK. Physician specialty and variations in the cost of treating patients with acute upper gastrointestinal bleeding. Gastroenterology 1997;113:1443–8.

[33] Timmerman D, Verrelst H, Bourne TH, De Moor B, Collins WP, Vergote I, et al. Artificial neural network models for the preoperative discrimination between malignant and benign adnexal masses. Ultrasound Obstet Gynecol 1999;13:17–25.

[34] Rosenblatt KP, Bryant-Greenwood P, Killian JK, Mehta A, Geho D, Espina V, et al. Serum proteomics in cancer diagnosis and management. Annu Rev Med 2004;55:97–112.

[35] Selaru FM, Xu Y, Yin J, Zou T, Liu TC, Mori Y, et al. Artificial neural networks distinguish among subtypes of neoplastic colorectal lesions. Gastroenterology 2002;122:606–13.

[36] Chong CF, Li YC, Wang TL, Chang H. Stratification of adverse outcomes by preoperative risk factors in coronary artery bypass graft patients: an artificial neural network prediction model. Proc AMIA Annu Symp 2003;160–4.

[37] Lund LH. Comment on: computerized interpretation of the electrocardiogram. Arch Intern Med 2004;164(15):1698–9.

[38] Kennedy RL, Harrison RF, Burton AM, Fraser HS, Hamer WG, MacArthur D, et al. An artificial neural network system for diagnosis of acute myocardial infarction (AMI) in the accident and emergency department: evaluation and comparison with serum myoglobin measurements. Comput Methods Prog Biomed 1997;52:93–103.

[39] Lisboa PJ. A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Networks 2002;15:11–39.

[40] Ahn H, Moon H, Fazzari MJ, Lim N, Chen JJ, Kodell RL. Classification by ensembles from random partitions of high-dimensional data. Comput Stat Data Anal 2007;51:6166–79.