
Predicting Discharge Dates From the NICU Using Progress Note Data

Michael W. Temple, MD^a, Christoph U. Lehmann, MD^a,b, Daniel Fabbri, PhD^a

ABSTRACT

BACKGROUND AND OBJECTIVES: Discharging patients from the NICU may be delayed for nonmedical reasons including the need for medical equipment, parental education, and children’s services. We describe a method to predict which patients will be medically ready for discharge in the next 2 to 10 days, providing lead time to address nonmedical reasons for delayed discharge.

METHODS: A retrospective study examined 26 features (17 extracted, 9 engineered) from daily progress notes of 4693 patients (103 206 patient-days) from the NICU of a large, academic children’s hospital. These data were used to develop a supervised machine learning problem to predict days to discharge (DTD). Random forest classifiers were trained by using examined features and International Classification of Diseases, Ninth Revision–based subpopulations to determine the most important features.

RESULTS: Three of the 4 subpopulations (premature, cardiac, gastrointestinal surgery) and all patients combined performed similarly at 2, 4, 7, and 10 DTD with area under the curve (AUC) ranging from 0.854 to 0.865 at 2 DTD and 0.723 to 0.729 at 10 DTD. Patients undergoing neurosurgery performed worse at every DTD measure, scoring 0.749 at 2 DTD and 0.614 at 10 DTD. This model was also able to identify important features and provide “rule-of-thumb” criteria for patients close to discharge. By using DTD equal to 4 and 2 features (oral percentage of feedings and weight), we constructed a model with an AUC of 0.843.

CONCLUSIONS: Using clinical features from daily progress notes provides an accurate method to predict when patients in the NICU are nearing discharge.

WHAT’S KNOWN ON THIS SUBJECT: Discharge from the NICU requires coordination and may be delayed for nonmedical reasons. Predicting when patients will be medically ready for discharge can avoid these delays and result in cost savings for the hospital.

WHAT THIS STUDY ADDS: We developed a supervised machine learning approach using real-time patient data from the daily neonatology progress note to predict when patients will be medically ready for discharge.

Departments of ^aBiomedical Informatics and ^bPediatrics, Vanderbilt University, Nashville, Tennessee

Dr Temple drafted the manuscript, contributed to the data collection, analysis, and model development, reviewed and revised the manuscript, and prepared it for publication; Dr Lehmann assisted with the data collection, aided in the selection of relevant clinical features for the model, categorized ICD-9 codes for grouping patients into distinct populations, and reviewed and revised the manuscript; Dr Fabbri assisted with the data collection, analysis, and model development, contributed to the machine learning and statistical analysis of the data, and reviewed and revised the manuscript; and all authors approved the final manuscript as submitted.

www.pediatrics.org/cgi/doi/10.1542/peds.2015-0456

DOI: 10.1542/peds.2015-0456

Accepted for publication May 19, 2015

Address correspondence to Michael W. Temple, MD, Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End, Suite 1475, Nashville, TN 37203-8390. E-mail: [email protected], [email protected]

PEDIATRICS (ISSN Numbers: Print, 0031-4005; Online, 1098-4275).

Copyright © 2015 by the American Academy of Pediatrics

PEDIATRICS Volume 136, number 2, August 2015

Approximately 4 million babies are born every year in the United States, and about 11% (∼440 000) of them are born prematurely [1]. Caring for infants in the NICU poses a significant financial burden to the health care system, with an estimated total cost of $26 billion [1]. The cost per day of NICU care can be several thousand dollars; therefore, discharging these infants as soon as they are medically ready is critical to controlling expenditures.

Delayed discharge of hospitalized patients who are medically ready is a common occurrence often linked to dependency and the need to provide postdischarge services [2]. In older adults, difficulties in coordinating postdischarge services, lack of anticipation of discharge, and absence of caregivers at home were associated with delayed discharge of medically ready patients [3]. Similarly, discharging a patient from the NICU usually requires a great deal of coordination. Neonates discharged from the NICU are prime examples of patients with dependencies (on parents and caregivers) and significant postdischarge needs such as primary care, specialists, physical and speech therapy, neonatal follow-up appointments, home equipment services, and home nursing. In cases of intrauterine drug exposure, discharge is often dependent on Child Protective Services approval. Parents have to demonstrate their ability to operate medical equipment, administer home medication, and feed and care for their medically fragile infant. In addition, a number of services must be scheduled around the time of discharge, such as hearing screens, car seat tests, immunizations, repeat state screens, and eye examinations. All these requirements can delay the discharge of a patient who is medically ready and, consequently, unnecessarily increase the cost of hospitalization.

The goal of this project is to build a predictive model to identify patients who are close to discharge from a medical perspective so staff can be alerted to impending discharges. Doing so will allow the nonmedical factors to be addressed in advance to ensure that the patient’s discharge is not delayed.

Almost all previous studies attempt to predict length of stay (LOS) using clinical and diagnostic information at or near the time of admission [4–7]. Although it is important to pursue LOS prediction to understand total hospitalization costs, these methods lack sufficient clinical context to accurately predict the discharge date. Instead, the focus of this research project is to identify, based on the most recent clinical data, which patients in the NICU are likely to be discharged from the hospital in the next 2 to 10 days. Our method predicts the upcoming discharge date, not the LOS from time of admission.

To prevent delayed discharge, 3 questions will be answered. First, can the discharge date for a patient in the NICU be accurately predicted? Second, what combinations of clinical data improve predictive accuracy? Third, are there simple, “rule-of-thumb” factors that are responsible for a substantial fraction of the prediction accuracy?

Because of the potential impact on cost savings, predicting the LOS for patients in the NICU has been well studied. Most of the following prediction methods were performed at or near the time of admission. Powell et al [8] found gestational age, low birth weight, and respiratory difficulties to be most predictive of LOS. Bannwart et al [9] developed 2 models to predict the LOS for patients in the NICU. The first model considered only risk factors present in the first 3 days of life, whereas the second model used factors present during the entire hospitalization.

Despite the use of models incorporating multiple diagnostic factors at the time of admission and during the hospitalization, the accuracy of these models varied significantly, making LOS prediction difficult. Studying the Canadian NICU Network, Lee et al [10] found that “significant variation in NICU practices and outcomes was observed despite Canada’s universal health insurance system.” Using data from the California Perinatal Quality Care Collaborative, Lee et al [11] reported “wide variance in LOS by birth weight, gestational age, and other factors.”

In 2012, Levin et al [12] described a real-time model to forecast LOS in a PICU by using physician orders from a provider order entry system. This model used physician orders (not diagnostic data) to provide a cumulative probability of discharge from the PICU over the next 72 hours. Counts of medications by administration route (injected, infused, or enteral) were more significant in predicting discharge from the PICU than the types of medication the patient received. Activity, diet (regular diet vs parenteral nutrition), and mechanical ventilation orders were highly predictive of remaining in the PICU over the next 72 hours.

It was our hypothesis that using a real-time data source that reflects orders, physiologic data, and diagnostic information will allow improved NICU discharge prediction.

In contrast to LOS models that are performed at the time of admission, our model is updated daily with the most recent progress note data. The calculated probability of discharge may, in the future, be displayed in the electronic medical record.

METHODS

Patients and Setting

We conducted a retrospective study of all patients admitted to the NICU at a large academic medical center from June 2007 to May 2013.

Exclusion Criteria

All patients admitted to the NICU were considered for the study. Patients who were back-transferred to another facility or who died during their NICU hospitalization were excluded from the analysis. Also excluded from the analysis were patients with any missing daily neonatology progress notes.

Data Collection and Extraction

A large database containing all daily progress notes written by neonatology attending physicians was made available to the investigators. The data from the progress notes were in a semistructured text format that was extracted through regular expressions in Python version 2.7.3 (Python Software Foundation, Beaverton, OR) and SQL. In addition, these data were cross-referenced with the enterprise data warehouse to obtain basic patient information such as date of birth and International Classification of Diseases, Ninth Revision (ICD-9) codes used for billing during the hospitalization.
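As an illustration of regular-expression extraction from a semistructured note, the following is a minimal sketch only. The note layout, the field labels, and the names `NOTE`, `PATTERNS`, and `extract_features` are hypothetical; the actual progress-note template and extraction code are not given in this article, and the sketch is written for current Python 3 rather than the Python 2.7.3 used in the study.

```python
import re

# Hypothetical note layout; the real progress-note template is not published here.
NOTE = """\
Weight: 1.82 kg
Oral feeds: 210 mL
Tube feeds: 30 mL
A&Bs: 2
On caffeine: N
"""

# One pattern per field. Field names and labels are illustrative assumptions.
PATTERNS = {
    "weight_kg": re.compile(r"^Weight:\s*([\d.]+)\s*kg", re.MULTILINE),
    "oral_ml": re.compile(r"^Oral feeds:\s*([\d.]+)\s*mL", re.MULTILINE),
    "tube_ml": re.compile(r"^Tube feeds:\s*([\d.]+)\s*mL", re.MULTILINE),
    "ab_events": re.compile(r"^A&Bs:\s*(\d+)", re.MULTILINE),
    "on_caffeine": re.compile(r"^On caffeine:\s*([YN])", re.MULTILINE),
}


def extract_features(note):
    """Return a dict of extracted values; a field absent from the note maps to None."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        if match is None:
            row[name] = None
        elif name == "on_caffeine":
            # Qualitative features are coded 1 (yes) or 0 (no).
            row[name] = 1 if match.group(1) == "Y" else 0
        else:
            row[name] = float(match.group(1))
    return row


features = extract_features(NOTE)
```

Returning `None` for an absent field keeps missing data visible, so that a later step can exclude incomplete rows.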

Feature Descriptions

The clinical features used in our model fell into 4 main categories: quantitative, qualitative, engineered, and derived subpopulations. Thirteen features were obtained directly from data contained in the daily progress notes. These extracted features were classified as quantitative (values fell within a range) and qualitative (assigned a value of 0 or 1). Nine features were engineered from the extracted data. These engineered features do not actually exist as data in the progress note but were derived from the extracted data. For example, progress notes contain information on the number of apnea and bradycardia events (A&Bs) in the last 24 hours. The engineered feature from these data was the number of days since the last A&B.
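The "days since the last A&B" feature can be sketched as a running counter over a patient's daily A&B counts. This is an illustrative reconstruction, not the study code: the function name is hypothetical, and returning `None` for days before any recorded event is an assumption, because the article does not state how that case was handled.

```python
def days_since_last_ab(daily_ab_counts):
    """For each hospital day, return the days elapsed since the most recent
    day with at least one A&B event.

    Days before any recorded event get None (assumption: the handling of
    this case is not described in the article).
    """
    result = []
    last_event_day = None
    for day, count in enumerate(daily_ab_counts):
        if count > 0:
            last_event_day = day
        if last_event_day is None:
            result.append(None)
        else:
            result.append(day - last_event_day)
    return result


# Six consecutive hospital days with 0, 2, 0, 0, 1, 0 events.
engineered = days_since_last_ab([0, 2, 0, 0, 1, 0])
```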

Additionally, a neonatologist (C.U.L.) reviewed 138 of the most frequently occurring ICD-9 codes in the NICU patient population to categorize patients into 4 subpopulations: prematurity, cardiac disease, gastrointestinal (GI) surgical disease, and neurosurgical (NS) disease (see the Appendix for a list of ICD-9 codes and categories). A single patient could belong to 1, many, or none of the subpopulations. Table 1 contains a list of all features used in the model.

Matrix Generation

All extracted data, subpopulation categories, engineered features, and days to discharge (DTD) were inserted into a matrix. Each row represented data for 1 hospital day for a specific patient. If a row contained missing data in any field, the entire row was excluded from the final matrix.

Because the matrix is constructed using historical data, the outcome of interest (discharge date) is known. The DTD column contains the number of hospital days until the patient is discharged. For example, if the patient was discharged on March 15, the row of the matrix containing patient features for March 10 would have a DTD of 5 (Fig 1).
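The DTD column follows directly from date arithmetic. The sketch below reproduces the March 10 and March 15 example; the year, the admission date, the function name, and the inclusion of the discharge day itself as a row with a DTD of 0 are all assumptions made for illustration.

```python
from datetime import date, timedelta


def build_dtd_rows(admit, discharge):
    """Return one (hospital_day, days_to_discharge) pair per day of the stay."""
    rows = []
    day = admit
    while day <= discharge:
        rows.append((day, (discharge - day).days))
        day += timedelta(days=1)
    return rows


# Hypothetical stay: admitted March 8, discharged March 15 (year assumed).
rows = build_dtd_rows(date(2013, 3, 8), date(2013, 3, 15))
dtd_by_day = dict(rows)
```

With these inputs, `dtd_by_day[date(2013, 3, 10)]` is 5, matching the example in the text.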

Data Analysis

A supervised machine learning approach using a random forest (RF) classifier in Python’s scikit-learn module (version 0.15.2) [13] was used to analyze the data, engineer important features, and build a predictive model. An RF constructs many binary decision trees that branch based on randomly chosen features. The RF in scikit-learn uses an optimized Classification and Regression Trees (CART) algorithm for constructing binary trees by using the input features and values that yield the largest information gain at each node. The scikit-learn package allows the selection of either the Gini impurity or the entropy criterion to evaluate candidate splits, which in turn determines feature importance. These criteria performed similarly, and we chose to use Gini impurity because it is slightly more robust to misclassifications. We ran the models using many different combinations of parameters, and the best-performing models used an RF with 100 trees, a maximum tree depth of 10, and a minimum of 200 samples per split.
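To make the Gini criterion concrete, the following sketch computes the impurity of a node with binary labels and the impurity decrease obtained by splitting on a threshold. It is a plain-Python illustration of the criterion only, not the scikit-learn implementation; the function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a set of binary (0/1) labels: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def impurity_decrease(values, labels, threshold):
    """Parent impurity minus the size-weighted impurity of the two children
    produced by splitting rows at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Toy node: oral feeding percentage for 4 rows and a 0/1 outcome label.
oral_pct = [10, 20, 85, 95]
label = [0, 0, 1, 1]
good_split = impurity_decrease(oral_pct, label, threshold=50)
poor_split = impurity_decrease(oral_pct, label, threshold=15)
```

A tree-growing algorithm of this kind prefers the threshold with the larger decrease, here the split at 50, which separates the two classes completely.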

Models were trained with different combinations of subpopulations (all patients, premature, cardiac, GI surgical, and NS), DTD (2, 4, 7, and 10 days), and number of features (any combination of features from 2 to all 26).

TABLE 1 Features Used in the Predictive Model

| Quantitative Features (Unit of Measure) | Qualitative Features (Unit of Measure) | Engineered Features (Unit of Measure) | Subpopulation Features |
|---|---|---|---|
| Wt (kg) | On infused medication (Y/N) | Number of days since last A&B (d) | Premature (Y/N) |
| Birth wt (kg) | On caffeine (Y/N) | Number of days off infused medication (d) | Cardiac surgery (Y/N) |
| A&Bs (no.) | On ventilator (Y/N) | Number of days off caffeine (d) | GI surgery (Y/N) |
| Amount of oral feeds (mL) | — | Number of days off ventilator (d) | Neurosurgery (Y/N) |
| Amount of tube feeds (mL) | — | Number of days off oxygen (d) | — |
| Percentage of oral feeds (%) | — | Number of days percentage of oral feeds >90% (d) | — |
| Gestational age (wk) | — | Total feeds (oral + tube feeds) (mL) | — |
| Gestational age at birth (wk) | — | Ratio of wt to birth wt | — |
| Day of life (d) | — | Amount of oral feeds/wt (mL/kg/day) | — |
| Oxygen (L) | — | — | — |

—, not applicable.


Training Vector

To train our model, we converted the DTD variable into a binary outcome variable based on the number of days we were trying to model. For example, if we were training the model to predict when patients were 4 days from discharge, all values in the model where the DTD was not equal to 4 were set to 0. The rows in which the number of DTD was 4 were set to 1 (Fig 1). This same process was followed for 2, 7, and 10 DTD.
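The conversion to a binary training vector can be sketched in a few lines. The function name and the example DTD column are hypothetical; the labeling rule itself (1 where DTD equals the target, 0 elsewhere) is the one described above.

```python
def binarize_dtd(dtd_values, target):
    """Label a row 1 when its DTD equals the target horizon, otherwise 0."""
    return [1 if dtd == target else 0 for dtd in dtd_values]


# DTD column for one hypothetical patient with an 8-day stay.
dtd_column = [7, 6, 5, 4, 3, 2, 1, 0]
labels_4 = binarize_dtd(dtd_column, target=4)
labels_2 = binarize_dtd(dtd_column, target=2)
```

Each patient contributes at most one positive row per target horizon, which is why the positive class is rare in the resulting matrix.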

Cross-Validation

Each time a model was run, half of the patients (and all their associated daily rows) were randomly assigned to a training set, and the other half were assigned to the testing set. Because each patient provides only a single DTD, halving the data provided both testing and training sets an adequate number of the DTD of interest. To achieve small enough standard deviations, the patients were randomly assigned 5 times for each model and the area under the curve (AUC) for the receiver operating characteristic curve was obtained for the testing set. The reported AUC is the average of the 5 AUCs obtained after each round of randomization. Additionally, each time a model was run, the features used in the model were ranked in order of importance.
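The evaluation procedure can be sketched as follows: patients (not rows) are split in half at random, the AUC is computed on the testing half, and the mean and SD are taken over 5 rounds. This is a minimal sketch under stated assumptions. The function names are hypothetical, the AUC is computed by direct pairwise comparison rather than by a library routine, and model fitting is left to a caller-supplied `fit_and_score` function.

```python
import random
from statistics import mean, stdev


def split_patients(patient_ids, rng):
    """Randomly assign half of the unique patients to training, half to testing."""
    ids = sorted(set(patient_ids))
    rng.shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_split_auc(rows, fit_and_score, n_rounds=5, seed=0):
    """rows: list of (patient_id, features, label) tuples.

    fit_and_score(train_rows, test_rows) must return one score per test row.
    Returns the mean and SD of the test AUC over n_rounds random splits.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_rounds):
        train_ids, test_ids = split_patients([r[0] for r in rows], rng)
        train = [r for r in rows if r[0] in train_ids]
        test = [r for r in rows if r[0] in test_ids]
        scores = fit_and_score(train, test)
        aucs.append(auc([r[2] for r in test], scores))
    return mean(aucs), stdev(aucs)
```

Splitting by patient rather than by row keeps all of a patient's daily rows on one side of the split, so no patient appears in both the training and testing sets.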

Model Generation

We ran the model for all patients and for each subpopulation to determine how well the model performed, to choose the most important features for each group, and to determine whether different features had a greater impact on certain patient populations. Finally, the most important features at 2, 4, 7, and 10 DTD were evaluated to determine whether the most important features changed as a patient was getting closer to discharge.

Institutional Review Board Approval

The Institutional Review Board of Vanderbilt University approved this study.

RESULTS

The initial database consisted of 6302 patients (116 299 hospital days) admitted to the NICU between June 2007 and May 2013. There were 256 (4%) deaths during this time period. A total of 1154 (18%) patients were excluded because the database did not contain physician progress notes for every day of the hospital course. There were 199 (3%) patients back-transferred to other NICUs in the region. The final matrix consisted of 4693 (74%) unique patients, accounting for 103 206 (89%) hospital days with a mean LOS of 30 days. A total of 3689 (79%) patients were categorized into ≥1 subpopulations based on ICD-9 codes; the other 1004 (21%) patients did not have an ICD-9 code that matched our criteria (Fig 2).

The average AUC for the model using all 26 features for all patients and each patient subpopulation is shown in Fig 3. Three of the 4 subpopulations (premature, cardiac, GI surgery) and all patients combined performed very similarly at 2, 4, 7, and 10 DTD, with AUCs ranging from 0.854 to 0.865 at 2 DTD and 0.723 to 0.729 at 10 DTD. The NS subpopulation performed worse on every DTD measure, scoring 0.749 at 2 DTD and 0.614 at 10 DTD (Fig 3). Repeating the random split 5 times provided a sufficiently narrow SD range for the AUCs of ∼0.005 to 0.01.

The 9 most predictive features for each subpopulation were very similar, and their plots are shown in Fig 4. In each subpopulation, the combination of all features performed better than any single feature alone. Once again, the poorest-performing subpopulation included the NS patients.

In addition to analyzing the most important features for each subpopulation, we explored the best-performing features by the DTD. For each DTD (2, 4, 7, 10 days) the top 20 features in order of importance are shown in Table 2. The combination of all features performed best at each DTD, and model performance improved as patients moved closer to discharge.

DISCUSSION

We were able to use data from daily progress notes to predict impending discharge from the NICU accurately. Our model improved as more clinical information was included, and its prediction improved as the DTD became smaller (closer to discharge date). Three of the 4 subpopulations and all patients combined performed very similarly. The only population on which the model consistently underperformed was the NS population, for 2 possible reasons. First, the NS population was the smallest cohort by far, and therefore the model may not have had enough patients on which to train. Second, the NS population may be very different clinically from the other patients seen in the NICU, and their readiness for discharge may not be captured in the features extracted for this model.

FIGURE 1 Example data matrix construction showing an attempt to model 4 DTD. HD, hospital day.

When we broke the most important features down by subpopulation and DTD, the features remained surprisingly consistent across the subpopulations and DTD. This result was unexpected because we thought that different subpopulations of patients with different medical conditions would have different features that were important for discharge prediction. The top features centered on various feeding metrics, gestational age, and weight. Surprisingly, none of the metrics involving infused medications, caffeine use, A&Bs, or oxygen usage had a significant impact on the predictive power of the model.

Two interesting features are worth discussing. First, the percentage of oral feeds (ie, oral amount divided by the oral amount plus the tube fed amount) was the best-performing or nearly the best-performing feature across populations and DTD values. For example, using this feature alone gives an AUC score of 0.766 at 2 DTD. The second-best feature was the engineered feature of the number of days with oral feedings of >90%. At 10 DTD this feature ranks 20th in importance, but at 2 DTD this feature has advanced to third place. This indicates that consuming most of their feedings orally instead of by tube is an important predictor of impending discharge.

We used 26 features to predict with a high degree of accuracy which patients will be discharged from the hospital in the next 2 to 10 days. However, it may not always be practical or possible to include all these features in a decision support tool to construct this predictive model to alert staff of impending discharges. One beneficial aspect of our approach is the ability to identify and use the most important features to build a scaled-down but still highly predictive model.

FIGURE 2 Distribution of patients in each subpopulation.

FIGURE 3 AUC for each patient subpopulation for all features.

FIGURE 4 The 9 most predictive features for each subpopulation. A single patient may be represented in >1 subpopulation.

A few simple “rule of thumb” models can be created to determine which patients are nearing discharge. For example, a very simple decision tree can be constructed from only 2 features (Fig 5). This tree is based on data from all patients, 2 features (oral percentage of feeds and weight), a DTD of 4 days, and a maximum tree depth of 3. The first branch of the tree splits the patients into 2 groups based on whether their oral percentage of feeds is >80%. On the right, the next differentiator is based on weight. If the patient weighs <1.5 kg, his or her probability of being discharged in the next 4 days is 0.23 (on a scale of 0–1). If the patient weighs between 1.5 and 1.7 kg, his or her probability for discharge in the next 4 days is 0.48. If the patient weighs >1.7 kg and takes >90% of his or her feeds orally, the patient has a 0.81 probability of being discharged in the next 4 days. The probabilities for discharge in 4 days for patients at different weights and taking <80% of their feeds orally are listed in the left-side branch.

This simple decision tree has an AUC of 0.843. Although it is not as accurate as using all features to obtain an AUC of 0.865, it is still an excellent predictor and can be easily calculated at the bedside.
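The right-side branch of the tree described above can be written as a short bedside rule. This is a sketch that encodes only the probabilities quoted in the text. The function names are hypothetical, the handling of exact boundary values (for example, a weight of exactly 1.5 kg) is an assumption, and branches whose values appear only in Fig 5 return `None` rather than a guessed number.

```python
def percent_oral(oral_ml, tube_ml):
    """Percentage of feeds taken orally: oral / (oral + tube) * 100."""
    total = oral_ml + tube_ml
    if total == 0:
        return None  # no feeds recorded; handling of this case is an assumption
    return 100.0 * oral_ml / total


def discharge_probability_4d(oral_pct, weight_kg):
    """Probability of discharge in the next 4 days for the branches quoted
    in the text. Returns None where the text gives no value."""
    if oral_pct <= 80:
        return None  # left-side branch: values are listed only in Fig 5
    if weight_kg < 1.5:
        return 0.23
    if weight_kg <= 1.7:
        return 0.48
    if oral_pct > 90:
        return 0.81
    return None  # >1.7 kg with 80% to 90% oral feeds: not quoted in the text


# Hypothetical patient: 1.8 kg, 285 mL oral and 15 mL tube feeds (95% oral).
p = discharge_probability_4d(percent_oral(285, 15), 1.8)
```

For this hypothetical patient the rule returns 0.81, the value quoted for infants weighing more than 1.7 kg who take more than 90% of their feeds orally.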

It is interesting that using all 26 features yields an AUC of 0.865, whereas using only 2 features can yield an AUC of 0.843. This result illustrates just how important feeding and weight gain are to the improving health of a neonate.

One possible way to improve our current model’s performance would be to add more features. The use of trending data (eg, the average amount of feeding increase over a 5-day period) could be beneficial. Another consideration for model improvement would be to predict a range of days until discharge (eg, 3–5 days instead of just 4).

There are several limitations to this study. First, some of the features used in the model are more difficult to obtain than others, and extracting certain features from commercial electronic medical record systems can be challenging [14]. Second, the data extracted included pediatric- and neonatology-specific data, which were collected using specific pediatric functions built into Vanderbilt’s electronic health record. These functions may not be supported by all electronic health record systems [15,16]. Third, categorizing hospitalized patients based on ICD-9 codes would be difficult because these codes are not usually available until after discharge. However, as the analysis showed, diagnosis categories added surprisingly little to the prediction model. Should we need our model to differentiate patients, admitting diagnoses could be used. Fourth, our sample could be potentially biased because we did exclude patients if they were missing any progress notes. Although an RF does provide techniques to address missing data, we thought excluding these patients was a conservative and appropriate approach.

We trained the model by using actual discharge dates. This limitation worked against us because some of the patients in the data set may have been medically ready for discharge sooner. The model may have performed better if we had been able to determine and adjust for the patients who had delayed discharges for nonmedical reasons. Additionally, once fully implemented, our model might predict discharge too early, which could result in premature expectations of parents and possible wasted effort.

TABLE 2 Top 20 Features in Order of Importance for All Patients for All DTD Values

| 2 DTD Feature | AUC | 4 DTD Feature | AUC | 7 DTD Feature | AUC | 10 DTD Feature | AUC |
|---|---|---|---|---|---|---|---|
| All | 0.854 | All | 0.795 | All | 0.754 | All | 0.723 |
| % of oral feeds | 0.766 | % of oral feeds | 0.704 | Amount of oral feeds | 0.649 | % of oral feeds | 0.623 |
| Amount of oral feeds | 0.764 | Amount of oral feeds | 0.703 | Gestational age birth | 0.647 | Amount of oral feeds | 0.620 |
| No. days oral % >90% | 0.753 | Amount of oral feeds/wt | 0.700 | Amount of oral feeds/wt | 0.646 | Amount of oral feeds/wt | 0.620 |
| Amount of oral feeds/wt | 0.750 | No. days oral % >90% | 0.681 | % of oral feeds | 0.646 | Gestational age | 0.617 |
| Total feeds | 0.720 | Gestational age birth | 0.678 | Birth wt | 0.632 | Wt | 0.617 |
| Gestational age birth | 0.707 | Gestational age | 0.673 | Wt | 0.632 | Gestational age birth | 0.609 |
| Birth wt | 0.698 | Birth wt | 0.672 | Gestational age | 0.631 | Birth wt | 0.607 |
| Gestational age | 0.698 | Wt | 0.667 | No. days off caffeine | 0.610 | On caffeine | 0.594 |
| Wt | 0.690 | Total feeds | 0.652 | On caffeine | 0.605 | No. days off caffeine | 0.592 |
| No. days off caffeine | 0.643 | No. days off caffeine | 0.630 | GI surgery | 0.594 | Total feeds | 0.569 |
| GI surgery | 0.637 | GI surgery | 0.622 | Total feeds | 0.590 | GI surgery | 0.566 |
| No. days off ventilator | 0.624 | On caffeine | 0.608 | No. days oral % >90% | 0.589 | No. days off oxygen | 0.560 |
| No. days off infused medication | 0.620 | No. days off ventilator | 0.605 | Cardiac | 0.563 | On oxygen | 0.548 |
| Ratio of wt to birth wt | 0.613 | No. days off infused medication | 0.594 | No. days off ventilator | 0.562 | On ventilator | 0.543 |
| On caffeine | 0.613 | Ratio wt/birth wt | 0.592 | Ratio wt/birth wt | 0.561 | Cardiac | 0.542 |
| No. days off oxygen | 0.609 | Days of life | 0.587 | No. days off oxygen | 0.558 | No. days off ventilator | 0.537 |
| Days of life | 0.604 | Cardiac | 0.582 | No. days off infused medication | 0.555 | No. of A&Bs | 0.535 |
| No. days no A&B | 0.601 | No. days off oxygen | 0.581 | Days of life | 0.547 | No. days oral % >90% | 0.534 |

FIGURE 5 A simple decision tree demonstrating how 2 features can be used to create an accurate discharge prediction model. The fraction in each cell denotes the probability of discharge in the next 4 days. This tree has an AUC of 0.843.


Future work will have to include testing the model in different ways. First, we will analyze the model on a new data set, such as patient records obtained from June 2013 to the present. Second, once we finish operationalizing this model, we will collect provider feedback about a patient’s discharge potential during daily rounds. We will then compare those results with the prediction of our model to determine whether the providers or the machine learning model is more accurate.

CONCLUSIONS

A supervised machine learning approach using an RF classifier accurately predicts which patients will be discharged from the NICU in the next 2 to 10 days. Running our model daily with the most recent progress note data will identify which patients are close to being medically ready for discharge and may alert the clinical staff through indicators in the electronic medical record. This method would allow more timely discharge planning and has the potential to prevent delayed discharges for nonmedical reasons.

ACKNOWLEDGMENT

The authors appreciate the Research Derivative team at Vanderbilt University for their assistance in retrieving data. The publication described was supported by CTSA award No. UL1TR000445 from the National Center for Advancing Translational Sciences. Its contents are solely the responsibility of the authors and do not necessarily represent official views of the National Center for Advancing Translational Sciences or the National Institutes of Health.

ABBREVIATIONS

A&B: apnea and bradycardia event
AUC: area under the curve
DTD: days to discharge
GI: gastrointestinal
ICD-9: International Classification of Diseases, Ninth Revision
LOS: length of stay
NS: neurosurgical
RF: random forest

FINANCIAL DISCLOSURE: Dr Lehmann serves in a part-time role at the American Academy of Pediatrics. He has received royalties for the textbook Pediatric Informatics and travel funds from the American Medical Informatics Association, the International Medical Informatics Association, and the World Congress on Information Technology. Dr Fabbri has an equity interest in Maize Analytics, LLC. Dr Temple has indicated he has no financial relationships relevant to this article to disclose.

FUNDING: National Library of Medicine training grant 5T15LM007450-13.

POTENTIAL CONFLICT OF INTEREST: The authors have indicated they have no potential conflicts of interest to disclose.



APPENDIX

| ICD-9 Code | Description | Category |
|---|---|---|
| 746.01 | Atresia of pulmonary valve, congenital | Cardiac |
| 747.49 | Other anomalies of great veins | Cardiac |
| 428 | Congestive heart failure, unspecified | Cardiac |
| 428.2 | Systolic heart failure, unspecified | Cardiac |
| 429 | Myocarditis, unspecified | Cardiac |
| 429.3 | Cardiomegaly | Cardiac |
| 745.1 | Complete transposition of great vessels | Cardiac |
| 745.1 | Complete transposition of great vessels | Cardiac |
| 745.11 | Double outlet right ventricle | Cardiac |
| 745.2 | Tetralogy of Fallot | Cardiac |
| 427.89 | Other specified cardiac dysrhythmias, other | Cardiac |
| 745.6 | Endocardial cushion defect, unspecified type | Cardiac |
| 427.42 | Ventricular flutter | Cardiac |
| 746.02 | Stenosis of pulmonary valve, congenital | Cardiac |
| 746.09 | Other congenital anomalies of pulmonary valve | Cardiac |
| 746.3 | Congenital stenosis of aortic valve | Cardiac |
| 746.4 | Congenital insufficiency of aortic valve | Cardiac |
| 746.87 | Malposition of heart and cardiac apex | Cardiac |
| 746.89 | Other specified congenital anomalies of heart | Cardiac |
| 746.9 | Unspecified congenital anomaly of heart | Cardiac |
| 747.1 | Coarctation of aorta (preductal) (postductal) | Cardiac |
| 747.21 | Congenital anomalies of aortic arch | Cardiac |
| 747.3 | Congenital anomalies of pulmonary artery | Cardiac |
| 745.4 | Ventricular septal defect | Cardiac |
| 424.9 | Endocarditis, valve unspecified, unspecified cause | Cardiac |
| 396.3 | Mitral valve insufficiency and aortic valve insufficiency | Cardiac |
| 397 | Diseases of tricuspid valve | Cardiac |
| 420.9 | Acute pericarditis, unspecified | Cardiac |
| 420.99 | Other acute pericarditis | Cardiac |
| 421 | Acute and subacute bacterial endocarditis | Cardiac |
| 422.91 | Idiopathic myocarditis | Cardiac |
| 423.3 | Cardiac tamponade | Cardiac |
| 424 | Mitral valve disorders | Cardiac |
| 424.1 | Aortic valve disorders | Cardiac |
| 427.9 | Cardiac dysrhythmia, unspecified | Cardiac |
| 424.3 | Pulmonary valve disorders | Cardiac |
| 745.3 | Common ventricle | Cardiac |
| 425.1 | Hypertrophic cardiomyopathy | Cardiac |
| 425.3 | Endocardial fibroelastosis | Cardiac |
| 425.4 | Other primary cardiomyopathies | Cardiac |
| 425.8 | Cardiomyopathy in other diseases classified elsewhere | Cardiac |
| 426 | Atrioventricular block, complete | Cardiac |
| 426.1 | Atrioventricular block, unspecified | Cardiac |
| 426.11 | First-degree atrioventricular block | Cardiac |
| 426.12 | Mobitz (type) II atrioventricular block | Cardiac |
| 426.13 | Other second-degree atrioventricular block | Cardiac |
| 427.41 | Ventricular fibrillation | Cardiac |
| 424.2 | Tricuspid valve disorders, specified as nonrheumatic | Cardiac |
| V15.1 | Personal history of surgery to heart and great vessels, presenting hazards to health | Cardiac |
| 794.3 | Unspecified nonspecific abnormal function study of cardiovascular system | Cardiac |
| 794.39 | Other nonspecific abnormal function study of cardiovascular system | Cardiac |
| 997.1 | Cardiac complications, not elsewhere classified | Cardiac |
| 745.12 | Corrected transposition of great vessels | Cardiac |
| 997.79 | Vascular complications of other vessels | Cardiac |
| 777.1 | Meconium obstruction in fetus or newborn | GI surgery |
| 530.3 | Stricture and stenosis of esophagus | GI surgery |
| 530.4 | Perforation of esophagus | GI surgery |
| 530.6 | Diverticulum of esophagus, acquired | GI surgery |
| 777.5 | Necrotizing enterocolitis in newborn, unspecified | GI surgery |
| 530.89 | Other specified disorders of the esophagus | GI surgery |
| 777.51 | Stage I necrotizing enterocolitis in newborn | GI surgery |
| 553.1 | Umbilical hernia without mention of obstruction or gangrene | GI surgery |
| 557.9 | Unspecified vascular insufficiency of intestine | GI surgery |
| 560.2 | Volvulus | GI surgery |
| 560.81 | Intestinal or peritoneal adhesions with obstruction (postoperative) (postinfection) | GI surgery |
| 560.89 | Other specified intestinal obstruction, other | GI surgery |
| 569.83 | Perforation of intestine | GI surgery |
| 569.69 | Other colostomy and enterostomy complication | GI surgery |
| 530.84 | Tracheoesophageal fistula | GI surgery |
| 756.79 | Other congenital anomalies of abdominal wall | GI surgery |
| 751.3 | Hirschsprung disease and other congenital functional disorders of colon | GI surgery |
| 751.2 | Congenital atresia and stenosis of large intestine, rectum, and anal canal | GI surgery |
| 751.1 | Congenital atresia and stenosis of small intestine | GI surgery |
| 750.4 | Other specified congenital anomalies of esophagus | GI surgery |
| V55.2 | Attention to ileostomy | GI surgery |
| 756.72 | Congenital anomalies of abdominal wall, omphalocele | GI surgery |
| V55.4 | Attention to other artificial opening of digestive tract | GI surgery |
| 756.73 | Congenital anomalies of abdominal wall, gastroschisis | GI surgery |
| 560.9 | Unspecified intestinal obstruction | GI surgery |
| 777.53 | Stage III necrotizing enterocolitis in newborn | GI surgery |
| 777.52 | Stage II necrotizing enterocolitis in newborn | GI surgery |
| 777.5 | Necrotizing enterocolitis in newborn, unspecified | GI surgery |
| V55.1 | Attention to gastrostomy | GI surgery |
| V44.1 | Gastrostomy status | GI surgery |
| 536.49 | Other gastrostomy complications | GI surgery |
| 536.42 | Mechanical complication of gastrostomy | GI surgery |
| 536.41 | Infection of gastrostomy | GI surgery |
| 742.9 | Unspecified congenital anomaly of brain, spinal cord, and nervous system | Neurosurgery |
| 741 | Spina bifida, unspecified region, with hydrocephalus | Neurosurgery |
| 331.3 | Other cerebral degenerations, communicating hydrocephalus | Neurosurgery |
| 331.4 | Other cerebral degenerations, obstructive hydrocephalus | Neurosurgery |
| 742.4 | Other specified congenital anomalies of brain | Neurosurgery |
| 742.3 | Congenital hydrocephalus | Neurosurgery |
| 741.9 | Spina bifida, unspecified region, without mention of hydrocephalus | Neurosurgery |
| 741.02 | Spina bifida, dorsal (thoracic) region, with hydrocephalus | Neurosurgery |
| 741.03 | Spina bifida, lumbar region, with hydrocephalus | Neurosurgery |
| 742.1 | Microcephalus | Neurosurgery |
| 741.93 | Spina bifida, lumbar region, without mention of hydrocephalus | Neurosurgery |
| 552.3 | Diaphragmatic hernia with obstruction | PPH/ECMO |
| 756.6 | Congenital anomalies of diaphragm | PPH/ECMO |
| 747.83 | Congenital anomaly, persistent fetal circulation | PPH/ECMO |
| 416 | Primary pulmonary hypertension | PPH/ECMO |
| 763.84 | Meconium passage during delivery affecting fetus or newborn | PPH/ECMO |
| 764.94 | Unspecified fetal growth retardation, 1000–1249 g | Premature |
| 765.01 | Disorders relating to extreme immaturity of infant, <500 g | Premature |
| 362.24 | Retinopathy of prematurity, stage 2 | Premature |
| 779.7 | Periventricular leukomalacia | Premature |
| 764.95 | Unspecified fetal growth retardation, 1250–1499 g | Premature |
| 765 | Disorders relating to extreme immaturity of infant, wt unspecified | Premature |
| 764.92 | Unspecified fetal growth retardation, 500–749 g | Premature |
| 772.13 | Intraventricular hemorrhage of fetus or newborn, grade III | Premature |
| 765.02 | Disorders relating to extreme immaturity of infant, 500–749 g | Premature |
| 362.25 | Retinopathy of prematurity, stage 3 | Premature |
| 772.12 | Intraventricular hemorrhage of fetus or newborn, grade II | Premature |
| 362.23 | Retinopathy of prematurity, stage 1 | Premature |
| 362.21 | Retrolental fibroplasia | Premature |
| 362.2 | Retinopathy of prematurity, unspecified | Premature |
| 362.27 | Retinopathy of prematurity, stage 5 | Premature |
| 765.28 | Disorders related to weeks of gestation completed, 35–36 wk | Premature |
| 765.17 | Disorders relating to other preterm infants, 1750–1999 g | Premature |
| 765.16 | Disorders relating to other preterm infants, 1500–1749 g | Premature |
| 765.15 | Disorders relating to other preterm infants, 1250–1499 g | Premature |
| 765.18 | Disorders relating to other preterm infants, 2000–2499 g | Premature |
| 765.22 | Disorders related to weeks of gestation completed, 24 wk | Premature |
| 765.24 | Disorders related to weeks of gestation completed, 27–28 wk | Premature |
| 765.25 | Disorders related to weeks of gestation completed, 29–30 wk | Premature |
| 776.6 | Anemia of prematurity | Premature |
| 765.27 | Disorders related to weeks of gestation completed, 33–34 wk | Premature |
| 765.03 | Disorders relating to extreme immaturity of infant, 750–999 g | Premature |
| 769 | Respiratory distress syndrome in newborn | Premature |
| 770.7 | Chronic respiratory disease arising in the perinatal period | Premature |
| 772.1 | Intraventricular hemorrhage of fetus or newborn, unspecified grade | Premature |
| 772.11 | Intraventricular hemorrhage of fetus or newborn, grade I | Premature |
| 772.14 | Intraventricular hemorrhage of fetus or newborn, grade IV | Premature |
| 765.14 | Disorders relating to other preterm infants, 1000–1249 g | Premature |
| 765.13 | Disorders relating to other preterm infants, 750–999 g | Premature |
| 765.1 | Disorders relating to other preterm infants, wt unspecified | Premature |
| 765.26 | Disorders related to weeks of gestation completed, 31–32 wk | Premature |

ECMO, extracorporeal membrane oxygenation; PPH, persistent pulmonary hypertension.
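As an illustration of how the appendix table could be used to group patients into subpopulations, the sketch below looks up a patient's ICD-9 codes in a small excerpt of the mapping. The dictionary holds only a handful of rows from the table, the function and variable names are hypothetical, and allowing a patient to fall into more than one category is an assumption rather than a rule stated in the article.

```python
from typing import Dict, Iterable, Set

# Excerpt of the appendix mapping (ICD-9 code -> category); the full table is longer.
ICD9_CATEGORY: Dict[str, str] = {
    "745.2": "Cardiac",        # Tetralogy of Fallot
    "747.1": "Cardiac",        # Coarctation of aorta
    "777.51": "GI surgery",    # Stage I necrotizing enterocolitis in newborn
    "756.73": "GI surgery",    # Gastroschisis
    "742.3": "Neurosurgery",   # Congenital hydrocephalus
    "747.83": "PPH/ECMO",      # Persistent fetal circulation
    "765.02": "Premature",     # Extreme immaturity, 500-749 g
    "769": "Premature",        # Respiratory distress syndrome in newborn
}


def subpopulations(codes: Iterable[str]) -> Set[str]:
    """Return every category matched by a patient's ICD-9 codes.

    Codes that do not appear in the mapping are ignored.
    """
    return {ICD9_CATEGORY[code] for code in codes if code in ICD9_CATEGORY}


# Hypothetical patient; "779.89" stands in for a code outside the mapping.
patient_codes = ["765.02", "769", "745.2", "779.89"]
groups = subpopulations(patient_codes)
print(sorted(groups))
```

Storing the codes as strings rather than numbers keeps entries such as "765.02" and "765.2" distinct and accommodates V codes such as "V55.1".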


Predicting Discharge Dates From the NICU Using Progress Note Data
Michael W. Temple, Christoph U. Lehmann and Daniel Fabbri
Pediatrics; originally published online July 27, 2015


The online version of this article, along with updated information and services, is located on the World Wide Web at: http://pediatrics.aappublications.org/content/early/2015/07/21/peds.2015-0456
