
Representation and Reinforcement Learning for Personalized Glycemic Control in Septic Patients

Wei-Hung Weng1, Mingwu Gao2, Ze He3, Susu Yan4, Peter Szolovits1

1 CSAIL, MIT | 2 Philips Connected Sensing Venture | 3 Philips Research North America | 4 Massachusetts General Hospital

Background

Motivation
• Critically ill patients often have poor glucose control, including the presence of dysglycemia and high glycemic variability.
• Current clinical practice follows the guidelines suggested by the NICE-SUGAR trial to control blood sugar levels in critical care.
• However, there are overwhelming variations in clinical conditions and physiological states among patients under critical care. This limits clinicians' ability to perform appropriate glycemic control. In addition, clinicians sometimes may not be able to consider the issue of glycemic control.
• To help clinicians better address the challenge of managing patients' glucose levels, we need a personalized glycemic control strategy that takes into account the variations in patients' physiological and pathological states.

Reinforcement Learning (RL) in the Clinical Domain
• RL is a potential approach for scenarios of sequential decision making with delayed rewards or outcomes.
• RL also has the ability to generate optimal strategies from non-optimized training data.
• RL has been used for the treatment of schizophrenia [Shortreed 2011]; the heparin dosing problem [Nemati 2016]; mechanical ventilation administration and weaning [Prasad 2016]; and sepsis treatment [Raghu 2017].
• Related to glycemic control, some studies utilize RL and inverse RL to design clinical trials and adjust clinical treatments [Bothe 2014].
• Fewer studies have utilized the RL approach to learn better target laboratory values as references for clinical decision making.

Proposed Approach and Objectives
• Learn an optimal policy to simulate personalized optimal glycemic trajectories, which are sequences of appropriate glycemic targets.
• The simulated trajectories are intended as a reference for clinicians to decide their glycemic control strategy, and to achieve better clinical outcomes.
• We hypothesized that the patient states, glycemic values, and patient outcomes can be modeled as a Markov decision process (MDP).
• "Action" = the glycemic value that leads to real clinical action.
• We explored an RL approach to learn the policy for personalized optimal glycemic trajectories, and compared the prognosis of the trajectories simulated by the optimal policy to the real trajectories.
• The learned policy is intended as a reference for clinicians to adapt and optimize their care strategy, and to achieve better clinical outcomes.
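As a concrete illustration of the MDP formulation above, a recorded patient trajectory could be converted into (state, action, reward, next-state) transitions. This is a minimal sketch, not the authors' implementation: the function name and the placement of the terminal ±100 outcome reward (with zero intermediate reward) are assumptions consistent with the RL settings described elsewhere on the poster.

```python
# Sketch: turn one patient trajectory into MDP transitions (s, a, r, s').
# Assumption: only the final transition carries the 90-day outcome reward
# (+100 survived / -100 died); intermediate rewards are 0.

def trajectory_to_transitions(states, actions, survived_90d):
    """states: discrete state ids, one per hourly step;
    actions: glucose-bin ids (the proxy action at each step);
    survived_90d: bool outcome used for the terminal reward."""
    transitions = []
    terminal_reward = 100 if survived_90d else -100
    for t in range(len(states) - 1):
        last = (t == len(states) - 2)
        reward = terminal_reward if last else 0
        transitions.append((states[t], actions[t], reward, states[t + 1]))
    return transitions

# Example: a 4-step trajectory for a surviving patient.
ts = trajectory_to_transitions([12, 40, 40, 7], [3, 5, 4], True)
# Only the final transition carries the +100 outcome reward.
```

Such transition tuples are the standard input for tabular policy iteration, with transition probabilities and expected rewards estimated by counting over all patients.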

Abstract

Glycemic control is essential for critical care. However, it is a challenging task since there has been no study on personalized optimal strategies for glycemic control. This work aims to learn personalized optimal glycemic trajectories for severely ill septic patients by learning data-driven policies for deciding the optimal targeted blood glucose level as clinicians' reference. We encoded patient states using a sparse autoencoder and adopted a reinforcement learning paradigm using policy iteration to learn the optimal policy from data. We also estimated the expected return following the policy learned from the recorded glycemic trajectories, which yielded a function indicating the relationship between real blood glucose values and 90-day mortality rates. This suggests that the learned optimal policy could reduce the patients' estimated 90-day mortality rate by 6.3%, from 31% to 24.7%. The result demonstrates that reinforcement learning with appropriate patient state encoding can potentially provide optimal glycemic trajectories and allow clinicians to design a personalized strategy for glycemic control in septic patients.

Methods

Experiment

References
• Bellman. Dynamic Programming. 1957.
• Bothe et al. The use of reinforcement learning algorithms to meet the challenges of an artificial pancreas. Expert Review of Medical Devices, 2014.
• Howard. Dynamic Programming and Markov Processes. 1960.
• Nemati et al. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach. In IEEE Engineering in Medicine and Biology Society, 2016.
• Ng. Sparse autoencoder. 2011. https://web.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
• Prasad et al. A reinforcement learning approach to weaning of mechanical ventilation in intensive care units. arXiv, 2017.
• Raghu et al. Continuous state-space models for optimal sepsis treatment - a deep reinforcement learning approach. arXiv, 2017.
• Shortreed et al. Informing sequential clinical decision-making through reinforcement learning: An empirical study. Machine Learning, 2011.
• Silver. Reinforcement Learning Lecture 3: Planning by Dynamic Programming. 2015.

Data Source and Study Cohort
• 5,565 septic patients in MIMIC-III version 1.4.
• Sepsis-3 criteria to identify patients with sepsis.
• Exclusion: age < 18, SOFA < 2, not first ICU admission.
• Diabetes: (1) ICD-9 codes, (2) pre-admission HbA1c > 7.0%, (3) admission medication, and (4) history of diabetes in the free text.
• Data were collected at one-hour intervals.
• Missing values: linear and piecewise constant interpolation.
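The two imputation schemes in the last bullet can be sketched as follows, assuming hourly time grids. This is an illustrative sketch, not the study's code: `np.interp` gives linear interpolation over observed points, and forward-filling gives the piecewise-constant variant; the variable names are made up.

```python
import numpy as np

def linear_impute(values):
    """Linearly interpolate NaNs in a 1-D hourly series."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    return np.interp(idx, idx[known], values[known])

def ffill_impute(values):
    """Piecewise-constant (last-observation-carried-forward) imputation."""
    values = np.asarray(values, dtype=float)
    out = values.copy()
    for i in range(1, len(out)):
        if np.isnan(out[i]):
            out[i] = out[i - 1]
    return out

hr = [80.0, np.nan, np.nan, 92.0]   # hypothetical hourly heart-rate series
print(linear_impute(hr))   # [80. 84. 88. 92.]
print(ffill_impute(hr))    # [80. 80. 80. 92.]
```

Linear interpolation suits smoothly varying vitals, while carrying the last observation forward is the usual choice for values that stay constant between measurements, such as infrequent lab results.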

RL Settings
• Reward: 90-day mortality (+100/-100)
• Action: discretized glucose levels (11 bins) as a proxy for real actions
• State: 46 normalized variables in total (patient-level variables, blood glucose-related variables, periodic vital signs)
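The 11-bin action discretization could look like the sketch below. The poster does not publish the bin boundaries, so the edges here (in mg/dL) are hypothetical placeholders chosen only to demonstrate the mechanics.

```python
import numpy as np

# Hypothetical bin edges (mg/dL): 10 edges -> 11 bins (action ids 0..10).
# The actual boundaries used in the study are not given on the poster.
BIN_EDGES = [70, 80, 90, 100, 110, 120, 140, 160, 180, 200]

def glucose_to_action(glucose_mg_dl):
    """Map a blood glucose value to a discrete action id in 0..10."""
    return int(np.digitize(glucose_mg_dl, BIN_EDGES))

print(glucose_to_action(65))    # 0  (below 70)
print(glucose_to_action(115))   # 5
print(glucose_to_action(250))   # 10 (above 200)
```

Treating the discretized glucose value as the action makes the measured glycemic level a proxy for the clinical intervention that produced it, which is what allows an off-policy method to be applied to retrospective data.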

Patient State Encoding
• Raw features vs. sparse autoencoder-encoded features [Ng 2011].
• 500 state clusters by k-means clustering.
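The clustering step, mapping encoded patient features to discrete state ids, can be sketched with a minimal k-means. The poster uses 500 clusters over 46 raw or 32 autoencoder-encoded features; the sizes below are shrunk so the example runs quickly, the synthetic blobs stand in for real patient encodings, and in practice an off-the-shelf implementation such as sklearn's KMeans would be the usual choice.

```python
import numpy as np

def kmeans(X, k, iters=20):
    # Farthest-first initialization: spreads the initial centers apart.
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Update step: move each center to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated synthetic "patient state" blobs standing in for the
# 32-dimensional autoencoder features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 4)), rng.normal(5.0, 0.1, (50, 4))])
labels, centers = kmeans(X, k=2)
```

Once every hourly observation is assigned a cluster id, the continuous patient record becomes a finite-state trajectory, which is what makes tabular policy iteration tractable.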

Policy Evaluation / Iteration
• Learn the optimal policy & evaluate on real trajectories.
• 90-day mortality rate = f(expected return)
• Compute and compare the estimated mortality rates of real and optimal glucose trajectories obtained by the RL-learned policy.
[Figure courtesy: David Silver]
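Policy iteration itself alternates exact policy evaluation with greedy policy improvement [Howard 1960; Bellman 1957]. The sketch below runs it on a hand-made toy MDP: in the study the states would be the 500 k-means clusters and the actions the 11 glucose bins, with P and R estimated from the data rather than specified by hand.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """P: (S, A, S) transition probabilities; R: (S, A) expected rewards."""
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve the linear system V = R_pi + gamma * P_pi V.
        P_pi = P[np.arange(S), policy]          # (S, S)
        R_pi = R[np.arange(S), policy]          # (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to Q(s, a).
        Q = R + gamma * (P @ V)                 # (S, A)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Toy two-state, two-action MDP: action 1 yields reward 1 everywhere,
# action 0 yields 0, so the optimal policy always picks action 1.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])
policy, V = policy_iteration(P, R)
print(policy)   # [1 1]
```

With gamma = 0.9 the converged values are V = 1/(1 - 0.9) = 10 in both states, matching the geometric sum of the repeated unit reward.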

Conclusion

We utilized an RL algorithm with representation learning to learn a personalized optimal policy for better predicting glycemic targets from retrospective data. The method may reduce the mortality rate of septic patients, potentially assist clinicians in optimizing real-time treatment strategies at dynamic patient state levels with a more accurate treatment goal, and lead to optimal clinical decisions. Future work includes applying a continuous state approach, different evaluation methods, and applying the method to different clinical decision-making problems.

Results
• The distribution of the learned expected return, which is the rescaled Q-value, is strongly negatively correlated with mortality rate.
• The learned expected return reflects the real patient status well.
• The optimal policy learned by the policy iteration algorithm can potentially reduce the estimated mortality rate by around 6.3% if we choose appropriate patient state representations.

Raw feature representation

32-dimension sparse autoencoder representation
