A Proactive Workflow Model for Healthcare Operation and Management

Chuanren Liu, Hui Xiong, Senior Member, IEEE, Spiros Papadimitriou, Yong Ge, Keli Xiao

Abstract—Advances in real-time location systems have enabled us to collect massive amounts of fine-grained semantically rich location traces, which provide unparalleled opportunities for understanding human activities and generating useful knowledge. This, in turn, delivers intelligence for real-time decision making in various fields, such as workflow management. Indeed, it is a new paradigm to model workflows through knowledge discovery in location traces. To that end, in this paper, we provide a focused study of workflow modeling by integrated analysis of indoor location traces in the hospital environment. In particular, we develop a workflow modeling framework that automatically constructs the workflow states and estimates the parameters describing the workflow transition patterns. More specifically, we propose effective and efficient regularizations for modeling the indoor location traces as stochastic processes. First, to improve the interpretability of the workflow states, we use the geography relationship between the indoor rooms to define a prior of the workflow state distribution. This prior encourages each workflow state to be a contiguous region in the building. Second, to further improve the modeling performance, we show how to use the correlation between related types of medical devices to reinforce the parameter estimation for multiple workflow models. In comparison with our preliminary work [11], we not only develop an integrated workflow modeling framework applicable to general indoor environments, but also improve the modeling accuracy significantly. We reduce the average log-loss by up to 11%.

Index Terms—Indoor Location Traces, Workflow Modeling, Healthcare Operation and Management

1 INTRODUCTION

Real-time location systems (RTLS) are being rapidly developed and deployed. Of note, hospitals are increasingly using these systems to track the movement of medical devices, doctors, and patients (see Figure 1), as well as the interaction among them. However, their utilization is currently limited to basic tasks, such as locating a wheelchair or checking the availability of an inpatient bed. In the near future, we expect indoor location tracking data to be widely available in many hospitals, as well as in other environments (e.g., shops, schools, warehouses, etc).

Understanding sequences of procedures that reflect workflows (e.g., surgery, from admission to recovery) remains an important challenge, which we address in this work. Workflows reveal semantically meaningful patterns that can help (i) understand how space and assets (e.g., medical equipment, classrooms, shopping areas) are utilized (workflow auditing); (ii) ensure that such utilization complies with rules and regulations (workflow compliance); and (iii) perform any of these tasks in real time (workflow monitoring). For example, many healthcare providers have their own work protocols to ensure that healthcare practices are executed in a controlled manner. Non-compliance with these protocols may be costly and expose the healthcare providers to severe risks, such as litigation, prosecution, and damage to brand reputation. Thus, there is a real need for effective inspection of workflow compliance. This work focuses on fundamental models to support all of the above tasks.

Chuanren Liu is with the Decision Science and Management Information Systems Department, Drexel University, E-mail: [email protected]. Hui Xiong and Spiros Papadimitriou are with the Management Science and Information Systems Department, Rutgers University, E-mail: {hxiong,spapadim}@rutgers.edu. Yong Ge is with Eller College of Management, University of Arizona, E-mail: [email protected]. Keli Xiao is with the College of Business, Stony Brook University, E-mail: [email protected].

Hospital managers traditionally accomplish these tasks by inspecting the detailed workflow logs [7, 15, 14, 19], which can be in heterogeneous formats, stored in different media (including paper), and provided passively by personnel and, therefore, may be biased and incomplete. The overall task is quite daunting, and the opportunities to develop proactive approaches to help with workflow management tasks are unparalleled. However, RTLS deployments are still used in a relatively basic way, as noted above, with little work focusing on how to leverage massive indoor location traces. To this end, this paper provides a focused study of workflow modeling via integrated analysis of indoor location traces, evaluated on real data from hospital environments. Such workflow models serve as fundamental building blocks in a wide range of workflow management problems.

In particular, we propose the Proactive Workflow Model (ProWM) that, leveraging indoor location traces, provides a unified framework for simultaneously (i) identifying semantically meaningful regions as workflow states, and (ii) summarizing sequential procedures as workflow transitions. Moreover, the proposed method is applicable to general indoor environments where the workflow patterns are hidden in the location traces of moving objects.

Digital Object Identifier no. 10.1109/TKDE.2016.2631537

1041-4347 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

2 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 29, NO. 3, MARCH 2017

Fig. 1: An example of a real-time location system (RTLS) deployed in a hospital. Bottom layer: sensor tags attached to moving objects (e.g., medical devices, patients, doctors); middle layer: receivers relaying signals from sensors; top layer: network bridges connected to data/application servers, to calculate and store locations of tracked objects.

[Figure 2: diagram with rows LOCATION and MOBILITY and columns MICRO, MESO, MACRO. Box labels: 3D coordinates; line distance; rooms; room graph; distribution over rooms; transition prob. matrix; one box is annotated "(inferred)". Challenge markers: 1a, 1b, 2a, 2b, (3.).]

Fig. 2: Challenges (numbers explained in the introduction) and our proposed solutions (filled boxes).

Although workflow modeling is central to many building operations and management tasks, systematically constructing and estimating those models based on massive indoor location traces is a non-trivial endeavor. Next, we identify specific challenges in workflow modeling from indoor location traces, and we outline the ingredients of our proposed solution, which revolve around representation of position/location and of mobility/transitions at three different levels: micro, meso, and macro (see Figure 2).

At the “micro” level, we have the raw data, which consist of three-dimensional coordinates and geometric (Euclidean) distance between them. Based on these, we have to construct or infer appropriate representations for workflow modeling.

1. “Meso” level. The raw data at the micro level, which is what much of the research on indoor location traces focuses on, are not appropriate for our purposes for a number of reasons.

1a. Location: The granularity and quality of indoor location traces captured by wireless location systems in the form of (x, y, z) coordinates might vary substantially from place to place due to several factors: the density of sensor receivers, environment changes (e.g., doors opening/closing), the underlying localization techniques, and the sensor device itself (e.g., remaining battery power). Given a particular set of indoor location traces, it is very important to be aware of its specific granularity and quality when modeling or analyzing the data. Typically, it is possible to identify the room that corresponds to a location with reasonable accuracy [10]. We therefore use the floor plan information to map raw locations (continuous) to rooms (discrete symbols).

1b. Topology/proximity: Furthermore, the topologies of indoor space are often much more complex than outdoor space. For example, a small distance between two locations does not mean that one is easily reachable from the other (e.g., two rooms with a shared wall but no door between them). Therefore, some fundamental assumptions commonly used for (outdoor) trajectories may not hold for indoors. For instance, widely-used similarity measures based on geometric distance [4] or degree of overlap [6] are not very meaningful for indoor location traces. Our solution is to use a graph of communicating rooms. An additional complication is that this graph may change over time, and therefore should also be inferred, rather than extracted from the floor plan.

2. “Macro” level. The representations at the “meso” level are still too fine-grained to provide useful insights into modeling the workflows.

2a. Location granularity and semantics: Multiple locations may be used either concurrently or interchangeably and, therefore, should be grouped together. However, this grouping will depend on the workflow/procedure, and may change over time. We model the functional significance of a location in the context of a workflow as a hidden state. Therefore, a state is a probability distribution over rooms. However, in order to obtain interpretable results, we need to regularize it. To that end, we employ the graph derived in (1b) above.

2b. Transition patterns: Once we have identified the states (which, intuitively, should correspond to stages, or procedures, that comprise a workflow), we need to learn the transition patterns between them. An additional challenge is that, if we consider each device type independently, we may have insufficient data to robustly estimate model parameters. However, certain medical devices are often used together for particular healthcare tasks. By leveraging such naturally arising correlations, we may improve the robustness of parameter estimation for the finite state machines. We show that correlations can be incorporated effectively and efficiently as another regularization, for simultaneously estimating multiple workflow models.

LIU ET AL.: A PROACTIVE WORKFLOW MODEL FOR HEALTHCARE OPERATION AND MANAGEMENT 3

3. Evolving environment. In addition to the above challenges, the structure of indoor space is very dynamic and the modeling framework should automatically adapt. In more traditional settings, such as city road networks, the structure of the environment may remain mostly unchanged for years. However, the structure and utilization of modern buildings may frequently change, based on evolving needs. Although room partitions (1a, above) typically don’t change, everything else (1b, 2a, and 2b) may change. For example, an elevator may be out of order, sections of the building may be closed for cleaning or repairs, thereby changing the graph. Rooms may become unavailable or have their purposes re-assigned, therefore changing state composition. Workflows may also change over time, as procedures are updated. Such dynamic changes will alter the semantics of indoor location traces. In order to effectively and efficiently deal with these challenges, our method should be computationally efficient and also able to robustly estimate parameters based on more limited, recent data, rather than rely on a long past history.
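To make challenge (1b) concrete, the following minimal sketch infers a graph of communicating rooms from observed room sequences, treating two rooms as connected when a tracked object moves directly from one to the other. This co-occurrence rule, the function name `infer_room_graph`, the `min_count` threshold, and the room labels are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def infer_room_graph(
    room_sequences: Iterable[List[str]], min_count: int = 1
) -> Dict[str, Set[str]]:
    """Build an undirected room graph from directly observed room-to-room moves.

    An edge is kept only if the move was seen at least `min_count` times
    (in either direction), which filters out occasional localization glitches.
    """
    counts: Counter = Counter()
    for seq in room_sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                counts[frozenset((a, b))] += 1

    graph: Dict[str, Set[str]] = {}
    for pair, n in counts.items():
        if n >= min_count:
            a, b = tuple(pair)
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


# Hypothetical room sequences (already mapped from raw coordinates to rooms).
sequences = [
    ["R1", "Hall", "R2"],
    ["R2", "Hall", "R3"],
    ["R1", "Hall"],
]
graph = infer_room_graph(sequences)
strict_graph = infer_room_graph(sequences, min_count=2)
```

Because the graph is rebuilt from whatever traces are supplied, re-running it on a recent window of data is one simple way to follow the changes described in challenge 3 (for example, an elevator going out of service).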

As a result, although there is extensive work on the analysis of location traces, e.g., [6, 13, 5], most is not suitable for modeling indoor location traces for proactive workflow analysis. For example, the method developed by Giannotti et al. [6] discovers frequent trajectory patterns from outdoor location traces, based on given thresholds (minimum support and time tolerance). However, such frequent patterns cannot provide a parsimonious description of hospital workflows. For instance, workflow compliance inspection requires all activities, rather than only a frequent subset of moving patterns. Similarly, the periodic patterns mined from outdoor spatio-temporal data by Mamoulis et al. [13] are also a subset of moving activities and cannot fully meet the needs of workflow monitoring, auditing, or compliance inspection. Finally, although stochastic models for indoor activities were proposed by Yin et al. [18], the purpose of these models is to classify activities (rather than provide an interpretable, parsimonious summary of overall activity patterns), based mostly on supervised learning, requiring sufficient labeled training data.

In preliminary work [11], we approached challenges (2a) and (2b) independently, by employing a density-based clustering algorithm over the room graph to construct spatially contiguous “hard” clusters of rooms. Once this clustering solution was fixed, states were fully observed, and we could therefore directly learn the transition matrices between these clusters.

Contributions. We propose stochastic models to proactively unravel the workflow patterns hidden in the massive indoor location traces, by automatically discovering workflow states, and estimating parameters describing the transition patterns. The discovered knowledge can then be transformed to valuable, actionable intelligence in a wide range of practical problems in indoor environments (e.g., hospitals).

More specifically, in this work we unify the treatment of challenges (2a) and (2b), by proposing a novel hidden Markov model which is regularized using the inferred room graph and transition correlations. This allows us to learn good representations of states (which are now “soft” probability distributions) and jointly optimize learning states and transitions. In addition to rigorously unifying the modeling framework, our proposed approach also achieves better accuracy than previous work, without sacrificing computational efficiency.
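For orientation, the standard (unregularized) hidden Markov model that this description builds on can be written as follows. The notation is assumed here for illustration and may differ from the paper's: $r_t$ is the observed room at step $t$, $s_t$ the hidden workflow state, $\pi$ the initial state distribution, $A$ the state transition matrix, and $B$ the matrix whose row $k$ is the "soft" distribution of state $k$ over rooms. The graph-based and correlation-based regularizers are part of the paper's model (Section 4) and are not reproduced in this sketch.

```latex
% Joint probability of a room sequence r_{1:T} and a hidden state sequence s_{1:T}
P(r_{1:T}, s_{1:T}) \;=\; \pi_{s_1}\, B_{s_1, r_1} \prod_{t=2}^{T} A_{s_{t-1}, s_t}\, B_{s_t, r_t},
\qquad
\sum_{j} A_{ij} = 1, \quad \sum_{r} B_{k r} = 1 .
```

In the earlier "hard" clustering setting, each room belongs to exactly one state, so every column of $B$ has a single nonzero entry and the states are effectively observed.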

We have also implemented and deployed a management information system, HISflow, based on our methods to show how the discovered knowledge can help with the three important managerial tasks in hospitals: workflow monitoring, auditing, and compliance inspection.

The rest of the paper is organized as follows. Section 2 discusses work related to modeling healthcare workflow patterns, and to analyzing indoor location traces. Section 3 introduces the indoor location data and formalizes the problem of workflow modeling. Section 4 presents our model based on regularized HMM (Hidden Markov Model) and Section 5 details model estimation. Section 6 reports results from extensive experimental evaluation of our method on both synthetic data and real-world data. Finally, Section 7 concludes the paper.

2 RELATED WORK

Workflow analysis and mining. Workflow analysis conventionally relies on detailed workflow logs [1, 16, 7]. Workflow processes are typically represented by activity graphs. Given the execution logs, which are lists of activity records, workflow mining can be formalized as a graph mining problem by viewing execution logs as walks on the activity graphs [1]. In practice, there might be discrepancies between the actual workflow processes and those perceived by the management. In this case, to discover a completely specified workflow design model, Van der Aalst et al. [16] presented an algorithm to extract a process model from the workflow logs and represent it in terms of a Petri net. Instead of discovering the complete model, Greco et al. [7] later formalized the problem of discovering the most frequent patterns of executions, i.e., the workflow substructures that have been scheduled more frequently and have led to a desired final configuration. However, these methods rely on workflow logs which are often manually recorded in several settings, including the healthcare industry. Thus, the results may be distorted due to bias and missing records. These distortions can be misleading for many operation and management tasks in hospitals, such as the inspection of workflow compliance.

In comparison, as we discussed in Section 1, in this paper we propose a proactive approach to workflow modeling by mining the digital location traces of moving objects. These are automatically recorded by RTLS, requiring practically no human intervention. The statistical modeling results provided by our approach are helpful for a range of operation and management tasks in hospital environments.

Activity modeling and prediction. In terms of methodology, another category of related work is the modeling and prediction of human activities. For instance, Yin et al. [18] proposed stochastic process models to predict the goals of indoor human activities. Furthermore, for multiple-goal recognition, Chai and Yang [2] proposed a two-level architecture for behavior modeling and Hu and Yang [8] developed a dynamic Bayesian model where skip-chain conditional random fields were used for modeling interleaving goals. All the above approaches are supervised and require sufficient labeled training data. However, in our setting we wish to predict the next individual location/state occurring in well-defined workflows, rather than categorize sequences of states. Moreover, we do not have labeled training data.

Trajectory mining. In terms of analytics of location traces, trajectory pattern mining is also related to this work. For instance, Giannotti et al. [6] introduced trajectory patterns as frequent behaviors in terms of both time and space, where the frequent trajectory patterns are computed based on given thresholds. In [9], methods were proposed to discover periodic patterns from spatio-temporal data, where a periodic pattern is defined as a regular activity which periodically happens at certain locations. Also, Yang and Hu [17] proposed methods to discover sequential patterns from imprecise trajectories of moving objects. However, these methods were not developed for indoor spaces, were not designed for the purpose of workflow modeling and, more importantly, the mined frequent patterns cannot provide a parsimonious description of healthcare activities in hospitals, to support the applications we have considered.

Area-of-interest detection. Finally, the last category of related work is the detection of area-of-interest, based on trajectory data. For instance, Liu et al. [12] proposed a non-density-based approach, called mobility-based clustering, to identify the hotspots of moving vehicles in an urban area. The key idea is that sample objects serve as “sensors” to perceive vehicle crowdedness in nearby areas using their instant mobility, rather than the “object representatives”. Moreover, Zheng et al. [20] proposed a stay point concept and identified hotspots from human moving trajectories. One location was considered as a hotspot if several moving objects stay nearby over a given time period. Finally, Giannotti et al. [6] used the neighborhood function to model Regions-of-Interest. They partitioned space into grids and quantified the interest of each grid by the density/direction information within each grid. As we have discussed, although methods mentioned above are successful for analyzing outdoor location traces, most of them are not applicable to indoor environments and, in particular, hospitals, because of the unique characteristics of indoor space.

3 PRELIMINARIES AND PROBLEM FORMULATION

We first describe the data from indoor location traces and introduce necessary notation. Then, we formalize the problem of indoor workflow modeling.

3.1 Data Description and Transformation

Our location traces (trajectories) of medical devices are collected indoors at several US hospitals. The location trace of an object O is defined as:

Definition 1 (Location trace): A location trace is a sequence $(L_1, L_2, \ldots)$, where $L_i$ represents the $i$th record in the sequence. $L_i = (\mathrm{start}_i, \mathrm{end}_i; x_i, y_i, z_i)$ contains numerical coordinates $(x_i, y_i, z_i)$ and the time that record was recorded, where $\mathrm{start}_i$ and $\mathrm{end}_i$ are the start and end time, respectively, of $L_i$. In other words, during the time frame from $\mathrm{start}_i$ to $\mathrm{end}_i$, the object stays at the coordinate $(x_i, y_i, z_i)$ in a three-dimensional indoor space.

However, the indoor wireless communication may be interrupted by environmental factors, leading to errors and noise in the localization of moving objects. Therefore, a coordinate localized by the RTLS might not indicate the exact position, but a small area surrounding the coordinate [10]. In addition, it is not meaningful to use the raw coordinates for indoor workflow modeling. For example, although two recorded coordinates may be separated by a small geometric distance on the same floor, the actual moving distance from one coordinate to the other may be very large if there is, e.g., a wall between them.

To cope with these challenges, we normalize the original location traces for workflow modeling. Specifically, we project each raw coordinate to a semantic location of the building, such as a room in the hospital, based on the floor maps of the building. For the data and maps in this study, each hallway is also treated as a room and some long hallways have been segmented into multiple smaller rooms. Then, $L_i$ can be transformed to: $L_i = (\mathrm{start}_i, \mathrm{end}_i, r_i)$, where $r_i$ is the room containing the coordinate $(x_i, y_i, z_i)$. After this projection, two neighboring coordinates of the raw location traces may be mapped into the same room. In other words, $r_i$ for $L_i$ may be the same as $r_{i+1}$ for $L_{i+1}$. We therefore merge runs of consecutive records with the same room into one record. Specifically, if $i < j$ and $r_i = r_{i+1} = \cdots = r_j$, we replace the subsequence $(L_i, L_{i+1}, \cdots, L_j)$ with only one record $L_i^* = (\mathrm{start}_i, \mathrm{end}_j, r_i)$.
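The merging step can be sketched as follows, assuming the raw coordinates have already been projected to rooms. The `RoomRecord` type, the function name, and the room labels are hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class RoomRecord:
    start: float  # start time of the stay
    end: float    # end time of the stay
    room: str     # room containing the raw (x, y, z) coordinate


def merge_consecutive_rooms(trace: List[RoomRecord]) -> List[RoomRecord]:
    """Collapse each run of consecutive records in the same room into one record.

    The merged record keeps the start time of the first record in the run
    and the end time of the last one.
    """
    merged = []
    for room, run in groupby(trace, key=lambda rec: rec.room):
        run = list(run)
        merged.append(RoomRecord(run[0].start, run[-1].end, room))
    return merged


# Hypothetical trace after projecting coordinates to rooms.
trace = [
    RoomRecord(0, 5, "R101"),
    RoomRecord(5, 9, "R101"),
    RoomRecord(9, 12, "Hall-A"),
    RoomRecord(12, 20, "R102"),
    RoomRecord(20, 31, "R102"),
]
merged = merge_consecutive_rooms(trace)
```

Here five raw records reduce to three, one per uninterrupted stay in a room.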

Fig. 3: Workflow instances of infusion pump.

Locations    Functions
C1           Post-anesthesia care unit (PACU)
C2           Operating room (OR)
C3           Intensive care unit (ICU)
C4, C5       Patient care unit (PCU)
E1, E2       Elevator

TABLE 1: Annotation of key locations in Figure 3

Now, each raw numerical location is mapped to a symbolic room representation (graph node), and each location trace is transformed to a symbolic sequence (traveling path). This data pre-processing step greatly smooths out the noise and alleviates the impact of errors on subsequent workflow modeling tasks (described next). Additionally, it also drastically reduces the computational cost, since we significantly reduce the number of records in the data.

3.2 Concepts for Workflow Modeling

3.2.1 Workflow Modeling in Healthcare Environments

Our goal of workflow pattern modeling is to automatically summarize the work activities in a systematic manner. To this end, the concept of workflow patterns is actually a hierarchy with several levels. At the lowest level, the location trace can be seen as a workflow pattern instance. For example, in Figure 3, we show the location trace of an infusion pump (red line). However, it is difficult to semantically understand the pattern hidden in the raw location traces.

At the other extreme, three general high-level workflow stages can be used to describe workflow logistics: preprocessing maintenance stage, in-use stage, postprocessing maintenance stage. The in-use stage of a medical device corresponds to the period when it is used for any healthcare purpose. Before the in-use stage, a device is in the preprocessing maintenance stage, e.g., held in the storage room. After the in-use stage, the device must go through the postprocessing maintenance stage before the next use cycle. These maintenance processes include cleaning up, sterilization, disinfection, etc. However, modeling workflow patterns based on the generic stages is too coarse to provide useful results that can help with healthcare workflow management. We need a middle-level representation of the location traces, which leads to our proposed workflow models.

3.2.2 General Location Based Workflow Modeling

Indeed, as we discussed in the Introduction, we expect indoor location tracking data to be more widely available in environments beyond hospitals (e.g., shops, schools, warehouses, etc), and automatic workflow modeling with indoor location data can be applied to discover different types of human activity patterns in various scenarios. To this end, it is a ubiquitous challenge to identify the right representation level of the location traces for better workflow modeling and understanding. The raw location traces include too much unnecessary minutiae, while the purely semantic high-level workflow stages are too coarse to encode the spatial information. Therefore, one of our objectives in developing the indoor location based workflow models is to automatically construct the workflow states, which are spatially contiguous as well as semantically meaningful. In the following, we first discuss how to construct the workflow states as the right representation levels in healthcare environments. We also highlight the assumptions used by our models. As a result, the proactive workflow modeling framework is applicable to general domains where the assumptions can be accepted and the data is available.

3.2.3 Workflow States

To better model and understand the workflow processes, we define the workflow states as a few key areas in these trajectories. Our goal is to automatically generate these workflow states and use them as “annotations” of the original location traces. For example, we annotate the location traces in Figure 3 with areas C1, ..., C5. The medical functions of these areas are summarized in Table 1. The location trace in red in Figure 3 can now be concisely represented as C1 ↦ C2 ↦ C3 ↦ C4. Such representation with the key areas makes it easy to understand the workflow underlying the location traces.

In fact, Figure 3 is the map of the second floor of a hospital building, which is centered around E1, the elevator connected to the basement. The red location trace of one infusion pump starts from the storage room in the basement to the elevator E1. After E1 follows C1, which is the Post-Anesthesia Care Unit (PACU), where the patient is ready for medical procedures, such as surgery, and the medical devices are attached to the patients. Next, C2 is the area of operating rooms (OR), where medical procedures are performed. After the medical procedure, the patients and the medical devices are moved to C3, the Intensive Care Unit (ICU), based on medical needs. When the condition of the patient becomes stable, the patient and the medical devices being used will be further moved to C4, the Patient Care Unit (PCU), before the patient is discharged. After the patient is discharged, the medical devices will be moved through elevator E2 to the basement, cleaned in a disinfection room, and then transferred into storage.
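The annotation idea can be illustrated with a small sketch that rewrites a room sequence as a sequence of workflow states. It assumes a fixed, "hard" room-to-state lookup table, which is a simplification chosen for illustration (the model proposed in this paper treats states as distributions over rooms). The room names and the `annotate_with_states` helper are hypothetical.

```python
from itertools import groupby
from typing import Dict, List


def annotate_with_states(rooms: List[str], room_to_state: Dict[str, str]) -> List[str]:
    """Rewrite a room sequence as a sequence of workflow states.

    Rooms without an assigned state (e.g., hallways) are skipped, and
    consecutive repeats of the same state are collapsed into one entry.
    """
    states = [room_to_state[r] for r in rooms if r in room_to_state]
    return [state for state, _ in groupby(states)]


# Hypothetical assignment of rooms to the key areas of Table 1.
room_to_state = {
    "pacu-1": "C1",
    "pacu-2": "C1",
    "or-3": "C2",
    "icu-1": "C3",
    "pcu-7": "C4",
}
rooms = ["elev-E1", "pacu-1", "pacu-2", "hall-9", "or-3", "icu-1", "icu-1", "pcu-7"]
path = annotate_with_states(rooms, room_to_state)
```

With this lookup table, the eight-room sequence reduces to the four-state path C1, C2, C3, C4 used in the example above.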

6 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 29, NO. 3, MARCH 2017

It should be noted that Figure 3 is a grossly simplified view of reality. In fact, we may have many workflow states, spanning multiple floors and buildings, and the overall workflow patterns are much more complex. Formally, we define the workflow states (in general indoor contexts) as follows:

Definition 2 (Workflow state): A workflow state is an area in the indoor space where specific workflow activities frequently happen.

From the above example, it is clear that each such area (where a particular stage of the process occurs) should correspond to a workflow state, and that location traces represented using such workflow states convey the necessary information for easily understanding and modeling workflow patterns. In other words, workflow patterns can be understood much more clearly when location traces are represented using workflow states, rather than by using either low-level rooms or high-level stages.

3.3 Problem Statement of Workflow Modeling

A further task in workflow modeling is to summarize the transition patterns of the moving objects among workflow states. As shown in Figure 3, the transition from one state may lead to several different states. For example, when the condition of a patient is stable after the medical operation at $C_2$, the medical devices and the patient might be moved directly to the PCU ($C_4$, $C_5$) without passing through the ICU ($C_3$).

Now, we can formally state the problem of location-based workflow pattern modeling in general indoor environments:

Problem 1 (Workflow pattern modeling): Given the location traces of moving objects (e.g., medical devices in hospitals, shoppers in malls, loading trucks in warehouses), workflow pattern modeling discovers workflow knowledge, including the workflow states and the parameters describing the transition patterns of the moving objects.

In practice, we have multiple types of objects moving around indoors, and one workflow model can be built for each of the object types. The multiple models are not independent of each other. For instance, in hospitals, many different types of medical devices are often used together for a particular task. Therefore, although different types of medical devices have different workflow patterns, there is some natural correlation among their location traces. Modeling such correlation not only helps reinforce the robustness of the workflow models, but also provides a better understanding of the overall workflow patterns.

4 METHODOLOGY

Although the workflow states are not directly visible, the observed location traces depend on the hidden workflow states. Therefore, our Proactive Workflow Model (ProWM) uses the Hidden Markov Model (HMM) to construct the workflow states. In an HMM, each state has a probability distribution over the possible output observations. With the Markov assumption, the state distribution can be estimated from the observed data. In the following, we first introduce the necessary notation for hidden Markov modeling of location traces, then present our regularization schemes to obtain spatially contiguous (thus more interpretable) states, and to leverage correlations among devices for multi-flow estimation. Our regularization is intuitive, and can be viewed as a prior over the model parameters. The parameters in the regularized models can also be estimated effectively.

Fig. 4: The graphical Hidden Markov Model, drawn for each sequence $n = 1, \cdots, N$ as a chain of hidden states $s^n_{t-1}, s^n_t, s^n_{t+1}$ emitting the observations $r^n_{t-1}, r^n_t, r^n_{t+1}$. White nodes $s^n_t \in S$ are hidden workflow states; shaded nodes $r^n_t \in R$ are the observed room sequence.

We observe a set of room sequences $X = \{x_n : n = 1, \cdots, N\}$, where $x_n = (r^n_1, \cdots, r^n_{T_n})$ is the sequence for the $n$-th moving object, $T_n$ is the length of $x_n$, and $N$ is the number of all moving objects. Each observation $r^n_t \in R$ is associated with a workflow state $s^n_t \in S$. Here, $R$ is the set of rooms in the building and $S$ is the set of workflow states. Without loss of generality, we assume $R = \{1, 2, \cdots, K_R\}$ and $S = \{1, 2, \cdots, K_S\}$, where $K_R$ is the number of rooms and $K_S$ is the number of states. Hidden Markov Models assume that, as shown in Figure 4, when the state $s^n_t$ is given, the corresponding observation $r^n_t$ is independent of all other states $s^n_l$ for $l \neq t$, and the state $s^n_{t+1}$ is independent of all previous states $s^n_l$ for $l < t$. Then, the model is fully determined by the state transition probabilities $A_{ij} = \Pr(s^n_{t+1} = j \mid s^n_t = i)$, the emission probabilities $B_{ij} = \Pr(r^n_t = j \mid s^n_t = i)$, and the initial state distribution $\pi_i = \Pr(s^n_1 = i)$.
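The generative reading of these three parameter sets can be illustrated with a minimal sketch. The function name `sample_trace` and the toy values of `A`, `B`, and `pi` are illustrative assumptions, not part of the paper; rooms and states are 0-indexed here, whereas the text indexes them from 1.

```python
import random

def sample_trace(A, B, pi, length, rng):
    """Draw one room sequence from an HMM with parameters (A, B, pi)."""
    states = range(len(pi))
    rooms = range(len(B[0]))
    s = rng.choices(states, weights=pi)[0]                  # s_1 ~ pi
    trace = []
    for _ in range(length):
        trace.append(rng.choices(rooms, weights=B[s])[0])   # r_t ~ B[s_t, :]
        s = rng.choices(states, weights=A[s])[0]            # s_{t+1} ~ A[s_t, :]
    return trace

# Toy model (hypothetical values): 2 workflow states, 3 rooms.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]]
pi = [1.0, 0.0]
trace = sample_trace(A, B, pi, length=10, rng=random.Random(0))
```

Only the rooms are returned, mirroring the fact that the workflow states are hidden from the observer.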

The probability parameters $\Theta = (A, B, \pi)$ can be estimated by maximizing the log-likelihood $L(\Theta \mid X) = \log \Pr(X \mid \Theta)$, e.g., with the Baum-Welch algorithm. We improve the model estimation using maximum a posteriori (MAP) estimation:

$$\Pr(X, \Theta) \propto \Pr(X \mid \Theta) \exp(-\lambda \Omega(\Theta)).$$

In the remainder of this section, we define the prior regularization $\Omega(\Theta)$.

4.1 Regularized Workflow States

In practice, the identified workflow states should be spatially contiguous. For example, we want to identify areas of semantically meaningful locations, such as ‘2nd floor northeast patient rooms’ and ‘basement central storage rooms’. In this section, to encourage each workflow state to be a contiguous region in the building, we use the proximity between rooms to define a prior on the workflow state distribution. Specifically, let $G$ be the graph where nodes represent rooms and an edge signifies that the two rooms it connects communicate with each other. We also use $G$ to denote its adjacency matrix. We define the regularization

$$\Omega_B(B) = \frac{1}{2} \sum_{s} \sum_{ij} G_{ij} (B_{si} - B_{sj})^2 = \mathrm{tr}(B (D - H) B'),$$

where $H = \frac{1}{2}(G + G')$ is a symmetric graph and $D$ is the diagonal degree matrix of $H$, with $D_{ii} = \sum_j H_{ij}$. Intuitively, we use $\Omega_B(B)$ to encourage neighboring rooms to have similar probabilities in the state distribution. Note that $D - H$ is always positive-semidefinite and $\Omega_B(B) \geq 0$.
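As a sanity check on the identity between the pairwise form and the trace form of $\Omega_B(B)$, the following sketch evaluates both on a toy graph. The function names and the three-room corridor example are assumptions made for illustration only.

```python
def omega_b_pairwise(B, G):
    """0.5 * sum_s sum_ij G[i][j] * (B[s][i] - B[s][j])**2."""
    n = len(G)
    return 0.5 * sum(G[i][j] * (row[i] - row[j]) ** 2
                     for row in B for i in range(n) for j in range(n))

def omega_b_laplacian(B, G):
    """tr(B (D - H) B') with H = (G + G') / 2 and D the degree matrix of H."""
    n = len(G)
    H = [[0.5 * (G[i][j] + G[j][i]) for j in range(n)] for i in range(n)]
    # L = D - H, where D[i][i] is the i-th row sum of H.
    L = [[(sum(H[i]) if i == j else 0.0) - H[i][j] for j in range(n)]
         for i in range(n)]
    return sum(row[i] * L[i][j] * row[j]
               for row in B for i in range(n) for j in range(n))

# Three rooms along a corridor, observed transitions 0 -> 1 -> 2 (hypothetical).
G = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
smooth = [[0.4, 0.35, 0.25]]   # one state with similar mass on adjacent rooms
spiky = [[0.5, 0.0, 0.5]]      # one state split across non-adjacent rooms
```

Under these assumptions the state that skips the middle room incurs the larger penalty, which is the behavior the prior is meant to encourage.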

The graph $G$ can be user-provided or automatically defined given the historical location traces of the moving objects. In this study, we let $G_{ij} = 1$ if and only if there exists a trace $(r^n_1, \cdots, r^n_{T_n})$ and $1 \leq t < T_n$ such that $r^n_t = i$ and $r^n_{t+1} = j$. In other words, we let $G_{ij} = 1$ if one can move from room $i$ to room $j$ directly, and $G_{ij} = 0$ otherwise. Such a definition avoids the use of geometric distance, which is not particularly meaningful in indoor environments, as discussed earlier. Moreover, our definition also avoids the use of parameters, such as distance thresholds.
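The data-driven definition of $G$ can be sketched in a few lines. The helper name `transition_graph` and the example traces are hypothetical, and rooms are 0-indexed for convenience.

```python
def transition_graph(traces, num_rooms):
    """G[i][j] = 1 iff some trace moves directly from room i to room j."""
    G = [[0] * num_rooms for _ in range(num_rooms)]
    for trace in traces:
        for i, j in zip(trace, trace[1:]):   # consecutive observations
            G[i][j] = 1
    return G

traces = [[0, 1, 2], [2, 1, 1, 3]]
G = transition_graph(traces, num_rooms=4)
```

A repeated room produces a self-loop such as `G[1][1]`, which is harmless for the regularizer because the corresponding term $(B_{si} - B_{si})^2$ is zero.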

4.2 Adaptive Multiflow Estimation

For healthcare procedures on one patient, multiple types of medical devices are often needed at the same time and place. Thus, these different types of moving objects move together and follow correlated workflow models, which can be estimated jointly. To this end, first, we can use a common workflow state distribution $B = B^{(k)}$, $k = 1, \cdots, K$, for all the $K$ types of moving objects. Indeed, using a common state distribution for all types also reduces the model complexity dramatically.

Second, to estimate the $K$ workflow transition matrices $\{A^{(k)} \mid 1 \leq k \leq K\}$, we borrow the idea of mean-regularized multi-task learning [3]. Specifically, we assume that each $A^{(k)}$, for $1 \leq k \leq K$, can be written as

$$A^{(k)} = A^{(0)} + D^{(k)},$$

where the difference $D^{(k)} = A^{(k)} - A^{(0)}$ between $A^{(k)}$ and $A^{(0)}$ is ‘small’ when the workflow transitions of different types of moving objects are similar to each other. We therefore define the regularizer as

$$\Omega_A(A) = \frac{1}{2} \sum_{k=1}^{K} \|D^{(k)}\|^2 = \frac{1}{2} \sum_{k=1}^{K} \|A^{(k)} - A^{(0)}\|^2.$$

One can show that the optimal $A^{(0)}$ minimizing the regularizer $\Omega_A(A)$ is

$$A^{(0)} = \frac{1}{K} \sum_{t=1}^{K} A^{(t)},$$

and thus we have

$$\Omega_A(A) = \frac{1}{2} \sum_{k=1}^{K} \Big\| A^{(k)} - \frac{1}{K} \sum_{t=1}^{K} A^{(t)} \Big\|^2. \tag{1}$$

In other words, mean-regularized multi-task learning assumes that the transitions are related in such a way that the true estimates are all close to the mean $A^{(0)}$. Note that, although in this work we consider only mean-regularized multi-task learning for the multiflow estimation, more general formulations can easily be adopted when more knowledge is available. For example, if we had pairwise correlations between the workflow models of different types of objects, e.g., the correlation $\rho_{ij}$ between type $i$ and type $j$, we could define the ‘multiflow’ regularization as

$$\Omega_A(A) = \frac{1}{2} \sum_{ij} \rho_{ij} \|A^{(i)} - A^{(j)}\|^2. \tag{2}$$
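The two regularizers can be compared numerically with a short sketch. It assumes that $\|\cdot\|$ denotes the Frobenius norm, which the text does not state explicitly, and all function names and matrices below are illustrative. With $\rho_{ij} = 1$ for every pair, expanding the squares shows that Equation 2 equals $2K$ times Equation 1, so the mean-regularized form is a special case up to a constant factor.

```python
def sq_dist(P, Q):
    """Squared Frobenius distance between two matrices given as nested lists."""
    return sum((p - q) ** 2 for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def mean_matrix(As):
    """Element-wise mean A^(0) of the K transition matrices."""
    K = len(As)
    return [[sum(A[i][j] for A in As) / K for j in range(len(As[0][0]))]
            for i in range(len(As[0]))]

def omega_a_mean(As):
    """Equation 1: 0.5 * sum_k ||A_k - mean(A)||^2."""
    A0 = mean_matrix(As)
    return 0.5 * sum(sq_dist(A, A0) for A in As)

def omega_a_pairwise(As, rho):
    """Equation 2: 0.5 * sum_ij rho[i][j] * ||A_i - A_j||^2."""
    K = len(As)
    return 0.5 * sum(rho[i][j] * sq_dist(As[i], As[j])
                     for i in range(K) for j in range(K))

# Three hypothetical 2-state transition matrices.
As = [[[0.9, 0.1], [0.2, 0.8]],
      [[0.8, 0.2], [0.3, 0.7]],
      [[0.5, 0.5], [0.5, 0.5]]]
ones = [[1.0] * 3 for _ in range(3)]
```

Non-uniform weights $\rho_{ij}$ would instead pull only the correlated types toward each other.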

5 IMPLEMENTATION

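With the regularizations $\Omega_A(A)$ and $\Omega_B(B)$, the objective function to simultaneously estimate all the workflow models is

$$\mathcal{J}(A^{(1)}, \cdots, A^{(K)}, B, \pi^{(1)}, \cdots, \pi^{(K)}) = \sum_k L(A^{(k)}, B, \pi^{(k)}) - \Omega(\Theta), \tag{3}$$

where $L(A^{(k)}, B, \pi^{(k)}) = \log \Pr(r^{(k)} \mid A^{(k)}, B, \pi^{(k)})$ and

$$\Omega(\Theta) = \lambda_1 \Omega_A(A) + \lambda_2 \Omega_B(B).$$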

We maximize $\mathcal{J}(\Theta)$, where the parameters are $\Theta = (A^{(1)}, \cdots, A^{(K)}, B, \pi^{(1)}, \cdots, \pi^{(K)})$, using the regularized Baum-Welch algorithm, which iteratively applies the following two steps. Based on the current value $\hat{\Theta}$, the first step is to compute the probabilities

$$\gamma^{(k)}_i(t) = \Pr(s^{(k)}_t = i \mid r^{(k)}), \qquad \xi^{(k)}_{ij}(t) = \Pr(s^{(k)}_t = i, s^{(k)}_{t+1} = j \mid r^{(k)}),$$

and define

$$\Gamma^{(k)}_{ij} = \sum_{t=1}^{T_k} \gamma^{(k)}_i(t) [r^{(k)}_t = j], \qquad \Xi^{(k)}_{ij} = \sum_{t=1}^{T_k - 1} \xi^{(k)}_{ij}(t).$$

Note that here we assume that there is only one observation sequence $r^{(k)}$ for the $k$-th type of moving object. The notation for the more general case is more complicated but leads to the same computation.
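A minimal sketch of this first step for a single sequence is given below, using the standard forward-backward recursions. It is unscaled and therefore only suitable for short sequences (long ones would underflow); the function name `e_step` and the toy parameters are assumptions for illustration, with 0-indexed states and rooms.

```python
def e_step(A, B, pi, r):
    """Return (gamma, xi, Gamma, Xi) for one observed room sequence r."""
    S, R, T = len(pi), len(B[0]), len(r)
    # Forward pass: alpha[t][i] = Pr(r_1..r_t, s_t = i).
    alpha = [[pi[i] * B[i][r[0]] for i in range(S)]]
    for t in range(1, T):
        alpha.append([B[j][r[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(S))
                      for j in range(S)])
    # Backward pass: beta[t][i] = Pr(r_{t+1}..r_T | s_t = i).
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][r[t + 1]] * beta[t + 1][j] for j in range(S))
                   for i in range(S)]
    evidence = sum(alpha[T - 1])                       # Pr(r)
    gamma = [[alpha[t][i] * beta[t][i] / evidence for i in range(S)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][r[t + 1]] * beta[t + 1][j] / evidence
            for j in range(S)] for i in range(S)] for t in range(T - 1)]
    # Expected counts: Gamma[i][j] pairs state i with room j, Xi[i][j] pairs states.
    Gamma = [[sum(gamma[t][i] for t in range(T) if r[t] == j) for j in range(R)]
             for i in range(S)]
    Xi = [[sum(xi[t][i][j] for t in range(T - 1)) for j in range(S)]
          for i in range(S)]
    return gamma, xi, Gamma, Xi

# Hypothetical 2-state, 3-room model and a short trace.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
pi = [0.5, 0.5]
r = [0, 1, 2, 2, 1]
gamma, xi, Gamma, Xi = e_step(A, B, pi, r)
```

The entries of `Gamma` sum to the sequence length and those of `Xi` to the number of transitions, which is a convenient check when adapting the sketch.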

The second step of this Expectation Maximization (EM) algorithm computes the new parameter value $\Theta$ by maximizing the expected value of the objective function in Equation 3,

$$\begin{aligned} Q(\Theta) &= \sum_k E_{s^{(k)} \mid r^{(k)}, \hat{\Theta}} \log \Pr(s^{(k)}, r^{(k)} \mid \Theta) - \Omega(\Theta) \\ &= \sum_k \sum_{s^{(k)}} \Pr(s^{(k)} \mid r^{(k)}, \hat{\Theta}) \log \Pr(s^{(k)}, r^{(k)} \mid \Theta) - \Omega(\Theta), \end{aligned}$$

where the log-likelihood is given by

$$\log \Pr(s^{(k)}, r^{(k)} \mid \Theta) = \log \pi^{(k)}_{s^{(k)}_1} + \sum_{t=1}^{T_k - 1} \log A^{(k)}_{s^{(k)}_t s^{(k)}_{t+1}} + \sum_{t=1}^{T_k} \log B_{s^{(k)}_t r^{(k)}_t}.$$

It follows that $\pi^{(k)} = \arg\max \sum_i \gamma^{(k)}_i(1) \log \pi^{(k)}_i$, where the solution is $\pi^{(k)}_i = \gamma^{(k)}_i(1)$. For updating $A$ and $B$, we have the following results:

Proposition 1 (Optimization subproblem of A): Given the current value of $(B, \pi^{(1)}, \cdots, \pi^{(K)})$, the problem of updating $(A^{(1)}, \cdots, A^{(K)})$ to optimize $Q(\Theta)$ is equivalent to:

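$$\begin{aligned} A &= \arg\max \sum_k \sum_{ij} \sum_{t=1}^{T_k - 1} \xi^{(k)}_{ij}(t) \log A^{(k)}_{ij} - \lambda_1 \Omega_A(A) \\ &= \arg\max \sum_k \sum_{ij} \Xi^{(k)}_{ij} \log A^{(k)}_{ij} - \lambda_1 \Omega_A(A) \\ &\text{s.t. } A \geq 0, \quad \sum_j A^{(k)}_{ij} = 1, \ \forall k, i. \end{aligned}$$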

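Proposition 2 (Optimization subproblem of B): Given the current value of $(A^{(1)}, \cdots, A^{(K)}, \pi^{(1)}, \cdots, \pi^{(K)})$, the problem of updating $B$ to optimize $Q(\Theta)$ is equivalent to:

$$\begin{aligned} B &= \arg\max \sum_k \sum_{ij} \sum_{t=1}^{T_k} \gamma^{(k)}_i(t) [r^{(k)}_t = j] \log B_{ij} - \lambda_2 \Omega_B(B) \\ &= \arg\max \sum_k \sum_{ij} \Gamma^{(k)}_{ij} \log B_{ij} - \lambda_2 \Omega_B(B) \\ &\text{s.t. } B \geq 0, \quad \sum_j B_{ij} = 1, \ \forall i. \end{aligned}$$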

Note that $A, B \geq 0$ are element-wise constraints. Since $\lambda_1, \lambda_2 \geq 0$, the objectives in Propositions 1 and 2 are both concave, and we have efficient algorithms (e.g., gradient ascent) to update $A$ and $B$. In particular, for the special case $\lambda_1 = \lambda_2 = 0$, we have:

$$A^{(k)}_{ij} = \frac{\Xi^{(k)}_{ij}}{\sum_j \Xi^{(k)}_{ij}}, \qquad B_{ij} = \frac{\sum_k \Gamma^{(k)}_{ij}}{\sum_k \sum_j \Gamma^{(k)}_{ij}}.$$
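For the unregularized special case, the closed-form updates amount to row normalization of the expected counts. The sketch below assumes every row of counts has a positive total; the function name and the count values are hypothetical.

```python
def m_step_unregularized(Xis, Gammas):
    """Closed-form updates for lambda_1 = lambda_2 = 0.

    Xis[k] and Gammas[k] hold the expected counts for object type k.
    """
    # Each type keeps its own transition matrix: normalize the rows of Xi^(k).
    As = [[[x / sum(row) for x in row] for row in Xi] for Xi in Xis]
    # The emission matrix is shared: pool Gamma^(k) over types, then normalize rows.
    S, R = len(Gammas[0]), len(Gammas[0][0])
    pooled = [[sum(G[i][j] for G in Gammas) for j in range(R)] for i in range(S)]
    B = [[g / sum(row) for g in row] for row in pooled]
    return As, B

# Hypothetical expected counts for K = 2 device types, 2 states, 3 rooms.
Xis = [[[8.0, 2.0], [1.0, 9.0]], [[3.0, 1.0], [2.0, 2.0]]]
Gammas = [[[6.0, 4.0, 0.0], [0.0, 2.0, 8.0]], [[2.0, 2.0, 0.0], [1.0, 1.0, 2.0]]]
As, B = m_step_unregularized(Xis, Gammas)
```

With nonzero $\lambda_1$ or $\lambda_2$ no such closed form is available, and an iterative solver for the problems in Propositions 1 and 2 would be needed instead.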

5.1 Choosing the Number of Workflow States

One important aspect of training ProWM is how to choose an appropriate number of latent workflow states. A commonly used approach for estimating the latent states of HMMs is to leverage domain knowledge or some existing algorithms to pre-cluster the observations [21]. In our problem, we pre-cluster the indoor location records to guide the training of ProWM. To this end, we adopt the density-based clustering algorithm proposed in Liu et al. [11], which computes the neighborhood of a location record as well as its density based on the historical location traces. Using this algorithm, the number of clusters can be automatically determined, and the detected clusters can be of different densities and arbitrary shapes. More importantly, the detected clusters are spatially contiguous, and therefore can be semantically annotated, such as ‘2nd floor northeast patient rooms’ and ‘basement central storage rooms’.

In this paper, since we use a common emission matrix in Equation 3, we accordingly piece together the pre-cluster solutions for all the $K$ types of moving objects:

$$C = \cup_{k=1}^{K} C^{(k)},$$

where $C^{(k)}$ is the pre-cluster solution of the $k$-th type of moving object. Note that, when $C_1 \in C^{(1)}$ and $C_2 \in C^{(2)}$ overlap, i.e., $C_1 \cap C_2 \neq \emptyset$, we merge them together as $C_1 \cup C_2 \in C$. Now, we have the common pre-cluster solution expressed in $C$ and can determine the number of workflow states as $|C|$.
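The merging step can be sketched as follows, with each cluster represented as a set of room identifiers. The function name is hypothetical, and merging transitively (a cluster that bridges two others fuses all three) is an assumption, since the text only describes the pairwise case.

```python
def merge_preclusters(solutions):
    """Union the per-type cluster lists, merging clusters that share a room."""
    merged = []                      # invariant: pairwise disjoint sets
    for clusters in solutions:
        for cluster in clusters:
            cluster = set(cluster)
            disjoint = []
            for other in merged:
                if cluster & other:
                    cluster |= other     # absorb every overlapping cluster
                else:
                    disjoint.append(other)
            merged = disjoint + [cluster]
    return merged

# Hypothetical pre-cluster solutions for K = 3 device types.
solutions = [[{1, 2}, {5, 6}], [{2, 3}, {8}], [{3, 4}, {6, 7}]]
common = merge_preclusters(solutions)
num_states = len(common)
```

Here `num_states` plays the role of $|C|$, the number of workflow states handed to the model.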

5.2 Non-Essential Space Reduction

In addition to choosing the number of latent states in our ProWM, the results of pre-clustering can also be used for accelerating parameter estimation. In reality, not every room in the building is associated with healthcare activities and not every movement in the devices’ trajectories needs to be modeled. For example, we may have many location records in front of the elevator. Although many trajectories may briefly go through that location, no healthcare activities occur at such non-essential locations. With the pre-clustering solution, we can exclude the non-essential observations in the indoor space, to reduce the modeling complexity. As shown in Liu et al. [11], the density-based pre-clustering algorithm can effectively identify the non-essential locations as outliers in the clustering results. Intuitively, these low-density$^1$ outliers consist of short-duration stays at one or more locations, whereas healthcare activities happen mostly when the medical devices are stationary. Therefore, our implementation filters out the non-stationary locations from the workflow modeling. Specifically, the union $\cup_{c \in C} c$ is a proper subset of the indoor space and, accordingly, we model only the observations $r^{(k)}_t \in \cup_{c \in C} c$, reducing the modeling complexity without compromising the modeling quality of the workflow patterns.
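The filtering itself is simple once the pre-clusters are known. In this sketch the helper name, the cluster contents, and the trace are all made up for illustration.

```python
def drop_nonessential(trace, clusters):
    """Keep only the observations whose room belongs to some pre-cluster."""
    essential = set().union(*clusters)
    return [room for room in trace if room in essential]

clusters = [{1, 2, 3}, {7, 8}]
trace = [1, 2, 5, 5, 7, 9, 8, 8]   # rooms 5 and 9 are outliers, e.g. an elevator lobby
reduced = drop_nonessential(trace, clusters)
```

The shortened trace touches fewer distinct rooms, which shrinks the emission matrix that has to be estimated.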

6 EMPIRICAL EVALUATION

This section evaluates the performance of our workflow models. We first use synthetic data to show the effectiveness of our ProWM methods. We then apply our models to real-world data collected in indoor healthcare environments.

1. Density also incorporates time duration of stay.

Fig. 5: The workflow states of HMM and ProWM.

Fig. 6: The workflow transitions of HMM and ProWM.

6.1 Synthetic Data and Results

We simulate workflow processes with known state distributions and transition patterns to demonstrate the effectiveness of ProWM. First, we define five workflow (hidden) states, each distributed uniformly over 20 rooms (symbols), with emission probability 0.05 on each room. We therefore have a block structure in the emission matrix $B$. We inject Gaussian noise $\mathcal{N}(0, 0.01)$ into the emission matrix. Then, we simulate 200 sequences, each of length 100 (20,000 observations in total). With the simulated data, the hidden states identified by the naive HMM method are shown in Figure 5 (top). As can be seen, the naive HMM method can only partially uncover the ground-truth structure of the hidden states. In comparison, as shown in Figure 5 (middle), our regularized ProWM method can (almost) perfectly reconstruct the (permuted) true workflow states in Figure 5 (bottom).
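One way such a block-structured emission matrix could be generated is sketched below. Several details are assumptions not fixed by the text: the value 0.01 is treated as a standard deviation (it could equally be a variance), negative entries are clipped to zero, and each row is renormalized after the noise is added.

```python
import random

def block_emission(num_states=5, rooms_per_state=20, noise_std=0.01, seed=0):
    """Block-structured emission matrix with Gaussian noise, rows renormalized."""
    rng = random.Random(seed)
    num_rooms = num_states * rooms_per_state
    B = []
    for s in range(num_states):
        lo, hi = s * rooms_per_state, (s + 1) * rooms_per_state
        row = [(1.0 / rooms_per_state if lo <= j < hi else 0.0)
               + rng.gauss(0.0, noise_std) for j in range(num_rooms)]
        row = [max(v, 0.0) for v in row]       # clip negatives (assumption)
        total = sum(row)
        B.append([v / total for v in row])     # renormalize to a distribution
    return B

B_synth = block_emission()
```

Each state still places most of its probability on its own block of 20 rooms, so the ground-truth structure remains recoverable in principle.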

To test the effectiveness of the multiflow regularization (Equation 2), we randomly generate two transition patterns shared by five workflow processes with transition matrices $A^{(k)}$, $k = 1, \cdots, 5$. The two transition patterns are shown in Figure 6 (bottom), where Pattern 1 is shared by the first three processes and Pattern 2 is shared by the last two. The transition matrix of each process is generated by injecting $\mathcal{N}(0, 0.01)$ noise into the associated pattern. Then we simulate 100 sequences (each of length 100) for each workflow process. As shown in Figure 6 (top), the transition patterns identified by five independent HMMs do not capture the designed Pattern 1 and Pattern 2. The reason is that the HMM optimization is highly non-convex, which makes it difficult to find the designed optimum in the optimization space. In contrast, with our multiflow regularization, ProWM successfully approximates the ground truth in all workflow processes.

In addition to the above visual comparisons, ProWM also increases the likelihood of the model parameters. The increases are 72% and 31% per sequence in Figures 5 and 6, respectively.

6.2 Indoor Location Data

Our real-world indoor location data sets are collected from several hospitals in the US. Sensor tags are attached to medical devices operated in these hospitals, allowing them to be tracked by real-time location systems. Table 2 shows basic statistics of the data collected for various types of medical devices in Hospital 1. The second and third columns show the number of medical devices of each type operated in this hospital, and the number of location records collected during the period from January 2011 to August 2011.

In the remainder of this section, we first illustrate the workflow states identified by ProWM. We then analyze the goodness-of-fit of ProWM with respect to important statistics. In particular, we show that the regularized workflow states are semantically meaningful and easily amenable to interpretation, and that the adaptive multiflow estimation can reinforce the modeling performance, especially for real-time applications.

TABLE 2: Data statistics in Hospital 1.

Type            #Objects    #Locations
Wheelchair      121         524415
PCA II Pump     66          4431
Venodyne        403         1588045
Feeding Pump    83          157370
PCA Pump        136         231057
ETCO2           137         220380

6.3 Workflow States

In Equation 3, we set $\lambda_1 = 0$ and select an appropriate value for $\lambda_2 = 10^v$, where $v \in [-5, 5]$, to construct interpretable workflow states. Generally, the larger the $\lambda_2$, the more contiguous the states. Figure 7 illustrates the unregularized workflow states on two floors in Hospital 1. The figures at the lower level are maps of the basement, and the figures at the upper level are maps of the second floor. In each map, rooms are shaded based on the emission probabilities ($> 10^{-5}$) for each state distribution (darker indicates a higher probability of a room belonging to that state). Several issues are evident in the results. First, some states are not contiguous in the indoor space, e.g., States 1, 2 and 5. State 2 even covers rooms on two different floors. Second, States 3 and 4 are duplicates of each other.


Fig. 7: Unregularized workflow states, with panels (a) State 0, (b) State 1, (c) State 2, (d) State 3, (e) State 4, and (f) State 5. Issues: 1) Several states are not contiguous in the indoor space, e.g., States 1, 2 and 5. In particular, State 2 covers rooms on two floors. 2) States 3 and 4 are effectively duplicates.

In contrast, as shown in Figure 8, these issues can be effectively addressed by the regularization we proposed in Section 4.1. Not only are the regularized workflow states contiguous in the indoor space, but they can also be easily matched to known room usage, e.g., Post-anesthesia care unit (PACU), Operating room (OR), Intensive care unit (ICU), Patient care unit (PCU), Device Maintenance and Device Storage. For example, in the unregularized workflow states, the left highlighted area in the basement of State 1 and the highlighted area in the basement of State 2 are actually neighbors and are both used for storage. The central highlighted area in the basement of State 1 has a different usage, namely device maintenance. On the other hand, in the regularized workflow states we have clearly identified the Storage and Maintenance states without any supervision. Also, regularized workflow states have smooth emissions in contiguous areas of the building (e.g., compare the PCU state in Figure 8 with State 5 in Figure 7).

6.4 Goodness-of-Fit

We measure the goodness-of-fit of the learned workflow models by computing the average log-loss with test location traces. For a test location sequence $Tr = (r_1, \cdots, r_T)$, the average log-loss is

$$\ell(Tr) = -\frac{1}{T} \log \Pr(Tr \mid \Theta). \tag{4}$$
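The quantity in Equation 4 can be computed with the standard scaled forward recursion, as in the sketch below. The natural logarithm is an assumption, since the text does not state the base, and the function name and toy models are illustrative.

```python
import math

def average_log_loss(A, B, pi, trace):
    """-(1/T) * log Pr(trace | A, B, pi), computed with a scaled forward pass."""
    S = len(pi)
    alpha = [pi[i] * B[i][trace[0]] for i in range(S)]
    log_prob = 0.0
    for t in range(len(trace)):
        if t > 0:
            alpha = [B[j][trace[t]] * sum(alpha[i] * A[i][j] for i in range(S))
                     for j in range(S)]
        scale = sum(alpha)                 # Pr(r_t | r_1..r_{t-1})
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return -log_prob / len(trace)

# One state emitting 4 rooms uniformly: the loss is log(4) for any trace.
uniform_loss = average_log_loss([[1.0]], [[0.25] * 4], [1.0], [0, 3, 1, 2])

# A "sticky" 2-state model (hypothetical): a trace with one change of room
# should fit better than a trace that alternates at every step.
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
typical = average_log_loss(A, B, pi, [0, 0, 0, 1, 1, 1])
erratic = average_log_loss(A, B, pi, [0, 1, 0, 1, 0, 1])
```

Rescaling at every step keeps the recursion numerically stable for long traces, while the accumulated log of the scale factors recovers the full log-probability.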

We randomly partition the data into ten subsets and compute the average log-loss in ten rounds. In each round, we use nine of these subsets as training data to estimate the model parameters and compute $\ell(Tr)$ for each $Tr$ in the remaining test data.

Fig. 8: Regularized workflow states, with panels (a) PACU, (b) OR, (c) ICU, (d) PCU, (e) Maintenance, and (f) Storage. The states can be easily interpreted with semantics as follows: Post-anesthesia care unit (PACU), Operating room (OR), Intensive care unit (ICU), Patient care unit (PCU), Device Maintenance and Device Storage.

We set $\lambda_2$ based on the results in Section 6.3, then choose $\lambda_1 = 10^v$ with $v \in [-5, 5]$ that achieves the lowest average log-loss on validation data. The solid red lines in Figure 9 show $\ell(Tr)$ with respect to the length of $Tr$. We also show, with solid green lines, the results for the baseline models using $\lambda_1 = 0$ in Equation 3. Figure 9 shows that the regularized multiflow models consistently achieve lower information loss across different types of moving objects.

For certain workflow managerial applications, we have to estimate the models in real time with limited training data. In this case, our models substantially outperform the baselines through unified estimation. This is clearly demonstrated by repeating the above comparison using less training data. As shown by the dashed lines, the improvement of the regularized multiflow models over the baselines is then more pronounced. Moreover, the regularized models are more robust, without sharp jumps in the plot.

In Figure 9, we also show, in blue lines, the performance of the baseline methods developed by Liu et al. [11]. The results demonstrate that goodness-of-fit is significantly improved by integrating workflow construction and transition estimation.

6.5 Computational Efficiency

In this subsection, we evaluate the performance of training ProWM. In particular, we first choose the number of workflow states based on the pre-clustering of location records. We also use the pre-clustering solution to reduce the modeling space, as introduced in Section 5.2. After that, we run the alternating optimization procedure to compute the optimal transition probabilities $A^{(k)}$ for $k = 1, \cdots, K$ and the workflow emission probabilities $B$. We compare the training time against two baselines: 1) the same alternating optimization procedure without non-essential space reduction; 2) the model developed in Liu et al. [11].

Fig. 9: The comparison of average log-loss, with panels (a) Wheelchair, (b) PCA II Pump, (c) Venodyne, (d) Feeding Pump, (e) PCA Pump, and (f) ETCO2. Red: ProWM with regularized multiflow models; Green: ProWM without multiflow regularization; Blue: Baselines developed by Liu et al. [11]; Solid: 90% training and 10% testing; Dashed: 10% training and 90% testing.

TABLE 3: Comparison of the running time of different methods in two scenarios. In one scenario we have sufficient training data, with a ratio of training:testing data equal to 9:1. In the other scenario the ratio is 1:9.

Model                    Training:Testing    Running Time (s)    Information Loss
ProWM (reduced space)    9:1                 238                 2.270
ProWM (reduced space)    1:9                 31                  2.399
ProWM (full space)       9:1                 1037                2.267
ProWM (full space)       1:9                 419                 2.401
CTMC [11]                9:1                 71                  2.401
CTMC [11]                1:9                 29                  2.689

Two interesting observations follow from the comparison of running time and information loss in Table 3. First, by reducing the non-essential modeling space, we can significantly reduce the computational cost without decreasing modeling performance. The model with reduced space has an average log-loss comparable to that of the model using the full space. Second, in the scenario with substantially limited training data, the running time of the proposed ProWM model with reduced modeling space is close to that of the baseline model developed in Liu et al. [11]. However, our ProWM model has significantly better prediction performance (2.399 < 2.689). In other words, we reduce the average log-loss by 11% while maintaining computational efficiency.

6.6 Application: Workflow Monitoring

The learned workflow model is valuable, since a range of practical problems can benefit from the modeling results. Indeed, we have implemented a management information system, HISflow, to exploit the discovered knowledge for healthcare operation and management. A screenshot of HISflow is shown in Figure 10; for techniques used in our implementation, see Liu et al. [11]. In the following, we elaborate on one particular application, workflow monitoring.

When the extracted workflow patterns are mapped to specific procedure codes (either existing or new), we can identify abnormal behavior within daily healthcare activities in real time. When such anomalies occur, warnings or alerts can be activated by the management system, potentially helping reduce the risk of faults or accidents in healthcare services. One approach to developing such a system is to rank the ongoing trajectories of all monitored medical devices based on the average log-loss in Equation 4. In this way, the top-ranked devices merit more scrutiny. However, these ranking results may not be intuitive from the management perspective. In fact, it is vital to provide more insight into the causes behind the higher log-loss trajectories. To this end, given the recent trajectory $Tr = (r_1, r_2, \cdots, r_T)$ which is ranked at the top, we compute the most likely corresponding state sequence $(s_1, s_2, \cdots, s_T)$ and identify the time


7 CONCLUDING REMARKS

In this paper, we leveraged location traces of medical devices to model the healthcare workflow patterns in hospital environments. Specifically, we developed a stochastic process-based framework, which provides parsimonious descriptions of long location traces. This framework provides new opportunities to concisely understand the logistics of a large hospital. From a technical perspective, we proposed a unified modeling approach based on a novel regularized HMM method that produces interpretable states and leverages correlations between devices for improved robustness. From an application perspective, the discovered knowledge, such as workflow states and transition patterns, can be integrated into the management information system we developed. With this system, we showed that valuable intelligent applications for healthcare operation and management can be enabled to manage, evaluate and optimize healthcare services. Extensive experimental results on both synthetic and real-world data validated the effectiveness of our proposed work.

REFERENCES

[1] Rakesh Agrawal, Dimitrios Gunopulos, and Frank Leymann. Mining process models from workflow logs. Advances in Database Technology - EDBT'98, pages 467–483, 1998.

[2] Xiaoyong Chai and Qiang Yang. Multiple-goal recognition from low-level signals. In Proceedings of the National Conference on Artificial Intelligence, volume 20, page 3. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, 2005.

[3] Theodoros Evgeniou and Massimiliano Pontil. Regularized multi-task learning. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 109–117. ACM, 2004.

[4] Yong Ge, Hui Xiong, Zhi-hua Zhou, Hasan Ozdemir, Jannite Yu, and Kuo Chu Lee. Top-eye: top-k evolving trajectory outlier detection. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1733–1736. ACM, 2010.

[5] Yong Ge, Hui Xiong, Chuanren Liu, and Zhi-Hua Zhou. A taxi driving fraud detection system. In Proceedings of the 11th IEEE International Conference on Data Mining, pages 181–190. IEEE, 2011.

[6] Fosca Giannotti, Mirco Nanni, Dino Pedreschi, and Fabio Pinelli. Trajectory pattern mining. In Proceedings of the ACM SIGKDD international conference on Knowledge discovery and data mining, pages 330–339, 2007.

[7] Gianluigi Greco, Antonella Guzzo, Giuseppe Manco, and Domenico Sacca. Mining and reasoning on workflows. Knowledge and Data Engineering, IEEE Transactions on, 17(4):519–534, 2005.

[8] Derek Hao Hu and Qiang Yang. Cigar: concurrent and interleaving goal and activity recognition. In Proceedings of the 23rd national conference on Artificial intelligence, pages 1363–1368, 2008.

[9] Zhenhui Li, Bolin Ding, Jiawei Han, Roland Kays, and Peter Nye. Mining periodic behaviors for moving objects. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1099–1108. ACM, 2010.

[10] Chuanren Liu, Hui Xiong, Yong Ge, Wei Geng, and Matt Perkins. A stochastic model for context-aware anomaly detection in indoor location traces. In Proceedings of the 12th IEEE International Conference on Data Mining, pages 449–458. IEEE, 2012.

of abnormal events $\hat{T} = \arg\max_t \ell(r_{1:t})$, as well as the description of each observation for $t = 2, \cdots, \hat{T}$:

1) Unlikely transition: If $s_t \neq s_{t-1}$ and $-\log A_{s_{t-1} s_t} \geq -\log B_{s_t r_t}$.

2) Long wait: If $s_t = s_{t-1}$ and $-\log A_{s_{t-1} s_t} \geq -\log B_{s_t r_t}$.

3) Unlikely location: If $-\log B_{s_t r_t} \geq -\log A_{s_{t-1} s_t}$.

Intuitively, unlikely transition indicates that the state transition from $s_{t-1}$ is unlikely to end at $s_t$, and the likelihood of the transition contributes significantly to the increase of the average log-loss $\ell(Tr)$. The long wait case is similar, except that the workflow state is unchanged. Finally, in the case of unlikely location, we observed the location $r_t$ for the state $s_t$; however, the location $r_t$ is unlikely to be emitted by the state $s_t$, and the likelihood of the emission contributes more to the increase of $\ell(Tr)$ than the other explanations.
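The three rules can be sketched as a small labeling function. It takes the state sequence as given (for instance from a Viterbi decoder, which is not shown), assumes all referenced entries of `A` and `B` are strictly positive, and breaks ties in favor of the transition-based labels, a choice the text leaves open. All names and values are illustrative.

```python
import math

def explain(states, rooms, A, B):
    """Label each step t >= 1 (0-indexed) with the dominant cause of log-loss."""
    labels = []
    for t in range(1, len(states)):
        trans_cost = -math.log(A[states[t - 1]][states[t]])
        emit_cost = -math.log(B[states[t]][rooms[t]])
        if trans_cost >= emit_cost:
            # Transition dominates: either a state change or a self-transition.
            labels.append("unlikely transition" if states[t] != states[t - 1]
                          else "long wait")
        else:
            labels.append("unlikely location")
    return labels

# Hypothetical 2-state, 2-room model and a decoded trajectory.
A = [[0.9, 0.1], [0.5, 0.5]]
B = [[0.8, 0.2], [0.3, 0.7]]
states = [0, 0, 1, 1, 1]
rooms = [0, 1, 1, 0, 1]
labels = explain(states, rooms, A, B)
```

For this toy trajectory the four steps are labeled unlikely location, unlikely transition, unlikely location, and long wait, in that order.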

As can be seen in Figure 10, based on the average log-loss, we can effectively identify the abnormal traces with detailed explanations. Specifically, we highlight the explanation for each identified abnormal device with a different background color in the ‘Status’ column of the list. When the user clicks one item in the list, we show more details in the right panel (e.g., the distribution of different recent causes for the anomaly, and the current location of the device). In the left panel, we also show the aggregated distribution of different anomaly causes in the identified list. Such an intuitive and interpretable real-time monitoring system is valuable for hospital managers seeking to improve the quality of healthcare services.

Fig. 10: The screenshot of HISflow.


[11] Chuanren Liu, Yong Ge, Hui Xiong, Keli Xiao,Wei Geng, and Matt Perkins. Proactive workflowmodeling by stochastic processes with applica-tion to healthcare operation and management. InProceedings of the 20th ACM SIGKDD internationalconference on Knowledge discovery and data mining,pages 1593–1602. ACM, 2014.

[12] Siyuan Liu, Yunhuai Liu, Lionel M. Ni, Jian-ping Fan, and Minglu Li. Towards mobility-based clustering. In Proceedings of the 16th ACMSIGKDD international conference on Knowledge dis-covery and data mining, pages 919–928. ACM,2010.

[13] N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleft-heriou, Y. Tao, and D.W. Cheung. Mining, in-dexing, and querying historical spatiotemporaldata. In Proceedings of the tenth ACM SIGKDDinternational conference on Knowledge discovery anddata mining, pages 236–245. ACM, 2004.

[14] Elham Ramezani, Dirk Fahland, and Wil MPvan der Aalst. Where did i misbehave? diagnos-tic information in compliance checking. In Busi-ness Process Management, pages 262–278. Springer,2012.

[15] Sherry X Sun, Qingtian Zeng, and HuaiqingWang. Process-mining-based workflow modelfragmentation for distributed execution. Systems,Man and Cybernetics, Part A: Systems and Humans,IEEE Transactions on, 41(2):294–310, 2011.

[16] Wil Van der Aalst, Ton Weijters, and LauraMaruster. Workflow mining: Discovering processmodels from event logs. Knowledge and Data En-gineering, IEEE Transactions on, 16(9):1128–1142,2004.

[17] Jiong Yang and Meng Hu. Trajpattern: Miningsequential patterns from imprecise trajectories ofmobile objects. In EDBT’06, pages 664–681, 2006.

[18] J. Yin, D. Shen, Q. Yang, and Z.N. Li. Activity recognition through goal-based segmentation. In Proceedings of the National Conference on Artificial Intelligence, volume 20, page 28. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999, 2005.

[19] Qingtian Zeng, Sherry X Sun, Hua Duan, Cong Liu, and Huaiqing Wang. Cross-organizational collaborative workflow mining from a multi-source log. Decision Support Systems, 54(3):1280–1301, 2013.

[20] Y. Zheng, L. Zhang, X. Xie, and W.Y. Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of the 18th international conference on World wide web, pages 791–800. ACM, 2009.

[21] H. Zhu, C. Liu, Y. Ge, H. Xiong, and E. Chen. Popularity modeling for mobile apps: A sequential approach. Cybernetics, IEEE Transactions on, PP(99):1–1, 2014. ISSN 2168-2267. doi: 10.1109/TCYB.2014.2349954.

Chuanren Liu received the Ph.D. degree from Rutgers, the State University of New Jersey, USA, the M.S. degree from the Beijing University of Aeronautics and Astronautics (BUAA), and the B.S. degree from the University of Science and Technology of China (USTC). His research interests include data mining and knowledge discovery, and their applications in business analytics. He has published papers in refereed journals and conference proceedings, such as Annals of Operations Research, European Journal of Operational Research, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Cybernetics, Knowledge and Information Systems, and ACM SIGKDD, IEEE ICDM, SIAM SDM, etc.

Hui Xiong (SM’07) is currently a Professor and Vice Chair of the Management Science and Information Systems Department, and the Director of the Rutgers Center for Information Assurance at Rutgers, the State University of New Jersey. He received the B.E. degree from the University of Science and Technology of China (USTC), China, the M.S. degree from the National University of Singapore (NUS), Singapore, and the Ph.D. degree from the University of Minnesota (UMN), USA. His general area of research is data and knowledge engineering, with a focus on developing effective and efficient data analysis techniques for emerging data intensive applications. He has published prolifically in refereed journals and conference proceedings. He is a senior member of the ACM and IEEE.

Spiros Papadimitriou is currently an Assistant Professor at the Department of Management Science & Information Systems at Rutgers Business School. Previously, he was a research scientist at Google, and a research staff member at IBM Research. His main interests are large scale data analysis, time series, graphs, and clustering. He has published more than forty papers on these topics, has three invited journal publications in best paper issues and several book chapters, and has filed multiple patents. He has also given a number of invited talks, keynotes, and tutorials. He was a Siebel scholarship recipient in 2005 and received the best paper award in SIAM SDM 2008.

Yong Ge is an Assistant Professor at the University of Arizona. He received his Ph.D. from Rutgers University in 2013, the M.S. degree from the University of Science and Technology of China in 2008, and the B.E. degree from Xi’an Jiao Tong University in 2005. His research interests include data mining and business analytics. He received the ICDM-2011 Best Research Paper Award, Excellence in Academic Research (one per school) at Rutgers Business School in 2013, and the Dissertation Fellowship at Rutgers University in 2012. He has published prolifically in refereed journals and conference proceedings, such as IEEE TKDE, ACM TOIS, ACM TKDD, ACM TIST, ACM SIGKDD, SIAM SDM, IEEE ICDM, and ACM RecSys.

Keli Xiao received his Ph.D. in Finance from Rutgers Business School in 2013, the Master’s Degree in Quantitative Finance from Rutgers Business School in 2009, and the Master’s Degree in Computer Science from Queens College in the City University of New York in 2008. His general area of research in finance is asset pricing, with a concentration on the studies of financial bubbles and empirical asset pricing. He has published several papers in refereed journals and conference proceedings such as ACM Transactions on Knowledge Discovery from Data (TKDD) and ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).