[IEEE 2009 First Asian Conference on Intelligent Information and Database Systems, ACIIDS - Dong hoi, Quang binh, Vietnam (2009.04.1-2009.04.3)] 2009 First Asian Conference on Intelligent

From Checking Integrity Constraints to Temporal Abstraction for Clinical Databases

Pham Van Chung (1) and Duong Tuan Anh (2)

(1) Department of Information Technology, Ho Chi Minh City Industry University

(2) Faculty of Computer Science & Engineering, Ho Chi Minh City University of Technology [email protected]

Abstract

Temporal abstraction (TA) methods aim to extract more meaningful data from raw temporal data. The use of temporal abstraction is important for decision support applications which consume abstract concepts, while databases usually contain primitive concepts. In this paper we propose a new approach for TA which has a tight coupling with the temporal integrity constraints checking (TICC) stage. The preceding TICC stage not only ensures the consistency of the raw data stored in the temporal database but also prepares appropriate datasets for the TA stage. The approach has been applied to the clinical database system for monitoring the treatment of patients who have colorectal cancer.

1. Introduction

In a valid-time temporal database, the information that is stored includes temporal attributes stating when the information is valid ([5]). Temporal abstraction (TA) is an approach to provide short, informative, context-sensitive summaries, at various levels of abstraction, of time-oriented data stored in a temporal database. Meaningful summaries include abstractions that hold over both time-points and time intervals. The use of TA is especially important for decision-support applications which require more abstract data, while databases usually contain primitive data. The application of TA in clinical information systems, which can help medical doctors in the monitoring, diagnosis and treatment of patients, has been studied in several research works. Typical works in this direction include the knowledge-based temporal abstraction (KBTA) framework ([12]), methods for context-sensitive and expectation-guided TA ([9]), methods for combining statistical and probability techniques with TA ([1], [2]) and methods for combining TA and data mining ([7], [8]).

In this paper, we propose a new approach for TA which has a tight coupling with the temporal integrity constraint checking (TICC) stage. The TICC stage not only ensures the consistency of the data stored in the temporal database but also extracts appropriate datasets for the TA stage. This data preparation is very important for TA since the data input for temporal abstraction algorithms should be valid and in a suitable form required by TA task.

The method has been applied in the field of monitoring the treatment of patients who have colorectal cancer at the Ho Chi Minh City Cancer Hospital. Experimental results on a real data set show that the performance of our TA system is quite encouraging.

The organization of the paper is as follows. Section 2 briefly explains transition graphs and temporal integrity constraint checking in temporal databases. Section 3 describes the inference graph as the main technique in the implementation of our TA system. Section 4 presents the data preprocessing for temporal abstraction. Experimental results are given briefly in Section 5. Conclusions are given in Section 6.

2. Checking Temporal Integrity Constraints in Temporal Databases

Actual temporal database system implementations need an integrity constraint checking method to assure the consistency of the database. In order to monitor temporal integrity constraints (TICs) in temporal databases, a method was proposed by Gertz and Lipeck ([6]) to utilize transition graphs which describe the admissible lifecycles of objects and can be constructed from TICs. Inspired by the approach proposed by Gertz et al., we develop some extensions to their framework on how to check some more complicated forms of TICs in which repeated states are allowed in the state sequences of objects.

2009 First Asian Conference on Intelligent Information and Database Systems

978-0-7695-3580-7/09 $25.00 © 2009 IEEE

DOI 10.1109/ACIIDS.2009.62

2.1 Transition Graphs

Temporal integrity constraints are used to restrict the possible lifecycles of objects to admissible ones. Assume we have the temporal integrity constraint stating that: “A patient who has cancer is hospitalized (s0). He may be treated by radiation (s1) or surgery (s2). Some time later, he can go to a new watching state (s3) in which he must be watched for a period of time before moving to the recovering situation (s4)”.

Figure 1. An example of transition graph

This TIC comes from the simplified therapy plan for treating colorectal cancer. The constraint can be represented as the transition graph given in Fig. 1. The graph consists of five states and s4 is the final state. The label on an edge of the graph represents the condition that must hold for the corresponding state transition to take place. Notice that, to keep Figure 1 simple, we do not elaborate the conditions on the edges of the transition graph.

A transition graph here is required to be deterministic, i.e. at each node the labels of different outgoing edges must exclude each other. Furthermore, in this work we support some more complicated forms of TICs, which correspond to transition graphs that allow cycles, i.e. there may be repeated vertices in a state sequence of objects.

A transition graph can be used to analyze a state sequence by searching a corresponding path, whose edge labels successively hold in the states of the sequence. The main advantage of using transition graphs is that we can reduce analysis of state sequences to checks on “state transitions”. Monitoring the TIC can be realized by storing and updating the states of all database objects, and by checking state-dependent transitional conditions.
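The reduction of sequence analysis to checks on single state transitions can be illustrated with a minimal sketch. Since Figure 1 and its edge conditions are not reproduced in the text, the edges and the label names below are assumptions read off the TIC in Section 2.1 and the example data in Section 4.2. The sketch also simplifies edge labels to plain names instead of evaluating conditions on the object's data.

```python
# Assumed edges for the colorectal cancer TIC; the label names are hypothetical.
# Determinism is modelled by giving each vertex at most one target per label.
TRANSITIONS = {
    "s0": {"radiation": "s1", "surgery": "s2"},
    "s1": {"start_watching": "s3"},
    "s2": {"start_watching": "s3"},
    "s3": {"keep_watching": "s3", "retreat": "s0", "recover": "s4"},
    "s4": {},
}


def is_admissible(states, transitions=TRANSITIONS, initial="s0"):
    """Check a state sequence one transition at a time."""
    if not states or states[0] != initial:
        return False
    for cur, nxt in zip(states, states[1:]):
        # Only the pair (cur, nxt) is inspected, never the whole history.
        if nxt not in transitions.get(cur, {}).values():
            return False
    return True


print(is_admissible(["s0", "s1", "s3", "s3", "s4"]))
print(is_admissible(["s0", "s3"]))
```

Because each check looks only at the current vertex and the proposed next vertex, it is enough to store the current state of every object, which is what the OBJECT_POS table in Section 2.2 is for.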

2.2 Checking Integrity Constraints using Transition Graphs

Due to the limitation of space, in this section, we outline roughly the method for checking TICs by using transition graphs. For more details, interested readers can refer to [10].

The Repeated Vertex in States Sequence of Objects

Consider the transition graph in Figure 1. The paths on this transition graph may consist of many cycles. Checking TICs in this situation is quite challenging. To solve this problem, we propose a technique using a positive integer t called frequency-value. The method is summarized as follows:

- A lifecycle of an object θ begins when it moves from the initial vertex. At that moment t is assigned the value 1.

- When object θ reaches a new vertex, the value of t does not change.

- When object θ moves back to an old vertex (a vertex that the object has visited before), the value of t is incremented by 1.
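The three rules above can be traced with a short sketch. It assumes that remaining at the same vertex (for example s3 followed by s3) also counts as moving to an old vertex; the text does not state how self-loops are treated, so this is one possible reading. The function name is hypothetical.

```python
def frequency_values(states):
    """Return the frequency-value t after each vertex of a state sequence."""
    visited = set()
    t = 0
    values = []
    for index, state in enumerate(states):
        if index == 0:
            t = 1            # the lifecycle begins at the initial vertex
        elif state in visited:
            t += 1           # the object moves back to an old vertex
        # a new vertex leaves t unchanged
        visited.add(state)
        values.append(t)
    return values


print(frequency_values(["s0", "s1", "s3", "s0", "s1", "s3", "s4"]))
```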

Checking Temporal Integrity Constraints

Assume that the transition graph has already been stored internally and each of its vertices corresponds to a table in the temporal database. We need a group of small tables that represents a given transition graph internally. These tables are created automatically, and their data are derived from the transition graph, when a TIC is input to the system by the user. Their structures are explained as follows.

The VERTEX table consists of a Vname column that keeps the names of vertices, and a Vpos column that distinguishes vertices as initial, final or ordinary. The TRANSITION_STATE table includes the current vertex (Cur_State), the label of the outgoing edge (Label) and the destination vertex (Trans_State) of the object. The schemas of the two tables VERTEX and TRANSITION_STATE are as follows.

VERTEX (V_ID, Vname, Vpos)

TRANSITION_STATE (Cur_State, Label, Trans_State)

Besides, for temporal modifications, we have to maintain another table: OBJECT_POS. This table shows the current vertex (Vertex_To), the previously passed vertex (Vertex_From) and the frequency-value (Times) of the object under consideration (P_ID). This table must be updated whenever an object is updated.

OBJECT_POS (P_ID, Vertex_From, Vertex_To, Times)
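As an illustration only, the three tables can be sketched with Python's built-in sqlite3 module standing in for the Oracle DBMS used in the actual system. The column types, the key constraints and the helper names are assumptions; only the table and column names come from the text.

```python
import sqlite3

# Column types and keys are assumptions; the paper gives only the column names.
SCHEMA = """
CREATE TABLE VERTEX (V_ID INTEGER PRIMARY KEY, Vname TEXT UNIQUE, Vpos TEXT);
CREATE TABLE TRANSITION_STATE (
    Cur_State TEXT, Label TEXT, Trans_State TEXT,
    PRIMARY KEY (Cur_State, Label)
);
CREATE TABLE OBJECT_POS (
    P_ID TEXT PRIMARY KEY, Vertex_From TEXT, Vertex_To TEXT, Times INTEGER
);
"""


def build_graph_tables(vertices, transitions):
    """Create the tables in memory and load one transition graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO VERTEX (Vname, Vpos) VALUES (?, ?)", vertices)
    conn.executemany("INSERT INTO TRANSITION_STATE VALUES (?, ?, ?)", transitions)
    conn.commit()
    return conn


def target_state(conn, cur_state, label):
    """Look up the vertex reached from cur_state along the edge with this label."""
    row = conn.execute(
        "SELECT Trans_State FROM TRANSITION_STATE "
        "WHERE Cur_State = ? AND Label = ?",
        (cur_state, label),
    ).fetchone()
    return row[0] if row else None
```

The primary key on (Cur_State, Label) is one simple way to keep the stored graph deterministic: a vertex cannot have two outgoing edges with the same label.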

Any temporal modifications on objects of the data model may include a valid time interval [vs, ve] where vs is the start value of valid time and ve is the end value of valid time.

Here an update at a vertex means updating a tuple of the table which keeps the data related to an object of interest at the state represented by that vertex. So, an update here means “an update at a vertex”. Checking TICs is invoked at some transactions such as insert, update, or delete and these modifications apply on one vertex at a time.

Since the transition graph describes admissible lifecycles of objects, any modification on objects can be checked efficiently for admissibility by searching the transition graph and checking state-dependent transitional conditions. We have developed a procedure for checking TICs. For more details about this procedure, readers can refer to [10].

3. Temporal Abstraction using Inference Graphs

3.1 Background

Based on the KBTA framework developed by Shahar ([12]), we briefly introduce in this section the basic TA methods as well as the knowledge required in the TA process. The KBTA framework decomposes the TA task into five parallel subtasks: temporal context restriction, vertical temporal inference, horizontal temporal inference, temporal interpolation and temporal pattern matching. For each subtask, there is a corresponding TA method or mechanism. Some of the above-mentioned TA tasks are explained as follows.

Vertical temporal inference. This method involves creation of abstractions by inference from contemporaneous propositions into higher-level concepts. For example, two (primitive) parameter intervals with value ‘low’ in [1, n] and with value ‘high’ in [m, k] are abstracted into one parameter interval with the value ‘grade 2’ in [m, n] (where [m, n] is the intersection of [m, k] and [1, n]).
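The example above can be sketched as an interval intersection followed by a table lookup. The tuple layout (value, start, end), the function name and the lookup table are assumptions made for illustration; the actual mapping from value pairs to abstract values belongs to the domain knowledge base.

```python
def vertical_inference(first, second, abstraction_table):
    """Combine two contemporaneous parameter intervals into one abstraction.

    Each interval is a (value, start, end) tuple. The result covers the
    intersection of the two intervals, or is None when they do not overlap
    or when no abstraction is defined for the pair of values.
    """
    start = max(first[1], second[1])
    end = min(first[2], second[2])
    if start > end:
        return None
    value = abstraction_table.get((first[0], second[0]))
    if value is None:
        return None
    return (value, start, end)


# Hypothetical knowledge: 'low' together with 'high' is abstracted to 'grade 2'.
table = {("low", "high"): "grade 2"}
print(vertical_inference(("low", 1, 10), ("high", 4, 15), table))
```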

Temporal interpolation. This method involves creation of longer parameter intervals by bridging gaps between similar-type disjoint points or interval-based propositions. This method can bridge the gap only if the gap has an acceptable length.
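A minimal sketch of gap bridging follows, assuming intervals are (value, start, end) tuples on a numeric time axis and that the acceptable gap length is given as a single parameter max_gap. In the KBTA framework the acceptable gap would come from domain knowledge; here it is simply an argument.

```python
def interpolate(intervals, max_gap):
    """Join neighbouring intervals with the same value when the gap is small."""
    merged = []
    for value, start, end in sorted(intervals, key=lambda iv: iv[1]):
        if merged and merged[-1][0] == value and start - merged[-1][2] <= max_gap:
            previous = merged[-1]
            # Bridge the gap: extend the previous interval to cover this one.
            merged[-1] = (value, previous[1], max(previous[2], end))
        else:
            merged.append((value, start, end))
    return merged


print(interpolate([("low", 1, 3), ("low", 5, 8), ("low", 20, 22)], max_gap=2))
```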

In our application, we apply TA mechanisms that extract trends (increase, decrease or stationary patterns) or states (e.g. low, normal, high values) from a temporal clinical database.
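To make the two kinds of abstraction concrete, the following sketch derives a qualitative state from a single measurement and a trend label from each pair of consecutive measurements. The cut-off values and the tolerance are hypothetical parameters, not values taken from the clinical knowledge base.

```python
def abstract_state(value, low, high):
    """Map a numeric value to a qualitative state using assumed cut-offs."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def abstract_trends(values, tolerance=0.0):
    """Label each consecutive pair of measurements as a trend."""
    trends = []
    for previous, current in zip(values, values[1:]):
        if current - previous > tolerance:
            trends.append("increase")
        elif previous - current > tolerance:
            trends.append("decrease")
        else:
            trends.append("stationary")
    return trends


print(abstract_state(3, low=5, high=10))
print(abstract_trends([1, 2, 2, 1]))
```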

3.2 Knowledge Base for Temporal Abstraction

Knowledge required by TA consists of the rules which specify how data objects change from one state to another and the inference rules which are used in the temporal abstraction process. The first group of rules is called state transition rules. The second group of rules is called TA inference rules.

● State transition rules. Rules of this kind have the form of if..then clauses. For example, the rule IF (CEA level < 5) and (the total time of watching ≥ 5 years) THEN “recover” is the rule that enables the cancer patients to move to “recover” state.

● TA inference rules. These rules are specified using the syntax of the Temporal Abstraction Rule language TAR proposed by Boaz et al. ([3], [4]). A TAR rule resembles a rule in a deductive database, with some differences.
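The state transition rule quoted above can be represented as data, as in the following sketch. The field names cea_level and watching_years and the rule container are hypothetical; the sketch covers only the if..then form of state transition rules and does not attempt to reproduce TAR syntax, which is not shown in this paper.

```python
# Each rule pairs a target state with a condition on the patient's data.
STATE_TRANSITION_RULES = [
    ("recover", lambda p: p["cea_level"] < 5 and p["watching_years"] >= 5),
]


def fire_rules(patient, rules=STATE_TRANSITION_RULES):
    """Return the target states of all rules whose condition holds."""
    return [target for target, condition in rules if condition(patient)]


print(fire_rules({"cea_level": 3.2, "watching_years": 6}))
```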

3.3 Inference Graph

From a transition graph, we can build an inference graph by attaching some TA inference rules at each vertex of the transition graph. The inference graph for a TA application has the following characteristics.

- The inference graph contains sufficient (and not redundant) knowledge required by the TA inference process.

- It helps to simplify the TA inference since the rules and data are distributed over the vertices of the inference graph, and the knowledge as well as the data at a particular vertex are sufficient for the temporal abstraction process at that vertex.

- Data at any vertex in the inference graph are valid and consistent for the time interval under consideration due to the preceding TICC stage.

- The termination of the TA inference process can be ensured since the inference graph is deterministic and the set of rules attached to the inference graph is a well-defined knowledge base. The finiteness of a well-defined knowledge base written in the TAR language is proved in ([3], [4]).

- Inference on the inference graph follows an explicit path which is a state transition sequence related to a given object. During the inference process, at the vertex where the object will not change state any more, the inference terminates and does not have to scan all the remaining vertices.

- The inference graph can be viewed as a knowledge base which is represented in the form of a deterministic graph.

Knowledge required by TA consists of state transition rules and TA inference rules. The TA inference rules aim to represent any of five temporal abstraction mechanisms in the KBTA system.

3.4 Temporal Abstraction

Let V = {s1, s2, ..., sn} be the set of vertices in the inference graph and let s1 and sn be respectively the initial vertex and the final vertex of the graph. In our approach, during data collection or update, the temporal integrity constraint checking (TICC) task also decomposes the database table in its original form into several small tables in such a way that each small table stores only the time-stamped data related to one vertex in the inference graph. After this data preprocessing has been done, the TA task related to a given object Ob during the time interval from ti to tj needs to perform two steps: a data retrieval step and a temporal abstraction step.

Data Retrieval Step. This step retrieves the data related to the object Ob during the time interval from ti to tj on the inference graph. The step traverses the graph from the initial vertex and gathers all the data of Ob at the vertices along its state sequence, up to the vertex at which the object terminates its state transitions. All the data collected in this step are stored in a temporary table T. The procedure data_retrieve is as follows:

procedure data_retrieve (T: table)
/* [v_begin, v_end] is ValidTime of Ob */
begin
    if Ob is not in state s1 or min((v_begin) in s1) > ti then
        Ob doesn’t have any data
    else
        for each state si in the interval v do
            if Ob is in si and [v_begin, v_end] ⊆ [ti, tj] then
                copy Ob’s data at si into table T;
end;
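The procedure can be sketched in Python as follows. The sketch mirrors the conditions of the pseudocode literally and is not the implementation used in the TDM system. It assumes that the per-vertex tables are held in a dictionary mapping a vertex name to a list of (p_id, v_begin, v_end, data) tuples; that layout and the parameter names are hypothetical.

```python
from datetime import date


def data_retrieve(tables, ob, ti, tj, initial="s1"):
    """Collect the records of object ob whose valid time lies within [ti, tj]."""
    first = [row for row in tables.get(initial, []) if row[0] == ob]
    # Mirrors the first test of the pseudocode: no record at the initial
    # vertex, or the earliest one starts after ti, means no data.
    if not first or min(row[1] for row in first) > ti:
        return []
    collected = []
    for state, rows in tables.items():
        for p_id, v_begin, v_end, data in rows:
            if p_id == ob and ti <= v_begin and v_end <= tj:
                collected.append((state, v_begin, v_end, data))
    # Table T is kept in ascending order of valid time for the TA step.
    return sorted(collected, key=lambda row: row[1])


tables = {
    "s1": [("P001", date(2002, 1, 10), date(2002, 1, 15), "admission")],
    "s2": [("P001", date(2002, 1, 16), date(2002, 1, 25), "radiation")],
}
print(data_retrieve(tables, "P001", date(2002, 1, 10), date(2002, 1, 31)))
```

Under this literal reading, a query interval that starts before the object's first record at the initial vertex returns no data, because the test on min(v_begin) fails.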

Temporal Abstraction Step. Data collected in the data retrieval step are kept in ascending order of valid times. Then the TA inference rules attached at each vertex in the inference graph are applied to perform TA mechanisms on the collected data in the temporary table T. For each rule, this step performs reasoning on all the relevant parameters. All the abstracted facts are stored in the temporary table T and the answer to the TA query is composed from these generated facts. This step performs the following data_inference procedure.

procedure data_inference (T: table)
begin
    for each rule r attached at each vertex do
        inference_rule (r, T);
    for each multiple-state-inference rule r do
        inference_rule (r, T);
end;

procedure inference_rule (r: inference rule, T: table)
begin
    sort the table T in ascending order of valid-time;
    for each parameter mi in T do
        assign a qualitative value (e.g. low, high, ..., or normal) to parameter mi;
        read the first record in T and let it be the current record;
        while not EOF(T) do
            if r is applicable to the current record & the next record then
                mark the current record;
                update the result of the inference into the next record
            else
                read next record;
            endif;
        endwhile;
        erase the marked records;
    endfor;
end;

4. Temporal Data Preprocessing for Temporal Abstraction

Since the TA step is based on the inference graph, it requires the input data to be in an appropriate form.

For data pre-processing before TA, we propose an algorithm that decomposes the underlying temporal data into several small tables in such a way that each small table stores only the time-stamped data related to one vertex in the inference graph. Assume that there is only one TIC in the temporal database. The transition graph has n + 1 vertices, with the initial state s0, the final state sn, and intermediate states s1, s2, s3, ..., sn-1. The algorithm for decomposing temporal data works on the transition graph which is used in the TICC stage for every object. Therefore, it can detect temporal data that violate the constraint, so that these errors can be repaired. The outline of the algorithm is given as follows.

4.1 Algorithm (Decomposing temporal data)

Input: Table E and a temporal integrity constraint represented by the transition graph TS.

Output: The new tables decomposed from E.

1- Create table Temp with the same structure as E; initially Temp is empty.

2- Create table θ which contains all the objects in E; each object in θ is unique.

3- For each state si of the transition graph, create a new table which contains the objects’ data at si.

4- For each object of θ, do the following steps:

4a- Take all of the object’s data from E, sort them by valid-time, and insert them into table Temp.

4b- Check whether the object’s data in Temp satisfy a state sequence in TS. If not, repair the data (or remove them from E) before going to step 4c.

4c- Take all of the object’s data of state si from Temp, and insert them into the new table si.
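The steps of the algorithm can be sketched in Python as follows. The successor sets in EDGES are assumptions, since the edges of Figure 1 are not reproduced in the text, and the repair option of step 4b is not modelled: an object whose data violate the graph is simply reported and left out. The record layout (p_id, treatment, v_begin, v_end) follows the example tables, with the state name taken from the suffix of the treatment value.

```python
from datetime import date

# Assumed successor sets for the transition graph TS; Figure 1 may differ.
EDGES = {
    "s0": {"s1", "s2"},
    "s1": {"s3"},
    "s2": {"s3"},
    "s3": {"s0", "s3", "s4"},
    "s4": set(),
}


def decompose(E, edges, initial="s0"):
    """Split table E into one table per state; return (tables, rejected)."""
    tables = {state: [] for state in edges}               # step 3
    rejected = []
    for ob in sorted({row[0] for row in E}):              # steps 2 and 4
        # Step 4a: the object's records, sorted by valid time (table Temp).
        temp = sorted((r for r in E if r[0] == ob), key=lambda r: r[2])
        states = [r[1].split("-", 1)[1] for r in temp]    # 'treat-s0' -> 's0'
        # Step 4b: the state sequence must follow the transition graph.
        ok = states[0] == initial and all(
            b in edges.get(a, ()) for a, b in zip(states, states[1:])
        )
        if not ok:
            rejected.append(ob)   # repair is not modelled in this sketch
            continue
        for row, state in zip(temp, states):              # step 4c
            tables[state].append(row)
    return tables, rejected


E = [
    ("P001", "treat-s0", date(2002, 1, 10), date(2002, 1, 15)),
    ("P001", "treat-s1", date(2002, 1, 16), date(2002, 1, 25)),
    ("P001", "treat-s3", date(2002, 1, 26), date(2002, 2, 5)),
]
tables, rejected = decompose(E, EDGES)
print({state: len(rows) for state, rows in tables.items()}, rejected)
```

The sketch scans E once per object, which matches a cost proportional to the number of objects times the number of records.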

4.2 Example

Input: Given the transition graph TS in Figure 1 and the underlying temporal table E as follows.

Table E: The underlying temporal table

P_ID  Treatment  V_begin    V_end      ...
P001  treat-s0   10/1/2002  15/1/2002  …
P001  treat-s1   16/1/2002  25/1/2002  …
P005  treat-s0   18/1/2002  20/1/2002  …
P005  treat-s2   24/1/2002  5/2/2002   …
P001  treat-s3   26/1/2002  5/2/2002   …
P001  treat-s3   7/2/2002   8/3/2002   …
P005  treat-s3   10/2/2002  21/2/2002  …
P005  treat-s3   21/2/2002  10/4/2002  …
P001  treat-s4   15/3/2002  Now        …
P005  treat-s3   11/4/2002  3/6/2002   …
P005  treat-s0   10/6/2002  13/6/2002  …
P005  treat-s1   14/6/2002  20/6/2002  …
P005  treat-s3   22/6/2002  Now        …

Algorithm 4.1 generates the following Temp table.

Table Temp: The temporary table used in Algorithm 4.1

P_ID  Treatment  V_begin    V_end      ...
P001  treat-s0   10/1/2002  15/1/2002  …
P001  treat-s1   16/1/2002  25/1/2002  …
P001  treat-s3   26/1/2002  5/2/2002   …
P001  treat-s3   7/2/2002   8/3/2002   …
P001  treat-s4   15/3/2002  Now        …
P005  treat-s0   18/1/2002  20/1/2002  …
P005  treat-s2   24/1/2002  5/2/2002   …
P005  treat-s3   10/2/2002  21/2/2002  …
P005  treat-s3   21/2/2002  10/4/2002  …
P005  treat-s3   11/4/2002  3/6/2002   …
P005  treat-s0   10/6/2002  13/6/2002  …
P005  treat-s1   14/6/2002  20/6/2002  …
P005  treat-s3   22/6/2002  Now        …

We can see that table θ contains two objects P001 and P005.

Output: Algorithm 4.1 generates five new tables s0, s1, s2, s3, and s4 which correspond to the five states in the transition graph TS.

Table s0

P_ID  Treatment  V_begin    V_end      ...
P001  treat-s0   10/1/2002  15/1/2002  …
P005  treat-s0   18/1/2002  20/1/2002  …
P005  treat-s0   10/6/2002  13/6/2002  …

Table s1

P_ID  Treatment  V_begin    V_end      ...
P001  treat-s1   16/1/2002  25/1/2002  …
P005  treat-s1   14/6/2002  20/6/2002  …

Table s2

P_ID  Treatment  V_begin    V_end      ...
P005  treat-s2   24/1/2002  5/2/2002   …

Table s3

P_ID  Treatment  V_begin    V_end      ...
P001  treat-s3   26/1/2002  5/2/2002   …
P001  treat-s3   7/2/2002   8/3/2002   …
P005  treat-s3   10/2/2002  21/2/2002  …
P005  treat-s3   21/2/2002  10/4/2002  …
P005  treat-s3   11/4/2002  3/6/2002   …
P005  treat-s3   22/6/2002  Now        …

Table s4

P_ID  Treatment  V_begin    V_end      ...
P001  treat-s4   15/3/2002  Now        …

In the above tables, the attribute Treatment taking the value treat-si indicates that the treatment treat-si is applied to a patient who is in the state si.

Let k be the number of objects in the object table θ and n be the number of records in the underlying temporal table E. The time complexity of Algorithm 4.1 is T(n) = O(k*n). In the context of clinical databases, k is quite small in comparison to n, so the complexity of the algorithm is almost linear in n.

5. Experiments

We have implemented a KBTA system for monitoring the treatment of patients who have colorectal cancer. The patient data consist of 1597 records related to 73 patients. The data have been gathered in five contiguous years from 1994 to 2001 in Ho Chi Minh City Cancer Hospital. The original data have been decomposed into 11 small tables with an average of 147 records in each small table. The set of inference rules for this TA application consists of about 36 rules.

The user can input a TA query related to a given patient. If the query is syntactically correct, the TA system will provide the TA result, which shows concise summarized findings related to that particular patient. This abstracted information is helpful to the physicians for decision making and to the patient as well. The TA system is integrated with a temporal query system and an integrity constraint checking module, all of which are built on top of a temporal clinical database implemented in an Oracle DBMS. The whole system, called TDM, has been described in more detail in [11].

Experimental results on real data from a clinical database in Ho Chi Minh City Cancer Hospital show that the performance of our TA system is quite encouraging. One TA query processing on the synthetic data set (with 100000 records) takes about 9 seconds on average with a Pentium IV 2.4 GHz and 256 MB RAM.

6. Conclusions

The paper has outlined a new approach for TA which has a tight coupling with the temporal integrity constraint checking stage. The TICC stage not only ensures the consistency of the data stored in the temporal database but also extracts appropriate datasets for the TA stage. This data preparation step is very important for TA since the data input for temporal abstraction algorithms should be valid and in a suitable form required by the TA task. The TA approach in our work differs from related TA approaches mainly in two features: (1) an advantageous data preprocessing step which has been done by a preceding integrity constraint checking task and (2) the domain-specific and deterministic properties of the inference graph which help to make the TA process efficient. As a direction for future work, we plan to extend our framework so that the TICC module can check more than one underlying TIC applied to the temporal database.

7. References

[1] R. Bellazzi, C. Larizza, A. Riva, “Temporal abstraction for pre-processing and interpreting diabetes monitoring time series”, Proc. of Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP 97), 1997, pp. 1-9.
[2] R. Bellazzi, C. Larizza, P. Magni, S. Montani, M. Stefanelli, “Intelligent analysis of clinical time series: an application in diabetes mellitus domain”, Artificial Intelligence in Medicine, 20(1), 2000, pp. 35-57.
[3] D. Boaz, M. Balaban, and Y. Shahar, “A Temporal-Abstraction Rule Language for Medical Databases”, Proceedings of the Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP) '03, Protaras, Cyprus, 2003, pp. 67-73.
[4] D. Boaz and Y. Shahar, “A Framework for Distributed Mediation of Temporal-Abstraction Queries to Clinical Databases”, Artificial Intelligence in Medicine, 34(1), 2005, pp. 3-24.
[5] C. J. Date, H. Darwen, N. A. Lorentzos, “Temporal Data and Relational Model”, Morgan Kaufmann Publisher, 2003.
[6] M. Gertz, U. W. Lipeck, “Temporal Integrity Constraints in Temporal Databases”, Proceedings of the International Workshop on Temporal Databases, J. Clifford, A. Tuzhilin (Eds.), Sep. 1995, Workshops in Computing, Springer-Verlag, Berlin, pp. 77-92.
[7] T. B. Ho, D. D. Nguyen, S. Kawasaki, T. D. Nguyen, “Extracting Knowledge from Hepatitis Data with Temporal Abstraction”, IEEE Conference on Data Mining, Workshop on Active Mining, Maebashi, Dec. 9-12, 2002, pp. 91-96.
[8] T. B. Ho, S. Kawasaki, S. Q. Le, T. N. Tran, K. Takabayashi, H. Yokoi, “Combining Temporal Abstraction and Data Mining to Study Hepatitis”, ECML/PKDD 2004 Workshop on Discovery Challenge, Pisa, 20-24 Sep. 2004, pp. 155-167.
[9] W. Horn, S. Miksch, G. Egghart, C. Popow, F. Paky, “Effective Data Validation of High-frequency Data: Time-Point, Time-Interval, and Trend-Base Methods”, Computers in Biology and Medicine, 27(5), 1997.
[10] V. C. Pham, “Checking Temporal Integrity Constraints and Temporal Abstraction in Temporal Clinical Databases”, Ph.D. Dissertation, Faculty of Information Technology, HCM City University of Technology, March 2008.
[11] V. C. Pham, D. T. Anh, “Applying Temporal Abstraction in Clinical Databases”, Proc. of IEEE Int. Conf. on Research, Innovation and Vision for the Future Information & Communication Technologies (RIVF’2007), Mar. 05-09, Hanoi, Vietnam, 2007, pp. 192-199.
[12] Y. Shahar, “A framework for knowledge-base temporal abstraction”, Technical Report, Section on Medical Informatics, Stanford University School of Medicine, 1995.
