

Available online at www.sciencedirect.com

Data & Knowledge Engineering 64 (2008) 101–133

www.elsevier.com/locate/datak

A conceptual model for temporal data warehouses and its transformation to the ER and the object-relational models ☆

E. Malinowski a,*, E. Zimányi b

a Department of Computer & Information Science, Universidad de Costa Rica, San José, Costa Rica
b Department of Computer & Decision Engineering (CoDE), Université Libre de Bruxelles, 1050 Brussels, Belgium

Received 22 June 2007; accepted 22 June 2007
Available online 7 September 2007

Abstract

The MultiDim model is a conceptual multidimensional model for data warehouse and OLAP applications. These applications require the presence of a time dimension to track changes in measure values. However, the time dimension cannot be used to represent changes in other dimensions.

In this paper we introduce a temporal extension of the MultiDim model. This extension is based on research carried out in temporal databases. We allow different temporality types: valid time, transaction time, and lifespan, which are obtained from source systems, and loading time, which is generated in the data warehouse. Our model provides temporal support for levels, attributes, hierarchies, and measures. For hierarchies we discuss different cases depending on whether the changes in levels or in the relationships between them must be kept. For measures, we give different scenarios that show the usefulness of the different temporality types. Further, since measures can be aggregated before being inserted into data warehouses, we discuss the issues related to different time granularities between source systems and data warehouses. We finish the paper by presenting a transformation of the MultiDim model into the entity-relationship and the object-relational models.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Temporal data warehouses; Temporal multidimensional model; Conceptual multidimensional model; Logical design; Data modeling

1. Introduction

Decision-making users increasingly rely on data warehouses to access historical data for supporting the strategic decisions of organizations. A data warehouse is “a collection of subject-oriented, integrated, non-volatile, and time-variant data to support management’s decisions” [25]. Subject orientation means that the development of data warehouses is done according to the analytical necessities of managers at different levels of the decision-making process. Integration represents the complex effort to join data from different operational and external systems. Non-volatility ensures data durability while time-variation indicates the possibility to keep different values of the same information according to its changes in time. Therefore, the last two features indicate that data warehouses should allow changes to data values without overwriting existing values.

0169-023X/$ - see front matter © 2007 Elsevier B.V. All rights reserved.

doi:10.1016/j.datak.2007.06.020

☆ The work of E. Malinowski was funded by the Cooperation Department of the Université Libre de Bruxelles.

* Corresponding author.

E-mail addresses: [email protected] (E. Malinowski), [email protected] (E. Zimányi).

The structure of a data warehouse is usually represented at a logical level using a star or snowflake schema. These schemas provide a multidimensional view of data where measures (e.g., quantity of products sold) are analyzed from different perspectives or dimensions (e.g., by product) and at different levels of detail with the help of hierarchies. On-line analytical processing (OLAP) systems allow users to perform automatic aggregations of measures while traversing hierarchies: the roll-up operation transforms detailed measures into aggregated values (e.g., daily into monthly or yearly sales) while the drill-down operation does the contrary.
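As an illustration of the roll-up operation, the following minimal Python sketch aggregates daily sales into monthly and yearly totals. The fact rows, the `roll_up` helper, and the mapping functions are hypothetical names chosen for this example; they are not part of the MultiDim model or of any OLAP product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (day, product, quantity sold).
daily_sales = [
    (date(2007, 3, 1), "P1", 40),
    (date(2007, 3, 15), "P1", 60),
    (date(2007, 4, 2), "P1", 25),
]

def roll_up(facts, to_parent):
    """Sum an additive measure after mapping each day to a coarser time member."""
    totals = defaultdict(int)
    for day, product, quantity in facts:
        totals[(to_parent(day), product)] += quantity
    return dict(totals)

# Roll up from days to months, then from days to years.
monthly = roll_up(daily_sales, lambda d: (d.year, d.month))
yearly = roll_up(daily_sales, lambda d: d.year)
```

Drill-down is the inverse navigation: it returns from `yearly` or `monthly` to the more detailed rows they were computed from.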

Current data warehouse and OLAP models include a time dimension that, like the other dimensions, is used for grouping purposes (using the roll-up operation). The time dimension also indicates the timeframe for measures, e.g., 100 units of a product were sold in March 2007; however, it cannot be used to keep track of changes in other dimensions, e.g., when a product changes its ingredients. Therefore, usual multidimensional models are not symmetric in the way they represent changes for measures and dimensions. Consequently, the features of “time-variant” and “non-volatility” only apply to measures, leaving to applications the representation of changes occurring in dimensions.

Since in many cases the changes of dimension data and the time when they occurred are important for analysis purposes, several implementation solutions for this problem, the so-called slowly changing dimensions, are proposed in [29] in the context of relational databases. Nevertheless, these solutions are not satisfactory since they either do not preserve the entire history of data or are difficult to implement. Further, they do not consider the research carried out in the field of temporal databases.
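To make the trade-off concrete, the sketch below contrasts two strategies commonly associated with slowly changing dimensions: overwriting a value in place, which loses history, and appending a new row version, which keeps it at the cost of extra bookkeeping. The function names, the row layout, and the sample values are assumptions made for illustration; they are not taken from [29].

```python
from datetime import date

def overwrite(dimension, key, attribute, value):
    """Overwrite in place: the previous value is lost."""
    dimension[key][attribute] = value

def add_version(versions, key, attribute, value, change_date):
    """Close the current row for `key` and append a new row version."""
    current = next(r for r in versions if r["key"] == key and r["end"] is None)
    current["end"] = change_date
    versions.append({**current, attribute: value, "start": change_date, "end": None})

# Strategy 1: only the latest size survives.
products = {"P1": {"size": "375 g"}}
overwrite(products, "P1", "size", "500 g")

# Strategy 2: both sizes are kept, each with its period of applicability.
product_versions = [
    {"key": "P1", "size": "375 g", "start": date(2006, 1, 1), "end": None},
]
add_version(product_versions, "P1", "size", "500 g", date(2007, 3, 1))
```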

Temporal databases have been extensively investigated over the last decades (e.g., [53]). They provide structures and mechanisms for representing and managing information that varies over time. Two different temporality types¹ are usually considered: valid time (VT) and transaction time (TT), which allow representing, respectively, when the data is true in the modeled reality and when it is current in the database. If both temporality types are used, they define bitemporal time (BT). In addition, the lifespan (LS) is used to record changes in time for an object as a whole.
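The difference between valid time and transaction time can be sketched with a small bitemporal history. The class, field, and function names below, as well as the half-open interval convention and the sample data, are assumptions made for this example only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class BitemporalValue:
    value: str
    vt_start: date            # valid time: when the value is true in reality
    vt_end: Optional[date]    # None means "still valid"
    tt_start: date            # transaction time: when the row is current in the database
    tt_end: Optional[date]    # None means "still current"

def _covers(start, end, instant):
    """Half-open interval test: start <= instant < end."""
    return start <= instant and (end is None or instant < end)

def value_as_of(history, valid_at, known_at):
    """Value valid at `valid_at` according to what the database stored at `known_at`."""
    for row in history:
        if _covers(row.vt_start, row.vt_end, valid_at) and _covers(row.tt_start, row.tt_end, known_at):
            return row.value
    return None

# Hypothetical history: on 5 March the database learns that the size changed on 1 March.
size_history = [
    BitemporalValue("375 g", date(2007, 1, 1), None, date(2007, 1, 10), date(2007, 3, 5)),
    BitemporalValue("375 g", date(2007, 1, 1), date(2007, 3, 1), date(2007, 3, 5), None),
    BitemporalValue("500 g", date(2007, 3, 1), None, date(2007, 3, 5), None),
]
```

Asking for the size valid on 2 March gives a different answer depending on whether the database is consulted as of 3 March (before the correction was recorded) or as of 6 March (after it).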

These temporality types are used for representing either events, i.e., something that happens at a particular time point, or states, i.e., something that has extent over time. For the former an instant is used, i.e., a time point on an underlying time axis; the specific value of an instant is called a timestamp. An instant may be assigned the particular value now [64], indicating the current time. An instant is defined according to a non-decomposable time unit called a granule, and its size is called granularity. States are represented by an interval or period indicating the time between two instants using, respectively, non-anchored (e.g., 2 weeks) or anchored lengths of time (e.g., [02/11/2004, 05/01/2005]). Sets of instants and sets of intervals can also be used for representing events and states.
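These notions can be sketched in Python as follows. The helper names are hypothetical, only two granularities are handled, and reading the dates in the anchored example as day/month/year is an assumption.

```python
from datetime import date, datetime, timedelta

def to_granule(instant: datetime, granularity: str):
    """Truncate a timestamp to the granule (day or month) that contains it."""
    if granularity == "day":
        return instant.date()
    if granularity == "month":
        return (instant.year, instant.month)
    raise ValueError(f"unsupported granularity: {granularity}")

def in_period(instant: date, period):
    """True if `instant` falls within the closed anchored interval `period`."""
    start, end = period
    return start <= instant <= end

event = datetime(2004, 11, 2, 14, 30)             # an instant and its timestamp
two_weeks = timedelta(weeks=2)                    # non-anchored length of time
anchored = (date(2004, 11, 2), date(2005, 1, 5))  # anchored interval (dd/mm/yyyy assumed)
```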

Temporal data warehouses join the research achievements of temporal databases and data warehouses in order to manage time-varying multidimensional data. Temporal data warehouses raise many issues including consistent aggregation in presence of time-varying data, temporal queries, storage methods, temporal view materialization, etc. Nevertheless, very little attention from the research community has been drawn to conceptual and logical modeling for temporal data warehouses and to the analysis of which temporal support should be included in temporal data warehouses.

In this paper, we propose a temporal extension for the MultiDim model [33], a conceptual model used for representing data requirements of data warehouse and OLAP applications. We refer to different temporality types supported by the model, i.e., valid time, transaction time, lifespan, and loading time. Then, we present the inclusion of temporal support in different elements of the model, i.e., in levels, hierarchies, and measures. For levels, we discuss temporal support for attributes and for a level as a whole. For hierarchies, we present different cases considering whether temporal changes to levels, to the links between them, or to both levels and links are important to be kept.

1 Usually called temporal dimensions; however, we use the term dimension in the multidimensional context.

Since source systems and data warehouses may have different time granularities (e.g., source data may be introduced on a daily basis yet data warehouse data is aggregated by month), we consider two different situations: when measures are not aggregated before loading them into a temporal data warehouse and when these aggregations are performed. For the former, by means of real-world examples we show the usefulness of having different temporality types. For the latter, we discuss issues related to different time and data granularities and propose the inclusion of temporality types meaningful for aggregated measures.

Finally, we present a mapping of the conceptual model for time-varying multidimensional data into the classical (i.e., non-temporal) entity-relationship (ER) and object-relational (OR) models. In this paper, we do not consider operations in temporal data warehouses. They are not easy to cope with since (1) different time granularities between dimension data and measures should be considered, and (2) as demonstrated by, e.g., [17], solutions for managing different schema versions should also be included, which currently do not form part of our model.

Parts of this paper have already been presented in [32,34,35]. However, this paper not only collects scattered information in a unified manner but also refers to several new aspects. We extend our model by including role-playing dimensions and different types of measures. We provide a mapping of our model into the ER model, thus allowing designers to transform MultiDim schemas into ER schemas. Afterwards, based on well-known rules, e.g., [18], they are able to represent the ER schemas in different logical and physical models according to the target implementation platform. We also provide additional implementation solutions for temporal hierarchies, include the OR representation for temporal levels and temporal hierarchies, and present examples of implementation using a commercial DBMS (Oracle 10g). We also mention different physical aspects that should be considered during the implementation of temporal data warehouses.

This paper is organized as follows: In Section 2 we present the definition of the MultiDim model. Section 3 refers to different temporality types supported by the model and presents a general overview of the proposed temporal extension. Section 4 refers to temporal support for levels, hierarchies, and measures. Section 5 provides the mapping of the constructs of the MultiDim model to the ER and OR models. Section 6 surveys works related to temporal data warehouses. The conclusions are given in Section 7.

2. Overview of the MultiDim model

It has been acknowledged for several decades that conceptual models are essential for designing applications. In particular, conceptual models allow describing the requirements of an application in terms that are as close as possible to users’ perception. Thus, they facilitate the communication between users and designers since they do not require the knowledge of technical features of the underlying implementation platform.

We proposed in [33] the MultiDim model, a conceptual multidimensional model for data warehouse and OLAP applications. The MultiDim model uses graphical notations similar to those of the entity-relationship (ER) model; they are shown in Fig. 1. In order to explain the different elements of the model, we will use the example shown in Fig. 2, which illustrates the conceptual schema of a Sales data warehouse.

A schema is composed of a set of levels organized into dimensions as well as a set of fact relationships.

A level corresponds to an entity type in the ER model. It describes a set of real-world concepts that, from the application’s perspective, have similar characteristics. For example, Product, Category, and Department are some of the levels of Fig. 2. Instances of a level are called members. As shown in Fig. 1a, a level has a set of attributes describing the characteristics of its members. In addition, a level has one or several keys (underlined in Fig. 1), identifying uniquely the members of a level, each key being composed of one or several attributes. Each attribute of a level has a type, i.e., a domain for its values. Typical value domains are integer, real, or string. For brevity, in the graphical representation of our conceptual schemas we do not include type information for attributes. This can be done if necessary, and it is included in the textual representation of our model.

Fig. 1. Notations of the MultiDim model: (a) level, (b) hierarchy, (c) fact relationship with associated measures and levels, (d) measure types, (e) cardinalities, (f) distributing factor, (g) exclusive relationships, and (h) analysis criterion.

Fig. 2. A conceptual multidimensional schema of a Sales data warehouse.

A fact relationship (Fig. 1c) expresses the focus of analysis and represents an n-ary relationship between levels. For example, the fact relationship between the Product, Time, Client, and Store levels in Fig. 2 is used for analyzing sales figures. Instances of a fact relationship are called facts. Since the cardinality of every level participating in a fact relationship is (0,n), we omit such cardinalities to simplify the model. Further, as shown in Fig. 1c, the same level can participate several times in a fact relationship playing different roles. Each role is identified by a name and is represented by a separate link between the level and the fact relationship, as can be seen for the roles Payment date and Order date relating the Time level to the Sales fact relationship.


A fact relationship may contain attributes commonly called measures. They contain data (usually numerical) that are analyzed using the different perspectives represented by the dimensions. For example, the Sales fact relationship in Fig. 2 includes the measures Quantity, Price, and Amount. Key attributes of the levels involved in a fact relationship indicate the granularity of measures, i.e., the level of detail at which measures are represented.

Measures can be classified as additive, semi-additive, or non-additive [29,31]. As shown in Fig. 1d, we suppose by default that measures are additive, i.e., they can be summarized along all dimensions. For semi-additive and non-additive measures we include the symbols +! and F[x], respectively, next to the measure’s name. Further, both measures and level attributes can be derived, i.e., calculated based on other measures or attributes. We use the symbol / for indicating derived attributes and measures.
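A semi-additive measure can be illustrated with a stock level, which is a standard textbook example rather than one taken from Fig. 2: it can be summed across stores but not across time, where an average or a last value is used instead. The data and function names below are hypothetical.

```python
# Hypothetical stock-level facts: (month, store, units in stock).
stock = [
    ("2007-03", "S1", 10), ("2007-03", "S2", 20),
    ("2007-04", "S1", 12), ("2007-04", "S2", 18),
]

def total_by_month(facts):
    """Summing across stores is meaningful for a stock level."""
    totals = {}
    for month, _store, units in facts:
        totals[month] = totals.get(month, 0) + units
    return totals

def average_over_time(facts, store):
    """Across time a stock level is averaged rather than summed."""
    values = [units for _month, s, units in facts if s == store]
    return sum(values) / len(values)
```

Summing the two monthly values for store S1 would report 22 units, a quantity that was never in stock; the average of 11 is the meaningful aggregate.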

A dimension is an abstract concept grouping data that shares a common semantic meaning within the domain being modeled. A dimension is composed of one level or one or more hierarchies. Hierarchies are used for establishing meaningful aggregation paths. A hierarchy comprises several related levels, e.g., the Product, Category, and Department levels. Given two related levels, the lower level is called child, the higher level is called parent, and the relationship between them is called child–parent relationship. Since these relationships are only used for traversing from one level to the next one, they are simply represented with a line to simplify the notation.

Child–parent relationships are characterized by cardinalities, shown in Fig. 1e, indicating the minimum and the maximum number of members in one level that can be related to a member in another level. For example, in Fig. 2 the child level Product is related to the parent level Category with a many-to-one cardinality, which means that every product belongs to only one category and that each category can have many products.

The levels in a hierarchy allow analyzing data at different granularities, i.e., at different levels of detail. For example, the Product level contains specific information about products while the Category level allows considering these products from a more general perspective of the categories to which they belong. The level in a hierarchy that contains the most detailed data is called the leaf level; it must be the same for all hierarchies included in a dimension. The leaf level name is used for defining the dimension’s name. The last level in a hierarchy representing the most general data is called the root level. If several hierarchies are included in a dimension, their root levels may be different. For example, both hierarchies in the Product dimension in Fig. 2 comprise the same leaf level Product, while they have different root levels, the Department and the Distributor levels.

In some works the root of a hierarchy is represented using a level called All. We leave to designers the decision of including it in multidimensional schemas. In this paper we do not present the All level for the different hierarchies since we consider that it is meaningless in conceptual schemas and in addition it adds unnecessary complexity to them.

Key attributes of a parent level define how child members are grouped. For example, in Fig. 2 the Department name in the Department level is a key attribute; it is used for grouping different category members during the roll-up operation from the Category to the Department levels. However, in the case of many-to-many child–parent relationships it is necessary to determine how to distribute the measures from child to parent members. For example, in Fig. 2 the relationship between Product and Category is many-to-many, i.e., the same product can be included in several categories. The notation in Fig. 1f indicates that a distributing factor is used to allocate the measures associated with a product among all its categories.
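A distributing factor can be sketched as a share per product and category pair. The shares, the sales figures, and the function name below are hypothetical, and the sketch assumes that the shares of each product sum to 1 so that totals are preserved during roll-up.

```python
# Hypothetical distributing factors: product -> {category: share}; shares sum to 1.
distributing_factor = {
    "P1": {"C1": 0.7, "C2": 0.3},
    "P2": {"C2": 1.0},
}
sales_by_product = {"P1": 100.0, "P2": 50.0}

def roll_up_to_category(measure_by_product, factors):
    """Allocate each product's measure among its categories, then sum per category."""
    totals = {}
    for product, amount in measure_by_product.items():
        for category, share in factors[product].items():
            totals[category] = totals.get(category, 0.0) + amount * share
    return totals

sales_by_category = roll_up_to_category(sales_by_product, distributing_factor)
```

Without the factor, the sales of P1 would be counted in full under both C1 and C2 and the category totals would exceed the product totals.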

Moreover, it is sometimes the case that two or more child–parent relationships are exclusive. This is represented using the symbol of Fig. 1g. An example is given in Fig. 2, where clients can be either persons or organizations. Thus, according to their type, clients participate in only one of the relationships departing from the Client level: persons are related to the Profession level, while organizations are related to the Sector level.

The hierarchies in a dimension may express different conceptual structures used for analysis purposes; these are differentiated with an analysis criterion (Fig. 1h). For example, the Product dimension in Fig. 2 includes two hierarchies: Product groups and Distribution. The former hierarchy comprises the levels Product, Category, and Department, while the latter hierarchy includes the levels Product and Distributor.

Single-level dimensions, such as Time and Store in Fig. 2, indicate that even though these levels contain attributes that may form a hierarchy, such as City name and State name in the Store dimension, the user is not interested in using them for aggregation purposes.


As can be seen in Fig. 2, our model allows users to clearly indicate their analysis requirements related to both the focus of analysis and the summarization levels described by the hierarchies. It also preserves the characteristics of logical schemas, i.e., star or snowflake schemas, providing at the same time a more abstract conceptual representation. Our model allows distinguishing different kinds of hierarchies existing in real-world applications [33], which is not the case when they are represented in traditional logical schemas.

3. General description of the temporally extended MultiDim model

In this section we briefly describe the temporal extension of the MultiDim model. First, we present temporality types supported in the model. Then, we show a small application example, which is used to introduce some features of the model. For brevity, and to simplify the understanding of the model, we do not refer to all features of the model presented in Section 2, such as many-to-many and exclusive child–parent relationships or role-playing dimensions.

3.1. Temporality types

The MultiDim model allows designers to include valid time, transaction time, and lifespan support. However, these temporality types are not introduced by users (valid time and lifespan) or generated by the temporal data warehouse (transaction time) as is done in temporal databases; on the contrary, they are brought from source systems (if they exist). For example, logged systems [26], which register all actions in log files, contain transaction time; they may also include valid time represented in user-defined attributes.

To the best of our knowledge, no other work includes the different types of temporal support proposed in the MultiDim model. However, these temporality types are important in the data warehouse context for several reasons. First, having valid time and lifespan support, users can analyze measures taking into account changes in dimension data. Second, these temporality types help implementers to develop procedures for correct measure aggregation during roll-up operations in the presence of changes in dimension data [9,17,40,62]. Finally, transaction time is important for traceability applications, for example, for fraud detection, when the changes to data in operational databases and the time when they occurred are required for investigation processes.

In addition, considering that data in data warehouses is neither modified nor deleted, we do not include transaction time generated in a data warehouse, as is done in most works related to temporal data warehouses. Instead, we propose the so-called loading time that indicates when data was loaded into a data warehouse. This time can differ from transaction time or valid time from source systems due to the delay between the time when changes have occurred in source systems and the time when these changes are integrated into a temporal data warehouse. Loading time can help users to know since when data has been available in a data warehouse for analysis purposes.
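The role of loading time can be sketched as a timestamp that the loading process attaches to each incoming row, next to the temporality types brought from the source. The class, field, and function names and the sample timestamps are assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class LoadedFact:
    product: str
    quantity: int
    valid_time: datetime    # brought from the source system
    loading_time: datetime  # generated when the row enters the data warehouse

def load(source_rows, loaded_at):
    """Stamp every source row of one load run with the same loading time."""
    return [LoadedFact(p, q, vt, loaded_at) for p, q, vt in source_rows]

def available_at(facts, instant):
    """Facts that were already available for analysis at `instant`."""
    return [f for f in facts if f.loading_time <= instant]

# A sale valid at 09:00 on 1 March is only loaded at 02:00 on 2 March.
facts = load([("P1", 5, datetime(2007, 3, 1, 9, 0))], datetime(2007, 3, 2, 2, 0))
```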

In this work we do not refer to different clocks used in source systems and in a data warehouse, such as when an international company has its headquarters in one country and receives data from stores located in countries with different time zones. Although this is an important topic, it is out of the scope of this paper since it should consider the integration processes between different source systems.

Since the different temporality types can be used for representing events or states, we include in our model different temporal data types based on [48]. An instant represents a single point in time, an interval (or a period) denotes a set of successive instants between two instants, an instant set is used for representing the same event occurring at different time instants, and an interval set can be used to represent the same state in different intervals of time.

As for attributes, for brevity we do not include temporal data types in the graphical representation of our conceptual schemas. This can be done in the textual representation of our model.

3.2. Model description

In this section we briefly present the temporal extension of the MultiDim model based on the example shown in Fig. 3. The metamodel of the MultiDim model is included in Section 4.4, while its formal definition can be found at http://cs.ulb.ac.be/research/dw/MultiDimFormalization.pdf.


Even though most real-world phenomena vary over time, keeping the history of their evolution may not be necessary for an application. Therefore, determining which data evolve in time depends on application requirements and the availability of temporal support in source systems. The MultiDim model allows users to determine which historical data they need by including in the schema the symbols of the corresponding temporality types. For example, in the schema of Fig. 3 users are not interested in keeping track of changes to clients’ data. Therefore, this dimension does not include any temporal support. On the other hand, changes in measure values and in data related to products and stores are important for the application.

The schema in Fig. 3 includes four temporal levels, i.e., levels for which the application needs to keep the lifespan of their members (noted by the LS symbol next to the level’s name). This support allows users to track changes of a member as a whole, e.g., inserting or deleting a product, splitting a category, etc. A level may be temporal independently of the fact that it has temporal attributes. For instance, the Product level in the figure has two temporal attributes (i.e., Size and Distributor with valid time (VT) support); this indicates that the changes to these attribute values and the time when they occur will be kept. On the other hand, the Store level includes only lifespan support without any temporal attribute.

Child–parent relationships may also include temporal support. For example, in Fig. 3 the LS symbol in the relationship linking the Product and Category levels indicates that the evolution over time of assignments of products to categories will be kept. Temporal support for relationships leads to two interpretations of cardinalities. The snapshot cardinality is valid at every time instant whereas the lifespan cardinality is valid over the entire member’s lifespan. The former cardinality is represented using the symbol indicating temporality type next to the child–parent relationship while the lifespan cardinality includes the LS symbol surrounded by an ellipse. In Fig. 3, the snapshot cardinality between Store and Sales district levels is many-to-one while the lifespan cardinality is many-to-many. They indicate that a store belongs to only one sales district at every time instant but belongs to many sales districts over its lifespan, i.e., its assignment to sales districts may change.
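The two interpretations of cardinalities can be sketched by querying a set of time-stamped assignments either at one instant or over the whole lifespan. The assignment rows, the half-open interval convention, and the function names are assumptions made for this example.

```python
from datetime import date

# Hypothetical store-to-district assignments: (store, district, start, end), end exclusive.
assignments = [
    ("S1", "D1", date(2005, 1, 1), date(2006, 1, 1)),
    ("S1", "D2", date(2006, 1, 1), date(2008, 1, 1)),
]

def parents_at(links, child, instant):
    """Parents linked to `child` at one time instant (snapshot view)."""
    return {p for c, p, start, end in links if c == child and start <= instant < end}

def parents_over_lifespan(links, child):
    """All parents ever linked to `child` (lifespan view)."""
    return {p for c, p, _start, _end in links if c == child}
```

Here store S1 has exactly one sales district at any instant, which matches a many-to-one snapshot cardinality, and two districts over its lifespan, which requires a many-to-many lifespan cardinality.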

In a temporal multidimensional model it is important to provide a uniform temporal support for the different elements of the model, i.e., for levels, hierarchies, and measures. We would like to avoid mixing two different approaches where dimensions include explicit temporal support (as described above) while measures require the presence of the traditional time dimension to keep track of changes. Therefore, considering that measures are attributes of fact relationships, we provide temporal support for them in the same way as is done for levels’ attributes as can be seen in Fig. 3. In this example, changes in measure values for both measures Quantity and Amount are represented using valid time.

Fig. 3. An example of conceptual schema for a temporal data warehouse.

An important question is thus whether it is necessary to have a time dimension in the schema when including temporality types for measures. If all attributes of the time dimension can be obtained by applying time manipulation functions, such as the corresponding week, month, or quarter, this dimension is not required anymore. However, in some temporal data warehouse applications this calculation can be very time-consuming, or the time dimension contains data that cannot be derived, e.g., events such as promotional seasons. Thus, the time dimension is included in a schema depending on users’ requirements and the capabilities provided by the underlying DBMS.
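The distinction between derivable and non-derivable attributes of a time dimension can be sketched as follows. The function name, the choice of ISO week numbering, and the promotion entry are assumptions made for this example.

```python
from datetime import date

def time_attributes(day: date):
    """Derive calendar attributes that a time dimension would otherwise store."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return {
        "week": (iso_year, iso_week),
        "month": day.month,
        "quarter": (day.month - 1) // 3 + 1,
        "year": day.year,
    }

# Data that cannot be computed from the date alone, e.g., promotional seasons
# (hypothetical entry); such data still has to be stored in a time dimension.
promotions = {date(2007, 3, 15): "Spring sale"}
```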

4. Temporal extension of the MultiDim model

In this section we present in detail the temporal extension of the MultiDim model. Although for levels and hierarchies we only give examples using valid time, the results may be straightforwardly generalized for transaction time.

4.1. Temporal levels

As was said before, changes in a level can occur either for a member as a whole (e.g., inserting or deleting a product) or for attribute values (e.g., changing the size of a product). Representing these changes in temporal data warehouses is important for analysis purposes, e.g., to discover how the exclusion of some products or the changes to the product’s size influence sales.

Lifespan support is used to keep changes of levels’ members; this is represented by putting the LS symbol next to the level’s name. Lifespan can be combined with transaction time and loading time which indicate, respectively, when the level member is current in a source system and in a temporal data warehouse.

On the other hand, temporal support for attributes allows keeping changes in their values and the time when they have occurred. This is represented by including the symbol of the corresponding temporality type next to the attribute name. For attributes we allow valid time, transaction time, loading time, or a combination of them. We group temporal attributes, firstly, to ensure that both kinds of attributes (temporal and non-temporal) can be clearly represented and, secondly, to reduce the number of symbols in schemas.

Levels can have lifespan support and can have temporal and non-temporal attributes. For example, theProduct level in Fig. 2 keeps the lifespan of its members; it also keeps one value per attribute for non-temporalattributes (e.g., Name) and the history of value changes for temporal attributes (e.g., Size).

Many existing temporal models impose constraints on temporal attributes and the lifespan of their corresponding entity types. A typical constraint is that the valid time of attribute values must be included in the lifespan of their entity. As in [48], our model does not impose such constraints a priori. In this way, different situations can be modeled, e.g., a product that does not belong to a store inventory (it is not included in the master file) but is on sale in order to determine its acceptance level. For this product, the valid time of temporal attributes may not be within the product’s lifespan. On the other hand, temporal integrity constraints may be explicitly defined, if required, using a calculus that includes Allen’s operators [2].
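As an illustration only, the following Python sketch shows how an explicitly defined constraint of this kind could be checked. The Period class, the during_or_equal helper, and the sample Product data are hypothetical and are not part of the MultiDim model; time is simplified to integer granules and periods are closed-open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """Closed-open period [start, end) over integer time granules (assumption)."""
    start: int
    end: int


def during_or_equal(inner: Period, outer: Period) -> bool:
    """True when `inner` lies entirely within `outer`.

    This groups Allen's 'during', 'starts', 'finishes', and 'equals'
    relations of `inner` with respect to `outer`.
    """
    return outer.start <= inner.start and inner.end <= outer.end


def violations(lifespan: Period, attribute_history: dict) -> list:
    """Return the valid-time periods that are not included in the lifespan."""
    return [vt for vt in attribute_history if not during_or_equal(vt, lifespan)]


# Hypothetical Product member: lifespan [10, 50); its Size history starts
# before the lifespan, as for a product sold before entering the master file.
product_lifespan = Period(10, 50)
size_history = {Period(5, 20): "small", Period(20, 50): "large"}

print(violations(product_lifespan, size_history))
```

Since the model does not impose the constraint a priori, such a check would only be run for the levels where a designer explicitly requires it.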

4.2. Temporal hierarchies

Hierarchies in the MultiDim model contain several related levels. Given two related levels in a hierarchy, the levels, the relationship between them, or both may have temporal support. We examine next these different situations.

4.2.1. Non-temporal relationships between temporal levels

Levels with temporal support can be associated with non-temporal relationships. An example is given in Fig. 4. However, incorrect analysis scenarios may occur if the child–parent relationships change. For example, consider the situation depicted in Fig. 5a where at time t1 product P is assigned to category C, and later on, at time t2 category C ceases to exist. In order to have meaningful roll-up operations, product P must be assigned to another category at instant t2. However, non-temporal relationships indicate that either these relationships never change or, if they do, only the last modification is kept. Therefore, if for a product P the relationship P–C1 replaces a previous version P–C, there is no link that leads from product P to a category that this product was assigned to before instant t2. Consequently, two incorrect aggregation scenarios may occur: (1) either measures cannot be aggregated before this time instant if category C1 did not exist before instant t2 (Fig. 5b) or (2) an incorrect assignment of product P to category C1 will be considered before instant t2 if category C1 existed before this instant.

Fig. 4. An example of a non-temporal relationship between temporal levels.

Fig. 5. An example of incorrect analysis scenario: (a) before and (b) after deleting a category.

Therefore, users and designers must be aware of the consequences of having non-temporal child–parent relationships between temporal levels and should allow this kind of relationship only if it does not change over time.

Moreover, an incorrect analysis scenario may occur when key attributes include valid time support. Recall that key attributes of levels are used for the roll-up and the drill-down operations and their values are displayed for the users, e.g., Category name. For example, suppose now that valid time support is added to the Category name in the Category level in Fig. 4 and that at time t2 category C is renamed as category C1 as shown in Fig. 5b. In our model we use time-invariant identifiers for members to represent links between child and parent levels; therefore, product P will always reference the same category member. However, since we have temporal support for key attributes, two names for the category will exist: C before instant t2 and C1 after instant t2. Therefore, in order to display adequate values of key attributes (e.g., category name) for different periods of time, special aggregation procedures must be developed for the roll-up operation.

4.2.2. Temporal relationships between non-temporal levels

Temporal relationships allow keeping track of the evolution of links between child and parent members. This is represented by placing the corresponding temporal symbol, e.g., LS, on the link between hierarchy levels as can be seen in Fig. 6. The MultiDim model allows designers to include lifespan, transaction time, loading time, or a combination of these temporality types for representing temporal relationships between levels.

Nevertheless, temporal relationships between non-temporal levels can lead to the problem of dangling references if level members cease to exist. For example, consider the situation depicted in Fig. 7a where an employee E is assigned to a section S at instant t1, and later on, at instant t2 section S ceases to exist. To ensure meaningful roll-up operations, the employee E must be assigned to another section at instant t2 (Fig. 7b). Since levels are non-temporal, there is no more information about the existence of the section S. On the other hand, the relationship is temporal; thus, both assignments will be kept with their corresponding validity interval as can be seen in Fig. 7b. However, during the roll-up operations before the instant t2, references to the non-existing section S will be made.

Fig. 6. An example of a temporal relationship between non-temporal levels.

Fig. 7. An example of dangling references: (a) before and (b) after deleting a section.

Therefore, to avoid dangling references and inconsistency during roll-up and drill-down operations, the temporal relationships between levels should be allowed only if the level members do not change.

Notice that since we use time-invariant identifiers for members to represent links between child and parent levels, these links do not change if we modify the key attributes, e.g., change the section name. However, since levels are non-temporal, only the last modification is kept. Therefore, during the roll-up operation users can only display the last value of the key attribute, losing the history of its changes. Thus, it is the users’ and designers’ decision whether to keep this history by including temporal support for key attributes.

4.2.3. Temporal relationships between temporal levels

Temporal relationships may link levels having lifespan support and/or temporal attributes. This helps to avoid incorrect analysis scenarios and dangling references as described in Sections 4.2.1 and 4.2.2.

The example of Fig. 8 models a sales company that is in active development: changes to sales districts may occur to improve the organizational structure. The application needs to keep the lifespan of districts in order to analyze how the organizational changes affect sales. Similarly, new stores may be created or existing ones may be closed; thus the lifespan of stores is kept. Finally, the application needs to keep track of the evolution of assignments of stores to sales districts.

Some temporal models impose a constraint on temporal relationships between temporal levels indicating that the valid time of a relationship instance must be included in the intersection of the valid times of participating objects. In order to ensure correctness of the roll-up and drill-down operations, in multidimensional hierarchies the following is further required:

Every instant included in the lifespan of a level must be included in the lifespan of some member of the next related level, i.e., a valid child member must have a valid parent member and vice versa. If this condition is not fulfilled, structural changes to hierarchies could occur, e.g., forcing some level members to skip the current parent level.²

² We do not consider structural changes to hierarchies since they require schema versioning, which is out of the scope of this paper.

Notice that when levels include valid time support for key attributes, special aggregation procedures may be required if these key attributes change their values, as was explained in Section 4.2.1.

4.2.4. Snapshot and lifespan cardinalities

Cardinalities in a non-temporal model indicate the number of members in one level that can be related to member(s) in another level. In our temporal model this cardinality may be considered at every time instant (snapshot cardinality) or over the members’ lifespan (lifespan cardinality). The lifespan cardinality may be different from the snapshot cardinality.

In the MultiDim model the snapshot cardinality is by default equal to the lifespan cardinality; however, if these cardinalities are different, the lifespan cardinality is represented as an additional line with the LS symbol surrounded by an ellipse as shown in Fig. 9. In the example the snapshot and lifespan cardinalities for the Work hierarchy are many-to-many indicating that an employee can work in more than one section at the same time instant and over his lifespan. On the other hand, the snapshot cardinality for the Affiliation hierarchy is one-to-many, and the lifespan cardinality is many-to-many indicating that at every time instant an employee can be affiliated to only one section, but over his lifespan he can be affiliated to many sections.

Fig. 8. An example of temporal relationships between temporal levels.

Fig. 9. Snapshot and lifespan cardinalities between hierarchy levels.

Further, it is necessary to impose a constraint such that the minimum and the maximum values of the lifespan cardinality are equal to or greater than the minimum and maximum values of the snapshot cardinality, respectively.
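The difference between the two cardinalities can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: the affiliation data, the integer time instants, and the functions snapshot_max and lifespan_max are hypothetical, and only the maximum cardinality in the child role is computed.

```python
from collections import defaultdict

# Hypothetical Affiliation links: (employee, section, start, end), where the
# link holds over the closed-open period [start, end) of integer instants.
affiliations = [
    ("E1", "S1", 0, 5),
    ("E1", "S2", 5, 9),
    ("E2", "S1", 0, 9),
]


def lifespan_max(links):
    """Largest number of distinct parents linked to one child over all time."""
    parents = defaultdict(set)
    for child, parent, _start, _end in links:
        parents[child].add(parent)
    return max(len(p) for p in parents.values())


def snapshot_max(links):
    """Largest number of parents linked to one child at a single instant."""
    instants = {t for _c, _p, start, end in links for t in range(start, end)}
    best = 0
    for t in instants:
        parents = defaultdict(set)
        for child, parent, start, end in links:
            if start <= t < end:
                parents[child].add(parent)
        best = max(best, max(len(p) for p in parents.values()))
    return best


snapshot = snapshot_max(affiliations)
lifespan = lifespan_max(affiliations)
# Constraint from the text, checked here for the maximum value only.
assert lifespan >= snapshot
print(snapshot, lifespan)
```

With these sample links each employee has one section at any instant, while employee E1 is affiliated to two sections over time, which corresponds to the Affiliation hierarchy described above.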

4.3. Temporal measures

Current multidimensional models only provide valid time support for measures. Nevertheless, as we will see in this section, providing transaction time or loading time support for measures allows expanding the analysis possibilities. We consider two situations: (1) when the time granularity of measures is the same in source systems and in a temporal data warehouse (TDW), i.e., measures are not aggregated with respect to time during the loading process, and (2) when this granularity is finer in source systems, i.e., measures are aggregated with respect to time during the loading process. The case when the time granularity of measures in source systems is coarser than in a temporal data warehouse is meaningless since detailed data cannot be obtained from aggregated data without loss of information.

4.3.1. Temporal support for non-aggregated measures

Temporal support in data warehouses depends on both the availability of temporality types in source systems and the kind of required analysis. We present next different situations that refer to these two aspects in order to show the usefulness of different types of temporal support for measures. For simplicity, we use non-temporal dimensions; the inclusion of temporal dimensions is straightforward.

Case 1. Sources: non-temporal, TDW: LT. In real-world situations, sources may be non-temporal, or temporal support may be implemented in an ad-hoc manner that can be both inefficient and difficult to obtain. Even when the sources have temporal support, their integration into the data warehouse can be too costly, e.g., for checking the time consistency between different source systems. Nevertheless, decision-making users may require the history of how source data has evolved [63]. Thus, measure values can be timestamped with loading time indicating the time when this data is loaded into the warehouse.

In the example in Fig. 10 users require the history of product inventory considering different suppliers and warehouses. The LT abbreviation next to the measures indicates that measure values will be timestamped when loaded into the temporal data warehouse.

Fig. 10. Inclusion of LT for measures.

Case 2. Sources and TDW: VT. In some situations source systems can provide valid time and this valid time is required in a temporal data warehouse. Fig. 11 gives an example used for the analysis of banking transactions. Different types of queries can be formulated for this schema. For example, we can analyze clients’ behavior related to the time between operations, a maximum or minimum withdrawal, total amount for withdrawal operations, total number of transactions during lunch hours, frequency of using a specific ATM, etc.

Case 3. Sources: TT, TDW: VT. In this case, users need to know the time when data is valid in reality while source systems can only provide the time when data was modified in a source system, i.e., transaction time. Thus, it is necessary to analyze whether transaction time can be used for approximating valid time. For example, if a measure represents clients’ account balance, the valid time for this measure can be calculated considering the transaction time of two consecutive operations.

Nevertheless, transaction time cannot always be used for calculating valid time, since data can be inserted in source systems (registering transaction time) when they are not valid in the modeled reality, e.g., recording an employee’s previous or future salary. Since in many applications only the user can determine the valid time, it is incorrect to assume that if valid time is not given, the data is considered valid when it is current in source systems [36]. The transformation from transaction time to approximate valid time must be a careful decision and the designer must make decision-making users aware of the imprecision that this may introduce.
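For the account balance example of Case 3, the derivation of an approximate valid time from the transaction times of two consecutive operations might be sketched in Python as follows. The operations list and the function approximate_valid_time are hypothetical, and the sketch assumes, as the caveat above warns may not hold, that each balance became valid exactly when it was recorded.

```python
# Hypothetical source rows: (transaction_time, balance after the operation).
operations = [(7, 250.0), (3, 100.0), (12, 180.0)]


def approximate_valid_time(ops):
    """Turn (transaction_time, value) rows into (value, start, end) states.

    Each state is assumed valid from its own transaction time up to the
    transaction time of the next operation (closed-open period). The last
    state is still current, so its end is left open (None).
    """
    ordered = sorted(ops)
    states = []
    for i, (tt, value) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else None
        states.append((value, tt, end))
    return states


for value, start, end in approximate_valid_time(operations):
    print(value, start, end)
```

Any imprecision introduced by this assumption carries over to every analysis that uses the derived periods.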

Case 4. Sources: VT, TDW: VT and LT. The most common practice is to include valid time in a temporal data warehouse. However, the addition of loading time for measures indicates since when the data has been available for the decision-making process.

Fig. 11. Inclusion of VT for measures.

The inclusion of loading time can help to better understand decisions made in the past and to adjust loading frequencies. For the example in Fig. 12, suppose that it was decided to increase the inventory of a product based on the increasing trend of its sales during weeks 10, 11 and 12. However, a sudden decrease of sales was revealed in the next data warehouse load, which occurred 8 weeks later. Thus, an additional analysis can be performed to understand the causes of these changes in sales behavior. Further, a decision to load data more frequently may be taken.

Case 5. Sources: TT, TDW: TT (LT, VT). When a data warehouse is used for traceability applications (e.g., for fraud detection), the changes to data and the time when they have occurred should be available. That is possible if the source systems include transaction time, since in this case past states of a database are kept.

The example given in Fig. 13 is used for an insurance company having as analysis focus the amount of insurance payments. If there is suspicion of an internal fraud that modifies the amount of insurance paid to clients, it is necessary to obtain the detailed information indicating when changes in measure values have occurred. Notice that additionally including loading time would indicate since when data has been available for the investigation process. Further, the inclusion of valid time would allow knowing when the payment was received by the client. In many real systems, the combination of both transaction time and valid time, i.e., bitemporal time, will be included.

Case 6. Sources: BT, TDW: BT and LT. Data in temporal data warehouses should provide a timely consistent representation of information [10]. Since some delay may occur between the time when the data is valid in reality, when it is known in the source systems, and when it is stored in the data warehouse, it is sometimes necessary to include valid time, transaction time, as well as loading time.

Fig. 14 shows an example inspired by [10] of the usefulness of having these three temporality types. In this example a salary of 100 with valid time from month 2 to 5 was stored at month 3 (TT1) in a source system. Afterwards, at month 8 (TT2) a new salary was inserted with value 200 and valid time from month 6 until now. Data was loaded into the temporal data warehouse at times LT1 and LT2 as shown in the figure. Different values of salary can be retrieved depending on which instant of time users want to analyze, e.g., the salary at month 1 is unknown, but at month 4 the value 100 is retrieved since this is the last value available in a temporal data warehouse even though a new salary is already stored in a source system. For more details and analysis, readers can refer to [10], which specifies additional conditions to ensure timely correct states during analytical processing.

Fig. 12. An example of the usefulness of having VT and LT.

Fig. 13. An example of a TDW for insurance company.

Fig. 14. An example of the usefulness of having VT, TT, and LT.

4.3.2. Temporal support for aggregated measures

As already said in Section 4.3.1, if source systems are non-temporal only loading time can be included for measures. On the other hand, even if transaction time is provided by source systems, it will not be included in a temporal data warehouse when measures are aggregated. Indeed, the purpose of having transaction time is to analyze changes that occurred to individual data, and transaction time for aggregated data will not give useful information for decision-making users. Therefore, in this section we will only consider measure aggregation with respect to valid time.

We consider different time granularities between source systems and a temporal data warehouse. We analyze how to match these time granularities and also how to aggregate measures with different granularities. Notice that loading frequencies in temporal data warehouses may be different from the time granularity used for measures, e.g., data may be stored using month as a granule while the loading is performed every quarter. We suppose that data can be kept in source systems before loading it into a temporal data warehouse.

4.3.2.1. Mapping between different time granularities. Since measures from source systems can be aggregated with respect to time before loading them into temporal data warehouses, an adequate mapping between multiple time granularities should be considered. Two mappings may be distinguished: regular and irregular [15].

In regular mappings some conversion constant exists, i.e., one granule is a partitioning of another granule, so if one granule is represented by an integer it can be converted to another one by a simple multiply or divide strategy. Typical examples are converting between minutes and hours or between days and weeks.

In irregular mappings, granules cannot be converted by a simple multiply or divide strategy, e.g., when converting between months and days, since months are composed of different numbers of days. Other examples include granularities that include gaps [5], e.g., business weeks that contain 5 days separated by a 2-day gap. Thus, mapping between different time granules must be specified explicitly. For example, Dyreson [15] requires customized functions with a detailed specification to obtain the desired conversion.
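The contrast between regular and irregular mappings can be illustrated with a brief Python sketch. The function names are hypothetical, the sketch relies on the Gregorian calendar as implemented in the Python standard library, and the business week is assumed to run from Monday to Friday.

```python
import calendar
from datetime import date

MINUTES_PER_HOUR = 60  # conversion constant of a regular mapping


def minute_to_hour(minute_index: int) -> int:
    """Regular mapping: a simple division converts minutes to hours."""
    return minute_index // MINUTES_PER_HOUR


def days_in_month(year: int, month: int) -> int:
    """Irregular mapping: the number of days depends on the month and year."""
    return calendar.monthrange(year, month)[1]


def day_to_month(day: date) -> tuple:
    """Irregular mapping specified explicitly through the calendar."""
    return (day.year, day.month)


def in_business_week(day: date) -> bool:
    """Granularity with gaps: Saturdays and Sundays belong to no granule."""
    return day.weekday() < 5


print(minute_to_hour(125))
print(days_in_month(2007, 2), days_in_month(2008, 2))
print(day_to_month(date(2008, 1, 31)))
print(in_business_week(date(2008, 3, 1)))
```

No single constant could replace days_in_month, which is why irregular mappings need an explicit specification.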

Some mappings between different granularities are not allowed in temporal databases [13,15], e.g., between weeks and months since a week can belong to two months. Nevertheless, this situation can be found in data warehouse applications, e.g., the analysis of employees’ salaries for each month where some employees receive a salary on a weekly basis. We call the mapping of such granularities forced. It requires special handling during measure aggregations, to which we refer in the next section.

4.3.2.2. Aggregation of measures with different granularities. Measure aggregation must take into account the type of measures. As already said, three different types of measures can be distinguished [29,31]. Additive measures, e.g., monthly income, can be summarized during different periods of time; for example, if the time granularity in a temporal data warehouse is quarter, three monthly incomes should be added before being loaded into a temporal data warehouse. Semi-additive measures, e.g., inventory quantities, cannot be summarized along the time dimension, although they can be summarized along other dimensions. Therefore, it is necessary to determine what kinds of functions can be applied to them, e.g., average. Finally, non-additive measures, e.g., item price, cannot be summarized along any dimension.
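A minimal Python sketch of this distinction, using invented monthly figures for one quarter, is given below. The variable names and values are hypothetical; the average is only one of the functions a designer might choose for a semi-additive measure.

```python
from statistics import mean

# Hypothetical monthly values for the three months of one quarter.
monthly_income = [100, 120, 90]      # additive measure
monthly_inventory = [40, 50, 60]     # semi-additive measure

# Additive: the three monthly incomes are added before loading.
quarter_income = sum(monthly_income)

# Semi-additive: summing inventory over time is meaningless, so another
# function, here the average, is applied along the time dimension.
quarter_inventory = mean(monthly_inventory)

print(quarter_income, quarter_inventory)
```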

In some cases the procedures for measure aggregation could be complex due to the different granularities between source systems and the data warehouse. A simplified example is given in Fig. 15 where sources have a month granularity and the data warehouse has a quarter granularity. This example includes different cases: (1) a period of time with a constant salary that overlaps several quarters (salary 20 and 40), (2) a quarter with different salaries (quarter 2), and (3) a quarter when the salary is not paid during several months (quarter 3). Suppose that a user requires the measure average salary per quarter. For the first quarter, the average value is easily calculated. For the second quarter, the simple average does not work; thus the weighted mean value may be given instead. However, for the third quarter, a user should indicate how the value must be specified. In the example, we opt for giving an undefined value. Nevertheless, if instead of using average salary we use the sum (total salary earned during a quarter), the measure value for quarter 3 can be defined.

Fig. 15. An example of coercion function for salary.

Real-world situations could be more complicated, demanding the specification of coercion functions or semantic assumptions [5,41,57], which include rules on how to calculate values attached to multiple time granularities. In the previous example in Fig. 15, we use a user-defined coercion function [41] stating that if a temporal data warehouse granule is not totally covered by the valid time of one or several salaries, the average salary is undefined. The idea of coercion functions or semantic assumptions is not new in the temporal database community. The proposed solutions are important to consider in the temporal data warehouse context since they are needed to develop aggregation procedures.
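A coercion function of the kind just described could be sketched in Python as follows. The salary history is invented for illustration and does not reproduce the exact periods of Fig. 15; the function names and the use of None for an undefined value are assumptions of this sketch.

```python
from typing import Optional

# Hypothetical salary history at month granularity: (salary, first_month,
# last_month), both bounds inclusive, months numbered from 1.
salaries = [(20, 1, 4), (40, 5, 7), (40, 9, 12)]


def months_of(quarter: int) -> range:
    """Months covered by a quarter, e.g. quarter 2 covers months 4, 5, 6."""
    return range(3 * quarter - 2, 3 * quarter + 1)


def salary_in(month: int, history) -> Optional[float]:
    """Salary valid in the given month, or None when no salary is recorded."""
    for salary, first, last in history:
        if first <= month <= last:
            return salary
    return None


def average_salary(quarter: int, history) -> Optional[float]:
    """Month-weighted mean; undefined (None) if any month is uncovered."""
    values = [salary_in(m, history) for m in months_of(quarter)]
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def total_salary(quarter: int, history) -> float:
    """Sum of the salaries paid in the quarter; uncovered months add nothing."""
    values = [salary_in(m, history) for m in months_of(quarter)]
    return sum(v for v in values if v is not None)


for q in (1, 2, 3):
    print(q, average_salary(q, salaries), total_salary(q, salaries))
```

With this data the average is a plain value for quarter 1, a weighted mean for quarter 2, and undefined for quarter 3, whereas the total remains defined for all three quarters.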

It should be noted that coercion functions are always required for forced mappings of granularities since a finer time granule can map to more than one coarser time granule, e.g., a week spanning two months. Therefore, measure values to which a finer granule is attached must be distributed. For example, suppose that a salary is paid on a weekly basis and that this measure is stored into a temporal data warehouse at a month granularity. If a week belongs to two months, a user may specify that the percentage of salary that is assigned to a month is obtained from the percentage of the week contained in the month, e.g., 2 days out of 7.
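The distribution rule of this example can be sketched in Python as shown below. The function distribute_weekly, the chosen week, and the amount are hypothetical; the sketch assumes a 7-day week and a distribution strictly proportional to the number of days falling in each month.

```python
from collections import defaultdict
from datetime import date, timedelta


def distribute_weekly(week_start: date, amount: float) -> dict:
    """Split a weekly amount over months in proportion to the days covered.

    Returns a mapping from (year, month) to the share of `amount` assigned
    to that month.
    """
    shares = defaultdict(float)
    for offset in range(7):
        day = week_start + timedelta(days=offset)
        shares[(day.year, day.month)] += amount / 7
    return dict(shares)


# The 7-day week starting on 25 February 2008 has 5 days in February
# (a leap year) and 2 days in March.
print(distribute_weekly(date(2008, 2, 25), 700.0))
```

Here March receives 2/7 of the weekly salary, which matches the "2 days out of 7" rule; other coercion functions, e.g., assigning the whole amount to the month of the payment date, would be equally legitimate user choices.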

4.4. Metamodel of the temporally extended MultiDim model

In this section, in order to provide a more general description of the temporally extended MultiDim model, we present its metamodel using the UML notation as shown in Fig. 16.

As shown in the figure, a dimension is composed of either one level or one or more hierarchies, while each hierarchy belongs to only one dimension. A hierarchy contains two or more related levels that can be shared between different hierarchies. A criterion name identifies a hierarchy and each level has a unique name. The leaf level name is used for deriving a dimension’s name: this is represented by the derived attribute Name in Dimension. Levels include attributes, some of which are key attributes used for aggregation purposes while others are descriptive attributes. Attributes have a name and a type, which must be a data type, i.e., integer, real, string, etc. Further, an attribute may be derived.

Temporal support for levels and for attributes is captured by the TempSup multivalued attribute of type Temp. The latter is a composite attribute with two components. The first one (tempType) is of an enumerated type containing the literals VT, TT, BT, LS, LTS, and LT that are used for representing different kinds of temporal support, respectively, valid time, transaction time, bitemporal time, lifespan, lifespan with transaction time, and loading time. The second component (tdType) is of an enumerated type containing different temporal data types defined in the model, i.e., instant, interval, set of instants, and set of intervals.

The levels forming hierarchies are related through the Connects association class. These relationships may also be temporal independently of whether the levels are temporal or not. This is indicated by the attribute TempSup of the Connects association class. Additionally, the relationship between two levels is characterized by snapshot and lifespan cardinalities. Both kinds of cardinalities include their minimum and maximum values expressed in the child and in the parent roles. The Connects association class also includes the distributing factor, if any. A constraint not shown in the diagram specifies that a child–parent relationship has a distributing factor only if the maximum cardinalities of the child and the parent are equal to many.

Fig. 16. Metamodel of the temporally extended MultiDim model.

A fact relationship represents an n-ary association between leaf levels with n > 1. Since leaf levels can play different roles in this association, the role name is included in the Related association class. A fact relationship may contain attributes, which are commonly called measures. They are temporal and they may be additive, semi-additive, or non-additive.

Table 1
Temporality types in the MultiDim model

      Level   Attributes   Measures                       Child–parent
                           Aggregated   Non-aggregated    relationships
LS    Yes     No           No           No                Yes
VT    No      Yes          Yes          Yes               No
TT    Yes     Yes          No           Yes               Yes
LT    Yes     Yes          Yes          Yes               Yes

A dimension is temporal if it has at least one temporal hierarchy. A hierarchy is temporal if it has at least one temporal level or one temporal relationship between levels. This is represented by the derived attribute Temporal in Dimension and in Hierarchy.

Table 1 summarizes the temporality types that are allowed in the MultiDim model.

5. Mapping to the ER and the OR models

The MultiDim model can be implemented by mapping its specifications into those of operational data models, e.g., relational, object-relational, or object-oriented models. In this paper we use a two-phase approach where a MultiDim schema is first transformed into a classical entity-relationship (ER) schema and then into an object-relational (OR) schema. We choose the ER model since it is a well-known and widely used conceptual model. As a consequence, the ER representation of the MultiDim constructs allows a better understanding of their semantics. Further, the transformation of the ER model into operational data models is well understood (e.g., [18]) and this translation can be done using usual CASE tools. Therefore, in a second step we propose mappings that allow a translation of the intermediate ER schemas into OR schemas.

We chose a mapping instead of normalization for several reasons. First, there are no well-accepted normal forms for temporal databases even though some formal approaches exist, e.g., [27,57–59]. Further, the purpose of normalization is to avoid the problems of redundancy, inconsistency, and update anomalies. However, the usual practice in data warehouses is to de-normalize relations to improve performance and to avoid the costly process of joining tables in the presence of high volumes of data. This de-normalization can be done safely because data in temporal data warehouses is integrated from operational databases, which are usually normalized, and thus, there is no danger of incurring the mentioned problems. Finally, using a normalization approach may introduce a number of artificial relations that do not correspond to real-world entities, making the system more complex for designing, implementing, and querying.

We decided to use an OR model as an example of implementation model since it extends the relational model by allowing attributes to have complex types. The OR model inherently groups related facts into a single row [12], thus allowing changes to data and the time when they have occurred to be kept together. These facilities are not provided within the relational model, which imposes on users the responsibility of knowing and maintaining the groupings of tuples representing the same real-world fact in all their interactions with the database.

5.1. Mapping of temporality types

Temporal support in the MultiDim model is added in an implicit manner, i.e., using pictograms. Therefore, the transformation of temporal support into the ER model requires additional attributes for timestamps, which are manipulated as usual attributes. The mapping also depends on whether temporal support is used for representing events or states. The former require an instant or a set of instants and the latter need a period or a set of periods.

As already said, the MultiDim model provides several temporality types: valid time, transaction time, lifespan, and loading time. Valid time and lifespan are used for indicating the validity of both events and states. When valid time is represented as a set of instants or a set of periods it allows specifying that an attribute has the same value in discontinuous time spans, e.g., an employee working in the same section during different periods of time. Similarly, representing lifespan as a set of periods allows considering discontinuous lifespans, e.g., a professor leaving for sabbatical during some period of time. For representing transaction time, the usual practice in temporal databases is to use a period or a set of periods. Since loading time indicates the time when data was loaded into a temporal data warehouse, an instant is used for representing this temporality type.

The rules for mapping the temporality types from the MultiDim model to the ER model are as follows:

Rule 1: A temporality type representing an instant is mapped to a monovalued attribute.
Rule 2: A temporality type representing a set of instants is mapped to a multivalued attribute.
Rule 3: A temporality type representing an interval is mapped to a composite attribute having two attributes indicating the begin and the end of the interval.
Rule 4: A temporality type representing a set of intervals is mapped to a multivalued composite attribute composed of two attributes indicating the begin and the end of the interval.

We use the SQL:2003 standard to specify the mapping to the OR model. SQL:2003 represents collectionsusing array and multiset types. The array type allows storing in a column variable-sized vectors of values ofthe same type while the multiset type allows storing unordered collections of values. Unlike arrays, multisetshave no declared maximum cardinality. Composite types can be combined allowing nested collections,although this is considered an ‘‘advanced feature’’ in the standard.

SQL:2003 also supports structured user-defined types, which are analogous to class declarations in object languages. Structured types may have attributes, which can be of any SQL type including other structured types at any nesting. Structured types can be used as the domain of a column of a table, as the domain of an attribute of another type, or as the domain of a table. These structured types allow grouping semantically related attributes.

Therefore, in the mapping to the OR model (1) a multivalued attribute in the ER model is represented as a multiset (or array) attribute, and (2) a composite attribute in the ER model is represented as an attribute of a structured type. In this way, an instant, a set of instants, a period, and a set of periods can be represented in SQL:2003 as follows:

create type InstantType as date;
create type InstantSetType as InstantType multiset;
create type PeriodType as (FromTime date, ToTime date);
create type PeriodSetType as PeriodType multiset;
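To make the four temporality representations concrete, the following is a minimal Python sketch of the same structures. It is only an illustration and not part of the mapping itself: the class and variable names, the closed-period semantics, and the sample dates are assumptions, and a frozenset is used as a simplification of a multiset (it does not keep duplicates).

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet

# Hypothetical Python counterparts of the four SQL:2003 types.
Instant = date                  # InstantType: a single date
InstantSet = FrozenSet[date]    # InstantSetType: unordered collection of instants


@dataclass(frozen=True)
class Period:
    """PeriodType: a composite value with a begin and an end instant."""
    from_time: date
    to_time: date

    def __post_init__(self) -> None:
        # Assumption: a period must not end before it begins.
        if self.to_time < self.from_time:
            raise ValueError("to_time precedes from_time")

    def contains(self, instant: date) -> bool:
        # Assumption: periods are closed on both ends.
        return self.from_time <= instant <= self.to_time


PeriodSet = FrozenSet[Period]   # PeriodSetType: unordered collection of periods

# Made-up data: an employee working in the same section during two separate periods.
in_section: PeriodSet = frozenset({
    Period(date(2001, 1, 1), date(2002, 6, 30)),
    Period(date(2004, 3, 1), date(2005, 12, 31)),
})
```

A set of periods is what allows the discontinuous time span: an instant in 2003 falls in neither period, while instants in 2001 or 2004 are covered.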

As an example of a commercial object-relational DBMS we use Oracle 10g [47]. Oracle includes constructs that allow representing collections. A varying array stores an ordered set of elements in a single row, while a table type allows having unordered sets and creating nested tables, i.e., a table within a table. The former corresponds to the array type of SQL:2003 and the latter to the multiset type. Further, Oracle provides object types that are similar to structured user-defined types in SQL:2003. Thus, the above-specified declarations using SQL:2003 can be expressed in Oracle 10g as follows:

create type InstantType as object (Instant date);
create type InstantSetType as table of InstantType;
create type PeriodType as object (FromTime date, ToTime date);
create type PeriodSetType as table of PeriodType;

5.2. Mapping of temporal levels

5.2.1. Temporal attributes of a level

The transformation of the Product level from Fig. 4 (not considering for now lifespan support) to the ER model is shown in Fig. 17a and it is done according to the following rules:

Rule 5: A level corresponds to an entity type in the ER model.

Rule 6: A non-temporal attribute is represented in the ER model as a monovalued attribute.

Rule 7: A temporal attribute is represented in the ER model as a multivalued composite attribute, including an attribute for the value and an attribute for each associated temporality type.

Notice that the multivalued attribute included in the last rule above allows keeping different values of the attribute at different periods of time. In the example of Fig. 17a, the validity of attribute values is represented by a period, which is a typical practice for dimension data in temporal data warehouses [7,17].

As can be seen by comparing the Product level in Figs. 4 and 17a, the MultiDim model provides a better conceptual representation of time-varying attributes than the ER model. It contains fewer elements, it allows clearly distinguishing which data changes should be kept, and it leaves outside of users’ concern technical aspects such as multivalued or composite attributes.

Fig. 17. A level with temporal attributes: (a) the ER schema and (b) the OR representation.

Applying to the Product level in Fig. 17a the traditional mapping to the relational model (e.g., [18]) gives three tables: one with all monovalued attributes and one for each multivalued attribute. All tables include the key attribute of Product. This relational representation is not very intuitive since attributes of a level are stored as separate tables. It also has well-known performance problems due to the required join operations, especially if levels belong to hierarchies.
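The three-table relational mapping can be tried out with a small sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: the table names ProductSize and ProductDistributor, the column names, the sample rows, and the use of ISO date strings with a far-future end date for "still valid" are all hypothetical and are not taken from Fig. 17.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Monovalued attributes of the level.
    create table Product (Number integer primary key, Name text, Description text);
    -- One table per temporal (multivalued) attribute, each repeating the key.
    create table ProductSize (Number integer references Product, Value real,
                              FromTime text, ToTime text);
    create table ProductDistributor (Number integer references Product, Value text,
                                     FromTime text, ToTime text);
    -- Made-up sample data.
    insert into Product values (1, 'Milk', 'Whole milk');
    insert into ProductSize values (1, 1.0, '2005-01-01', '2005-12-31');
    insert into ProductSize values (1, 1.5, '2006-01-01', '9999-12-31');
    insert into ProductDistributor values (1, 'Acme', '2005-01-01', '9999-12-31');
""")

# Reassembling the level as of a given day already needs two joins.
as_of = "2006-06-15"
row = conn.execute("""
    select p.Name, s.Value, d.Value
    from Product p
    join ProductSize s on s.Number = p.Number
    join ProductDistributor d on d.Number = p.Number
    where ? between s.FromTime and s.ToTime
      and ? between d.FromTime and d.ToTime
""", (as_of, as_of)).fetchone()
```

Every additional temporal attribute adds one more table and one more join to such a query, which is the cost referred to above.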

An OR representation allows overcoming these drawbacks, keeping together in a single table a level and its temporal attributes. Fig. 17b shows the corresponding OR schema using a tabular representation. In the figure we use the symbol * for denoting collections. For simplicity we do not include in the figure the Distributor attribute, which can be mapped similarly to the Size attribute. The OR representation corresponds to a temporally grouped data model [12], which is considered as more expressive for modeling complex data [59].

The temporal attribute in the OR model is represented as a multiset attribute of a structured type composed of two attributes: one for representing the value and another one for the associated temporality type. For example, given the declarations for the temporality types in Section 5.1, the types for the Size attribute are defined as follows:

create type SizeType as (Value real, VT PeriodType);
create type SizeCollType as SizeType multiset;

Since Size is a multivalued attribute, we represent it above as a multiset, but an array could also be used instead.
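As a rough illustration of how such a temporally grouped attribute can be used, the Python sketch below keeps the whole history of Size with the product and looks up the value valid at a given instant. The names SizeEntry and size_at, the closed-period semantics, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SizeEntry:
    """Counterpart of SizeType: a value plus its valid-time period."""
    value: float
    vt_from: date
    vt_to: date


def size_at(sizes: List[SizeEntry], instant: date) -> Optional[float]:
    """Return the size valid at `instant`, or None if no entry covers it."""
    for entry in sizes:
        # Assumption: valid-time periods are closed on both ends.
        if entry.vt_from <= instant <= entry.vt_to:
            return entry.value
    return None


# Counterpart of SizeCollType: the collection of all recorded sizes (made-up data).
sizes = [
    SizeEntry(1.0, date(2005, 1, 1), date(2005, 12, 31)),
    SizeEntry(1.5, date(2006, 1, 1), date(2006, 12, 31)),
]
```

Here `size_at(sizes, date(2005, 6, 1))` yields 1.0, and an instant outside every recorded period yields None.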

As a level corresponds to an entity type in the ER model (Rule 5), it is represented in the OR model as a table containing all its attributes and an additional attribute for its key. Two kinds of tables can be defined in SQL:2003. Relational tables are usual tables, although the domains for attributes are all predefined or user-defined types. Typed tables are tables that use structured types for their definition. In addition, typed tables contain a self-referencing column keeping the value that uniquely identifies each row. Such a column may be the primary key of the table, it could be derived from one or more attributes, or it could be a column whose values are automatically generated by the DBMS, i.e., surrogates.

Surrogates are important in data warehouses since they ensure both better performance during join operations and independence from transactional systems. Further, surrogates do not vary over time, so two entities having identical surrogates represent the same entity, thus allowing historical data to be included in an unambiguous way. For this reason, we use a typed table for representing the Product level (see footnote 3). Notice that the declaration of a typed table requires the previous definition of a type for the elements of the table:

create type ProductType as (Number integer, Name character varying(25),
    Description character varying(255), Size SizeCollType)
    ref is system generated;

create table Product of ProductType (constraint prodPK primary key (Number),
    ref is Sid system generated);

The clause ref is Sid system generated indicates that Sid is a surrogate attribute automatically generated by the system.

To define the Product table in Oracle 10g we use the specification of temporality types from Section 5.1 as follows:

create type SizeType as object (Value number, VT PeriodType);
create type SizeTabType as table of SizeType;
create type ProductType as object (Number number(10), Name varchar2(25),
    Description varchar2(255), Size SizeTabType);
create table Product of ProductType (constraint prodPK primary key (Number))
    nested table Size store as SizeNT, object identifier is system generated;

The definitions in Oracle slightly differ from those in SQL:2003. As specified in Section 5.1 two different types of collections can be used, i.e., varying array and table types. Further, typed tables in SQL:2003 correspond to object tables in Oracle and they either include automatically generated surrogates (as in the previous example, the default option) or alternatively can use the primary key for that purpose.

However, it is important to consider the differences at the physical level between the two options that Oracle provides for representing collections. Varying arrays are in general stored “inline” in a row (see footnote 4) and cannot be indexed. On the other hand, rows in a nested table can have identifiers, can be indexed, and are not necessarily brought to memory when accessing the main table (if the field defined as a nested table is not specified in the query). Nested tables require their physical locations to be specified. For example, as shown in the previous declaration, the physical location for the nested table Size must be explicitly defined when the Product table is created. Therefore, the choice between nested tables and varying arrays must be made according to application specificities, e.g., which data is accessed, which operations are required, as well as taking into account performance issues.

5.2.2. Level lifespan

The ER representation for a level that includes lifespan support, e.g., the Product level in Fig. 3, is shown in Fig. 18a. Recall that in Section 5.1 we explained the mapping of the lifespan temporality type to the ER model. In the example we represented the lifespan by a set of periods, thus allowing products to have discontinuous lifespans.

Rule 8: The lifespan of a level is represented in the ER schema by an additional attribute mapped according to Rules 1, 2, 3 or 4. If the lifespan is combined with transaction time and/or loading time, for each of these temporality types an additional attribute should be included.

Fig. 18b shows the OR representation where the surrogate attribute, the lifespan, and all the level’s attributes are kept together in the same table. On the contrary, representing the lifespan in the relational model would introduce an additional table with a foreign key referring to the surrogate attribute and with two attributes FromTime and ToTime representing the begin and the end instants of the lifespan.

Fig. 18. A level with lifespan: (a) the ER schema and (b) the OR representation.

3 For simplicity, in the examples we omit full specification of constraints and additional clauses required by the SQL:2003 standard.

4 If they are less than 4000 bytes or not explicitly specified as a large object (LOB).

The SQL:2003 declaration for the Product type includes an additional attribute representing the temporal element of the lifespan as follows:

create type ProductType as (LS PeriodSetType, Number integer,
    Name character varying(25), Description character varying(255),
    Sizes SizeCollType) ref is system generated;
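A lifespan stored as a set of periods can be explored with the following Python sketch, which tests whether a member exists at a given instant and lists the gaps of a discontinuous lifespan. The function names, the day granularity, the closed-period semantics, and the sample lifespan are assumptions for illustration only.

```python
from datetime import date, timedelta
from typing import List, Tuple

# (FromTime, ToTime); both ends inclusive by assumption.
Period = Tuple[date, date]


def in_lifespan(lifespan: List[Period], instant: date) -> bool:
    """True if `instant` falls inside one of the lifespan periods."""
    return any(start <= instant <= end for start, end in lifespan)


def gaps(lifespan: List[Period]) -> List[Period]:
    """Return the periods between consecutive lifespan periods.

    Assumes day granularity and non-overlapping periods.
    """
    ordered = sorted(lifespan)
    one_day = timedelta(days=1)
    result = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + one_day:
            result.append((prev_end + one_day, next_start - one_day))
    return result


# Made-up lifespan of a product withdrawn during 2004 and reintroduced in 2005.
product_ls = [
    (date(2003, 1, 1), date(2003, 12, 31)),
    (date(2005, 1, 1), date(2006, 6, 30)),
]
```

For this sample, `gaps(product_ls)` returns the single period covering the year 2004.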

5.3. Mapping of child–parent relationships

In the MultiDim model child–parent relationships can be temporal or not, and they may relate levels that are temporal or non-temporal. In Section 5.2 we already discussed the mapping procedures for temporal levels; non-temporal levels are mapped in a similar way, ignoring all aspects related to temporal support [33]. In this section we present the mapping of child–parent relationships considering two cases: when these relationships are temporal or not.

5.3.1. Mapping of non-temporal relationships

The transformation of the non-temporal relationship between Product and Category in Fig. 4 (see footnote 5) to the ER model is shown in Fig. 19a. This transformation is based on the following rule:

Rule 9: Non-temporal relationships are represented as usual binary relationships in the ER model.

For obtaining the corresponding OR schema, we first represent each level as explained in Section 5.2. Then, we use the traditional mapping for binary many-to-one relationships and include a reference to the parent level (i.e., Category) in the child level table (i.e., Product). Since the levels are identified by surrogates, which are time-invariant, this mapping does not depend on whether the levels are temporal or not. For example, the mapping of the Product level and the Product–Category relationship gives the relation shown in Fig. 19b.

To define the Product table in SQL:2003, we first need to create a typed table Category with the surrogate in the Sid attribute. This is shown next:

create type CategoryType as (Name character varying(25), . . .)
    ref is system generated;

create table Category of CategoryType (/* additional constraints */,
    ref is Sid system generated);

create type ProductType as (Number integer, . . ., Sizes SizeCollType,
    CategoryRef ref(CategoryType) scope Category references are checked)
    ref is system generated;

create table Product of ProductType (constraint prodPK primary key (Number),
    ref is Sid system generated);

Fig. 19. A hierarchy with a non-temporal relationship: (a) the ER schema and (b) the OR representation.

5 For simplicity we only show one temporal attribute.

Notice the CategoryRef attribute in ProductType, which is a reference type pointing to the Category table. This attribute can be used for the roll-up operation from the Product to the Category level. However, there is no such direct access in the opposite direction for the drill-down operation, i.e., from the Category to the Product level. If the application requires such a link, it can be represented in an additional attribute of the CategoryType as follows: ProductRef ref(ProductType) multiset. It allows representing the set of product surrogates that belong to a specific category.
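The asymmetry between roll-up and drill-down can be illustrated with a short Python sketch in which each product holds the surrogate of its category and the inverse link is derived as a collection of product surrogates per category. The class layout, the function name drill_down_index, and the sample members are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Category:
    sid: int            # surrogate
    name: str


@dataclass
class Product:
    sid: int            # surrogate
    number: int
    category_ref: int   # surrogate of the parent Category (roll-up link)


def drill_down_index(products: List[Product]) -> Dict[int, List[int]]:
    """Build the inverse link: category surrogate -> product surrogates."""
    index = defaultdict(list)
    for product in products:
        index[product.category_ref].append(product.sid)
    return dict(index)


# Made-up members.
categories = {100: Category(100, "Dairy")}
products = [Product(1, 501, 100), Product(2, 502, 100)]

# Roll-up follows the stored reference directly.
parent = categories[products[0].category_ref]
# Drill-down needs the inverse collection, playing the role of ProductRef.
index = drill_down_index(products)
```

Storing the inverse collection in the parent avoids scanning all child members at query time, at the price of maintaining both directions of the link.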

The Oracle declarations for representing the Product groups hierarchy are very similar to those of SQL:2003:

create type CategoryType as object (Name varchar(25), . . .);
create table Category of CategoryType (/* additional constraints */)
    object identifier is system generated;
create type ProductType as object (Number number(10), . . ., Sizes SizeTabType,
    CategoryRef ref(CategoryType));
create table Product of ProductType (constraint prodPK primary key (Number),
    constraint prodFK foreign key CategoryRef references Category)
    nested table Sizes store as SizeNT, object identifier is system generated;

5.3.2. Mapping of temporal relationships

Temporal relationships can link either non-temporal levels, as in Figs. 6 and 9, or temporal levels, as in Fig. 8. Further, the maximum cardinality of the child level can be equal to 1 or to n. Notice that in order to have meaningful hierarchies, we suppose that the maximum cardinality of the parent level is always equal to n.

Fig. 20. Different cardinalities in temporal relationships linking non-temporal levels.

Fig. 21. Mapping of the schema from Fig. 20a into the ER model.

Fig. 20a shows a schema where the maximum child cardinality is equal to 1. In the figure the snapshot and lifespan cardinalities are the same (i.e., many-to-one), allowing a child member to belong to at most one parent member during its entire lifespan; they indicate that an employee may work only in one section and if he returns after a leave, he must be assigned to the same section.

There are two different cases when the maximum child cardinality is equal to n. In Fig. 20b the snapshot and lifespan cardinalities are different, i.e., a child member is related to one parent member at every time instant and to many parent members over its lifespan. On the other hand, in Fig. 20c the snapshot and lifespan cardinalities are the same, i.e., a child member is related to many parent members at every time instant (see footnote 6). Both cases are mapped in the same way since, for the first case, where the cardinalities are different, we must keep the different links according to the lifespan cardinality. The snapshot cardinality is then represented as a constraint stating that among all links of a member only one is current.
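The snapshot constraint for the case of Fig. 20b can be sketched in Python as a check that no two links of the same child member to different parents overlap in time. The function name, the tuple layout, the closed-period semantics, and the sample links are assumptions and not part of the model definition.

```python
from datetime import date
from itertools import combinations
from typing import List, Tuple

# (parent surrogate, FromTime, ToTime); closed periods are an assumption.
Link = Tuple[int, date, date]


def satisfies_snapshot_one(links: List[Link]) -> bool:
    """True if the child is linked to at most one parent at every instant."""
    for (p1, s1, e1), (p2, s2, e2) in combinations(links, 2):
        # Two closed periods overlap when each starts before the other ends.
        if p1 != p2 and s1 <= e2 and s2 <= e1:
            return False
    return True


# Made-up links of one employee: sections 10 and 20 over disjoint periods.
valid_links = [
    (10, date(2001, 1, 1), date(2002, 12, 31)),
    (20, date(2003, 1, 1), date(2004, 12, 31)),
]
# The same links with the first period extended into the second one.
invalid_links = [
    (10, date(2001, 1, 1), date(2003, 6, 30)),
    (20, date(2003, 1, 1), date(2004, 12, 31)),
]
```

The first collection respects a snapshot cardinality of one while still recording many parents over the lifespan; the second violates it during the first half of 2003.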

The following rule is used for mapping temporal relationships.

Rule 10: Temporal relationships are mapped into the ER model as usual binary relationships taking into account their lifespan cardinalities. This relationship should include an additional attribute for each of its associated temporality types, mapped according to the explanation given in Section 5.1.

An example of applying this rule to the schema in Fig. 20a is shown in Fig. 21. The valid time of the relationship is represented as a multivalued composite attribute since an employee can be hired several times in the same section, i.e., it corresponds to a set of periods. The mapping to the ER model of the schemas in Fig. 20b and c is straightforward. They can be represented in the same way as in Fig. 21 except that the cardinalities of the relationship are both (1, n).

6 Called in [33] non-strict hierarchies.

Fig. 22. OR mapping of temporal links with different cardinalities: (a) and (b) many-to-one cardinalities and (c) many-to-many cardinalities.

There are several options for representing temporal relationships in the OR model. The first option consists in creating a separate table for the temporal relationship; this is shown in Fig. 22a for the ER schema in Fig. 21. The Work table is composed of the surrogates of the Employee and the Section levels, as well as the temporality of the relationship. For defining the Work table in SQL:2003, the Employee and Section tables must have been previously declared as typed tables. We do not use a typed table for representing the Work relationship since the relationship does not exist without its related levels.

create type WorkType as (SectionRef ref(SectionType) scope Section
    references are checked, LS PeriodSetType);

create table Work (EmployeeRef ref(EmployeeType) scope Employee
    references are checked, InSection WorkType);

Notice that when the cardinalities between the child and the parent levels are many-to-many (Figs. 20b and c), it suffices to define the InSection attribute above as multivalued (i.e., of type WorkType multiset), since an employee can work in many sections over his lifespan.

The second option consists in representing a temporal relationship as a composite attribute in one of the level tables; this is shown in Fig. 22b for the relationship in Fig. 21. In the figure, the Employee table includes the InSection attribute keeping the Section surrogates with its associated temporal span. The corresponding SQL:2003 declaration is as follows:

create type EmployeeType as (EmplID integer, . . ., InSection WorkType);

If the cardinalities between the Employee and Section levels are many-to-many (Fig. 20b and c), the InSection attribute should be defined as WorkType multiset, allowing an employee to work in many sections; this is shown in Fig. 22c.

Similar to the mapping of non-temporal relationships (Section 5.3.1), in order to facilitate the access to members of the Employee level during the drill-down operations from the Section level, we could include in the Section table an additional multivalued attribute that keeps for a section member the references to the related employee members and their temporal characteristics.

The second solution above, i.e., the inclusion of an additional attribute in a level table, allows keeping all attributes and relationships of a level in a single table. This table thus expresses the whole semantics of a level. However, the choice among the alternative OR representations may depend on query performance requirements and physical-level considerations for the particular DBMS, such as join algorithms, indexing capabilities, etc. For example, defining the InSection attribute as a nested table in Oracle 10g will require a join of two tables, thus not offering any advantage with respect to the solution of a separate table for the Work relationship.

The ER and the OR representations for temporal levels linked with temporal relationships are the same as the ones described for non-temporal levels and temporal relationships since the surrogates of child and parent members are time-invariant. An example of a MultiDim schema and its OR representation corresponding to this case is given in Fig. 23.

5.4. Mapping of fact relationships with temporal measures

Temporal measures may represent either events or states. In this section we only refer to measures whose valid time is represented as an instant with granularity month. Nevertheless, the results may be straightforwardly generalized for other granularities or if valid time is represented by a period.

Fig. 23. Temporal relationship between temporal levels: (a) a MultiDim schema and (b) corresponding OR representation.

The following rule is used for mapping fact relationships.

Rule 11: A fact relationship corresponds to an n-ary relationship in the ER model. Measures of the relationship are mapped to the ER model in the same way as temporal attributes of a level (Rule 7).

Fig. 24a shows the mapping to the ER model of the fact relationship with temporal measures from Fig. 3. Mapping this fact relationship to the relational model in first normal form (1NF) gives two tables. In Fig. 24b we only show the table for the Amount measure since the other table for the Quantity measure has a similar structure. However, this schema can be simplified if additional information is available. For example, if all measures are temporally correlated they can be represented in one table and tuple timestamping can be applied. In our example this means adding the Quantity attribute to the table in Fig. 24b.
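The simplification for temporally correlated measures can be sketched in Python as merging two per-measure tables into a single table with one timestamp per tuple. The key layout (product surrogate, store surrogate, valid-time month), the function name merge_correlated, and the sample values are hypothetical; the actual participating levels are those of Fig. 3.

```python
from typing import Dict, Tuple

# Hypothetical key: (product surrogate, store surrogate, valid-time month).
Key = Tuple[int, int, str]


def merge_correlated(amount: Dict[Key, float],
                     quantity: Dict[Key, int]) -> Dict[Key, Tuple[float, int]]:
    """Merge two measure tables into one tuple-timestamped table.

    Measures are treated as temporally correlated only when both tables
    carry exactly the same keys, timestamps included.
    """
    if amount.keys() != quantity.keys():
        raise ValueError("measures are not temporally correlated")
    return {key: (amount[key], quantity[key]) for key in amount}


# Made-up measure values.
amount = {(1, 10, "2006-01"): 250.0, (1, 10, "2006-02"): 300.0}
quantity = {(1, 10, "2006-01"): 5, (1, 10, "2006-02"): 6}
merged = merge_correlated(amount, quantity)
```

If one measure changes at an instant when the other does not, the key sets differ and the single-table form would have to repeat or leave out values, which is why the merge is refused in that case.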

The OR model also creates a separate table that includes as attributes the references to the surrogate keys of the participating levels. In addition, every measure is mapped into a new attribute in the same way as was done for the temporal attribute of a level. An example of the tabular OR representation is given in Fig. 24c.

However, even though the OR model allows for representing the changes in measure values for the same combination of foreign keys, in practice it may not be well suited for aggregations with respect to time. The objects created for measures contain a two-level nesting: one for representing different measure values for the same combination of foreign keys and another for representing temporal elements. Therefore, it is difficult to express aggregation statements related to time when accessing the second-level nesting.
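The effect of the two-level nesting on time-based aggregation can be seen in the Python sketch below, where a total per month requires unnesting both levels before any grouping can take place. The dictionary layout, the key names, and the sample values are assumptions; the sketch also assumes that a value applies to every month listed in its temporal element.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up fact rows: foreign keys plus a nested collection of measure values,
# each with a nested temporal element (a list of months).
facts: List[dict] = [
    {"product": 1, "store": 10, "amount": [
        {"value": 100.0, "vt": ["2006-01"]},
        {"value": 120.0, "vt": ["2006-02", "2006-03"]},
    ]},
    {"product": 2, "store": 10, "amount": [
        {"value": 50.0, "vt": ["2006-01", "2006-02"]},
    ]},
]


def total_by_month(rows: List[dict]) -> Dict[str, float]:
    """Sum the Amount measure per month by unnesting both levels."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        for entry in row["amount"]:        # first level: measure values
            for month in entry["vt"]:      # second level: temporal element
                totals[month] += entry["value"]
    return dict(totals)


totals = total_by_month(facts)
```

In a flat relational table with one row per combination of keys and month, the same total is a single group-by on the month column.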

For choosing the OR representation, it is important to consider physical-level features of the particular OR DBMS. For example, when using nested varying arrays in Oracle 10g, timestamps cannot be indexed and comparisons for valid time must be done by programming. On the other hand, if two nested tables are used, indexing and comparisons are allowed, improving query formulation and execution. However, for accessing a measure and its corresponding valid time two nested tables must be joined in addition to a join with the main table containing the foreign keys. Therefore, depending on the specific features of the target OR DBMS, the relational representation may be more adequate in order to represent in a more “balanced” manner all attributes that may be used for aggregation purposes.

Fig. 24. Temporal measures: (a) ER representation, (b) relational representation for the Amount measure, and (c) OR representation.

6. Related work

The necessity to manage time-varying data in databases has been acknowledged for several decades, e.g., [19,28]. However, no such consensus has been reached for representing time-varying multidimensional data. Works related to temporal data warehouses raise many issues, for example, the inclusion of temporality types in temporal data warehouses (e.g., [1,10]), temporal querying of multidimensional data (e.g., [40,49]), correct aggregation in presence of data and structural changes (e.g., [17,24,40]), temporal view materialization from non-temporal sources (e.g., [63]), evolution of a multidimensional structure (e.g., [9,17,40]), or implementation considerations for a temporal star schema (e.g., [7]). Nevertheless, very little attention has been drawn to conceptual modeling for temporal data warehouses and its subsequent logical mapping.

The works reviewed next fall into four categories. First, the works describing different temporality types that may be included in temporal data warehouses; second, proposals of conceptual models for temporal data warehouses; third, works referring to a logical-level representation; and finally, works related to different granularities.

6.1. Types of temporal support

The inclusion of different temporality types in temporal data warehouses is briefly mentioned in several works. While most of them consider valid time [4,7,9,11,16,17,36,46,50,63], they do not distinguish between lifespan and valid time support. As we saw in Section 4.2, this distinction is important since it leads to different constraints when levels form hierarchies.

With respect to transaction time several approaches are taken. Some approaches ignore transaction time [9,39]. As we already mentioned, the lack of transaction time precludes traceability applications. In [49] the possibility of including transaction time is briefly mentioned, but there is no analysis of the usefulness of having this temporality type in the data warehouse context. Other approaches transform transaction time from source systems to represent valid time [1,36]. This is semantically incorrect because data may be included in databases independently of their period of validity, for example, adding data about future employees. Finally, other approaches consider transaction time generated in a temporal data warehouse in the same way as transaction time is used in temporal databases [30,36,52], i.e., making it possible to know when data was inserted, updated, or deleted from data warehouses. Since data in temporal data warehouses is neither modified nor deleted, transaction time generated in a temporal data warehouse represents indeed the time when data was loaded into a data warehouse.

Our proposal differs from the works mentioned above in several respects: (1) we distinguish lifespan support for levels and valid time support for attributes and measures, (2) we include valid time, lifespan, and transaction time support coming from source systems (if available), and (3) we include a new temporality type, i.e., loading time, that is generated in a temporal data warehouse. We showed by means of examples the usefulness of these temporality types. To the best of our knowledge, only [10] discusses the inclusion of valid time, transaction time, and loading time. However, unlike our approach, they limit the use of these temporality types for active data warehouses and do not provide a conceptual model that includes these types.

6.2. Conceptual modeling and data manipulation

Several authors provide solutions for handling changes in multidimensional models [9,61]. These solutions can fit into two groups [9]: schema evolution [6,23,24,61] and historical models [8,9,11,16,17,20,39,49]. In the former approach, a unique data warehouse schema is maintained and data is mapped to the most recent schema version. The authors usually propose a set of operations that allow schema (and instance) modifications. However, since only the last version of the schema is included, the history of data evolution is lost.

Historical models keep track of the evolution of schema and/or instances [9], allowing the coexistence of different schema and/or instance versions. In general, the proposed models include temporal support for levels and links between them. Very few models only timestamp instances. For example, [11] timestamps hierarchical assignments between level instances. At the implementation level, the assignments are represented as a matrix whose rows and columns correspond to the level instances and cells store the validity times of hierarchical assignments between level instances. A similar approach that timestamps level instances and their hierarchical assignments is presented in [16]. However, in their matrix the rows and columns represent the old and new version of instances while cells include the value that is used for transforming measures from one version to another one.

Most models are able to represent changes at the schema level; the changes at the instance level are managed as a consequence of schema modifications. These models mainly focus on the problem of data aggregation in the presence of different schema/instance versions.

Different mechanisms are used for creating a new schema version. For example, [51] defines a multiversion multidimensional model that consists of a set of star versions. Each star version is associated to a temporal interval that includes one version for a fact and one version for each dimension associated to a fact. Whatever changes occur at the schema level (to dimensions or facts), a new star version is created. A similar approach is taken by [61] where each change at the schema or instance levels results in the creation of a new schema version, even though it remains the same as the original one in the case of changes at the instance level. On the other hand, in the approach of [20,21] a schema is first transformed into a graph of functional dependencies and then, a new schema is created using schema modification operators.

Several models refer to the aggregation of measures in the presence of time-varying dimensional data, i.e., when the issued queries refer to data included in several schema versions. Some works require transformation functions between different schema/instance versions. For example, [20,21] created the so-called augmented schema that contains elements in which the old and new schema versions differ. Then, users should specify transformation actions between different versions. A set of mapping functions is also required in the solutions proposed by [8,9,16,17,50,51]. The model of [39,40] extends the approach of [23,24] by including timestamps for schema and instance versions. It also defines a specific language, called TOLAP (Temporal OLAP), that allows users to aggregate measures according to the dimension structures when the corresponding measures were introduced. Another query language for temporal data is presented in [61], which integrates several previous works of [4,45,46]. This language first decomposes a query into partial queries executed on a unique version, and then presents these partial results to the users together with version and metadata information. In the second phase, the integration of partial results (if possible) into a common set of data is realized.

Other works relate to specific problems, such as maintaining changes for measures representing late registration events (i.e., events needing some confirmation to be valid) [52]. Even though the authors consider valid time and transaction time, they include them only for measures. Therefore, changes to dimension data are not considered.

Summarizing, we can say that the above-mentioned models formally describe the temporal support for multidimensional models, allowing the expression of changes in dimension members, hierarchy links, and in fact relationships. Further, several works provide a query mechanism for the specific models proposed by the authors. However, none of the models proposes a graphical representation based on a multidimensional view of temporal data that can be used for communication between users and designers. Further, they do not consider different aspects as proposed in this paper, for example, a hierarchy that may have temporal and non-temporal levels linked with either temporal or non-temporal relationships, the inclusion of different temporality types for measures, and the problem of different time granularities between source systems and temporal data warehouses. In addition, these models do not provide an associated logical representation.

With respect to query and aggregation mechanisms, all the proposed solutions require customized implementations. In contrast, we showed that temporal data warehouses can be implemented in current DBMSs. In this way we provide a more general approach that does not require specific software for manipulating multidimensional data that vary over time. Since at the moment we do not consider schema versioning, our approach is closer to the solutions proposed for temporal databases, in particular for temporal aggregations [44,55,56]. However, it is possible to extend our model with schema versioning. This extension can be done in a way similar to that proposed by [4,45,46,61]: after performing temporal aggregation for each version of the schema/instances (corresponding to temporal aggregations based on instant grouping [55]), users must decide whether the results can be integrated.
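To make the notion of temporal aggregation based on instant grouping more concrete, the following minimal sketch sums a measure over the facts valid at each instant and reports the result as constant intervals. The `Fact` layout, the integer encoding of instants, and the half-open interval convention are assumptions made for illustration; they are not taken from [55] or from the implementation discussed in this paper.

```python
from collections import namedtuple

# Hypothetical layout: a measure value valid over the half-open
# interval [start, end), with instants encoded as integers.
Fact = namedtuple("Fact", ["value", "start", "end"])

def aggregate_by_instant(facts):
    """Sum the values valid at each instant and return the result as
    maximal constant intervals [(start, end, total), ...]."""
    # The set of valid facts can only change at an interval boundary.
    boundaries = sorted({f.start for f in facts} | {f.end for f in facts})
    result = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        valid = [f.value for f in facts if f.start <= lo and hi <= f.end]
        if not valid:
            continue  # no fact is valid during [lo, hi)
        total = sum(valid)
        # Coalesce with the previous interval when adjacent and equal.
        if result and result[-1][1] == lo and result[-1][2] == total:
            result[-1] = (result[-1][0], hi, total)
        else:
            result.append((lo, hi, total))
    return result

facts = [Fact(10, 1, 5), Fact(20, 3, 8)]
print(aggregate_by_instant(facts))
```

Under schema versioning, a function of this kind would be applied once per version, leaving the decision about integrating the per-version results to the user.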

6.3. Logical representation

Regarding logical representation for temporal data warehouses, [7] introduce a temporal star schema that differs from the classical one by the fact that the time dimension does not exist; instead the rows in all tables of the schema are timestamped. The authors compare this model with the classical star schema taking into account database size and performance. They conclude that the temporal star schema facilitates expressing and executing queries, it is smaller in size, and it does not keep redundant information.
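As a rough illustration of a schema in which every row carries its own timestamp and no time dimension exists, the sketch below computes sales per category as of a given instant. The table names, columns, and integer instants are hypothetical and are not the schema defined in [7].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductRow:       # hypothetical dimension table row
    product_id: int
    category: str
    start: int           # row valid during [start, end)
    end: int

@dataclass(frozen=True)
class SalesRow:         # hypothetical fact table row
    product_id: int
    amount: float
    start: int
    end: int

def sales_by_category_at(instant, products, sales):
    """Total sales amount per category, using only rows valid at `instant`."""
    category_of = {p.product_id: p.category
                   for p in products if p.start <= instant < p.end}
    totals = {}
    for s in sales:
        if s.start <= instant < s.end and s.product_id in category_of:
            cat = category_of[s.product_id]
            totals[cat] = totals.get(cat, 0.0) + s.amount
    return totals

products = [ProductRow(1, "A", 0, 10), ProductRow(1, "B", 10, 20)]
sales = [SalesRow(1, 5.0, 0, 20)]
print(sales_by_category_at(5, products, sales))
print(sales_by_category_at(12, products, sales))
```

The same fact row is attributed to a different category depending on the instant chosen, because the validity of the dimension row is checked together with that of the fact row.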

Two logical representations are proposed in [24,40] for their conceptual model, called fixed and non-fixed schemas, which are similar to snowflake and star schemas, respectively, with the difference that each row is timestamped. Since their approach also covers schema versioning, additional tables that contain information about schemas and their validity are required. On the other hand, [9] implements their model using a star schema, changing some model characteristics and incurring data repetition for those instances that do not change in different versions.

Given the lack of a satisfactory solution for a logical representation of temporal data warehouses, we briefly review logical models for temporal databases with the goal of adapting some of these ideas to the logical representation of temporal data warehouses.

One approach for logical-level design of temporal databases is to use normalization. Temporal functional dependencies have been defined, e.g., in [27,57–59]. Most of these approaches rely on the first normal form (1NF). However, the non-first normal form (NF2), e.g., [3], was proposed for solving the well-known limitations of the first normal form for modeling complex data. The NF2 allows structured domains, collection domains, and relation-valued domains, and these are also included in the SQL:2003 standard under the name of object-relational model [37,38]. In addition, leading DBMS vendors (for example, Oracle, IBM DB2) have also included object-relational features.

[12] distinguish temporally grouped and temporally ungrouped historical data models. The former correspond to attribute-timestamping models using complex domains in NF2, the latter to tuple-timestamping models represented in 1NF. Although these two approaches model the same information, they are not equivalent: while a grouped relation can be ungrouped, for an ungrouped relation there is no unique grouped relation. [59] considers that the approach given in [12] has difficulties in managing time-varying data due to the absence of an explicit group identifier. Further, [12,59] consider that temporally grouped models are more expressive. Based on this conclusion, we have chosen a mapping to object-relational databases, which are able to represent a temporally grouped model with an explicit group identifier (i.e., a surrogate).
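The difference between the two representations can be sketched as follows. The nested dictionary stands in for an attribute-timestamped (grouped) object, and `ungroup` flattens it into tuple-timestamped rows. The attribute names, the surrogate value `s1`, and the integer instants are hypothetical, chosen only for illustration.

```python
# Temporally grouped (attribute-timestamped) representation: one object per
# surrogate, each attribute holding its own history of (value, start, end).
grouped = {
    "s1": {
        "name":   [("Store A", 0, 10)],
        "region": [("North", 0, 4), ("South", 4, 10)],
    }
}

def ungroup(grouped):
    """Flatten to tuple-timestamped rows, one row per constant interval."""
    rows = []
    for surrogate, attrs in grouped.items():
        # Collect every instant at which some attribute value may change.
        points = sorted({t for hist in attrs.values()
                         for (_, s, e) in hist for t in (s, e)})
        for lo, hi in zip(points, points[1:]):
            values = {}
            for attr, hist in attrs.items():
                for value, s, e in hist:
                    if s <= lo and hi <= e:
                        values[attr] = value
            # Emit a row only when every attribute has a value in [lo, hi).
            if len(values) == len(attrs):
                rows.append({"id": surrogate, **values, "start": lo, "end": hi})
    return rows

for row in ungroup(grouped):
    print(row)
```

Keeping the surrogate in every flattened row is what allows rows belonging to the same object to be reassembled later, which is the role the discussion above assigns to an explicit group identifier.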

Another approach for logical-level design of temporal databases is based on mapping conceptual models. While this is the usual practice for conventional (i.e., non-temporal) database design, to the best of our knowledge only [14,22,54] propose such an approach for obtaining a logical schema from a temporal conceptual model. In general, the approach for mapping timestamped elements is to create a table for each entity type that includes lifespan, a separate table for each timestamped monovalued attribute, and one additional table for each multivalued attribute, whether timestamped or not. This approach produces a significant number of tables since entities and their time-varying attributes are represented separately. It is not intuitive for expressing the semantics of the modeled reality.
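The mapping rule just described can be sketched as a small function that lists the tables generated for one entity type. The naming scheme, the column names, and the placement of non-timestamped monovalued attributes in the entity table are assumptions made for this illustration; they are not taken from [14,22,54].

```python
def tables_for_entity(name, attributes):
    """attributes: list of (attr_name, timestamped, multivalued) triples.
    Returns {table_name: [column, ...]} following the rule sketched above."""
    key = f"{name}_id"
    # One table per entity type, including its lifespan.
    tables = {name: [key, "lifespan_start", "lifespan_end"]}
    for attr, timestamped, multivalued in attributes:
        if multivalued:
            # One additional table per multivalued attribute, timestamped or not.
            cols = [key, attr] + (["start", "end"] if timestamped else [])
            tables[f"{name}_{attr}"] = cols
        elif timestamped:
            # A separate table per timestamped monovalued attribute.
            tables[f"{name}_{attr}"] = [key, attr, "start", "end"]
        else:
            # Assumption: plain attributes stay in the entity table.
            tables[name].append(attr)
    return tables

schema = tables_for_entity("Product", [
    ("name", False, False),
    ("price", True, False),
    ("tags", False, True),
])
for table, columns in schema.items():
    print(table, columns)
```

Even this small entity type with three attributes yields three tables, which illustrates why the number of tables grows quickly under this mapping.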

6.4. Temporal granularity

There are many works in temporal databases related to transformations between different granularities, e.g., [5,13,60]. In this section we mention only some of them; more detailed references can be found, for example, in [5,13]. For example, [15] defines mappings between different granularities as explained in Section 4.3.2.1, while [5,41] refer to the problem of conversion between different time granularities as well as handling data attached to these granules. [5] propose calendar operations that capture the relationships existing between time granularities. The authors define point- and interval-based assumptions that can be used, respectively, for data conversion between the same or different time granularities. [41] consider the transformation of time-varying data for the is-a relationship in a temporal object-oriented data model. To ensure an adequate transformation between a type and its subtypes having different time granularities, they introduce and classify coercion functions.
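A simple case of conversion to a coarser granularity can be sketched as follows: each day granule is mapped to the month granule that contains it, and the values attached to the days are combined. The function names and the choice of `sum` as the default combining function are assumptions for illustration; the sketch does not reproduce the calendar operations of [5] or the coercion functions of [41].

```python
from collections import defaultdict
from datetime import date

def day_to_month(day):
    """Map a day granule to the month granule that contains it."""
    return (day.year, day.month)

def coarsen(daily_values, combine=sum):
    """Convert {day: value} into {month: combined value}."""
    buckets = defaultdict(list)
    for day, value in daily_values.items():
        buckets[day_to_month(day)].append(value)
    return {month: combine(values) for month, values in sorted(buckets.items())}

daily = {date(2007, 6, 1): 10, date(2007, 6, 22): 5, date(2007, 7, 3): 7}
print(coarsen(daily))               # values added up per month
print(coarsen(daily, combine=max))  # largest daily value per month
```

Which combining function is appropriate depends on the semantics of the data being converted, so it is left as a parameter here.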

Even though managing data with multiple time granularities has been widely investigated in temporal databases, it is still an open research problem in temporal data warehouses. In this paper, we only consider different time granularities between source systems and a temporal data warehouse.

7. Conclusions

Bringing together two research areas, data warehouses and temporal databases, allows combining the achievements of each of them, leading to the emerging field of temporal data warehouses. Nevertheless, neither data warehouses nor temporal databases have a well-accepted conceptual model that can be used for capturing users’ requirements. To establish a better communication between designers and users, we presented a temporal extension of the MultiDim model. We included temporal support for levels, attributes, child–parent relationships forming hierarchies, and measures, giving their conceptual and logical representations.

First, we discussed the inclusion of valid and transaction time coming from source systems and the loading time generated in a temporal data warehouse. Next, we referred to levels that include temporal attributes and lifespan support. We also discussed three different cases for temporal hierarchies: (1) non-temporal relationships between temporal levels, (2) temporal relationships between non-temporal levels, and (3) temporal relationships between temporal levels. For temporal measures we analyzed two different situations depending on whether the time granularity for representing measures in temporal data warehouses is the same as, or coarser than, the one in source systems.

For each element of our multidimensional model we included its conceptual representation and its transformation to the entity-relationship (ER) and the object-relational (OR) models. We also showed examples of implementation using Oracle, indicating physical features that should be considered during the implementation of a temporal data warehouse.

Providing temporality types in a conceptual model allows including temporal semantics as an integral part of temporal data warehouses. In this way, the temporal extension provides symmetry to multidimensional models, allowing changes and the time when they occur to be represented for all data warehouse elements. Afterwards, logical and physical models can be derived from such a conceptual representation.

The translation of the constructs of the MultiDim model to the ER model allows a better understanding of their semantics. However, the ER model provides a less convenient conceptual representation of time-varying attributes, levels, and relationships than the MultiDim model. The latter contains fewer elements, clearly distinguishes which data changes should be kept, and leaves the more technical aspects outside the users’ concerns.

On the other hand, the proposed mapping to the OR model helps implementers who use the MultiDim model for conceptual design of temporal data warehouses. The mapping also shows the feasibility of implementing our model in current DBMSs. Further, the mapping considers the particularities of the different elements of a multidimensional model as well as the specificities of current DBMSs.

The object-relational model allows a better representation of temporal levels and hierarchies than the relational model. In the former, a level and its corresponding temporal attributes are kept together, while the relational model produces a significant number of tables with well-known disadvantages for modeling and implementation. Unlike in the relational model, there are several alternatives for representing a child–parent relationship in the OR model. The choice among them must be made considering both application semantics and the physical-level features of the particular OR DBMS. On the other hand, the relational model is more adequate for representing temporal measures. It treats all attributes in the same manner, including the ones that represent time, thus facilitating aggregation procedures.

The proposed mapping may vary according to the expected usage patterns, e.g., data mining algorithms, and specific features of the target implementation system. For example, users may choose a tool-specific multidimensional storage (e.g., using Analytic Workspace in Oracle 10g) instead of relying on more general solutions such as the ones proposed in this paper.

References

[1] A. Abello, C. Martín, A bitemporal storage structure for a corporate data warehouse, in: Proceedings of the Fifth International Conference on Enterprise Information Systems, ICEIS’03, Angers, France, April 2003, pp. 177–183.

[2] J. Allen, Maintaining knowledge about temporal intervals, Communications of the ACM 26 (11) (1983) 832–843.

[3] H. Arisawa, K. Moriya, T. Miura, Operations and the properties on non-first-normal-form relational databases, in: M. Schkolnick, C. Thanos (Eds.), Proceedings of the 9th International Conference on Very Large Data Bases, VLDB’83, Morgan Kaufmann Publishers, Florence, Italy, 1983, pp. 197–204.

[4] B. Bebel, J. Eder, C. Koncilia, T. Morzy, R. Wrembel, Creation and management of versions in multiversion data warehouse, in: H. Haddad, A. Omicini, R. Wainwright, et al. (Eds.), Proceedings of the ACM Symposium on Applied Computing, SAC’04, ACM Press, Nicosia, Cyprus, 2004, pp. 717–723.

[5] C. Bettini, S. Jajodia, X. Wang, Time Granularities in Databases, Data Mining, and Temporal Reasoning, Springer-Verlag, 2000.

[6] M. Blaschka, C. Sapia, Hofling, On schema evolution in multidimensional databases, in: Mohania and Min Tjoa [43], pp. 153–164.

[7] R. Bliujute, S. Saltenis, G. Slivinskas, C. Jensen, Systematic change management in dimensional data warehousing, Technical Report TR-23, Time Center, 1998.

[8] M. Body, M. Miquel, Y. Bedard, A. Tchounikine, A multidimensional and multiversion structure for OLAP applications, in: D. Theodoratos (Ed.), Proceedings of the Fifth ACM International Workshop on Data Warehousing and OLAP, DOLAP’02, ACM Press, McLean, VA, USA, 2002, pp. 1–6.

[9] M. Body, M. Miquel, Y. Bedard, A. Tchounikine, Handling evolution in multidimensional structures, in: U. Dayal, K. Ramamritham, T. Vijayaraman (Eds.), Proceedings of the 19th International Conference on Data Engineering, ICDE’03, Bangalore, India, IEEE Computer Society Press, 2003, pp. 581–592.

[10] R. Bruckner, A. Min Tjoa, Capturing delays and valid times in data warehouses – towards timely consistent analyses, Journal of Intelligent Information Systems 19 (2) (2002) 169–190.

[11] P. Chamoni, S. Stock, Temporal structure in data warehousing, in: Mohania and Min Tjoa [43], pp. 353–358.

[12] J. Clifford, A. Croker, A. Tuzhilin, On completeness of historical relational query languages, ACM Transactions on Database Systems 19 (1) (1994) 64–116.

[13] C. Combi, M. Franceschet, A. Peron, Representing and reasoning about temporal granularities, Journal of Logic and Computation 4 (1) (2004) 52–77.

[14] V. Detienne, J. Hainaut, CASE tool support for temporal database design, in: H. Kunii, S. Jajodia, S. Solvberg (Eds.), Proceedings of the 20th International Conference on Conceptual Modeling, ER’01, LNCS 2224, Springer-Verlag, Yokohama, Japan, 2001, pp. 208–224.

[15] C. Dyreson, Valid-Time Indeterminacy, PhD thesis, University of Arizona, 1994.

[16] J. Eder, C. Koncilia, Changes of dimension data in temporal data warehouses, in: Y. Kambayashi, W. Winiwarter, M. Arikawa (Eds.), Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery, DaWaK’01, LNCS 2114, Springer-Verlag, Munich, Germany, 2001, pp. 284–293.

[17] J. Eder, C. Koncilia, T. Morzy, The COMET metamodel for temporal data warehouses, in: A. Banks Pidduck, J. Mylopoulos, C. Woo, M. Ozsu (Eds.), Proceedings of the 14th International Conference on Advanced Information Systems Engineering, CAiSE’02, LNCS 2348, Springer-Verlag, Toronto, Canada, 2002, pp. 83–99.

[18] R. Elmasri, S. Navathe, Fundamentals of Database Systems, fourth ed., Addison-Wesley, 2003.

[19] R. Elmasri, G. Wuu, A temporal model and query language for ER databases, in: Proceedings of the Sixth International Conference on Data Engineering, ICDE’90, Los Angeles, CA, IEEE Computer Society Press, 1990, pp. 76–83.

[20] M. Golfarelli, J. Lechtenborger, S. Rizzi, G. Vossen, Schema versioning in data warehouses, in: S. Wang, Y. Dongqing, K. Tanaka, et al. (Eds.), Proceedings of the ER’04 Third International Workshop on Evolution and Changes in Data Management, ECDM’04, LNCS 3289, Springer-Verlag, Shanghai, China, 2004, pp. 415–428.

[21] M. Golfarelli, J. Lechtenborger, S. Rizzi, G. Vossen, Schema versioning in data warehouses: enabling cross-version querying via schema augmentation, Data & Knowledge Engineering 59 (2006) 435–459.

[22] H. Gregersen, L. Mark, C. Jensen, Mapping temporal ER diagrams to relational schemas, Technical Report TR-39, Time Center, 1998.

[23] C. Hurtado, A. Mendelzon, A. Vaisman, Maintaining data cubes under dimension updates, in: Proceedings of the 15th International Conference on Data Engineering, ICDE’99, Sydney, Australia, IEEE Computer Society Press, 1999, pp. 346–355.

[24] C. Hurtado, A. Mendelzon, A. Vaisman, Updating OLAP dimensions, in: Proceedings of the Second ACM International Workshop on Data Warehousing and OLAP, DOLAP’99, ACM Press, Kansas City, MO, USA, 1999, pp. 60–66.

[25] W. Inmon, Building the Data Warehouse, John Wiley & Sons Publishers, 2002.

[26] M. Jarke, M. Lenzerini, Y. Vassiliou, P. Vassiliadis (Eds.), Fundamentals of Data Warehouses, second ed., Springer-Verlag, 2003.

[27] C. Jensen, R. Snodgrass, Temporally enhanced database design, in: M. Papazoglou, S. Spaccapietra, Z. Tari (Eds.), Advances in Object-Oriented Data Modeling, MIT Press, 2000, pp. 163–193.

[28] C. Jensen, R. Snodgrass, M. Soo, Extending normal forms to temporal relations, Technical Report TR-17, Time Center, 1992.

[29] R. Kimball, M. Ross, The Data Warehouse Toolkit, second ed., John Wiley & Sons Publishers, 2002.

[30] C. Koncilia, A bi-temporal data warehouse model, in: J. Eder, M. Missikoff (Eds.), Proceedings of the 15th International Conference on Advanced Information Systems Engineering, CAiSE’03, LNCS 2681, Springer-Verlag, Klagenfurt, Austria, 2003, pp. 77–80.

[31] H. Lenz, A. Shoshani, Summarizability in OLAP and statistical databases, in: Y. Ioannidis, D. Hansen (Eds.), Proceedings of the Ninth International Conference on Scientific and Statistical Database Management, SSDBM’97, Olympia, Washington, USA, IEEE Computer Society Press, 1997, pp. 132–143.

[32] E. Malinowski, E. Zimanyi, A conceptual solution for representing time in data warehouse dimensions, in: M. Stumptner, S. Hartmann, Y. Kiyoki (Eds.), Proceedings of the Third Asia-Pacific Conference on Conceptual Modelling, APCCM’06, CRPIT, Australian Computer Society, Hobart, Australia, 2006, pp. 45–54.

[33] E. Malinowski, E. Zimanyi, Hierarchies in a multidimensional model: from conceptual modeling to logical representation, Data & Knowledge Engineering 59 (2) (2006) 348–377.

[34] E. Malinowski, E. Zimanyi, Inclusion of time-varying measures in temporal data warehouses, in: Y. Manolopoulos, J. Filipe, P. Constantopoulos, J. Cordeiro (Eds.), Proceedings of the Eighth International Conference on Enterprise Information Systems, ICEIS’06, Paphos, Cyprus, May 2006, pp. 181–186.

[35] E. Malinowski, E. Zimanyi, Object-relational representation of a conceptual model for temporal data warehouses, in: E. Dubois, K. Pohl (Eds.), Proceedings of the 18th International Conference on Advanced Information Systems Engineering, CAiSE’06, LNCS 4001, Springer-Verlag, Luxembourg, 2006, pp. 96–110.

[36] C. Martín, A. Abello, A temporal study of data sources to load a corporate data warehouse, in: Y. Kambayashi, M. Mohania, W. Wos (Eds.), Proceedings of the Fifth International Conference on Data Warehousing and Knowledge Discovery, DaWaK’03, LNCS 2737, Springer-Verlag, Prague, Czech Republic, 2003, pp. 109–118.

[37] J. Melton, Understanding Object-Relational and Other Advanced Features, Morgan Kaufmann Publishers, 2003.

[38] J. Melton, SQL:2003 has been published, SIGMOD Record 33 (1) (2003) 119–125.

[39] A. Mendelzon, A. Vaisman, Temporal queries in OLAP, in: A.E. Abbadi, M. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, K.Y. Whang (Eds.), Proceedings of the 26th International Conference on Very Large Data Bases, VLDB’00, Morgan Kaufmann Publishers, Cairo, Egypt, 2000, pp. 243–253.

[40] A. Mendelzon, A. Vaisman, Time in multidimensional databases, in: M. Rafanelli (Ed.), Multidimensional Databases: Problems and Solutions, Idea Group Publishing, 2003, pp. 166–199.

[41] I. Merlo, E. Bertino, E. Ferrari, G. Guerrini, A temporal object-oriented data model with multiple granularities, in: Sixth International Workshop on Temporal Representation and Reasoning, TIME’99, Orlando, FL, USA, IEEE Computer Society Press, 1999, pp. 73–81.

[42] A. Min Tjoa, J. Trujillo (Eds.), Proceedings of the Eighth International Conference on Data Warehousing and Knowledge Discovery, DaWaK’06, LNCS 4081, Springer-Verlag, Krakow, Poland, 2006.

[43] M. Mohania, A. Min Tjoa (Eds.), Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery, DaWaK’99, LNCS 1676, Springer-Verlag, Florence, Italy, 1999.

[44] B. Moon, F. Vega, V. Immanuel, Efficient algorithms for large-scale temporal aggregation, IEEE Transactions on Knowledge and Data Engineering 15 (3) (2003) 744–759.

[45] T. Morzy, R. Wrembel, Modeling a multiversion data warehouse: a formal approach, in: Proceedings of the Fifth International Conference on Enterprise Information Systems, ICEIS’03, Angers, France, April 2003, pp. 120–127.

[46] T. Morzy, R. Wrembel, On querying versions of multiversion data warehouse, in: I. Song, K. Davis (Eds.), Proceedings of the Seventh ACM International Workshop on Data Warehousing and OLAP, DOLAP’04, ACM Press, Washington, DC, USA, 2004, pp. 92–101.

[47] Oracle Corporation, Oracle database application developer’s guide: Object-relational features, 10g release 2, download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14260.pdf, 2005.

[48] C. Parent, S. Spaccapietra, E. Zimanyi, Conceptual Modeling for Traditional and Spatio-Temporal Applications: The MADS Approach, Springer-Verlag, 2006.

[49] T. Pedersen, C. Jensen, C. Dyreson, A foundation for capturing and querying complex multidimensional data, Information Systems 26 (5) (2001) 383–423.

[50] F. Ravat, F. Teste, Supporting data changes in multidimensional data warehouses, International Review on Computer and Software 1 (3) (2006) 251–259.

[51] F. Ravat, O. Teste, G. Zurfluh, A multiversion-based multidimensional model, in: Min Tjoa and Trujillo [42], pp. 65–74.

[52] S. Rizzi, M. Golfarelli, What time is it in the data warehouse? in: Min Tjoa and Trujillo [42], pp. 134–144.

[53] R. Snodgrass, The TSQL2 Temporal Query Language, Kluwer Academic Publishers, 1995.

[54] R. Snodgrass (Ed.), Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann Publishers, 2000.

[55] R. Snodgrass, S. Gomez, L. McKenzie, Aggregates in the temporal query language TQuel, IEEE Transactions on Knowledge and Data Engineering 5 (5) (1993) 826–842.

[56] Y. Tao, D. Papadias, C. Faloutsos, Approximate temporal aggregations, in: Proceedings of the 20th International Conference on Data Engineering, ICDE’04, IEEE Computer Society Press, Boston, MA, USA, 2004, pp. 190–201.

[57] X. Wang, C. Bettini, A. Brodsky, S. Jajodia, Logical design for temporal databases with multiple granularities, ACM Transactions on Database Systems 22 (2) (1997) 115–170.

[58] J. Wijsen, Design of temporal relational databases based on dynamic and temporal functional dependencies, in: J. Clifford, A. Tuzhilin (Eds.), Proceedings of the International Workshop on Temporal Databases, Springer-Verlag, Zurich, Switzerland, 1995, pp. 61–76.

[59] J. Wijsen, Temporal FDs on complex objects, ACM Transactions on Database Systems 24 (1) (1999) 127–176.

[60] J. Wijsen, A string-based model for infinite granularities, in: Proceedings of the AAAI Workshop on Spatial and Temporal Granularities, Austin, TX, USA, AAAI Press, 2000, pp. 9–16.

[61] R. Wrembel, B. Bebel, Metadata management in a multiversion data warehouse, in: S. Spaccapietra, P. Atzeni, F. Fages, et al. (Eds.), Journal on Data Semantics, LNCS 4380, vol. VIII, Springer-Verlag, 2007, pp. 118–157.

[62] R. Wrembel, T. Morzy, Managing and querying versions of multiversion data warehouse, in: Y. Ioannidis, M. Scholl, J. Schmidt, et al. (Eds.), Proceedings of the 10th International Conf. on Extending Database Technology, EDBT’06, LNCS 3896, Springer-Verlag, Munich, Germany, 2006, pp. 1121–1124.

[63] J. Yang, J. Widom, Maintaining temporal views over non-temporal information sources for data warehousing, in: H. Schek, F. Saltor, I. Ramos, G. Alonso (Eds.), Proceedings of the 6th International Conference on Extending Database Technology, EDBT’98, LNCS 1377, Springer-Verlag, Valencia, Spain, 1998, pp. 389–403.

[64] E. Zimanyi, C. Parent, S. Spaccapietra, TERC+: a temporal conceptual model, in: Proceedings of the International Symposium on Digital Media Information, Nara, Japan, November 1997.

Elzbieta Malinowski is a professor at the Department of Computer and Information Science at the Universidad de Costa Rica and a professional consultant in Costa Rica in the area of data warehousing. She received her master’s degrees from Saint Petersburg Electrotechnical University, Russia (1982), University of Florida, USA (1996), and Universite Libre de Bruxelles (2003), and her Ph.D. degree from Universite Libre de Bruxelles, Belgium (2006). Her research interests include data warehouses, OLAP systems, geographic information systems, and temporal databases.

Esteban Zimanyi is a professor and director of the Department of Computer and Network Engineering of the Universite Libre de Bruxelles (ULB). He started his studies at the Universidad Autonoma de Centro America, Costa Rica. He received a B.Sc. (1988) and a doctorate (1992) in computer science from the Faculty of Sciences at the ULB. During 1997, he was a visiting researcher at the Database Laboratory of the Ecole Polytechnique Federale de Lausanne, Switzerland. His current research interests include Semantic Web and Web Services, Bioinformatics, Geographic Information Systems, Temporal Databases, and Data Warehouses.