
NATIONAL TECHNICAL UNIVERSITY OF ATHENS
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING

DIVISION OF COMPUTER ENGINEERING AND INFORMATICS

Storage Structures, Query Processing and Implementation of On-Line Analytical Processing Systems

Ph.D. Thesis

NIKOS N. KARAYANNIDIS
Dipl. Electrical and Computer Engineering N.T.U.A. (1997)

Athens, April 2003


NATIONAL TECHNICAL UNIVERSITY OF ATHENS
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING
DIVISION OF COMPUTER ENGINEERING AND INFORMATICS

Storage Structures, Query Processing and Implementation of On-Line Analytical Processing Systems

Ph.D. Thesis

NIKOS N. KARAYANNIDIS

Dipl. Electrical and Computer Engineering N.T.U.A. (1997)

Advisory Committee: T. Sellis, Y. Vasiliou, E. Zachos

Approved by the seven-member examining committee on April 15, 2003.

...................................
T. Sellis, Professor N.T.U.A.
...................................
Y. Vasiliou, Professor N.T.U.A.
...................................
E. Zachos, Professor N.T.U.A.
...................................
P. Afrati, Professor N.T.U.A.
...................................
A. Stafylopatis, Professor N.T.U.A.
...................................
Y. Ioannidis, Professor Univ. of Athens
...................................
Y. Theodoridis, Assistant Professor Univ. of Piraeus

Athens, April 2003


...................................

NIKOS N. KARAYANNIDIS
Ph.D. Electrical and Computer Engineering N.T.U.A.
© 2003 – All rights reserved


Abstract

On-Line Analytical Processing (OLAP) has caused a significant shift in the traditional database query paradigm: queries have become more complex and entail the processing of large amounts of data. Ad hoc analysis is a powerful tool in the context of business intelligence, and efficient processing of ad hoc OLAP queries is its cornerstone. Usually, the only way to evaluate such a query is to access the most detailed data directly and compute the result on the fly. Considering the size of the data stored in contemporary data warehouses, as well as the processing-intensive nature of OLAP queries, such an endeavor might prove unrealistic in terms of query response time. There is a strong need for an appropriate physical organization of the underlying data that guarantees physical data clustering and reduces the I/O cost, in combination with specialized processing.

In this thesis, we introduce a new data structure for physically organizing the most detailed data of an OLAP cube, the CUBE File. This is a multidimensional data structure that natively supports dimension hierarchies. It imposes a hierarchical clustering on the data and is thus intended to speed up queries with hierarchical restrictions, which constitute the most typical workload in OLAP. Moreover, it allocates space conservatively, aiming at high space utilization, and adapts well to the extensive sparseness of the cube data space.

In the presence of multidimensional structures that impose hierarchical clustering, such as the CUBE File, the processing of ad hoc star queries changes radically. Considering the importance of these queries, which are the most prevalent kind of query in data warehouses, we present an overall framework for processing them over hierarchically clustered cubes. In addition, we materialize our abstract operations with respect to the CUBE File data structure and present specific processing algorithms.
Aiming for a pragmatic evaluation of our proposals, we have implemented our ideas in a real OLAP system, ERATOSTHENES. We present the architecture of this system and describe the implementation of fundamental components such as the storage manager and the processing engine. Moreover, we present the incorporation of the CUBE File as the primary storage alternative for cubes in ERATOSTHENES and discuss design decisions and implementation choices.

This work sets a new paradigm for the storage and processing of multidimensional data with hierarchies and opens the road to new processing and optimization challenges. The emphasis has been on the OLAP application domain, but the reported results can potentially be exploited in other domains where multidimensional data with hierarchies play a central role, such as GIS systems or XML documents.
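To make the notion of hierarchical clustering in the abstract concrete, the following is a minimal, illustrative sketch, not the thesis's CUBE File implementation: each dimension member carries a dot-separated member code recording its path down the hierarchy, and cube cells whose codes share a prefix are grouped into the same chunk, so data under the same hierarchy subtree is stored together. All names here (`chunk_id`, the example codes and measures) are hypothetical.

```python
# Toy sketch of hierarchical chunking for a 2-dimensional cube.
from collections import defaultdict

def chunk_id(code_x, code_y, depth):
    """Interleave the first `depth` components of the two member codes
    to form the id of the chunk that holds this cell at that depth."""
    px = code_x.split(".")[:depth]
    py = code_y.split(".")[:depth]
    return "|".join(f"{a},{b}" for a, b in zip(px, py))

# Member codes: e.g. "0.1" = 2nd child of the 1st top-level member.
cells = {
    ("0.0", "0.1"): 10,   # (dimension-X code, dimension-Y code) -> measure
    ("0.1", "0.0"): 20,
    ("1.0", "1.1"): 30,
}

# Group cells into depth-1 chunks: cells whose codes share the same
# top-level prefix cluster together (hierarchical clustering).
chunks = defaultdict(dict)
for (cx, cy), value in cells.items():
    chunks[chunk_id(cx, cy, 1)][(cx, cy)] = value

print(sorted(chunks))   # -> ['0,0', '1,1']
print(chunks["0,0"])    # both cells under the top-level region (0,0)
```

A query restricted to the hierarchy subtree under (0,0) then touches only the chunk `"0,0"`, which is the clustering effect the CUBE File pursues at the physical (bucket) level.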


Acknowledgments

Firstly, I would like to thank my supervisor, Prof. Timos Sellis, for his support, his understanding and patience in my many stressful moments during the completion of this thesis, for his technical guidance and advice on core database issues, but most of all for being such a great inspiration to me. I also want to thank my colleagues Aris Tsois and Panos Vassiliadis for the fruitful discussions on many issues of this thesis, as well as for coauthoring some of the related publications. The graduate student Babis Samios implemented the early processing-engine prototype as part of his diploma thesis, and I want to thank him for his work and for his patience under my supervision. Other graduate students have also helped me in the implementation of ERATOSTHENES, and I feel the need to thank them sincerely: Antonis Deligiannakis, Paulseph-John Farrugia, Loukas Sinos, and also Yannis Kouvaras, who implemented a data generator for testing SISYPHUS and is currently working on this project.

The work in this thesis has been partially funded by the European Union's Information Society Technologies (IST) Programme under project EDITH (IST-1999-20722). This project has been a great opportunity to bring out real-world problems pertaining to state-of-the-art OLAP systems. I would especially like to thank our partners from the database research center FORWISS and from TransAction Software GmbH in Munich for their collaboration and for coauthoring the related publications.

Certainly, the completion of this thesis would not have been possible without the discreet support of my family, and for this I am grateful to them.

Last but not least, I wish to thank my beloved wife Kyriaki for walking by my side during the whole journey of this thesis.


TABLE OF CONTENTS

1 INTRODUCTION .......... 19
1.1 MOTIVATION .......... 19
1.2 SCOPE AND OBJECTIVES .......... 21
1.3 STRUCTURE OF THE THESIS .......... 23

PART I: THE PRIMARY FILE ORGANIZATION CUBE FILE...................................25

2 THE HIERARCHICALLY CHUNKED CUBE .......... 27
2.1 DIMENSION ENCODING .......... 27
2.2 HIERARCHICAL CHUNKING .......... 29
2.3 THE CHUNK-TREE REPRESENTATION .......... 33
3 BUILDING THE CUBE FILE .......... 37
3.1 MEASURING HIERARCHICAL CLUSTERING .......... 38
3.2 THE HPP CHUNK-TO-BUCKET ALLOCATION PROBLEM .......... 46
3.2.1 Formation of Bucket-Regions .......... 50
3.2.2 Storing Large Data Chunks .......... 56
3.2.3 Storage of the Root Directory .......... 61
4 CUBE FILE OPERATIONS .......... 69
4.1 CUBE FILE PRIMARY DATA NAVIGATION OPERATIONS .......... 69
4.1.1 drill_down .......... 71
4.1.2 roll_up .......... 72
4.1.3 get_next .......... 73
4.1.4 move_to .......... 74
4.1.5 Defining Access Operations .......... 76
4.2 CUBE FILE MAINTENANCE OPERATIONS .......... 77
4.2.1 Primary Reorganization Operations .......... 79
4.2.2 Bulk Incremental Updating .......... 84
4.2.3 Data Purging .......... 86
4.3 OTHER MAINTENANCE ISSUES .......... 87
4.3.1 Anticipating Updates Along the Time Dimension .......... 87
4.3.2 Slowly Changing Dimensions .......... 89
5 RELATED WORK I .......... 93
5.1 CHUNKING .......... 93
5.2 THE LINEAR CLUSTERING PROBLEM FOR MULTIDIMENSIONAL DATA .......... 94
5.3 GRID FILE-BASED MULTIDIMENSIONAL ACCESS METHODS .......... 95
5.4 TAXONOMY OF CUBE PRIMARY ORGANIZATIONS .......... 97

PART II: AD HOC STAR QUERY PROCESSING ..........................................................99

6 PROCESSING AD HOC STAR QUERIES OVER HIERARCHICALLY CLUSTERED FACT TABLES .......... 101
6.1 PRELIMINARY CONCEPTS .......... 102
6.2 DATABASE SCHEMA .......... 103
6.3 AD HOC STAR QUERIES .......... 106
6.4 ABSTRACT PROCESSING PLAN .......... 107
6.4.1 Example of an Abstract Processing Plan .......... 110
6.5 PERFORMANCE EVALUATION .......... 111


7 PROCESSING AD-HOC STAR QUERIES OVER CUBE FILE-ORGANIZED FACT TABLES .......... 115
7.1 STAR QUERY NORMAL FORM .......... 115
7.2 REGULAR CHUNK EXPRESSIONS .......... 117
7.3 H-SURROGATE PROCESSING FOR THE CUBE FILE .......... 121
7.3.1 Dimension Physical Design .......... 122
7.3.2 Implementation Rules for the H-Surrogate Processing .......... 124
8 QUERY PROCESSING ALGORITHMS FOR AD HOC STAR QUERIES .......... 129
8.1 THE ITERATOR MODEL FOR PHYSICAL OPERATORS .......... 129
8.2 CUBE FILE-ACCESS PHYSICAL OPERATORS .......... 132
8.2.1 ChunkSelect (ε) .......... 132
8.2.2 ChunkTreeScan (δ) .......... 136
8.2.3 MDRangeSelect (ρ) .......... 140
8.2.4 MDRangeAggregate (α) .......... 146
8.2.5 MDRangeGroup (γ) .......... 149
8.2.6 Analysis of the Cost of CUBE File-Access Operators .......... 155
8.3 PHYSICAL EXECUTION PLANS: THE BIG PICTURE .......... 158
9 RELATED WORK II .......... 165
9.1 STAR-JOIN PROCESSING .......... 165
9.2 UB-TREE-BASED AD HOC STAR QUERY PROCESSING .......... 167
9.3 AD HOC STAR QUERY OPTIMIZATION .......... 169

PART III: ERATOSTHENES IMPLEMENTATION.....................................................171

10 ERATOSTHENES: BUILDING A NOVEL OLAP SYSTEM .......... 173
10.1 SCOPE AND OBJECTIVES .......... 173
10.2 THE ARCHITECTURE OF ERATOSTHENES .......... 175
11 THE SISYPHUS STORAGE MANAGER .......... 177
11.1 OLAP REQUIREMENTS RELATIVE TO STORAGE MANAGEMENT .......... 178
11.2 THE SISYPHUS ARCHITECTURE .......... 179
11.2.1 SHORE Storage Manager .......... 180
11.2.2 File Manager .......... 180
11.2.3 Buffer Manager .......... 180
11.2.4 Access Manager .......... 181
11.2.5 Catalog Manager .......... 182
11.2.6 System Manager .......... 183
11.3 MULTI-USER SUPPORT .......... 183
11.4 ALTERNATIVE STORAGE CONFIGURATION OPTIONS .......... 185
12 IMPLEMENTATION OF THE CUBE FILE .......... 187
12.1 OBJECTIVES OF A CHUNK-BASED FILE SYSTEM .......... 187
12.2 COPING WITH CUBE SPARSENESS .......... 188
12.3 INTERNAL ORGANIZATION OF BUCKETS .......... 189
12.4 LOGICAL VS. PHYSICAL CHUNK-IDS .......... 193
12.5 INTERNAL ORGANIZATION OF CHUNKS .......... 194
12.6 INITIAL BULK-LOADING AND CONSTRUCTION OF THE CUBE FILE .......... 196
13 THE PROCESSING ENGINE .......... 201
13.1 SQNF QUERY EVALUATION .......... 201
13.2 TOWARDS A PRAGMATIC OLAP PROCESSING ENGINE .......... 205
13.2.1 Accessing the Cube .......... 206
13.2.2 Hierarchy-based Buffer Management .......... 208



13.2.3 Execution Plan Implementation .......... 214
14 RELATED WORK III .......... 217
14.1 SHORE .......... 217
14.2 PREDATOR .......... 218
14.3 UB-TREE INTEGRATION INTO A DBMS .......... 219
15 EPILOGUE .......... 221
15.1 SUMMARY AND CONCLUSIONS .......... 221
15.2 FUTURE WORK .......... 226
15.2.1 Synopses for Multidimensional Data with Hierarchies .......... 226
15.2.2 Support for “Advanced” Dimensions .......... 227
15.2.3 Query driven chunking .......... 227
16 BIBLIOGRAPHY .......... 229



TABLE OF FIGURES

Figure 1: (a) An example of a hierarchy in a dimension. (b) A member code denotes the whole path of a member in a specific level hierarchy. ....................................................28

Figure 2: Dimension members of our 2-dimensional example cube. .......... 32
Figure 3: The cube from our running example hierarchically chunked. .......... 33
Figure 4: The whole subtree up to the data chunks under chunk 0|0. .......... 34
Figure 5: A greedy algorithm for the HPP chunk-to-bucket allocation problem. .......... 48
Figure 6: A chunk-tree to be allocated to buckets by the greedy algorithm. .......... 49
Figure 7: The chunk-to-bucket allocation for SB = 30. .......... 50
Figure 8: The region proximity for two bucket-regions: rP1 > rP2. .......... 53
Figure 9: A row-wise traversal of the input trees. .......... 54
Figure 10: Bucket-region formation based on query patterns. .......... 55
Figure 11: A bucket-region formation algorithm that is driven by query patterns. .......... 56
Figure 12: Example of a large data chunk. .......... 58
Figure 13: The example large data chunk artificially chunked. .......... 59
Figure 14: An example of a root directory. .......... 62
Figure 15: Recursive algorithm for storing the root directory. .......... 63
Figure 16: Resulting allocation of the running example cube for a bucket size SB = 30 and a cache area equal to a single bucket. .......... 64
Figure 17: Resulting allocation of the running example cube for a bucket size SB = 10 and a cache area equal to a single bucket. .......... 66
Figure 18: Resulting allocation of the running example cube for a bucket size SB = 10 and a zero cache area. .......... 67
Figure 19: The drill_down operation. .......... 71
Figure 20: The roll_up operation. .......... 72
Figure 21: The get_next operation. .......... 73
Figure 22: The move_to operation. .......... 75
Figure 23: Example of the definition of a value-added operation. .......... 76
Figure 24: The insertion of a new data point M at the data chunk level. .......... 78
Figure 25: The split_tree reorganization primitive. (a) The insertion of TN overflows bucket B1. (b) split_tree reallocates the subtrees under the root to two new buckets B1 and B2. .......... 80
Figure 26: The split_region reorganization primitive. (a) The insertion of TN overflows bucket B1. (b) split_region splits the initial region into two smaller ones. (c) split_region creates a smaller region (bucket B3) and reallocates the updated tree of the initial region to a separate bucket B2. .......... 82
Figure 27: The expand_root_dir reorganization primitive. (a) The insertion of TN overflows bucket B1. (b) expand_root_dir allocates TN into a new bucket B2. .......... 83
Figure 28: A schematic representation of the incremental updating process. .......... 84
Figure 29: The bulk incremental updating algorithm for the CUBE File. .......... 85
Figure 30: (a) A typical data purging operation along the TIME dimension. (b) The reorganization requires only an update on the root chunk, without accessing the rest of the chunk-trees in the CUBE File. .......... 86
Figure 31: Alignment of chunk-cells in order to anticipate insertions along the Time dimension. .......... 87
Figure 32: The insertion of a new member in a dimension level triggers an update to a large number of chunks, due to the consecutive ordering of the level members. .......... 90
Figure 33: Star schema with flat dimension tables. .......... 105
Figure 34: The ad hoc star query template. .......... 106



Figure 35: The abstract processing plan. .......... 109
Figure 36: The schema of the data warehouse. .......... 110
Figure 37: The dimension hierarchies of the example. .......... 111
Figure 38: The abstract processing plan for the example query. .......... 112
Figure 39: The Create_Range operator consists of two distinct processing steps. .......... 122
Figure 40: A physical design proposal for dimension D. .......... 124
Figure 41: Definition and invocation of a physical operator X. .......... 130
Figure 42: Definition of the ChunkSelect (ε) physical operator. .......... 133
Figure 43: Definition of the ChunkSelect open function. .......... 134
Figure 44: Definition of the ChunkSelect next function. .......... 135
Figure 45: Example of a ChunkSelect evaluation. .......... 136
Figure 46: Definition of the ChunkTreeScan (δ) physical operator. .......... 136
Figure 47: Definition of the ChunkTreeScan open function. .......... 137
Figure 48: Definition of the ChunkTreeScan next function. .......... 138
Figure 49: Evaluation of ChunkTreeScan on a 3-level tree with an input id of 0|0. .......... 139
Figure 50: Definition of the MDRangeSelect (ρ) physical operator. .......... 141
Figure 51: Definition of the MDRangeSelect open function. .......... 142
Figure 52: Definition of the MDRangeSelect next function. .......... 144
Figure 53: Example of an MDRangeSelect evaluation. .......... 145
Figure 54: Definition of the MDRangeAggregate (α) physical operator. .......... 147
Figure 55: Definition of the MDRangeAggregate open function. .......... 147
Figure 56: Definition of the MDRangeAggregate next function. .......... 148
Figure 57: Definition of the MDRangeGroup (γ) physical operator. .......... 151
Figure 58: Definition of the MDRangeGroup open function. .......... 152
Figure 59: Definition of the MDRangeGroup next function. .......... 153
Figure 60: Example of an MDRangeGroup evaluation. .......... 155
Figure 61: (a) In the best case a single data point query can be evaluated with a single I/O. (b) If the cache area cannot hold the whole root directory, more buckets might interleave in the evaluation path. .......... 156
Figure 62: A physical execution plan for the running example of abstract processing (§6.4.1). .......... 161
Figure 63: Physical execution plan for the running example containing an MDRangeGroup operator. .......... 162
Figure 64: A query box in the multidimensional space translates to a set of Z-intervals in the one-dimensional Z-value space [Ram02]. .......... 169
Figure 65: Architecture of ERATOSTHENES. .......... 174
Figure 66: The abstraction levels in the SISYPHUS storage manager. .......... 179
Figure 67: The basic threads of control when N clients are connected to a server using SISYPHUS. .......... 182
Figure 68: The internal organization of a bucket. .......... 190
Figure 69: The internal organization of the root-bucket. .......... 191
Figure 70: The internal organization of a chunk. .......... 195
Figure 71: Algorithm for the construction and bulk loading of a CUBE File. .......... 196
Figure 72: Order of range-selections over the bulk-loaded B+ tree during the cost-tree creation. .......... 198
Figure 73: The general processing flow for the evaluation of SQNF queries. .......... 203
Figure 74: Decision flow chart depicting the selection of the appropriate CUBE File processing algorithm. .......... 205
Figure 75: Definition of the CubeAccess class. .......... 207
Figure 76: A clock-based buffer replacement policy. .......... 209



Figure 77: Architecture of the session cache ..... 211
Figure 78: A hierarchy-based bucket replacement policy ..... 212
Figure 79: Class hierarchy of physical operators ..... 215
Figure 80: The physical representation of an execution plan ..... 216


TABLE OF TABLES

Table 1: The space of proposed primary organizations for cube storage ..... 98
Table 2: Response time (in sec) for the three plans for the three query classes ..... 113
Table 3: Mapping between SQNF and the SQL ad hoc star query template of Figure 34 ..... 117
Table 4: Explanation of symbols appearing in a member code specification ..... 119
Table 5: Summary of physical operators accessing the CUBE File storage organization ..... 159
Table 6: Subset of member-code specification symbols implemented in the processing engine ..... 204
Table 7: Description of the current position (CP) status flags ..... 208

TABLE OF DEFINITIONS

Definition 1 (Hierarchical Prefix Path Restriction) ..... 39
Definition 2 (Hierarchical Prefix Path Query) ..... 39
Definition 3 (Bucket-Region) ..... 41
Definition 4 (Region contribution of a tree stored in a bucket – cr) ..... 41
Definition 5 (Depth contribution of a tree stored in a bucket – cd) ..... 42
Definition 6 (Hierarchical Clustering Degree of a Bucket – HCDB) ..... 42
Definition 7 (Hierarchical Clustering Factor of a Physical Organization for a Cube – fHC) ..... 45
Definition 8 (The HPP Chunk-to-Bucket Allocation Problem) ..... 46
Definition 9 (The bucket-region formation problem) ..... 50
Definition 10 (Region Proximity rP) ..... 52
Definition 11 (Local Depth d) ..... 57
Definition 12 (The Root Directory RD) ..... 61
Definition 13 (The Current Position in the CUBE File – CP) ..... 70
Definition 14 (The Current Drilling Path in the CUBE File – CDP) ..... 70
Definition 15 (The Current Point in Time – CPT) ..... 88
Definition 16 (Star Query Normal Form – SQNF) ..... 117
Definition 17 (Member Code Specification) ..... 118
Definition 18 (Rule for processing an ad hoc star query over a CUBE File-organized cube) ..... 120
Definition 19 (Maximum Depth of Restrictions – DMAX-R) ..... 140
Definition 20 (Syntactic Decision Rule 1) ..... 202
Definition 21 (Syntactic Decision Rule 2) ..... 202

TABLE OF THEOREMS

Theorem 1: Theorem of maximum hierarchical clustering degree of a bucket ..... 44
Theorem 2: Complexity of the HPP chunk-to-bucket allocation problem ..... 51
Theorem 3: Size upper bound for an artificially chunked large data chunk ..... 60
Theorem 4: Upper bound of the size ratio between the root directory and the cube's data space ..... 65


Chapter 1: Introduction

1 Introduction

During the last decade, as database technology has constantly evolved and matured, the term business intelligence has appeared to describe a set of technologies that give the end-user the ability to obtain valuable information from data usually gathered from everyday business operations, and that promote strategic decision-making. The advent of business intelligence above all signifies a shift in the "expectations" that users have of a database management system. The inability of traditional transaction-oriented database management systems to meet these expectations has triggered a vast research effort in this area.

Data warehousing and On-Line Analytical Processing (OLAP) are core database technologies that comprise the driving forces behind business intelligence. The work reported in this thesis focuses on the field of OLAP, but it can also be exploited in other application domains where vast amounts of multidimensional data with hierarchies must be processed efficiently, e.g., GIS systems or XML-based document processing.

1.1 Motivation

On-line analytical processing has caused a significant shift in the traditional database query paradigm. The need for advanced analysis of the data has resulted in more complex queries. Such queries combine different aspects of the business (or of another application domain), yielding a consolidated result, which leads to a multidimensional modeling of the data. Hence, OLAP queries are multidimensional (e.g., show me sales for all the bookstores in Athens for the past two years). Moreover, the need to obtain results at different granularities has led to the organization of dimensions into hierarchies of aggregation levels (e.g., for a Location dimension such a hierarchy of levels could be: store, region, city, and country). Dimension hierarchies play a dominant role in OLAP query loads, since the most typical queries include restrictions on the hierarchies and/or grouping and aggregation based on the hierarchy levels.

In addition, advanced decision support calls for ad hoc analysis, in contrast to using predefined reports that are constructed periodically or have already been pre-computed. The foundation for this kind of analysis is the support of ad hoc OLAP queries, which comprise the real essence of OLAP. Efficient processing of ad hoc OLAP queries is a very difficult task considering, on the one hand, the native complexity of typical OLAP queries, which potentially combine huge amounts of data, and, on the other, the fact that no a priori knowledge of the query exists and thus no pre-computation of results or other query-specific tuning can be exploited. The only way to evaluate these queries is to access the base data directly in an efficient way. To this end, an appropriate multidimensional data structure must be used, as well as a "good" physical clustering of the data. However, this multidimensional structure must also be "hierarchy-enabled", meaning that it can provide fast access to the data through restrictions imposed on the dimension hierarchies (e.g., year = 2000, or country = "Greece"). Considering also the size of the data and, above all, the inherent large sparseness of the cube's data space, the problem becomes even more difficult. Therefore, the lack of a data structure that is natively multidimensional, explicitly supports hierarchies, and imposes a physical clustering of the data, while at the same time guaranteeing a small storage cost with good space utilization and management of the cube sparseness, so as to enable efficient processing of ad hoc OLAP queries, has been the most important motivation for the work reported in this thesis.

In addition, the requirements for and the characteristics of a database management system used for OLAP are significantly different from those of a system used for on-line transaction processing (OLTP). The former is used in a read-mostly environment, with updates occurring periodically in batch form rather than in an arbitrary mode and with high frequency. This fact radically changes the focus of the data organization, allowing the use of techniques such as data redundancy (e.g., storing de-normalized relational tables) or physical data clustering that can reduce query response time significantly but would never be employed in an OLTP system, since they would deteriorate update performance. Moreover, concurrency control is not as important in an OLAP system as in an OLTP system, since OLAP is primarily aimed at a small group of "power users" who want to analyze the data in order to make strategic decisions.


We have already mentioned the difference in complexity between queries in an OLAP environment and those in an OLTP environment. Another characteristic that differs significantly in OLAP is the user's working-session pattern. An OLAP database typically consists of a set of cubes (or hyper-cubes) that we discuss in detail later in this thesis. Typical analysis of an OLAP cube takes place in terms of a sequence of queries (a query session). This means that usually the business user will not just submit a single ad hoc query to a cube and then continue by querying some other cube, as would be the case for individual relational tables in a non-OLAP environment. A cube constitutes a significant and, more importantly, a self-contained part of the enterprise activity history, and thus the user will want to query a stored cube far more than a common relational table. A typical OLAP user essentially "opens" a cube for querying and then submits a series of ad hoc queries to it. Therefore, an appropriate system designed for OLAP should aim at supporting this user behavior by exploiting appropriate data structures and clever in-memory caching.

The fact that the requirements for a DBMS pertaining to OLAP, like the above, are not met by conventional systems (e.g., commercial RDBMSs) has been the major motivation for the work on the design and development of the OLAP system ERATOSTHENES, which is also reported in this thesis.

1.2 Scope and objectives

The work reported in this thesis is guided by three major objectives. Next, we discuss each one, define the scope of our work, and discuss our contribution.

As discussed above, the foundation for the efficient processing of ad hoc OLAP queries is a physical organization for the most detailed data that enables multidimensional access, explicitly supports hierarchies, and provides physical clustering of the data in order to reduce the I/O cost, while achieving low storage overhead through high utilization of the available space and adaptability to the inherent cube sparseness. The investigation of such a data structure is the first objective of this work. In particular, our focus is on speeding up queries that impose restrictions on the dimension hierarchies, which is the most typical case for an OLAP query.

To this end, we propose a file organization for the most detailed data of a cube, called the CUBE File. The CUBE File is a multidimensional data structure inspired by the grid file [NHS84]:

• It explicitly supports hierarchies, in the sense that it provides access paths to the data directly through the restrictions imposed on any level(s) of the dimension hierarchies.

• Moreover, since the focus has been on speeding up queries with hierarchical restrictions, the CUBE File imposes hierarchical clustering of the data. This is achieved first via a multidimensional chunking scheme that is based on the hierarchies' structure, called hierarchical chunking, and second through a "hierarchical packing" of chunks into buckets.

• It utilizes space conservatively (employing compression of chunks when necessary) and, due to the fact that the chunking is based on the dimension hierarchies, it adapts perfectly to the sparseness of the data space.

• Operationally, the CUBE File is intended for an initial bulk loading followed by read-only query sessions, while it also supports bulk incremental updates, which fits the most fundamental requirements of an OLAP user.

A second objective of the work reported in this thesis is to investigate the processing of OLAP queries in the context of hierarchically clustered OLAP cubes. In particular:

• We focus on ad hoc star query processing and show how we can exploit, in a relational star schema, a physical representation of the fact table based on a multidimensional data structure that achieves hierarchical clustering through the use of special path-based surrogate keys, like (but not restricted to) the CUBE File structure. With such structures, the evaluation of the costly star join becomes a simple multidimensional range query, which is evaluated very efficiently due to the native support for many dimensions.

• Moreover, we propose an abstract processing framework for ad hoc star query evaluation over hierarchically clustered cubes. As our experimental evaluation has shown, our framework reduces query response time significantly, exhibiting speedups of up to 25 times over the most prevalent contemporary method for star query evaluation.

• Finally, we propose processing algorithms that implement our identified abstract operations in the context of the CUBE File organization.

The third and last objective of this work is to design and implement a novel OLAP system that incorporates the above results and corresponds to the specific needs of the OLAP user. To this end:

• We present the overall architecture and design of the OLAP system ERATOSTHENES.

• Then, we discuss the implementation of the storage manager of the system, SISYPHUS. SISYPHUS adopts the CUBE File organization as its primary file structure for storing a cube and is a system aiming at fulfilling storage management requirements pertaining to OLAP that are not met by conventional record-based storage managers.

• In addition, we present the implementation and discuss design choices for the incorporation of the CUBE File into SISYPHUS.

• Finally, we discuss the implementation of the processing engine and propose a novel buffer management strategy specialized for hierarchy-based queries.

1.3 Structure of the thesis

This thesis is divided into three parts. The first part discusses the CUBE File organization, which comprises our work on the issues of storage and physical data clustering for the OLAP cube. The second part deals with the processing of ad hoc star queries, and the third part discusses the implementation of the prototype OLAP system ERATOSTHENES.

We begin with the presentation of our hierarchical chunking method, leading to the chunk-tree representation of the cube, in Chapter 2. Next, in Chapter 3, we discuss the construction of the CUBE File and formalize the hierarchical clustering problem as an optimization problem, also providing an algorithm for solving it. Chapter 4 introduces the basic data navigation operations offered by the CUBE File structure and discusses basic maintenance issues. Chapter 5 concludes Part I with a discussion of related work.

Part II begins with Chapter 6, which presents our proposal for an abstract processing plan for ad hoc star queries. In this chapter the reader can get a general idea of how a structure such as the CUBE File can be exploited for OLAP query processing. Chapter 7 examines the same topic, but under the assumption that the underlying storage structure is the CUBE File. Chapter 8 goes one step further, providing processing algorithms (in terms of physical operators) that implement our abstract operations with respect to the CUBE File data structure. Finally, Chapter 9 concludes this part with a discussion of related work.

The third and last part of this thesis begins by setting the scope and goals of the ERATOSTHENES project, in Chapter 10, and by presenting the overall system architecture. Chapter 11 details the implementation of the storage manager component, SISYPHUS. Chapter 12 describes a realistic implementation of the CUBE File in SISYPHUS and discusses design choices. Chapter 13 describes the implementation issues for the processing engine and proposes a hierarchy-based buffer management strategy. Part III ends with Chapter 14, which discusses other related projects.


Chapter 15 is a summary of the main conclusions and contributions of this thesis. It also provides directions for future research.


PART I: The Primary File Organization CUBE File

If we had to answer the question: "How would you describe in a few words an appropriate data structure for a cube?", then the very first answer that would come to our mind would be: "What we really need is a multidimensional file organization that is 'hierarchy aware'".

In the first part of this thesis, we present a physical organization for the most detailed data of an OLAP cube: the primary file organization CUBE File. The CUBE File is a multidimensional data structure that natively supports hierarchies. It is intended primarily for speeding up ad hoc OLAP queries containing restrictions on the dimension hierarchies and/or grouping operations on the hierarchy levels, which of course constitute the most prevalent type of queries imposed on a cube. It has been designed to support an initial bulk loading, followed by read-only query sessions and incremental bulk updating, rather than dynamic read/insert/delete/update operations.

In the chapters that follow, we present a method for chunking the cube according to the structure of the dimension hierarchies, called hierarchical chunking. Then, we discuss how to materialize a hierarchically chunked cube on a bucket-based storage system in order to achieve hierarchical clustering of the stored data. We conclude with a presentation of the basic data navigation operations of the CUBE File as well as its primary reorganization operations.


Chapter 2: The Hierarchically Chunked Cube

2 The Hierarchically Chunked Cube

Clearly, our aim is to define a multidimensional file organization that natively supports hierarchies. There is indeed a plethora of data structures for multidimensional data [GG98], but to the best of our knowledge, none of these explicitly supports hierarchies. Hierarchies complicate things, basically because, in their presence, the data space "explodes" (K times exponentially in the number of dimensions, i.e., O(m^(K·N)) cells, where N is the number of dimensions, K is the length of the dimension hierarchies, and m is the fan-out for each hierarchy). Moreover, since we are primarily aiming at speeding up queries that include restrictions on the hierarchies, we need a data structure that can efficiently lead us to the corresponding data subset based on these restrictions. A key observation at this point is that all restrictions on the hierarchies intuitively define a subcube or a cube-slice.
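To get a feel for this explosion, here is a back-of-the-envelope sketch; the helper name is ours, and the example values of m, K, and N are illustrative only:

```python
# Size of the grain-level data space of a cube: each of the N dimensions has
# a K-level hierarchy with fan-out m, so a dimension's grain level holds m**K
# members and the whole space has (m**K)**N = m**(K*N) cells.
def data_space_cells(m: int, K: int, N: int) -> int:
    return m ** (K * N)

# Even modest hierarchies explode: fan-out 10, 3 levels, 4 dimensions give
# 10**12 candidate cells, the vast majority of which are empty.
print(data_space_cells(10, 3, 4))  # 1000000000000
```

This is exactly why adaptivity to the sparseness of the data space is treated as a first-class requirement in the rest of this part.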

To this end, we exploit the intuitive representation of a cube as a multidimensional array and apply a chunking scheme in order to create subcubes, i.e., the so-called chunks. Our method of chunking is based on the structure of the dimension hierarchies, and thus we call it hierarchical chunking. In the following sections we discuss a dimension-data encoding scheme that assigns hierarchy-enabled unique identifiers to each data point in a dimension. Then, we present our hierarchical chunking method. Finally, we propose a tree structure for representing the hierarchy of the resultant chunks.

2.1 Dimension Encoding

OLAP data are divided into two main categories. The measures (or facts) are mainly numeric values, which correspond to measurements of some value related to an event at specific points in time (e.g., the amount of money appearing in a line of an invoice on a specific day, or the balance of an account at the end of each day) and are expected to change rapidly as new events occur (i.e., new invoice lines appear, and so on). The dimension data (or simply dimensions) are used to characterize the measures and are considered to be almost static (or slowly changing) in time. The dimension values characterize a specific measure value in the same way that coordinate values characterize a specific point in a multidimensional space. Examples of dimensions for a retailing business are DATE, PRODUCT, CUSTOMER, LOCATION, etc.

In many cases, dimension values are organized into levels of aggregation defining a hierarchy, i.e., an aggregation path. For example, the Time dimension consists of day values, month values, and year values, which belong to the Day level, Month level, and Year level respectively. As an example, in Figure 1(a) we depict a LOCATION dimension consisting of a hierarchy of four levels. We call the most detailed level the grain level of the dimension. A specific value in a level L of a dimension D is called a member of L; e.g., the value "cityA" is a member of the City level of the LOCATION dimension.

[Figure 1 depicts the LOCATION dimension with the levels Country, State, City, and Store (the grain level), and the members countryA (0); stateA (0), stateB (1); cityA (0), cityB (1), cityC (2), cityD (3).]

Figure 1: (a) An example of a hierarchy in a dimension. (b) A member code denotes the whole path of a member in a specific level hierarchy.

A very useful characteristic in OLAP is that the members of a level are typically known a priori. Moreover, the value domain remains unchanged for sufficiently long periods of time. A very common trend in the literature [DRSN98, MRB99, RKR97, Sar97, VS00] is to impose a specific ordering on these members. One can implement this ordering through a mapping of the members of each level to integers. Obviously, this total ordering among the members can be either inherent (e.g., for day values) or arbitrarily set (e.g., for city values). We call this distinct value the order code of a member.

In our model, we choose to order the members of a level according to the hierarchy path that each member belongs to. We start from 0 and assign consecutive order codes to members with a common parent member. The sequence is never reset but is continuously incremented until we reach the end of a level's domain. This way, an order code uniquely identifies a member within a level. Similar "hierarchical" ordering approaches have been used in [DRSN98, MRB99].

In order to uniquely identify a member within a dimension, we also assign to each member a member code. This is constructed from the order codes of all its ancestor members along the hierarchy path, separated by dots. For example, the member code of "cityC" along the hierarchy path of Figure 1(b) is "0.1.2".
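The ordering and encoding just described can be sketched as follows; the traversal helper and the dictionary-based hierarchy representation are our own illustration, not code from the thesis, but the codes it assigns match the example of Figure 1(b):

```python
# Assign (a) per-level order codes that run 0,1,2,... across a whole level,
# so that members with a common parent get consecutive codes, and (b) member
# codes that join the ancestor order codes of the hierarchy path with dots.
def encode(tree, path="", level_counters=None, depth=0, out=None):
    if level_counters is None:
        level_counters, out = {}, {}
    for member, children in tree.items():
        code = level_counters.get(depth, 0)   # next free code at this level
        level_counters[depth] = code + 1      # the sequence is never reset
        member_code = f"{path}.{code}" if path else str(code)
        out[member] = member_code
        encode(children, member_code, level_counters, depth + 1, out)
    return out

# The hierarchy path of Figure 1: countryA > {stateA, stateB} > cities.
location = {"countryA": {"stateA": {"cityA": {}, "cityB": {}},
                         "stateB": {"cityC": {}, "cityD": {}}}}
codes = encode(location)
print(codes["cityC"])  # 0.1.2
```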

It is typical for a dimension to consist of more than one aggregation path, i.e., of several hierarchies. In our model, all the possible hierarchies of a dimension always have a common level containing the most detailed data (i.e., the grain level is the same for all the hierarchies of a dimension). The file organization that we propose is based on a single hierarchy from each dimension. We call this hierarchy the primary hierarchy (or the primary path) of the dimension. Data will be physically clustered according to the dimensions' primary paths. Since queries based on primary paths (either imposing restrictions on them or requiring some grouping based on their levels) are very likely to be favored in terms of response time, it is crucial for the designer to decide on the paths that will play the role of the primary paths based on the query load. For example, the path (per dimension) where the majority of queries impose their restrictions should be identified as the primary path. Naturally, the only way to favor more than one path (per dimension) in clustering is to maintain redundant copies of the cube [SS94], or to treat different hierarchy paths as separate dimensions [MRB99], thus increasing the cube dimensionality and bearing the corresponding consequences (see [WSB98] for the notorious dimensionality curse and also our mention of this issue in §5.2).

2.2 Hierarchical Chunking

Very simply put, a chunk is a subcube within a cube with the same dimensionality (i.e., a multidimensional tile). A chunk is created by defining a range of members along each dimension of the cube. In other words, by applying chunking to the cube we essentially perform a kind of grouping of the data, in the sense that we select some data to reside in the same chunk.

In this section we discuss our proposal for a chunking method to physically organize the data of the cube. Intuitively, one can argue that a typical OLAP workload, where consecutive drill-downs into detail data or roll-ups to more aggregated views of the data are common, essentially involves swing movements along one or more aggregation paths. In [DRSN98] this property of OLAP queries is characterized as hierarchical locality. The basic incentive behind hierarchical chunking is to partition the data space by forming a hierarchy of chunks that is based on the dimensions' hierarchies.

We model the cube as a large multidimensional array, which consists only of the most detailed data. In this primary definition of the cube, we assume no pre-computation of aggregates. Therefore, a cube C is formally defined as the following (n+m)-tuple: C ≡ (D1, …, Dn, M1, …, Mm), where Di, for 1 ≤ i ≤ n, is a dimension and Mj, for 1 ≤ j ≤ m, is a measure.
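As a minimal illustration (the type and field names are ours, not the thesis's), the (n+m)-tuple definition can be mirrored by a record type holding the dimensions and measures of a cube:

```python
# Minimal sketch of the formal cube definition C = (D1,...,Dn, M1,...,Mm):
# n dimensions and m measures, with no pre-computed aggregates stored.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cube:
    dimensions: tuple  # D1, ..., Dn
    measures: tuple    # M1, ..., Mm

# The 2-dimensional example cube of Figure 2; "amount" is an assumed measure.
sales = Cube(dimensions=("LOCATION", "PRODUCT"), measures=("amount",))
print(len(sales.dimensions))  # 2
```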

Initially, we partition the cube into a very few regions (i.e., chunks) corresponding to the most aggregated levels of the dimensions' hierarchies. Then we recursively partition each region as we drill down the hierarchies of all dimensions in parallel. We define a measure in order to distinguish each recursion step, called the global chunking depth D, or simply the chunking depth. For illustration purposes we will use the example of a 2-dimensional cube hosting sales data for a fictitious company. The dimensions of our cube, as well as the members of each level of these dimensions (each appearing with its member code), are depicted in Figure 2. The two dimensions are namely LOCATION and PRODUCT.

In order to apply our method, we need to have hierarchies of equal length. For this reason, we insert pseudo-levels P into the shorter hierarchies until they reach the length of the longest one. This "padding" is done after the level that is just above the grain level. In our example, the PRODUCT dimension has only three levels and needs one pseudo-level in order to reach the length of the LOCATION dimension. This is depicted next, where we have also noted the order-code range at each level:

LOCATION: [0-2].[0-4].[0-10].[0-18]
PRODUCT:  [0-1].[0-2].P.[0-5]

The rationale for inserting the pseudo-levels above the grain level lies in the fact that we wish to apply chunking (i.e., partitioning along each dimension) as soon as possible and for all possible dimensions. Bearing in mind that the chunking proceeds in a top-to-bottom fashion (i.e., from the more aggregated levels to the more detailed ones), this "eager chunking" has the advantage of reducing the chunk size very early and also provides faster access to the underlying data, since it increases the fan-out (in this sense, it is equivalent to increasing the fan-out of the intermediate nodes in a B+ tree). Therefore, since pseudo-levels restrict chunking in the dimensions where they are applied, we must insert them at the lowest possible level. Consequently, since there is no chunking below the grain level (a data cell cannot be further partitioned), it is easy to see why the pseudo-level insertion occurs just above the grain level.
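The padding rule can be sketched as follows; the helper and the level names ("Item" for the PRODUCT grain level) are our own illustration of the running example, with pseudo-levels spliced in immediately above the grain level, i.e., before the last entry of the level list:

```python
# Pad a hierarchy (listed from top level down to grain level) with
# pseudo-levels "P" just above the grain level until it reaches target_len.
def pad_hierarchy(levels, target_len):
    missing = target_len - len(levels)
    return levels[:-1] + ["P"] * missing + levels[-1:]

# PRODUCT has three levels and must reach LOCATION's four.
product = ["Category", "Type", "Item"]
print(pad_hierarchy(product, 4))  # ['Category', 'Type', 'P', 'Item']
```

The padded result corresponds to the form PRODUCT:[0-1].[0-2].P.[0-5] shown above.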


Figure 3 illustrates the hierarchical chunking of our example cube. We begin chunking at chunking depth D = 0 in a top-down fashion. We choose the top level from each dimension and insert it into a set called the set of pivot levels PVT. Therefore, initially PVT = {LOCATION: Continent, PRODUCT: Category}. This set guides the chunking process at each step. On each dimension, we define discrete ranges of grain-level members, denoted in the figure as [a..b], where a and b are grain-level order codes. Each such range is defined as the set of members with the same parent (member) in the corresponding pivot level. Due to the imposed ordering, these members have consecutive order codes; thus, we can talk about "ranges" of grain-level members on each dimension. For example, if we take member 0 of the pivot level Continent of the LOCATION dimension, then the corresponding range at the grain level is cities [0..5] (see Figure 2).

The definition of such a range for each dimension defines a chunk. For example, the chunk

defined from the 0, 0 members of the pivot levels Continent and Category respectively, con-

sists of the following grain data (LOCATION:0.[0-1].[0-3].[0-5], PRODUCT:0.[0-1].P.[0-

3]). The '[]' notation denotes a range of members. This chunk appears shaded in Figure 3 at D

= 0. Ultimately at D = 0 we have a chunk for each possible combination between the mem-

bers of the pivot levels, that is a total of [0-1]×[0-2] = 6 chunks in this example.

Next we proceed to D = 1, with PVT = {LOCATION: Country, PRODUCT: Type}, and we recursively chunk each chunk of depth D = 0. This time we define ranges within the previously defined ranges. For example, within the range corresponding to Continent member 0 that we created before, we define discrete ranges corresponding to each country of this continent (i.e., to each member of the Country level that has parent 0). In Figure 3, at D = 1, the shaded boxes correspond to all the chunks resulting from the chunking of the chunk mentioned in the previous paragraph.

Similarly, we continue the chunking by descending all dimension hierarchies in parallel, and at each depth D we create new chunks within the existing ones. The total number of chunks created at each depth D, denoted by #chunks(D), equals the number of possible combinations of the members of the pivot levels. That is:

#chunks(D) = card(pivot_level_dim_1) × … × card(pivot_level_dim_N)

where card(pivot_level_dim_X) denotes the cardinality of the pivot level of dimension X. We assume N dimensions for the cube.

If at a particular depth one (or more) pivot level is a pseudo-level, then this level does not take part in the chunking. This means that we do not define any new ranges within the previously defined range for the specific dimension(s); instead, we keep the old range with no further chunking. In our example this occurs at D = 2 for the PRODUCT dimension. In the case of a pseudo-level in a dimension, in the above formula we use for this dimension the latest non-pseudo pivot level from a previous step.
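The chunk-counting formula, including the pseudo-level rule, can be sketched as follows; the per-depth cardinality lists are taken from the running example (Figure 2), while the function itself is an illustrative helper, not part of the thesis:

```python
from math import prod

# Sketch: #chunks(D) = card(pivot_level_dim_1) x ... x card(pivot_level_dim_N).
# 'P' marks a pseudo-level, for which the latest non-pseudo pivot level from a
# previous depth is used instead, as described in the text.

# pivot-level cardinalities per depth (depth 0 = top level) for each dimension
location_card = [3, 5, 11]    # Continent [0-2], Country [0-4], Region [0-10]
product_card = [2, 3, "P"]    # Category [0-1], Type [0-2], pseudo-level

def chunks_at_depth(d, *dims):
    factors = []
    for card in dims:
        # walk upwards past pseudo-levels to the latest real pivot level
        k = d
        while card[k] == "P":
            k -= 1
        factors.append(card[k])
    return prod(factors)

print([chunks_at_depth(d, location_card, product_card) for d in range(3)])
# -> [6, 15, 33]
```

At D = 0 this gives the 3 × 2 = 6 chunks computed in the text; at D = 2 the PRODUCT pseudo-level falls back to the Type cardinality.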

PRODUCT dimension (Category / Type / Item):

Books 0
  Literature 0.0
    "Murderess", A. Papadiamantis 0.0.0
    "Karamazof brothers", F. Dostoiewsky 0.0.1
  Philosophy 0.1
    "Zarathustra", F. W. Nietzsche 0.1.2
    "Symposium", Plato 0.1.3
Music 1
  Classical 1.2
    "The Vivaldi Album Special Edition" 1.2.4
    "Mozart: The Magic Flute" 1.2.5

LOCATION dimension (Continent / Country / Region / City):

Europe 0
  Greece 0.0
    Greece-North 0.0.0: Salonica 0.0.0.0
    Greece-South 0.0.1: Athens 0.0.1.1, Rhodes 0.0.1.2
  U.K. 0.1
    U.K.-North 0.1.2: Glasgow 0.1.2.3
    U.K.-South 0.1.3: London 0.1.3.4, Cardiff 0.1.3.5
North America 1
  USA 1.2
    USA-East 1.2.4: New York 1.2.4.6, Boston 1.2.4.7
    USA-West 1.2.5: Los Angeles 1.2.5.8, San Francisco 1.2.5.9
    USA-North 1.2.6: Seattle 1.2.6.10
Asia 2
  Japan 2.3
    Kiusiu 2.3.7: Nagasaki 2.3.7.11
    Hondo 2.3.8: Tokyo 2.3.8.12, Yokohama 2.3.8.13, Kioto 2.3.8.14
  India 2.4
    India-East 2.4.9: Calcutta 2.4.9.15, New Delhi 2.4.9.16
    India-West 2.4.10: Surat 2.4.10.17, Bombay 2.4.10.18

Figure 2: Dimension members of our 2-dimensional example cube.

The procedure ends when the next levels to include in the pivot set are the grain levels. Then we do not need to perform any further chunking, because the chunks that would be produced by such a chunking would be the cells of the cube. In this case, we have reached the so-called maximum chunking depth DMAX. In our example, chunking stops at D = 2 and the maximum depth is DMAX = 3. Notice the shaded chunks in Figure 3, depicting chunks belonging to the same chunk hierarchy.

Next we discuss the mechanism for addressing a single chunk within this hierarchy of chunks.


2.3 The Chunk-Tree Representation

In order to address a chunk in the CUBE File, a unique identifier must be assigned to each

chunk (i.e., a chunk-id). For chunks to be made addressable via their chunk-id, we need some

sort of an internal directory that will guide us to the appropriate chunk. We have seen that the

hierarchical chunking method described previously results in chunks at different depths

(Figure 3). One idea is to use the intermediate-depth chunks as directory chunks that guide us to the chunks at depth DMAX, which contain the data and are thus called data chunks. This leads to a chunk-tree representation of the hierarchically chunked cube, depicted in Figure 5 for our example cube.

In Figure 4, we have expanded, for our hierarchically chunked cube, the chunk-subtree corresponding to the family of chunks shaded in Figure 3. The topmost chunk is called the root-chunk. We can see the directory chunks containing "pointer" entries that lead to directory chunks of greater depth and finally to data chunks. Pseudo-levels are marked with "P", and the corresponding directory chunks have reduced dimensionality (i.e., they are one-dimensional in this case).


Figure 3: The cube from our running example hierarchically chunked.


If we interleave the member-codes of the pivot level members that define a chunk, then we

get a code that we call chunk-id. This is a unique identifier for a chunk within a CUBE File.

Moreover, this identifier encodes the whole path of a chunk in the chunk hierarchy. In Figure

4, we note the corresponding chunk-id above each chunk. The root chunk does not have a

chunk-id because it represents the whole cube and chunk-ids essentially denote subcubes.


Figure 4: The whole subtree up to the data chunks under chunk 0|0.

Note that the chunks of Figure 3 are presented as directory-chunk cells in Figure 4. Let's look at the chunk at D = 1 in Figure 3 defined previously by the pivot-level members LOCATION: 0.0 and PRODUCT: 0.1. In Figure 4, this chunk corresponds to cell (0,1) of the chunk with chunk-id 0|0 at depth D = 1, where we have assumed an interleaving order ord = (LOCATION, PRODUCT), major-to-minor from left to right. Equivalently, it corresponds to the directory chunk with chunk-id 0|0.0|1 at depth D = 2, with the "|" character acting as a dimension separator. This id describes the fact that this is a chunk at depth D = 2 (see Figure 4), defined within chunk 0|0 at D = 1 (its parent chunk). Note that with this scheme we handle chunks and cells in a completely uniform way, in the sense that the cells of a chunk at depth D = d represent the chunks at depth D = d+1. Therefore, the most detailed chunks in Figure 3


at depth D = 2 can be viewed either as the cells of the directory chunks at D = 2 in Figure 4,

or as the data chunks at D = 3. Each such data chunk contains measure values for City and

Item combinations of a specific Region and a specific product Type.

Similarly, the grain level cells of the cube (i.e., the cells that contain the measure values) also

have chunk-ids, since we can consider them as the smallest possible chunks. For instance, the

data cell with coordinates (LOCATION:0.0.0.0 and PRODUCT:0.1.P.2), can be assigned the

chunk-id 0|0.0|1.0|P.0|2 (see shaded data cell in the grain level in Figure 4). The part of a

chunk-id that is contained between consecutive dots and corresponds to a specific depth D is

called D-domain.
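The interleaving that produces a chunk-id can be sketched directly from the description above; the helper below is illustrative (the real CUBE File implementation is not shown in this text):

```python
# Sketch: forming a chunk-id by interleaving the dimension member codes depth
# by depth, with '|' separating dimensions inside a D-domain and '.' separating
# depths (D-domains), as in the examples of Section 2.3.

def chunk_id(*member_paths):
    """member_paths: one dotted member-code string per dimension, given in the
    assumed interleaving order (major to minor)."""
    domains = zip(*(p.split(".") for p in member_paths))
    return ".".join("|".join(codes) for codes in domains)

# the data cell (LOCATION: 0.0.0.0, PRODUCT: 0.1.P.2) from the text
print(chunk_id("0.0.0.0", "0.1.P.2"))
# -> 0|0.0|1.0|P.0|2
```

The same function also yields the directory chunk-ids of the text, e.g., `chunk_id("0.0", "0.1")` gives `0|0.0|1`.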

Next we will see how all these chunks can be stored into buckets provided by the underlying

bucket-based storage system.


Chapter 3: Building the CUBE File

3 Building the CUBE File

Any physical organization of data must determine how the data are distributed in disk pages. A CUBE File physically organizes its data by allocating its chunks into a set of buckets, the bucket being the I/O transfer unit in our case. In this chapter, we describe the chunk-to-bucket allocation strategy of the CUBE File.

We will formally define the problem of allocating the chunks of a hierarchically chunked

cube into a set of buckets in order to achieve a maximum degree of hierarchical clustering.

To this end, first we provide a general metric for quantifying the hierarchical clustering factor

of a specific allocation. Then we formalize the chunk to bucket allocation as an optimization

problem. We show that this problem is NP-Hard. Finally, we present a greedy algorithm

based on heuristics for solving this problem.

First, let's try to understand the objectives of such an allocation. As already stated, the primary goal is to achieve a high degree of hierarchical clustering. This statement, although clear, could still be interpreted in several different ways. What are the elements that can guarantee that a specific hierarchical clustering scheme is "good"? We list some next:

Elements of a "good" hierarchical clustering:

1. Efficient evaluation of queries containing restrictions on the dimension hierarchies.

2. Minimum storage cost.

3. High space utilization.

The most important goal of hierarchical clustering is to improve the response time of queries containing hierarchical restrictions. Therefore, the first element calls for a minimal I/O cost (i.e., bucket reads) for the evaluation of such restrictions. Of course, the storage overhead must also be minimized, and therefore the second element requires a minimum number of allocated buckets. Naturally, the best way to keep the storage cost low is to utilize the available space as much as possible. Therefore, the third element implies that the allocation must adapt well to the data distribution, e.g., more buckets must be allocated to densely populated areas and fewer to sparse ones. Also, buckets must be filled almost to capacity (i.e., imposing a high bucket-occupancy threshold).

In the following, we propose a metric for measuring the hierarchical clustering factor of an

allocation of chunks into buckets. Then we proceed to formally define the chunk to bucket

allocation problem and finally present a greedy algorithm for solving this problem.

3.1 Measuring Hierarchical Clustering

We believe that hierarchical clustering is the most important goal for a file organization for

OLAP cubes. However, the space of possible combinations of dimension hierarchy members is huge. Hence, the problem of linear clustering has a search space that explodes combinatorially very fast. To this end, we exploit the chunk-tree representation resulting from the hierarchical chunking of a cube and deal with the problem of hierarchical clustering as a problem of allocating chunks of the chunk-tree into disk buckets. Thus, we are not searching for a linear clustering (i.e., for a total ordering of the cube's data points); on the contrary, we are interested in the packing of chunks into buckets according to the criteria of good hierarchical clustering posed above.

The intuitive explanation for the utilization of the chunk-tree for achieving hierarchical clustering lies in the fact that the chunk-tree is built essentially based solely on the hierarchies' structure and content and not on some storage criterion (e.g., each node corresponding to a disk page); as a result, it embodies all possible combinations of hierarchical values. For example, the subtree hanging from the root-chunk in Figure 4 contains at the leaf level all the sales figures corresponding to the continent "Europe" (order code 0) and the product category "Books" (order code 0), and to any possible combination of the children members of the two. Therefore, each subtree in the chunk-tree corresponds to a "hierarchical family" of values and thus reduces the search space significantly. In the following, we regard the bucket as the storage unit. In this section, we attempt to define a metric for evaluating the degree of hierarchical clustering of different storage schemes in a quantitative way.

To move the discussion one step further, we defined above, as a crucial element of the quality of hierarchical clustering, the efficient evaluation of queries containing restrictions on the dimension hierarchies. Let us try to define the most usual restrictions appearing in OLAP queries. We call them hierarchical prefix path restrictions (HPP restrictions); they are the most common type of restriction in typical OLAP queries, basically because the core of the analysis is conducted along the hierarchies. Note that an HPP restriction, apart from forming a stand-alone query, more often appears as part of a larger query (typically in the WHERE clause of an SQL statement requiring grouping and aggregations on the cube data). An HPP restriction describes the data set that must be retrieved from the most detailed data in order to compute a final consolidated result. Therefore, HPP restrictions lie at the heart of any ad hoc OLAP query.

Definition 1 (Hierarchical Prefix Path Restriction)

We define a hierarchical prefix path restriction (HPP restriction) on a hierarchy H of a di-

mension D, to be a set of equality restrictions linked by conjunctions on H’s levels that form a

path in H, which always includes the topmost (most aggregated) level of H.

For example, if we consider a dimension LOCATION consisting of a 3-level hierarchy (Region/Area/City) and a DATE dimension with a 3-level hierarchy (Year/Month/Day), then the query "show me sales for area A in region B for each month of 1999" contains two HPP restrictions, one for the dimension LOCATION and one for DATE: (a) LOCATION.region = 'A' AND LOCATION.area = 'B', and (b) DATE.year = 1999.
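A simple way to picture Definition 1 is as a prefix check on the restricted levels of a hierarchy. The sketch below is an illustrative helper (hierarchy and level names are from the example above, the function itself is an assumption):

```python
# Sketch of Definition 1: a set of equality-restricted levels forms an HPP
# restriction iff those levels are a prefix of the hierarchy, i.e., they form
# a path that always includes the topmost (most aggregated) level.

def is_hpp_restriction(hierarchy, restricted_levels):
    """hierarchy: level names from most aggregated to most detailed.
    restricted_levels: set of level names carrying equality restrictions."""
    prefix_len = len(restricted_levels)
    return set(hierarchy[:prefix_len]) == set(restricted_levels)

location = ["Region", "Area", "City"]
print(is_hpp_restriction(location, {"Region", "Area"}))  # a prefix path
# -> True
print(is_hpp_restriction(location, {"Area", "City"}))    # topmost level missing
# -> False
```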

Consequently, we can now define the class of HPP queries:

Definition 2 (Hierarchical Prefix Path Query)

We call a query Q on a cube C a hierarchical prefix path query (HPP query), if and only if all

the restrictions imposed by Q on the dimensions of C are HPP restrictions, which are linked

together by conjunctions.

In the previous example, the query was an HPP query because its restrictions were of the form (LOCATION.region = 'A' AND LOCATION.area = 'B') AND DATE.year = 1999, i.e., two HPP restrictions on dimensions LOCATION and DATE respectively, linked by a logical AND. Note that it is not necessary to have HPP restrictions on all the dimensions; however, whatever restrictions do appear must be HPP restrictions for the definition to hold.


Now, let's assume a hierarchically chunked cube CB of N dimensions, represented by a chunk-tree CT of maximum chunking depth DMAX. Further assume that each dimension i (1 ≤ i ≤ N) consists of a hierarchy of the form (h_1^i, h_2^i, …, h_ki^i), where h_j^i denotes the j-th level of dimension i, with the hierarchical levels appearing from the most aggregated to the most detailed from left to right. At any chunk CH of depth d (where 0 ≤ d ≤ DMAX), a cell c is represented by the N-tuple c = (o_1, o_2, …, o_N), where o_i (1 ≤ i ≤ N) is the corresponding order-code on dimension i. With the notation (*, o_2, …, o_N) we denote all the cells of the chunk that lie along the hyper-plane that is parallel to the 1st dimension and perpendicular to the others. Similarly, we can specify the cells on any other hyper-plane of the chunk. Finally, with (*, *, …, *) we denote all the cells of the chunk.

The restrictions of an HPP query Q on CB can be represented as follows:

(h_1^1 = c_1^1 AND h_2^1 = c_2^1 AND … h_p^1 = c_p^1) AND … AND (h_1^N = c_1^N AND h_2^N = c_2^N AND … h_q^N = c_q^N),

where the c_j^i's are constant values.

Therefore, for an HPP query Q, at each chunking depth d (where 0 ≤ d ≤ DMAX), which corresponds to the hierarchical levels h_{d+1}^1, h_{d+1}^2, …, h_{d+1}^N, an equality restriction may or may not appear on the corresponding hierarchical level. If one appears, then along this dimension we only need to access those cells whose order-code qualifies the restriction; otherwise, all the cells along this dimension must be accessed. Therefore, for each chunking depth of CT we can construct in a straightforward way an "access pattern" using the HPP restrictions of Q and the cell notation described above. For example, the following HPP restrictions:

(h_1^1 = c_1^1) AND (h_1^2 = c_1^2 AND h_2^2 = c_2^2) AND … (h_1^N = c_1^N AND h_2^N = c_2^N)

would have an access pattern that could be represented like this:

Depth 0: (c_1^1, c_1^2, …, c_1^N)
Depth 1: (*, c_2^2, …, c_2^N)
Depth 2: (*, *, …, *)
…
Depth DMAX: (*, *, …, *)
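The construction of this access pattern can be sketched mechanically; the representation of restrictions as per-dimension lists of restricted order-codes is an assumption for illustration:

```python
# Sketch: deriving the per-depth access pattern of an HPP query from its
# restrictions. Each dimension contributes the list of restricted order-codes
# from the topmost level downwards; levels without a restriction mean that all
# cells along that dimension must be accessed ('*').

def access_pattern(d_max, *restrictions):
    """restrictions: per dimension, the restricted order-codes from the
    topmost level downwards (the list may be shorter than d_max)."""
    pattern = []
    for d in range(d_max + 1):
        row = tuple(r[d] if d < len(r) else "*" for r in restrictions)
        pattern.append(row)
    return pattern

# h_1^1 = c11 on dim 1; h_1^2 = c12 AND h_2^2 = c22 on dim 2
for row in access_pattern(3, ["c11"], ["c12", "c22"]):
    print(row)
# depth 0: ('c11', 'c12'); depth 1: ('*', 'c22'); depths 2..3: ('*', '*')
```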

Intuitively, in order to evaluate the HPP restrictions on the chunk-tree CT, at each depth we have to follow the pointers specified by the corresponding pattern, down to the chunks of greater depth, until we reach the data chunks (at DMAX), where the desired data reside. Clearly, a hierarchical clustering scheme that respects the quality element of efficient evaluation of HPP queries posed at the beginning of this chapter must ensure that the subtrees hanging under a specific chunk can be accessed with a minimal number of bucket reads. Intuitively, if we could store whole subtrees in each bucket (instead of single chunks), this would result in a better hierarchical clustering, since all the restrictions on the specific subtree, as well as on any of its children subtrees, would be evaluated with a single bucket I/O. For example, the subtree hanging from the root-chunk in Figure 4, as mentioned earlier, contains at the leaf level all the sales figures corresponding to the continent "Europe" (order code 0) and the product category "Books" (order code 0). By storing this tree in a single bucket, we can answer all queries containing hierarchical restrictions on the combination "Books" and "Europe", and on any children members of these two, with just a single disk I/O.

Therefore, each subtree in this chunk-tree corresponds to a "hierarchical family" of values. Moreover, the smaller the chunking depth of a subtree is, the more value combinations it embodies. Intuitively, we can say that the hierarchical clustering degree of a storage organization for a cube can be assessed by the degree to which whole low-depth chunk subtrees are stored in each storage unit. Next, we exploit this intuitive criterion to define the hierarchical clustering degree of a bucket (HCDB). We begin with a number of auxiliary definitions:

Definition 3 (Bucket-Region)

Assume a hierarchically chunked cube represented by a chunk-tree CT of a maximum chunk-

ing depth DMAX. A group of chunk-trees of the same depth having a common parent node,

which are stored in the same bucket, comprises a bucket-region.

Definition 4 (Region contribution of a tree stored in a bucket – cr)

Assume a hierarchically chunked cube represented by a chunk-tree CT of maximum chunking depth DMAX. We define the region contribution cr of a tree t of depth d that is stored in a bucket B to be the total number of trees in the bucket-region that t belongs to, divided by the total number of trees of the same depth in the whole chunk-tree CT. This is then multiplied by a bucket-region proximity factor rP, which expresses the proximity of the trees of a bucket-region in the multidimensional space:


c_r ≡ ( treeNum(d, B) / treeNum(d, CT) ) · r_P

where

treeNum(d, B): total number of subtrees of depth d stored in B,

treeNum(d, CT): total number of subtrees of depth d in CT, and

r_P: bucket-region proximity (0 < r_P ≤ 1).

Definition 5 (Depth contribution of a tree stored in a bucket – cd)

Assume a hierarchically chunked cube represented by a chunk-tree CT of a maximum chunk-

ing depth DMAX. We define as the depth contribution cd of a tree t of depth d that is stored in a

bucket B, to be the ratio of d to DMAX.

c_d ≡ d / D_MAX

Next, we provide the definition for the hierarchical clustering degree of a bucket:

Definition 6 (Hierarchical Clustering Degree of a Bucket – HCDB)

Assume a hierarchically chunked cube represented by a chunk-tree CT of a maximum chunk-

ing depth DMAX. For a bucket B containing T whole subtrees {t1, t2 … tT} of chunking depths

{d1, d2 … dT} respectively, where none of these subtrees is a subtree of another, we define as

the Hierarchical Clustering Degree of bucket B to be the ratio of the sum of the region contri-

bution of each tree ti (1≤ i ≤T) included in B to the sum of the depth contribution of each tree ti

(1≤ i ≤T), multiplied by the bucket occupancy OB, where 0 ≤ OB ≤ 1.

HCD_B ≡ ( Σ_{i=1..T} c_r^i / Σ_{i=1..T} c_d^i ) · O_B   (1)

where c_r^i is the region contribution of tree t_i, and c_d^i is the depth contribution of tree t_i (1 ≤ i ≤ T).
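Definition 6 translates directly into a small computation. The sketch below is illustrative: the treeNum values, depths and occupancy are invented numbers, not taken from the running example:

```python
# Sketch of Definition 6: hierarchical clustering degree of a bucket.
# Each stored tree contributes c_r = region_size / treeNum(d, CT) * r_P and
# c_d = d / D_MAX; HCD_B = (sum of c_r / sum of c_d) * occupancy.

def hcd_bucket(trees, tree_num_ct, d_max, occupancy):
    """trees: list of (depth, region_size, proximity) per stored whole subtree,
    where region_size = treeNum(d, B) for the tree's bucket-region.
    tree_num_ct: dict depth -> total number of subtrees of that depth in CT."""
    sum_cr = sum(region / tree_num_ct[d] * rp for d, region, rp in trees)
    sum_cd = sum(d / d_max for d, _, _ in trees)
    return sum_cr / sum_cd * occupancy

# two whole subtrees of depth 2 in the same bucket-region, perfect proximity,
# out of 8 depth-2 subtrees in CT; bucket filled to 90%
trees = [(2, 2, 1.0), (2, 2, 1.0)]
print(hcd_bucket(trees, {2: 8}, d_max=4, occupancy=0.9))
# -> 0.45
```

Note that the result stays below D_MAX = 4, in line with Theorem 1 below.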

In this definition, we have assumed that the chunking depth di of a chunk-tree ti is equal to the chunking depth of the root-chunk of this tree. Of course, we assume that the depth values have been normalized so that the depth of the chunk-tree CT is 1 instead of 0, in order to avoid zero depths in the denominator of equation (1). Furthermore, data

order to avoid having zero depths in the denominator of equation (1). Furthermore, data

chunks are considered as chunk-trees with a depth equal to the maximum chunking depth of

the cube. Note that directory chunks stored in a bucket not as part of a subtree but isolated,

have a zero region contribution; therefore, buckets that contain only such directory chunks

have a zero degree of hierarchical clustering.

From equation (1), we can see that the more subtrees (instead of single chunks) are included in a bucket, the greater the hierarchical clustering degree of the bucket becomes, because more HPP restrictions can be evaluated solely with this bucket. Also, the higher these trees are (i.e., the smaller their chunking depth is), the greater the hierarchical clustering degree of the bucket becomes, since more combinations of hierarchical attributes are "covered" by this bucket. Moreover, the more trees of the same depth hanging under the same parent node we store in a bucket, the greater the hierarchical clustering degree of the bucket becomes, since we include more combinations of the same path in the hierarchy.

The region contribution c_r^i of each tree t_i (1 ≤ i ≤ T) to the hierarchical clustering degree of the bucket denotes the percentage of trees at a specific depth that a bucket-region covers. Therefore, the greater this percentage is, the greater the hierarchical clustering degree of the corresponding bucket becomes, since more combinations of the hierarchy members will be clustered in the same bucket. This calls for large bucket-regions of low-depth trees, because at low depths the total number of CT subtrees is small. Notice also that the region contribution includes a bucket-region proximity factor rP which, as we have noted, expresses the proximity of the trees of a bucket-region in the multidimensional space. We will see the effects of this factor and its definition in more detail in a following subsection, where we discuss the formation of the bucket-regions.

The depth contribution c_d^i of each tree t_i (1 ≤ i ≤ T) to the hierarchical clustering degree of the bucket expresses the ratio of the depth of the tree to the maximum chunking depth. The smaller this ratio is (i.e., the lower the depth of the tree), the greater the hierarchical clustering degree of the corresponding bucket becomes. Intuitively, the depth contribution expresses the percentage of the number of nodes in the path from the root-chunk to the bucket in question, and thus the smaller it is, the smaller the I/O cost to access this bucket. Alternatively, we could substitute the depth value in the numerator of the depth contribution with the number of buckets in the path from the root-chunk to the bucket in question (with the latter included).


All in all, the HCDB metric favors the following storage choices for a bucket:

• Whole trees instead of single chunks or other data partitions.

• Smaller depth trees instead of greater depth ones.

• Tree regions instead of single trees.

• Regions with a few low-depth trees instead of ones with more trees of greater depth.

• Regions with trees of the same depth that are close in the multidimensional space instead of dispersed trees.

• Buckets with a high occupancy.

We prove the following theorem regarding the maximum value of the hierarchical clustering

degree of a bucket:

Theorem 1: Maximum hierarchical clustering degree of a bucket

Assume a hierarchically chunked cube represented by a chunk-tree CT of maximum chunking depth DMAX, which has been allocated to a set of buckets. Then, for any such bucket B, it holds that:

HCD_B ≤ D_MAX

Proof:

From the definition of the region contribution of a tree (Definition 4), we can easily deduce that:

c_r^i ≤ 1   (I)

This means that the following holds:

Σ_{i=1..T} c_r^i ≤ T   (II)

In (II), T stands for the number of trees stored in B. Similarly, from the definition of the depth contribution of a tree (Definition 5), we can easily deduce that:

c_d^i ≥ 1/D_MAX   (III)

since the smallest possible depth value is 1. This means that the following holds:

Σ_{i=1..T} c_d^i ≥ T/D_MAX   (IV)

From (II), (IV), equation (1), and assuming that B is filled to its capacity (i.e., O_B equals 1),


the theorem is proved.
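The bound can also be sanity-checked numerically (an illustrative check, not a substitute for the proof): generating random buckets whose contributions respect inequalities (I) and (III) never yields an HCD_B above D_MAX.

```python
# Numeric sanity check of Theorem 1 (illustrative): for randomly generated
# buckets whose c_r and c_d values satisfy (I) and (III), HCD_B <= D_MAX.
# Depths are normalized to start at 1, as assumed in the text.

import random

random.seed(7)
D_MAX = 4
for _ in range(1000):
    t = random.randint(1, 10)                                  # trees in bucket
    cr = [random.uniform(0.0, 1.0) for _ in range(t)]          # c_r <= 1   (I)
    cd = [random.randint(1, D_MAX) / D_MAX for _ in range(t)]  # c_d >= 1/D_MAX (III)
    ob = random.uniform(0.0, 1.0)                              # bucket occupancy
    hcd = sum(cr) / sum(cd) * ob
    assert hcd <= D_MAX
print("HCD_B <= D_MAX held in all trials")
```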

It is easy to see that the maximum degree of hierarchical clustering of a bucket B is achieved only in the ideal case where we store the chunk-tree CT that represents the whole cube in B and CT fits exactly in B.¹ In this case, all of our primary goals for a good hierarchical clustering posed at the beginning of this chapter, namely the efficient evaluation of HPP queries, the low storage cost and the high space utilization, are achieved. This is because all possible HPP restrictions can be evaluated with a single bucket read (one I/O operation), and the achieved space utilization is maximal (full bucket) with a minimal storage cost (just one bucket). Moreover, it is now clear that the hierarchical clustering degree of a bucket signifies to what extent the chunk-tree representing the cube has been "packed" into the specific bucket, and this is measured in terms of the chunking depth of the tree.

By trying to create buckets with a high HCDB, we can guarantee that our allocation respects these elements of good hierarchical clustering. Furthermore, it is now straightforward to define a metric for measuring the overall degree of hierarchical clustering achieved by any chunk-to-bucket allocation. Even more interestingly, this metric can be used for measuring the degree of hierarchical clustering of any storage scheme that clusters data hierarchically based on path-based surrogate keys (see §5.4 for a summary of such methods), provided that sufficient information is available pertaining to the allocation of data into disk pages.

Definition 7 (Hierarchical Clustering Factor of a Physical Organization for a Cube – fHC)

For a physical organization that stores the data of a cube into a set of N_B buckets, we define the hierarchical clustering factor fHC as the percentage of hierarchical clustering achieved by this storage organization, resulting from the sum of the hierarchical clustering degrees of the individual buckets divided by the total number of buckets, and we write:

f_HC ≡ ( Σ_{B=1..N_B} HCD_B / (N_B · D_MAX) ) · 100%   (2)

¹ Indeed, a bucket with HCD_B = D_MAX would mean that the depth contribution of each tree in this bucket should be equal to 1/D_MAX (according to inequality (III)); however, this is only possible for the whole chunk-tree CT, since only this tree has a depth equal to 1.

Note that N_B is the total number of buckets used in order to store the cube; however, only the


buckets that contain at least one whole chunk-tree have a non-zero HCDB value. Therefore,

allocations that spend more buckets on storing whole subtrees have a higher hierarchical clustering factor than allocations which favor, e.g., single directory chunk placements. From equation (2), it is

clear that even if we have two different allocations of a cube that result in the same total

HCDB of individual buckets, the one that occupies the smaller number of buckets will have

the greater fHC, thus rewarding the allocations that use the available space more conserva-

tively.

Another way of viewing the fHC is as the average HCDB for all the buckets divided by the

maximum chunking depth. It is now clear that it expresses the percentage of the extent by

which the chunk-tree representing the whole cube has been “packed” into the set of the NB

buckets. Directly from Theorem 1, it follows that this factor is maximized (i.e., equals 100%),

if and only if we store the whole cube (i.e., the chunk-tree CT) into a single bucket, which

corresponds to a perfect hierarchical clustering for a cube.
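As a minimal sketch (assuming the per-bucket HCDB values have already been computed from equation (1); the function name and plain-list input are illustrative, not part of the CUBE File implementation), equation (2) can be coded directly:

```python
def hierarchical_clustering_factor(hcd_values, d_max):
    """f_HC per equation (2): the per-bucket hierarchical clustering
    degrees HCD_B, summed over all N_B buckets, normalized by
    N_B * D_MAX and expressed as a percentage."""
    n_b = len(hcd_values)  # total number of buckets storing the cube
    return sum(hcd_values) / (n_b * d_max) * 100.0

# Storing the whole chunk-tree CT in a single bucket gives
# HCD_B = D_MAX, hence f_HC = 100% (perfect hierarchical clustering):
print(hierarchical_clustering_factor([5.0], d_max=5))  # 100.0
```

Note how buckets holding no whole subtree contribute 0 to the numerator while still enlarging the denominator, which is exactly why space-conservative allocations score higher.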

3.2 The HPP Chunk-to-Bucket Allocation Problem

In this subsection, we exploit the hierarchical clustering factor that we described previously,

for measuring the hierarchical clustering ability of a storage organization, in order to define

the chunk-to-bucket allocation problem as an optimization problem. Later on, we also prove that this problem is NP-Hard, and we provide a greedy algorithm based on heuristics for solving it.

Definition 8 (The HPP Chunk-to-Bucket Allocation Problem)

For a hierarchically chunked cube C, represented by a chunk-tree CT with a maximum

chunking depth of DMAX, find an allocation of the chunks of CT into a set of fixed-size buckets

that corresponds to a maximum hierarchical clustering factor fHC. As input, we are given:

- The storage cost of CT and any of its subtrees t (function cost(t)),

- the number of subtrees per depth d in CT (function treeNum(d)),

- the bucket size SB and

- a bucket of special size (SROOT) consisting of β consecutive simple buckets, called root-

bucket BR, where SROOT = β·SB, with β ≥ 1 and β = ⌈cost(BR)/SB⌉. Essentially, BR represents the set of buckets that contain no whole subtrees and thus have a zero HCDB.


The solution S that we provide for this problem is a set of K buckets, S = {B1, B2 … BK}, so

that each bucket contains at least one subtree of CT and a root-bucket BR that contains all the

remaining part of CT (the part containing no whole subtrees). S must result in a maximum value for the fHC

factor for the given bucket size SB. Since the HCDB values of the buckets of the root-bucket

BR equal zero (recall that they contain no whole subtrees), it follows from equation (2) that the

fHC can be expressed as:

fHC = ( Σ(i=1..K) HCDB(i) ) / ( (K + β) · DMAX ) · 100%        (3)

From equation (3), it is clear that the more buckets we allocate for the root-bucket (i.e., the greater β becomes), the lower will be the degree of hierarchical clustering achieved by our allo-

cation. Alternatively, if we consider caching the whole root-bucket in main memory (see fol-

lowing discussion in §3.2.3), then we could assume that β does not affect hierarchical cluster-

ing (since it does not introduce more bucket I/Os from the root-chunk to a simple bucket) and

could be zeroed.
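A sketch of equation (3) under the same assumptions (illustrative names; the HCDB inputs are the K buckets that hold at least one whole subtree, and beta counts the simple buckets of the root-bucket). Setting beta = 0 models the cached-root-bucket case just discussed:

```python
def f_hc(hcd_values, beta, d_max):
    """f_HC per equation (3): K buckets with whole subtrees plus
    beta root-bucket buckets whose HCD_B is zero by definition."""
    k = len(hcd_values)
    return sum(hcd_values) / ((k + beta) * d_max) * 100.0

# The same K buckets score lower as the root-bucket grows:
print(f_hc([2.0, 1.0, 1.0], beta=1, d_max=5))  # 20.0
print(f_hc([2.0, 1.0, 1.0], beta=3, d_max=5))  # lower
print(f_hc([2.0, 1.0, 1.0], beta=0, d_max=5))  # cached root-bucket: higher
```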

Consequently, we now have the optimization problem of finding a chunk-to-bucket allocation

such that fHC is maximized. This problem is NP-Hard (see following Theorem 2). We proceed

next by providing a greedy algorithm based on heuristics for solving this problem. In Figure

5, we present the GreedyPutChunksIntoBuckets algorithm, which receives as input the root R

of a chunk-tree CT and the fixed size SB of a bucket. The output of this algorithm is a set of

buckets containing at least one whole chunk-tree, a directory chunk entry pointing at the root

chunk R and the root-bucket BR.

In each step the algorithm tries “greedily” to make an allocation decision that will maximize

the HCDB of the current bucket. For example, in lines 2 to 7 of Figure 5, the algorithm tries to

store the whole input tree in a single bucket thus aiming at a maximum degree of hierarchical

clustering for the corresponding bucket. If this fails, then it allocates the root R to the root-

bucket and tries to achieve a maximum HCDB by allocating the subtrees at the next depth,

i.e., the children of R (lines: 9-26).


//Input:  Root R of a chunk-tree CT, bucket size SB
//Output: Updated R, list of allocated buckets BuckList,
//        root-bucket BR, directory entry dirEnt pointing at R
 0: GreedyPutChunksIntoBuckets(R, SB) {
 1:   List buckRegion  // bucket-region candidates list
 2:   IF (cost(CT) < SB) {
 3:     Allocate new bucket Bn
 4:     Store CT in Bn
 5:     dirEnt = addressOf(R)
 6:     RETURN
 7:   }
 8:   //R will be stored in the root-bucket BR
 9:   IF (R is a directory chunk) {
10:     FOR EACH child subtree CTc of R {
11:       IF (CTc is empty) {
12:         Mark the corresponding entry of R with an empty tag
13:       }
14:       IF (cost(CTc) ≤ SB) {
15:         //insert CTc into the list of bucket-region candidates
16:         buckRegion.push(CTc)
17:       }
18:     }
19:     IF (buckRegion != empty) {
20:       //formulate the bucket-regions
21:       formBucketRegions(buckRegion, BuckList, R)
22:     }
23:     WHILE (there is a child CTc : cost(CTc) > SB) {
24:       GreedyPutChunksIntoBuckets(root(CTc), SB)
25:       Update the corresponding entry of R for CTc
26:     }
27:     Store R in the root-bucket BR
28:     dirEnt = addressOf(R)
29:   }
30:   ELSE {  //R is a data chunk and cost(R) > SB
31:     Artificially chunk R, creating a 2-level chunk-tree CTA
32:     GreedyPutChunksIntoBuckets(root(CTA), SB)
33:     //storage of R will be taken care of by the previous call
34:     dirEnt = addressOf(root(CTA))
35:   }
36:   RETURN
37: }

Figure 5: A greedy algorithm for the HPP chunk-to-bucket allocation problem.

This is essentially achieved by including all direct children subtrees with size less than (or

equal to) the size of a bucket (SB) into a list of candidate trees for inclusion into bucket-

regions (buckRegion) (lines: 14-16). Then the routine formBucketRegions is called

upon this list and tries to include the corresponding trees in a minimum set of buckets, by

forming bucket-regions to be stored in each bucket, so that each one achieves the maximum

possible HCDB (lines: 19-22). We will come back to this routine and discuss how it solves

this problem in the next subsection. Finally, for the children subtrees of root R with storage cost greater than the size of a bucket, we recursively try to solve the corresponding HPP chunk-to-bucket allocation subproblem for each one of them (lines: 23-26).

[Figure: a chunk-tree with DMAX = 5; the label of each node (65 at the root, D = 1; 40 and 22 at D = 2; 10, 20, 5, 5, 5, 2 and 3 in the subtrees below) is the storage cost of the corresponding subtree.]

Figure 6: A chunk-tree to be allocated to buckets by the greedy algorithm.

It is also very important that no space is allocated for empty subtrees (lines: 11-13);

only a special entry is inserted in the parent node to denote a NULL subtree. Therefore, the

allocation performed by the greedy algorithm adapts perfectly to the data distribution, coping

effectively with the native sparseness of the cube.

The recursive calls might lead us eventually all the way down to a data chunk (at depth

DMAX). Indeed, if the GreedyPutChunksIntoBuckets is called upon a root R, which is

a data chunk, then this means that we have come upon a data chunk with size greater than the

bucket size. This is called a large data chunk; a more detailed discussion on how to handle such chunks will follow in a later subsection. For now, it is enough to say that in order to resolve

the storage of such a chunk we extend the chunking further (with a technique called artificial

chunking) in order to transform the large data chunk into a 2-level chunk tree. Then, we solve

the HPP chunk-to-bucket subproblem for this tree (lines: 30-35). The termination of the algo-

rithm is guaranteed by the fact that each recursive call deals with a subproblem on a chunk-tree smaller in size than that of the parent problem. Thus, the size of the input chunk-tree is continu-

ously reduced.

Assume the chunk-tree of DMAX = 5 of Figure 6. The numbers inside each node represent the

storage cost for the corresponding subtree, e.g., the whole chunk-tree has a cost of 65 units.

For a bucket size SB = 30 units the greedy algorithm yields a hierarchical clustering factor fHC

= 7.2%. The corresponding allocation is depicted in Figure 7.


[Figure: the chunk-tree of Figure 6 with SB = 30; rectangles mark the three buckets B1, B2 and B3 enclosing the allocated subtrees, while the remaining chunks go to the root-bucket.]

Figure 7: The chunk-to-bucket allocation for SB = 30.

The solution comprises three buckets B1, B2 and B3, depicted as rectangles in the figure. The

bucket with the highest clustering degree (HCDB) is B3, because it includes the lowest depth

tree. The chunks not included in a rectangle will be stored in the root-bucket. In this case, the

root-bucket consists of only a single bucket (i.e., β = 1 and K = 3, see equation (3)), since this

suffices for storing the corresponding two chunks.
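As a quick consistency check on this example (a sketch; the individual HCDB values of B1, B2 and B3 are not listed in the text, only the sum implied by equation (3)):

```python
# Equation (3) rearranged: the sum of the HCD_B values of the K buckets
# equals f_HC/100 * (K + beta) * D_MAX.
k, beta, d_max, f_hc_pct = 3, 1, 5, 7.2
hcd_sum = f_hc_pct / 100.0 * (k + beta) * d_max
print(round(hcd_sum, 2))  # 1.44
```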

3.2.1 Formation of Bucket-Regions

We have seen that in each step of the greedy algorithm for solving the HPP Chunk-to-bucket

allocation problem (corresponding to an input chunk-tree with a root node at a specific

chunking depth), we try to store all the sibling trees hanging from this root into a set of buckets,

forming this way “groups” of trees to be stored in each bucket that we call bucket-regions.

The formation of bucket-regions is essentially a special case of the HPP Chunk-to-bucket al-

location problem and can be described as follows:

Definition 9 (The bucket-region formation problem)

We are given a set of N chunk trees T1, T2, … TN, of the same chunking depth d. Each tree Ti

(1 ≤ i ≤ N) has a size: cost(Ti) ≤ SB, where SB is the bucket size. The problem is to store these

trees into a set of buckets, so that the hierarchical clustering factor fHC of this allocation is

maximized.

Since all the trees are of the same depth, the depth contribution cdi (1 ≤ i ≤ N), defined in

equation (1), is the same for all trees. Therefore, in order to maximize the degree of hierarchical clustering HCDB for each individual bucket (and thus also increase the hierarchical

clustering factor fHC), we have to maximize the region contribution cri (1 ≤ i ≤ N) of each tree

(equation (1)). This happens when we create bucket-regions with as many trees as possible on

the one hand and, due to the region proximity factor rP, when the trees of each region are as

close as possible in the multidimensional space, on the other. Finally, according to the fHC

definition the number of buckets used must be the smallest possible. Summarizing, in the

bucket-region formation problem we seek a set of buckets to store the input trees, in order for

the following three criteria to be fulfilled:

1. The bucket-regions (i.e., each bucket) contain as many trees as possible.

2. The trees of a region are as close in the multidimensional space as possible.

3. The total number of buckets is minimum.

One could observe that if we focused only on the third criterion, i.e., on the goal of minimiz-

ing the number of buckets used, then the bucket-region formation problem would be trans-

formed into a typical bin-packing problem, which is a well-known NP-complete problem

[Wei95]. This essentially proves the following theorem.

Theorem 2: Complexity of the HPP chunk-to-bucket allocation problem

The HPP Chunk-to-Bucket allocation problem is NP-Hard.

Proof

If we restrict the HPP Chunk-to-Bucket allocation problem to the case of allocating the chunks of a number of trees hanging from the same root (i.e., having the same depth), where each tree has a size less than or equal to the size of a bucket and is allocated as a whole to a bucket, then the problem becomes a bucket-region formation problem. If the latter is further restricted to the case where the only goal is to minimize the total number of buckets used, then we have a typical bin-packing problem, which is NP-complete. Therefore, bin-packing (and hence any problem in NP) reduces in polynomial time to the HPP Chunk-to-Bucket allocation problem, and thus the latter is NP-Hard.

The space proximity of the trees of a region is meaningful only when we have dimension

domains with inherent orderings. A typical example is the TIME dimension. For example, we

might have trees corresponding to the months of the same year (which guarantees hierarchi-

cal proximity) but we would also like the consecutive months to be in the same region (space

proximity). This is because these dimensions are the best candidates for expressing range

predicates (e.g., months from FEB99 to AUG99). Otherwise, when there isn’t such an inher-

ent ordering, e.g., a chunk might point to trees corresponding to products of the same cate-

gory along the PRODUCT dimension, space proximity is not important and therefore all re-

gions with the same number of trees are of equal value. In this case the corresponding predi-

cates are typically set inclusion predicates (e.g., products IN {“Literature”, “Philosophy”,

“Science”}) and not range predicates, so hierarchical proximity alone suffices to ensure a

low I/O cost. To measure the space proximity of the trees in a bucket-region we use the re-

gion proximity rP, which we define as follows:

Definition 10 (Region Proximity rP)

We define the region proximity rP of a bucket-region R defined in a multidimensional space

S, where all dimensions of S have an inherent ordering, as the relative distance of the average Euclidean distance between all trees of the region R from the longest distance in S:

rP ≡ (dMAX − dAVG) / dMAX

In the case where no dimension of the cube has an inherent ordering, we assume that the average distance for any region is zero and thus the region proximity rP equals one. For

example, in Figure 8 we depict two different bucket-regions R1 and R2. The surrounding

chunk represents the subcube corresponding to the months of a specific year and the types of

a specific product category. Since only the TIME dimension, among the two, includes an inherent ordering of its members, the data space, as far as the region proximity is concerned, is specified by TIME only. By a simple substitution of the corresponding values in Definition 10, we find that the region proximity for R1 equals 0.9, while for R2 it equals 0.5. This is be-

cause the trees of the latter are more dispersed along the time dimension. Therefore region R1

exhibits a better space proximity than R2.
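Definition 10 translates directly into code. The sketch below uses a 1-D TIME axis with order codes 0–11 and illustrative member positions, not the exact trees of Figure 8:

```python
import math
from itertools import combinations

def region_proximity(points, d_max):
    """r_P = (d_MAX - d_AVG) / d_MAX, where d_AVG is the average
    Euclidean distance over all pairs of trees in the region and
    d_MAX is the longest distance in the data space S."""
    pairs = list(combinations(points, 2))
    d_avg = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return (d_max - d_avg) / d_max

# Only TIME has an inherent ordering, so the space is 1-D (d_max = 11).
tight = [(2,), (3,), (4,)]    # three consecutive months
spread = [(0,), (5,), (11,)]  # three dispersed months
print(region_proximity(tight, 11) > region_proximity(spread, 11))  # True
```

A region of consecutive months stays close to 1, while a dispersed one drops toward 0, matching the rP1 > rP2 ranking of Figure 8.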

In order to tackle the region formation problem we propose an algorithm called FormBuck-

Regions. This algorithm is a variation of an approximation algorithm called best-fit [Wei95]

for solving the bin-packing problem. Best-fit is a greedy algorithm that does not always find the optimal solution; however, it runs in P-time (it can also be implemented to run in O(NlogN), N

being the number of trees in the input), and provides solutions that are within a certain bound from the optimal solution. Actually, the best-fit solution in the worst case is never more

than roughly 1.7 times worse than the optimal solution [Wei95]. Moreover, our algorithm exploits a space-filling curve [Sag94] in order to visit the trees in a space-proximity-preserving

way. We describe it next:

[Figure: a chunk over the months of year 1999 (order codes 0–11) and the types in category “Books” (Literature, Philosophy, Computers, ScienceFiction), with two bucket-regions R1 and R2 marked.]

Figure 8: The region proximity for two bucket-regions: rP1 > rP2.

FormBuckRegions

Traverse the input set of trees along a space-filling curve SFC on the data space defined by

the parent chunk. Each time you process a tree, insert it in the bucket that will yield the

maximum HCDB value among the allocated buckets after the insertion. In case of a tie, choose one

randomly. If no bucket can accommodate the current tree, then allocate a new bucket and

insert the tree in it.
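A minimal sketch of the best-fit step of FormBuckRegions. It assumes the tree sizes arrive already ordered along the space-filling curve, and it uses remaining bucket capacity as a stand-in for the “maximum HCDB after insertion” criterion (for same-depth trees, packing a bucket more tightly is what raises its HCDB); the names are illustrative:

```python
def form_buck_regions(tree_sizes, s_b):
    """Visit the trees in SFC order and place each one into the
    allocated bucket where it fits most tightly (best fit),
    opening a new bucket only when no existing bucket can hold it."""
    buckets = []  # each bucket-region is a list of tree sizes
    for size in tree_sizes:
        fitting = [b for b in buckets if sum(b) + size <= s_b]
        if fitting:
            # tightest fit = least free space left after the insertion
            best = min(fitting, key=lambda b: s_b - (sum(b) + size))
            best.append(size)
        else:
            buckets.append([size])  # allocate a new bucket
    return buckets

print(form_buck_regions([5, 5, 2, 3, 10, 20], s_b=30))
# [[5, 5, 2, 3, 10], [20]]
```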

Note that there is no linearization of multidimensional data points that preserves space proximity 100% [GG98, Jag90]. In the case where no dimension has an inherent ordering, the

space-filling curve might be a simple row-wise traversal (see Figure 9). In this figure, we also

depict the corresponding bucket-regions that are formed.

We believe that a formation of bucket-regions that will provide an efficient clustering of

chunk-trees must be based on some query patterns. In the following we show an example of

such a query-pattern driven formation of bucket-regions.

A hierarchy level of a dimension can basically take part in an OLAP query in two ways: (a)

as a means of restriction (e.g., year = 2000), or (b) as a grouping attribute (e.g. “show me

sales grouped by month”). In the former, we ask for values on a hyper-plane of the cube per-

pendicular to the Time dimension at the restriction point, while in the latter we ask for values


on hyper-planes that are parallel to the Time dimension. In other words, if we know for a di-

mension level that it is going to be used by the queries more often as a restriction attribute,

then we should try to create regions perpendicular to this dimension. Similarly, if we know

that a level is going to be used more often as a grouping attribute, then we should opt for re-

gions that are parallel to this dimension. Unfortunately, things are not so simple, because if,

for example, we have two “restriction levels” from two different dimensions, then the re-

quirement for vertical regions to the corresponding dimensions is contradictory.

Figure 9: A row-wise traversal of the input trees.

In Figure 10, we depict a bucket-region formation that is driven by the table appearing in the

figure. In this table we note for each dimension level corresponding to a chunking depth,

from our example cube in Figure 2, whether it should be characterized as a restriction level or

as a grouping level. For instance, a user might know that 80% of the queries referencing level

continent will apply a restriction on it and only 20% will use it as a grouping attribute,

thus this level will be characterized as a “restriction level”. Furthermore, in the column la-

beled “importance order”, we order the different levels of the same depth according to their

importance in the expected query load. For example, we might know that the category

level will appear much more often in queries than the continent level and so on.

In Figure 10, we also depict a representative chunk for each chunking depth (of course for the

topmost levels there is only one chunk, the root chunk), in order to show the formation of the

regions according to the table. The algorithm in Figure 11 describes how we can produce the

bucket-regions for all depths, when we have as input a table similar to the one appearing in

Figure 10.


In Figure 10, for the chunks corresponding to the levels country, type and city, item,

we also depict the column-major traversal method corresponding to the second part of the

algorithm. Note also that the term “fully-sized region” means a region that has a size greater

than the bucket occupancy threshold, i.e., it utilizes well the available bucket space. Finally,

whenever, we are at a depth where a pseudo level exists for a dimension, e.g., D = 2 for our

example, no regions are created for the pseudo level of course. As an aside, note that bucket-

region formation for chunks at the maximum chunking depth (as is the chunk in depth 3 in

Figure 10), is only required in the case where the chunking is extended beyond the data-

chunk level. This is the case of large data chunks and is the topic of the next subsection.

[Figure: the query-pattern table for the LOCATION and PRODUCT dimensions and a representative chunk per chunking depth. Levels per depth: D = 0: continent, category; D = 1: country, type; D = 2: region (pseudo level for PRODUCT); D = 3: city, item. Each level is characterized as a “Restriction” or “Group By” level; the importance order within each depth is: category (1), continent (2); country (1), type (2); region (1); item (1), city (2).]

Figure 10: Bucket-region formation based on query patterns


3.2.2 Storing Large Data Chunks

In this subsection, we will discuss the case where the GreedyPutChunksIntoBuckets algo-

rithm (Figure 5) is called with input a chunk-tree that corresponds to a single data chunk.

This, as we have already explained, would be the result of a number of recursive calls to the

GreedyPutChunksIntoBuckets algorithm that led us to descend the chunk hierarchy and to

end up at a leaf node. Typically, this leaf node is large enough so as not to fit in a single

bucket, otherwise the recursive call upon this node would not have occurred in the first place

(Figure 5). 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30:

QueryDrivenFormBucketRegions
//Input:  query pattern table
//Result: bucket-regions formed at all chunking depths
{
  FOR EACH (global chunking depth value D) {
    Pick the first level in the importance order
    LOOP {
      Try to create as many fully-sized regions as possible that favor
      this level (i.e., regions perpendicular or parallel to the level,
      according to its characterization as a “restriction” or “grouping”
      attribute respectively).
      IF (there are more levels in the importance order AND
          there are more ungrouped chunk-trees to visit) {
        Pick the next level from the order
      }
      ELSE
        Exit from the loop
    }
    IF (there are still ungrouped chunk-trees) {
      Traverse the chunk in a row/column-major style, with the first level
      in the importance order being the fastest (slowest) running attribute
      if it is characterized as a grouping (restriction) attribute; then the
      second level in the importance order being the second fastest (slowest)
      running attribute, and so on for all levels in the order. Try to “pack”
      as many trees as possible into the same bucket, until there are no
      more trees to visit.
    }
  }
}

Figure 11: A bucket-region formation algorithm that is driven by query patterns

The main idea for tackling this problem is to continue the chunking process further, although we have fully used the existing dimension hierarchies, by imposing a normal grid. We call this chunking artificial chunking, in contrast to the hierarchical chunking presented in a previous chapter.

This process transforms the initial large data chunk into a 2-level chunk-tree of size less than

or equal to the original data chunk. Then, we solve the HPP chunk-to-bucket allocation subproblem for this chunk-tree, and therefore we once again call the GreedyPutChunksIntoBuck-

ets routine upon this tree.

In Figure 12, we depict an example of such a large data chunk. It consists of two dimensions

A and B. We assume that the maximum chunking depth is DMAX = K. Therefore, K will be the

depth of this chunk. Parallel to the dimensions, we depict the order codes of the dimension

members of this chunk that correspond to the most detailed level of each dimension. Also, we

denote their parent member on each dimension, i.e., the pivot level members (see §2.2) that

created this chunk. Notice that, the suffix of the chunk-id of this chunk consists of the con-

catenated order codes of the two pivot level members, i.e., 5|14.

In order to extend the chunking further, we need to insert a new level between the most de-

tailed members of each dimension and their parent. However, this level must be inserted “lo-

cally”, only for this specific chunk and not for all the grain level members of a dimension.

We want to avoid inserting another pseudo level in the whole level hierarchy of the dimen-

sion, because this would trigger the enlargement of all dimension hierarchies and would re-

sult in a lot of useless chunks. Therefore, it is essential that this new level remains local. To

this end, we introduce the notion of the local depth d of a chunk to characterize the artificial

chunking, similar to the global chunking depth D (introduced in §2.2) characterizing the hi-

erarchical chunking.

Definition 11 (Local Depth d)

The local depth d, where d ≥ -1, of a chunk Ch denotes the chunking depth of Ch pertaining

to artificial chunking. A local depth d = -1 denotes that no artificial chunking has been im-

posed on Ch. A value of d = 0 corresponds to the root of a chunk-tree created by artificial chunking

and is always a directory chunk. The value of d increases by one for each artificial chunking

level.

Note that the global chunking depth D, while descending levels created by artificial chunk-

ing, remains constant and equal to the maximum global chunking depth of the cube (in gen-

eral, to the current global depth value); only the local depth increases.

Let us assume a bucket size SB that can accommodate a maximum of Mr directory chunk en-

tries, or a maximum of Me data chunk entries. In order to chunk a large data chunk Ch of N

dimensions by artificial chunking, we define a grid on it, consisting of mgi (1 ≤ i ≤ N) number

[Footnote 2: With the term “depth” we will refer, unless explicitly stated, to the global depth D.]


of members per dimension, such that Π(i=1..N) mgi ≤ Mr. This grid will correspond to a new directory chunk, pointing at the new chunks created from the artificial chunking of the original large data chunk Ch, and due to the aforementioned constraint it is guaranteed that it will fit in a bucket. If we assume a normal grid, then for all i : 1 ≤ i ≤ N, it holds mgi = ⌊Mr^(1/N)⌋.

[Figure: a large data chunk at depth D = K (max depth) over dimensions A and B. The grain-level order codes run from 9 to 16 along dimension A (under pivot member 5) and from 29 to 34 along dimension B (under pivot member 14); the suffix of the chunk-id is ... .5|14.]

Figure 12: Example of a large data chunk.

In particular, if ni (1 ≤ i ≤ N) corresponds to the number of members of the original chunk Ch

along the dimension i, then a new level consisting of mgi members will be inserted as a “par-

ent” level. In other words, a number of ci children (out of the ni) will be assigned to each of

the mgi members, where ci ≤ ⌈ni/mgi⌉, as long as ci ≥ 1. If 0 < ni/mgi < 1 (i.e., ni < mgi), then the corresponding

new level will act as a pseudo level (see §2.2), i.e., no chunking will take place along this di-

mension. If all new levels correspond to pseudo levels, i.e., ni < mgi for all i : 1 ≤ i ≤ N, then

we take mgi = maximum(ni).
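The grid computation can be sketched as follows (illustrative helper names; the numbers in the demonstration, Mr = 10, n1 = 8 and n2 = 6, are those of the worked example around Figure 12):

```python
import math

def normal_grid(m_r, n_dims):
    """Members per dimension for a normal grid: floor of the
    N-th root of M_r, so the product of the m_gi (and hence the
    new directory chunk) never exceeds M_r entries."""
    return math.floor(m_r ** (1.0 / n_dims))

def children_per_member(n_i, m_gi):
    """At most ceil(n_i / m_gi) children per new member; if
    n_i < m_gi, the new level degenerates to a pseudo level."""
    return math.ceil(n_i / m_gi) if n_i >= m_gi else None

m_g = normal_grid(10, 2)  # floor(sqrt(10)) = 3, i.e., a 3x3 grid
print(m_g, children_per_member(8, m_g), children_per_member(6, m_g))
# 3 3 2
```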

We will describe the above process with an example. Let us assume a bucket that can ac-

commodate a maximum of Mr = 10 directory chunk entries or a maximum of Me = 5 data

chunk entries. In this case the data chunk of Figure 12 is a large data chunk, since it cannot be

stored in a single bucket. Therefore, we define a grid with mg1 and mg2 members along dimensions A and B respectively. If the grid is normal, then mg1 = mg2 = ⌊√10⌋ = 3. Thus, we create a directory chunk, which consists of 3×3 = 9 cells (i.e., directory chunk entries); this is


depicted in Figure 13.

[Figure: the original large data chunk (chunk-id ... .5|14, D = K) overlaid with the 3×3 grid. The resulting directory chunk has local depth d = 0, and the nine data-chunk partitions have local depth d = 1 with chunk-ids ... .5|14.0|0 up to ... .5|14.2|2. One partition is empty (marked X), and the formed bucket-region is also shown.]

Figure 13: The example large data chunk artificially chunked.

In Figure 13, we can also see the new members on each dimension and the corresponding

parent-child relationships between the original members and the newly inserted ones. In this

case, each new member will have at most c1 ≤ ⌈8/3⌉ = 3 children for dimension A and c2 ≤ ⌈6/3⌉ = 2 children for dimension B, respectively. The created directory chunk will have a global

depth D = K and a local depth d = 0. Around it, we depict all the data chunks (partitions of

the original data chunk) that correspond to each directory entry. Each such data chunk will

have a global depth D = K and a local depth d = 1. The chunk-ids of the new data chunks

include one more domain as a suffix, corresponding to the new chunking depth to which they belong. Notice that new empty chunks might arise from the artificial chunking process. For ex-

ample see the rightmost chunk in the top of Figure 13. Since no space will be allocated for

such empty chunks, it is obvious that artificial chunking might lead to a reduction of the size of the original data chunk, especially for sparse data chunks. This important characteristic is stated in the following theorem, which shows that in the worst case the extra size

overhead of the resultant 2-level tree will be equal to the size of a single bucket. However,

since cubes are sparse, chunks will also be sparse and therefore, in practice, the size of the tree

will always be smaller than that of the original chunk.

Theorem 3: Size upper bound for an artificially chunked large data chunk

For any large data chunk Ch of size SCh, it holds that the two-level chunk-tree CT resulting from the application of the artificial chunking process on Ch will have a size SCT such that:

BChCT SSS +≤

where SB is the bucket size.

Proof

Assume a large data chunk Ch which is 100% full. Then the application of artificial chunking produces no empty chunks. Moreover, from the definition of chunking we know that if we connect these chunks back together we will get Ch. Consequently, the total size of these chunks is equal to SCh. Now, the root chunk of the new tree CT will have (by definition) at most Mr entries, so as to fit in a single bucket. Therefore the extra size overhead caused by the root is at most SB. From this we infer that SCT ≤ SCh + SB. Naturally, if this holds for the largest possible Ch, it will certainly also hold for all other possible Ch's that are not 100% full and thus may result in empty chunks after the artificial chunking.
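As an illustration of why sparse chunks shrink under artificial chunking, here is a minimal Python sketch (our own toy model, not the thesis implementation; the dense list-of-lists representation and the function name are assumptions) that partitions a data chunk and keeps only the non-empty sub-chunks:

```python
# Sketch: artificially chunk a sparse 2D data chunk into sub-chunks
# and drop the empty ones. Non-None cells stand for stored entries.

def artificial_chunking(chunk, rows_per_sub, cols_per_sub):
    """Partition `chunk` (list of lists) into sub-chunks; return only
    the non-empty ones, keyed by their new D-domain (i, j)."""
    sub_chunks = {}
    n_rows, n_cols = len(chunk), len(chunk[0])
    for i in range(0, n_rows, rows_per_sub):
        for j in range(0, n_cols, cols_per_sub):
            sub = [row[j:j + cols_per_sub] for row in chunk[i:i + rows_per_sub]]
            if any(cell is not None for row in sub for cell in row):
                sub_chunks[(i // rows_per_sub, j // cols_per_sub)] = sub
    return sub_chunks

# A sparse 4x4 data chunk: only 3 of its 16 cells are non-empty.
chunk = [[None] * 4 for _ in range(4)]
chunk[0][0], chunk[0][1], chunk[3][3] = 10.0, 20.0, 30.0

subs = artificial_chunking(chunk, 2, 2)
print(len(subs))     # only 2 of the 4 sub-chunks are non-empty
print(sorted(subs))  # [(0, 0), (1, 1)]
```

Since no storage is allocated for the two empty sub-chunks, only the non-empty partitions (plus the small root directory chunk, bounded by Theorem 3) need to be stored.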

As soon as we create the 2-level chunk-tree, we have to solve the corresponding HPP chunk-to-bucket allocation subproblem for this tree; i.e., we recursively call the GreedyPutChunksIntoBuckets algorithm with the root node of the new tree as input. The algorithm will then try to store the whole chunk-tree in a bucket (which is possible because, as explained above, artificial chunking reduces the size of the original chunk for sparse data chunks), or create the appropriate bucket-regions and store the root node in the root-bucket (see Figure 5). It will also mark the empty directory entries. In Figure 13, we can see the formed region assuming that the maximum number of data entries in a bucket is Me = 5.

Finally, if there still exists a large data chunk that cannot fit by itself in a whole bucket, then we repeat the whole procedure and thus create some new data chunks at local depth d = 2. This procedure may continue until we finally store all parts of the original large data chunk.


3.2.3 Storage of the Root Directory

In the previous subsections we formally defined the HPP chunk-to-bucket allocation problem (see Definition 8). From this definition we have seen that the root-bucket BR essentially represents the entire set of buckets that have a zero degree of hierarchical clustering HCDB and, therefore, contribute nothing to the hierarchical clustering achieved by a specific chunk-to-bucket allocation. Moreover, due to the β factor in equation (3) (β was defined as the number of fixed-size buckets in BR), it is clear that the larger the root-bucket becomes, the worse the achieved hierarchical clustering. In this subsection, we present a method for improving the hierarchical clustering contribution of the root-bucket by reducing the β factor, with the use of a main memory cache area, and also by increasing the HCDB of the buckets in BR.

In Figure 14, we depict an example of a set of directory nodes that will be stored in the root-

bucket. These are all directory chunks and are rooted all the way up to the root chunk of the

whole cube. These chunks are of different global depths D and local depths d and form an

unbalanced chunk-tree that we call the root directory.

Definition 12 (The Root Directory RD)

The root directory RD of a hierarchically chunked cube C, represented by a chunk-tree CT, is

an unbalanced chunk-tree (i.e., the leaves are not at the same level) with the following properties:

1. The root of RD is the root node of CT.

2. For the set SR of the nodes of RD it holds that SR ⊂ SCT, where SCT is the set of the nodes of CT.

3. All the nodes of RD are directory chunks.

4. The leaves of the root directory contain entries that point to chunks stored in a different

bucket than their own.

5. RD is an empty tree iff the root node of CT is stored in the same bucket with its children

nodes.

In Figure 14, the empty cells correspond to subtrees that have been allocated to some bucket,

either on their own or with other subtrees (i.e., forming a bucket-region). We have omitted

these links from the figure in order to avoid cluttering the picture. Also note the symbol “X”

for cells pointing to an empty subtree. Beneath the dotted line we can see directory chunks

that have resulted from the artificial chunking process described in the previous subsection.

Figure 14: An example of a root directory.

The basic idea of the method that we describe next is based on the simple heuristic that if we impose hierarchical clustering on the root directory, as if it were a chunk-tree in its own right, the evaluation of HPP queries would improve, because every HPP query needs at some point to access a node of the root directory. Moreover, since the root directory always contains the root chunk of the whole chunk-tree as well as certain higher-level (i.e., lower-depth) directory chunk nodes, we can assume that these nodes are permanently resident in main memory during a querying session on a cube. The latter is of course a common practice for all index structures in databases [Ram98].

The algorithm that we propose for the storage of the root directory is called StoreRootDir. It

assumes that directory entries in the root directory pointing to already allocated subtrees

(empty cells in Figure 14) are treated as pointers to empty trees, in the sense that their storage

cost is not taken into account for the storage of the root directory. The algorithm receives as

an input the root directory RD, a cache area of size SM and a root-bucket BR of size SROOT =

β·SB, where β ≥ 1 and β = ⌈SM/SB⌉ (therefore SROOT ≅ SM). We describe it in Figure 15.

0:  StoreRootDir(R, SM, SB)
    //Input: Root R of root directory, cache area size SM,
    //       bucket size SB, root-bucket BR
    //Output: list of allocated buckets BuckList for
    //        the root directory, root-bucket BR
    {
1:    current node is R
2:    WHILE(BR can accommodate current node) {
3:      store current node in root-bucket BR
4:      current node = next node in breadth-first way
5:    }
6:    IF(we have stored all nodes of root directory){
7:      //whole root directory in cache (i.e., in root-bucket)
8:      RETURN BR
9:    }
10:   FOR each unallocated subtree ST {
11:     // solve subproblem, update BuckList
12:     GreedyPutChunksIntoBuckets(root(ST), SB)
13:     IF(the root directory of ST is not empty){
14:       //make a recursive call with 0 cache
15:       StoreRootDir(root(ST), 0, SB)
16:     }
17:   }
18:   RETURN (BR, BuckList)
19: }

Figure 15: Recursive algorithm for storing the root directory.

In Figure 15, we begin from the root and visit all nodes of RD in a breadth-first manner (lines 1-5). Each node we visit is stored in the root-bucket BR, until we find a node that can no longer be accommodated. Then, for each of the remaining unallocated chunk subtrees of RD we solve the corresponding HPP chunk-to-bucket subproblem (lines 10-13). For the storage of the new root directories that might result from these subproblems, we use the StoreRootDir algorithm again, but this time with a zero cache area size (lines 15-18).
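The breadth-first storage step can be sketched as follows in Python (a simplified model of our own: nodes are plain dicts with a size, and GreedyPutChunksIntoBuckets is replaced by a caller-supplied stub, since its details are given in Figure 5):

```python
from collections import deque

def store_root_dir(root, cache_size, bucket_size, solve_subtree):
    """Breadth-first store nodes of the root directory into the
    root-bucket until it is full; remaining subtrees are handed to
    `solve_subtree` (standing in for GreedyPutChunksIntoBuckets).
    Nodes are dicts: {'size': int, 'children': [...]}."""
    root_bucket, used = [], 0
    queue = deque([root])
    while queue and used + queue[0]['size'] <= cache_size:
        node = queue.popleft()
        root_bucket.append(node)          # node fits: keep it in BR
        used += node['size']
        queue.extend(node['children'])
    # Every node left in the queue roots an unallocated subtree.
    buckets = [solve_subtree(st, bucket_size) for st in queue]
    return root_bucket, buckets

# Toy tree: sizes chosen so the root and first child fit in the cache.
tree = {'size': 4, 'children': [
    {'size': 3, 'children': []},
    {'size': 6, 'children': []},
]}
rb, rest = store_root_dir(tree, cache_size=8, bucket_size=10,
                          solve_subtree=lambda st, sb: [st])
print(len(rb), len(rest))  # 2 nodes cached, 1 subtree sent to buckets
```

The stub returns each leftover subtree unchanged; in the actual algorithm this step would allocate the subtree's chunks to buckets and possibly recurse with a zero cache area.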

From the above description we can see that the proposed algorithm uses the root-bucket only for storing the higher-level nodes that will be loaded in the cache. Therefore, the I/O overhead due to the root-bucket during the evaluation of an HPP query is eliminated. Furthermore, the chunk-to-bucket allocation solution of a cube is now augmented with an extra set of buckets resulting from the solutions to the new subproblems from within StoreRootDir. The hierarchical clustering degree HCDB (Definition 6) of these buckets is calculated based on the input chunk-tree of the specific subproblem and not on the chunk-tree representing the whole cube. In the case where the former is an unbalanced tree, the maximum chunking depth DMAX is calculated from the longest path from the root to a leaf.


Notice that for each such subproblem a new root directory might arise. (In fact, the only case of an empty root directory is when the whole chunk subtree, upon which GreedyPutChunksIntoBuckets (Figure 5) is called, fits in a single bucket.) Therefore, we solve each of these subproblems by recursively using StoreRootDir, but this time with no available cache area. This makes StoreRootDir recursively invoke the GreedyPutChunksIntoBuckets algorithm, until all chunks of a subtree are allocated to a bucket. Recall from §3.2 that the termination of the GreedyPutChunksIntoBuckets algorithm is guaranteed by the fact that each recursive call deals with a subproblem on a chunk-tree smaller in size than that of the parent problem. Thus, the size of the input chunk-tree continuously decreases. Consequently, this also guarantees the termination of StoreRootDir.


Figure 16: Resulting allocation of the running example cube for a bucket size SB = 30 and a cache area equal to a single bucket.

Note that the root directory is a very small fragment of the overall cube data space. Thus, it is

realistic to assume that in most cases we can store the whole root directory in the root-bucket

and load it entirely in the cache during querying. In this case, we can evaluate any point HPP

query with a single bucket I/O.

In the following we provide an upper bound for the size of the root directory. In order to

compute this upper bound, we use the full chunk-tree resulting from the hierarchical chunking of a cube. A guaranteed upper bound for the size of the root directory could be the size of

all the possible directory chunks of this tree. However, the root directory of the CUBE File is

a significantly smaller version of the whole directory tree for the following reasons: (a) it

does not contain all directory chunk nodes, only the ones that were not stored in a bucket

along with their descendants, (b) space is not allocated for empty subtrees and (c) chunks are

stored in a compressed form, not wasting space for empty entries.

Lemma 3.1

For any cube C consisting of N dimensions, where each dimension has a hierarchy represented by a complete K-level m-way tree, the size of the root directory in terms of the number of directory entries is O(m^(N(K−2))).

Proof

Since the root directory is always (by its definition) smaller in size than the tree containing all the possible directory chunks (called the directory tree), we can write: the size of the root directory is O(size of directory tree). The size of the directory tree can easily be computed by the following series, which adds up the numbers of all possible directory entries per chunking depth:

Size of directory tree = 1 + m^N + m^(2N) + … + m^((K−2)N) = O(m^(N(K−2)))
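The series above can be checked numerically with a short Python sketch (illustrative only; the function name is ours):

```python
# Number of directory entries in the full directory tree:
# 1 + m^N + m^(2N) + ... + m^((K-2)N), dominated by the last term.

def directory_tree_entries(m, N, K):
    return sum(m ** (level * N) for level in range(K - 1))

m, N, K = 4, 3, 5
total = directory_tree_entries(m, N, K)
dominant = m ** (N * (K - 2))
# The total is 1 + 64 + 4096 + 262144 = 266305, and the ratio to the
# dominant term stays below m^N / (m^N - 1), confirming the O() bound.
print(total, dominant, total / dominant)
```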

Next we provide a theorem that proves an upper bound for the ratio between the size of the

root directory and that of the full most-detailed data space of a cube.

Theorem 4: Upper bound of the size ratio between the root directory and the cube’s data space

For any cube C consisting of N dimensions, where each dimension has a hierarchy represented by a complete K-level m-way tree, the ratio of the root directory size to the full size of C's detailed data space (i.e., the Cartesian product of the cardinalities of the most detailed levels for all dimensions) is O(1/m^N).

Proof

From the above lemma we have that the size of the root directory is O(m^(N(K−2))). Similarly, we can prove that the size of C's most detailed data space is O(m^(N(K−1))). Therefore, the ratio is:

root directory size / most detailed data space size = O(m^(N(K−2))) / O(m^(N(K−1))) = O(1/m^N)


Theorem 4 proves that as dimensionality increases, the ratio of the root directory size to the full cube size at the most detailed level decreases exponentially. Therefore, as N increases, the root directory size quickly becomes negligible compared to the cube's data space.
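The exponential decrease of this ratio with N can be observed numerically; here is a small illustrative Python sketch (our own, assuming the complete K-level m-way hierarchies of Theorem 4):

```python
# Ratio of root-directory size to detailed data space: O(1/m^N).
# For fixed m and K, the ratio drops exponentially as N grows.

def size_ratio(m, N, K):
    directory = sum(m ** (level * N) for level in range(K - 1))
    data_space = m ** (N * (K - 1))
    return directory / data_space

m, K = 4, 4
for N in (1, 2, 3, 4):
    print(N, size_ratio(m, N, K))  # each step shrinks by roughly 1/m^N
```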

Figure 17: Resulting allocation of the running example cube for a bucket size SB = 10 and a cache area equal to a single bucket.

If we go back to the allocated cube in Figure 7 and assume a cache area of size equal to a single bucket, then the StoreRootDir algorithm will store the whole root directory in the root-bucket. In other words, the root directory can be fully accommodated in the cache area, and therefore from equation (3), for K = 3 and β = 0 (since the root-bucket will be loaded into memory, the β factor is zeroed), we get an improved hierarchical clustering degree fHC = 9.6%. The new allocation is depicted in Figure 16. Notice that any point query can now be answered with a single bucket I/O.

If for the cube of our running example we assume a bucket size of SB = 10, then the chunk-to-bucket allocation resulting from GreedyPutChunksIntoBuckets and the subsequent call to StoreRootDir is depicted in Figure 17. In this case, we have once more assumed a cache area equal to a single bucket. In the figure, we can see the upper nodes allocated to the cache area (i.e., stored in the root-bucket) in a breadth-first way. The buckets B1 to B5 have resulted from the initial call to GreedyPutChunksIntoBuckets. Buckets B6 and B7 store the remaining nodes of the root directory that could not be accommodated in the cache area and are a result of the call to the StoreRootDir algorithm. Finally, in Figure 18, we present the corresponding allocation for a zero cache area.

Figure 18: Resulting allocation of the running example cube for a bucket size SB = 10 and a zero cache area.


4 CUBE File Operations

In this chapter we discuss the operations provided by the CUBE File organization. We distinguish these operations into two main categories: (a) data navigation operations and (b) maintenance operations. For the former, we present a set of basic operations for navigating a CUBE File organization; these operations can be used to define more complex access or processing operations, enabling added-value functionality. In addition, we present a set of reorganization primitives that enable incremental updating of the CUBE File and reduce query down-time. Then, we provide an algorithm for performing bulk incremental updating. Finally, we discuss the operation of data purging and other maintenance issues, such as the anticipation of updates along the Time dimension and the case of slowly changing dimensions.

4.1 CUBE File primary data navigation operations

The data space of a cube is defined by the set of its dimensions (thus it is multidimensional) and by the hierarchies in each dimension (thus it is also multi-level). A data point in this space can be any of the following two:

a) A point at the grain level of the cube, corresponding to a single measure value, or

b) a point at a higher level (i.e., with at least one non-grain dimension level), corresponding to a set of measure values.

Note that in our definition of the cube, where no aggregates are included within the cube data

space, the case where one or more dimensions are coalesced (i.e., aggregation takes place

along the full range of their value domain) is not considered to correspond to a data point.

This is simply regarded as an aggregation operation on the cube data points.

In the CUBE File each data point is assigned a unique identifier called a chunk-id (see §2.3).


Moreover, we have seen that with the chunk-tree representation of the cube the cube data

space is expressed naturally. For example, each time we are at a specific directory entry

pointing to some subtree, we are essentially pointing at the set of measure values corresponding to this data point.

The CUBE File offers a set of basic operations for navigation in the multidimensional space

of a hierarchically chunked cube. These fundamental operations can be used for the definition

of more elaborate data access and query processing constructs. The CUBE File navigation

operations provide the means for traversing the chunk-tree of a hierarchically chunked cube

and implement the notion of the current position in the CUBE File.

Definition 13 (The Current Position in the CUBE File - CP)

The current position CP in the CUBE File, organizing the data of a hierarchically chunked cube C represented by a chunk-tree CT, simulates a reading head that can be positioned over a single non-empty cell at a time, in any of the chunks of CT, and is defined as the following quadruple:

CP ≡ (B, Ch, Cl, Id)

where B is the current bucket, Ch is the current chunk within B, Cl is the current cell within Ch, and Id is the unique chunk-id characterizing the current cell.

The primary goal of all the navigation operations is to move the current position from an initial state CPi to a final one CPf and return the entry at the latter. Notice that the current position cannot be placed over an empty cell (either a directory chunk entry or a data chunk entry). When a cube is initially “opened” for access, the CP is set to the first non-empty cell of the root chunk, which usually resides in the root-bucket BR (see §3.2.3).

A major aspect of navigation in a CUBE File is the movement along the chunk hierarchy. In order to handle this case efficiently, we introduce the concept of the current drilling path.

Definition 14 (The Current Drilling Path in the CUBE File - CDP)

The current drilling path CDP in the CUBE File, organizing the data of a hierarchically chunked cube C represented by a chunk-tree CT, is an ordered set of positions in the CUBE File P1, P2, …, PK, such that:

1. For positions Pi = (Bi, Chi, Cli, Idi) and Pi+1 = (Bi+1, Chi+1, Cli+1, Idi+1), where 1 ≤ i ≤ K−1, it holds that Chi is the parent chunk of Chi+1 in CT.

2. If CP = (B, Ch, Cl, Id) is the current position in the CUBE File, then for position PK = (BK, ChK, ClK, IdK), it holds that ChK is the parent chunk of Ch in CT.

From this definition we can infer that the length of the current drilling path cannot exceed the maximum chunking depth of the chunk-tree CT. A straightforward and efficient implementation of the CDP construct would be a simple in-memory stack. Each time we “drill down” from an initial position CPi to a final position CPf that is a child of the former, we execute a push(CPi) to insert the new parent position into the stack. Conversely, when we “roll up” from CPi to its parent CPf, we execute a pop() to remove the most recent parent position from the stack, i.e., CPf.
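A minimal Python sketch of this stack-based implementation follows (the class and field names are ours; the CP quadruple follows Definition 13):

```python
from collections import namedtuple

# Current position: (bucket, chunk, cell, chunk-id), as in Definition 13.
CP = namedtuple('CP', ['bucket', 'chunk', 'cell', 'chunk_id'])

class CurrentDrillingPath:
    """In-memory stack of parent positions (Definition 14)."""
    def __init__(self):
        self._stack = []

    def push(self, cp):   # called when drilling down from cp
        self._stack.append(cp)

    def pop(self):        # called when rolling up to the latest parent
        return self._stack.pop()

    def depth(self):
        return len(self._stack)

cdp = CurrentDrillingPath()
root = CP('B_R', 'root', (0, 0), '')   # root chunk has the empty chunk-id
cdp.push(root)                         # drill down from the root chunk
assert cdp.pop() == root               # roll up restores the parent position
```

Because buckets along the path are already in memory, popping a position never requires a new bucket read, which is exactly the property exploited by roll_up below.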

The current drilling path provides an efficient means for up and down movements in the chunk-tree, basically for two reasons: (a) it enables efficient upward movements, since the chunk-tree does not include pointer entries to parent nodes, and (b) at any time the buckets of the positions in the CDP have already been fetched into main memory.

The primary data navigation operations in a CUBE File are: drill_down, roll_up, get_next

and move_to. We describe them in the following subsections.

0:  drill_down()
    //Input: current position CP = (B,Ch,Cl,Id) in the CUBE File,
    //       current drilling path CDP.
    //Output: Entry at final position
    {
1:    IF(current position is a data chunk){
2:      Set error flag; return; // fail to drill-down!
3:    }
4:    Get address of target position
5:    IF(target bucket != current bucket) {
6:      Fetch target bucket into memory
7:    }
8:    ELSE {
9:      Use current bucket as the target bucket
10:   }
11:   Access target chunk within target bucket
12:   Access first non-empty cell within target chunk
13:   Push current position into the Current Drilling Path
14:   Make target position the CP
15:   RETURN entry at CP
16: }

Figure 19: The drill_down operation.

4.1.1 drill_down

drill_down gives us the ability to descend the chunk hierarchy in a chunk-tree representation of a cube. We “drill down” to the child chunk node and set the current position to the first non-empty cell of this chunk, in the order of physical storage. A drill_down issued from a

current position in a data chunk fails and the current position remains unchanged.

The implementation of drill_down is quite straightforward (see Figure 19). Essentially, what is needed is to read the directory chunk entry corresponding to the current position and access the corresponding child chunk. However, due to the chunk-to-bucket allocation that favors the storage of whole subtrees in the same bucket (see §3.2), it is very likely that the target chunk resides within the current bucket. In this case, no new bucket I/O is necessary (lines 5-10 in Figure 19). As soon as the target bucket is accessed, it is a simple matter of accessing the appropriate chunk (whose id is contained in the source cell's entry). The cell of the new position will be the first non-empty cell in this chunk (lines 11-12).

The chunk-id of the target position can easily be created from the concatenation of the current chunk-id and the D-domain constructed from the coordinates of the current cell. This applies even when the current position is in the root chunk, if we assume the empty string “” as the corresponding chunk-id (recall from §2.3 that the root chunk does not have a chunk-id).
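This concatenation scheme can be sketched in Python (illustrative only: we assume '.' separates successive domains and '|' separates coordinates within a domain; the exact chunk-id encoding is defined in §2.3):

```python
# Build the target chunk-id for drill_down: concatenate the current
# chunk-id with the D-domain formed from the current cell's coordinates.
# Separator characters here are assumptions, not the thesis encoding.

def child_chunk_id(current_id, cell_coords):
    domain = '|'.join(str(c) for c in cell_coords)
    # The root chunk has the empty string as its chunk-id, so the
    # child id of the root is just the new domain itself.
    return domain if current_id == '' else current_id + '.' + domain

print(child_chunk_id('', (1, 0)))      # '1|0'
print(child_chunk_id('1|0', (2, 3)))   # '1|0.2|3'
```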

Finally, prior to making the target position the new current position in the CUBE File, we push the initial current position onto the current drilling path (lines 13-14).

0:  roll_up()
    //Input: current position CP = (B,Ch,Cl,Id) in the CUBE File,
    //       current drilling path CDP.
    //Output: Entry at final position
    {
1:    IF(current position is the root chunk){
2:      Set error flag; return; // fail to roll-up!
3:    }
4:    target position = Pop latest position from CDP
5:    Get address of target position
6:    IF(target bucket != current bucket) {
7:      Access target bucket from in-memory buckets of CDP
8:    }
9:    ELSE {
10:     Use current bucket as the target bucket
11:   }
12:   Access target chunk within target bucket
13:   Access target cell within target chunk
14:   Make target position the CP
15:   RETURN entry at CP
16: }

Figure 20: The roll_up operation.

4.1.2 roll_up

With roll_up we ascend one level to the parent cell of the current position. A roll_up issued from the root chunk of the cube fails and the current position remains unchanged. Since the chunk-tree does not provide pointers to parent nodes, an efficient implementation (see Figure

20) of this operation makes extensive use of the current drilling path CDP construct.

In particular, in order to roll up, we first need to pop the latest position from the CDP (line 4). Then, all we need to do is set this as the new current position. Again, due to the allocation of chunks into buckets, it is very likely that the target bucket coincides with the current bucket. However, even if this is not the case, since the buckets in the CDP have already been fetched into memory, no new bucket read is needed for this operation (lines 6-11).

0:  get_next()
    //Input: current position CP = (B,Ch,Cl,Id) in the CUBE File
    //Output: Entry at the next position, flag indicating that a
    //        chunk boundary has been reached.
    {
1:    access next cell in physical ordering in current chunk
2:    IF(succeeded){
3:      IF(this is the last non-empty cell in current chunk){
4:        Set chunk-boundary-reached flag
5:      }
6:      make new cell the CP
7:      RETURN entry at CP
8:    }
9:    INT levels_up = 0 // counter of levels up
10:   WHILE(there is no next non-empty cell in current chunk
11:         && current chunk is not the root chunk){
12:     roll_up()
13:     levels_up++
14:   }
15:   IF(current position is root chunk &&
16:      there is no next non-empty cell to access){
17:     Set flag “end of cells in current depth”
18:     RETURN
19:   }
20:   access next cell in physical ordering in current chunk
21:   // descend to original level but now at sibling chunk
22:   WHILE(levels_up != 0){
23:     drill_down()
24:     levels_up--
25:   }
26:   IF(CP is the last non-empty cell in current chunk)
27:     Set chunk-boundary-reached flag
28:   RETURN entry at CP
29: }

Figure 21: The get_next operation.

4.1.3 get_next

The get_next operation provides an enumeration facility for the cells at a specific depth in the data space of the hierarchically chunked cube. In particular, get_next enables an enumeration of the cells at a specific depth in the order of physical storage. A call to this operation will place the current position at the next non-empty cell within the current chunk. When we reach the end of the current chunk, we access the next sibling chunk (i.e., of the same depth) according to the physical order of the parent cells. This continues until all chunks of the same depth

are visited. If we issue get_next from the last non-empty cell at the current depth, then it fails

and the current position remains unchanged.

In the context of the current chunk, accessing the next cell is a simple matter of sequential in-memory reading (lines 1-8 of Figure 21). However, when we are at the end of the current chunk, a roll_up is issued in order to climb up to the parent node, then a get_next to access the next parent cell, and finally a drill_down to get to the next sibling chunk. The latter might cause an extra I/O for reading the corresponding bucket, if the two sibling chunks do not reside in the same bucket. Similarly, if we also reach the end of the parent chunk, we issue a second roll_up, and so on up to the root chunk. Then we access the next sibling entry with a get_next and start descending to the original level with consecutive drill_down's (lines 10-25). This process may continue until all the cells of the root chunk are visited. The implementation in Figure 21 also sets a flag whenever we reach a current position that corresponds to the last non-empty cell within the current chunk. This way we know whenever we have reached a chunk boundary.

4.1.4 move_to

The primary goal of this operation is to provide an easy way to navigate in the hierarchy-enabled multidimensional space, exploiting the chunk-id identifier that we have proposed (see §2.3). It is the CUBE File counterpart of a random positioning operation in a serial file of bytes. In particular, this method receives as input a chunk-id corresponding to a specific data point CPf in our data space that we would like to set as the new current position. move_to implements the transition from the current position CPi to the final one CPf. If the target position corresponds to an empty cell, or to a non-existent cell, then move_to fails and the current position remains unchanged. If no input chunk-id is specified, move_to by default sets the current position to the first non-empty cell of the root chunk.

An implementation of move_to is presented in Figure 22. Initially, we separate the target chunk-id into a prefix part (prfx) that is common with the current position's chunk-id and a non-common suffix part (sffx) (line 5). This prefix corresponds either to a smaller depth than the current depth or to the same depth. In the former case, we roll up to the depth of the chunk with id prfx (lines 8-12). Note that the common prefix part might be empty, which simply means that we have to roll up all the way to the root chunk. After we have reached the depth corresponding to prfx, we access the cell (within the current chunk) corresponding to the first D-domain of sffx (lines 14-15). If there are more domains in sffx, then the target position is at a greater depth and we have to drill down further, accessing at each level the corresponding cell (lines 22-32). If any of these cells is empty, then a flag is set and move_to returns without changing the current position. Similarly, if a cell cannot be accessed at all, then the target_id does not correspond to an existing cell and the operation aborts without affecting the CP. Finally, upon reaching the target position, we set it as the new current position and return the corresponding

entry (lines 34-35).

0:  move_to(target_id)
    //Input: target chunk-id,
    //       current position CP = (B,Ch,Cl,Id) in the CUBE File
    //Output: Entry at the target position
    {
1:    IF(target_id is empty) {
2:      Make target the first non-empty cell
3:        of the root chunk
4:    }
5:    Separate target chunk-id to a common part (prfx) and a non-common part (sffx)
6:    // prfx corresponds either to a smaller depth
7:    // than the current depth, or to the same depth
8:    IF(prfx is at smaller depth than CP){
9:      WHILE(not at depth of prfx){
10:       roll_up()
11:     }
12:   }
13:   // Now CP corresponds to prfx
14:   isolate from sffx the first domain dom
15:   access cell corresponding to dom
16:   IF(cell does not exist)
17:     Error in target_id, RETURN
18:   IF(cell is empty) {
19:     Set empty flag
20:     RETURN
21:   }
22:   WHILE(there are more domains in sffx){
23:     drill_down()
24:     isolate from sffx the first domain dom
25:     access cell corresponding to dom
26:     IF(cell does not exist)
27:       Error in target_id, RETURN
28:     IF(cell is empty) {
29:       Set empty flag
30:       RETURN
31:     }
32:   }
33:   // Now we have reached the target cell
34:   make new cell the CP
35:   RETURN entry at CP
36: }

Figure 22: The move_to operation.
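The prefix/suffix decomposition at the heart of move_to can be sketched in a few lines. This is a hedged illustration: the representation of a chunk-id as a list of D-domain strings and the function names are assumptions made for this example, not part of the thesis's implementation.

```python
# Chunk-ids modeled as lists of D-domain strings, e.g. ["2|0", "1|3", "0|1"];
# one list element per chunking depth. (Illustrative representation.)

def split_chunk_id(target, current):
    """Split target into the part shared with the current position (prfx)
    and the non-common remainder (sffx), as move_to's first step does."""
    common = 0
    for t, c in zip(target, current):
        if t != c:
            break
        common += 1
    return target[:common], target[common:]

def rolls_and_drills(target, current):
    """Number of roll_up calls needed to bring the CP to the depth of prfx,
    plus the suffix domains that must then be resolved by drilling down."""
    prfx, sffx = split_chunk_id(target, current)
    roll_ups = len(current) - len(prfx)   # CP is deeper than prfx
    return roll_ups, sffx
```

For a current position three domains deep that shares only its first domain with the target, rolls_and_drills reports two roll_up calls followed by one drill-down step per remaining target domain.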


4.1.5 Defining Access Operations

In this subsection we will demonstrate an example of how one can exploit the CUBE File

data navigation primitives, in order to define more elaborate access or processing operations.

In fact, this process can lead to the definition of value-added operations, i.e., to operations

that are built on top of simpler ones and provide more functionality.

 0: Chunk-Tree evaluator
 1: class CTEval {
 2: public:
 3:    bool open(cid)
 4:    bool next(value)
 5:    bool close()
 6: private:
 7:    targetID
 8: } // -----------------------------------------------------
 9: bool CTEval::open(cid){
10:    targetID = cid
11:    IF (move_to(cid) == error) RETURN FALSE
12:    WHILE (drill_down() != error) {;}
13:    RETURN TRUE
14: }
15: // -----------------------------------------------------
16: bool CTEval::next(value){
17:    value = read()
18:    IF (get_next() == error) {
19:       // end of data
20:       RETURN FALSE
21:    }
22:    IF (chunk-id at CP has not targetID as a prefix) {
23:       // end of result data set range reached
24:       RETURN FALSE
25:    }
26:    RETURN TRUE
27: }
28: // ------------------------------------------------------
29: bool CTEval::close(){
30:    free up allocated resources, e.g., CDP
31:    RETURN TRUE
32: }

Figure 23: Example of the definition of a value-added operation.

In our example we will define an HPP-query (see §3.1) evaluation operation. We will restrict

our presentation to a subset of HPP-queries, for the sake of clarity of the presentation. In the

second part of this thesis, which deals with query processing, the evaluation of more compli-

cated queries will be discussed.

Our target group of HPP-queries in this example consists of queries, where all HPP-

restrictions correspond to the same depth in the chunk-tree representation of a cube. In other

words, the input to this operation will be a chunk-id, corresponding to a subtree of the hierar-

chically chunked cube and the output is the data values stored at the data chunks of this sub-


tree. This evaluation operation will be defined via an iterator interface [Gra93]. Essentially

this means that it will provide an interface consisting of three methods: open, next, and close.

open performs the necessary initialization tasks, next provides a means of iteration over the

returned result dataset and close is responsible for any “cleaning up” tasks (see Figure 23

lines 0-7). All methods return TRUE on success or FALSE on failure.

The open method receives as input the chunk-id corresponding to the root of the subtree

whose values we want to retrieve. It then moves the current position to the first non-empty

cell, at the leaf level (i.e., data chunk level) of this subtree (lines 9-14). The next method re-

turns the value at the current position (line 17) and sets the current position to the next sibling

cell by a call to operation get_next. If this call fails, this means that we have reached the

end of the values at the leaf level; so it returns FALSE (lines 18-21). If this is not the case,

then we have to check whether the current position corresponds to a value included in the

input subtree. This can be achieved by checking if the chunk-id at the current position is pre-

fixed by the input chunk-id. If not, then we have reached the end of the desired dataset and

next returns FALSE (lines 22-25). In all other cases it returns TRUE, indicating that there

are still more result values to read. Finally, close performs the necessary “cleaning-up” by

releasing any reserved resources.
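The open/next/close protocol of Figure 23 can be exercised against a toy stand-in for the leaf level. In this hedged Python sketch, the class name SubtreeEvaluator, the in-memory list of (chunk-id, value) pairs, and the use of string prefixes for chunk-ids are all illustrative assumptions, not the actual CUBE File structures:

```python
class SubtreeEvaluator:
    """Returns all data values whose chunk-id starts with the opened prefix."""

    def __init__(self, leaves):
        # leaves: (chunk_id, value) pairs at the data-chunk level
        self._leaves = sorted(leaves)
        self._pos = 0
        self._target = None

    def open(self, cid):
        self._target = cid
        # move to the first leaf inside the subtree rooted at cid
        self._pos = next((i for i, (k, _) in enumerate(self._leaves)
                          if k.startswith(cid)), len(self._leaves))
        return self._pos < len(self._leaves)

    def next(self):
        if self._pos >= len(self._leaves):
            return None                        # end of data
        cid, value = self._leaves[self._pos]
        if not cid.startswith(self._target):   # left the result range
            return None
        self._pos += 1                         # the get_next() step
        return value

    def close(self):
        self._target = None
        return True
```

A caller drives it exactly as the iterator contract prescribes: open once, call next until it signals the end of the result range, then close.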

4.2 CUBE File Maintenance Operations

In the previous chapter, we have described the initial loading of the CUBE File. We have

seen that the main concern of this structure is to appropriately cluster the data so that typical

OLAP queries based on the dimension hierarchies are answered faster. Moreover, the intrin-

sic sparseness of the cube was considered and so no space is allocated for empty regions, de-

fined via combination of members of the dimension hierarchies. Finally, the issue of good

space utilization was taken into account and therefore chunk-trees are packed into the same

bucket, forming bucket-regions and achieving a high percentage of bucket occupancy.

We have clearly stated from the beginning that the CUBE File is a structure designed for

OLAP and therefore there is no need for supporting ad hoc insert/delete/update operations.

However, in a data warehousing/OLAP environment there is indeed a need for updates. The

difference is that these updates occur at regular periods and are executed in a batch form. For

example, the arrival of new data at the end of the day will trigger a bulk loading proce-

dure, in order to insert these data in the cube. It is common knowledge that this kind of main-

tenance operation is one of the most critical issues in data warehousing environments.


We believe that an efficient bulk updating procedure of a cube stored in a data warehouse

must be done incrementally. This sets the need for local reorganizations of the stored cube

and not an overall restructuring that would impose a significant time cost. Moreover, provision

should be taken for restricting cube unavailability, i.e., query-down time. To this end, the

CUBE File organization supports bulk incremental updating of a cube. This is the basic

maintenance operation. Furthermore, since a cube essentially records the history of some en-

terprise measurable entity, deletions are quite seldom. However, there is a need for archiving

the oldest parts of the cube and adding new ones. This operation is called “data purging” and

is efficiently supported by the CUBE File. For example, such an operation could remove

from the cube all data corresponding to the oldest year, in order to include a new year in the

cube’s data space.

[Figure content: a path in the chunk-tree from the root chunk (D = 0, d = -1) through directory chunks at D = 1, 2, 3 down to data chunks at the maximum depth (D = 4, with local depths d = 0 and d = 1), where the new data point M resides.]

Figure 24: The insertion of a new data point M at the data chunk level.

In this section, we begin our discussion with the presentation of the CUBE File reorganiza-

tion primitives. Then, we discuss the basic maintenance operations supported, i.e., bulk in-

cremental updating and data purging.


4.2.1 Primary Reorganization Operations

An insertion of a data point at the data chunk level in a CUBE File might trigger the insertion

of a whole chunk subtree. This is so because the CUBE File does not allocate space for empty

trees; therefore, if a data point of such a tree appears, then the corresponding subtree must be

constructed, so that a subsequent navigation in the CUBE File can lead us to it.

The depth of the root of this new subtree depends on the point from which it will be “hung”.

This will be an empty cell at the closest parent-chunk in the path from the data chunk, where

the insertion took place, to the root chunk. This is depicted in Figure 24, where the insertion

of a new measure value M at the data chunk level, might trigger the creation of any of the

chunks in the path from the chunk containing M to the root chunk. Of course any such newly

created chunk will have its other cell entries empty.

The new chunk-tree will be stored in an existing bucket or in a new one, depending on its

size and on where its direct parent chunk is stored. Basically we distinguish three different

cases when adding a new subtree TN under a parent node P:

1) P is stored in a bucket containing one or more whole subtrees (i.e., subtrees that in-

clude data chunks).

2) P is a chunk of the root directory, i.e., is stored in a bucket including only directory

chunks (see §3.2.3).

3) P is stored in a bucket hosting a single data chunk.

In the first case, when a new tree TN arrives (note that TN can also be a single data chunk or

even a single data chunk entry), we try to store it in the same bucket where its parent P re-

sides. If TN can be accommodated in this bucket, then the update completes there. Only a sin-

gle bucket is affected, leading to a minimum of reorganization and allowing other parts of the

cube to be queried freely. If the bucket containing P overflows, then there exist two different

subcases to consider:

(a) the overflow of a bucket containing a single tree and

(b) the overflow of a bucket containing a region of trees (i.e., a bucket-region, see

§3.2.1).

In the former subcase we use the reorganization primitive split_tree. This is described next:

split_tree

Whenever a bucket B containing a single whole chunk-tree T overflows, then we remove the

root node R of T from B and try to allocate anew the subtrees pointed-to by R to a new set of


buckets.

This operation can be easily implemented by calling the GreedyPutChunksIntoBuckets algo-

rithm (see §3.2) upon chunk-tree T. This will try to store T in a single bucket and will fail;

then, it will allocate R to the root-bucket and try to form bucket-regions. For all these subtrees

not fitting in any region the algorithm will be recursively invoked. In Figure 25(a), the inser-

tion of TN into bucket B1, under node P, causes an overflow of the latter (the bucket size is denoted SB). The split_tree reorganization primitive will reallocate the subtrees under the root of the whole tree in B1 into two new buckets B2 and B3, releasing B1 (Figure 25(b)). Of

course the corresponding entries in the root must be updated appropriately to point to the new

location of their children nodes. The root now has been added to the root directory. If one of

the two subtrees under the root could not also fit in a bucket, then the corresponding root

node would be added to the root directory as well and the chunk-to-bucket allocation would

continue for its own subtrees. This addition of chunks to the root directory may result in an

upward propagation of the reorganization, which we will discuss later in this subsection.
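The recursive allocation that split_tree delegates to GreedyPutChunksIntoBuckets can be approximated as follows. This is a deliberately simplified Python sketch: trees are (root-size, children) pairs, the first-fit packing loop stands in for formBucketRegions, and neither the names nor the packing policy are taken from the thesis's actual code.

```python
SB = 70  # bucket size, as in Figure 25 (illustrative value)

def subtree_size(tree):
    size, children = tree
    return size + sum(subtree_size(c) for c in children)

def greedy_allocate(tree, buckets, root_dir):
    """If the whole tree fits in one bucket, place it there; otherwise the
    root joins the root directory and the subtrees are re-allocated."""
    size, children = tree
    if subtree_size(tree) <= SB:          # whole tree fits: one bucket
        buckets.append(subtree_size(tree))
        return
    root_dir.append(size)                 # root moves to the root directory
    open_buckets = []                     # first-fit packing of siblings
    for child in children:
        s = subtree_size(child)
        if s > SB:
            greedy_allocate(child, buckets, root_dir)   # recurse deeper
            continue
        for i, used in enumerate(open_buckets):
            if used + s <= SB:
                open_buckets[i] += s
                break
        else:
            open_buckets.append(s)
    buckets.extend(open_buckets)
```

Replaying the Figure 25 scenario (a root of size 3 with subtrees of sizes 60 and 22, SB = 70) places the root in the root directory and the two subtrees in two separate buckets.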


Figure 25: The split_tree reorganization primitive. (a) The insertion of TN overflows bucket B1. (b) split_tree reallocates the subtrees under the root to two new buckets B2 and B3.

The second subcase arises when a bucket containing a region of trees overflows. In order to

deal with this situation, we use a reorganization primitive called split_region:

split_region

Whenever a bucket B containing a bucket-region consisting of the whole chunk-trees Ti (i =

1,…K) overflows, then we access the root R of the region, which resides in a different bucket

B’ and try to allocate anew the subtrees of this region to a new set of buckets.


The fact that the root of a region resides in a separate bucket than the one that holds the re-

gion, results directly from the GreedyPutChunksIntoBuckets algorithm. The implementation

of the split_region operation requires that we invoke the formBucketRegions routine (see

§3.2.1) for the trees of the region containing the updated tree. If this tree’s size, after the up-

date, becomes less than or equal to the bucket size SB, then formBucketRegions will form

anew two bucket-regions from the trees of the original region. If however, the tree, where the

update took place, obtains a size larger than SB, then we invoke the GreedyPutChunksInto-

Buckets algorithm upon this tree, in order to allocate separately the chunks of the updated tree

(advancing to a greater depth) and store the rest of the trees of the original region in a sepa-

rate bucket.

In Figure 26(a), we insert a new tree TN, of size ST, under a tree of a bucket-region consisting

of three trees, stored in bucket B1. For ST = 20, we can see in Figure 26(b) that split_region

has split the initial region into two smaller ones and has allocated them into buckets B2 and B3

respectively. For ST = 65 (Figure 26(c)), the updated tree of the initial region becomes so

large that it can no longer be accommodated in a single bucket. Therefore, the Greedy-

PutChunksIntoBuckets algorithm is invoked upon it and its subtrees are allocated to a new

bucket B2 (forming this way a new bucket-region at a greater depth than the original region),

while its root is added to the root directory. The latter might again cause an upward propaga-

tion of the reorganization, which we will discuss next. In addition, a smaller region is created

and stored in another new bucket B3.
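The two outcomes of split_region can be condensed into one decision function. In this hedged Python sketch, trees are reduced to their sizes, the first-fit loop stands in for formBucketRegions, and the "deeper" marker merely flags the case where GreedyPutChunksIntoBuckets must re-allocate the oversized tree at a greater depth; all names are illustrative.

```python
SB = 100  # bucket size, as in Figure 26 (illustrative value)

def split_region(tree_sizes, updated):
    """tree_sizes: sizes of the whole trees of an overflowing bucket-region;
    updated: index of the tree that received the insertion.
    Returns the resulting bucket contents (lists of tree sizes)."""
    if tree_sizes[updated] <= SB:
        # re-form bucket-regions from all trees of the original region
        regions = []
        for s in sorted(tree_sizes, reverse=True):   # first-fit decreasing
            for r in regions:
                if sum(r) + s <= SB:
                    r.append(s)
                    break
            else:
                regions.append([s])
        return regions
    # updated tree too large: its chunks get re-allocated at greater depth,
    # while the remaining trees of the region go to one separate bucket
    rest = [s for i, s in enumerate(tree_sizes) if i != updated]
    return [["deeper", tree_sizes[updated]], rest]
```

With the Figure 26 sizes, an insertion of ST = 20 yields two smaller regions, while ST = 65 separates the oversized tree from the rest of the original region.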

Furthermore, note that the subtrees hanging from the root of the tree in Figure 26(a) that do not belong to the region where the insertion took place are not affected by the split_region reorganization. The local scope of split_region, which affects only the “updated”

region and not others, enables other parts of the cube to be available for querying during the

update. However, this may result in a suboptimal allocation of chunks into buckets, i.e., in

buckets with a reduced hierarchical clustering degree (see §3.1). If high cube availability is

not our primary concern, then we can re-form all the regions hanging from the root and not

only the region, where the update took place. This way, we increase the chances of obtaining

greater hierarchical clustering degrees for the created buckets.

The second case of adding a new subtree TN under a parent node P arises when P is a node of

the root directory (see Definition 12). This case typically occurs, when one or more chunks


are inserted into the root-bucket after a split_tree, or a split_region. Therefore, in this case TN

is not a whole tree, i.e., it does not include data chunks, but is a “root directory subtree”. Re-

call from §3.2.3 that the root directory is partly stored in the so-called root-bucket, with size

equal to the available main memory cache area and partly in simple buckets. Whenever a

tree TN is inserted under a node P of the root directory, then we try to store TN in the same

bucket as P. If TN can be accommodated, then the updating completes. If the bucket of P

overflows (either this is a simple bucket or the root-bucket), then we use the expand_root_dir

reorganization primitive, which is described next:


Figure 26: The split_region reorganization primitive. (a) The insertion of TN overflows bucket B1. (b) split_region splits the initial region into two smaller ones. (c) split_region creates a smaller region

(bucket B3) and reallocates the updated tree of the initial region to a separate bucket B2.

expand_root_dir

Whenever a bucket B containing a part of the root directory overflows, due to the insertion of

a subtree TN, then we try to allocate the chunks of TN to a new set of buckets by invoking al-


gorithm StoreRootDir with a zero cache input (see §3.2.3).

In Figure 27(a), we try to add subtree TN, consisting of two directory chunks, under a node P

in a bucket B1 containing a part of the root directory. However, B1 cannot accommodate the

insertion and therefore we invoke StoreRootDir on TN, which finally allocates both of its

nodes into a new bucket B2 (Figure 27(b)).


Figure 27: The expand_root_dir reorganization primitive. (a) The insertion of TN overflows bucket B1.

(b) expand_root_dir allocates TN into a new bucket B2.

Finally, the last case of adding a new subtree TN under a parent node P arises whenever we

insert a new data point (i.e., trivial case of a subtree TN) into a data chunk P stored in a bucket

on its own. If this bucket overflows after the insertion, then chunk P becomes a large data

chunk and therefore we exploit the method for storing large data chunks described in §3.2.2.

This reorganization primitive is named split_chunk:

split_chunk

Whenever a bucket B containing only a single data chunk P overflows, due to the insertion of

a data point TN into P, then we artificially chunk P, creating a 2-level chunk-tree of size less

than or equal to the original data chunk. Then, we solve the HPP chunk-to-bucket allocation

subproblem for this chunk-tree and therefore we invoke the GreedyPutChunksIntoBuckets

algorithm upon this tree.

Recapitulating, we have described four reorganization primitives for the CUBE File structure,

namely: split_tree, split_region, expand_root_dir, and split_chunk. All these cause a local

reorganization and do not affect the whole structure. This way, they enable an incremental


updating of the cube and at the same time allow parts of the cube to be queried. Whenever an

insertion causes an overflow of an existing bucket, then some local reorganization is required.

In the worst case this might lead to an increase of the buckets in the path from the insertion

point (at the data chunk level) to the root chunk. Note however that this occurs only for the

specific path. Moreover, the local depth of certain chunks might increase (due to artificial

chunking), while the global depth remains constant at all times.


Figure 28: A schematic representation of the incremental updating process.

4.2.2 Bulk Incremental Updating

After the initial building and loading of the CUBE File, the corresponding cube goes through

a periodic bulk updating procedure. This should be an incremental procedure, in the sense

that only the new data will be loaded each time and only certain parts of the CUBE File

should be affected. In particular, we assume as input to this procedure, a set of data points

corresponding to the grain level. An example could be the day’s sales arriving at the end of the day,

or the sales of the last week and so on. In general, these data points correspond to some

empty entries in the grain level of the CUBE File.

In Figure 28, we depict a schematic representation of the CUBE File incremental updating

process. We can see that the input to this process is a file containing only the new data points

that we want to include in the CUBE File, and is sorted in ascending order by their chunk-id.

The updating process partitions the input file, based on the chunk-id prefixes, to groups of

data points corresponding to the same chunk-tree, hanging from a specific depth D. Then, it

proceeds by updating the CUBE File, according to the data points included in each partition

in a depth-by-depth basis. This depth-by-depth process is in accord with the fact that different


input data points (at the most detailed level) might trigger updates at different depths in the

CUBE File (see Figure 24).

BulkIncrementalUpdating
//Input: new data points sorted by chunk-id
//Result: Updated CUBE File
{
  Chunk_Queue.push(Root chunk)
  Partition_Queue.push(Input File)
  WHILE (Partition_Queue is not empty) {
    // get next chunk from Chunk Queue
    Current Chunk = Chunk_Queue.pop()
    // get next partition from Partition Queue
    Current Partition = Partition_Queue.pop()
    Partition the Current Partition to K partitions
      corresponding to the cells of the Current Chunk
    FOR (partition P = 1..K) DO {
      IF (partition P corresponds to an empty entry) {
        Update CUBE File by inserting a new chunk-tree
        (this might also be a single data chunk, or eventually
        a single data chunk entry, depending on the depth
        of the current chunk)
      }
      ELSE {
        Chunk_Queue.push(child chunk corresponding to partition P)
        Partition_Queue.push(P)
      }
    }
  }
}

Figure 29: The bulk incremental updating algorithm for the CUBE File.

In Figure 28, we depict the initial partitioning of the input file to a number of partitions corre-

sponding to the cells of the root chunk (D = 0); i.e., the data points in each partition have

chunk-ids with the same first D-domain (see §2.3). Then, if any such partition of data points

corresponds to an empty cell of the root chunk, then we update the CUBE File by inserting

the appropriate subtree (containing all data points of this partition) under this cell. When this

update completes, we can discard the specific partition. For all the other partitions of data

points that correspond to non-empty cells of the root chunk, the process must be repeated. In

other words, we re-partition each partition from the previous step to a set of smaller parti-

tions, corresponding to the cells at the next depth (D = 1); i.e., we partition the data points of

each original partition based on the second D-domain of each chunk-id. Then, we once more

insert the partitions (i.e., we add the respective subtrees) corresponding to the empty cells at

this depth. This procedure continues until we reach the maximum chunking depth DMAX,

where we insert simple data points (i.e., trivial case of subtrees) at the empty cells of the ex-

isting data chunks. In Figure 29, we present the bulk incremental updating algorithm for the


CUBE File. The algorithm depicts the iterative steps in each depth, where each time, we par-

tition the remaining data points and insert the new chunk-trees when necessary.
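The depth-by-depth partitioning of Figure 29 can be rendered executably as follows. In this hedged Python sketch, chunk-ids are equal-length tuples of D-domains, `existing` is a set of non-empty cell paths, and the function merely reports where new chunk-trees would be hung instead of building them; these simplifications are assumptions of the example, not of the CUBE File.

```python
from collections import deque
from itertools import groupby

def bulk_incremental_update(points, existing):
    """points: chunk-ids of the new data points, as equal-length tuples of
    D-domains; existing: set of non-empty cell paths at every depth.
    Returns the cell paths under which new chunk-trees must be hung."""
    inserts = []
    queue = deque([((), sorted(points))])          # (cell path, partition)
    while queue:
        path, partition = queue.popleft()
        depth = len(path)
        # re-partition by the D-domain at the current depth
        for dom, grp in groupby(partition, key=lambda cid: cid[depth]):
            cell = path + (dom,)
            if cell not in existing:
                inserts.append(cell)               # empty cell: hang a subtree
            elif depth + 1 < len(partition[0]):
                queue.append((cell, list(grp)))    # non-empty: drill deeper
    return inserts
```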


Figure 30: (a) A typical data purging operation along the TIME dimension. (b) The reorganization requires only an update on the root chunk, without accessing the rest of the chunk-trees in the CUBE File.

4.2.3 Data Purging

In this section we deal with the issue of data purging. This term refers to the removal of older

parts of the cube at regular periods, possibly archiving them in some tertiary storage device, and

adding new ones corresponding to current data. A typical example is the removal of the old-

est year’s data at the beginning of a new fiscal year, in order to include the latest year in the

cube data space. In this subsection we will discuss, how we can handle data purging in the

CUBE File.

Technically speaking, data purging in CUBE File terms means to remove all chunk-trees

hanging from the root chunk corresponding to a specific member of a specific dimension and

to add a number of new chunk-trees, which are probably almost empty, at the right end of the

dimension in question. In Figure 30, we depict an example of a two-dimensional root chunk

with the horizontal dimension being the TIME dimension. In Figure 30 (a), we show that we

want to purge from the cube the grayed slice corresponding to the oldest member in the high-

est level of the TIME dimension (e.g., year level) and insert a new slice (noted with dotted

lines) corresponding to the most recent member of the TIME dimension in this level.

The approach that we adopt performs this purging operation with a minimum of reorganiza-

tion overhead. This is depicted in Figure 30 (b). The only thing we have to do after removing


the grayed part is to copy all the entries in the root chunk to the right of the removed slice, one position to the left along the TIME dimension; then we shift to the right the order code range of the root chunk along the TIME dimension, changing it from [0,2] to [1,3]. This

way, no updating is required for the chunks included in the A, B, C and D trees, since the or-

der code of their parent has not changed. Finally, we “hang” the new chunk-trees (in Figure

30 these are empty) at the end of the root chunk along the TIME dimension. We will allocate

new buckets for these new trees, instead of clustering them in some bucket-region, since we

anticipate future insertions of new data.
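The essence of the purging step (drop the oldest slice, shift the order-code range, hang the new trees) fits in a short sketch. Hedged assumptions: the root chunk is modeled as a dictionary keyed by (TIME order code, other-dimension order code), and the function name is hypothetical; in the actual array implementation the surviving cells are physically copied one position to the left while the range shift preserves their logical order codes, so no child chunk is touched.

```python
def purge_oldest_time_slice(root_cells, time_range, new_subtrees):
    """root_cells: {(time_oc, other_oc): subtree}; time_range: stored
    order-code range of the root chunk along TIME; new_subtrees:
    {other_oc: subtree} for the newly added TIME member."""
    lo, hi = time_range
    # drop the slice of the oldest member (time order code == lo)
    kept = {k: v for k, v in root_cells.items() if k[0] != lo}
    # shift the order-code range; surviving trees keep their order codes,
    # so none of their chunks needs updating
    new_range = (lo + 1, hi + 1)
    # hang the new (mostly empty) trees at the new right end
    for other_oc, tree in new_subtrees.items():
        kept[(hi + 1, other_oc)] = tree
    return kept, new_range
```

Replaying the Figure 30 example with range [0,2] removes the slice at order code 0, shifts the range to [1,3], and appends the new empty trees at order code 3.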

4.3 Other Maintenance Issues

In this final section, we discuss some other maintenance issues of interest. The first subsec-

tion deals with the issue of update anticipation. In particular, we present some ideas regarding

the anticipation of updates on the Time dimension. In the second subsection, we discuss the

case where we have updates on the dimensions instead of directly on the cube. Of course,

these changes are propagated to the cube as well and thus appropriate actions must be taken.

More specifically, we examine the case of slowly changing dimensions, where a change to a spe-

cific dimension member, will trigger the creation of a new version of this member, which is

to be stored along with the original one.

[Figure content: a chunk over dimensions X (members X0, X1) and TIME (members T0-T3, with T3 newly inserted). Ordering 1: X0T0 X0T1 X0T2 X1T0 X1T1 X1T2, where the new cells X0T3 and X1T3 must be inserted mid-array. Ordering 2: T0X0 T0X1 T1X0 T1X1 T2X0 T2X1, where the new cells T3X0 T3X1 are simply appended at the end.]

Figure 31: Alignment of chunk-cells in order to anticipate insertions along the Time dimension.

4.3.1 Anticipating Updates Along the Time Dimension

In this subsection, we argue that in anticipation of incoming facts, we can choose not to in-

clude a small-sized chunk-tree in a bucket-region and to store it in a bucket on its own, regard-

less of the poor bucket space utilization that this might cause.

To the best of our knowledge, all OLAP cubes, no matter what the application domain that utilizes them, contain the “Time” dimension. This is quite reasonable, if we consider that essentially a cube records some sort of event that occurs at different time points and which is also

described by some properties (the other dimensions). This is common for all types of busi-

nesses: retail, inventory, banking, insurance etc. [Kim96]. Since the facts stored in a cube are

aligned along the time axis, we introduce the notion of the current point in time:

Definition 15 (The Current Point in Time - CPT)

The current point in time CPT of a cube C is defined as the time-point in the Time dimension

up to which we have recorded facts in C. In other words, all data points of C that correspond

to time values greater than the CPT are empty. After each new fact insertion the CPT is up-

dated to reflect the new time-point.

Typically the values stored in the Time dimension (i.e., the domain of this dimension) will

exceed the CPT. The important thing here is that we know that there is going to be some bulk

loading operation in the near future that might fill in the empty entries corresponding to time

values beyond the CPT. Therefore, a reasonable thing to do regarding the formation of

bucket-regions is to postpone creating bucket-regions for those chunks whose order-code range in the time dimension includes (as a child at the most detailed level, e.g., the day level) the

CPT. For example, if the CPT = “10 February 2001”, then we might exclude from the region

formation procedure all chunk-trees that correspond to the month February 2001 (the ones

corresponding to subsequent months will be empty anyway), as well as those corresponding

to the 1st quarter of 2001 and those that correspond to year 2001. Then, we can periodically

organize the CUBE File, by running the bucket-region creation procedure for all trees that

were excluded due to the CPT constraint at loading time but now qualify, and thus obtain a

more compact storage of the cube.
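The CPT-based exclusion rule reduces to a simple predicate over a tree's TIME order-code range at the most detailed level. A hedged Python sketch follows; the day order codes, tree names, and both function names are illustrative assumptions.

```python
def qualifies_for_region(tree_time_range, cpt):
    """A tree may join a bucket-region only once its TIME range (in day
    order codes) lies entirely before the current point in time."""
    lo, hi = tree_time_range
    return hi < cpt

def split_for_region_formation(trees, cpt):
    """trees: {name: (day_lo, day_hi)}. Returns (packable, deferred):
    trees safe to pack into regions vs. trees excluded by the CPT rule."""
    packable = [n for n, r in trees.items() if qualifies_for_region(r, cpt)]
    deferred = [n for n, r in trees.items() if not qualifies_for_region(r, cpt)]
    return packable, deferred
```

With the CPT at, say, day 40 (10 February 2001), only the January tree qualifies for region formation; the February, first-quarter, and year-2001 trees are deferred until a later periodic reorganization.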

Of course, the CPT construct can be generalized to other dimensions along whose domains frequent fact insertions are expected. In fact, we could maintain a differ-

ent CPT for each such dimension. In this case, the periodic reorganization would affect those

chunks whose order-code ranges on the corresponding dimensions have been shifted below

all the respective CPTs.

Another simple but quite effective measure for anticipating updates along the Time dimension has to do with the physical ordering of the cells of a chunk. If we choose to implement

the chunk construct in terms of a multidimensional array (see §12.5 for a justification on this


choice), then we can align the cells in such a way as to convert all future insertions along

the time dimension to append-only operations and avoid the overhead of moving cells. (Of

course, in the CUBE File, since a chunk is always confined within a bucket, this overhead

would not correspond to extra I/O cost but only to CPU cost). In Figure 31, we depict an ex-

ample of two different cell-orderings for a 2-dimensional cube. The first one entails the

movement of certain cells in order to accommodate the new insertions, while the second en-

ables a simple appending of the new cells.
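The effect of the two orderings of Figure 31 can be checked by computing linear offsets. In this hedged sketch, a chunk over TIME (t members) and X (x members) is laid out as a linearized array; making TIME the outermost (slowest-varying) dimension turns a new TIME member into a pure append, whereas the X-outer layout scatters the new cells mid-array. The function names are illustrative.

```python
def offset_x_major(t_oc, x_oc, t_size):
    """Ordering 1: X outermost, TIME innermost."""
    return x_oc * t_size + t_oc

def offset_time_major(t_oc, x_oc, x_size):
    """Ordering 2: TIME outermost, X innermost."""
    return t_oc * x_size + x_oc

def append_only(ordering, old_t_size, x_size):
    """True iff every cell of the new TIME slice lands after all old cells."""
    old_cells = old_t_size * x_size
    if ordering == "time-major":
        new = [offset_time_major(old_t_size, x, x_size) for x in range(x_size)]
    else:  # x-major: the TIME extent grows, shifting existing offsets
        new = [offset_x_major(old_t_size, x, old_t_size + 1) for x in range(x_size)]
    return min(new) >= old_cells
```

For the Figure 31 configuration (three old TIME members, two X members), the TIME-major layout appends the new slice at the end of the array, while the X-major layout would force existing cells to move.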

4.3.2 Slowly Changing Dimensions

The term slowly changing dimensions refers to dimensions that are being updated from time

to time with a relatively small number of changes. For example, if we add some new items in

the product dimension, then some new dimension members will be created, with new order

codes and consequently new member-codes. Moreover, this case of updating might also be

triggered by a so-called type II slowly changing dimension [Kim96]. In this case, whenever

we have an update to some dimension member (e.g., we change the price of an item, or re-

move an item from one category and insert it into another and so on), then we do not over-

write the old values but we create a new member with a new (member) code that reflects the

new properties of the item and store it along with the old one, thus keeping two (or more)

versions of the same item. This is the most popular way to handle such dimension updates in

data warehouses [Kim96] and it is based on the idea that in a data warehouse we record his-

tory and therefore we want to be able to track all different versions of e.g., a product item

along the time dimension.

In order to insert into the CUBE File data points with chunk-ids that include new member-

codes along a dimension, i.e., member-codes that do not correspond to the hierarchy of the

dimension at the time that the CUBE File was initially built, we first need to “expand” the

data space covered by the CUBE File. This “expansion” entails visiting a number of

chunks in order to update the stored order-code range covered by the chunk along the specific

dimension (for a discussion on the implementation of chunks and the chunk meta-data stored

within a chunk see §12.5). This is due to the consecutive ordering of members of the same

level, as described in §2.1.

In Figure 32, we show that a potential insertion of a new member at D = 2, in the Region level of the LOCATION dimension, will also trigger an update to all the grayed chunks of the same depth to the right of the chunk where the insertion took place, in order to “shift” the order


code ranges by one. Therefore, in the worst case, the insertion of a new member in a dimension at a specific level will entail visiting all the chunks at the corresponding depth. Nevertheless, this inefficiency can be overcome, or at least postponed until some later time, when a wider reorganization might take place.

[Figure 32 here: the chunk-tree of a 2-dimensional cube over the LOCATION and PRODUCT dimensions, from the root chunk down to the grain level (data chunks) at D = 3 (max depth), with a new member insertion at D = 2.]

Figure 32: The insertion of a new member in a dimension level triggers an update to a large number of chunks, due to the consecutive ordering of the level members.

A simple solution would be to anticipate the oncoming new members and thus make provi-

sion for reserving order codes that could be occupied later by potential updates. This simply

means that if the dimension hierarchy definition says that in a level A we have x members, and thus order codes in the range [0, x-1], then we build the CUBE File as if

we had y members in that level, where y > x; i.e., we have reserved order codes in the range

[0, y-1]. It is very important to note at this point that due to the conservative space allocation

done by the CUBE File, where no space is reserved for empty regions (i.e., chunk-trees) and

also due to the compressed form of the chunk implementation (see §12.2 for a discussion on

90

Page 91: Storage Structures, Query Processing and Implementation of On-Line Analytical ... · 2004. 2. 19. · Abstract On Line Analytical Processing (OLAP) has caused a significant shift

Chapter 4: CUBE File Operations

how the CUBE File copes with the inherent cube sparseness), these extra order codes will

result in no extra space overhead.

In order to specify the number of the extra order codes required, we could use an old storage

rule of thumb [Ram98] referring to the capacity of common index pages, saying that a page

should be about 75-80% full, in order to anticipate oncoming updates. Applied to our case, we could say that the actual number of members in a level should be only 75% of the number of order codes that we reserve for the CUBE File.

A more elaborate approach could be to ask the end-user to characterize each dimension as static or slowly-changing, depending on the business needs. Then, for each level of a slowly-changing dimension, we could receive from the user the maximum number of siblings for that level. This parameter specifies an upper bound on the number of children that a parent can have at that level. For example, we know that for the month level, under the year level, this parameter is equal to 12. Then, we could use this parameter to specify the

number of order codes at each level, when building the CUBE File instead of the actual num-

ber of members in the dimension hierarchy. For example, if we have the Product dimension

of Figure 2, and we know that the maximum-sibling parameter for the item level is 50 (i.e., a type can have at most 50 items), for the type level is 10 (i.e., a category can have at most 10 types), and for the category level is 6 (i.e., we can have at most 6 categories), then, regardless of the actual Product hierarchy instantiation, for the CUBE File we would reserve: 6 members for category, 6×10 = 60 members for type and 60×50 = 3,000 members for the item level.
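The reservation arithmetic of this example can be sketched as follows (illustrative code; the function name is ours, not part of the CUBE File implementation):

```python
def reserved_members(max_siblings):
    # max_siblings holds the maximum-sibling parameter of each level, from
    # the most aggregated level down to the most detailed one; the number of
    # order codes reserved at a level is the product of the parameters of all
    # levels above it, times its own.
    totals, total = [], 1
    for s in max_siblings:
        total *= s
        totals.append(total)
    return totals

# Product hierarchy of the example: 6 categories, 10 types per category,
# 50 items per type.
assert reserved_members([6, 10, 50]) == [6, 60, 3000]
```

Thanks to the conservative space allocation noted above, the unused order codes cost no storage.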

We wish to comment at this point that the decision for a consecutive number assignment for

the members of a dimension level is not mandatory for the CUBE File organization. This means that nothing essential would change for this structure if we decided to assign to the members of a level consecutive numbers that are reset each time we change to a different parent. In this case, all the new member insertions would remain local to the chunk-trees that

were being updated.

An advantage of using a consecutive number coding is that it provides us with a unique identifier of a member in a level (i.e., the order code), which is a simple integer for which two bytes suffice in most cases. Therefore, we can easily “link” other attributes

to specific level members. This could be exploited in normalized dimension table designs, or

even in order to support multiple hierarchy paths. Moreover, it gives us the opportunity to


express a multiple-range query in a compact representation that can be efficiently evaluated

with minimum I/O cost, which is the topic discussed in §8.2.3.


5 Related Work I

In this chapter we discuss work related to the first part of the thesis. In particular, we discuss several works exploiting “chunking”, then describe the clustering problem for multidimensional data and review different approaches in the literature. Furthermore, we discuss the grid file and related organizations. Finally, we provide a taxonomy of proposals for

primary organizations for the cube data.

5.1 Chunking

Chunking is not a new concept in the relevant literature. Several works exploit chunks; to our

knowledge, the first paper to introduce the notion of the chunk was [SS94]. Chunks were

proposed as an alternative means of storing a large multidimensional array on a series of disk

blocks, in contrast to the common cell linearization approach, i.e., storing the array in a row

major or column major order. In this work, the size of the chunk was fixed and determined by a workload probability distribution. It was shown that chunking minimizes the number

of blocks fetched. In [ZDN97], this chunk-based storage method for multidimensional arrays

was exploited in order to define an algorithm for computing the “cube” operator [GBLP96],

which computes group-by aggregations over all possible subsets of the cube dimensions. The

proposed method out-performed by far all ROLAP (Relational OLAP [CD97a]) methods pro-

posed by that time.
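The basic chunk addressing scheme can be sketched as follows (an illustrative fixed-size-chunk example, not the exact scheme of [SS94]): a cell's coordinates decompose into the coordinates of the chunk that holds it and an offset inside that chunk, so a range query touches only the chunks that its query box intersects.

```python
def chunk_of(cell, chunk_shape):
    # Decompose each coordinate into (chunk coordinate, offset within chunk).
    chunk = tuple(c // s for c, s in zip(cell, chunk_shape))
    within = tuple(c % s for c, s in zip(cell, chunk_shape))
    return chunk, within

# A 100x100 array stored as 10x10 chunks, one chunk per disk block: a 10x10
# query box overlaps at most 4 chunks, whereas under row-major linearization
# it could touch one block per row of the box.
assert chunk_of((23, 58), (10, 10)) == ((2, 5), (3, 8))
```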

In addition, in [DRSN98], chunking was used as a unit for caching when answering OLAP

queries. The authors have shown that chunk-based caching allows fine granularity caching,

and allows queries to partially reuse the results of previous queries with which they overlap.

Finally, in [CI99] a hierarchical decomposition of cubes, resulting in uniform size chunks,

has been proposed as a design choice for pre-computed cubes that are used for answering


range-sum queries, with a better query-update tradeoff.

5.2 The Linear Clustering Problem for Multidimensional Data

The linear clustering problem for multidimensional data is one of finding a linear ordering of records indexed on multiple attributes, to be stored in consecutive disk blocks, such that the I/O cost of query evaluation is minimized. The clustering of multidimensional data has

been studied in terms of finding a mapping of the multidimensional space to a one-

dimensional space. This approach has been explored mainly in two directions: (a) in order to

exploit traditional one-dimensional indexing techniques in a multidimensional index space - a typical example is the UB-tree [Bay97], which exploits a z-ordering of multidimensional data

[OM84], so that these can be stored into a one-dimensional B-tree index [BM72] – and (b)

for ordering buckets containing records that have been indexed on multiple attributes, to

minimize the disk access effort. For example, a grid file [NHS84] exploits a multidimen-

sional grid in order to provide a mapping between grid cells and disk blocks. One could find a

linear ordering of these cells – and therefore an ordering of the underlying buckets – such that the evaluation of a query entails more sequential bucket reads than random bucket accesses.

To this end, space-filling curves (see [Sag94] for a survey) have been used extensively. For

example, Jagadish in [Jag90] provides a linear clustering method based on the Hilbert curve

that outperforms previously proposed mappings. Note, however, that all linear clustering

methods are inferior to a simple scan in high dimensional spaces. This is due to the notorious

dimensionality curse [WSB98], which states that clustering in such spaces becomes meaning-

less due to lack of useful distance metrics.

In the presence of dimension hierarchies the multidimensional clustering problem becomes

combinatorially explosive. Jagadish et al. in [JLS99] try to solve the problem of finding an optimal linear clustering of records of a fact table on disk, given a specific workload in the form

of a probability distribution over query classes. The authors propose a subclass of clustering

methods called lattice paths, which are paths on the lattice defined by the hierarchy level

combinations of the dimensions. The HPP chunk-to-bucket allocation problem that we have

discussed in the first part of this thesis is a different problem for the following reasons:

a) It tries to find an optimal way (in terms of reduced I/O cost during query evaluation) to

“pack” the data into buckets, rather than finding a linear ordering of the data. The problem

of finding an optimal linear ordering of the buckets, for a specific workload, so as to re-

duce random bucket reads, is an orthogonal problem and therefore, the methods proposed


in [JLS99] could be used additionally.

b) Apart from the data, it also deals with the intermediate node entries (i.e., directory chunk entries), which provides clustering at the whole-index level and not only at the index-leaf level. In other words, index data are also clustered along with the “real” data.

Since we know that there is no linear clustering of records that will permit all queries over a

multidimensional space to be answered efficiently [JLS99], we strongly advocate that linear

clustering of buckets (inter-bucket clustering) must be exploited in conjunction with an effi-

cient allocation of records into buckets (intra-bucket clustering).

Furthermore, in [MRB99], a path-based encoding of dimension data, similar to our encoding

scheme, is exploited in order to achieve linear clustering of multidimensional data with hier-

archies, through a z-ordering [OM84]. The authors use the UB-tree [Bay97] as an index on

top of the linearly clustered records. This technique has the advantage of transforming typical

star-join [OG95] queries to multidimensional range queries, which are computed more effi-

ciently due to the underlying multidimensional index.
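The z-ordering itself is obtained by bit interleaving; a minimal sketch (a generic Morton-code formulation, assumed close to but not necessarily identical with the encoding of [MRB99]):

```python
def z_order(coords, bits):
    # Interleave the bits of all coordinates, most-significant bit first.
    z = 0
    for b in range(bits - 1, -1, -1):
        for c in coords:
            z = (z << 1) | ((c >> b) & 1)
    return z

# 2-d example on a 4x4 grid (2 bits per coordinate).
assert z_order((0, 0), 2) == 0 and z_order((3, 3), 2) == 15
# Spatially adjacent cells can be far apart on the curve: (1, 3) and (2, 3)
# are neighbours in space but 6 positions apart in z-order.
assert z_order((2, 3), 2) - z_order((1, 3), 2) == 6
```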

However, this technique suffers from the inherent deficiencies of the z space-filling curve, which is not the best space-filling curve according to [Jag90, FR91]; it is, however, very easy to compute and thus straightforward to implement even for high dimensionalities. A typical example of such a deficiency is that the z-curve disperses certain data points that are close in the multidimensional space but not close in the linear order, and vice versa, i.e., it clusters distant data points in the linear space. The latter also results in an inefficient evaluation of multiple disjoint query boxes, due to the repetitive retrieval of the same pages for many of them. Finally, the benefits of z-based linear clustering start to disappear quite soon as dimensionality increases, practically as soon as it exceeds 4-5 dimensions.

5.3 Grid File-based Multidimensional Access Methods

The CUBE File organization was initially inspired by the grid file organization [NHS84],

which can be viewed as the multidimensional counterpart of extendible hashing [FNPS79].

The grid file superimposes a d-dimensional orthogonal grid on the multidimensional space.

Because the grid is not necessarily regular, the resulting cells may be of different shapes and

sizes. A grid directory associates one or more of these cells with data buckets, which are

stored on one disk page each. Each cell is associated with one bucket, but a bucket may contain several adjacent cells; therefore, bucket-regions may be formed.
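An exact-match lookup can be sketched as follows (hypothetical names and data; the scales live in main memory, so only the directory access and the bucket fetch touch the disk):

```python
import bisect

def locate(point, scales, directory):
    # Each in-memory scale maps one coordinate to a grid-cell index; the grid
    # directory then maps the cell to the bucket that stores it.
    cell = tuple(bisect.bisect_right(scale, x) for scale, x in zip(scales, point))
    return directory[cell]

# Two scales partition a 2-d space into 3 x 2 = 6 cells; cells (0, 0) and
# (0, 1) share bucket B1, forming a bucket-region.
scales = ([10, 20], [50])  # split points per dimension
directory = {(0, 0): 'B1', (0, 1): 'B1',
             (1, 0): 'B2', (1, 1): 'B3',
             (2, 0): 'B4', (2, 1): 'B4'}
assert locate((15, 70), scales, directory) == 'B3'
```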


To ensure that data items are always found with no more than two disk accesses for exact

match queries, the grid itself is kept in main memory represented by d one-dimensional ar-

rays called scales. The grid file is intended for dynamic insert/delete operations, therefore it

supports operations for splitting and merging directory cells. A well-known problem of the

grid file is that it suffers from a superlinear growth of the directory even for data that are uni-

formly distributed [Reg85]. One basic reason for this is that splitting is not a local operation

and thus can lead to superlinear directory growth. Moreover, depending on the implementation of the grid directory, merging may require a complete directory scan [Hin85].

Hinrichs in [Hin85] attempts to overcome the shortcomings of the grid file by introducing a

2-level grid-directory. In this scheme, the grid directory is now stored on disk and a scaled-

down version of it (called root directory) is kept in main memory to ensure the two-disk ac-

cess principle still holds. Furthermore, he discusses efficient implementations of the split,

merge and neighborhood operations. In a similar manner, Whang extends the idea of a 2-

level directory to a multilevel directory, introducing the multilevel grid file [WK91], achiev-

ing a linear directory growth in the number of records. Several other grid file-based organizations exist; a comprehensive survey of these and of multidimensional access methods in

general can be found in [GG98].

An obvious distinction of the CUBE File organization from the above multidimensional access methods is that it has been designed to fulfill completely different requirements, namely those of an OLAP environment rather than a transaction-oriented one. A more detailed analysis of the storage requirements of OLAP in general is presented in §11.1. However, we can

say that a CUBE File is designed for an initial bulk-loading and then a read-only operation

mode, in contrast to the dynamic insert/delete/update workload of a grid file. Moreover, a

CUBE File aims at speeding up queries on multidimensional data with hierarchies and ex-

ploits hierarchical clustering to this end. Furthermore, since the dimension domain in OLAP

is known a priori, the “directory” does not have to grow dynamically. In addition, changes to the directory are rare, since dimension data do not change very often (compared to the rate of change for the cube data), and deletions are seldom; therefore, split and merge operations are rarely needed. More important, however, is to adapt well to the native sparseness of a cube data space and to efficiently support incremental updating, so as to minimize the updating window and cube query down-time, which are critical factors in today's business intelligence applications.


5.4 Taxonomy of Cube Primary Organizations

The set of methods reported in the literature as primary organizations for the storage of cubes is quite limited. We believe that this is basically for two reasons. First of all, the generally held view is that a “cube” is a set of pre-computed aggregated results, and thus the main focus has been to devise efficient ways to compute these results [HRU96], as well as to choose which ones to compute for a specific workload (the view selection/maintenance problem

[GM95, Rou99, SDJL96]). Kotidis et al. in [KR98] proposed a storage organization based on

packed R-trees for storing these aggregated results. We believe that this is a one-sided view

of the problem since it disregards the fact that very often, especially for ad hoc queries, there

will be a need for drilling down to the most detailed data in order to compute a result from

scratch. Ad hoc queries represent the essence of OLAP and, in contrast to report queries, are not known a priori and thus cannot really benefit from pre-computation. The only way to

process them efficiently is to enable fast retrieval of the base data, thus providing effective

primary storage organizations for the cube's most detailed data. This argument is of course

based on the fact that a full pre-computation of all possible aggregates is prohibitive due to

the consequent size explosion, especially for sparse cubes [OLR99].

The second reason that makes people reluctant to work on new primary organizations for

cubes is their adherence to relational systems. Although this seems justified, one could point out that a relational table (e.g., a fact table of a star schema [CD97a]) is a logical entity

and thus should be separated from the physical method chosen for implementing it. Therefore, one can use, apart from a paged record file, a B+-tree or even a multidimensional

data structure as a primary organization for a fact table. In fact, there is only one commercial

RDBMS that we know of that exploits a multidimensional data structure as a primary organi-

zation for fact tables [TBHC00]. All in all, the integration of a new data structure in a full-

blown commercial system is a strenuous task with high cost and high risk and thus usually

the proposed solutions are reluctant to depart from the existing technology (see also

[RMF+00]).

Table 1 positions the CUBE File organization in the space of primary organizations proposed

for storing a cube (i.e., only the base data and not aggregates). The columns of this table de-

scribe the alternative data structures that have been proposed as a primary organization, while

the rows classify the proposed methods according to the achieved data clustering. At the top-left cell lies the conventional star schema [CD97a], where a paged record file is used for storing the fact table. This organization guarantees no particular ordering among the stored data


and thus additional secondary indexes are built around it in order to support efficient access

to the data.

In [SS94] a chunk-based method for storing large multidimensional arrays is proposed. No

hierarchies are assumed on the dimensions and data are clustered according to the most frequent range queries of a particular workload. In [DRSN98], the benefits of hierarchical clustering in speeding up queries were observed as a side effect of using a chunk-based file organization over a relation (i.e., a paged file of records) for query caching, with the chunk as the caching unit. Hierarchical clustering was achieved through appropriate “hierarchical” encoding of the dimension data.

sign a path-based surrogate key on each dimension tuple that call compound surrogate key.

They exploit the UB-tree multidimensional index [Bay97] as the primary organization of the

cube. Hierarchical clustering is achieved by taking the z-order [OM84] of the cube data

points by interleaving the bits of the corresponding compound surrogates. [DRSN98],

[MRB99] and the CUBE File all exploit hierarchical clustering of the cube data and the last

two use multidimensional structures as the primary organization. This has among others the

significant benefit of transforming a star-join [OG95] into a multidimensional range query

that is evaluated very efficiently over these data structures. The primary organization pro-

vided by the CUBE File has been described in the first part of this thesis and is inspired from

the grid file organization [NHS84].

Table 1: The space of proposed primary organizations for cube storage

                        |                        Primary Organization
Clustering Achieved     | Relation              | MD-Array           | UB-tree               | Grid File-based
------------------------+-----------------------+--------------------+-----------------------+----------------
No Clustering           | Star Schema           |                    |                       |
Other Clustering        |                       | Chunk-based [SS94] |                       |
Hierarchical Clustering | Chunk-based [DRSN98]  |                    | z-order based [MRB99] | CUBE File


PART II: Ad Hoc Star Query Processing

Star queries are the most prevalent kind of queries in data ware-

housing, OLAP and business intelligence applications. Moreover,

the true power of analysis lies in ad hoc star queries. Thus, there is an

imperative need for efficiently processing ad hoc star queries. To this

end, a new class of fact table organizations has emerged that exploits

path-based surrogate keys in order to hierarchically cluster the fact

table data of a star schema ([DRSN98, MRB99], the CUBE File). In

the context of these new organizations, star query processing

changes radically.

In the second part of this thesis, we discuss the processing of ad hoc

star queries over hierarchically clustered fact tables (i.e., cubes). In

the first chapter, we present a complete abstract processing plan that

captures all the necessary steps in evaluating such queries. Then, in

the next chapter, we discuss the evaluation of star queries specifi-

cally over a CUBE File-organized star schema. Next, we proceed by

presenting specific processing algorithms, expressed as physical op-

erators, over the CUBE File data structure. We end the chapter with

a discussion on related work.



6 Processing Ad Hoc Star Queries over Hierarchically Clustered Fact Tables

Star queries are the most prevalent kind of queries in data warehousing, OLAP and busi-

ness intelligence applications. Star queries impose restrictions on the dimension values

that are used for selecting specific facts; these facts are further grouped and aggregated ac-

cording to the user demands. The major bottleneck in evaluating such queries has been the

join of the central (and usually very large) fact table with the surrounding dimension tables

(also known as a star join [OG95]). To cope with this problem, various indexing schemes

have been developed [OG95, OQ97, Sar97, CI98, WB98, Wu99, WOS01]. Also precomputa-

tion of aggregation results has been studied extensively - mainly as a view maintenance prob-

lem - and is used as a means of accelerating query performance in data warehouses [GM95,

Rou98, SDJL96].

However, the need for doing On-Line Analytical Processing (OLAP) on the data makes proc-

essing of ad hoc star queries, i.e., queries that are not known in advance, also a necessity. For this kind of query, the use of precomputed aggregation results is extremely limited or even impossible in some cases. Even when elaborate indexes are used, due to the arbitrary ordering of the fact table tuples, there might be as many I/Os as there are tuples resulting from the fact

table. The only alternative one can have for such queries is a good physical clustering of the

data, and it is exactly for this reason that a new class of primary organizations for the fact ta-

ble has emerged ([DRSN98, MRB99], the CUBE File). These organizations exploit a special

kind of key that is based on the hierarchy paths of the dimensions, in order to achieve hierarchical clustering of the facts. This physical clustering results in a reduced I/O cost for the majority of star queries, which are based on the dimension hierarchies. Moreover, [MRB99] and

the CUBE File exploit a multidimensional index for storing the tuples (see also the taxonomy

of cube primary organizations in §5.4). A typical star join is then transformed into a multidi-

mensional range query, which is very efficiently computed using the underlying multidimen-

sional data structures.

In this chapter, we study the processing of ad hoc star queries over hierarchically clustered

fact tables. We show that the processing entailed is significantly different from that of

previous approaches. In particular, we present a complete abstract processing plan that covers

all the necessary steps for answering such queries. This plan directly exploits the benefits of

hierarchically clustered fact tables and opens the road for new optimization challenges. Our

proposals have already been implemented in a commercial RDBMS [TBHC00] and have

been deployed to customers. Finally, we present preliminary measurements that have been

confirmed in real-life applications and show significant performance gains for typical star

queries. The results presented in this chapter have been published in [KTS+02].

The rest of the chapter is organized as follows: First we define the necessary preliminary

concepts, as well as provide a description of the database schemata of interest. Then we pro-

ceed by defining the star query and provide a query template. Next, we describe our proposal

for a general abstract processing plan. Finally, we present the results from our experimental

evaluation.

6.1 Preliminary Concepts

As was also mentioned in the first part of this thesis, OLAP data are divided into two main

categories. The measures (or facts) are mainly numeric values, which correspond to meas-

urements of some value related to an event at specific points in time (e.g., amount of money

appearing in a line of an invoice at a specific day, or balance of an account at the end of each

day, etc.) and are expected to change rapidly, as new events occur (i.e., new invoice lines ap-

pear and so on). The dimension data (or simply dimensions) are used to characterize the

measures and are considered to be almost static (or slowly changing) in time. The dimension

values characterize a specific measure value in the same way that coordinate values charac-

terize a specific point in a multidimensional space. Examples of dimensions for a retailing

business can be DATE, PRODUCT, CUSTOMER, LOCATION etc.

Each dimension represents a distinctive property of a measure. In a relational OLAP


(ROLAP) implementation, a dimension is stored in one or more dimension tables {D1, D2,

D3…} each having a set of attributes. In the simplest case, a dimension is represented by only

one table with only one attribute, say h1. Based on the values of h1 one may add additional

attributes (h2, h3, …) to the dimension table in order to form a classification hierarchy. In this

case the h1 attribute is classified by the h2 attribute, which is further classified by the h3 at-

tribute, etc. We call the attributes h1, h2, h3, … hierarchical attributes because they partici-

pate in the definition of a hierarchy. For example, day, month and year form a hierarchical

classification in the DATE dimension. In general, a single dimension may contain many dif-

ferent hierarchical classifications that stem from a common grain level (i.e., the most detailed

level). For the purposes of this chapter we will assume a single hierarchy for each dimension.

A dimension table may also contain one or more feature attributes f. A feature attribute cor-

responds to a property of the entity represented by a dimension table tuple, which is semanti-

cally different from a hierarchical attribute in that it cannot participate in the dimension hier-

archy. Feature attributes contain additional information about a number of hierarchical attrib-

utes and are always functionally dependent on one (or more) of them. For example, popula-

tion could be a feature attribute dependent on the region hierarchical attribute of dimension

LOCATION.

6.2 Database Schema

As mentioned earlier, the dimensions are used to characterize measures, which in turn are

stored in fact tables. A fact table may contain one or more measure attributes and is always

linked (by foreign key attributes) to the dimension tables. This logical organization consisting

of a central table (the fact table) and surrounding tables (the dimension tables) that link to it

through 1:N relationships is known as the star schema [CD97a]. In a typical scenario, the hi-

erarchical attribute representing the most detailed level will be the primary key of the respec-

tive dimension. Each such attribute will have a corresponding foreign key in the fact table.

In order to create a fact table that is clustered according to the dimension hierarchies we first

need to apply a hierarchical encoding (HE) on each dimension table (refer also to the first

part of this thesis and particularly to §2.1). We achieve this by assigning to each dimension

table D containing the hierarchical attributes hm, hm-1, …, h1 (hm being the most aggregated

level and h1 the most detailed one) a surrogate key (sk) attribute that has a unique value for

each tuple. This is something very common in data warehousing practice, since surrogate

keys provide a level of independence from the keys of the tables in the source systems


[Kim96]. In our case, surrogate keys are defined over hm, hm-1, …, h1 and are essentially the

means to achieve hierarchical clustering of the fact table data. We refer to these keys as hier-

archical surrogate keys (hsk) or simply h-surrogates.

The main idea is that an h-surrogate value for a specific dimension table tuple is constructed

as a combination of encoded values of the hierarchical attributes of the tuple. For example, if

h1, h2, h3 are the hierarchical attributes of a dimension table from the most detailed level to

the most aggregated one, then the h-surrogates for this dimension table are represented by the

values oc1(h3)/oc2(h2)/oc3(h1), where the functions oci (i = 1,2,3) define a numbering scheme

for each hierarchy level and assign some order-code to each hierarchical attribute value. Ob-

viously the h-surrogate attribute of a dimension table is a key for this table since it determines

all hierarchical attributes, which in turn functionally determine all feature attributes. The h-

surrogate should be a system assigned and maintained attribute, and typically should be made

transparent to the user.
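As an illustration, the numbering scheme oci and the path-style concatenation can be sketched in Python (a hypothetical helper, not the thesis implementation; the attribute names follow the DATE example):

```python
def assign_h_surrogates(rows, levels):
    """rows: dimension tuples as dicts; levels: hierarchical attributes
    ordered from most aggregated (hm) to most detailed (h1).
    Adds an 'hsk' of the form oc(hm)/.../oc(h1) to every tuple."""
    codes = {}     # (level index, parent path, value) -> order code
    counters = {}  # (level index, parent path) -> next free code
    out = []
    for row in rows:
        parent, path = (), []
        for i, level in enumerate(levels):
            key = (i, parent, row[level])
            if key not in codes:
                # Order codes are assigned per level *within* the parent
                # member, so every root-to-leaf path gets a unique code list.
                codes[key] = counters.get((i, parent), 0)
                counters[(i, parent)] = codes[key] + 1
            path.append(codes[key])
            parent += (row[level],)
        out.append({**row, "hsk": "/".join(map(str, path))})
    return out

dim = [
    {"year": 1999, "month": "Jan", "day": 1},
    {"year": 1999, "month": "Jan", "day": 2},
    {"year": 1999, "month": "Feb", "day": 1},
    {"year": 2000, "month": "Jan", "day": 1},
]
for t in assign_h_surrogates(dim, ["year", "month", "day"]):
    print(t["hsk"])  # 0/0/0, 0/0/1, 0/1/0, 1/0/0
```

Because the encoding is a path of order codes, tuples sharing a hierarchy prefix share an h-surrogate prefix, which is exactly what makes hierarchical clustering (and the prefix ranges used later) possible.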

The actual implementation of the hierarchical surrogate keys depends heavily on the underly-

ing physical organization of the fact table. Proposals for physical organizations ([MRB99],

CUBE File) exploit such path-based surrogate keys in order to achieve hierarchical clustering

of the stored data of a fact table.

In this thesis, we adopt a de-normalized approach for the design of a dimension; i.e., we rep-

resent each dimension with only one table. The hierarchical attributes (h1, h2, …,hm), the fea-

ture attributes (f1, f2, …, fk) as well as the hierarchical surrogate key hsk are stored in a

unique dimension table D. De-normalization of the dimension tables is a common data ware-

housing practice. It is based on the rationale that the major overhead in storage space comes

from the fact table and therefore, normalizing the dimension tables will not exhibit any sig-

nificant space savings. Moreover, de-normalization enhances performance significantly, since

it avoids the consequent joins between the tables of the same dimension. Although normal-
ized schemata (also known as snowflake schemata [CD97a]) are another option, in this thesis
we will not address them, for the sake of simplicity of the presented abstract
processing plan. However, our ideas are fully applicable to normalized schemata as well, with

the only difference that extra joins between the several dimension tables (corresponding to

separate hierarchy levels) must be included in the plan.


Figure 33: Star schema with flat dimension tables. [Diagram: a central fact table (d1…dN, hsk1…hskN, m1…mk) linked to the dimension tables D1…DN, each holding hierarchical attributes h1…hKi, feature attributes f1…fLi and an h-surrogate hski.]

The schema of Figure 33 is a typical star schema where the dimension tables have been hier-
archically encoded. This schema consists of N dimensions stored in the dimension tables D1,
…, DN. Each dimension is logically structured in a hierarchy. The hierarchy attributes for di-
mension Di (i = 1, …, N) are h1, h2, …, hKi, i.e., it consists of Ki hierarchy levels. Each dimen-
sion table Di includes a set of Li feature attributes f1, f2, …, fLi that characterize one or more
hierarchical attributes. In Figure 33 we depict h1, i.e., the most detailed level in each hierarchy,
as the primary key of each dimension table. In Figure 33 we can also see the h-surrogate
attribute (hski), which is an alternate key for each table Di (i = 1, …, N).

The fact table contains the measure attributes (m1, m2, …mk), the reference to the h-surrogate

of each dimension (hsk1, hsk2, …, hskN) and a reference to the most detailed hierarchical at-

tribute of each dimension (d1, d2, …,dN). Hence, d1 is a reference to h1 of D1, d2 is a reference

to h1 of D2 and so on. All measure values refer to the most detailed level of the hierarchy of

each dimension. For an example of a star schema, the reader is referred to Figure 36.

In the fact table of Figure 33, we have two alternative composite keys: (a) (d1, d2, …,dN) that

links to the corresponding lowest hierarchical attribute of each dimension and (b) (hsk1, hsk2,

…, hskN) that links to the h-surrogate attribute. Note that the former is not necessary in order


to achieve hierarchical clustering of the data and thus could be omitted in order to reduce

storage overhead.

In this chapter, we assume a special physically organized schema. The fact table is stored
hierarchically clustered in a multidimensional index (e.g., the UB-tree [MRB99], or the CUBE
File), i.e., the index attributes of this clustering index are the h-surrogates.

SELECT SGA, Aggr
FROM ft, D
WHERE JC AND LP AND MP
GROUP BY GAh, GAf, GAm
HAVING HP
ORDER BY OL

SGA: Selection attribute(s) of dimension table(s) (Di.hj∈GAh or Di.fj∈GAf) and/or measure attribute(s) of the fact table (ft.mi∈GAm).
Aggr: Aggregation function(s) (MIN, MAX, COUNT, SUM) on measure attribute(s) of the fact table (ft.mi) and/or on attribute(s) of the dimension table(s) (Di.hj or Di.fj; Di∈D).
ft: The fact table.
D: Dimension table(s) involved in the query (D1, D2, …, DN).
JC: Natural join conditions joining the fact table ft with the involved dimension tables Di (Di∈D) on key-foreign key (ft.di = Di.h1).
LP: A conjunction of local predicates on some of the involved dimension tables: LP = LOCPRED1(D1)∧LOCPRED2(D2)∧…∧LOCPREDk(Dk); D1, D2, …, Dk∈D.
MP: Restriction predicate on measure attribute(s) of the fact table.
GAh: Grouping hierarchical attribute(s) of dimension table(s) (Di.hk, Di∈D).
GAf: Grouping feature attribute(s) of dimension table(s) (Di.fk, Di∈D).
GAm: Grouping measure attribute(s) of the fact table (ft.mi).
HP: Restriction predicate on grouping attributes (GAh∪GAf∪GAm) and/or on aggregation functions.
OL: An ordered list of attributes (OL⊆GAh∪GAf∪GAm).

Figure 34: The ad hoc star query template

6.3 Ad Hoc Star Queries

OLAP queries typically include restrictions on multiple dimension tables that trigger restric-

tions on the (usually very large) fact table. This is known as a star join [OG95]. In this thesis,

we use the term star query to refer to flat SQL queries, defined over a single star schema, that

include a star join. Star queries represent the majority of OLAP queries. In particular, we are

interested in ad hoc star queries. With the term “ad hoc” we refer to queries that are not

known in advance and therefore the administrator cannot optimize the DBMS specifically for

these.

In Figure 34, we depict an SQL query template for ad hoc star queries. The template defines

the most complex query structure supported and uses abstract terms that act as placeholders.

Note that queries conforming to this template have a structure that is a subset of the above

template and instantiate all the appropriate abstract terms.


Our template will be applied on a schema similar to the one in Figure 33, which is a typical

star schema. Looking at the part containing the join constraints between the fact table and the

dimension tables (JC), we see that it includes a star join. Apart from the star join, there is a

GROUP BY and HAVING clause (HP). In general any attribute (hierarchical, feature, or measure)

can appear in a GROUP BY clause (GAh, GAf, GAm). However, most queries impose grouping

on a number of hierarchical and/or feature attributes. Finally, there is an ORDER BY clause for

controlling the order of the presented results (OL).

LOCPREDi(Di) is a local predicate on a dimension table Di (LP). The characterization “local”

is because this predicate includes restrictions only on Di and not on other dimension tables or

the fact table. This predicate is very important for the h-surrogate processing phase explained

later, and is used to produce the necessary h-surrogate specification for accessing the fact ta-

ble.

Note that the vast majority of OLAP queries contain an equality restriction on a number of

hierarchical attributes and more commonly on hierarchical attributes that form a complete

path in the hierarchy. For example, if we consider a dimension LOCATION consisting of a 3-

level hierarchy (Region/Area/City) and a DATE dimension with a 3-level hierarchy

(Year/Month/Day), then the query “show me sales for area A in region B for each month of

1999” contains two whole-path restrictions, one for the dimension LOCATION and one for

DATE: (a) LOCATION.region = ‘A’ AND LOCATION.area = ‘B’, and (b) DATE.year =

1999. This is reasonable since the core of analysis is conducted along the hierarchies. We call

this kind of restrictions hierarchical prefix path (HPP) restrictions (see Definition 1). Note

also that even if we impose a restriction on an intermediate-level hierarchical attribute,
we can still have an HPP restriction, as long as hierarchical attributes functionally determine
higher-level ones.
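A restriction is an HPP restriction when the restricted levels form a contiguous prefix starting from the most aggregated level; under the functional dependencies just mentioned, a restriction on an intermediate level can be completed to such a prefix. A minimal check (hypothetical helper, illustrative only):

```python
def is_hpp(hierarchy, restricted):
    """hierarchy: level names from most aggregated to most detailed.
    restricted: set of levels carrying equality restrictions.
    True iff the restricted levels are a contiguous prefix of the hierarchy."""
    return set(hierarchy[:len(restricted)]) == set(restricted)

# LOCATION.region = 'A' AND LOCATION.area = 'B' -> whole-path prefix
print(is_hpp(["region", "area", "city"], {"region", "area"}))   # True
# A restriction on city alone is not, syntactically, a prefix...
print(is_hpp(["region", "area", "city"], {"city"}))             # False
# ...but city functionally determines area and region, so the restriction
# can be completed to the prefix {region, area, city}:
print(is_hpp(["region", "area", "city"], {"region", "area", "city"}))  # True
```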

Finally, MP is a predicate that contains any constraints on measures of the fact table. Those

constraints do not reference any dimension tables. An example would be to ask for sales fig-

ures that exceed a certain value threshold.

6.4 Abstract Processing Plan

In this section we will describe the major processing steps entailed when we want to answer

star queries over a hierarchically clustered fact table.

Step 1 – Identifying relevant fact table data: The processing begins with the evaluation of


the restrictions on the individual dimension tables, i.e., the evaluation of the local predicates.

This step performed on each hierarchically encoded dimension table will result in a set of h-

surrogates that will be used in order to access the corresponding fact table data. Due to the

hierarchical nature of the h-surrogate this set can be represented by a number of h-surrogate

intervals, called the h-surrogate specification. An interval can for example have the form

v3/v2/∗, where v3, v2 are specific values of the h3 and h2 hierarchical attributes of the dimen-

sion in question. The symbol ‘∗’ signifies all values of the h1 attribute in the dimension tuples

that have h3 = v3 and h2 = v2. In the case of a DATE dimension, the h-surrogate specification

could be 1999/January/* to specify any day in this month. We will show in the next chapter

how this step can be performed very efficiently. We will use the term range to denote the h-

surrogate specification arising from the evaluation of the restrictions on a single dimension.

Once the h-surrogate specifications are determined for all dimensions, the evaluation of the

star join follows. In hierarchically clustered fact tables this translates to one or more simple

range queries on the underlying multidimensional structure that is used to store the fact table

data. Moreover, since data are physically clustered according to the hierarchies and the

ranges originate from hierarchical restrictions, this results in a very efficient evaluation of the

range selection.
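Step 1 can be illustrated with the following sketch (hypothetical names, and simplified to prefix-only ranges): it evaluates a local predicate over one hierarchically encoded dimension and collapses fully qualifying subtrees into prefix intervals of the form v3/v2/∗.

```python
def hsk_specification(dim_rows, predicate, levels):
    """dim_rows: dicts carrying an 'hsk' path plus hierarchical attributes.
    Returns a list of h-surrogate ranges: a whole prefix 'p/*' when every
    tuple under it qualifies, otherwise the individual qualifying hsks."""
    qualifying = {r["hsk"] for r in dim_rows if predicate(r)}
    by_prefix = {}
    for r in dim_rows:
        prefix = "/".join(r["hsk"].split("/")[:levels - 1])
        by_prefix.setdefault(prefix, set()).add(r["hsk"])
    spec = []
    for prefix, members in sorted(by_prefix.items()):
        if members <= qualifying:
            spec.append(prefix + "/*")          # whole subtree qualifies
        else:
            spec.extend(sorted(members & qualifying))
    return spec

date_dim = [
    {"hsk": "0/0/0", "year": 1999, "month": "Jan", "day": 1},
    {"hsk": "0/0/1", "year": 1999, "month": "Jan", "day": 2},
    {"hsk": "0/1/0", "year": 1999, "month": "Feb", "day": 1},
    {"hsk": "1/0/0", "year": 2000, "month": "Jan", "day": 1},
]
# Local predicate: year = 1999 AND month = 'Jan'  ->  the single range 0/0/*
print(hsk_specification(date_dim,
                        lambda r: r["year"] == 1999 and r["month"] == "Jan",
                        levels=3))  # ['0/0/*']
```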

Step 2 – Computing necessary joins: The tuples resulting from the fact table contain the h-

surrogates, the measures and the dimension table primary keys (see also Figure 33). At this

stage, there might be a need for joining this set of tuples with a number of dimension tables in

order to retrieve certain hierarchical and/or feature attributes that the user wants to have in the

final result and might also be needed for the grouping operation. We call these joins residual

joins.

Step 3 – Performing grouping and ordering: Finally, the resulting tuples may be grouped

and aggregated and the groups further filtered and ordered for delivering the result to the

user.

The abstract processing plan comprising the above phases is illustrated in Figure 35 and

can be used to answer the single block queries described in the previous section. This plan is

abstract in the sense that it does not determine specific algorithms for each processing step; it

just defines the processing that needs to be done. That is why it is expressed in terms of ab-

stract operators (or logical operators), which in turn can be mapped to a number of alterna-

tive physical operators that correspond to specific implementations.


The plan can be logically divided in two main processing phases: the hierarchical surrogate

key processing (HSKP) phase which corresponds to Step 1 mentioned earlier, and the main

execution phase (MEP) corresponding to the other two steps. Next we describe the operators

appearing in the abstract processing plan of Figure 35.

Create_Range is responsible for evaluating the local predicate (LP in Figure 34) on each di-

mension table. This evaluation will result in an h-surrogate specification (set of ranges) for

each dimension. All these together define one (or more, disjoint) hyper-rectangle(s) in the

multidimensional space of the fact table.

MD_Range_Access receives as input the h-surrogate specifications from the Create_Range

operators and performs a set of range queries on the underlying multidimensional structure

that holds the fact table data. Apart from the selection of data points that fall into the desired

ranges, this operator can perform further filtering based on predicates on the measure values

(MP) and projection (without duplicate elimination) of fact table attributes.

Figure 35: The abstract processing plan. [Diagram: in the h-surrogate processing phase, a Create_Range operator per restricted dimension Di, Dj, … feeds an h-surrogate specification into MD_Range_Access over the fact table FT; the main execution phase then applies Residual_Join operators with the dimension tables, followed by Group_Select and Order_By.]

Residual_Join is a join on a key-foreign key equality condition between a dimension table and
the tuples originating from the MD_Range_Access operator. This way, each incoming fact


table record is joined with at most one dimension table record. The join is performed in order

to enrich the fact table records with the required dimension table attributes. These attributes

might be required in the SELECT, GROUP BY, HAVING and ORDER BY clauses.

Group_Select performs grouping and aggregation on the resulting tuples and evaluates any

restrictions appearing in the HAVING clause. Finally, Order_By simply sorts the tuples in the

required output order.

Note that not all operators in the abstract plan may be needed for the execution of a particular

query. The plan represents the most complex abstract plan that might be required to answer a

query. For example, if the result records are not required in a specific order then the final Or-

der_By operator will not be applied. Also, many queries will not restrict all available dimen-

sions nor will require feature or hierarchical attributes from all dimension tables. This means

that only a restricted number of Create_Range and Residual_Join operators may be used. In

the simplest possible query (SELECT * FROM ft) only the MD_Range_Access operator is

needed.
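The operators of the main execution phase can be strung together as a toy pipeline (illustrative Python over in-memory lists instead of a real CUBE File; all names and values are hypothetical):

```python
# Toy fact table: hsk_date is the DATE h-surrogate, plus a foreign key
# into LOCATION and one measure.
fact = [
    {"hsk_date": "0/0/0", "store_id": 1, "sales": 100.0},
    {"hsk_date": "0/0/1", "store_id": 1, "sales": 50.0},
    {"hsk_date": "1/0/0", "store_id": 1, "sales": 30.0},
]
location = [{"store_id": 1, "area": "A"}]

def in_range(hsk, spec):
    # True if hsk equals a range or falls under a 'prefix/*' range.
    return any(hsk == s or (s.endswith("/*") and hsk.startswith(s[:-1]))
               for s in spec)

# MD_Range_Access with the DATE specification 0/0/* (one prefix range).
selected = [t for t in fact if in_range(t["hsk_date"], ["0/0/*"])]

# Residual_Join with LOCATION to fetch the required 'area' attribute.
by_store = {d["store_id"]: d for d in location}
joined = [{**t, "area": by_store[t["store_id"]]["area"]} for t in selected]

# Group_Select: SUM(sales) grouped by area.
groups = {}
for t in joined:
    groups[t["area"]] = groups.get(t["area"], 0.0) + t["sales"]
print(groups)  # {'A': 150.0}
```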

6.4.1 Example of an Abstract Processing Plan

In this subsection we first describe an example schema of a simplified data warehouse. Then

we present an abstract processing plan for an example query on this data warehouse.

Figure 36: The schema of the data warehouse. [Diagram: fact table SALES_FACT (customer_id, product_id, store_id, day, cust_hsk, prod_hsk, loc_hsk, date_hsk, sales) with dimension tables CUSTOMER (customer_id, profession, name, address, hsk), LOCATION (store_id, area, region, population, hsk), PRODUCT (item_id, class, category, brand, hsk) and DATE (day, month, year, hsk).]

The data warehouse stores sales transactions recorded per item, store, customer and date. It

contains one fact table SALES_FACT, which is defined over the dimensions: PRODUCT,

CUSTOMER, DATE and LOCATION with the obvious meanings. The single measure of

SALES_FACT is sales representing the sales value for an item bought by a customer at a store

at a specific day. The schema of the fact table is shown in Figure 36 and the dimension hier-

archies are depicted in Figure 37.


The dimension DATE is organized in three levels: Day-Month-Year. Hence, it has three hier-

archical attributes (day, month, year).

The dimension CUSTOMER is organized in only two levels: Customer-Profession. For each

customer the dimension table contains an ID, a name, an address and a profession. This di-

mension has two hierarchical attributes (customer_id, profession) and two feature attributes

(name, address). The LOCATION dimension is organized into three levels: Store-Area-

Region. Stores are grouped into geographical areas and the areas are grouped into regions.

For each area, the population is stored as feature attribute. This dimension has three hierar-

chical attributes (store_id, area, region) and one feature attribute (population) that is assigned

to the Area level.

Figure 37: The dimension hierarchies of the example. [Diagram: DATE: Year-Month-Day; PRODUCT: Category-Class-Item; LOCATION: Region-Area-Store; CUSTOMER: Profession-Customer.]

Finally, the PRODUCT dimension is organized into three levels: Item-Class-Category. Items

are grouped into product classes and those classes are grouped into categories. For example,

one category could be “air condition”. Also, the attribute brand characterizing each item is a

feature attribute.

Let us now define an example query on the above schema: We want to see the sum of sales

by area and month for areas with population more than 1 million, for the months of the year

1999 and for products that belong to the category “air condition”. The corresponding SQL

expression of this query is given next, while the abstract processing plan for this query is

shown in Figure 38.

SELECT L.area, D.month, SUM(F.sales)
FROM SALES_FACT F, LOCATION L, DATE D, PRODUCT P
WHERE F.day = D.day AND F.store_id = L.store_id
  AND F.product_id = P.item_id
  AND D.year = 1999 AND L.population > 1000000
  AND P.category = "air condition"
GROUP BY L.area, D.month
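As a sanity check of the query's semantics, the following self-contained sketch runs it with sqlite3 on a few invented rows (the DATE table is renamed DATE_DIM here, and all data values are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE LOCATION (store_id INT PRIMARY KEY, area TEXT, region TEXT, population INT);
CREATE TABLE DATE_DIM (day TEXT PRIMARY KEY, month TEXT, year INT);
CREATE TABLE PRODUCT (item_id INT PRIMARY KEY, class TEXT, category TEXT, brand TEXT);
CREATE TABLE SALES_FACT (product_id INT, store_id INT, day TEXT, sales REAL);
INSERT INTO LOCATION VALUES (1,'A','North',2000000),(2,'B','North',500000);
INSERT INTO DATE_DIM VALUES ('1999-01-15','1999/01',1999),('1998-07-01','1998/07',1998);
INSERT INTO PRODUCT VALUES (10,'split units','air condition','AcmeCool'),
                           (11,'fridges','white goods','AcmeFrost');
INSERT INTO SALES_FACT VALUES (10,1,'1999-01-15',100.0),(10,1,'1999-01-15',50.0),
                              (10,2,'1999-01-15',70.0),(11,1,'1998-07-01',30.0);
""")
rows = con.execute("""
SELECT L.area, D.month, SUM(F.sales)
FROM SALES_FACT F, LOCATION L, DATE_DIM D, PRODUCT P
WHERE F.day = D.day AND F.store_id = L.store_id
  AND F.product_id = P.item_id
  AND D.year = 1999 AND L.population > 1000000
  AND P.category = 'air condition'
GROUP BY L.area, D.month
""").fetchall()
print(rows)  # [('A', '1999/01', 150.0)]
```

Store 2 (population 500,000) and the "white goods" product are filtered out by the local predicates, so only the two air-condition sales of store 1 in January 1999 are summed.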

6.5 Performance Evaluation

The technology introduced in this chapter is fully implemented in the commercial relational

DBMS TransBase HyperCube® [TBHC00]. This section presents preliminary measurement


results that evaluate the performance of the proposed processing plan.

The measurements were performed on a two-processor Pentium III PC at 750 MHz, with 256
MB RAM and a 30 GB IDE hard disk.

Figure 38: The abstract processing plan for the example query. [Diagram: Create_Range operators for DATE (year = 1999), LOCATION (population > 1000000) and PRODUCT (category = "air condition") form the h-surrogate processing phase and feed MD_Range_Access over SALES_FACT; the main execution phase applies Residual_Join (day) with DATE and Residual_Join (store_id) with LOCATION, then Group_Select (area, month).]

The data warehouse schema that we used for our experiments consists of a fact table with
three dimensions CUSTOMER, PRODUCT and DATE, and 3 measures: quantity, value and
unit_price. The data come from a large electronic retailer in Hellas. The CUSTOMER
dimension contains 1.4 million records, PRODUCT consists of 27,000 products and the
DATE dimension covers 7 years at day granularity. 15,543,380 records are stored in the fact
table, amounting to 1.5 GB.

The query workload consisted of 220 ad hoc star queries from a real-world application. We

classified the queries into three groups according to their selectivity on the fact table (i.e.,

number of tuples retrieved from the fact table):

1. [0.0-0.1]: 0% to 0.1% of fact table, i.e., 0 to about 15K records

2. [0.1-1.0]: 0.1% to 1% of fact table, i.e., 15K to 160K records

3. [1.0-5.0]: 1.0% to 5.0% of fact table, i.e., 160K to 780K records

The goal of the performance evaluation was to measure three alternative execution plans:

a) the conventional star join plan (STAR),


b) the abstract execution plan as described in the previous section (called AEP) and

c) an enhanced version of AEP taking the hierarchical pre-grouping transformation into

account (called OPT). (The pre-grouping transformation imposes an early grouping on the

tuples coming out of the fact table before taking part in any residual join, thus reducing

dramatically the amount of tuples participating in the latter. This is a very promising op-

timization but its details are out of the scope of this thesis. The interested reader will find

more details on the hierarchical pre-grouping transformation in [KTS+02, PER+03]; it is
also briefly discussed in the related work section §9.3).

Table 2: Response time (in sec) for the three plans for the three query classes.

FT Sel. % |    [0.0-0.1]     |    [0.1-1.0]     |    [1.0-5.0]
          | STAR  AEP  OPT   | STAR  AEP  OPT   |  STAR   AEP  OPT
MIN       |    0    0    0   |   65    2    2   |   274    11    6
MAX       |   30    6    3   |  290    9    6   |  1219    47   27
MEDIAN    |    1    1    1   |  182    8    5   |   477    23   13
STD-DEV   |  4.9  1.2  0.5   | 75.6  3.1  1.6   | 346.0  14.1  7.9

STAR uses secondary indexes that are created on the dimension keys of the fact table. The

restrictions on the dimension tables are evaluated and the resulting dimension keys are used

for index intersection on the fact table. The resulting records are joined with the dimension

tables, in order to perform grouping and get the final result. This is the typical processing of

star queries in commercial DBMSs (e.g., star transformation in Oracle [Ora01]). This proc-

essing has two major steps: the index intersection and the tuple materialization (retrieval of

the qualified fact table tuples). While the index intersection has largely been optimized (e.g.,

with bitmap indexes [OQ97]), the materialization of results is still the bottleneck of non-

clustering indexes. Consequently, we neglect the index intersection time for STAR and just

measure the time for fact record materialization, residual joins and grouping. For AEP and

OPT the complete processing including index access is measured; therefore favoring STAR.

Table 2, shows the response time analysis (in seconds) for the three alternative processing

plans. As the three classes contain queries with different result set size and thus different re-

sponse times we use the maximum, minimum, median time and the standard deviation to ana-

lyze the performance.

Our results show that the standard STAR processing is outperformed by our approaches.

However, for small queries, i.e., the class [0.0-0.1], the speedup is below an order of magni-

tude. In general, for small result sets, the advantage of clustering over non-clustering is not

that large. The picture changes dramatically, when we consider larger queries (classes [0.1-


1.0] and [1.0-5.0]), which are more typical for OLAP applications. The hierarchical cluster-

ing of AEP leads to an average speedup compared to STAR of 24 and with the additional op-

timization of hierarchical pre-grouping an additional factor of about two is gained.

Note also that STAR has a very high deviation in the response times for queries within one

class. This is mainly for two reasons: (a) STAR performance deteriorates very fast as the fact

table selectivity is increased and (b) since the fact table is not stored clustered the number of

performed I/Os may differ significantly from one query to another. On the other hand, the

deviation for AEP and OPT remains low, showing a much more stable behavior.


7 Processing Ad-Hoc Star Queries over CUBE File-Organized Fact Tables

In the previous chapter we proposed an abstract processing plan for the evaluation of

ad hoc star queries over hierarchically clustered fact tables. In this chapter we discuss the

realization of the abstract operators of this plan for the case of fact tables organized as

CUBE Files. To this end, we define a normal form for representing star queries. This is the

topic of the first subsection. Then in the second subsection, we present “chunk expressions”,

which is an access pattern mechanism for describing the CUBE File data to be retrieved. In

the third subsection, we discuss the h-surrogate processing phase, which comprises the first

part of the abstract processing plan (see Figure 35). We propose a specific physical design for

the dimensions and present implementation rules that describe the entailed processing.

7.1 Star Query Normal Form

Let us consider the example star schema in Figure 36 and assume that the fact table has been
organized as a CUBE File. Moreover, let us consider once more the example query on this

schema, posed in the previous chapter; we repeat it here also for ease of reference:

We want to see the sum of sales by area and month for areas with population more than 1

million, for the months of the year 1999 and for products that belong to the category “air

condition”. The corresponding SQL expression of this query is given next, while the abstract

processing plan for this query is shown in Figure 38.

SELECT L.area, D.month, SUM(F.sales)


FROM SALES_FACT F, LOCATION L, DATE D, PRODUCT P
WHERE F.day = D.day AND F.store_id = L.store_id
  AND F.product_id = P.item_id
  AND D.year = 1999 AND L.population > 1000000
  AND P.category = "air condition"
GROUP BY L.area, D.month

If we wanted to analyze the definition of this query into a number of basic elements, then we
could see that one such element is the definition of the individual dimension restrictions (i.e.,
the definition of the local predicate per dimension – refer to §6.3):

D.year = 1999 AND L.population > 1000000 AND P.category = "air condition"

So, in this case we have three separate local predicates corresponding to dimensions DATE,

LOCATION and PRODUCT respectively. The definition of the dimension restrictions sets the

target data set of fact values to retrieve. Then, with the GROUP BY clause

GROUP BY L.area, D.month

we define the granularity of the returned result. In this case, the returned values must corre-

spond to a (month, area, ALL, ALL) granularity. In other words, the results will be fully ag-

gregated along the dimensions CUSTOMER and PRODUCT and aggregated at the month,

and area levels for the DATE and LOCATION dimensions respectively. Finally, in the SE-

LECT clause, we define the measure that we wish to aggregate as well as provide the aggre-

gation function to be used; in this case, it is the sum of sales.

In a similar manner, we can generalize this procedure and define the set of discrete steps that

one has to make in order to define an ad hoc star query, corresponding to the template of

Figure 34, on a hierarchically clustered cube:

1. Define dimension restrictions (D)

2. Define measure restrictions (M)

3. Define result granularity (R)

4. Define result measures and aggregations on measures (A)

5. Define result restrictions (T)

6. Define presentation ordering (O)

Note that for a specific query not all of these steps may be needed. For instance, in the
above query only steps 1, 3 and 4 were used. Note also that we have included no step defin-
ing the star join between the fact table and the dimension tables. This is because the user
does not need to define this step; the system can automatically derive all the
equality restrictions that join the dimensions to the fact table.

The second step refers to the definition of a restriction directly on the measures. An example


would be to ask for sales values that lie in the range 10 to 100 Euros (see the MP predicate in the template of Figure 34). In addition, the fifth step refers to a restriction imposed on the result values: e.g., we could ask only for those totals that exceed a certain amount (sum(sales) > 1000). Clearly, this step corresponds to the HAVING clause in the template of Figure 34. Finally, the last step specifies the order of the result, i.e., the OL predicate in the query template.

Table 3: Mapping between SQNF and the SQL ad hoc star query template of Figure 34.

SQNF   SQL Template
D      LP
M      MP
R      GAh, GAf, GAm
A      Aggr
T      HP
O      OL

For reasons of clarity and uniformity, we have chosen to represent an ad hoc star query that is to be evaluated over a CUBE File in a normal form consisting of the six star query definition steps. The corresponding definition follows:

Definition 16 (Star Query Normal Form - SQNF)

A star query conforming to the SQL template of Figure 34 is in Star Query Normal Form (SQNF) when it is expressed as the following tuple: ([D],[M],R,A,[T],[O]). The six terms correspond to the respective discrete steps for defining an ad hoc star query mentioned above. The optional terms appear inside square brackets.

The result granularity (R), as well as the declaration of the measures in the answer and the potential aggregations on these measures (A), must always be present in an SQNF representation of a query. The mapping between an SQL query based on the template of Figure 34 and SQNF is straightforward; we present it in Table 3. Therefore, the SQNF form of the above query is:


(D.year = 1999 AND L.population>1000000 AND P.category = “air condition”, (L.area, D.month), SUM(F.sales))

containing the D, R and A terms respectively.
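For illustration, the SQNF tuple can be mirrored directly in code. The following Python sketch (the class and field layout are our own, not part of the thesis) represents the six terms, with R and A mandatory and the bracketed terms optional:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SQNF:
    """A star query in Star Query Normal Form: ([D],[M],R,A,[T],[O]).
    R and A are mandatory; the bracketed terms default to None (absent)."""
    R: Tuple[str, ...]          # result granularity (grouping attributes)
    A: Tuple[str, ...]          # result measures / aggregations
    D: Optional[str] = None     # dimension restrictions
    M: Optional[str] = None     # measure restrictions
    T: Optional[str] = None     # result restrictions (HAVING)
    O: Optional[str] = None     # presentation ordering

# The example query of this section, containing only the D, R and A terms:
q = SQNF(
    R=("L.area", "D.month"),
    A=("SUM(F.sales)",),
    D='D.year = 1999 AND L.population > 1000000 '
      'AND P.category = "air condition"',
)
```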

7.2 Regular Chunk Expressions

The very first step in the abstract processing plan (Figure 35) for ad hoc star queries over hierarchically clustered fact tables, defined in the previous chapter, was the evaluation of the dimension restrictions (i.e., the D term in SQNF). This processing step is represented by the Create_Range abstract operation, which results in a set of h-surrogate specifications (see §6.4), one for each dimension appearing in D. For example, such a specification could have the form v3/v2/∗, where v3 and v2 are specific values of the h3 and h2 hierarchical attributes of the dimension in question. The symbol '∗' represents all values of the h1 attribute in the dimension tuples that have h3 = v3 and h2 = v2. In the context of the dimension encoding imposed by the CUBE File organization (see §2.1), this h-surrogate specification is called a member-code specification; the "v" values correspond to order codes of the corresponding dimension, and the slashes "/" are substituted by dots. A formal definition follows:

Definition 17 (Member Code Specification)

The evaluation of the local predicate corresponding to a dimension Di, consisting of K hierarchical levels (including potential pseudo levels – see §2.2), appearing in the D term of a star query in SQNF over a cube C, results in a member-code specification denoting the qualifying dimension members, which is written with the following syntax (given in BNF grammar):

<member_specification> ::= <order_code_specification>1.<order_code_specification>2. … .<order_code_specification>K

<order_code_specification> ::= oc | ^oc | * | P | [oci-ocj] | [^oci-ocj] | (oci-ocj) | (oci-ocj] | [oci-ocj) | oc+ | oc- | ^oc+ | ^oc- | {<oc_comma_list>} | {^<oc_comma_list>}

<oc_comma_list> ::= <oc_comma_list>, <oc_term> | <oc_term>
<oc_term> ::= [oci-ocj] | (oci-ocj] | [oci-ocj) | oc
oc ::= unsigned int   /* i.e., an order code value */

In Table 4, we provide the definition of the symbols appearing in a member-code specification. If for a dimension Di the local predicate is empty, then the corresponding member-code specification has the form *.*.….*, with potential interleaving of "P" symbols for the corresponding pseudo levels in the hierarchy.

Table 4: Explanation of symbols appearing in a member code specification

Symbol                              What it means
oc                                  An order code value at a specific level in the hierarchy.
^oc                                 Negation of oc, i.e., do not include the members with this order code in the target data set.
*                                   All order codes at this level under the specified ancestor.
P                                   Pseudo level; no order code at this level.
[oci-ocj]                           Range of oc's (boundaries included).
[^oci-ocj]                          Negation of range, i.e., do not include the members with these order codes in the target data set.
(oci-ocj]                           Range of oc's (left boundary excluded).
[oci-ocj)                           Range of oc's (right boundary excluded).
(oci-ocj)                           Range of oc's (both boundaries excluded).
oc+                                 Members with order codes ≥ oc.
oc-                                 Members with order codes ≤ oc.
^oc+                                Members with order codes > oc.
^oc-                                Members with order codes < oc.
{oc1,…,ock}                         List of non-consecutive order codes, in ascending order.
{^oc1,…,ock}                        Negation of a list of non-consecutive order codes, in ascending order, i.e., do not include the members with these order codes in the target data set.
{oc1,…,ock, [oci-ocj],…, [oci'-ocj']}   List of non-consecutive order codes and order code intervals, in ascending order.
{^oc1,…,ock, [oci-ocj],…, [oci'-ocj']}  Negation of a list of non-consecutive order codes and order code intervals, in ascending order.

For example, let us assume a star query Q, expressed in SQNF, where the D term is as follows:

D = PRODUCT.Category = "Music" AND LOCATION.Country IN ("Greece", "USA")

Then, from the example dimensions appearing in Figure 2, we have that the member-code specification for the PRODUCT dimension is:

1.*.P.*

This denotes that the local predicate evaluation corresponds to all the descendant members of the second member in the Category level (i.e., "Music"). Similarly, for the LOCATION dimension, the corresponding member-code specification will be:

[0-1].{0,2}.*.*

This member-code specification includes a range of order codes at the Continent level, a list of non-consecutive order codes at the Country level, and two "*" symbols, denoting all the corresponding descendant members at the Region and City levels respectively.
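The matching semantics of such specifications can be sketched as a small matcher. The following Python sketch (function names are our own) covers only a subset of the grammar of Definition 17 (the oc, ^oc, *, P, closed-range and comma-list forms) and assumes pseudo levels carry the special order code -1:

```python
def match_level(spec: str, oc: int) -> bool:
    # One <order_code_specification>; pseudo levels carry order code -1.
    # Covers only the oc, ^oc, *, P, [oci-ocj] and {comma list} forms.
    if spec == "*":
        return True
    if spec == "P":
        return oc == -1
    if spec.startswith("^"):                    # exclude a single code
        return oc != int(spec[1:])
    if spec.startswith("{"):                    # {oc, ..., [oci-ocj], ...}
        return any(match_level(t.strip(), oc) for t in spec[1:-1].split(","))
    if spec.startswith("["):                    # [oci-ocj], both bounds included
        lo, hi = spec[1:-1].split("-")
        return int(lo) <= oc <= int(hi)
    return oc == int(spec)

def match_member(spec: str, member_code: str) -> bool:
    """Match a dot-separated member-code against a member-code
    specification such as '[0-1].{0,2}.*.*'."""
    specs, codes = spec.split("."), member_code.split(".")
    return len(specs) == len(codes) and all(
        match_level(s, int(c)) for s, c in zip(specs, codes))
```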

According to the abstract processing plan of Figure 35, the local predicate evaluation on each dimension results in a combined fact table data specification. In other words, the target data set to be retrieved from the cube is specified. More specifically, in the CUBE File, data reside in chunks identified by unique chunk-ids (see §2.3). In order to describe the target data set, we need an access pattern that matches the qualifying chunk-ids. We call this pattern a chunk regular expression, or simply chunk expression. A chunk expression is created by interleaving the member-code specifications of the cube dimensions, for a specific interleaving order OI. This is the same mechanism by which chunk-ids are formed (see §2.3).

For example, if we choose the interleaving order OI = (LOCATION, PRODUCT) (major-to-minor from left to right) for the dimensions of the previous example, then the resulting chunk expression will be:

[0-1]|1.{0,2}|*.*|P.*|*

This expression describes the chunk-ids of the qualifying cells at the cube's grain level, corresponding to the dimension restriction term (D) in the SQNF query Q. Therefore, the D term in SQNF can equivalently be expressed with a chunk expression.
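The interleaving step can be sketched as follows; this is a minimal illustration (function name our own) assuming all dimensions have the same number of chunking levels:

```python
def chunk_expression(member_specs, interleaving_order):
    """Interleave per-dimension member-code specifications into a chunk
    expression. `member_specs` maps dimension name -> specification such
    as '[0-1].{0,2}.*.*'; `interleaving_order` is major-to-minor."""
    per_dim = {d: member_specs[d].split(".") for d in interleaving_order}
    depth = len(next(iter(per_dim.values())))
    assert all(len(v) == depth for v in per_dim.values()), \
        "all dimensions must have the same number of chunking levels"
    # At each depth, join the dimensions' level specs with '|' in
    # major-to-minor order; join successive depths with '.'.
    return ".".join(
        "|".join(per_dim[d][lvl] for d in interleaving_order)
        for lvl in range(depth))

specs = {"LOCATION": "[0-1].{0,2}.*.*", "PRODUCT": "1.*.P.*"}
expr = chunk_expression(specs, ("LOCATION", "PRODUCT"))
# expr == "[0-1]|1.{0,2}|*.*|P.*|*"
```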

In addition, the R term in SQNF, representing the result granularity, can be expressed equivalently in terms of the corresponding chunking depths. In other words, there is one depth value for each dimension, representing the depth up to which we should aggregate, beginning from the grain-level target data set (expressed with the D term in SQNF) up to the desired granularity level. This is based on the chunk-tree representation of a cube discussed in §2.3. So, for example, if we wanted the final results aggregated at the Continent and Type levels respectively (see Figure 4), then the corresponding R term would be expressed as (0,1), because the chunking depth of the chunks corresponding to the Continent level is 0 and the chunking depth of the chunks corresponding to the Type level is 1. Next, we define a rule for processing an ad hoc star query over a CUBE File-organized cube:

Definition 18 (Rule for processing an ad hoc star query over a CUBE File-organized cube)

In order to evaluate an ad hoc star query Q over a CUBE File-organized cube, Q must be expressed in SQNF, where the D term has been converted to a chunk expression and the attributes appearing in the R term are represented by the corresponding chunking depth values.

The representation of an R term with the corresponding chunking depths is straightforward, as long as we maintain the correspondence between the hierarchy levels of each dimension and the respective chunking depth values. In particular, if we assume a cube C consisting of N dimensions D1, D2,…, DN, and an interleaving order ord = (D1, D2,…, DN) (major-to-minor from left to right), then for any list of grouping attributes GAh, GAf (see the star query template in Figure 34) we create the corresponding result granularity term as follows:

Assume that each dimension Di (i = 1,…, N) appears only once in the grouping list3, either by a hierarchical attribute Di.h or by a feature attribute Di.f. Then, we replace each such appearance of a dimension attribute by its corresponding depth in the chunk-tree representation of the cube. (For feature attributes, we assume that we have recorded their functional dependency on a specific hierarchical attribute and can thus relate them to a specific depth value.) Furthermore, for dimensions not appearing in this list, we insert a special depth value (e.g., -1) representing an "ALL" term, which signifies a full aggregation along a dimension. Finally, we reorder the list, so that the depth value of each dimension appears according to the interleaving order.
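The construction of the R term described above can be sketched as follows (a Python illustration; the function name and the depth assignments are hypothetical):

```python
def result_granularity(grouping, attr_depth, interleaving_order, ALL=-1):
    """Convert a GROUP BY list into the R term of chunking depths.
    `grouping` maps a dimension to the grouping attribute used (at most one
    per dimension); `attr_depth` maps (dim, attribute) -> chunking depth.
    Dimensions absent from `grouping` are fully aggregated (depth ALL);
    the result is ordered according to the interleaving order."""
    return tuple(
        attr_depth[(d, grouping[d])] if d in grouping else ALL
        for d in interleaving_order)

# Hypothetical depth assignments for the example dimensions:
depths = {("LOCATION", "Continent"): 0, ("PRODUCT", "Type"): 1}
r = result_granularity({"LOCATION": "Continent", "PRODUCT": "Type"},
                       depths, ("LOCATION", "PRODUCT"))
# r == (0, 1), as in the Continent/Type example above
```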

The creation of a chunk expression entails the evaluation of the individual local predicates

over the dimension tables, in order to retrieve the corresponding order codes. We discuss this

processing next.

7.3 H-Surrogate Processing for the CUBE File

The hierarchical surrogate key (h-surrogate) processing phase in the abstract processing plan of Figure 35 represents the evaluation of the dimension restrictions in order to derive the h-surrogate specifications, which identify the relevant fact table data. As we have seen, chunk expressions are a means of describing the target grain-level data organized in a CUBE File. Therefore, the result of the h-surrogate processing phase, in the context of a CUBE File, can be represented with a single chunk expression.

The Create_Range operator of the abstract processing plan represents the core processing entailed in this phase. Initially, the evaluation of the local predicates for each dimension has to take place. This results in a member specification for each dimension. Then, a final step of interleaving these member specifications, in order to retrieve the corresponding chunk expression, has to be performed. In this subsection, we discuss this processing, which constitutes the implementation of Create_Range. In particular, we analyze Create_Range into two simpler operations. Then, we propose a physical design for a dimension table that enables the efficient processing of this phase. Finally, we propose an implementation of these operations in terms of a set of implementation rules.

The input to a Create_Range operator is a local predicate LOCPRED(D) imposed on a dimension D of the cube. The output is a member-code specification according to Definition 17 in the previous subsection. In Figure 39, we depict the Create_Range operator analyzed into two distinct processing steps. Restrict is responsible for evaluating the local predicate over dimension table D and then projecting from the qualifying tuples the member-code value. This corresponds to an h-surrogate column (hsk) in a dimension table of the star schema in Figure 33. The result of this operation is passed on to the operator Make_Specification. The latter is responsible for creating a member-code specification out of a set of member-codes.

In order to describe the processing behind these two operators, we have to decide on the physical organization of the dimension D. A discussion on this follows next.

3 Otherwise, we keep only the attribute that corresponds to the most detailed level in the hierarchy.

[Figure content: LOCPRED(D) → Restrict → member codes → Make_Specification → member-code specification; the composition implements Create_Range.]

Figure 39: The Create_Range operator consists of two distinct processing steps.

7.3.1 Dimension Physical Design

In Figure 33, we presented a star schema design for a hierarchically clustered fact table. Recall that in each dimension table an extra column was inserted, corresponding to the h-surrogate key. Also recall that we chose a "flat" dimension table design, where all the attributes were included in a single table. Therefore, each tuple of a dimension table D is composed of the following attributes: (h1, h2,…, hm, f1, f2,…, fk, hsk). In this tuple, the hi (i = 1,…, m) are the hierarchical attributes, with h1 corresponding to the most detailed level and comprising the primary key of the relation D, and hm corresponding to the most aggregated level. In addition, we have the set of feature attributes fj (j = 1,…, k); finally, hsk is the h-surrogate, which is an alternate key of the relation D.

According to the chosen physical organization for dimension table D, there will be different available access paths to the underlying data. Moreover, if we decide to build additional secondary indexes for this table, then even more access paths arise. The choice of the physical organization of a dimension table D and its additional secondary indexes is called the physical design of dimension D. In this subsection, we propose a specific physical design for a dimension D that enables an efficient h-surrogate processing phase. This physical design is not applicable only when the fact table is physically organized as a CUBE File; it applies to any dimension table of a star schema, as long as the fact table has been hierarchically clustered and h-surrogates are used.

In Figure 40, we depict our proposal for a physical design for a dimension table D. We can see the dimension table organized as a B+ tree, with the hsk attribute playing the role of the search key. We have seen that the h-surrogates within a CUBE File-organized fact table correspond to the member-codes of the dimensions (see §2.1). Therefore, in this case we can assume that hsk is implemented as a composite key composed of the order-code values of each member-code. For the case of pseudo levels, we can assign a special order code (e.g., -1) to denote such a level. Note also that, due to the fact that order codes act as unique identifiers of a dimension value (i.e., member) within a hierarchy level, the grain-level order code is an alternate key in D. Moreover, the D tuples at the B+ tree leaves will be ordered by ascending member-code values, as well as by ascending grain-level order-code values.
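This ordering follows directly from treating hsk as a composite of order codes, as a minimal Python sketch shows (the member-code values below are hypothetical):

```python
def hsk(member_code):
    """hsk as a composite of order codes; -1 marks a pseudo level.
    Tuple comparison yields the ascending member-code order of the leaves."""
    return tuple(int(c) for c in member_code.split("."))

# Hypothetical member-codes of D tuples (a dimension with one pseudo level):
rows = ["1.0.-1.3", "0.2.-1.9", "1.0.-1.1"]
leaves = sorted(rows, key=hsk)  # order of the D tuples at the B+ tree leaves
# leaves == ["0.2.-1.9", "1.0.-1.1", "1.0.-1.3"]
```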

The reason why such an organization is beneficial to the entailed processing lies in that it enables an efficient access path through the h-surrogate values. This can prove invaluable in the implementation of the residual join operation (see §6.4), since this join operation is based on the h-surrogates of the dimension tables and the tuples coming out of the fact table. Furthermore, it provides us with a means of retrieving the D tuples ordered by hsk values, which is required for the construction of the member-code specifications, and thus eliminates the need for external sorting.

In Figure 40, we have also defined a secondary B+ tree index, named HPP-Index, based on the following composite key: (hm,…, h1, hsk). This index's purpose is twofold: (a) it can be used to speed up the retrieval of D tuples when a hierarchical prefix path restriction (HPP restriction, see Definition 1) appears in a local predicate for D, and (b) it can also be used as a table that stores the mapping between hierarchical prefix paths and h-surrogate values4. The former use is the classic exploitation of an index, while the latter gives us the opportunity to use this index solely to evaluate all predicates that contain restrictions on hierarchical attributes only (and not on feature attributes), without accessing D. Naturally, the smaller tuples of the index will deliver the required h-surrogate values with far fewer page reads than if we had to access the D tuples.

4 Remember that the hierarchical attributes comprise the primary attributes of the dimension (assigned by the user and not the system) and do not store order codes (in contrast to the hsk attribute). These can have any data type, and usually this is some sort of string.

In a similar manner, in Figure 40 we have defined a set of m secondary indexes NHPP-Indexi, where each one is based on an (hi, hsk) key (i = 1,…, m). These indexes are used as an access path whenever we have a non-hierarchical prefix path restriction (i.e., when the full path in the hierarchy is not formed). Again, their purpose is twofold: they can be used for accessing the corresponding D tuples, as well as for evaluating the local predicate solely on the index, if the predicate consists only of a restriction on a single hierarchical attribute.

Next, we describe how we can exploit this physical design in order to efficiently evaluate a local predicate on a dimension.

[Figure content: dimension table D(h1, h2,…, hm, f1, f2,…, fk, hsk) organized as a B+ tree on hsk, with D tuples clustered in order of hsk values; a secondary HPP-Index on (hm,…, h1, hsk); and secondary indexes NHPP-Index1 (h1, hsk) through NHPP-Indexm (hm, hsk).]

Figure 40: A physical design proposal for dimension D.

7.3.2 Implementation Rules for the H-Surrogate Processing

In this subsection, we proceed to the implementation of the h-surrogate processing phase of the abstract processing plan (Figure 35). In particular, we discuss alternative execution methods for this phase based on the aforementioned physical design. Moreover, we provide a set of implementation rules that guide the application of the proposed execution alternatives based on the syntax of the local predicate that has to be evaluated. We have seen that the Create_Range operator in the abstract processing plan is composed of two distinct operations, namely Restrict and Make_Specification (see Figure 39). In this subsection, our discussion is essentially focused on the implementation of these two operations.

Page 125: Storage Structures, Query Processing and Implementation of On-Line Analytical ... · 2004. 2. 19. · Abstract On Line Analytical Processing (OLAP) has caused a significant shift

Chapter 7: Processing Ad Hoc Star Queries over CUBE File-Organized Fact Tables

A local predicate LOCPRED(D) can contain restrictions on the hierarchical attributes hi (i = 1,…, m), as well as on the feature attributes fj (j = 1,…, k) of a dimension D (see §6.3). The vast majority of restrictions imposed on hierarchical attributes consists of equality restrictions. Inequality restrictions (>, ≥, <, ≤) are usually applied to feature attributes, since these more often correspond to a numeric data type. Hierarchical attributes are typically strings and do not have an inherent ordering. An exception to this rule is the Time dimension, where time-interval restrictions are typical. From the query template of Figure 34, we have seen that different local predicates are linked together by a logical AND. Next, we provide the syntax of a local predicate imposed on a dimension table D, expressed in BNF grammar:

LOCPRED(D) ::= PRED(D) AND LOCPRED(D) | PRED(D)
PRED(D)    ::= D.hi op xi | D.fj op xj |
               D.hi IN Si | D.fj IN Sj |
               D.hi BETWEEN xr AND xl | D.fj BETWEEN xr' AND xl'
op ::= = | <> | < | <= | > | >=
xi ::= scalar expression not containing fields of Di
Si ::= set of constant values

Since the set inclusion predicates (IN) and the range predicates (BETWEEN) can be easily rewritten as equality restrictions, in the following we will assume that a local predicate has the following form:

LOCPRED(D) ::= PRED(D) AND LOCPRED(D) | PRED(D)
PRED(D)    ::= D.hi op xi | D.fj op xj
op ::= = | <> | < | <= | > | >=
xi ::= scalar expression not containing fields of Di

In the following discussion, we assume the operations table-scan and index-scan. The former refers to a direct access on a table and the evaluation of a search condition, in order to retrieve the qualifying tuples. In our case, where a dimension table is organized as a B+ tree, if the search condition matches a prefix of the search key (we will say in this case that the search condition "matches the key"), then we can exploit the intermediate nodes of the structure to access the qualifying tuples very fast; otherwise, we perform a full scan on the leaves of the B+ tree. Note also that we can perform a table-scan on a secondary index as well. In this case, we are interested in the data stored at the leaves of the index itself and not in the corresponding tuples stored in the referenced table. An index-scan, on the other hand, is used when we access a secondary index with a search condition, in order to retrieve, through this index, the qualifying tuples in the corresponding referenced table. Again, when the search condition does not match the key, the index-scan degenerates into a full scan on the leaves of the secondary index. The latter has the effect of retrieving the base table's tuples in the order of the secondary index's key.
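The "matches the key" test can be sketched as a simple prefix check (Python; the function name and the attribute names of the example HPP-Index key are our own illustration):

```python
def matches_key(restricted_attrs, index_key):
    """A search condition 'matches the key' when the restricted attributes
    form a prefix of the composite index key; only then can the B+ tree's
    inner nodes be exploited. Otherwise the scan degenerates into a full
    scan of the leaves."""
    return list(index_key[:len(restricted_attrs)]) == list(restricted_attrs)

# Hypothetical HPP-Index key (hm, ..., h1, hsk) for a PRODUCT-like dimension:
hpp_key = ("category", "type", "item", "hsk")
fast = matches_key(("category", "type"), hpp_key)  # key prefix: fast access
slow = matches_key(("type",), hpp_key)             # not a prefix: leaf scan
```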

For the vast majority of dimension restrictions, the Restrict operator can be implemented very efficiently. If we consider hierarchical prefix path (HPP) restrictions (Definition 1), then only the first matching tuple on each dimension suffices in order to retrieve the appropriate h-surrogate value that will generate the h-surrogate specification (see Definition 17 for the CUBE File case). For example, if we have the restriction PRODUCT.category = "air condition" AND PRODUCT.class = "A" (see the schema in Figure 36), then essentially what we want is all the leaves of the subtree with root "air condition"/"A"/ defined in the tree instantiating the hierarchy of dimension PRODUCT. Therefore, if we retrieve the h-surrogate value corresponding to the first tuple that qualifies and truncate the suffix that corresponds to level Item, then the remainder will be the same for all the qualifying tuples. Thus, the implementation of Make_Specification in this case comprises the truncation of a single member-code and the addition of an appropriate number of trailing "*" symbols. This can be synopsized in the following implementation rule:

Implementation Rule 1 (HPP restriction – without feature attribute restrictions)

If the local predicate contains an HPP restriction without feature attribute restrictions included, then Restrict is implemented by a table-scan on the HPP-Index with the HPP restriction as a search key condition. In this case, the member-code value from the first matching tuple is returned. Make_Specification is implemented by a truncation of the received member-code, so as to leave a prefix that corresponds to the most detailed level appearing in the HPP restriction. Then a "*" symbol is added at the position of each truncated order code. Appropriate pseudo-level symbols are also added, if necessary.
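The Make_Specification step of this rule can be sketched as follows (Python; the function name, member-code values and pseudo-level positions are hypothetical):

```python
def hpp_specification(member_code: str, keep_levels: int,
                      pseudo_flags=None) -> str:
    """Keep the member-code prefix down to the most detailed restricted
    level; replace each truncated level with '*' ('P' at pseudo levels)."""
    codes = member_code.split(".")
    tail = ["P" if pseudo_flags and pseudo_flags[i] else "*"
            for i in range(keep_levels, len(codes))]
    return ".".join(codes[:keep_levels] + tail)

# For a restriction on the two topmost levels: take the first qualifying
# tuple's member-code (values hypothetical) and truncate the suffix,
# assuming the third level is a pseudo level:
spec = hpp_specification("3.0.-1.42", keep_levels=2,
                         pseudo_flags=[False, False, True, False])
# spec == "3.0.P.*"
```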

Note that, if we have stored more information on the correlations between the attributes of a dimension, apart from the definition of the hierarchy only, then we can benefit from the above processing scheme even for non-hierarchical prefix path restrictions (NHPP restrictions). Suppose we have a hierarchy hm, hm-1,…, h1 on a dimension and a restriction of the form hk = c1 AND hp = c2 AND … hi = ci, where hk, hp,…, hi do not form a prefix of (hm,…, h1) and hi is the most detailed of the referenced attributes. If we know that hi functionally determines hj (i.e., hj is functionally dependent5 on hi) for all j > i, then we can still apply the above strategy. For example, for the restriction DATE.month = "AUG99", we know that the month attribute determines the year attribute, and thus only the first tuple that has this value for month suffices for our processing needs. If a hierarchical attribute at a specific level cannot determine the hierarchical attributes of the higher levels, then we apply the following implementation rule.

Implementation Rule 2 (NHPP restriction – without feature attribute restrictions)

If the local predicate contains an NHPP restriction without feature attribute restrictions included, then Restrict is implemented by an index-scan on the NHPP-Index corresponding to the most detailed hierarchical attribute appearing in the local predicate, with the corresponding restriction as a search key condition. The base table tuples are retrieved and then further filtered according to the restrictions posed on the higher-level hierarchical attributes. Finally, the qualifying member-codes are projected out. Note that if the local predicate consists of restrictions on a single hierarchical attribute only, then we do not need to access the base table tuples at all. Make_Specification receives the stream of member-codes and truncates each one, so as to leave a prefix that corresponds to the most detailed level appearing in the NHPP restriction. The set of truncated member-codes is sorted in lexicographic order and duplicate values are eliminated. "*" symbols are added to the end, and appropriate pseudo-level symbols are inserted, if necessary. Finally, the corresponding member specification is constructed according to the notation in Table 4.
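The Make_Specification step of this rule (truncate, sort, eliminate duplicates, pad with "*") can be sketched as follows (Python, hypothetical function name; pseudo-level handling omitted):

```python
def nhpp_specifications(member_codes, keep_levels, total_levels):
    """Truncate every qualifying member-code to the most detailed
    restricted level, sort, eliminate duplicates, and append '*' for
    each truncated level."""
    prefixes = {tuple(int(c) for c in mc.split(".")[:keep_levels])
                for mc in member_codes}
    tail = ["*"] * (total_levels - keep_levels)
    return [".".join(list(map(str, p)) + tail) for p in sorted(prefixes)]

# All tuples matching a restriction like DATE.month = "AUG99" share the
# same (year, month) prefix, so a single specification remains
# (order-code values hypothetical):
out = nhpp_specifications(["4.7.12", "4.7.19", "4.7.30"],
                          keep_levels=2, total_levels=3)
# out == ["4.7.*"]
```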

Next, we examine the case where feature attribute restrictions are also included in the local predicate. In this case, we assume that we have recorded, for each feature attribute, which hierarchical attribute it characterizes; in other words, on which hierarchical attribute it is functionally dependent. We perform an index-scan on the HPP-Index or the NHPP-Index (depending on whether we have an HPP or an NHPP restriction, respectively), with the part of the local predicate that includes only the hierarchical attribute restrictions as the search condition. Then, we access the corresponding D tuples and further evaluate the restrictions on the feature attributes. Finally, we pass the extracted member-codes on to Make_Specification. The above is synopsized in the following rule:

5 An attribute Y is "functionally dependent" on an attribute (or attributes) X if it is invalid to have two records with the same X-value but different Y-values. We then also say that X "functionally determines" Y. That is, a given X-value must always occur with the same Y-value. When X is a key, all attributes are by definition functionally dependent on X in a trivial way, since there cannot be two records having the same X-value.


Implementation Rule 3 (HPP/NHPP restriction – with feature attribute restrictions)

If the local predicate contains an HPP/NHPP restriction with feature attribute restrictions included, then Restrict is implemented by an index-scan on the HPP/NHPP-Index, with the restriction corresponding to the hierarchical attributes as the search key condition. Then, the corresponding base-table tuples are retrieved and further filtered according to the remaining local predicate restrictions posed on the feature attributes. Make_Specification receives the stream of member-codes and truncates each one, so as to leave a prefix that corresponds to the most detailed level appearing in the local predicate (including both hierarchical and feature attributes). The set of truncated member-codes is sorted, duplicates are eliminated, and additional "*" symbols and/or pseudo-level symbols are added to the end. Finally, the appropriate member specification is constructed according to the notation in Table 4.

The last case deals with local predicates that include only restrictions on the feature attributes. The implementation rule follows:

Implementation Rule 4 (only feature attribute restrictions)

If the local predicate contains only feature attribute restrictions, then Restrict is implemented

by a table-scan directly on the dimension D, with the restriction on the feature attributes as

the search key condition. This will cause a full scan on the leaf nodes of the corresponding

B+ tree. The tuples are filtered according to the restrictions posed on the feature attributes

and the member-codes are projected and passed on the next operator. Make_Specification

receives the stream of member-codes and truncates each one, so as to leave a prefix that will

correspond to the most detailed level appearing in the local predicate. The set of truncated

member-codes does not need to be sorted since the input stream of member-codes is already

sorted by member-code value due to the B+ tree organization of dimension D. Duplicates are

eliminated, and additional “*” and/or pseudo-level symbols are added to the end

of each truncated member-code. Finally, the appropriate member specification is constructed

according to the notation in Table 4.
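The truncation and deduplication steps of Make_Specification described above can be sketched as follows (a simplified illustration assuming dot-separated member-codes and omitting pseudo-level symbols; the function name and its arguments are our own, not part of the thesis definitions):

```python
def make_specification(member_codes, detail_level, total_levels):
    """Sketch of Make_Specification for Implementation Rule 4.

    member_codes: stream of dot-separated member-codes, already sorted
    (due to the B+ tree organization of the dimension); detail_level:
    index of the most detailed level appearing in the local predicate.
    """
    specs = []
    last = None
    for code in member_codes:
        # Truncate to the prefix covering levels 0..detail_level.
        prefix = ".".join(code.split(".")[: detail_level + 1])
        # The input is sorted, so duplicates are adjacent: skip repeats.
        if prefix == last:
            continue
        last = prefix
        # Pad the remaining (unrestricted) levels with "*" symbols.
        specs.append(prefix + ".*" * (total_levels - detail_level - 1))
    return specs

codes = ["0.1.0", "0.1.2", "0.3.1", "2.0.0"]
print(make_specification(codes, 1, 3))
# → ['0.1.*', '0.3.*', '2.0.*']
```

Since the input stream is already sorted, duplicate elimination needs only a comparison with the previously emitted prefix, exactly as the rule exploits the B+ tree ordering.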


8 Query Processing Algorithms for Ad Hoc Star Queries

In this chapter, we present processing algorithms pertaining to the CUBE File data struc-

ture. In particular, we propose a set of physical operators that implement the fact table

access in the abstract processing plan of Figure 35. A physical operator differs from a logical

(or abstract) operator, mainly in that its definition must include an implementation algorithm

for it. For this reason, a physical operator usually is associated with a specific physical or-

ganization for the data. In this case, we assume for the star schema of Figure 33 that the fact

table is organized as a CUBE File. For the definition of the physical operators, we exploit the

iterator model, discussed in the first subsection, which defines each operator in terms of three

functions. Then, we proceed by defining each one of the physical operators. The definition of

the physical operators takes place in a stepwise fashion, starting from simpler operators to

more complex ones and demonstrates a value-added method for defining physical operators.

Among the operators presented, we discuss an operator for the evaluation of a multidimen-

sional range query over a CUBE File. Moreover, we extend this operator in order to incorpo-

rate also aggregation and grouping. Then, we provide a qualitative cost analysis of the pre-

sented operators. We end this chapter with two examples of physical plans corresponding to

the running example query from chapter 1 that demonstrate the overall picture pertaining to

the evaluation of ad hoc star queries with a CUBE File storage base.

8.1 The iterator model for physical operators

Iterators ([GUW00, Gra93]) constitute a well-established technique for implementing physi-

cal operators. An iterator is a group of three functions that allows a consumer of the result of


the physical operator to get the result one tuple at a time. Assume a virtual reading head over

the stream of tuples coming as the result of an operation, implemented by a physical operator.

The three functions forming the iterator for an operator are:

1. Open. This function starts the process of getting tuples, but does not get a tuple. It pre-

pares the operator for producing data (i.e., moves the virtual reading head over the first

result tuple), e.g., by initializing any data structures needed to perform the operation. It

receives as input the input arguments of the operator.

2. Next. This function returns the next tuple in the result (i.e., returns the result tuple at the

current position of the virtual reading head) and adjusts data structures as necessary to al-

low subsequent result tuples to be obtained (i.e., it moves the virtual reading head to the

next result tuple). If there are not subsequent results, then we assume that a Boolean flag

FoundNext is set to false.

3. Close. Performs final housekeeping by releasing resources obtained by the operator.

Class OpX {
Public:
  …
  // Iterator interface
  open(Table T, InputArgs);
  next();
  close();
Private:
  //local state is stored here
  …
  Bool FoundNext;
  Table* source_ptr
};

// X invocation
OpX x;
IF(x.open(T, input)) {
  WHILE(x.getFoundNext()) {
    result = x.next(); //use current result
  }
  x.close();
}

Figure 41: Definition and invocation of a physical operator X.
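To make the iterator protocol concrete, the following minimal sketch (in Python, purely illustrative; the TableScan class and its member names are our own and not part of Figure 41) implements the three functions for a trivial table-scan operator and invokes them in the pattern of Figure 41:

```python
class TableScan:
    """A minimal iterator-style physical operator: open / next / close."""

    def open(self, table):
        # Prepare the operator for producing data: position the
        # virtual reading head over the first result tuple.
        self.table = table
        self.pos = 0
        self.found_next = len(table) > 0
        return True

    def next(self):
        # Return the tuple under the reading head and advance the
        # head to the next result tuple.
        result = self.table[self.pos]
        self.pos += 1
        self.found_next = self.pos < len(self.table)
        return result

    def close(self):
        # Release the resources held by the operator.
        self.table = None

# Typical invocation, mirroring Figure 41.
scan = TableScan()
results = []
if scan.open([("a", 1), ("b", 2)]):
    while scan.found_next:
        results.append(scan.next())   # use current result
    scan.close()
# results == [("a", 1), ("b", 2)]
```

Note that the consumer receives the result one tuple at a time; nothing forces the whole result to be materialized, which is the essence of the pipelining benefit quoted next.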

The following is quoted from [GUW00] and describes the benefits from the use of iterators:

“…Iterators support efficient execution when they are composed within query plans. They

contrast with a materialization strategy, where the result of each operator is produced in its

entirety – and either stored on disk or allowed to take up space in main memory. When itera-

tors are used, many operations are active at once. Tuples pass between operators as needed,

thus reducing the need for storage. Of course, not all physical operators support the iteration


approach, or “pipelining”, in a useful way. In some cases, almost all the work would need to

be done by the Open function, which is tantamount to materialization.” Of course, apart from pipelining, a fundamental advantage of iterators is that they allow an entire execution

plan (no matter how complex it is) to be executed within a single operating system process

[Gra93].

Since we are interested in the evaluation of execution plans on a relational star schema, which can include conventional relational physical operators (at least for the part of the processing that has to do with dimension tables), e.g., table-scan or index-scan (see the previous subsection), we will adhere to the convention that all defined operators provide their output in terms of tuples. This includes the operators applied on the CUBE File, although the latter is an

array-based structure and not a tuple-based one. We will assume that the results coming out

of the CUBE File are “tuple-fied” (i.e., converted into tuples) in order to smoothly collabo-

rate with other relational operators. This will be especially exploited for the implementation

of the residual join between a CUBE File and a dimension table, discussed later on.

The input stream of data for an iterator typically comes in the form of another iterator, whose

next function is invoked whenever a new input-tuple is needed. In our case, we will distinguish between two different types of physical operators: (a) those which read their input

solely from another physical operator and (b) those that read data directly from a CUBE File

and thus their interface is the set of primary data navigation operations offered by this struc-

ture (§4.1). For the latter, we will assume that the input is received in the form of a pointer to

a CubeAccess class implementing the construct of the current position in the CUBE File

(Definition 13) and encapsulating the basic data navigation operations.

In the following subsections, we will assume that each physical operator is represented by a

class. The open, next and close functions will be part of the public interface of this class.

Moreover, the information used by an operator that needs to be saved from one invocation to

the next, often called local state, will be stored within the class’ data members. In the code

fragment of Figure 41, we show the class definition of an operator X imposed on a relational

table T. We depict the iterator interface and show the declarations of the three functions. open

receives as input the source table, as well as other operator-specific arguments. The local

state always includes a pointer to the data source (or data sources if the operator is not unary)

and a Boolean flag indicating the result of the latest call to next. Below the class definition,

we show a typical invocation of the physical operator X. There we see that initially open is

called with the appropriate input arguments as well as the source table T. If open succeeds


(we assume that a Boolean true is returned), then while there is a result tuple available, we

read it and advance to the next one by calling next. Finally, we release occupied resources by

calling close.

8.2 CUBE File-Access Physical Operators

In this section, five physical operators will be defined, namely: ChunkSelect (ε), ChunkTreeScan (δ), MDRangeSelect (ρ), MDRangeAggregate (α) and MDRangeGroup (γ).

All of these access the CUBE File directly in order to retrieve data. Moreover, the more complex operators are built upon the simpler ones, demonstrating the potential for a value-added definition of

physical operators.

The central operator presented is the MDRangeSelect, which implements the chunk expres-

sion (§0) evaluation on a CUBE File. Essentially it represents an algorithm for evaluating

multidimensional range queries over the CUBE File. However, since a chunk expression cor-

responds to multiple non-overlapping selection hyper-rectangles it is more than a simple

range query evaluator.

Additionally, based on the MDRangeSelect operator we define two more operators that pro-

vide aggregation and grouping functionality. More specifically, MDRangeAggregate (α),

evaluates a chunk expression and at the same time applies an aggregation function to all the

qualifying data points. Furthermore, the MDRangeGroup (γ) operator combines chunk ex-

pression evaluation with grouping and aggregation in a single operation. This enables the

evaluation of grouping with time comparable to that of a multidimensional restriction evalua-

tion, requiring only a single pass over the data, in contrast to the conventional (hash-based or

sort-based) grouping algorithms that require multiple passes over the data. We begin our dis-

cussion with two operators that constitute the base for defining the above operators.

8.2.1 ChunkSelect (ε)

The goal of the ChunkSelect (ε) physical operator is to enumerate the cells of a specific chunk

that qualify a selection condition imposed on this chunk. In this case, the selection condition

is expressed by means of a D-domain (§2.3) of a chunk expression cx, corresponding to the

chunking depth d of this chunk, noted as domain(cx,d). Recall from §2.3 that a D-domain is

the part in a chunk-id included between two consecutive dots. For example, if we consider

the chunk expression discussed in §0: [0-1]|1.{0,2}|*.*|P.*|*, then we see that

there are four D-domains corresponding to selection predicates imposed on chunks of depth

0,1,2 and 3 respectively. The part in a D-domain corresponding to a specific dimension can


take any of the forms appearing in Table 4 of §0. Note that a D-domain represents a selection

condition that may need to be evaluated over a number of chunks of the same depth and not

only over a single chunk. However, since ChunkSelect is applied to a specific chunk, it evalu-

ates the selection condition solely on this chunk; i.e., it returns the qualifying cells (if any)

from the specific chunk only. Therefore, the D-domain *|* would return all the cells of the

chunk on which ChunkSelect has been applied and not all the cells at the corresponding

depth.

The input arguments to this operator comprise a chunk expression (cx), a chunk-id (cid) (of

the chunk on which we wish to apply this operator) and a pointer to a CubeAccess instance

(C) representing the underlying CUBE File-organized data. In an algebraic form we write

ChunkSelect<cx, cid>(C), or ε<cx, cid>(C). In Figure 42, we present the definition of the

ChunkSelect physical operator. In lines 6-11, we define the local state of this operator, which

consists of the flag FoundNext (see previous section), a pointer cb to the data source (i.e., a

specific CUBE File) and other operator-specific information. The latter includes the D-

domain selection condition (domSelection), the chunk-id of the chunk in question (cid) and

the current cell in the result stream (i.e., the cell that will be returned with the next call to

next).

0:  ChunkSelect (ε<cx, cid>(C))
1:  Class ChunkSelect {
2:  public:
3:    open(Chunk_Expression cx, Chunk_Id cid, CubeAccess* cp);
4:    next();
5:    close();
6:  private: // Local State:
7:    Bool FoundNext;
8:    CubeAccess* cb;
9:    Chunk-Expression-Domain domSelection;
10:   Chunk_Id cid;
11:   Cell currentCell; // next() will read this
12: }

Figure 42: Definition of the ChunkSelect (ε) physical operator.

The input arguments to this operator are presented as input arguments to the open function

and are a chunk expression (the corresponding D-domain can be easily retrieved from it, if we

know the depth of the input chunk), the chunk-id of the input chunk and a pointer to the cube

data. The function open is responsible for initializing the evaluation of this operator by locating the first qualifying cell in the chunk and storing it, so that a subsequent call to next can retrieve it.

The definition of this routine appears in Figure 43. Initially, we extract from the input chunk


expression the D-domain of interest and store it (lines 2-3), as well as move the current posi-

tion in the CUBE File (referred to as CP in the following) to the first non-empty cell of the

input chunk (lines 7-13). The loop in lines 16-19 tries to find the first qualifying cell by enumerating all the cells in the chunk. If such a cell cannot be found, then the FoundNext flag is

set off and open returns (lines 20-24); otherwise, the cell corresponding to the CP is stored as

the current cell in the operator’s local state (lines 25-29).

0:  ChunkSelect::open(Chunk_Expression cx, Chunk_Id cid,
1:                    CubeAccess* cb){
2:    domSelection = extract corresponding domain
3:                   selection from cx;
4:
5:    // set CP to first non-empty cell of target
6:    // chunk.
7:    cb->move_to(cid);
8:    IF(move_to fails) {
9:      //error in cid
10:     FoundNext = FALSE;
11:     RETURN;
12:   }
13:   cb->drill_down(); //does not affect data chunks
14:
15:   // set CP over the first qualifying cell
16:   WHILE(chunk_boundary_flag not set &&
17:         Current cell of CP does not qualify) {
18:     cb->get_next();
19:   }
20:   IF(Current cell of CP does not qualify) {
21:     // no qualifying cells found
22:     FoundNext = FALSE;
23:     RETURN;
24:   }
25:   FoundNext = TRUE; // have found a qualifying cell for
26:                     // next() to read
27:
28:   // save current cell
29:   currentCell = cell of current position;
30:   RETURN;
}

Figure 43: Definition of the ChunkSelect open function.

The next function’s primary responsibility is to place the CP over the next qualifying cell of

the input chunk. To this end, it returns the current cell stored in the operator’s local state and

advances to the next qualifying cell in the result set. The corresponding definition is depicted

in Figure 44. Initially, the FoundNext flag is examined in order to check if the previous call to

next had successfully managed to find a new qualifying cell. If so, the CP is moved over the

stored current cell (lines 4-6). This might be necessary, if other operations have been inter-

leaved between consecutive calls to next and thus moved the CP to another position in the

CUBE File. In line 8, we construct the “next” result to be returned by invoking a “tuple-fy”

routine on the current cell, which transforms the current cell into a tuple-based form. This


makes ChunkSelect more adaptable to execution plans comprising relational physical opera-

tors.

After the result to be returned is formed, we search for the next qualifying cell (lines 14-24).

If such a cell exists, then we store it as the new current cell and return the result; otherwise we

set off the FoundNext flag and return the result (lines 25-34).

0:  ChunkSelect::next(){
1:    // if qualifying cell exists
2:    IF(FoundNext){
3:      //Reset CP, if necessary
4:      IF(cell in current pos != currentCell) {
5:        move_to(currentCell)
6:      }
7:      // read qualifying cell
8:      result = tuple-fy(currentCell);
9:    }
10:   ELSE {
11:     // no more qualifying cells, no result to return
12:     RETURN;
13:   }
14:   //advance to the next qualifying cell
15:   IF(chunk_boundary_flag not set)
16:     cb->get_next();
17:   ELSE { // no more cells to visit
18:     FoundNext = FALSE;
19:     RETURN result;
20:   }
21:   WHILE(chunk_boundary_flag not set &&
22:         Current cell of CP does not qualify) {
23:     cb->get_next();
24:   }
25:   IF(Current cell of CP does not qualify) {
26:     // no qualifying cells found
27:     FoundNext = FALSE;
28:     RETURN result;
29:   }
30:   FoundNext = TRUE; // have found a qualifying cell;
31:                     // the next call to next() will read it
32:
33:   // save current cell
34:   currentCell = cell of current position;
35:   RETURN result;
}

Figure 44: Definition of the ChunkSelect next function.

Finally, the third function of the operator, close, ensures that the CP is left in a valid position in the CUBE File, by re-evaluating the various status flags, and releases the stored local state.

In Figure 45, we depict an example of a ChunkSelect evaluation. In particular, we present a

chunk, residing at a depth D = d, with a chunk-id cid. On the top-left corner of the figure, we

depict the input chunk expression (cx), where we have isolated the D-domain corresponding

to this depth. This D-domain poses a restriction on this chunk. The grayed cells correspond to

the cells that qualify this restriction. Each call to next will retrieve a single grayed cell. In the


figure, we have ordered the qualifying cells according to their retrieval order. We have as-

sumed a column-major ordering as an interleaving order for the chunk expression, as well as

for the physical ordering of the cells of this chunk. Note also that in the figure we have

marked empty cells with the “X” symbol.

[Figure content: a chunk at depth D = d with chunk-id cid; the D-domain {0,2,4}|*, isolated from the input chunk expression cx = … .{0,2,4}|*. …, restricts this chunk; the qualifying (grayed) cells are numbered 1-8 in retrieval order and empty cells are marked “X”.]

Figure 45: Example of a ChunkSelect evaluation.
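The per-cell qualification test illustrated in Figure 45 can be sketched as follows (a simplified illustration covering only the *, P, single order-code, {…} value-set and [a-b] range forms of Table 4; the function names and the tuple representation of cell coordinates are our own):

```python
def coord_qualifies(coord, dim_restriction):
    """Check one dimension's order-code against one part of a D-domain."""
    r = dim_restriction
    if r in ("*", "P"):                     # no restriction / pseudo level
        return True
    if r.startswith("{"):                   # value set, e.g. {0,2,4}
        return coord in {int(v) for v in r[1:-1].split(",")}
    if r.startswith("["):                   # range, e.g. [0-1]
        lo, hi = (int(v) for v in r[1:-1].split("-"))
        return lo <= coord <= hi
    return coord == int(r)                  # single order-code

def cell_qualifies(cell_coords, d_domain):
    # d_domain is one dot-delimited part of a chunk expression,
    # e.g. "{0,2,4}|*" for a two-dimensional chunk.
    parts = d_domain.split("|")
    return all(coord_qualifies(c, p) for c, p in zip(cell_coords, parts))

print(cell_qualifies((2, 1), "{0,2,4}|*"))   # True
print(cell_qualifies((3, 1), "{0,2,4}|*"))   # False
```

A cell qualifies only if every dimension's order-code satisfies the corresponding part of the D-domain, which is exactly the test applied by the ChunkSelect enumeration loop.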

8.2.2 ChunkTreeScan (δ)

The ChunkTreeScan (δ<cid>(C)) physical operator receives as input a single chunk-id (cid),

representing the root of a chunk subtree of a cube C and enumerates all the data cells, which

lie in the leaves (i.e., data chunks) of this tree. In other words, this operator can be used in

order to evaluate chunk expressions, where the corresponding member-code specifications

are of the form: c0.c1.….cK.*.….*, where K is the same for all dimensions of the input cube C

and 0 ≤ K ≤ DMAX (assuming that the minimum depth is zero). If K = DMAX, then this corre-

sponds to a point query. If no chunk-id is given, then a scan of the chunk tree representing

the whole cube is performed; i.e., a full cube-scan that will retrieve all the data in the order of

physical storage.

0:  ChunkTreeScan (δ<cid>(C))
1:  Class ChunkTreeScan {
2:  Public:
3:    open(Chunk_Id cid, CubeAccess* cp);
4:    next();
5:    close();
6:  private: // Local State:
7:    Bool FoundNext;
8:    CubeAccess* cb;
9:    Chunk_Id cid; // id of the root of the tree
10:   Chunk_Id last_parent_id; // signifies chunk tree boundary
11:   Cell currentCell; // next() will read this
12: }

Figure 46: Definition of the ChunkTreeScan (δ) physical operator.

In Figure 23, we have presented a simplified version of the implementation of this operator

for the sake of demonstrating the use of the CUBE File basic data navigation operations. That


implementation was rather naïve, since it did not consider any local state maintenance. More-

over, in order to check whether the tree boundary has been reached, it accessed the next sib-

ling chunk and compared the chunk-ids; thus, imposing a potential extra I/O operation (since

the sibling chunk might have been stored in a different bucket). In this subsection, we present

a more efficient implementation of this operator.

0:  ChunkTreeScan::open(Chunk_Id cid, CubeAccess* cb){
1:    //move CP to cid
2:    cb->move_to(cid)
3:    IF(move_to fails) {
4:      //error in cid
5:      FoundNext = FALSE;
6:      RETURN;
7:    }
8:
9:    //prepare first result for next()
10:   IF (CP is in a data chunk) {
11:     //then this is a point query, only one result to read
12:     // no need to move CP any more
13:     //the tree boundary equals the input cid
14:     last_parent_id = cid;
15:   }
16:   ELSE {
17:     //store in the local state the last parent
18:     last_parent_id = find last non-empty cell
19:                      in current chunk;
20:
21:     //drill-down to the leaves
22:     while(CP is not in a data chunk)
23:       cb->drill_down();
24:   }
25:   FoundNext = TRUE; // have found a cell for
26:                     // next() to read
27:
28:   // save current cell
29:   currentCell = cell of current position;
30:   RETURN;
}

Figure 47: Definition of the ChunkTreeScan open function.

The basic idea is to descend down to the data chunks of this tree and then scan over all data

cells, in the order of physical storage, with repetitive calls to the get_next CUBE File data

navigation operation. However, in order to avoid crossing over the “tree boundaries”, we

need to include in the local state of this operator a chunk-id that will denote the end of the

chunk-tree data cells. Actually, we store the chunk-id of the parent cell in the root node of the

input tree that can lead to the last data cell of this tree in the order of physical storage. The

definition of the ChunkTreeScan operator is presented in Figure 46. Therefore, the operator-

specific local state information in this case consists of a chunk-id denoting the input tree and

a chunk-id denoting the end of the chunk-tree data cells. Also, the current cell in the result

stream is included.


The primary responsibility of the open function for this operator is to position the CP over the

first non-empty data cell in the first data chunk (in the order of physical storage) under the

chunk-tree corresponding to the input chunk-id, as well as to store the chunk-id that signifies

the end of the tree data cells. The definition of open is presented in Figure 47. Initially we set

the CP over the cell corresponding to the input chunk-id (line 2). If this corresponds to a data

chunk, then we have a point query and the chunk-id that signifies the tree boundary is set to

the input chunk-id (lines 10-14). This will prevent next from attempting to reach a subsequent cell.

Otherwise, we drill down to the data chunk level and move the CP over the first non-empty data cell of the tree (lines 16-24). Finally, we initialize the local state fields so that the next call to next retrieves the corresponding result (lines 25-29).

0:  ChunkTreeScan::next(){
1:    // if qualifying cell exists
2:    IF(FoundNext){
3:      //Reset CP, if necessary
4:      IF(cell in current pos != currentCell) {
5:        move_to(currentCell)
6:      }
7:      // read qualifying cell
8:      result = tuple-fy(currentCell);
9:    }
10:   ELSE {
11:     // no more qualifying cells, no result to return
12:     RETURN;
13:   }
14:   IF(last_parent_id == currentCell id) {
15:     // then this is a point query, no more results
16:     FoundNext = FALSE;
17:     RETURN result;
18:   }
19:
20:   IF(chunk_boundary_flag set &&
21:      currentCell prefix matches last_parent_id) {
22:     // then we have reached the tree data-boundary,
23:     // no more results to return
24:     FoundNext = FALSE;
25:     RETURN result;
26:   }
27:
28:   //advance to the next cell
29:   cb->get_next();
30:
31:   FoundNext = TRUE; // have found a cell for
32:                     // next() to read
33:
34:   // save current cell
35:   currentCell = cell of current position;
    RETURN result;
}

Figure 48: Definition of the ChunkTreeScan next function.

In Figure 48, we can see the definition of the ChunkTreeScan next function. As usual, if the

FoundNext flag in the local state is on, then a valid result corresponds to the cell stored in


the currentCell field and thus it is retrieved (lines 2-9); otherwise the routine returns no

result. After the current result is retrieved, next tries to position the CP over the next result-

cell. However, it first has to check whether this was a point query (lines 14-18), or if the tree

boundary has been reached (lines 20-26). The latter can be verified with the chunk-boundary

flag in conjunction with the stored parent id of the last cell of the tree. In both cases, the result is returned and the FoundNext flag is set off to prevent subsequent calls from

reading a result. If, however, there is a next result, the get_next CUBE File operation is in-

voked in order to advance to the next cell and the corresponding local state fields are updated

(lines 34-35). Finally, the third function of the operator, close, ensures that the CP is left in a

valid position in the CUBE File, by re-evaluating the various status flags, and releases the

stored local state.

[Figure content: a 3-level chunk tree with input chunk-id cid = 0|0; the root chunk leads through the intermediate chunks 0|0.0|0, 0|0.1|0 and 0|0.2|1 to the data chunks, whose non-empty cells are numbered 1-13 in retrieval order; empty cells are marked “X”; last_parent_id = 0|0.2|1.]

Figure 49: Evaluation of ChunkTreeScan on a 3-level tree with an input id of 0|0.

In Figure 49, we depict an example of the evaluation of ChunkTreeScan over a 3-level chunk

tree. The input chunk-id is 0|0. We have assumed a column-major interleaving order as well

as storage ordering of the cells of each chunk. Assuming that upon invocation of the open

function the CP was in the root chunk, the grayed cells correspond to the subsequent moves

of CP in the CUBE File. As soon as we reach chunk 0|0 we store the id that signifies the last

data cells to retrieve. In this case this is 0|0.2|1. At the data chunk level we have attached

a number to each retrieved data cell, in order to depict the order in which the results are re-


trieved with each invocation of next.
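The tree-boundary test used by ChunkTreeScan::next can be sketched as simple chunk-id prefix matching (the string representation of chunk-ids and the function name are our own simplifications):

```python
def reached_tree_boundary(current_cell_id, last_parent_id,
                          chunk_boundary_flag):
    """Sketch of the ChunkTreeScan stopping test.

    The scan stops when the current data chunk is exhausted
    (chunk_boundary_flag set) and the current cell's chunk-id is
    prefixed by the id of the parent cell that leads to the last
    data chunk of the tree (last_parent_id).
    """
    return chunk_boundary_flag and current_cell_id.startswith(last_parent_id)

# For the tree of Figure 49 (last_parent_id = "0|0.2|1"); the full
# data-cell ids below are hypothetical examples:
print(reached_tree_boundary("0|0.2|1.4|0", "0|0.2|1", True))   # True
print(reached_tree_boundary("0|0.1|0.3|1", "0|0.2|1", True))   # False
```

Because the test uses only the already-available chunk-boundary flag and a stored id, it avoids the extra I/O of fetching the sibling chunk that the naïve implementation of Figure 23 incurred.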

8.2.3 MDRangeSelect (ρ)

The MDRangeSelect (ρ<cx>(C)) physical operator is essentially an implementation of the

MD_Range_Access logical operator, in the abstract processing plan of Figure 35, when the

primary organization of the fact table is the CUBE File. Equivalently, this operator can be

considered as a chunk expression evaluator. Indeed, the input argument corresponds to a

chunk expression cx to be evaluated over a cube C. In increasing order of complexity a chunk

expression might correspond to a single data point, a single hyper-rectangle (query box) or

multiple non-overlapping query boxes. The MDRangeSelect operator exploits the previously

defined physical operators ChunkSelect and ChunkTreeScan, as well as the CUBE File data

navigation operations, in order to enumerate all the data cells that qualify a specific chunk

expression.

In particular, for each chunk expression cx this operator initiates a ChunkSelect, at all possi-

ble depths, in the scope of the chunk expression. Moreover, it identifies the chunking depth,

below which no restrictions are imposed, so as to limit the number of chunk-selections. In-

stead, it initiates full tree-scans with the use of the ChunkTreeScan operator, thus reducing the

computational cost. This depth is called the maximum depth of restrictions DMAX-R. The defi-

nition follows:

Definition 19 (Maximum Depth of Restrictions – DMAX-R)

The maximum depth of restrictions DMAX-R of a chunk expression cx, defined over a cube C,

which is represented by a chunk-tree CT, is the largest chunking depth value in CT, where

restrictions are imposed on C through cx. A D-domain of a chunk expression cx imposes no

restrictions on the chunks of the corresponding depth if and only if it consists only of the * or

P symbols (see Table 4).

For example, in the following chunk expression: [0-1]|1.{0,2}|*.*|P.*|*, the last

D-domain, where restrictions are imposed is the {0,2}|*. Therefore the maximum depth of

restrictions is DMAX-R = 1, while the maximum chunking depth is DMAX = 3. For a chunk ex-

pression representing a data point query, the maximum depth of restrictions equals the maxi-

mum chunking depth DMAX-R = DMAX. For a chunk expression of the form: *|…|*. …

.*|…|*, i.e., a full scan over the whole CUBE File, the maximum

depth of restrictions is null; a value of zero would mean that restrictions are posed only on the


root-chunk.

0:  MDRangeSelect (ρ<cx>(C))
1:  Class MDRangeSelect {
2:  public:
3:    open(Chunk_Expression cx, CubeAccess* cp);
4:    next();
5:    close();
6:  private: // Local State:
7:    Bool FoundNext;
8:    CubeAccess* cb;
9:    int D_max_R; //maximum depth of restrictions
10:   Stack<ChunkSelect> cs_stack; //stack of ChunkSelect
11:                                //operators
12:   ChunkTreeScan currentCTS; //current ChunkTreeScan operator
13:   Chunk_Expression cx; //input chunk expression
14:   Cell currentCell; // next() will read this
15: }

Figure 50: Definition of the MDRangeSelect (ρ) physical operator.
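The identification of the maximum depth of restrictions (Definition 19), stored in D_max_R by open, can be sketched as follows (assuming chunk expressions represented as plain strings and treating a D-domain as unrestricted when every dimension part is * or P; the function name is ours):

```python
def max_depth_of_restrictions(chunk_expression):
    """Return D_MAX-R of a chunk expression, or None if no D-domain
    imposes a restriction (i.e., a full cube-scan)."""
    d_max_r = None
    for depth, domain in enumerate(chunk_expression.split(".")):
        # A D-domain is unrestricted iff every dimension part is * or P.
        if any(part not in ("*", "P") for part in domain.split("|")):
            d_max_r = depth
    return d_max_r

print(max_depth_of_restrictions("[0-1]|1.{0,2}|*.*|P.*|*"))  # 1
print(max_depth_of_restrictions("*|*.*|*"))                  # None
```

The first call reproduces the worked example of Definition 19 (DMAX-R = 1); the second corresponds to a full cube-scan, for which the maximum depth of restrictions is null.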

The general idea for the implementation of MDRangeSelect is to initiate the appropriate number of ChunkSelect operators, one for each D-domain of the input chunk expression corresponding to depths less than or equal to DMAX-R. Then, for each qualifying cell of a chunk at depth DMAX-R, a ChunkTreeScan is invoked, in order to retrieve all data cells under the corresponding subtree. In Figure 50, we present the definition of the MDRangeSelect physical operator. The input arguments comprise a chunk expression and a pointer to the underlying cube data. Special information stored in the local state of this operator consists of the maximum depth of restrictions (D_max_R), a stack of ChunkSelect operators (cs_stack), which are invoked from different depths, and the current ChunkTreeScan operator (currentCTS). Each invocation of the next function of currentCTS yields a result for the MDRangeSelect. The stack cs_stack provides a means for visiting the next qualifying cell at each depth, according to the restrictions imposed by the input chunk expression.

The open function for MDRangeSelect is responsible for initializing the local state fields, as well as for moving the CP over the first qualifying data cell, so that a subsequent call to next retrieves the first result. The definition of the open function is presented in Figure 51. Initially, it identifies and stores DMAX-R and "rewinds" the CP by setting it over the first non-empty cell of the root chunk (lines 1-4). In line 7, a distinction is made between a chunk expression containing restrictions and a full scan over the whole cube. Open initiates a ChunkSelect starting from the root-chunk, thus moving the CP over the first qualifying cell in this chunk. Then, it drills down to the next level, initializing a new ChunkSelect. This is repeated until the maximum depth of restrictions DMAX-R has been reached. Each such chunk-selection will evaluate the specific domain from the input chunk expression (cx) that corresponds to the respective depth. Each "opened" ChunkSelect operator is pushed onto the stack maintained in the local state of MDRangeSelect. These steps are described in lines 7-34.

 0: MDRangeSelect::open(Chunk_Expression cx, CubeAccess* cb){
 1:     //store DMAX-R in local state
 2:     D_max_R = From cx find max depth of restrictions;
 3:     //rewind CP: go to the root chunk
 4:     cb->move_to();
 5:
 6:     //if there are restrictions on some depths
 7:     IF(D_max_R is not NULL) {
 8:         //move CP at the D_max_R
 9:         WHILE(current depth < D_max_R) {
10:             // initialize a ChunkSelect on the current chunk
11:             ChunkSelect cs;
12:             // moves the CP to the 1st qualifying cell in
13:             // current chunk
14:             cs.open(cx, current chunk-id, cb);
15:             IF(!cs.FoundNext) {
16:                 //error in cx, abort
17:                 FoundNext = FALSE;
18:                 RETURN;
19:             }
20:             //push ChunkSelect into stack
21:             cs_stack.push(cs);
22:             //descend one level
23:             cb->drill_down();
24:         }
25:         //Now we are at D_max_R
26:         // initialize a ChunkSelect on the current chunk
27:         ChunkSelect cs;
28:         // moves the CP to the 1st qualifying cell
29:         cs.open(cx, current chunk-id, cb);
30:         IF(!cs.FoundNext) {
31:             //error in cx, abort
32:             FoundNext = FALSE;
33:             RETURN;
34:         }
35:         //push ChunkSelect into stack
36:         cs_stack.push(cs);
37:         //CP points at a tree, which we need to full-scan:
38:         currentCTS.open(chunk-id of CP, cb); //move CP to 1st non-empty data cell
39:     }
40:     ELSE { // no restrictions, full scan the whole cube
41:         // move CP to the first non-empty data cell
42:         currentCTS.open(cb); //no chunk-id specified
43:     }
44:     FoundNext = TRUE; // have found a cell for
45:                       // next() to read
46:     // save current cell
47:     currentCell = cell of current position;
48:     RETURN;
49: }

Figure 51: Definition of the MDRangeSelect open function.

Right after this set of commands, the CP is positioned over the first qualifying cell at DMAX-R, pointing to a subtree (unless this is a data cell, i.e., DMAX-R = DMAX), whose corresponding data cells at the leaf level all satisfy the restriction posed by the input chunk expression. Therefore, in order to access these data cells, all we need to do is initiate a ChunkTreeScan on this tree (line 38). In line 42, we do the same for the case where a full scan over the whole cube is required. Finally, the CP is saved and the FoundNext flag is set, so that a subsequent call to next retrieves the first result (lines 44-48).

The definition of the next function for MDRangeSelect is presented in Figure 52. In lines 2-13, the FoundNext flag is checked in order to verify whether a subsequent result exists to retrieve. If so, the next function of the current chunk-tree scan operator is invoked (line 9). This yields the result, as well as advancing the CP over the next qualifying data cell under the current tree. In line 15, we check the result of this call: if the corresponding FoundNext flag (stored in the local state of the currentCTS operator) is on, then we have successfully found a subsequent result under the same tree. So, we save the current position and return the previously retrieved result (lines 15-21). If, however, this flag is off, then there are no more qualifying data cells under the current tree, and the search must expand to another tree.

The next qualifying tree will be found from the "next" ChunkSelect operator residing in the stack of ChunkSelect operators (cs_stack) of the MDRangeSelect local state. For this reason, we retrieve the topmost ChunkSelect operator from the stack and attempt a next (lines 30-31). If the stack is empty, then there are no more subsequent results; thus the current result is returned and the FoundNext flag is set off (lines 24-28). In the case that there are no more qualifying cells (and therefore chunk subtrees) to retrieve from the topmost ChunkSelect operator, we remove it from the stack and access the next ChunkSelect operator (lines 35-37). The latter corresponds to the evaluation of a D-domain on a chunk at a higher level (i.e., smaller depth) than the previous one. We invoke next for this operator and check whether we have successfully located a subsequent qualifying cell (line 38). We repeat until either we find such a cell or the stack empties (lines 32-45). In the latter case, we have reached the end of the results. If a new tree is found, then we initiate a new ChunkTreeScan, similarly to the corresponding pseudo-code in the open function, save the current cell, and set on the FoundNext flag (lines 46-56). Finally, the third function of the MDRangeSelect operator, close, ensures that the CP is left in a valid position in the CUBE File, by re-evaluating the various status flags, and releases the stored local state.

 0: MDRangeSelect::next(){
 1:     // if qualifying cell exists
 2:     IF(FoundNext){
 3:         //Reset CP, if necessary
 4:         IF(cell in current pos != currentCell) {
 5:             move_to(currentCell)
 6:         }
 7:         // read next qualifying cell from the current
 8:         // chunk-tree scan, and advance to the subsequent result
 9:         result = currentCTS.next();
10:     }
11:     ELSE {
12:         RETURN; // no more qualifying cells, no result to return
13:     }
14:     //if the next result is under the same tree
15:     IF(currentCTS.FoundNext){
16:         FoundNext = TRUE; // have found a cell for
17:                           // next() to read
18:         // save current cell
19:         currentCell = cell of current position;
20:         RETURN result;
21:     }
22:     // End of data from current tree.
23:     // We need to move to another qualifying tree
24:     IF(cs_stack is empty) { //no more trees to visit
25:         // no more results to return
26:         FoundNext = FALSE;
27:         RETURN result;
28:     }
29:     // get topmost ChunkSelect (DON'T remove yet)
30:     ChunkSelect cs = cs_stack.front();
31:     cs.next(); //move to next tree
32:     WHILE(!cs.FoundNext) {
33:         // end of qualifying cells from this chunk selection:
34:         // remove topmost ChunkSelect from stack
35:         cs_stack.pop();
36:         IF(cs_stack not empty) {
37:             cs = cs_stack.front(); //get the next ChunkSelect
38:             cs.next(); //move to next tree
39:         }
40:         ELSE { //empty stack
41:             // no more results to return
42:             FoundNext = FALSE;
43:             RETURN result;
44:         }
45:     }
46:     // Found a new tree to scan:
47:     // We need to drill-down to the DMAX-R again
48:     // and start a new ChunkTreeScan
49:     // (copy from lines 8-38 of MDRangeSelect::open):
50:     …
51:     // (end of copy)
52:     FoundNext = TRUE; // have found a qualifying cell
53:                       // for next() to read
54:     // save current cell
55:     currentCell = cell of current position;
56:     RETURN result; }

Figure 52: Definition of the MDRangeSelect next function.
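The open/next protocol just described (position on the first qualifying cell, then resume the upper-level selection whenever a subtree is exhausted) can be illustrated with a small, self-contained C++ sketch. The Cube and RangeScan types below are illustrative stand-ins, not the thesis code: the list of qualifying root cells plays the role of the ChunkSelect stack, and the full scan of each selected subtree plays the role of ChunkTreeScan.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Toy model of a 2-level chunk tree: each qualifying root cell points to
// a leaf chunk whose values are scanned in full (the ChunkTreeScan role).
struct Cube {
    std::vector<std::vector<int>> leaves; // leaves[i] hangs from root cell i
};

// Iterator in the open/next style of MDRangeSelect: construction positions
// the scan, next() returns one value at a time and, when a subtree is
// exhausted, resumes the upper-level selection (the ChunkSelect role is
// played here by the precomputed list of qualifying root cells).
class RangeScan {
public:
    RangeScan(const Cube& c, std::vector<std::size_t> rootCells)
        : cube(c), qualifying(std::move(rootCells)) {}

    std::optional<int> next() {
        while (sel < qualifying.size()) {
            const auto& leaf = cube.leaves[qualifying[sel]];
            if (pos < leaf.size()) return leaf[pos++];
            ++sel;   // subtree exhausted: back to the selection level
            pos = 0; // and full-scan the next qualifying subtree
        }
        return std::nullopt; // all qualifying subtrees consumed
    }

private:
    const Cube& cube;
    std::vector<std::size_t> qualifying; // qualifying root cells
    std::size_t sel = 0;                 // current qualifying cell
    std::size_t pos = 0;                 // position inside current leaf
};
```

As in the operator itself, no buffering of results is needed: the two indices sel and pos are the only state required to resume the scan.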

Page 145: Storage Structures, Query Processing and Implementation of On-Line Analytical ... · 2004. 2. 19. · Abstract On Line Analytical Processing (OLAP) has caused a significant shift

Chapter 8: Query Processing Algorithms for Ad Hoc Star Queries

In Figure 53, we depict an example of an MDRangeSelect evaluation. We have assumed a cube represented by a 4-level chunk-tree. The restriction imposed on this cube is expressed via the chunk expression (cx) appearing in the top-left corner of the figure. Again, we assume a column-major order, both for the interleaving and for the storage of the cells of a chunk. We can see that each D-domain of cx poses a restriction at a different chunking depth D. From the chunk expression, we can immediately infer that the maximum depth of restrictions DMAX-R equals one. Indeed, restrictions stop after depth D = 1. Therefore, we initiate a full scan (by means of the physical operator ChunkTreeScan) for every chunk subtree hanging from the DMAX-R depth. As can be seen from the figure, in this case we use three different tree-scans on the corresponding qualifying trees. For each of the directory chunks with a depth less than the maximum depth of restrictions, a ChunkSelect operation is initiated, in order to retrieve the corresponding qualifying cells. The latter are depicted by the grayed cells, which show the successive moves of the current position in the CUBE File (CP) during the evaluation of the operator. In particular, we number the directory chunk cells in {a,b,c,…} form, in order to show the order in which these cells are visited. Similarly, at the data chunk level, we number the retrieved cells in {1,2,3,…} form to show the retrieval order.

(Figure: a 4-level chunk-tree (DMAX = 3) with the root chunk at D = 0; the restriction cx = [0-1]|0.{0,2,4}|*.*|P.*|* gives DMAX-R = 1; grayed cells mark the successive positions of the CP.)

Figure 53: Example of an MDRangeSelect evaluation.
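A single D-domain of a chunk expression, such as [0-1], {0,2,4} or *, is just a predicate on order codes. The helper below is a hypothetical parser for this simplified notation (the pivot "|P" annotations of the thesis notation are not handled), sketching how a ChunkSelect could test whether a cell's order code qualifies.

```cpp
#include <sstream>
#include <string>

// Tests whether an order code satisfies one D-domain specification.
// Supported forms (simplified notation):
//   "*"       no restriction
//   "[a-b]"   contiguous range of order codes
//   "{a,b,c}" explicit set of order codes
//   "a"       a single order code
bool qualifies(const std::string& dom, int code) {
    if (dom == "*") return true;
    if (dom.front() == '[') {            // range: parse "a-b]"
        int lo = 0, hi = 0;
        char dash = 0;
        std::istringstream in(dom.substr(1));
        in >> lo >> dash >> hi;
        return lo <= code && code <= hi;
    }
    if (dom.front() == '{') {            // set: parse "a,b,c}"
        std::istringstream in(dom.substr(1));
        std::string tok;
        while (std::getline(in, tok, ','))
            if (std::stoi(tok) == code) return true;
        return false;
    }
    return std::stoi(dom) == code;       // single order code
}
```

For example, under the expression of Figure 53 the root-chunk domain "[0-1]" admits order codes 0 and 1 but rejects 2.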


The example in Figure 53, apart from the evaluation of the MDRangeSelect physical operator, also demonstrates a very important characteristic of range query processing over a CUBE File. In particular, note that the restriction in this example, expressed via a chunk expression, corresponds to a multiple multidimensional range query. Indeed, if we observe the data cells at the leaves of the two chunk subtrees hanging from the root-chunk, we will notice that these lie in disjoint rectangles of the 2-dimensional space. We wish to emphasize at this point that the proposed method of representing restrictions via chunk expressions, as well as the dimension data encoding (which assigns consecutive order codes to the dimension values of the same hierarchical level; see §2.1), results in the evaluation of multiple query boxes within a single query. This is very important, because it means that we can benefit from the current drilling path in the CUBE File (see §4.1) and evaluate the restriction with a minimum of I/O overhead, since each required bucket is fetched into main memory only once. This must be contrasted with the UB-tree primary organization for hierarchically clustered fact tables [Bay97, MRB99], which performs a separate range query for each query box. This results in the potential retrieval of the same disk pages (z-regions) many times, leading to poor performance. If we consider the fact that the majority of restrictions in OLAP queries result in multiple non-overlapping query boxes in the multidimensional space, we can realize the significance of an efficient evaluation of such restrictions.
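The reason a hierarchical restriction yields multiple disjoint query boxes can be seen directly from the encoding: since sibling members under the same parent receive consecutive order codes, each selected parent maps to one contiguous interval of leaf-level codes. A minimal sketch, with a hypothetical children table standing in for the dimension hierarchy:

```cpp
#include <utility>
#include <vector>

// Because sibling members under the same parent receive consecutive order
// codes, a restriction selecting some parents maps to one contiguous
// interval of leaf-level codes per selected parent; several selected
// parents therefore yield several disjoint query boxes along a dimension.
std::vector<std::pair<int, int>> leafIntervals(
        const std::vector<int>& selectedParents,
        const std::vector<std::vector<int>>& children) {
    // children[p] holds the consecutive leaf codes under parent p
    std::vector<std::pair<int, int>> boxes;
    for (int p : selectedParents)
        boxes.emplace_back(children[p].front(), children[p].back());
    return boxes;
}
```

Taking the cross product of such interval lists over all dimensions produces exactly the non-overlapping query boxes discussed above, which the CUBE File evaluates in a single pass.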

8.2.4 MDRangeAggregate (α)

The physical operator discussed in this subsection is an extension of the MDRangeSelect operator discussed above. It is called MDRangeAggregate (α) and its purpose is to apply a specific aggregation function to all the data cells resulting from the evaluation of a chunk expression (i.e., the results of an MDRangeSelect), so as to return a single value representing the corresponding aggregated result. This operator will prove invaluable for the definition of an operation that combines multidimensional range selection and grouping, as we will see in the next subsection.

The implementation of the MDRangeAggregate operator is straightforward, because it is based solely on the MDRangeSelect operator. The input arguments for this operator comprise a chunk expression cx (denoting the target multidimensional range(s)) and an aggregation function aggf that must be applied to all the data points within these ranges. In algebraic form, we write α<cx,aggf>(C) to denote the MDRangeAggregate operator applied on a CUBE File-organized cube C. Examples of possible aggregation functions are all the functions used in standard SQL [Dat97], i.e., SUM, MIN, MAX, AVG, COUNT. The input arguments to these aggregation functions can only be measures, or arithmetic expressions of measures. We assume that no other attributes (e.g., dimension attributes) appear as arguments to an aggregation function in MDRangeAggregate.

MDRangeAggregate (α<cx,aggf>(C))

Class MDRangeAggregate {
public:
    open(Chunk_Expression cx, Aggregate_Function aggf, CubeAccess* cb);
    next();
    close();
private:
    // Local State:
    Bool FoundNext;
    MDRangeSelect* src;   //source of incoming data
    Chunk_Expression cx;  //input chunk expression
    Aggregate_Function aggf;
    double result;        //the result will be stored here
}

Figure 54: Definition of the MDRangeAggregate (α) physical operator.

The local state for this operator consists of the input chunk expression, the input aggregation function, an MDRangeSelect operator that provides the input stream of data, and a field for storing the aggregated result value. Therefore, we do not need to store a pointer directly to a CubeAccess class representing the CUBE File-organized data. However, we do need it as an input argument, in order to pass it on to the underlying MDRangeSelect operator. The definition of MDRangeAggregate appears in Figure 54.

 0: MDRangeAggregate::open(Chunk_Expression cxp,
 1:                        Aggregate_Function aggfu,
 2:                        CubeAccess* cb){
 3:     //store local state
 4:     cx = cxp;
 5:     aggf = aggfu;
 6:
 7:     //"Open" an MDRangeSelect operator
 8:     src = new MDRangeSelect;
 9:     src->open(cx, cb);
10:     IF(src->FoundNext) {
11:         //found first qualifying data cell;
12:         //iterate through all qualifying data cells
13:         WHILE(src->FoundNext) {
14:             currVal = src->next();
15:             //distinguish between different aggregate functions
16:             updatePartialResult(result, aggf, currVal);
17:         }
18:         FoundNext = TRUE; //Now, next() can read the result
19:     }
20:     ELSE { // no qualifying cells exist
21:         FoundNext = FALSE;
22:     }
23:     RETURN;
24: }

Figure 55: Definition of the MDRangeAggregate open function.


The primary responsibility of the open function of the MDRangeAggregate operator is to appropriately initialize an MDRangeSelect operator (i.e., call the corresponding open function) and compute the single result, so that next can read it. As we have seen, MDRangeSelect::open places the CP over the first qualifying data cell in the range represented by the input chunk expression, so that a subsequent call to MDRangeSelect::next retrieves it. The definition of the open function is depicted in Figure 55.

MDRangeAggregate::next(){
    // if the result has been computed
    IF(FoundNext){
        // create a tuple for this result
        Tuple tup_result = tuple-fy(result, cx);
        FoundNext = FALSE; // after this result, no more exist
        RETURN tup_result;
    }
    ELSE {
        RETURN; // no result to return
    }
}

Figure 56: Definition of the MDRangeAggregate next function.

Open iterates through all the result cells and progressively builds the final result. In lines 13-18, we can see the loop that repeatedly calls the next function of the underlying MDRangeSelect operator, in order to retrieve each qualifying data cell. In line 16, the final result is built up progressively, according to the imposed aggregation function. For example, a SUM is built by adding each retrieved value to a partial total. For COUNT, we merely need to count the number of retrieved data cells, simply by adding 1 for each qualifying cell found. For AVG, we need both of the aforementioned aggregates, in order to divide them at the end. For MIN/MAX, we continuously update the current min/max value until we have iterated through all the data cells. The final result is stored as a field in the local state.
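This incremental computation can be sketched as a small accumulator in C++. The Accumulator type is an illustrative stand-in for the updatePartialResult logic, maintaining the partial SUM, COUNT, MIN and MAX in one pass, so that AVG can be derived at the end.

```cpp
#include <algorithm>
#include <limits>

// One-pass accumulator in the spirit of updatePartialResult: every
// qualifying measure value updates the partial SUM, COUNT, MIN and MAX,
// and AVG is derived at the end from SUM and COUNT.
struct Accumulator {
    double sum = 0.0;
    long   count = 0;
    double mn = std::numeric_limits<double>::infinity();
    double mx = -std::numeric_limits<double>::infinity();

    void update(double v) {
        sum += v;              // SUM: add to the partial total
        count += 1;            // COUNT: one more qualifying cell
        mn = std::min(mn, v);  // MIN: keep the smallest value seen
        mx = std::max(mx, v);  // MAX: keep the largest value seen
    }
    double avg() const { return sum / count; } // AVG = SUM / COUNT
};
```

All five SQL aggregates listed above are thus computable with constant memory per group, which is what makes the operator's open function able to consume its whole input stream before next is ever called.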

The next function for MDRangeAggregate returns a single value representing the already calculated and stored result. This can be seen in Figure 56, where we depict the definition of the next function. Note that before returning the final result, we first transform it into a tuple containing a column for the aggregated value and a column for each dimension on which a restriction has been imposed through the input chunk expression. In these dimension columns we store the corresponding member-codes for each dimension, which are extracted directly from the input chunk expression cx.

When the call to the next function of MDRangeAggregate is concluded, the CP is left in the position where the last call to MDRangeSelect put it, i.e., at the last qualifying data cell. The close function of this operator simply calls the respective close of the underlying MDRangeSelect operator, to ensure a proper termination of the operation.

8.2.5 MDRangeGroup (γ)

Usually, whenever we want to evaluate a restriction (in our case, a multidimensional restriction) on a data set and then perform some grouping and aggregation on the qualifying data, we separate the two operations into two discrete processing steps that are executed successively. Moreover, in order to compute the grouping, the entire set of qualifying tuples has to be read before the first result can be produced. Depending on the grouping algorithm used, sort-based or hash-based (see [Gra93] for a survey of processing algorithms for grouping), it is typical for large relations (as the fact table in a star schema is) to require multiple passes over the input data, assuming that the evaluation of the restriction predicate results in a large number of qualifying tuples. Therefore, grouping has traditionally been a great bottleneck in the evaluation of aggregation queries, since it usually entails a significant I/O cost.

The physical operator discussed in this subsection combines the evaluation of a multidimensional range selection and the evaluation of a grouping operation in a single operation. The presented algorithm computes each group's result while retrieving solely the relevant qualifying data, without having to read the whole set of qualifying data before producing the first result. Thus, the evaluation cost of this operation becomes comparable to that of a multidimensional range selection, which is significantly cheaper than that of a traditional grouping algorithm.

The presented operator is called MDRangeGroup (γ<cx,rg,aggf>(C)) and is applied on a CUBE File-organized cube C. The basic idea is to exploit the previously defined MDRangeAggregate physical operator, which computes a single aggregated value over a set of data points that satisfy a chunk expression. We enumerate all the groups and initiate a new MDRangeAggregate operation for each group, in order to compute the aggregated result value of each group. The input argument list comprises a chunk expression (cx), the result granularity (rg) and an aggregation function (aggf). Recall from §7.1 that when a star query is expressed in SQNF, one of the required terms is the result granularity (R-term), which corresponds to the grouping attribute list in an SQL form (see Table 3). Moreover, according to the processing rule appearing in Definition 18, this term must be expressed via the corresponding chunking depths.

The result granularity term of any star query expressed in SQNF is an ordered list of chunking depth values, ordered according to the interleaving order for a specific CUBE File. Therefore, we know which value corresponds to which dimension. Moreover, dimensions not participating in the grouping appear in this list with a special depth value (e.g., -1) signifying an "ALL" specification, i.e., a full aggregation along the domain of this dimension.

For example, let's consider the example star query of §6.4.1, imposed on the star schema of Figure 36, which is repeated here for convenience:

SELECT L.area, D.month, SUM(F.sales)
FROM SALES_FACT F, LOCATION L, DATE D, PRODUCT P
WHERE F.day = D.day AND F.store_id = L.store_id
  AND F.product_id = P.item_id AND D.year = 1999
  AND L.population > 1000000
  AND P.category = "air condition"
GROUP BY L.area, D.month

If we assume the following interleaving order: (DATE, CUSTOMER, LOCATION, PRODUCT), major-to-minor from left to right, then the corresponding result granularity term in SQNF would be (DATE.month, CUSTOMER.ALL, LOCATION.area, PRODUCT.ALL), which in chunking depth terms (possible depth values being D = 0, 1, 2 in this case) translates to rg = (1, -1, 1, -1).
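The translation from a GROUP BY list to a result granularity term can be sketched as follows. The function name and the attribute-to-depth map are assumptions of this sketch, but the output reproduces the rg = (1, -1, 1, -1) term derived above.

```cpp
#include <map>
#include <string>
#include <vector>

// Translates a GROUP BY list into a result granularity term: one chunking
// depth per dimension, listed in the cube's interleaving order, with -1
// standing for the "ALL" specification of dimensions that do not
// participate in the grouping.
std::vector<int> resultGranularity(
        const std::vector<std::string>& interleaving,
        const std::map<std::string, int>& groupDepth) {
    std::vector<int> rg;
    for (const auto& dim : interleaving) {
        auto it = groupDepth.find(dim);
        rg.push_back(it == groupDepth.end() ? -1 : it->second);
    }
    return rg;
}
```

For the query above, DATE.month and LOCATION.area both sit at depth 1, while CUSTOMER and PRODUCT are absent from the GROUP BY list and map to -1.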

As already mentioned, for an input chunk expression cx and a result granularity rg, the MDRangeGroup operator enumerates all the possible groups. For each such group, a new chunk expression cxg is created, where g = 1,…,G, with G being the total number of groups. Each generated cxg is used to invoke an MDRangeAggregate operator. The single result returned from each such invocation corresponds to a group's aggregated value (assuming that aggregation functions are applied only on measure attributes and not on dimension attributes) and thus can be returned as a result of MDRangeGroup.

The groups are enumerated by ascending depth value in rg (with the "-1" terms omitted), major-to-minor from left to right. Each group g is placed in the original chunk expression cx in order to create a new chunk expression cxg (corresponding to a subset of the cells satisfying cx). For example, let's consider the following chunk expression corresponding to a 2-dimensional cube:

cx = [0-1]|1.{0,2,4}|[0-1].*|P.*|*

Assume the result granularity rg = (0,1), which means that for the first dimension (in the interleaving order) we want to group according to the highest level (D = 0), and for the second dimension according to the second highest level (D = 1). The corresponding order-codes have been typed in boldface above. We see that there are four possible groups: G1 = (0,0), G2 = (0,1), G3 = (1,0) and G4 = (1,1). This enumeration will yield four new chunk expressions, listed next in the order in which they will be generated and evaluated:

cx1 = 0|1.{0,2,4}|0.*|P.*|*
cx2 = 0|1.{0,2,4}|1.*|P.*|*
cx3 = 1|1.{0,2,4}|0.*|P.*|*
cx4 = 1|1.{0,2,4}|1.*|P.*|*
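The enumeration that produces cx1-cx4 is a cross product of the per-dimension grouping domains, taken major-to-minor. The sketch below is a simplified illustration: it assumes the grouping positions in the template expression have been marked with placeholders %0, %1, … (an assumption of this sketch, not thesis notation) and substitutes each group's order codes into them.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Enumerates all groups as the cross product of the per-dimension
// grouping domains, major-to-minor from left to right, and instantiates
// one chunk expression per group by substituting the group's order codes
// for the placeholders "%0", "%1", ... in a template expression.
std::vector<std::string> groupExpressions(
        std::string tmpl,
        const std::vector<std::vector<int>>& domains) {
    std::vector<std::string> out{std::move(tmpl)};
    for (std::size_t d = 0; d < domains.size(); ++d) {
        const std::string ph = "%" + std::to_string(d);
        std::vector<std::string> next;
        for (const auto& expr : out)       // major dimension varies slowest
            for (int code : domains[d]) {
                std::string s = expr;
                s.replace(s.find(ph), ph.size(), std::to_string(code));
                next.push_back(s);
            }
        out = std::move(next);
    }
    return out;
}
```

With the template "%0|1.{0,2,4}|%1.*|P.*|*" and domains {0,1} for both grouping levels, the function emits exactly the four expressions listed above, in the same order.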

By invoking an MDRangeAggregate operation for each one of the above chunk expressions, we compute the corresponding result for the respective group. In order to facilitate the generation of all the possible groups, we have to make sure that the restriction appearing in the original chunk expression for a grouping level does not appear in a '*' form, since this would mean that we would have to consider the full domain of values for this level. A simple way to overcome this difficulty is to incorporate knowledge about the grouping operation into the h-surrogate processing phase (§7.3). In particular, if we identify the maximum depth value appearing in a result granularity term, DMAX-GR, then we can pass it on as input to the Make_Specification operation (§7.3). The latter compares DMAX-GR with the depth of the most detailed level appearing in the local predicate for this dimension and chooses the greater one (i.e., the one corresponding to the most detailed level) as the point where the suffix-truncation will take place (§7.3.2). This ensures that no '*' specifications will appear for grouping levels in a chunk expression.

MDRangeGroup (γ<cx,rg,aggf>(C))

Class MDRangeGroup {
public:
    open(Chunk_Expression cx, Result_Granularity rg, Aggregate_Function aggf, CubeAccess* cb);
    next();
    close();
private:
    // Local State:
    Bool FoundNext;
    Chunk_Expression cx;   //input chunk expression
    Result_Granularity rg;
    Aggregate_Function aggf;
    GroupGenerator group;  //iterates through groups
    Chunk_Expression cxg;  //current group chunk expression
    MDRangeAggregate* src; //current group evaluator
    Tuple result;          //current group result tuple
}

Figure 57: Definition of the MDRangeGroup (γ) physical operator.

In Figure 57, we depict the definition of the MDRangeGroup operator. The local state of this operator consists of the input chunk expression, the result granularity and the aggregation function; also, an iterator (group) for enumerating the possible groups in the order discussed above, the current chunk expression (cxg) corresponding to the current group (i.e., the one whose aggregated result value will be retrieved by the next call to next), a pointer to the current MDRangeAggregate operator (src), and a field for storing, in tuple form, the next result to be read by next.


 0: MDRangeGroup::open(Chunk_Expression cxp,
 1:                    Result_Granularity rgr,
 2:                    Aggregate_Function aggfu, CubeAccess* cb){
 3:     //store local state
 4:     cx = cxp;
 5:     rg = rgr;
 6:     aggf = aggfu;
 7:     //generate and store list of possible groups
 8:     group.open(rg, cx);
 9:
10:     //get first group and create group chunk
11:     //expression
12:     cxg = generateGroupCX(cx, group.next());
13:
14:     //Open an MDRangeAggregate for this group
15:     src = new MDRangeAggregate;
16:     src->open(cxg, aggf, cb);
17:     IF(src->FoundNext) {
18:         //Store result tuple for first group
19:         result = src->next();
20:         FoundNext = TRUE;
21:         RETURN;
22:     }
23:     ELSE { // no result for first group:
24:         //continue until you find a result for a following group
25:         WHILE(group.FoundNext){
26:             //get next group and create group chunk
27:             //expression
28:             cxg = generateGroupCX(cx, group.next());
29:             //Open an MDRangeAggregate for this group
30:             src = new MDRangeAggregate;
31:             src->open(cxg, aggf, cb);
32:             IF(src->FoundNext) {
33:                 //Store result tuple for current group
34:                 result = src->next();
35:                 FoundNext = TRUE;
36:                 RETURN;
37:             }
38:         }
39:
40:         // No result for any group found
41:         FoundNext = FALSE;
42:     }
43:     RETURN;
44: }

Figure 58: Definition of the MDRangeGroup open function.

The primary responsibility of the open function for MDRangeGroup is to initialize the process of group-related chunk expression generation, as well as to compute and store the result for the first group, so that a subsequent call to next retrieves it. The definition of open appears in Figure 58. In line 8, we initialize the "group generator". Then, in line 12, we generate the first chunk expression, corresponding to the first group. In lines 15-16, an MDRangeAggregate operator is initialized for computing the aggregate value of the first group. This computation comprises simply the invocation of the MDRangeAggregate next function (lines 17-22). If no aggregated value was returned for the first group, then we continue the same process for the next groups, until we find a result or the possible groups are exhausted (lines 23-39). If a result is found, we store it in the corresponding local state field; otherwise, the FoundNext flag is set off.

 0: MDRangeGroup::next(){
 1:     // if a group result has been computed
 2:     IF(FoundNext){
 3:         Tuple res = result; //get result
 4:         //before returning it, advance to the result for the next group
 5:         WHILE(group.FoundNext){
 6:             //get next group and create group chunk
 7:             //expression
 8:             cxg = generateGroupCX(cx, group.next());
 9:             //Open an MDRangeAggregate for this group
10:             src = new MDRangeAggregate;
11:             src->open(cxg, aggf, cb);
12:             IF(src->FoundNext) {
13:                 //Store result tuple for next group
14:                 result = src->next();
15:                 FoundNext = TRUE;
16:                 RETURN res; //return current result
17:             }
18:         }
19:
20:         // No result for any group found
21:         FoundNext = FALSE;
22:         RETURN res;
23:     }
24:     ELSE {
25:         RETURN; // no result to return
26:     }
27: }

Figure 59: Definition of the MDRangeGroup next function.

A call to the next function returns the calculated result for the current group. Recall from the definition of MDRangeAggregate that the aggregated value is returned in tuple form. The definition of next appears in Figure 59. In line 3, it retrieves the result for the current group. Then, it attempts to calculate the result for the subsequent group; this is represented by the loop in lines 5-19. As in the case of open, if the subsequent group does not yield a result, we continue to the following one, until the groups are exhausted.

Finally, the close function is responsible for "closing" the current MDRangeAggregate operator and releasing the local state information. The CP will be left at the last qualifying data cell corresponding to the last group, i.e., where the current MDRangeAggregate operator would leave it.

In Figure 60, we depict the evaluation of an MDRangeGroup operator for the chunk expres-

sion and result granularity example mentioned above. In the figure, we can see the maximum


depth value appearing in the result granularity DMAX-GR, which is equal to 1 for this case. We

have four possible groups: G1, G2, G3, and G4, which generate the four different chunk ex-

pressions: cx1, cx2, cx3 and cx4, already mentioned above. At depth D = 1, we can see the

grayed cells corresponding to the qualifying cells for each group. Moreover, at the data chunk

level, we have included within a closed line the data cells that contribute to the aggregated

value computed for each group. As can be seen, for the specific data distribution, only groups

G1 and G4 will yield a result. The other two are “empty groups”, since they correspond to

empty data sets. Finally, the numbering of the grayed cells, as in the previous figures, corre-

sponds to the successive positions of CP.

Note that due to the way with which the chunk expressions, corresponding to individual

groups, are generated, the consecutive chunk expressions correspond to hierarchically clus-

tered chunks. This means that for consecutive groups we will have to drill-down from similar

paths. This characteristic can be greatly exploited in terms of a hierarchy-based caching that

will ensure that common buckets are read into main memory only once (see §13.2.2 for a detailed discussion on this subject). For example, for group G1 we follow the path denoted by

positions {a, b, c, d, e, f}, while for the subsequent group G2 we follow the path {g, h}, which

accesses the same top-level bucket (or buckets). For the latter we do not drill-down to further

depth because as soon as we reach depth 1, we realize that it corresponds to an empty group

and thus continue with the next group, namely G3.
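The effect of such hierarchy-based caching can be sketched with a toy bucket cache. All identifiers below (read_bucket, the bucket ids B0-B3, the drilling paths) are illustrative, not part of the CUBE File implementation:

```python
# Consecutive groups drill down along similar paths, so buckets on a
# shared path prefix should be fetched from disk only once.
io_reads = []                         # record of actual "disk" reads

def read_bucket(bucket_id, cache):
    """Return a bucket, reading it from 'disk' only on a cache miss."""
    if bucket_id not in cache:
        io_reads.append(bucket_id)    # simulate one bucket I/O
        cache[bucket_id] = f"data({bucket_id})"
    return cache[bucket_id]

# Hypothetical drilling paths for two consecutive groups:
# both start from the same top-level bucket "B0".
path_g1 = ["B0", "B1", "B2"]
path_g2 = ["B0", "B3"]

cache = {}
for path in (path_g1, path_g2):
    for b in path:
        read_bucket(b, cache)

# "B0" is shared by both paths and read only once: 4 I/Os, not 5.
```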

One final remark that the alert reader would make is that the MDRangeGroup operator can be applied only if the grouping is performed on hierarchical attributes and not on feature attributes (§6.1). This is true since the grouping performed by MDRangeGroup is based on the order-codes, which are assigned only to hierarchical attributes. However, this does not restrict

the applicability, as well as the “usefulness” of this operator. We have already discussed that

each feature attribute characterizes a specific hierarchical attribute and is functionally dependent on it. This, in terms of grouping, means that we can group according to the corresponding hierarchical attribute that functionally determines the feature attribute, instead of

the latter, as long as we perform an additional post-grouping operation, in order to obtain the

final result in the desired granularity. In fact, this is applied in a query optimization transformation called hierarchical pre-grouping, a very efficient optimization technique that aims at reducing the number of tuples that are retrieved from the fact table and then take part in a series of residual joins with the dimension tables (see [KTS+02] and [PER+03] for a

comprehensive presentation of hierarchical pre-grouping). Thus, MDRangeGroup can augment even further the benefits of hierarchical pre-grouping by providing an efficient means for implementing it.


Figure 60: Example of an MDRangeGroup evaluation.

8.2.6 Analysis of the Cost of CUBE File-Access Operators

In this subsection, we will attempt a qualitative, rather than a quantitative, analysis of the cost

of the presented operations that access a CUBE File organization. To this end, we will follow

the common practice in database query processing and especially in index structure evalua-

tion, to focus solely on the most dominant cost factor, which is the number of I/Os during the

evaluation of each operator.

Ultimately, we would like to have a formula that given a specific chunk expression cx on a

cube C consisting of B(C) buckets, it would return the number of bucket reads, in order to

evaluate it. Unfortunately, such a cost formula is generally impossible to derive using merely

these input parameters (i.e., cx and B(C)). This is because information on the allocation of the

data into buckets is required. However, the allocation of the data into buckets depends heavily on the data distribution. Therefore, some knowledge of the data distribution, as well as of the

allocation of chunks into buckets is required to be able to even approximate this cost. There-

fore, in the absence of a structure that maintains this information, only crude estimations can

be made by such a formula for the cost of each operator that accesses the CUBE File. For this reason, we will make a qualitative analysis of this cost, rather than a quantitative one providing some mathematical formula.

Figure 61: (a) In the best case, a single data point query can be evaluated with a single I/O; (b) if the cache area cannot hold the whole root directory, more buckets might interleave in the evaluation path.

For a single point query QP, the best scenario, i.e., the shortest possible path from the root-chunk to the corresponding data chunk, corresponds to a single bucket read. This is the case

where the bucket hosting the desired point is directly connected with the portion of the root

directory that is cached into memory (§3.2.3). This is depicted in Figure 61(a). However, if

the cache area size is not large enough to hold the whole root directory, then, as we depict in

Figure 61(b), there might be cases where more than one bucket I/O is needed in order to evaluate QP. If compression is used for storing the chunks (and thus no space is wasted for sparse chunks), and considering the fact that a directory chunk would seldom occupy a whole bucket on its own, nor would it trigger an artificial chunking, then we can safely assume that in most cases DMAX/2 is a good upper bound on the number of buckets in a path to a data point. As we have seen, DMAX corresponds to the length of the longest hierarchy among the dimensions of a cube. Moreover, storage of large data chunks (§3.2.2) would seldom require more than two levels of artificial chunking; therefore, artificial chunking will increase the number of buckets in a path by one or two in most cases.

Similar observations hold also for the evaluation of the ChunkTreeScan operator. Since

chunk-trees are clustered into the same bucket, the best-case scenario for the evaluation of

ChunkTreeScan is again a single bucket I/O. Furthermore, due to the construction algorithm

(see Figure 5 in §3.2) that stores chunk-trees of equal depth at each step, buckets “hanging”

from the same parent-chunk are created consecutively and thus are clustered on disk. There-

fore, it is clear that when evaluating a chunk expression not all I/Os are random, but the ac-

cess to sibling buckets corresponds to a sequential I/O operation. Consequently, this also

means that a get_next operation that will try to access a sibling bucket can benefit from the

potential pre-fetching done by the underlying file system.

Finally, in order to evaluate the cost of MDRangeGroup in comparison to a conventional

grouping algorithm, let’s assume that no restrictions are imposed on the CUBE File; i.e., a

chunk expression of the form *|…*. … .*|…* is submitted. Also, assume that the total num-

ber of buckets for a CUBE File organized cube C, is B(C). The evaluation of MDRange-

Group on C for a specific result granularity rg, will generate G chunk expressions to be

evaluated separately by an MDRangeAggregate operator each. As we have seen, the

MDRangeAggregate operation traverses the chunk-tree in a depth-first manner, hence access-

ing each relevant bucket only once. Moreover, the order in which the G chunk expressions are generated and evaluated is consistent with the hierarchical clustering, since groups are enumerated (major-to-minor) from lower chunking depths to higher ones. Therefore, similar paths

will be traversed for consecutive groups. With appropriate exploitation of the current drilling

path (CDP) construct, which results in a hierarchy-based caching (see §13.2.2), we can ensure that common buckets requested by different groups are fetched only once. Thus, each

bucket of the CUBE File will be read only once. Consequently, the number of I/Os for evalu-

ating MDRangeGroup, when no restrictions are imposed, equals B(C).

On the other hand, consider a relational table R consisting of B(R) disk pages and a memory

area of M buffers, each of the size of a page. Then with a sort-based or hash-based grouping

algorithm we will need K passes of the whole table in order to evaluate the grouping, where

K = ⌈logM(B(R))⌉ [GUW00]. This means that the number of I/Os would be: 2·K·B(R) + B(R). The factor 2 corresponds to the fact that for each pass both reading and writing must be considered. The last term corresponds to the final read into memory in order to produce the results.
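Plugging illustrative numbers into these two cost expressions makes the comparison concrete. The table size, the buffer count, and the assumption B(C) ≈ B(R) below are our own, chosen only for illustration:

```python
import math

def sort_group_ios(b_r, m):
    """I/Os for a sort/hash-based grouping over B(R) pages with M buffers:
    K passes, each reading and writing the table, plus a final read."""
    k = math.ceil(math.log(b_r, m))           # K = ceil(log_M B(R))
    return 2 * k * b_r + b_r

def mdrange_group_ios(b_c):
    """MDRangeGroup reads each of the B(C) buckets exactly once."""
    return b_c

# Illustrative sizes: a 100,000-page table, 100 memory buffers, and a
# CUBE File of comparable size (B(C) ~ B(R)).
b = 100_000
conventional = sort_group_ios(b, 100)   # K = 3, so 700,000 I/Os
single_pass = mdrange_group_ios(b)      # 100,000 I/Os
```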

We see that MDRangeGroup imposes a significantly lower I/O cost than a conventional grouping operation, essentially requiring only a single pass over the data. This is natural if we consider that the “groups” of the data are essentially already formed by the underlying data organization (not in terms of aggregate value computation, but with respect to clustering the

data points that belong to the same group). Hence, the MDRangeGroup algorithm merely

needs to use the directory chunk nodes in order to retrieve (i.e., calculate from the clustered

relevant data points) each group value. On the contrary, the conventional grouping algorithms

on relational tables spend most of their effort on trying to sort the data and form the individ-

ual groups before doing the final pass that will calculate the aggregate values.

A final note concerns the number of buckets B(C) of a CUBE File compared to the number of pages B(R) of a relational table R. It is true that the directory chunks impose an extra

storage overhead. However, in the CUBE File we only need to store the measure values and

not the h-surrogates (i.e., the chunk-ids) per data cell, as is the case for a relational table.

Therefore, we can assume that these two numbers B(C) and B(R) are fairly comparable.

8.3 Physical Execution Plans: the Big Picture

In this chapter, we have presented five physical operators that access the CUBE File storage

organization. We have seen that these operators are based on the data navigation interface

provided by the CUBE File. Moreover, complex operators are based on simpler ones, demon-

strating a method of defining value-added processing operators. The operators have been defined with the iterator model, which distinguishes three functions (open, next, and close), as

well as some local state information for each operation. In Table 5, we synopsize the physical

operators presented in this chapter.

In order to get the big picture of the processing entailed in the evaluation of star queries over

CUBE File organized fact tables, we will present a physical execution plan, i.e., a plan con-

sisting of physical operators rather than logical (abstract) ones, which corresponds to the ex-

ample abstract processing of Figure 38 in §6.4.1. This is depicted in Figure 62. For ease of

reference we repeat the query here again:


SELECT L.area, D.month, SUM(F.sales)
FROM SALES_FACT F, LOCATION L, DATE D, PRODUCT P
WHERE F.day = D.day
  AND F.store_id = L.store_id
  AND F.product_id = P.item_id
  AND D.year = 1999
  AND L.population > 1000000
  AND P.category = “air condition”
GROUP BY L.area, D.month

Table 5: Summary of physical operators accessing the CUBE File storage organization.

ChunkSelect, ε(C)<cx, cid>
  Open: isolate the selection condition; find the first qualifying cell.
  Next: return the current cell; move CP over the next qualifying cell.
  Close: re-evaluate the CP status flags; release local state resources.
  Local state: D-domain selection, chunk-id, current cell.

ChunkTreeScan, δ(C)<cid>
  Open: store the tree data boundary; find the first qualifying cell.
  Next: return the current cell; move CP over the next qualifying cell.
  Close: re-evaluate the CP status flags; release local state resources.
  Local state: tree-root chunk-id, tree-boundary chunk-id, current cell.

MDRangeSelect, ρ(C)<cx>
  Open: find and store DMAX-R; initiate a stack of chunk selections at different depths; initiate a tree-scan at the first qualifying tree.
  Next: return the current result from the current tree scan; advance to the next result, either on the same tree or by initiating a scan on the next qualifying tree.
  Close: re-evaluate the CP status flags; release local state resources.
  Local state: maximum depth of restrictions, stack of ChunkSelect operators, current ChunkTreeScan operator, input chunk expression, current cell.

MDRangeAggregate, α(C)<cx, aggf>
  Open: initialize the MDRangeSelect; calculate the aggregate value.
  Next: read the calculated aggregate result and turn it into a tuple.
  Close: close the underlying MDRangeSelect operator.
  Local state: input chunk expression, aggregate function, MDRangeSelect operator.

MDRangeGroup, γ(C)<cx, rg, aggf>
  Open: initialize the group generator; initialize an MDRangeAggregate for the first non-empty group; calculate the corresponding group result.
  Next: read the result for the current group; generate the next group; initialize an MDRangeAggregate for the next group; calculate the next group result.
  Close: close the underlying MDRangeAggregate operator.
  Local state: input chunk expression, aggregate function, result granularity, current MDRangeAggregate operator, group generator.

As in the case of its abstract counterpart, the plan in Figure 62 is divided into two major

phases: the h-surrogate phase and the main execution phase. We have discussed the former

in detail in the previous chapter. In particular, we have presented a physical design for the

dimensions that improves the corresponding processing. In the figure, we can see the evaluation of the local predicate on the DATE dimension, consisting of an HPP restriction (see dimension hierarchies in Figure 37), solely on the HPP-Index, without the need to access the

base table. Moreover, the first matching tuple suffices for generating the corresponding mem-

ber code specification according to the implementation rule 1 (§7.3.2). The same holds for

the PRODUCT dimension also, since another HPP restriction is imposed there. For dimen-

sion LOCATION things are a bit different since we have to perform a full table scan directly

on the base table and then select the tuples that match the restriction on the feature attribute

Population, which is functionally dependent on the hierarchical attribute Area, according to

the implementation rule 4 (§7.3.2).

As soon as the member-code specifications are extracted from each dimension, they are com-

bined into a single chunk expression that is passed as input to an MDRangeSelect operator.

As we have discussed, the latter will access the CUBE File with the assistance of a number of

ChunkSelect operators and a series of ChunkTreeScans that will efficiently retrieve the rele-

vant detailed data. Each sales value retrieved will be augmented with two h-surrogates (i.e.,

member-codes), one corresponding to the DATE dimension and the other to LOCATION,

which are dynamically computed from the corresponding data cell chunk-id (which is not

stored along with the measure values but retrieved from the current position CP in the CUBE

File). This provides the impression of “tuples” coming out of the MDRangeSelect operator.

Furthermore, these tuples will need to be joined with the DATE dimension in order to retrieve

the Month values required in the final result. This join is implemented by a physical operator

named IndexResJoin in the figure. Essentially, this is an index-based join that utilizes the

primary organization of the dimension tables to efficiently retrieve the single join tuple from

the dimension side. Recall from §7.3.1 that a dimension table is organized as a B+ tree with

the h-surrogate attribute being the search key. Each tuple coming from the CUBE File side

contains an hsk attribute (i.e., a member-code) corresponding to the DATE dimension. We

use this value as a key for accessing directly the DATE dimension and retrieving the single

tuple that matches. Indeed, since hsk is a primary key of the dimension table, there will be

only a single tuple match. Therefore, the number of tuples in the output of the IndexResJoin

operator is the same as the one in the input. Similarly, for each hsk value corresponding to

dimension LOCATION we access the corresponding B+ tree and retrieve the appropriate area

value for each result tuple. Finally, the grouping and aggregation has to take place. We depict

a hash-group operator that groups the incoming tuples by area and month.

In Figure 63, we show an example of the application of the MDRangeGroup operator for the


same query. As we can see, we have replaced the MDRangeSelect operator with an

MDRangeGroup operator and at the same time we have removed the hash-group operator altogether, since the grouping is now incorporated within the multidimensional range selection from the fact table. Naturally, this results in a significant reduction of the processing entailed for the subsequent residual joins, because the number of input tuples from the fact-table

side is now greatly reduced. Thus, we essentially depict an implementation of the hierarchi-

cal pre-grouping transformation [KTS+02, PER+03] with the MDRangeGroup operator.

Figure 62: A physical execution plan for the running example of abstract processing (§6.4.1).


The input to the MDRangeGroup operator comprises the corresponding chunk expression


from the h-surrogate processing phase and the result granularity expressed in chunking depth

terms. If we assume the following interleaving order: (DATE, CUSTOMER, LOCATION,

PRODUCT) major-to-minor from left to right, then the corresponding result granularity term

in SQNF is: (DATE.month, CUSTOMER.ALL, LOCATION.area, PRODUCT.ALL), which in

chunking depth terms (possible depth values are D = 0,1,2 in this case) translates to: rg = (1,

-1, 1, -1).


Figure 63: Physical execution plan for the running example containing an MDRangeGroup operator.

Note that the tuples coming out of the MDRangeGroup operator contain member code pre-

fixes corresponding to the granularity of each group. This is depicted in Figure 63 e.g., by

hsk(Region.Area) to denote the member-codes of the LOCATION dimension corresponding

to the Area level granularity. Consequently, now the conditions in the residual joins change.

Compare the condition hsk = X.X inside the IndexResJoin nodes in Figure 63, with that in


Figure 62. The condition hsk = X.X denotes that the search key in the corresponding dimen-

sion table is formed of a member-code prefix (for granularity Month and Area respectively)

instead of the whole member-code (corresponding to the grain level). An important character-

istic regarding the IndexResJoin operator is that only the first matching tuple from the dimen-

sion side, for each tuple coming from the fact table side, suffices in order to retrieve the re-

spective dimension attribute value. This occurs even when the join restriction is on an h-surrogate prefix and not on the whole h-surrogate (and thus not on a key of the dimension

table). For example, a “grouped” tuple corresponding to Month-level granularity and coming

out from the fact table, needs only to be joined with a single tuple from the DATE table (the

first matching one) in order to retrieve the e.g., “May99” value, and not with all the 31 rele-

vant tuples for this month. Therefore, similarly with the plan in Figure 62, the number of tu-

ples in the output of each IndexResJoin operator is equal to that in its input. The final results

are retrieved directly from the last residual join node.
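The first-match prefix lookup performed by IndexResJoin can be sketched with a sorted list standing in for the B+ tree on the h-surrogate. The member-code strings, the date_dim table, and the function name below are invented for illustration:

```python
import bisect

# A sorted list stands in for the B+ tree on the DATE dimension, keyed
# by h-surrogate (member-code); values are (hsk, month) pairs.
# Member-codes are illustrative strings with '.'-separated levels.
date_dim = sorted([
    ("1999.5.1",  "May99"),
    ("1999.5.2",  "May99"),
    ("1999.5.31", "May99"),
    ("1999.6.1",  "Jun99"),
])
keys = [hsk for hsk, _ in date_dim]

def index_res_join(hsk_prefix):
    """Return the attribute of the FIRST tuple whose hsk starts with the
    given member-code prefix: one probe into the index, not a full scan."""
    i = bisect.bisect_left(keys, hsk_prefix)
    if i < len(keys) and keys[i].startswith(hsk_prefix):
        return date_dim[i][1]
    return None

# A grouped tuple at Month granularity carries the prefix "1999.5";
# a single matching dimension tuple suffices to recover the Month value,
# instead of joining with all 31 tuples of that month.
month = index_res_join("1999.5")   # -> "May99"
```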


9 Related Work II

In this chapter, we discuss related work on ad hoc star query processing. To the best of

our knowledge, the processing of these queries has been focused only on the evaluation

of the star-join (i.e., the join of the fact table with the dimension tables in a star, or snowflake

schema). This is justified since the latter has been traditionally the major bottleneck in the

processing of such queries. Therefore, there is an extensive literature of methods; and to some extent, the industry has surpassed the research community, since many of the proposed new algorithms first found their way into commercial systems before appearing in the research field. In the first section, we discuss the most prominent star-join methods.

Multidimensional structures that impose hierarchical clustering have changed the processing

framework, not only for the star join but for the evaluation of the whole star query. In chapter

1, we have presented such a framework for any hierarchical clustering fact table organization.

Apart from the CUBE File, another such organization is the UB-tree combined with the so-

called MHC technique. In the second section, we discuss the processing of ad hoc star queries

over a UB-tree organized fact table.


Finally, in the last section, we discuss work on the query optimization of star queries over

hierarchically clustered fact tables. We refer to the hierarchical pre-grouping transformation

and to work towards a cost-based query optimization.

9.1 Star-Join Processing

One of the most important parts of a star query is the processing of the star join [OG95]. Star

join processing has been studied extensively and specific solutions have been implemented in

commercial products. See also [CD97b] for an overview.

The standard query processing algorithm for a star join over n dimensions first evaluates the


predicates on the dimension tables, either on a normalized (snowflake) or a de-normalized

(star) schema, resulting in a set Ri of ni tuples of dimension Di (1 ≤ i ≤ n). It then builds the Cartesian product of the dimension result tuples (R1 × R2 × … × Rn). The cardinality of the Cartesian product is n1 · n2 · … · nn for the n restricted dimensions. With these Cartesian product tuples, we perform a direct index access on the composite index built on the fact table. For non-sparse fact tables and queries that restrict most dimensions of the composite index in the order of the index attributes, the access to the fact tuples is quite fast. The next processing step then joins the resulting fact tuples with the dimension tables in order to allow grouping and aggregating.
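A toy version of this Cartesian-product step (with invented key sets) shows how the number of index probes grows multiplicatively:

```python
from itertools import product

# Restricted dimension results (illustrative key sets):
# n1 = 2, n2 = 3, n3 = 2 qualifying tuples per dimension.
r1 = ["d1a", "d1b"]
r2 = ["d2a", "d2b", "d2c"]
r3 = ["d3a", "d3b"]

# The standard star-join plan probes the composite fact-table index
# with every combination of dimension keys: n1 * n2 * n3 probes.
probes = list(product(r1, r2, r3))
# len(probes) == 2 * 3 * 2 == 12 index probes
```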

However, for large sparse fact tables and high dimensionality, such a query processing plan

does not work efficiently enough. The cardinality of the Cartesian product resulting from the

dimension predicates grows very fast, whereas the number of affected tuples in the fact table

may be relatively small. This is the point where a call is made for specialized indexing or

clustering methods.

Bitmapped join indices [OG95, OQ97] are often used to speed up the access to the fact table.

The bitmaps corresponding to the different dimension values are ANDed or ORed depending

on the selection conditions on them. The single resulting bitmap is used in order to access the

tuples of the fact table. When the query selectivity is high, then only a few bits in the final

bitmap are set. If there is no particular order among the fact table tuples, we can expect each

bit to correspond to a tuple in a different page. Thus, there will be as many I/Os as there are

bits set.
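The bitmap manipulation can be sketched with plain integers used as bit vectors. The bitmaps, the attribute values, and the 12-tuple fact table below are invented for illustration:

```python
# One bitmap per dimension value over a 12-tuple fact table; bit i is
# set when fact tuple i has that value.
bitmap_color_red  = 0b000010010001   # fact tuples with color = red
bitmap_color_blue = 0b010000000100   # fact tuples with color = blue
bitmap_year_1999  = 0b000010010101   # fact tuples with year = 1999

# Condition: (color = red OR color = blue) AND year = 1999
final = (bitmap_color_red | bitmap_color_blue) & bitmap_year_1999

# Bits set in the final bitmap give the qualifying tuple positions;
# without clustering, each of them may cost one separate page I/O.
qualifying = [pos for pos in range(12) if final & (1 << pos)]
```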

Another flavor of bitmap index exploitation for the evaluation of star joins has been incorpo-

rated into a popular commercial system [Ora01]. The so-called star transformation rewrites a

star-join so as the dimension restrictions are expressed as direct restrictions on the fact table

column. For example the following query containing a star-join: SELECT dim2.dim2_attr, dim3.dim3_attr, dim5.dim5_attr, fact.fact1 FROM fact, dim2, dim3, dim5 WHERE fact.dim2_key = dim2.dim2_key /* joins */ AND fact.dim3_key = dim3.dim3_key AND fact.dim5_key = dim5.dim5_key AND dim2.dim2_attr IN (’c’,’d’) /* dimension restrictions */ AND dim3.dim3_attr IN (’e’,’f’) AND dim5.dim5_attr IN (’l’,’m’)

is rewritten in the following form:


SELECT …
FROM fact
WHERE fact.dim2_key IN (SELECT dim2.dim2_key FROM dim2
                        WHERE dim2.dim2_attr IN ('c','d'))
  AND fact.dim3_key IN (SELECT dim3.dim3_key FROM dim3
                        WHERE dim3.dim3_attr IN ('e','f'))
  AND fact.dim5_key IN (SELECT dim5.dim5_key FROM dim5
                        WHERE dim5.dim5_attr IN ('l','m'))

In this way, the evaluation of the individual dimension restrictions takes place in the begin-

ning, as if these were separate queries. From this evaluation only the dimension keys of the

qualifying tuples are extracted; and for large dimensions, the results are saved into temporary

tables. In the mean time, a separate bitmap index has been created on each fact table attribute

that is a foreign key referencing a dimension table key. Then, the extracted list of qualifying

dimension keys, for each dimension, is used to access the bitmap index on the corresponding

fact table column. The created bitmaps are merged (i.e., ANDed) and a final bitmap, indicat-

ing the qualifying fact table tuples is produced. Next, bits set on the final bitmap, are con-

verted to the corresponding row ids and the fact table tuples are retrieved. Finally, these tu-

ples have to be joined to the dimension tables in order to retrieve the dimension attribute val-

ues required in the final result.

The main advantage of this method is that the bitmap operations can be executed very effi-

ciently. However, as mentioned before, the lack of appropriate data clustering, might lead to a

significant number of I/Os. Moreover, bitmap indexes become inefficient if the number of

distinct values for a column is large [OQ97]. In this case the “bitmap density” (i.e., the num-

ber of bits set per bitmap) becomes low and the storage overhead is significantly increased.

Therefore, compression techniques have to be used, which in turn reduce the efficiency of the bitmap operations.

9.2 UB-tree-based Ad Hoc Star Query Processing

The UB-tree data structure [Bay97] along with the Multidimensional Hierarchical Clustering

technique [MRB99] can be used as an alternative physical base for evaluating ad hoc star

queries over hierarchically clustered fact tables. The abstract processing framework discussed

in chapter 1 can be fully applied to this case also. The h-surrogates are constructed by con-

catenation of the binary representations of individual hierarchical attribute values (assuming

that a hierarchy-preserving ordering of these values has taken place and integer values (called

surrogates) have been assigned to each one) and are called compound surrogates (cs). For

any data point in the multidimensional space, the cs values that specify it are used to compute a single Z-value [OM84]. This is simply done by bit interleaving


the cs binary representations. These Z-values are stored along with the corresponding meas-

ure values at the leaves of the UB-tree, which is used for storing the fact table. The dimension

restrictions appearing in a star query conforming to the template of Figure 34 are translated into a set of cs intervals that form a set of query boxes to be evaluated in the multidimensional

space of the cube.
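The bit interleaving that maps per-dimension compound surrogates to a Z-value can be sketched as follows (a minimal Python illustration assuming fixed-length, `bits`-bit surrogates; not taken from the cited UB-tree implementation):

```python
def z_value(surrogates, bits):
    """Interleave the `bits`-bit binary representations of the per-dimension
    compound surrogates into one Z-value, most significant bits first."""
    z = 0
    for b in range(bits - 1, -1, -1):      # walk from MSB to LSB
        for cs in surrogates:              # one bit from each dimension in turn
            z = (z << 1) | ((cs >> b) & 1)
    return z
```

Since the high-order bits of a compound surrogate identify the upper hierarchy levels, points that agree on those levels share a Z-value prefix and therefore end up close to each other on the Z-curve.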

Due to the mapping of the multidimensional space to Z-values, a query box in the multidi-

mensional space partitions into a set of intervals on the Z-curve, called Z-intervals. Figure 64

shows such a decomposition for a query box (picture taken from [Ram02]). Consecutive data

points along the Z-curve are stored in Z-regions. Each Z-region is mapped to exactly one disk

page, i.e., a leaf in the UB-tree. Due to the compound surrogate encoding, tuples within the

same region are close in the dimension hierarchies; thus, hierarchical clustering is achieved.

The range query algorithm on the UB-tree tries to retrieve the Z-regions that intersect a query box. However, many different Z-intervals, as well as data points outside the query box, might reside in a Z-region. Therefore, post-filtering after the retrieval of the Z-regions is required.
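The post-filtering step admits a direct sketch (illustrative Python, with query boxes represented as inclusive per-dimension intervals; names are hypothetical):

```python
def in_query_box(point, box):
    """point: tuple of per-dimension cs values; box: one (low, high)
    inclusive interval per dimension."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(point, box))

def post_filter(region_points, box):
    """Drop the points of a fetched Z-region that fall outside the query box."""
    return [p for p in region_points if in_query_box(p, box)]
```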

The UB-Tree provides an efficient algorithm for executing queries with a single multi-

dimensional range selection. However, the local restrictions on the dimensions may lead to

multiple cs-intervals per dimension. In this case, the query boxes to be evaluated on the UB-Tree result from the Cartesian product of these cs-intervals. It turns out that for

queries with many disjoint intervals for one or more dimensions (assume n_i intervals for dimension D_i), such as the queries with restrictions on the feature attributes (§6.1), the number of overall query boxes is very large, i.e., n_1·n_2·…·n_d for d dimensions. As a result, a very large set of query boxes has to be processed by the UB-Tree. An attempt to sequentially evaluate each query box may result in multiple accesses to the same pages, leading to significantly higher I/O cost compared to the optimal processing of the query boxes.
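The blow-up in the number of query boxes is simply the Cartesian product of the per-dimension interval lists (an illustrative Python sketch, not from the cited work):

```python
from itertools import product

def query_boxes(intervals_per_dim):
    """Each dimension contributes a list of disjoint cs-intervals (lo, hi);
    the boxes to evaluate are all combinations: n_1 * n_2 * ... * n_d boxes."""
    return list(product(*intervals_per_dim))
```

Two intervals on one dimension and one on another already yield 2·1 = 2 boxes; with tens of intervals on several dimensions the list quickly becomes too large to materialize.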

In order to prevent the multiple accesses to the same pages, a proposal for processing the list

of all query boxes simultaneously was made in [FMB99]. This extension of the range query

algorithm for UB-Trees guarantees that each page covered by any query box is only accessed

once. However, the algorithm requires all query boxes to be “materialized”, causing significant overhead for handling a very large set of query boxes. Consequently, this approach

works well if the list of query boxes fits in main memory. If not, the I/Os necessary to main-

tain the query box list as required by the algorithm may outweigh the I/O saved by the algo-


rithm in comparison to sequential processing.

Figure 64: A query box in the multidimensional space translates to a set of Z-intervals in the one-dimensional Z-value space [Ram02].

On the other hand, the chunk expressions in the CUBE File, which we have discussed in the previous chapters, provide a compact representation of multiple multidimensional range restric-

tions. The proposed algorithm for the chunk expression evaluation (§8.2.3) enables simulta-

neous evaluation of multiple query boxes with no extra overhead. Moreover, the exploitation

of the current drilling path, as well as the hierarchical chunk to bucket allocation ensures that

each bucket is fetched into memory only once.

9.3 Ad Hoc Star Query Optimization

The abstract processing plan for the evaluation of ad hoc star queries over hierarchically clus-

tered fact tables, discussed in chapter 1 sets the scene for new optimization challenges. Sev-

eral aspects of processing and optimizing star queries on hierarchically clustered fact tables

are presented in [TT01]. The paper considers a star schema with UB-Tree organized fact ta-

bles and dimension tables stored sorted on a composite surrogate key. For a particular class of

star join queries, the authors investigate the usage of sort-merge joins and a set of other heu-

ristic optimizations.

An important step in the abstract processing plan can prove to be a significant bottleneck: the residual join of results from the fact table with the dimension tables, in combination with


grouping and aggregation. This phase typically consumes between 50% and 80% of the over-

all processing time. In typical star queries, early-grouping methods ([CS94, YL94, LMS94,

YL95, GHQ95]) only have a limited effect as the grouping is usually specified on the hierar-

chy levels of the dimension tables and not on the fact table itself. In [KTS+02, PER+03], a

combination of hierarchical clustering and early-grouping is proposed, called the hierarchical

pre-grouping transformation. Exploiting hierarchy semantics for the pre-grouping of fact ta-

ble result-tuples is several times faster than conventional query processing. The hierarchical

pre-grouping transformation, allows the grouping of fact table tuples before all join opera-

tions leading to a significant reduction of both the join and grouping effort. Furthermore, in

particular cases the transformation can remove completely one or more join operations. With

the proposed method, even queries covering a large part of the fact table can be executed

within a time span acceptable for interactive query processing.
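The idea behind pre-grouping can be illustrated on hierarchically encoded surrogates: since an ancestor at the target grouping level is identified by a prefix of the surrogate, fact tuples can be grouped before the residual join, which then touches only a handful of grouped rows (an illustrative Python sketch under this prefix-encoding assumption, not the actual transformation of the cited papers; all names are hypothetical):

```python
from collections import defaultdict

def hierarchical_pre_group(fact_rows, prefix_len):
    """Group fact tuples on a prefix of the hierarchically encoded surrogate
    *before* any join: the prefix identifies the ancestor at the target level."""
    groups = defaultdict(float)
    for h_surrogate, measure in fact_rows:
        groups[h_surrogate[:prefix_len]] += measure
    return dict(groups)

def residual_join(groups, dim_lookup):
    """The residual join with the dimension table now sees one row per group."""
    return {dim_lookup[key]: total for key, total in groups.items()}
```

For instance, grouping sales at the country level of a Region > Country > City hierarchy aggregates all city-level fact tuples sharing the country prefix before any dimension table is touched.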

In [TKS+02] an attempt towards cost-based optimization has been made with the definition

of a cost model pertaining to the specialized processing discussed so far. In particular, the

authors argue that there is no rule-based method to decide if the application of the hierarchi-

cal pre-grouping transformation is beneficial for a particular query and database instance.

This holds even though in some circumstances the benefits are obvious. They argue that a cost-based method can be used to optimally apply the hierarchical pre-grouping transforma-

tion. In order to define such a method they provide the means to estimate the cost of the vari-

ous operators affected by hierarchical pre-grouping. Since the cost of an operator depends on

the algorithm used to implement it, they also select the set of algorithms to be used and dis-

cuss specific variations of them.


PART III: ERATOSTHENES Implementation

ERATOSTHENES is a database management system, developed at the National Technical University of Athens (NTUA), that is primarily aimed at providing OLAP functionality. Essentially, the work re-

ported in this thesis has constituted the “kick-off” material for the

implementation of this system. In ERATOSTHENES, we have al-

ready implemented (or we are in the process of implementing) most

of the ideas described in the previous two parts of this thesis. The

CUBE File multidimensional access method is the cornerstone of the

system and lies at the heart of its kernel code.

As of this writing, the storage manager of ERATOSTHENES is largely complete, and a prototype of the processing engine has been running on a CUBE File simulation for more than a year, though its original design has been revised extensively. Although the

storage manager and the execution engine are probably the most

fundamental parts of a DBMS, ERATOSTHENES is still an infant

with regard to having completed all the development and testing

necessary for making it publicly available. Since this is real software

and not just “paperware”, its completion is a strenuous and demanding goal that we plan to pursue persistently in the near future.

In this third part of the thesis we discuss the various implementation

issues in the development of ERATOSTHENES. We begin by de-

scribing the scope and objectives of the ERATOSTHENES project.

Then we discuss the overall architecture and present all the system’s

components. We continue with the implementation of the system’s


storage manager SISYPHUS and with the description of the incorpo-

ration of the CUBE File organization into SISYPHUS. Then, we pro-

ceed with the discussion of the implementation of the processing

engine. We conclude by reviewing related work from projects similar to ERATOSTHENES.


10 ERATOSTHENES: Building a Novel OLAP System

ERATOSTHENES is a database management system, developed at the Knowledge and Database Systems laboratory [KDB03] of the National Technical University of Athens, that is primarily aimed at providing OLAP functionality. In this chapter, we discuss the ob-

[KDB03] of the National Technical University of Athens. In this chapter, we discuss the ob-

jectives and scope of the ERATOSTHENES project. Then, we present the overall architecture

of the server component (see also [KVT+01] and [KTV+03]).

With regard to the work pertaining to this thesis, the author is the primary developer of the

current version of the server component (see also Figure 65 in §10.2), while the early execu-

tion engine prototype has been developed in the context of a diploma thesis supervised by the

author [Sam01].

10.1 Scope and Objectives

The overall objective for launching the ERATOSTHENES project has been the development

of a novel OLAP system that will serve as a research workbench and will incorporate our re-

search results in the field of OLAP. Nevertheless, apart from the evaluation of our research

ideas in a real system, our ultimate goal has been to satisfy the most essential need of OLAP

users, i.e., to have a system that can truly respond “on-line” to ad hoc queries, thus supporting ad hoc analysis.

To this end, we have decided to primarily focus on the efficient processing of ad hoc star

queries, since we believe that this type of query corresponds to the most common ad hoc

OLAP query and also comprises the most fundamental element for building more complex

OLAP queries and expressions. For this reason, we have chosen to direct our efforts to novel


physical data organizations, departing from the relational approach. In particular, regarding

the data of the cube, we have focused on pure multidimensional data structures, and especially ones that provide hierarchical clustering of the data. Hence, we have decided to imple-

ment the CUBE File as the primary organization for cube-storage in ERATOSTHENES.

Figure 65: Architecture of ERATOSTHENES.

[The figure shows the three-layer architecture of the ERATOSTHENES OLAP server: at the bottom, the SHORE Storage Manager functionality (disk space, file, B-tree, buffer, lock, transaction and thread management); in the middle, the SISYPHUS storage manager functionality (System, Catalog, File, Buffer and Access Managers); and on top, the query-processing components (Communication, Query, Plan Selection, Optimization, Cost, Statistics and Execution Managers). The components implemented within this thesis are marked.]


In addition, having realized that the processing of star queries changes radically in the context

of these new physical organizations, we have turned our efforts, within the context of the

processing engine development, towards the implementation of efficient processing algo-

rithms for ad hoc star queries over hierarchically organized data. Moreover, we aim at devel-

oping specialized optimization techniques that exploit the characteristics of hierarchical-

clustering storage structures (such as the CUBE File or the UB-tree/MHC [MRB99]), in order to further increase the processing speedup.

Finally, on the user side, we aim at developing a practical, yet rich and intuitive conceptual model (see [TKS01] for more details) for helping the user design better OLAP

databases on ERATOSTHENES, or on other systems in general.

10.2 The Architecture of ERATOSTHENES

ERATOSTHENES can be used in a client-server architecture, where the client interacts with

the user and submits star queries to the server for evaluation. However, the client-side could

be split into two (thus yielding an overall 3-tier architecture), where the middle tier receives

OLAP queries consisting of complex multidimensional expressions (generated by some

graphical interface at the top tier), which are decomposed into a series of ad hoc star queries to be

evaluated at the server side.

In Figure 65, we depict the architecture of the ERATOSTHENES server component, which can be divided into three layers. At the bottom lies the SHORE Storage Manager (SSM), a li-

brary for building object repository servers developed at the University of Wisconsin-

Madison [SSM99]. SSM provides the functionality of an untyped-record oriented storage

manager. Valuable services, such as disk-page management, the management of a buffer pool (in which pages are fetched from permanent storage and “pinned” into some page slot in main memory), concurrency control with different kinds of locks offered at several granularities, and even recovery management done by a transaction manager, are all provided by the SSM interface. Moreover, a threads implementation is also offered, and we

have exploited it in order to add multi-user support into ERATOSTHENES (see also §11.3).

On top of SSM lies the SISYPHUS storage manager [KS01, KS03]. This is a chunk-oriented

storage manager essentially implementing the CUBE File organization. It makes “cubes” and

“dimensions” first-class citizens in ERATOSTHENES and provides typical storage manage-

ment facilities geared towards the specific needs of OLAP. For example, the Buffer Manager module, although built on the facilities of the page-based, LRU-like SSM buffer manager, implements a completely different replacement strategy, enabling caching of buckets (the I/O unit in the CUBE File) based on the dimension hierarchies (see §13.2.2).

In the top layer lies the execution engine represented by the Execution Manager, which im-

plements all physical operators (i.e., processing algorithms) and coordinates the evaluation of

query plans represented by operator-trees. These plans realize our processing framework for

ad hoc star queries over hierarchically clustered data [KTS+02]. The Optimization Manager implements all our optimization techniques regarding execution plan transformations (e.g., see

[TT01, PER+03]). These transformations are triggered by heuristic rules and/or cost-based

decisions. For this reason, we plan to incorporate into our optimizer the implementation of a

cost model [TKS+02] pertaining to star-query processing, as well as novel statistics manage-

ment techniques specialized for multidimensional data with hierarchies (see also §15.2.1).

Finally, it is important to note that special attention has been given to the design of this system in order to enable easy incorporation of alternative methods and algorithms in the whole

spectrum of data management. ERATOSTHENES is implemented in C++ [Str97] and is run-

ning on a Linux platform [LNX02]. In terms of development completion as of this writing,

the SISYPHUS component is almost finished and is being tested, while the processing

engine development has commenced based on an early prototype. The current implementa-

tion of the server component has been done by the author in the context of this thesis work

and is depicted enclosed in an ellipse in Figure 65.


11 The SISYPHUS Storage Manager

In this chapter, we discuss the core architecture and design of SISYPHUS, the storage manager of the ERATOSTHENES OLAP system. Much of the material reported in this

chapter appears in [KS01] and [KS03]. On-Line Analytical Processing (OLAP) poses new

requirements to the physical storage layer of a database management system. Special charac-

teristics of OLAP cubes such as multidimensionality, hierarchical structure of dimensions,

data sparseness, etc., are difficult to handle with ordinary record-oriented storage managers.

The SISYPHUS storage manager is based on the CUBE File organization in order to enable

the hierarchical clustering of cube data with a very low storage cost and thus provide an effi-

cient physical base for performing OLAP operations.

SISYPHUS has been developed with ANSI C++ [Str97] and is running on a Linux platform

(kernel 2.4.x) [LNX02]. It is a large program extending to more than 20,000 lines of code

(including comments), distributed over 21 source-code files (.cpp) and 24 header files (.h). For the development, the GNU [GNU03] gcc compiler (2.95.x) and the make utility have been used. SISYPHUS has been built on top of the SHORE Storage Manager (interim re-

lease 2), a library for building object repository servers developed at the University of Wis-

consin-Madison [SSM99]. Finally, extensive use has been made of the C++ standard tem-

plate library STL [STL94] throughout the SISYPHUS code.

We begin our discussion with the motivation for developing an OLAP storage manager. In

particular, we discuss requirements for OLAP relative to storage management and why con-

ventional relational storage managers do not fulfill them. Then, we present the core architec-

ture of SISYPHUS. We describe the various levels of abstractions provided by its design and

discuss modules and their corresponding functionality. Finally, we discuss some other aspects

of the system design such as the multi-user support and adaptability to different storage con-


figurations.

11.1 OLAP Requirements Relative to Storage Management

A typical RDBMS storage manager offers the storage structures, the operations, and in one

word the framework, in order to implement a tuple (or record) oriented file system on top of

an operating system’s file system or storage device interface. Valuable services, such as the management of a buffer pool, in which pages are fetched from permanent storage and “pinned” into some page slot in main memory, concurrency control with different kinds of locks offered at several granularities, and even recovery management done by a log manager, can all gracefully be included in a storage manager system. As an example, the re-

cord-oriented SHORE storage manager [SSM99] offers all of this functionality.

However, in the context of OLAP some of these services have a “restricted usefulness”, while

some other characteristics that are really needed are not supported by a record-oriented stor-

age manager. For example, it is known that in OLAP there are no transaction-oriented work-

loads with frequent updates to the database. Most of the loads are read-only. Moreover, que-

ries in OLAP are much more demanding than in OLTP systems and thus pose an imperative

need for small response times, which in storage management terms translates to a need for

efficient access to the stored data. In addition, concurrent access to the data is not as impor-

tant in OLAP as it is in OLTP. This is due to the read-oriented profile of OLAP workloads

and the different end-user target groups between the two.

Additionally, OLAP data are natively multidimensional. This means that the underlying stor-

age structures should provide efficient access to the data, when the latter are addressed by

dimension content. Unfortunately, record-oriented storage managers are natively one-

dimensional and cannot adapt well to this requirement. Moreover, the intuitive view of the

cube as a multidimensional grid, with facts playing the role of data points within this grid, points out the need for addressing data by location (in the multidimensional data space) through address-based access mechanisms, rather than by content, which entails the search-based access mechanisms of ordinary storage managers.

Finally, dimensions in OLAP contain hierarchies. We have stressed in all the previous chap-

ters of this thesis the importance of HPP restrictions (see Definition 1) in typical OLAP

workloads. Ordinary storage managers provide no particular support for hierarchies. Moreover, the

need for smaller response times makes the issue of good physical clustering of the data a cen-

tral point in storage management. Sometimes this might cause inflexibility in updating. How-


ever, considering the profile of typical OLAP workloads this is acceptable. Combining the

two, we advocate that the data clustering offered by an OLAP storage manager must be per-

formed with respect to the dimension hierarchies.

As a last point, we should not forget that OLAP cubes are usually very sparse. [Col96] argues

that only 20% of the cube contains real data, but our experience from real-world applications indicates that cube densities of much less than 0.1% are more than typical. Therefore, it is imperative for the storage manager to cope with this sparseness and achieve good space utilization.

Figure 66: The abstraction levels in SISYPHUS storage manager.

11.2 The SISYPHUS Architecture


Modular design and information hiding are software engineering principles that apply to the

implementation of any system, and SISYPHUS could not escape this rule. The core architecture of SISYPHUS has been based on a hierarchy of abstraction levels, each of which implements a specific layer of functionality. Each level plays its own role in storage management by hiding details of the levels below from the levels above. Figure 66 depicts the abstraction

levels implemented in SISYPHUS. This hierarchy of levels has to stand upon the correspond-

ing abstraction levels provided by the record-oriented SHORE storage manager (SSM)

[SSM99]. The aim of this section is not to cover in full detail all the operations involved in


the interface between different levels. Our primary goal is to present the basic modules of the

SISYPHUS architecture and describe the most important functionality offered by merely out-

lining the provided operations. We will start our description of Figure 66 in a bottom up ap-

proach.

11.2.1 SHORE Storage Manager

The SSM provides a hierarchy of storage structures. A device corresponds to a disk partition

(raw device) or an operating system file used for storing data. A device contains volumes. A

volume is a collection of files and indexes managed as a unit. A file is a collection of records.

A record is an un-typed container of bytes consisting basically of a header and a body. The

body of a record is the primary data storage location and can range in size from zero bytes to

4GB. The calls to the operations provided by SSM are hidden inside methods of other “man-

ager” modules described next.

We characterize the public interface of this module as a record-oriented one. The provided

functions cover operations such as the creation/destruction of files, as well as the inser-

tion/update/deletion of individual records.

11.2.2 File Manager

The SISYPHUS file manager’s primary task is to hide all the record-related SSM details. The

higher levels don’t have to know anything about devices, disk volumes, SSM files, SSM re-

cords, etc. The abstraction provided by this module is that the basic file system consists of a

collection of cubes, where each cube is a collection of buckets.

Each cube is stored in a single SSM file. We use an SSM record to implement a bucket. In

our case however, a bucket is of fixed size. Buckets play the role of the I/O transfer unit in

this file system and are equivalent to disk pages. However, their size might exceed a single

page (these are SSM pages and not operating system pages, although the size of these two

might be equal). A bucket is identified within a cube by its bucket-id, which encapsulates

its record counterpart. The file manager communicates with the SSM level with record access

operations provided by SSM. Typical operations offered by this module include the creation

and destruction of a cube, also that of a bucket, a read operation for fetching a specific bucket

into main memory, a set of operations for updating a bucket and finally an operation for iter-

ating through all the buckets of a cube.
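The file-manager interface just outlined might look roughly as follows (a hypothetical Python sketch, with an in-memory dict standing in for the SSM file of records; none of these names are the actual SISYPHUS API):

```python
class FileManager:
    """Sketch: a cube is a collection of fixed-size buckets (the I/O unit)."""
    def __init__(self, bucket_size=8192):
        self.bucket_size = bucket_size
        self.cubes = {}                      # cube name -> {bucket_id: bytes}

    def create_cube(self, name):
        self.cubes[name] = {}

    def drop_cube(self, name):
        del self.cubes[name]

    def create_bucket(self, cube, bucket_id, payload=b""):
        assert len(payload) <= self.bucket_size   # buckets are fixed-size
        self.cubes[cube][bucket_id] = payload.ljust(self.bucket_size, b"\0")

    def read_bucket(self, cube, bucket_id):
        return self.cubes[cube][bucket_id]

    def iter_buckets(self, cube):
        yield from self.cubes[cube].items()
```

Fixing the bucket size at creation time mirrors the fact that buckets, unlike general SSM records, act as page-like I/O transfer units.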

11.2.3 Buffer Manager

The next level of abstraction is the buffer manager. This level’s basic concern is to hide all


the bucket file system specific details and give the impression of a virtual memory space of

buckets, as if the whole database were in main memory. It is a client of the file manager, in the

sense that buckets have to be read from a cube into a memory area in the buffer pool. This

module is built on top of the SSM (page-oriented) buffer manager and exploits its functional-

ity, while hiding it from the other modules. The underlying SSM buffer manager implements

an LRU-like page-replacement policy (in §13.2.2 we present a bucket-replacement policy

based on the dimension hierarchies) and also the collaboration with the transaction manager,

for logging of transactions and recovery precautions. Typical operations offered include the

pinning and unpinning of buckets into the buffer pool, also operations for accessing the con-

tents of a bucket, e.g., the bucket header, or a specific chunk, and a set of operations for up-

dating a pinned bucket.

The interface provided by the buffer manager to the next higher level is viewing a bucket as

an array of chunks. Therefore, appropriate chunk-access operations are used in this interface

that enable a chunk-oriented access to the underlying data.
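The pin/unpin semantics can be sketched as follows (a hypothetical Python illustration; a trivial evict-first-unpinned policy stands in for both the LRU-like SSM policy and the hierarchy-based replacement of §13.2.2):

```python
class BufferManager:
    """Sketch: a pinned bucket stays resident; only unpinned buckets
    are eligible for replacement when the pool is full."""
    def __init__(self, fetch, capacity=4):
        self.fetch = fetch                   # callable: bucket_id -> bucket data
        self.capacity = capacity
        self.pool = {}                       # bucket_id -> [pin_count, data]

    def pin(self, bucket_id):
        if bucket_id not in self.pool:
            if len(self.pool) >= self.capacity:
                self._evict()
            self.pool[bucket_id] = [0, self.fetch(bucket_id)]
        self.pool[bucket_id][0] += 1
        return self.pool[bucket_id][1]

    def unpin(self, bucket_id):
        self.pool[bucket_id][0] -= 1

    def _evict(self):
        for bid, (pins, _) in list(self.pool.items()):
            if pins == 0:                    # never evict a pinned bucket
                del self.pool[bid]
                return
        raise RuntimeError("buffer pool full: all buckets pinned")
```

A hierarchy-aware policy would differ only in `_evict`, choosing the victim according to the dimension hierarchies rather than in arbitrary order.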

11.2.4 Access Manager

The access manager implements the basic interface to the “user” of SISYPHUS (e.g., this

could be an OLAP execution engine module). The most important responsibility of the access

manager is to provide access to the cube data in terms of the primary data navigation opera-

tions of the CUBE File (§4.1). We have seen that these operations enable a seamless naviga-

tion in the multi-dimensional and multi-level space of a hierarchically chunked cube. There-

fore, this module provides a cell-oriented access interface to the “outside world”, and each

cell is uniquely identified by its chunk-id (§2.3). Moreover, it realizes the notion of the current position in the CUBE File (Definition 13), as well as the current drilling path construct

(Definition 14). All this functionality is wrapped up in a CubeAccess class created with a call

to an Access Manager function. As we have discussed in detail in the second part of this the-

sis, this functionality is used in order to define processing algorithms for typical OLAP

operations.

Another responsibility of the Access Manager is the management of the chunk-oriented file

system (i.e., the CUBE File). It provides operations for the initial loading and construction of

a CUBE File that implement the chunk-to-bucket allocation algorithm discussed in §3.2. Fur-

thermore, it provides an interface to the basic CUBE File maintenance operations (§4.2).

Playing the role of the central access point to the data stored and managed by SISYPHUS, the


access manager also provides an access interface to other structures not necessarily hosting cube data. For example, it provides an interface for the management of dimension data that

corresponds to the physical design and processing discussed in chapter 1. Finally, it provides

an access interface to other storage facilities such as B+ trees and files of records.

Figure 67: The basic threads of control when N clients are connected to a server using SISYPHUS.

11.2.5 Catalog Manager

The catalog manager (this module is not depicted in Figure 66) implements the basic database catalog services in SISYPHUS. On instantiation it creates the catalog structure, which


records the objects inside an SSM volume. Essentially this structure consists of a few SSM

files storing meta-data records accompanied with appropriate B+ tree indexes for fast access

to these meta-data. It typically provides operations for registering and removing objects from

the SISYPHUS catalog.

Actually there are two catalogs maintained by the catalog manager: (a) the catalog for multidimensional data and (b) the catalog for relational data. In the current version, the former re-

cords multidimensional entities, i.e., cubes and dimensions, while the latter stores meta-data

on B+ trees and simple files of records. Our intention for the catalog for relational data is to

enable the storage management of relational structures within SISYPHUS and thus to avoid

external storage.

11.2.6 System Manager

Finally, there is the system manager (also not depicted in Figure 66), which is responsible for

the overall system management. On start up, it mounts the appropriate device of volumes and

if requested it formats a device, as well as initializes the volume directory (a B+ tree re-

cording the contents of an SSM volume). Also, in the case of a previous crash it performs re-

covery based on a redo log. All these are implemented by appropriate calls to the underlying

SSM module. In fact the system manager encapsulates an SSM instance in order to enable the

access to the SSM functionality.

11.3 Multi-User Support

SISYPHUS is actually a library and not a stand-alone server. It is intended to be incorporated

within an OLAP server (i.e., code that uses the library). Therefore, it is the server’s responsi-

bility to interact with multiple clients. For the server to accomplish this without blocking the

entire process there are actually two possible solutions: it will be either a multi-process

server, or a multi-threaded server. Long experience in DBMS system implementation re-

ported in the literature (e.g., [Ses98, SRH90] to mention a few) shows that the latter solution

is more flexible, faster, imposes fewer overheads in communication and synchronization and

allows more information to be shared.

SISYPHUS provides the facilities to implement a multi-threaded server capable of managing

multiple transactions. Actually, this is due to the underlying SSM, which comes with its own

threads package implementation and transaction facilities. Although it would theoretically be possible to use a different threads package, in reality it is important that the threads package used in a database system be compatible with low-level operations like buffer management,

concurrency control, etc. Consequently, a server implementation is bound to use the SSM

threads.

In Figure 67, we depict the basic threads of control in a typical scenario, where N clients are

connected to a server using SISYPHUS. The server starts up by processing several SISYPHUS

system-related configuration options and then it forks off a thread that will instantiate various

manager classes, such as the system manager, the file manager, the catalog manager and the

buffer manager. Inside the system manager an SSM instance is hidden (“encapsulated” in the

object-oriented paradigm). At this time, recovery is performed in case of a previous system

crash. Then a thread is forked for accepting commands from the standard input (i.e., server

console). For serving client connections, a listener thread is forked that listens for connec-

tions on a well-known socket port. Each time a client wants to connect, a separate client thread is forked to serve it exclusively. For each client a new access manager instance is

created providing access to a collection of cubes (equivalent to a database instance in DBMS

terms). Through the access manager interface, each client can access all SISYPHUS’ func-

tionality by submitting commands over the socket in the form of ASCII text. Of course, any

other client-server communication protocol might be implemented by the server, and even 3-

tier architectures might be used; this is just an example to illustrate how SISYPHUS can be

used by a potential server.

Actually, in order to test SISYPHUS, we have implemented a simple server with a parser that

accepts (from the server console in the current version) SISYPHUS commands. A typical sample of commands is the following:

create_cube <name> (create an empty cube object in the SISYPHUS catalog)

load_cube <name> <dim_schema> <data> <config> (initial loading of an empty cube with <data>; the dimensions' schemata and a configuration file are also provided)

drop_cube <name> (delete a cube from the SISYPHUS database and reclaim allocated space)

As can be seen from the figure, manager instances are shared by all threads (except from the

access manager). This code is thread-safe because these managers are stateless objects. They

simply provide a set of functions to be invoked (in C++ terms these are called static member

functions [Str97]) and store no information whatsoever. However, if a need for sharing ob-

jects with state arises, then SSM provides all the thread synchronization mechanisms for do-

ing it thread-safely.

Transaction semantics and concurrency control are provided by the underlying SSM modules.


SSM uses a standard hierarchical two-phase locking protocol [GR93]. By exploiting the

mapping of SISYPHUS storage structures to SSM storage structures, SISYPHUS can provide

locking at the cube and bucket level. However, we have to implement locking at the chunk

and cell level from scratch, as it cannot be supported by SSM. Bearing in mind, however, the read-mostly environment in which an OLAP storage manager operates, such a locking granularity might be unnecessary. Normally, all SISYPHUS operations that access or modify

persistent data structures acquire locks and are protected within transactions. For example, the

initial loading of a cube is done in a single transaction to ensure its execution with the ACID

properties. Typically, shared locks and exclusive locks can be provided when pinning buckets

in the buffer pool.

11.4 Alternative Storage Configuration Options

SISYPHUS was designed and built primarily as a research prototype. For this reason, one of

the very first requirements was to design the system so that it can accommodate a variety of

storage alternatives. This can be applied in two levels: (a) to the overall storage organization

and (b) within a specific storage organization.

For the first case, special care has been taken in the design of the SISYPHUS catalog in order

to support other storage organizations, apart from the CUBE File. The new storage organiza-

tion can be gracefully incorporated into SISYPHUS by simply adding appropriate routines

reflecting the new method in each manager class. The whole implementation framework will

remain intact. Besides, all potential storage schemes will need the abstraction level hierarchy

of Figure 66. Only the interface between the levels will have to change.

For the second case, the modules implementing the chunk-oriented file system have been de-

signed so that they can exploit alternative ways for accomplishing specific tasks. For exam-

ple, the algorithm for forming bucket-regions (see §3.2.1) can be changed by the user. All

that is needed is for the user to provide a new function-class [Str97] implementing the new

algorithm and then re-compile SISYPHUS. The same also holds for methods resolving the

storage of large data chunks (§3.2.2), storage of the root directory (§3.2.3), and for traversing

a chunk-tree, in order to place its nodes in a bucket. All these can be seamlessly injected in

the system with a minimum of interference with the existing code. Thus, variant flavors of

the storage organization can be used according to specific needs. At the command line (see the previous example of commands), a configuration file is provided that sets the desired options for

storing a specific cube. These options, after the initial loading, are stored along with other


information, as meta-data for each cube. This also means that we can have different cubes stored in different ways in the same cube collection (i.e., database instance).

12 Implementation of the CUBE File

In this chapter, we present the implementation of the chunk-oriented file system provided by SISYPHUS (i.e., the CUBE File) and discuss specific design issues and implementation solutions. We begin our discussion by posing the objectives of such a file system. We

continue with the design decisions in order to tackle the cube sparseness problem. The inter-

nal organization of buckets is described next. Then, we describe physical chunk-ids and con-

trast them to the logical chunk-ids defined in the first part of this thesis. Next we present the

internal organization of a chunk and explain the rationale for our design decision to imple-

ment a chunk as a multidimensional array. Finally, we describe the implementation of the

CUBE File initial bulk-loading and construction.

12.1 Objectives of a Chunk-Based File System

The file management layer in Figure 66, as we have seen in the previous chapter, implements

a bucket oriented file system. In this file system, a cube is a collection of fixed size buckets.

This basic file system is used as the foundation for implementing a chunk-oriented file sys-

tem, where chunks are formed according to the CUBE File organization. A chunk-oriented

file system destined for a storage base for OLAP cubes, has to provide the following services:

Storage allocation: It has to store chunks into the buckets provided by the underlying

bucket-oriented file system.

Chunk addressing: A single chunk must be addressable from other modules. This means

that an identifier must be assigned to each chunk. Moreover, an efficient access path must

exist via that identifier.


Enumeration: There must be a fast way to get from one chunk to the next one. However,

in a multidimensional multi-level space, “next” can have many interpretations.

Data point location addressing: Cube data points should be made accessible via their lo-

cation in the multidimensional multi-level space.

Data sparseness management: Space allocation should not be wasteful and must handle the native sparseness of cube data efficiently.

Maintenance: Although transaction-oriented workloads are not expected in OLAP environments, and therefore frequent data updates are not anticipated, the system must be able to support at least periodic incremental loads in a batch form.

We briefly reason why the CUBE File fulfils all of the above requirements. In the first part of

this thesis we have discussed how chunks in the CUBE File are uniquely identified by their

chunk-id (see §2.3). Moreover, in the second part, we have seen how efficient access opera-

tions can be defined through this identifier. For enumerating the chunks of a cube, we exploit

the CUBE File primary data navigation operations and the construct of the current position in

the CUBE File (§4.1), which enables an enumeration of many possible orderings. In addition,

we have seen how the chunk-id assigned to each data point represents on the one hand the

location of the data point (since it is formed directly of the dimension hierarchy values), and

on the other, due to the structure of a chunk (discussed later on in this chapter) it is used for

accessing the data points, through an offset-based mechanism instead of a search-based one.

Finally, data sparseness management is discussed in the following section and maintenance is

served by the bulk incremental updating algorithm, as well as the CUBE File primary reor-

ganization operations discussed in §4.2.

12.2 Coping with Cube Sparseness

One cannot stress enough the importance of good space utilization when it comes to the storage of OLAP cubes, mainly because the data density of a cube (i.e., the ratio of the actual number of data points to the Cartesian product of the dimension grain-level cardinalities) is extremely low. Therefore, when we designed the chunk-based file system of SISYPHUS, we

had to face the fact that most of the chunks will be empty, or almost empty, and that a full

allocation of cells was practically out of the question.

A key observation is that OLAP data are not at all randomly distributed in the data space but

rather tend to be clustered into dense and sparse regions [Col96, Sar97]. What is really interesting, though, is that in almost all real-world applications these regions are formed by a com-

bination of values in the dimension hierarchies. For example, the fact that a specific product

category was not sold to specific countries, results in a number of “empty holes” in the data

space. As soon as this was realized, it was not difficult to see that the adopted method for

hierarchical chunking (§2.2) is ideal for exploiting this characteristic, since the chunk

boundaries coincide with the boundaries formed from hierarchy value combinations. There-

fore, a simple and clear-cut storage rule resulted naturally: no allocation of space is made for

empty subtrees.

In other words, a specific combination of hierarchy members (of a level higher or equal to the

grain level) that corresponds to non-existent data points, translates, in the chunk-tree repre-

sentation of the cube (§2.3), to an empty subtree hanging from a chunk cell. In the trivial

case, this subtree might be a single data chunk or a single data cell in a data chunk. Therefore,

for a directory chunk containing such empty “roots” we simply mark empty cells with a spe-

cial value and no allocation of space is done for empty subtrees. Large families of chunks that end up in many empty data chunks will thus not consume any space in SISYPHUS.

In a similar manner, for a data chunk containing empty data cells, we only allocate space for

the non-empty ones. This is an additional compression measure taken especially for the data

chunks, since these will be the largest in number and their entries correspond to data values

and not to other subtrees; therefore, just marking empty entries would not yield a significant

compression. In order to retain the fast cell addressing mechanism within a data chunk (a

discussion on the implementation of a chunk follows), we have decided to maintain a bitmap

for each data chunk indicating which cell is empty and which is not. This is not necessary for

the directory chunks, because in this case we allocate all the cells for those chunks that con-

tain at least one non-empty cell. This way, we sensibly allocate space for the grain level,

where the size of the data space is extremely large.

Finally, there is also a significant amount of space savings arising directly from the adopted

structure for the implementation of a chunk, for which there is no need to store the coordi-

nates of a data point along with its corresponding value; but we will come back to this issue

in the corresponding section.

12.3 Internal Organization of Buckets

A bucket in SISYPHUS is the basic I/O transfer unit. It is also the primary chunk container. Buckets are implemented on top of SSM records. The structure of a bucket is quite simple:


we have implemented a bucket as an array of variable-size chunks. Imagine a bucket consist-

ing of a sequence of chunk-slots, where in each slot a single chunk is stored.

Along with this implementation decision came several problems that had to be tackled: First

of all, since chunk-slots were going to be of different length, we would need a bucket internal

directory to manage storage and retrieval of chunks. A simple way to do this would be to in-

clude an array with as many entries as are chunks stored that point to the beginning of each

chunk. Of course, the number of chunks stored in a bucket is not fixed and allocating a

maximum number of directory entries would waste much space. Therefore, the directory had

to grow dynamically, just like the chunk-slots would be filled dynamically.

A well-established solution whenever two dynamic data structures have to be located in the

same linear address space (typical example: the heap and the stack in a process address

space) is to allow them to grow toward each other [Gra93]. Therefore, we place the bucket

header (which is of fixed size) at the beginning of the bucket (low-address) and chunks are

inserted right after it in increasing address order. The bucket directory is indexed backwards;

that is the first entry occupies the highest bucket address, the second entry comes in the next-

lower address, and so forth. This is depicted in Figure 68. In the same figure, we can see the

implementation of a bucket over an SSM record.

Figure 68: The internal organization of a bucket.

Free space management in the bucket has to make sure that the bucket directory and the in-

serted chunks do not overlap. This is achieved with information included in the bucket header

such as the current number of stored chunks (denoting also the size of the directory), the cur-

rent amount of free space and the location of the next available chunk-slot. Also in the bucket

header a “previous” and a “next” link (i.e., bucket-id) are included for exploiting logical


bucket orderings other than the physical one.

Another important benefit from the use of the bucket directory is that the actual location of

the chunk within a bucket is hidden from external modules, which only need to know the directory entry (i.e., chunk-slot) corresponding to a chunk, thus allowing transparent internal reorganization of a bucket.

The order in which chunks are laid out in a bucket is as follows: When we have to store a

subtree, we descend it either in a depth-first or breadth-first manner and we store each chunk

the first time we visit it. Parent cells are visited in the lexicographic order of their chunk-ids,

thus their corresponding child chunks are stored accordingly.

During the chunk-to-bucket allocation (§3.2) and in particular, during the storage of the root

directory (§3.2.3), we have seen that the upper part of the root directory is stored in a special

bucket, called the root-bucket; the size of which equals the size of the available cache mem-

ory, in order to load it into the cache, when accessing the cube. The internal organization of

the root-bucket is different from that of a simple, fixed-size bucket and is depicted in Figure 69.

In this organization the bucket directory and the chunk slots are stored consecutively in the

bucket body. Extra directory entries for the bucket directory are included in order to antici-

pate future appends of chunks. The chunk slots occupy the last part of the bucket body and in

the case of future appends extra space is allocated on demand (and not a priori). The root

bucket header records the total number of directory entries, the number of chunk slots (i.e.,

number of real chunks stored in the root bucket), the total size of the root bucket and the byte

offset, where the first chunk-slot resides.

Figure 69: The internal organization of the root-bucket.

The decision to differentiate the design of the root-bucket (albeit this would complicate the code, since we would have two separate bucket structures to handle) was based on the different characteristics of the latter. In particular, the root-bucket does not have a fixed size: the available cache memory size is merely an upper bound, and thus the root-bucket's size is essentially decided dynamically, as soon as the size of the root directory is calculated.

Moreover, the size of the root bucket is significantly larger than that of a simple bucket. The

idea here is that a root bucket comprises a number of consecutive buckets that are sequen-

tially read in the cache area once, when a cube is “opened” for access. However, the main reason that led us to abandon the simple bucket organization design was the fact that we

wanted to allocate some extra space in the root bucket for future appends of chunks.

In §4.2.1, we have described the case where the root bucket overflows, due to some bulk in-

cremental updating and thus new simple buckets are allocated to accommodate the inserted

chunks (see expand_root_dir reorganization primitive). However, this increases the path from

the root chunk to the data chunks and thus makes query evaluation less efficient. By reserving some extra space, we can anticipate future appends and forestall the root-bucket overflow. The problem with the bucket organization of Figure 68 is that we have to allocate extra

space for the chunk-slots as well as for the directory entries, since the free space lies in be-

tween the two. Considering the size of the root-bucket, this means that a significant amount

of space must be reserved, even though it could remain unused after all. On the contrary, with

the internal organization of Figure 69, we only need to allocate extra space for the directory

entries (which are typically small; see the following section on physical chunk-ids) and thereaf-

ter allocate space for additional chunk-slots on demand and with append-only operations.

Thus, this organization enables us to allocate extra space, in anticipation of future insertions

in the root-bucket, with a minimal storage overhead and hence it was the one chosen at the

end.

Root directory chunks are laid out in the body of the root-bucket in a breadth-first or depth-

first way (this can be specified as a user storage configuration option) with the higher level

chunks (i.e., smaller depth ones) being stored first. Therefore, the root chunk is stored in the

first chunk-slot (C0 in Figure 69). This has the benefit that if we want to reduce the size of

the root-bucket (i.e., the amount of the root directory that will be cached), we can simply

truncate a part at the end of the root-bucket (storing it into simple buckets) and still have the

higher-level nodes cached (since these are the most frequently accessed nodes in a CUBE

File).


The structure of a bucket (simple one or the root-bucket) provides us with a way to uniquely

identify chunks residing in a bucket with a simple and system-internal id. A discussion on

this follows in the next section.

12.4 Logical vs. Physical Chunk-ids

In §2.3, we have defined the chunk-id as a unique identifier of a chunk within the hierarchi-

cally chunked cube. This is a “logical id” independent of the physical location of the chunk

and based completely on the “logical location” of the chunk in the multidimensional and

multi-level cube data space. It is the counterpart of an attribute-based primary-key in a rela-

tion.

A physical chunk-id (cid) is defined as a (bucket-id, chunk-slot index) pair. In other words, it

represents the physical location of a chunk within a CUBE File by designating the specific

bucket and index entry within the bucket directory, where a chunk resides. The “pointers” in

the directory chunk entries in the chunk-tree representation of a cube are implemented as cids

and not as chunk-ids, which are long and of variable length (their length depends on the

maximum chunking depth of each cube) and thus difficult to handle. cids provide a simple

and efficient way for accessing a chunk, eliminating the need for a sequential (or binary)

search within a bucket. That would have been the case, if we used the chunk-ids for address-

ing a chunk within a bucket. Yet, more importantly, this saves us from the burden of having

to store the chunk-id with each chunk, as we would have to do in conventional record-

oriented storage.6

Indeed, the really interesting thing about logical chunk-ids is that we only need to use them;

we don’t have to store them. In other words, the primary access path provided by the chunk-

oriented file system exploits chunk-ids for traversing the data structure; and in particular for

accessing the data within a chunk node. However, the storage of the chunk-id along with each

chunk (at the individual bucket level), or more importantly within each cell of a chunk (at the

individual chunk level), is not necessary, as it would be in the case of conventional relational

storage. This is a direct consequence of SISYPHUS’ location-based access to cube data in

contrast to the content-based one employed by relational storage managers. Here by “loca-

tion” we mean the “logical location”, i.e., with respect to the multidimensional data space and

not the physical location (in terms of cids).

6 With the term “chunk-id”, unless explicitly stated, we will mean a logical chunk-id. We will use “cid” for physical chunk-ids.

The location-based access leads to significant space savings and to a more efficient data ac-

cess mechanism. This mechanism is basically enabled by the chosen structure for implement-

ing a chunk and is outlined in the next section.

12.5 Internal Organization of Chunks

A chunk is actually a set of fixed-size cells. Directory chunk cells contain a single cid value,

while data chunk cells contain a fixed (within a specific cube) number of measure values. For

the implementation of chunks we considered basically two alternatives: (a) a relational tuple-

based approach and (b) a multidimensional array-based approach. We have chosen the latter.

In order to take this design decision we first had to identify the functional requirements per-

taining to chunks: A chunk always resides within a bucket (note that we do not allow a single chunk to span more than one bucket). As we have described in §3.2.2, large chunks are con-

tinuously re-chunked (even when the hierarchical chunking stops, artificial chunking is em-

ployed), until subtrees or single chunks that can fit in a bucket are produced. A chunk must

consume a minimum of space and provide an efficient access mechanism to its cells. The

cells in a chunk will be initially loaded with some data, probably leaving many cells empty,

which will be filled with new values (typically along the time dimension) that will occupy

previously empty cells. Reclassifications on the dimension hierarchies that would trigger a

more radical reorganization of a chunk are expected in a more seldom rate.

Clearly, the first alternative would mean that along with each cell of a chunk, we would have

to store its coordinates (or actually the D-domain (§2.3) in the chunk-id corresponding to the

chunk’s depth). Then, in order to access a cell within a chunk, sequential or binary search

would have to be employed, i.e., a search-based access mechanism would be employed. Of

course, we would have great flexibility in deleting and inserting cells, without having to

worry about the order of the cells.

Multidimensional arrays (md-arrays) on the other hand, are very similar in concept with

chunks in the sense that values are accessed by specifying a coordinate (index value) on each

dimension. In other words, they are the natural representation of chunks. The fixed size of the

cells allows for a simple offset computation in order to access an md-array cell, which can be

performed in constant time and is far more efficient than any search-based access mechanism.

Moreover, it gives us an opportunity for effective exploitation of logical chunk-ids, which

essentially consist of interleaved coordinate values. In addition, exactly because of the ad-

dress-based access mechanism, we don’t have to store the chunk-id along with each cell.


The most well-known shortcomings of arrays are (see [SS94] for a thorough analysis): (a) the

ordering of cells clusters them with respect to some dimensions, while dispersing them with respect to others, (b) multidimensional arrays can be very wasteful in space for sparse

chunks, and (c) the “linearization” of cells results in inflexibility for frequent reorganizations,

in the presence of dynamic deletes and inserts.

Since chunks are always confined within a single bucket, dispersal of cells should not be a

problem in the sense of imposing more I/O cost. The same argument holds when reorganiza-

tion of a chunk within a bucket takes place (the case of a bucket overflow has to do with re-

organization at the bucket level and is actually independent from the chunk implementation).

Of course, more processing would be required for moving cells around, but considering that this will only take place periodically, in batch form, it should be acceptable. After all, since we anticipate more frequent insertions, e.g., along the time dimension, the chosen ordering of cells could be such that it turns these updates into append-only operations (Figure 31). Finally, the “compression” techniques described in a previous section address the sparseness problem sufficiently.
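To illustrate how a bitmap-based compression scheme can still locate a cell by address, the following sketch maps a cell's md-array offset to its index in a dense array holding only the non-empty cells. The function name and layout are illustrative assumptions, not the actual SISYPHUS code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: a sparse data chunk stored as a compression bitmap (one bit per
// cell, in cell order) plus a dense array of the non-empty cells only.
// A cell's index in the dense array equals the number of set bits that
// precede its offset in the bitmap. Returns -1 for an empty cell.
long compressedIndex(const std::vector<bool>& bitmap, std::size_t offset) {
    if (offset >= bitmap.size() || !bitmap[offset])
        return -1;  // empty cell: no entry stored for it
    long rank = 0;
    for (std::size_t i = 0; i < offset; ++i)
        if (bitmap[i]) ++rank;  // count stored cells before this one
    return rank;
}
```

With bitmap 100110, the cell at offset 3 is the second stored cell (index 1), while the cell at offset 1 is empty and yields -1.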

Figure 70: The internal organization of a chunk.

Therefore, bearing in mind the advantages of multidimensional arrays and the workload profile, we concluded that paying the cost of consuming more space and providing a slower cell-access mechanism, for the mere benefit of allowing faster dynamic updates, was not worth it. These were the major justifications for our design choice.

[Figure 70 layout: the chunk body consists of a chunk header (fixed-size part and variant-size part), followed by the chunk's fixed-size part and variant-size part.]

In Figure 70, we present the internal organization of a chunk (both directory and data). We

can see the chunk header included in the chunk body. The minimum information that we

need to store in the chunk header is the chunking depth, and the order-code ranges for each

dimension that the chunk covers (i.e., two values per dimension). In the chunk body we store

the chunk entries but also the compression bitmap (see §12.2) for data chunks. The fixed-size

parts in the chunk header and chunk definitions include simple fields (e.g., the chunking depth) and in-memory pointers to the variable-size fields (e.g., cell entries and bitmap). These pointers are updated, the first time a chunk is loaded from disk into main memory, to the appropriate offsets.

12.6 Initial Bulk-loading and Construction of the CUBE File

The CUBE File is a structure that needs to be initially bulk-loaded with data, rather than filled by individual inserts. Thereafter, incremental updates (§4.2.2) (again in bulk mode) as well as data-purging operations (§4.2.3) can be performed. The initial loading phase is a crucial one, since it essentially entails the construction of the CUBE File structure. The hierarchical chunking (§2.2), as well as the chunk-to-bucket allocation (§3.2), resulting in hierarchically clustered data, take place during this time.

 0: ConstructCUBEFile(CubeInfo info, File data, File options) {
 1:
 2:   set CUBE File construction parameters from options file;
 3:
 4:   // PHASE I
 5:   // Apply hierarchical chunking and calculate the storage
 6:   // cost for the nodes of the chunk hierarchy tree.
 7:   // In this phase we will use only chunk headers.
 8:
 9:   // Create cost-tree
10:
11:   btree = create a B+tree;
12:   bulk-load btree from data file;
13:   costTreeRoot = create cost tree;
14:
15:   // PHASE II
16:   // Perform chunk-to-bucket allocation
17:
18:   deque<DirEntries> rootBuckEnt;
19:   PutChunksIntoBuckets(costTreeRoot, btree, info, rootBuckEnt);
20:
21:   // store the root directory in the root-bucket
22:   storeRootDir(rootBuckEnt);
23:   RETURN;
24: }

Figure 71: Algorithm for the construction and bulk loading of a CUBE File.

In Figure 71, we present the algorithm that implements the initial loading and construction of

the CUBE File. As we can see, it consists of two phases. In phase I, an in-memory

representation of the cube data distribution will be created, which is called the cost-tree of the

cube. Essentially, this corresponds to the chunk-tree representation of the cube (§2.3), where

the nodes store only the information of the relevant chunk header (§12.5), instead of the real

chunk entries. Therefore, by building this structure we actually perform hierarchical chunking

(§2.2). The most important information in the chunk header is the number of non-empty cells in the chunk, together with the total number of cells, because these allow us to calculate the exact cost for storing each chunk. This is exploited in phase II, where the chunk-to-bucket allocation takes place and where the cost for storing each chunk subtree is required by the corresponding algorithm (Figure 5).

Initially, the CUBE File construction options are set (line 2 in Figure 71), as provided by a user-edited configuration file. These options affect the way the bucket-regions are formed (§3.2.1), the order (depth-first or breadth-first) in which chunks are laid out in each bucket, the method for performing artificial chunking to resolve the storage of large data chunks (§3.2.2), the method for storing the root directory (§3.2.3), as well as the cache-area size constraint and the amount of extra space to be allocated for the root bucket (§12.3).

An input file data is assumed to contain the grain-level measure values of the cube that

must be loaded into the CUBE File. Also, the dimension data encoding (§2.1) has taken place

and the hierarchies (in terms of order-code ranges per parent in each level) have been loaded

in main memory. In order to build the cost-tree, significant processing of the input data file (i.e., multiple passes over the input data) must take place, because the cost-tree essentially records the distribution of the input data points in the multidimensional, hierarchy-enabled data space; thus, for each chunk we search for its non-empty cells. To speed up

this processing, we create and bulk-load with the input data a B+ tree (lines 11-12). We as-

sume that the input file contains, in each line, a specific grain-level data point, appearing with

its chunk-id and corresponding measure values. In addition, we assume that the data points

are sorted by ascending lexicographic order of their chunk-ids. This is depicted in Figure 72,

where we present an example of such an input file. After the chunk-id in each line, two

measure-values appear for each data cell. In the figure, we have separated the data points of

each data chunk by a blank line for presentation reasons.

We bulk-load a B+ tree with the data in the same order as they appear in the input file. As a search key, we use a simple integer that increases sequentially with each new data point. Instead of scanning the input file to find the data points that correspond to each chunk, we exploit the B+ tree in order to perform only a single range selection for each chunk. In

particular, in Figure 72 we depict the performed range selections that correspond to chunks at different depths. The attached numbers denote the order in which each one takes place. The first selection is essentially a full scan over the data, in order to identify the non-empty cells of the root-chunk (chunking depth D = 0). During this scan, the ranges of each non-empty child-chunk are also recorded and then executed as the next group of range selections (appearing with numbers 2 and 3 in the figure), referring to the corresponding chunks at depth D = 2.

Likewise, during each new range selection, the non-empty cells of the corresponding chunk

are identified and the range for each child-chunk is recorded. This continues until we reach

the maximum chunking depth, where we only identify the existing data points for each data

chunk. Therefore, for our example the cost-tree is built with 16 range selections on a B+ tree (which is an ideal structure for this kind of selection), instead of performing 16 file-scans on the input data.
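The role of the B+ tree can be illustrated with a hedged stand-in: below, a std::map keyed on the sequential data-point number plays the part of the bulk-loaded B+ tree, and a range selection visits only the data points with keys in [first, last] instead of scanning the whole input. All names are illustrative, not the thesis code:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A grain-level data point: its chunk-id and its measure values.
using DataPoint = std::pair<std::string, std::vector<double>>;

// Sketch of a single range selection for one chunk: start at the first
// key >= first and stop once the key exceeds last, so only the data
// points of that chunk are touched.
std::vector<DataPoint> rangeSelect(const std::map<int, DataPoint>& btree,
                                   int first, int last) {
    std::vector<DataPoint> result;
    for (auto it = btree.lower_bound(first);
         it != btree.end() && it->first <= last; ++it)
        result.push_back(it->second);
    return result;
}
```

Because the data points were inserted in chunk-id order with a sequentially increasing key, each chunk corresponds to one contiguous key range, which is exactly what this selection exploits.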

Figure 72: Order of range-selections over the bulk-loaded B+ tree during the cost-tree creation.

After the cost-tree is built, the chunk-to-bucket allocation has to take place. In line 19 of

Figure 71, we invoke a routine, which implements the greedy algorithm for solving the HPP

chunk-to-bucket allocation presented in Figure 5. During the execution of this algorithm, data

chunks will be loaded with measure values and chunks will be packed into buckets. In line

19, we observe among the input arguments of PutChunksIntoBuckets a parameter stor-

ing the directory entries of the root-bucket (rootBuckEnt). This will be filled from the

[Figure 72 content: sample input-file lines, bracketed by VALUES_START/VALUES_END markers; each line holds a grain-level data point's chunk-id (e.g., 0|0.0|0.0|-1.0|0) followed by two measure values (e.g., 3.5 1.7), with the data points of each data chunk separated by a blank line, and the numbers 1 to 16 marking the order of the range selections.]


chunk-to-bucket allocation routine (PutChunksIntoBuckets). Therefore, when the

chunk-to-bucket allocation completes we subsequently store the chunks of the root bucket

(line 22). Note that the data structure used to hold the contents of the root bucket in main memory is a deque, i.e., a vector where data can be inserted from both ends. By inserting the chunks from the front, we can store the higher-level nodes of the root-bucket first, as we descend the chunk-tree in a depth-first fashion. This has the benefit that, if we want to reduce the size of the root-bucket (i.e., the amount of the root directory that will be cached), we can truncate a part at the end of the root-bucket (storing it into simple buckets) and still have the higher-level nodes cached (since these are the most frequently accessed nodes in a CUBE File); see §12.3.
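One way to realize this front-insertion scheme, assuming each chunk is pushed at the front of the deque once its whole subtree has been handled (a post-order formulation that is our reconstruction, not the thesis code), is the following sketch:

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

struct ChunkNode {                 // illustrative node of the root directory
    std::string id;
    std::vector<ChunkNode> children;
};

// Sketch: store a subtree post-order, pushing each finished chunk at the
// FRONT of the deque. Since ancestors are finished last, they end up
// before their descendants, in depth-first order; truncating from the
// back then drops only the lower-level (least frequently accessed) nodes.
void storeSubtree(const ChunkNode& node, std::deque<std::string>& rootBucket) {
    // visit children in reverse so the final deque keeps left-to-right order
    for (auto it = node.children.rbegin(); it != node.children.rend(); ++it)
        storeSubtree(*it, rootBucket);
    rootBucket.push_front(node.id);
}
```

For a root with children A (having child A1) and B, the resulting deque is root, A, A1, B: truncating the tail removes B and A1 first, while the root stays cached.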


13 The Processing Engine

In this chapter we discuss the implementation of an ERATOSTHENES processing engine prototype. This prototype has been developed mainly in the context of a diploma thesis [Sam01] supervised by the author. The engine includes all the processing algorithms

both on dimension data, as well as on cube data, discussed in the second part of this thesis.

However, in the current version the underlying data reside in an in-memory data structure

simulating the CUBE File, because SISYPHUS was not ready at the time this work was carried out. The interface used by the engine to retrieve the underlying data consists of the basic data navigation operations of the CUBE File data structure (see §4.1). This makes the transition to the real storage manager a relatively simple task.

The programming language that we used was ANSI C++ [Str97] and the underlying platform

was Linux (kernel 2.2.x) [LNX02]. For the dimension-data processing we have used an ex-

ternal RDBMS [MySQ02] running on Linux. For the dimension-data access we have used the

ODBC API on top of the UnixODBC driver manager [ODBC02].

This chapter consists of two sections. In the first one, we outline the implementation of the

processing engine prototype. In the second, we discuss design choices and provide implemen-

tation hints for a future incorporation of the processing engine into the ERATOSTHENES sys-

tem.

13.1 SQNF Query Evaluation

The primary responsibility of the processing engine is to evaluate star queries, which in our case are expressed in SQNF (see §7.1). In Figure 73, we depict the basic processing flow in the evaluation of a star query, as implemented in our engine prototype. As can be seen from the figure, the processing is divided into three separate phases.


The first phase comprises the syntactic processing. During this phase the submitted star query

must be converted from SQL to SQNF. In addition, the subsequent CUBE File processing

will be identified from the syntax of the query. The implementation of the conversion from

SQL to SQNF is straightforward and has been based on the mapping appearing in Table 3. Of course, we have assumed that the input SQL complies with the ad hoc star query template of Figure 34. Note that the SQNF produced by this processing phase does not contain the dimension restriction term (D term) expressed as a chunk expression, nor does it contain the result granularity term (R term) expressed in terms of chunking depths (Definition 18). These

conversions take place in the next phase.

Another important part of the syntactic processing is the identification of the “type” of the

query in order to invoke subsequently the appropriate CUBE File processing algorithm. To

this end, in our prototype implementation we classify an incoming star query into one of three categories: (a) queries that require no aggregation, (b) queries that require aggregation but no grouping, and (c) queries that involve both grouping and aggregation. In order to

classify an incoming query to one of these categories, we examine the SQNF aggregation

term (A term), as well as the result granularity term (R term) and apply the following decision

rules:

Definition 20 (Syntactic Decision Rule 1)

If all the terms appearing in the result granularity term R of a query Q = (D,M,R,A,T,O) expressed in SQNF correspond to the most detailed levels of the dimensions, then Q is a query that requires no aggregation.

Definition 21 (Syntactic Decision Rule 2)

If for each term t_i, corresponding to a dimension D_i (1 ≤ i ≤ N, where N is the cube dimensionality), appearing in the result granularity term R of a query Q = (D,M,R,A,T,O) expressed in SQNF, one of the following holds, then Q requires aggregation but no grouping:

1. t_i appears with the special ALL value (see §0), or

2. t_i is of the form D_i.h_k, or D_i.f_k, where h_k and f_k are a hierarchical and a feature attribute of D_i respectively, corresponding to the hierarchy level k (assuming that 1 is the most detailed level). In this case there must be a single term only in the dimension restriction of Q corresponding to D_i, of the form D_i.h_k' = c (or D_i.f_k' = c), where k' ≤ k and c is a constant. In other words, a single hierarchy-attribute equality restriction must appear, where the restricted level k' is at least as detailed as the result granularity with respect to dimension D_i.

Finally, if none of the above rules hold, then the query requires both grouping and aggrega-

tion.
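The two decision rules above can be sketched as a classification routine. The RTerm fields below abstract the syntactic checks on the R and D terms; this simplified representation is our assumption for illustration, not the engine's actual data structures:

```cpp
#include <cassert>
#include <vector>

enum class QueryClass { NoAggregation, AggregationOnly, GroupingAndAggregation };

// Illustrative reduction of one SQNF result-granularity term.
struct RTerm {
    bool mostDetailedLevel;       // term is at the dimension's grain level
    bool isAll;                   // term carries the special ALL value
    bool singleEqualityAtDetail;  // a single equality restriction exists at a
                                  // level k' at least as detailed as level k
};

// Sketch of the syntactic decision rules (Definitions 20 and 21).
QueryClass classify(const std::vector<RTerm>& rTerms) {
    bool allDetailed = true;  // Rule 1 (Definition 20)
    bool noGrouping = true;   // Rule 2 (Definition 21)
    for (const RTerm& t : rTerms) {
        if (!t.mostDetailedLevel) allDetailed = false;
        if (!t.isAll && !t.singleEqualityAtDetail) noGrouping = false;
    }
    if (allDetailed) return QueryClass::NoAggregation;
    if (noGrouping)  return QueryClass::AggregationOnly;
    return QueryClass::GroupingAndAggregation;
}
```

Rule 1 is checked first: if every term is at grain level, no aggregation is needed; otherwise, if every term is either ALL or suitably restricted, only aggregation is needed; any remaining case requires grouping as well.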

Figure 73: The general processing flow for the evaluation of SQNF queries. [The figure shows the flow from an SQL query through the SQL parser and the SQNF Writer (syntactic processing); the Mb-code Extractor retrieving member-codes from the dimension data in the external RDBMS via the dimension-data access API and feeding the Chunk Expression Writer (chunk expression processing); and the SQNF Evaluator accessing the cube data (SISYPHUS) via the CUBE File access API (SQNF processing).]

The second phase is the chunk expression processing. SQNF queries have to be completed with the dimension restrictions represented as a single chunk expression and the result granularity term expressed in terms of chunking depth values (Definition 18). These

Initially, the D term in the SQNF query is analyzed. We deal with each dimension separately.

We process each dimension according to the interleaving order (§2.3). The local predicate

(§6.3) for each dimension, appearing in D, is used, in order to form a single SQL query that is

submitted to the external RDBMS hosting the dimension data. The retrieved results comprise

the member-codes (§2.1) that qualify the specific local predicate. The sorted stream of member-codes is examined in order to identify potential order-code ranges. This results in the


construction of the corresponding member-code specification (Definition 17). The supported

subset of member-code specifications is depicted in Table 6. Finally, the same process is re-

peated for all dimensions appearing in the D term; if a dimension does not appear in D, then

we assume a member-code specification of the form *|…*. … .*|…*. When the member-

code specifications for all dimensions have been constructed, they are interleaved in order to obtain the corresponding chunk expression.
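Under our reading of the chunk-id layout of Figure 72 (D-domains separated by '|', and the dimensions' codes within a domain separated by '.'), the interleaving step might be sketched as follows; this is a hedged illustration, not the engine's code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: interleave per-dimension member-code specifications into a
// single chunk expression. dimSpecs[i][d] is dimension i's code (or
// symbol such as '*' or a range) at depth d. Codes of the same depth are
// joined with '.' into one D-domain; domains are joined with '|'.
std::string interleave(const std::vector<std::vector<std::string>>& dimSpecs) {
    std::string expr;
    std::size_t depths = dimSpecs.empty() ? 0 : dimSpecs[0].size();
    for (std::size_t d = 0; d < depths; ++d) {
        if (d > 0) expr += '|';
        for (std::size_t i = 0; i < dimSpecs.size(); ++i) {
            if (i > 0) expr += '.';
            expr += dimSpecs[i][d];
        }
    }
    return expr;
}
```

For two dimensions with specifications (0, 1, *) and (0, 2, [0-3]) the result would be 0.0|1.2|*.[0-3].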

The last phase in Figure 73 is the SQNF processing. In this phase the stored cube data are ac-

cessed and appropriate CUBE File processing algorithms are executed in order to produce the

final result. A Cube class is instantiated that encapsulates the access to the underlying CUBE

File. The basic public interface of this class consists of the primary data navigation operations

of the CUBE File (§4.1). Essentially, it implements the construct of the current position in

the CUBE File (Definition 13).

Table 6: Subset of member-code specification symbols implemented in the processing engine.

Symbol                                      | Definition
oc                                          | An order-code value at a specific level in the hierarchy.
*                                           | All order codes at this level under the specified ancestor.
P                                           | Pseudo level; no order code at this level.
[oc_i-oc_j]                                 | Range of oc's (boundaries included).
{oc_1,…,oc_k}                               | List of non-consecutive order codes in ascending order.
{oc_1,…,oc_k, [oc_i-oc_j],…, [oc_i'-oc_j']} | List of non-consecutive order codes and order-code intervals, in ascending order.

Four basic algorithms have been implemented for accessing and processing the CUBE File data:

a) Range-Select,

b) Range-Aggregate,

c) Range-Group, and

d) Range-Select-Sort.

The first three correspond to the MDRangeSelect, MDRangeAggregate and MDRangeGroup

physical operators discussed in §8.2. The latter is a variation of the Range-Select, where

qualifying data are returned ordered by the order codes of a specific dimension level.

The selection of the appropriate CUBE File processing algorithm in our engine prototype is based on the syntactic processing phase discussed above, during which it is identified which of the four processing algorithms must be used. In particular, the R and A terms of an SQNF query are examined. The basic decision flow regarding the CUBE File processing of a star query is depicted in Figure 74.


[Figure 74 content: starting from the SQNF query, the flow first checks that the R and A terms are valid (exiting otherwise); if no aggregation is required, it selects Range-Select, or Range-Select-Sort when sorting is required; if aggregation is required, it selects Range-Aggregate, or Range-Group when grouping is also required.]

Figure 74: Decision flow chart depicting the selection of the appropriate CUBE File processing algorithm.

13.2 Towards a Pragmatic OLAP Processing Engine

In this section, we examine our engine prototype in retrospect and discuss implementation issues that set the scene for a more pragmatic implementation. In particular, we discuss improvements to the existing implementation of the engine and describe a framework for implementing a state-of-the-art OLAP execution engine that is based on execution plans produced by a query compiler, and not on hard-wired decision flow charts.

In the first subsection, we present a design for accessing the cube data. We advocate a design

that enables query sessions on each cube, rather than isolated queries, which is a more realis-

tic approach for OLAP data analysis. Moreover, we show the incorporation of the current

drilling path (CDP) (Definition 14) into our design, which is a sine qua non for the efficient

implementation of the navigation in a CUBE File (§4.1). In the second subsection, we pro-


pose a buffer management strategy that is hierarchy-sensitive and thus more appropriate for

OLAP query loads. Moreover, we discuss the implementation of our strategy on top of the

existing Shore buffer manager [SSM99]. Finally, in the last subsection, we provide hints for

the implementation of physical operators and execution plans. We describe how key features

of object-oriented design, such as class inheritance and polymorphism, can be exploited for a

flexible implementation of a processing engine.

13.2.1 Accessing the Cube

Clearly, the efficiency of star query processing depends on the efficiency of the CUBE File processing algorithms. We have seen in chapter 1 that all these algorithms are based on the CUBE File data navigation operations (§4.1). Consequently, there is a need for an efficient implementation of these operations. We propose a design where a single class realizes

the current position (CP) in the CUBE File (Definition 13); it provides access to the primary

navigation operations and incorporates the current drilling path (CDP) (Definition 14).

In Figure 75, we depict such a definition of a class called CubeAccess. The public interface of

CubeAccess consists of the primary navigation operations of the CUBE File: drill_down,

roll_up, get_next, move_to (lines 16-19). All operations return a Boolean value to indicate

success or failure of the operation. The open operation (line 13) initializes the access to a

cube corresponding to a specific cube id. This is a system-internal id assigned by the catalog

manager module (§11.2.5). open reads the corresponding cube meta-data from the catalog

and saves them in the cbinfop (line 30) private member. Moreover, it loads into main memory

the root-bucket (§3.2.3) of this cube.

This design facilitates an efficient processing of the most typical workload in OLAP. In par-

ticular, typical analysis on an OLAP cube takes place in terms of a sequence of ad hoc que-

ries (query session). This means that usually the business user will not just submit an individ-

ual ad hoc query to a cube and then proceed by querying some other cube, as would be the

case in a transaction-oriented environment. A cube constitutes a significant, and more impor-

tantly, a self-contained part of the enterprise activity history and thus, the user will want to

query a lot more a stored cube than it would any other relational table in the database. We

imagine the typical OLAP user “opening” a cube for access and then submitting a series of

queries to the cube.

We believe that an appropriate design for OLAP processing should aim at accelerating, as much as possible, this type of user analysis by exploiting in-memory caching. Therefore,


since the higher nodes (i.e., smaller depth nodes) of the CUBE File will be the most fre-

quently accessed, we choose to load them into the buffer pool during the whole period that a

cube is “opened” for access. The root-bucket of a specific CUBE File will remain in the

buffer pool until all the client-threads (Figure 67) that are accessing it, invoke the close

method (line 14). The private member rootBucket (line 28) essentially stores a handle on the pinned SSM record implementing the root-bucket.

 0: class CubeAccess {
 1: public:
 2:   typedef WORD CPstatus_t;
 3:   typedef enum {
 4:     pos_st_data_chunk,
 5:     pos_st_root_chunk,
 6:     pos_st_end_of_data,
 7:     pos_st_empty_cell,
 8:     pos_st_chunk_boundary,
 9:     pos_st_topmost,
10:     pos_st_error
11:   } CPstatusBitPos_t;
12:
13:   bool open(const CatalogManager::cubeID_t& cbid);
14:   bool close();
15:
16:   bool drill_down();
17:   bool roll_up();
18:   bool get_next();
19:   bool move_to(const ChunkID& targetId);
20:   bool rewind();
21:   Entry* read();
22:   bool isStatusFlagOn(CPstatusBitPos_t flagPos);
23:
24: private:
25:
26:   CUBEFilePos currPos;
27:   CDP currDrillPath;
28:   PinnedBucket rootBucket;
29:   CPstatus_t currPosStatus;
30:   DiskCubeInfo* cbinfop;
31: }; //class CubeAccess

Figure 75: Definition of the CubeAccess class.

In addition, the private members currPos and currDrillPath implement the current position

CP and current drilling path CDP constructs respectively (lines 26-27). Recall from §4.1 that

the former records our position in the CUBE File, while the latter stores the positions in the

current path from the root-chunk and is an effective means for implementing the roll_up primary navigation operation (§4.1.2). In the next subsection, we will see how we can exploit the CDP even further, as a local cache per user query session.

Finally, the currPosStatus data member (line 29) records the status of the current position in

the CUBE File. This is a bit string, where each bit corresponds to a specific status flag. The


various status flags are defined in lines 3-11. The richer the set of status flags describing the current position, the more flexibility we have when implementing the basic navigation

operations, or the various processing algorithms. For example, recall the algorithm of the

ChunkTreeScan physical operator, presented in Figure 48 of §8.2.2, where the chunk-

boundary flag (pos_st_chunk_boundary) was checked in order to identify the end of the tar-

get data set. This saved us from the overhead of accessing a sibling chunk and thus invoking

a potential I/O operation. A description of the most important flags is given in Table 7.

Table 7: Description of the current position (CP) status flags.

pos_st_data_chunk      | Set when the CP is at a data chunk; a call to drill_down will fail.
pos_st_root_chunk      | Set when the CP is at the root chunk; a call to roll_up will fail.
pos_st_end_of_data     | Set when there are no more cells to access at the current depth; a call to get_next will fail.
pos_st_empty_cell      | Set when a failure of move_to has occurred: an attempt was made to move the CP over an empty cell. The CP has remained unchanged.
pos_st_chunk_boundary  | Set when the CP corresponds to a chunk boundary (meaning that the next call to get_next will move the CP to a different chunk).
pos_st_topmost         | Set when the CP is at the first non-empty cell of the root chunk. Typically this is the state of the CP after a call to rewind(); see line 20.
pos_st_error           | Set after a GeneralError exception has been thrown.

13.2.2 Hierarchy-based Buffer Management

Buffer management is an essential part of query execution. Physical operators executing a part of a query plan frequently need to access data that either must be fetched from disk (imposing a consequent time penalty on the final response time), or read from a buffer in main memory. In a DBMS, the main-memory buffers available to operations are typically managed by the buffer manager. In Figure 66, we have depicted the buffer manager module

within the overall system architecture of SISYPHUS.

In SISYPHUS the buffer manager is implemented on top of the Shore Storage Manager

(SSM) buffer manager (§11.2.1 and [SSM99]). The latter is a means by which all other SSM

modules (except the log manager) read and write pages. On Unix, the buffer pool is in shared

memory, where cooperating processes can find and place pages in response to read and write

requests [SMA99]. The interface provided by SSM gives us the ability to “pin” an SSM record into the buffer pool. In the case of a record larger than the page size, only a single page

is pinned (corresponding to the requested byte-offset in the record); then, an iterator interface

is provided for subsequently retrieving the following pages.

The buffer replacement policy used is a clock-based LRU-like strategy [GUW00]. In Figure

76, we depict an example of this strategy. Buffers are arranged in a circle and contain a ref-

erence count. The first time a page is read into a buffer the reference count is set to 1. Each

subsequent access to the same page increases the reference count by one. A “hand” points to


one of the buffers and rotates clockwise when it needs to find a buffer in which to place a disk page. The first buffer found with a zero reference count is the one chosen for loading a new page and replacing its previous contents. Whenever the hand passes over a buffer, it decreases its reference count by one. Thus, for a new page that is loaded and accessed only once, it takes at least two rounds of the clock before it is replaced.
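The clock strategy described above can be sketched as follows. This is a minimal model for illustration; the actual SSM buffer manager differs in details such as pinning, frame lookup, and dirty-page handling:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the clock-based LRU-like replacement policy: each buffer
// frame holds a reference count; the hand rotates, decrementing counts,
// and the first frame found with a zero count becomes the victim.
struct ClockBuffer {
    std::vector<int> refCount;  // one reference count per buffer frame
    std::size_t hand = 0;

    explicit ClockBuffer(std::size_t n) : refCount(n, 0) {}

    // A page already resident in `frame` is accessed again.
    void access(std::size_t frame) { ++refCount[frame]; }

    // Choose a victim frame for a newly loaded page.
    std::size_t evict() {
        while (refCount[hand] != 0) {
            --refCount[hand];  // the hand passes over this frame
            hand = (hand + 1) % refCount.size();
        }
        std::size_t victim = hand;
        refCount[victim] = 1;  // first reference of the new page
        hand = (hand + 1) % refCount.size();
        return victim;
    }
};
```

A frame whose page was referenced more often keeps a higher count and therefore survives more passes of the hand, which is the LRU-like behavior the text describes.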

[Figure 76 content: buffer frames arranged in a circle with reference counts 0, 1, 1, 2, 0, and 3.]

Figure 76: A clock-based buffer replacement policy.

As described in §11.2.3, the SISYPHUS buffer manager provides an interface for pinning and unpinning fixed-size buckets into the buffer pool. In this subsection, we argue that common buffer management strategies (such as the one provided by SSM) are not enough to enable efficient processing of star queries. We advocate a novel buffer management strategy for star queries that is based on the dimension hierarchies and is thus more appropriate for this type of query. More specifically, our strategy is largely based on the current drilling path (CDP) construct defined in Definition 14.


Recall from Definition 1.2 in §4.1 that the CDP contains the positions in the CUBE File along the current drilling path. This is mainly exploited to enable a more efficient implementation of the roll-up primary navigation operation. Since pointers in the CUBE File only point to child nodes, without the CDP a roll-up would have to be performed by a descent from the root chunk each time. Hence, in its simplest form the CDP can be realized by a stack recording the “footprint” of each position in the current drilling path. Each time we descend a level, we “push” the corresponding source-position entry onto the CDP. Conversely, each time we move up a level, we “pop” one entry from the CDP and use it as a reference to the target position. These operations have been described in detail in §4.1.
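In its simplest form, the stack-based CDP sketched above might look as follows; the Position fields are illustrative assumptions, not the actual SISYPHUS definitions:

```cpp
#include <cassert>
#include <stack>

// Illustrative "footprint" of a position in the drilling path.
struct Position {
    int depth;      // chunking depth of the node
    int cellIndex;  // offset of the followed entry within the chunk
};

class CurrentDrillingPath {
public:
    // Descending a level pushes the source position onto the CDP.
    void drillDown(const Position& src) { path_.push(src); }

    // Rolling up pops the last source position, which becomes the target;
    // without the CDP this would require a descent from the root chunk.
    Position rollUp() {
        Position p = path_.top();
        path_.pop();
        return p;
    }

    bool atRoot() const { return path_.empty(); }

private:
    std::stack<Position> path_;
};
```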

7 “Buffer Management” and “buffer replacement policy” are used as equivalent terms throughout this text.


A natural extension of the above logic is to maintain, apart from the position in each node, also the corresponding bucket, thus providing an efficient caching strategy for moving along the chunk hierarchy. This would certainly be an enhancement to the underlying buffer management, since the latter fails to cache all the buckets in a drilling path (e.g., as we descend along a path, the root-node bucket would be the first to be replaced under LRU, although it is the most critical bucket in terms of caching).


Furthermore, the need for hierarchy-based buffer management is also demonstrated by the algorithms implementing the MDRangeSelect (§8.2.3) and MDRangeGroup (§8.2.5) physical operators, since during their execution the buckets in the current drilling path are continuously accessed. Specifically, for the MDRangeSelect algorithm, the same path (at least as far as the buckets at smaller depths are concerned) is visited more than once for consecutive groups. Therefore, we have to ensure that the common upper-level buckets are cached, if we do not want to fetch the same buckets from disk multiple times. Once more, the usual strategies do not always provide this guarantee.

Finally, a third reason in favor of hierarchy-based buffer management is that a query session consisting of a series of ad hoc star queries submitted sequentially to the system typically accesses similar paths in the dimension hierarchies. This is well-known behavior of OLAP query loads; in fact, this phenomenon has been characterized as the hierarchical locality [DRSN98] of OLAP queries.

Toward a hierarchy-enabled buffer management strategy, we propose a buffer management architecture implemented on top of the page-based “clock” policy of the underlying system. It is depicted in Figure 77. We propose the use of a session cache for each access to a cube, which is based on the CDP notion. The session cache is implemented as a doubly linked list of entries (see Figure 78). Each entry in the list corresponds to a pinned bucket in the underlying clock-based buffer pool. Actually, an entry contains only a handle to the corresponding bucket. Entries are ordered according to the aggregation level (i.e., the chunking depth) of the corresponding bucket, with the most detailed level leftmost. We assume that the depth of a bucket equals the depth of the chunk that “points” to it. The root-bucket corresponds to the rightmost entry. The length of the list equals the number of buckets in the longest path from the root-chunk to a data chunk (i.e., a leaf node). This means that for typical dimension hierarchies, this cache should occupy only a small number of entries. This allows

8 “Buffer management” and “cache management” are used as equivalent terms throughout this text.


for maintaining this cache per user query session, i.e., per cube opened for access by a single client thread (§13.2.1). This is similar to the CDP, which is also maintained per query session (§13.2.1). The fact that all bucket retrieval requests from the session cache are served by the underlying buffer pool ensures that common buckets are going to be fetched and pinned only once, even if multiple concurrent session caches exist for the same cube.


Figure 77: Architecture of the session cache.

The entry in the list that corresponds to the bucket of the current position (CP) in the CUBE File (Definition 13) is called the current entry. As the CP moves during the evaluation of a query, the current entry moves analogously. Whenever a request for a bucket arrives at the session cache (“find bucket”), we compare the depth of the bucket with that of the current entry. If it is greater than or equal to the depth of the current entry, then we search for the requested bucket in the current entry or to the left of the current entry. This would typically be the result of a drill_down navigation operation (§4.1.1). Otherwise, we search for the requested bucket to the right of the current entry (which corresponds to a roll_up operation (§4.1.2)). If the bucket is found, then we update the current entry to refer to the new bucket.
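The find-bucket logic described above can be sketched as follows; this is a simplified illustration (integer bucket handles, names of our own choosing), not the actual implementation:

```cpp
#include <cassert>
#include <iterator>
#include <list>

// One session-cache entry: a handle to a pinned bucket and its depth.
struct CacheEntry {
    int bucket;  // handle to a bucket pinned in the underlying pool
    int depth;   // chunking depth of the chunk pointing to the bucket
};

class SessionCache {
public:
    // Returns true on a hit and moves the current entry to the found
    // bucket. A drill-down (depth >= current) searches the current entry
    // and to its left; a roll-up searches to its right.
    bool find(int bucket, int depth) {
        if (entries_.empty()) return false;
        if (depth >= cur_->depth) {
            for (auto it = cur_;;) {
                if (it->bucket == bucket) { cur_ = it; return true; }
                if (it == entries_.begin()) break;
                --it;
            }
        } else {
            for (auto it = std::next(cur_); it != entries_.end(); ++it)
                if (it->bucket == bucket) { cur_ = it; return true; }
        }
        return false;
    }

    // A miss leads to a "pin bucket" request; the new entry is placed
    // immediately to the left of the current entry and becomes current.
    void insert(int bucket, int depth) {
        cur_ = entries_.insert(cur_, CacheEntry{bucket, depth});
    }

private:
    std::list<CacheEntry> entries_;
    std::list<CacheEntry>::iterator cur_ = entries_.end();
};
```

A doubly linked list fits here because both search directions are needed and insertions always happen next to the current entry.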


Note that all movements in the CUBE File are performed in terms of consecutive roll-ups and

drill-downs. Even the get_next (§4.1.3) and move_to (§4.1.4) operations are essentially based

on roll_up and drill_down. These two are the only operations that can trigger the need for a

bucket access. Hence, if a bucket that we search for is contained in the session cache, it will most probably be in the current entry or in an adjacent entry.


Figure 78: A hierarchy-based bucket replacement policy.

When a bucket is not found in the session cache, we have to issue a “pin bucket” request to the buffer manager, which in turn is transformed into a “pin record” request to the underlying SSM buffer manager (see Figure 77). Buckets in the session cache are always pinned in

read-only mode (i.e., only shared locks are maintained). The new entry is always placed to


the left of the current entry. This is, again, a consequence of the way that movements are performed in the CUBE File. A roll_up always corresponds to an existing entry, since the session cache stores the buckets in the current drilling path. Therefore, no session-cache misses are possible for roll_up movements. Only a drill_down can lead to a new “pin bucket” request. In the case of a cache overflow, when a new entry is inserted, we remove the leftmost entry, corresponding to the most detailed bucket. This leads to a hierarchy-sensitive bucket replacement policy and ensures that smaller-depth buckets will remain cached for longer periods. In fact, the smaller the chunking depth of a bucket, the better the chances of finding it in the session cache. For example, the root-bucket (§3.2.3) (zero depth) is cached during the whole lifetime of a session cache. Note here that even if many session caches store the same root-bucket, the actual pages of the root-bucket are stored in the SSM buffer pool only once.
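The overflow rule can be sketched as follows, assuming (as above) that entries are kept ordered with the most detailed bucket at the head and the root-bucket at the tail; the names and integer handles are illustrative:

```cpp
#include <cassert>
#include <deque>

// A session-cache entry: bucket handle plus its chunking depth.
struct Entry { int bucket; int depth; };

// Hierarchy-sensitive eviction: on overflow, the head entry (greatest
// chunking depth, i.e., most detailed bucket) is removed. Returns the
// bucket handle for which the caller issues an "unpin bucket" request.
int evictOnOverflow(std::deque<Entry>& cache) {
    Entry victim = cache.front();  // leftmost = most detailed level
    cache.pop_front();
    return victim.bucket;
}
```

Because eviction always takes the deepest entry, the depth-0 root-bucket entry is evicted last, matching the observation that it stays cached for the lifetime of the session cache.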

The removal of an entry from the session cache means that we issue an “unpin bucket” request to the buffer manager, which is then transformed into an “unpin record” request. In Figure 77,

we depict the basic architecture of our hierarchy-based session cache implementation. We

can see the SISYPHUS buffer manager acting as a mediator between the session cache and

the SSM clock-based buffer manager.

In Figure 78, we depict an example of a session cache. In particular, on the top we can see an

instance of the session cache corresponding to the current position (CP) appearing on the left

chunk-tree. We have assumed a maximum of four entries for this session cache. Initially, the

cache is empty and the first bucket search request corresponds to the root bucket. Since this is

not present in the cache a pin bucket request is issued and a corresponding entry is inserted in

the list after the bucket fetch. At that time, the “head” and “current entry” pointers point to

the single entry of the list. All insertions take place at the left of the “current entry” pointer,

while removals from the list take place at the “head” pointer. As we drill down to chunk “a”, a request for bucket B1 is made. Once more, a “pin bucket” is issued and a new entry is inserted

in the list. The subsequent drill-down to chunk “b” invokes a second search for the same

bucket. This time however, the bucket is found at the current entry, so no “pin bucket” is is-

sued. In a similar fashion, as we drill down all the way to chunk “e” the corresponding buck-

ets are pinned into the pool and the cache is augmented accordingly. Then, the three subse-

quent roll-ups result in moving the current entry two positions to the right and leave the cache

at the state depicted on the top of the figure.


When a drill-down is issued from chunk “b” to “f”, presented in the chunk tree on the right, a

request for finding bucket B4 is made. The search in the current entry and to the left of the current entry fails to find it. Therefore, we issue a “pin bucket” request for bucket B4. The new

entry will be placed right after the current entry on the left side. This is depicted on the bot-

tom of the figure. However, the new entry will overflow the session cache. According to our

“hierarchy-sensitive” replacement policy, we remove the “most detailed” entry pointed to by

the “head” pointer. Then, we issue an “unpin bucket” request for bucket B3.


We end this subsection with an implementation comment regarding the caching of the root-

bucket. Since the root-bucket typically comprises many consecutive fixed-size buckets

(§3.2.3), pinning it in the bucket buffer pool (i.e., essentially in the record-based SSM buffer

pool) might cause inefficiency in accessing it. This is because the underlying SSM will not load it as a whole into the buffer pool but will only cache it partially by fetching the relevant pages on demand. This could result in a phenomenon equivalent to the so-called “thrashing”

caused by the virtual memory of an operating system, where many blocks are moved in and

out of the disk’s swap space.

A solution to this problem would be to load the root-bucket for each accessed cube, only

once, in the heap space of the thread that instantiated the buffer manager (see also Figure 67

in §11.3) instead of the SSM buffer pool. This way we can bypass the SSM buffer manager and directly control the caching of this special bucket. Of course, shared locks must be acquired prior to accessing it in order to guarantee a thread-safe implementation.

13.2.3 Execution Plan Implementation

In this subsection, we discuss the implementation of physical operators and execution plans in

the context of an OLAP query-processing engine. We have seen in Chapter 1 the representation of physical operators as iterators. The iterator model (§8.1) is the main vehicle for efficiently implementing the concept of an execution plan comprising different operators. In the remainder, we will provide hints for an implementation of execution plans in the programming language of choice (C++ [Str97]).

All physical operators in the ERATOSTHENES processing engine can be represented as classes

derived from a common (abstract) base class. This is depicted in Figure 79. We can see that

the base class is defined according to the iterator model, i.e., it consists of the three methods

open, next and close. These are defined as virtual methods [Str97]. This means that it is left to

each child class, representing a specific operator (leaf nodes in Figure 79), to define each one

of these methods according to the functionality of the specific operation. The state for each


operator is defined with the help of private class attributes. However, there is some minimal

common state that all operators have. For example, in the figure, we note the foundNext flag

indicating the existence of a subsequent result (§8.1). This attribute is inherited by all derived

classes. Input arguments specific to each operator are passed through the constructor of each class, instead of through the open method. This allows us to have the same argument list for

open in all derived classes, which is necessary for declaring it as virtual.
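A minimal sketch of this class design follows; the open/next/close methods and the foundNext flag come from the text, while the TableScan body is an illustrative stand-in for real scan state:

```cpp
#include <cassert>

// Abstract base class for all physical operators (iterator model).
class PhysOpBase {
public:
    virtual ~PhysOpBase() = default;
    virtual void open() = 0;
    virtual bool next() = 0;   // advances to the next result tuple
    virtual void close() = 0;

protected:
    bool foundNext = false;    // common state inherited by all operators
};

// Operator-specific input arguments are passed through the constructor,
// keeping the virtual method signatures identical in all derived classes.
class TableScan : public PhysOpBase {
public:
    explicit TableScan(int numTuples) : remaining_(numTuples) {}
    void open() override  { foundNext = remaining_ > 0; }
    bool next() override  {
        if (remaining_ == 0) return foundNext = false;
        --remaining_;
        return foundNext = true;
    }
    void close() override { remaining_ = 0; }

private:
    int remaining_;  // stands in for real scan state (cursor, predicate, ...)
};
```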

We distinguish between three types of operators according to their data source input: (a)

unary operators, (b) binary operators and (c) CUBE File access operators. The first two can

receive input data from any other operator, while the latter directly accesses a CUBE File.

Figure 79: Class hierarchy of physical operators.

The important thing to note in this design is the exploitation of object-oriented polymorphism,

in order to “link” the various operator nodes and construct execution plans. In particular, note

that each operator stores a pointer (or two pointers if it is a binary operator) to the base class

PhysOpBase. This pointer can point, not only to instances of class PhysOpBase but also to

any operator instance X in the hierarchy, which is used as the source data input. The interest-

ing thing is that, when we invoke one of the three basic iterator methods (open, next, close)

through this pointer, the corresponding method of the specific operator (i.e., of operator X)

will be called. This is true, even if the “caller” operator was written and compiled before op-

erator X was even conceived of! This is a key aspect of object-oriented design and is called

polymorphism. It is exactly the declaration of the iterator methods as virtual that enables this behavior. With this design, all operator code can be written without the need to



have input operators “hard-wired” into the code. This is a fairly standard approach in database management system implementations. Actually, systems written in non-object-oriented languages, e.g., in C [KR88], typically tackle this issue by using function pointers for input

operators [Gra93].

Consequently, the construction of an execution plan consisting of various physical operators

is quite straightforward. We advocate a design where the query execution plan is passed to

the execution engine (probably by a query optimizer module) in an algebraic form (e.g., see

algebraic notation of physical operators in Table 5 of chapter 1). The engine then parses and

validates the plan. For each token corresponding to a physical operator, the corresponding

object is instantiated. Operators are instantiated bottom-up (or from inner-to-outer in alge-

braic form), so that the pointer members to the input operators can be set appropriately. In

Figure 80, we depict various instantiated operators residing in main memory that are linked

through pointers. This essentially corresponds to the physical representation of an execution

plan. In fact, we only need to maintain a pointer to the topmost operator in order to handle the

whole execution plan. This is also depicted in the figure, where we see an ExecutionPlan

class containing a pointer member to a single operator. By invoking the next method through

this pointer we retrieve, one tuple at a time, the final results of the overall query.
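The wiring of operators through base-class pointers can be sketched as follows. The iterator interface is repeated so the example is self-contained, and the Limit operator and tuple-count bodies are purely illustrative, not the thesis's actual physical operators:

```cpp
#include <cassert>

// Minimal iterator interface (repeated here for self-containment).
class PhysOpBase {
public:
    virtual ~PhysOpBase() = default;
    virtual void open() = 0;
    virtual bool next() = 0;
    virtual void close() = 0;
};

// A leaf operator producing n "tuples" (represented only by their count).
class TableScan : public PhysOpBase {
public:
    explicit TableScan(int n) : remaining_(n) {}
    void open() override  {}
    bool next() override  { return remaining_-- > 0; }
    void close() override {}
private:
    int remaining_;
};

// A unary operator stores a pointer to PhysOpBase; each virtual call
// dispatches to whatever concrete input it was wired to at plan build time.
class Limit : public PhysOpBase {
public:
    Limit(PhysOpBase* in, int k) : input_(in), left_(k) {}
    void open() override  { input_->open(); }
    bool next() override  { return left_-- > 0 && input_->next(); }
    void close() override { input_->close(); }
private:
    PhysOpBase* input_;
    int left_;
};

// The execution plan only needs a pointer to the topmost operator.
struct ExecutionPlan {
    PhysOpBase* top;
    int run() {                 // pull results until the plan is exhausted
        int count = 0;
        top->open();
        while (top->next()) ++count;
        top->close();
        return count;
    }
};
```

Building the plan bottom-up (scan first, then the operator that consumes it) mirrors the inner-to-outer instantiation order described above, since each constructor needs the pointer to its already-built input.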

Figure 80: The physical representation of an execution plan.


Chapter 14: Related Work III

14 Related Work III

There is a plethora of systems and research prototypes in the database literature. In this chapter, we have decided to confine ourselves to system implementations that exhibit

similarities in various aspects with ERATOSTHENES. In particular, we begin with the Shore

persistent object system, whose underlying storage manager has been the physical base of our

system. Next, we describe PREDATOR, which is another example of a DBMS built on top of

the Shore storage manager functionality. Finally, we discuss the integration of a multidimensional access method into the kernel of a commercial RDBMS.

14.1 Shore

Shore (Scalable Heterogeneous Object REpository) [CDN+94] is a persistent object system

developed at the University of Wisconsin that represents a merger of object-oriented database

(OODB) and file system technologies. From the file system world, Shore draws object nam-

ing services, support for lower (and cheaper) degrees of transaction-related services, and an

object access mechanism for use by legacy Unix file-based tools. From the OODB world,

Shore draws data modeling features and support for associative access and performance ac-

celeration features. To provide scalability and a basis for parallelizing the system, Shore also

employs a novel architecture, including support for symmetric peer-to-peer server communi-

cation and caching; in addition, it includes support for extensibility via the value-added server

facility.

Shore is a collection of cooperating data servers, with each data server containing typed per-

sistent objects. To organize this universe of persistent Shore objects, a Unix-like name space

is provided. As in Unix, named objects can be directories, symbolic links, or individual

(typed) objects (the counterpart of Unix ``plain'' files). Unlike Unix, Shore allows each object


to be accessed by a globally unique Object Identifier (OID) that is never reused.

The type system for Shore objects is language-neutral, supporting applications in any pro-

gramming language for which a language binding exists. For objects whose primary data con-

tent is textual or untyped binary data, Unix file system calls are provided to enable legacy

applications (such as existing language compilers or CAD tools) to access their data content

in an untyped manner. Shore is structured as a peer-to-peer distributed system; each node

where objects are stored or where an application program wishes to execute contains a Shore

server process that talks to other Shore servers, interfaces to locally executing applications,

and caches data pages and locks in order to improve system performance.

The Shore Storage Manager (SSM) is a package of libraries for building object repository

servers and their clients. The core library in the package is a multi-threaded system managing persistent storage and caching of untyped data and indexes. It provides disk and buffer

management, transactions, concurrency control and recovery. A value-added-server (VAS) is

a system built with the SSM [SVAS99]. A VAS relies on the SSM for the above capabilities

and extends it to provide more functionality. One example of a VAS is the Shore server,

which extends the SSM to provide typed objects with permissions and ownership and organ-

izes storage as a tree structured name-space. Another example of a VAS is ERATOSTHENES.

14.2 Predator

PREDATOR is a multi-user, client-server database system developed at Cornell University

[Ses98]. The main goal of this system has been to comprise a next-generation database sys-

tem that supports complex data of different types (like images, video, audio, documents, geo-

graphic and geometric entities, etc.). Each data type has its own query features, and data re-

positories hold combinations of different types of data. The broad research focus of the

PREDATOR project at Cornell has been to design and build a database system within which

various kinds of data can be uniformly and extensibly supported. However, it is a full-fledged

DBMS and can also serve as a vehicle for research into other areas of database systems. The

system is built using C++, and makes use of inheritance and encapsulation. It uses the Shore

storage manager (SSM) [SSM99] as the underlying data repository. PREDATOR is another example of a value-added server built with the SSM.

A major theme in PREDATOR is extensibility of the system: adding the ability to process

new kinds of data. The efficient integration of the different data types leads to interesting de-

sign, optimization and implementation issues. Two characteristics of complex data types are


crucial: (1) they are expensive to store and retrieve, and they are associated with expensive query operations; (2) they have rich domain semantics, leading to opportunities for automatic

query optimization. The primary technical contribution in PREDATOR is the notion of En-

hanced Abstract Data Types (E-ADTs). An E-ADT "enhances" the notion of a database ADT

by exposing the semantics of the data type to the database system. These semantics are pri-

marily used for efficient query processing, although they serve other purposes too.

PREDATOR implements many features found in commercial relational and object-relational

database systems. The emphasis has been primarily on query processing, although transaction

processing is also supported. The beta code release includes the ability to execute a large sub-

set of SQL, multiple join algorithms, storage management over large data volumes, indexing,

a cost-based query optimizer, and a variety of object-relational features. Basic concurrency

control and low-level storage management is handled using the Shore storage manager. A

WWW-based graphical user interface allows any Java-enabled Web browser to act as a data-

base client.

14.3 UB-tree Integration into a DBMS

Ramsak et al. in [RMF+00] describe the integration of the UB-tree multidimensional index

[Bay97] into a commercial RDBMS kernel [TBHC00]. The authors argue that a kernel integration is superior to a simple interface to an index extension (provided by many commercial systems as a solution for multidimensional indexing) because of the tight coupling with

the query optimizer, which allows for optimal usage of the UB-tree in execution plans. The

key aspect for achieving this integration is the fact that the UB-tree is based largely on the

classical B-tree [BM72], which is the most prevalent indexing technique in commercial sys-

tems.

The authors identify concurrency and recovery issues, which are as important as performance issues for commercial systems, as the major obstacle preventing novel multidimensional access methods (MAMs) from finding their way into commercial RDBMSs. For most MAMs, new solutions to these problems, e.g., locking for R-Trees [GG98], have to be developed, as the new concepts do not allow reusing standard techniques. This makes the kernel integration of a MAM a very costly task, in the range of multiple man-years. As a consequence, many DBMS producers have not integrated the new technology into their systems, but offer it only as add-on features.

The most important issues that enable the query engine to use the UB-Tree in the most effi-


cient way include extension of schema information and DDL of the DBMS, as well as the

generation of multidimensional query boxes out of common SQL predicates. Note that with

the kernel integration the UB-Tree query functionality is hidden by the standard SQL inter-

face, i.e., no extension of the DML is required. The extensions of the query engine, especially

of the optimizer, take care of the appropriate usage of the new index, e.g., processing a multi-

dimensional range query on the UB-tree, if possible.

Finally, performance evaluation results reported in [RMF+00] indicate a speedup by a factor of 3 compared to an implementation of the UB-tree on top of a DBMS.


Chapter 15: Epilogue

15 EPILOGUE

In this thesis we have addressed the issues of physical data organization, query processing, and implementation in OLAP systems. Our work was motivated by the need to sup-

port efficient storage and processing of OLAP cubes in order to promote ad hoc analysis (per-

formed in terms of ad hoc OLAP queries), which is a critical element of success in business

intelligence applications today. To this end, we have chosen to follow the “harsh” path of a

system implementation. We have launched the ERATOSTHENES project for the development

of a real OLAP system and have tried to implement our ideas regarding the physical organi-

zation of a cube, as well as the processing of ad hoc star queries within this system. Through

the difficulties of the intensive coding, we have managed to evaluate in a more pragmatic

way the validity of our proposals and we have gained a more in-depth understanding of the

issues involved. We have tried to offer this more realistic picture of the problems and proposed solutions to the reader of this thesis.

This chapter concludes the thesis and is separated into two sections. In the first one we summarize our contributions and outline the most important conclusions of this research. In the sec-

ond section, we discuss future research based on the current work.

15.1 Summary and Conclusions

As was stated in the beginning, the first objective of the research work reported in this thesis

has been the investigation of a data structure that would provide an efficient storage base for

cubes as well as access path for the most detailed data, in order to support the processing of

ad hoc OLAP queries. In particular, we have identified the following criteria as essential to

be met: the “candidate” structure must be natively multidimensional, explicitly support hierarchies, and impose a physical clustering of the data that will minimize I/O cost during query evaluation and, at the same time, guarantee good space utilization and management

of the cube sparseness.

To this end, we have proposed a file organization for the most detailed data of a cube called

the CUBE File, which meets all of the above criteria:

• It is a multidimensional data structure belonging to the grid file “family” of multidimen-

sional indexes, intended for storing discrete data points rather than spatial objects.

• It explicitly supports hierarchies, since it provides access paths to the data directly

through the restrictions imposed on any level(s) of the dimension hierarchies. This is

achieved following the entries of the directory chunks, which have resulted via a hierar-

chical chunking based directly on the dimension hierarchies.

• Moreover, since the focus has been to speedup queries with hierarchical restrictions, the

CUBE File imposes hierarchical clustering of the data. We departed from the conven-

tional linear clustering approach, which is impossible to solve (in the absence of a spe-

cific query load), due to the huge search space and exploited a hierarchical chunking

method leading to our chunk-tree representation of a cube, in order to transform the hier-

archical clustering problem into a chunk-to-bucket allocation problem.

• We have quantified the degree of hierarchical clustering achieved by a chunk-to-bucket

allocation by introducing the hierarchical clustering factor fHC. This metric can also be

exploited in any storage scheme exploiting hierarchy-based surrogate keys in order to

achieve hierarchical clustering.


• In addition, based on the hierarchical clustering factor we have formalized the hierarchi-

cal clustering problem as an optimization problem (the HPP Chunk-to-Bucket allocation

problem) and have proved that it is NP-Hard. Furthermore, we have provided as a solu-

tion to this problem, a greedy algorithm based on heuristics, which tries to store whole

chunk-tree families together in the same bucket, so as to minimize bucket accesses in the

path from the root-chunk to the data chunks and to cluster within the same physical loca-

tion data that are hierarchy-related.

• It consumes space conservatively (employing compression of chunks when necessary)

aiming at a high utilization of the space (e.g., filling buckets to capacity). Due to the fact

that the chunking is based on the dimension hierarchies, it adapts perfectly to the sparse-

ness of the data space, making no allocation of space for empty regions of the cube.

Moreover, due to the chosen data structure for the chunk implementation, it does not have


to store any multidimensional coordinates for each data point (as it would have been the

case in a tuple-based approach), nor does it have to store its level per dimension hierarchy

(except for the chunking depth value, which can fit in a single byte per chunk header),

hence reducing further the overhead for storing the cube.

• Operationally, the CUBE File is intended for an initial bulk loading followed by read-only query sessions, while it also supports bulk incremental updating and data purging operations, which fit perfectly the most essential needs of OLAP applications. It provides a set of basic data navigation operations that can be used to define more complex OLAP operations.

A second objective of the work reported in this thesis has been to investigate the processing of OLAP queries in the context of hierarchically clustered OLAP cubes. To this end, we have identified ad hoc star queries as the most prominent kind of queries in OLAP and data warehousing. Towards the efficient processing of such queries, we have made the following contributions:

• We have shown how to exploit, in a relational star schema, a physical representation of the fact table based on a multidimensional data structure that provides hierarchical clustering through the use of special path-based surrogate keys, like (but not restricted to) the CUBE File structure. With such structures, the evaluation of the costly star-join becomes a simple multidimensional range query, which is evaluated very efficiently due to the native support for many dimensions.
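
The following sketch illustrates why a restriction on a hierarchy prefix collapses into a single key range under a path-based encoding. The fixed 8-bits-per-level layout and the function names are assumptions for illustration, not the thesis' actual h-surrogate format: a surrogate key packs one ordinal per hierarchy level into bit fields, so all members below a restricted prefix occupy one contiguous key interval.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Illustrative sketch (not the thesis' exact encoding): a path-based
// surrogate key packs one ordinal per hierarchy level into bit fields.
// Here we assume 8 bits per level and two remaining levels below the
// restricted prefix, i.e., 16 bits of "suffix" space.
constexpr int BITS_BELOW = 16;

// All surrogate keys of members under 'prefix' fall in [lo, hi].
std::pair<std::uint32_t, std::uint32_t> prefixToRange(std::uint32_t prefix) {
    std::uint32_t lo = prefix << BITS_BELOW;
    std::uint32_t hi = ((prefix + 1) << BITS_BELOW) - 1;
    return {lo, hi};
}
```

Because every hierarchical restriction becomes such a range, the star-join degenerates into a multidimensional range query over the surrogate key space.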

• Having identified that the processing of these queries changes radically over this new storage base for the cube data, we have proposed an abstract processing plan for ad hoc star query evaluation over hierarchically clustered cubes, which captures all the necessary steps in this processing [KTS+02]. Moreover, our experimental evaluation has shown that it reduces the query response time significantly, exhibiting speedups of up to 25 times over the most prevalent contemporary method for star query evaluation, which is based on a bitmap-index star-join. It is also worth mentioning that our processing framework has been incorporated into a commercial RDBMS, namely Transbase HyperCube® [TBHC00].

• This new processing framework has paved the way for important optimizations on the evaluation of star queries, such as the hierarchical pre-grouping transformation [PER+03], which exhibits speedups of up to 42 times over the conventional plans.


In addition, we have presented the entailed processing in the context of a CUBE File-organized cube:

• We have introduced chunk expressions and have discussed the relevant dimension data processing. Chunk expressions provide the means for expressing multiple range restrictions on multiple hierarchy levels in a unified way. This contributes to the evaluation of these restrictions as a single query over the CUBE File.

• We have defined a set of CUBE File-access physical operators that implement the corresponding abstract operations of our processing framework over the CUBE File. These definitions were based on the iterator model for physical operators and demonstrate how we can define gradually more elaborate operations based on simpler ones.
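
As an illustration of the iterator model, the sketch below (with hypothetical operator names, not the thesis' actual interfaces) shows how a filtering operator is layered on a scan through the open/next/close protocol, so that more elaborate operators are built by composing simpler ones:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Minimal iterator-model sketch: every physical operator exposes
// open/next/close, and operators compose by consuming each other's rows.
struct Iterator {
    virtual void open() = 0;
    virtual bool next(int& out) = 0;   // false when exhausted
    virtual void close() = 0;
    virtual ~Iterator() = default;
};

struct Scan : Iterator {               // produces rows from a base source
    std::vector<int> data;
    std::size_t pos = 0;
    explicit Scan(std::vector<int> d) : data(std::move(d)) {}
    void open() override { pos = 0; }
    bool next(int& out) override {
        if (pos == data.size()) return false;
        out = data[pos++];
        return true;
    }
    void close() override {}
};

struct Select : Iterator {             // filters the rows of its input
    Iterator& in;
    std::function<bool(int)> pred;
    Select(Iterator& i, std::function<bool(int)> p)
        : in(i), pred(std::move(p)) {}
    void open() override { in.open(); }
    bool next(int& out) override {
        while (in.next(out))
            if (pred(out)) return true;
        return false;
    }
    void close() override { in.close(); }
};

// Drain an operator tree into a vector, as a consumer would.
std::vector<int> drain(Iterator& it) {
    std::vector<int> r;
    int v;
    it.open();
    while (it.next(v)) r.push_back(v);
    it.close();
    return r;
}
```

A consumer never cares whether its input is a base scan or a whole sub-plan; this uniformity is what allows the gradual definition of elaborate operations from simpler ones.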

• Our algorithm for the implementation of the MDRangeSelect operator can evaluate chunk expressions that correspond to multiple range queries on multiple levels in the hierarchy in a single operation, and hence achieves a low I/O cost, since relevant data are accessed only once. This is very important considering the significance and frequency of occurrence of these restrictions in typical star queries. Moreover, it is an improvement over the corresponding algorithm for a UB-tree storage base, where each range query is evaluated separately, resulting in redundant reads of the same disk pages.
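
The I/O advantage can be illustrated with a toy model that counts bucket reads. This is a simplified sketch under assumed names and one-dimensional bucket numbers, not the MDRangeSelect algorithm itself: evaluating each range as a separate query re-reads buckets shared by overlapping ranges, while a single-operation evaluation touches each qualifying bucket exactly once.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Range = std::pair<int, int>;  // inclusive [lo, hi] over bucket numbers

// Naive evaluation: one pass per range, so buckets shared by several
// ranges are read repeatedly.
int naiveReads(const std::vector<Range>& ranges) {
    int reads = 0;
    for (const Range& r : ranges) reads += r.second - r.first + 1;
    return reads;
}

// Single-operation evaluation in the spirit of MDRangeSelect (simplified):
// all ranges are answered in one pass, so each qualifying bucket is
// fetched exactly once.
int singlePassReads(const std::vector<Range>& ranges) {
    std::set<int> buckets;          // distinct buckets touched
    for (const Range& r : ranges)
        for (int b = r.first; b <= r.second; ++b) buckets.insert(b);
    return static_cast<int>(buckets.size());
}
```

For the two overlapping ranges [0,4] and [3,7], the naive evaluation performs 10 bucket reads while the single-pass evaluation needs only 8, one per distinct bucket.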

• Finally, we have proposed the physical operator MDRangeGroup, which combines range-selection and grouping evaluation in a single operation and thus is a perfect candidate for implementing the hierarchical pre-grouping transformation. Our algorithm provides a grouping operation that does not block the processing flow (i.e., pipelining), since it does not include the sorting step that all conventional grouping implementations have. With appropriate exploitation of the current drilling path construct, resulting in a hierarchy-based caching, we can ensure that common buckets requested by different groups are fetched only once. MDRangeGroup imposes significantly less I/O cost than a conventional grouping operation (over a relational table), essentially requiring only a single pass over the data. In contrast, the conventional grouping algorithms on relational tables spend most of their effort on trying to sort the data and form the individual groups before doing the final pass that calculates the aggregate group values [Gra93].
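
The principle of sort-free, single-pass grouping can be illustrated with a hash-based stand-in (the thesis' operator instead exploits the hierarchy via the current drilling path; the names here are hypothetical): aggregates are accumulated per group key while scanning, so no blocking sort phase precedes the production of results.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Sketch of grouping without a sorting step: one pass over the rows,
// accumulating a running aggregate per group key. Unlike sort-based
// grouping, nothing blocks the pipeline before aggregation starts.
std::unordered_map<std::string, double>
groupSum(const std::vector<std::pair<std::string, double>>& rows) {
    std::unordered_map<std::string, double> acc;
    for (const auto& row : rows)
        acc[row.first] += row.second;   // single pass over the data
    return acc;
}
```

A sort-based implementation would first order all rows by group key (a blocking step) before a final aggregation pass; the accumulator above produces the same groups in one pass.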

The third and last objective of this work has been to design and implement a novel OLAP system that incorporates the above results and meets the specific needs of OLAP applications. To this end, our contributions are summarized as follows:


• We have presented the overall architecture and design of the OLAP system ERATOSTHENES. Among other things, this architecture clearly shows the building of an OLAP server on top of an untyped, record-based storage manager as a value-added server, comprising all the components of a contemporary database management system ([KVT+01, KTV+03]).

• Then, we have presented the implementation of the storage manager of the system, SISYPHUS ([KS01, KS03]). SISYPHUS adopts the CUBE File organization as its primary file structure for storing a cube and aims at fulfilling storage management requirements pertaining to OLAP that are not met by conventional record-based storage managers. We have described the different abstraction layers of the system that implement a chunk-based file system on top of a record-based one.

• In addition, we have presented a pragmatic implementation of the CUBE File structure and its incorporation into SISYPHUS. We have discussed design issues regarding the internal organization of buckets and chunks. Moreover, we have presented an algorithm for efficiently bulk-loading the CUBE File during its initial construction phase.

• Also, we have discussed an implementation of the processing engine, which is based on execution plans. We have shown how to implement an access interface over data organized in a CUBE File that enables query sessions rather than isolated queries and implements the notion of the current position in the multi-level and multidimensional space of the cube.

• Finally, we have proposed a novel hierarchy-based buffer management strategy, which is implemented on top of the clock-based, LRU-like replacement policy of the underlying system. We have defined a session cache for each user access to a cube, which is based on the current drilling path construct. This cache increases the number of successful cache hits for typical OLAP queries based on hierarchical restrictions, since LRU-based policies fail to cache all the buckets in a drilling path. Furthermore, this caching scheme improves the performance of the MDRangeGroup operator, reducing redundant bucket fetches. Finally, by caching the buckets in similar access paths, our method adapts successfully to the important characteristic of OLAP query loads known as hierarchical locality.
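
A minimal sketch of such a strategy, under assumed names and with plain pinning standing in for the thesis' drilling-path logic, could look like this: buckets lying on the current drilling path are pinned and never chosen as eviction victims, so a whole root-to-data path survives even when plain LRU would evict parts of it.

```cpp
#include <cstddef>
#include <list>
#include <set>
#include <string>

// Hypothetical sketch of a hierarchy-based session cache layered over an
// LRU-like policy: pinned buckets (those on the current drilling path)
// are skipped when choosing an eviction victim.
class SessionCache {
    std::size_t capacity_;
    std::list<std::string> lru_;     // front = most recently used
    std::set<std::string> pinned_;   // buckets on the current drilling path
public:
    explicit SessionCache(std::size_t cap) : capacity_(cap) {}

    void pinPath(const std::set<std::string>& path) { pinned_ = path; }

    bool contains(const std::string& b) const {
        for (const std::string& x : lru_)
            if (x == b) return true;
        return false;
    }

    void access(const std::string& b) {
        lru_.remove(b);              // move to front on (re)access
        lru_.push_front(b);
        if (lru_.size() > capacity_) {
            // Evict the least recently used bucket that is NOT pinned.
            for (auto it = lru_.rbegin(); it != lru_.rend(); ++it) {
                if (!pinned_.count(*it)) {
                    std::string victim = *it;
                    lru_.remove(victim);
                    break;
                }
            }
        }
    }
};
```

With a plain LRU policy the oldest buckets of the drilling path would be the first victims; pinning the path keeps them resident for the hierarchy-bound queries of the session.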


15.2 Future Work

Finally, we discuss future research directions resulting from the work on this thesis.

15.2.1 Synopses for Multidimensional Data with Hierarchies

Multidimensional data synopses have been quite a “hot” research issue lately. Derived from the original work on one-dimensional histograms [IP95, PIH+96], multidimensional histograms appear as an effective approach [PI97, GKT+00]. These are primarily used for selectivity estimation on multi-attribute restriction predicates and, more recently, for approximate query answering over the data cube [PG99]. In addition, other techniques departing from the histogram technology have appeared, e.g., the wavelet transformation [VWI98], with very promising results regarding the accuracy of the estimations.

However, in the presence of hierarchies, the existing multidimensional synopsis techniques cannot be used for selectivity estimation on queries with multidimensional restrictions based on any level of the dimension hierarchies. For example, what is the number of qualifying tuples that will be retrieved from an MD_Range_Access operator (see the abstract processing plan in Figure 35 of §6.4), i.e., from the evaluation of the dimension restrictions on the fact table? An equivalent question (in CUBE File terms) would be: what is the number of qualifying cells that satisfy a chunk expression? Moreover, another important question would be: what is the number of groups resulting from the application of the MDRangeGroup operator on a CUBE File (i.e., from the application of the hierarchical pre-grouping transformation on a fact table)? These are essential questions in the calculation of the cost of any star query execution plan, no matter what the underlying physical organization of the fact table data is, and the existing approaches fail to answer them effectively.

Since restrictions can be applied to any number of hierarchy levels, it is clear that if we wanted to use one of the existing multidimensional synopsis techniques to answer the previous questions, we would have to create a synopsis for all possible combinations of hierarchy levels, which is not a realistic approach. Therefore, what is needed is a multidimensional synopsis that incorporates knowledge of the data distribution with respect to the dimension hierarchies.

To this end, we are investigating the use of the CUBE File directory (i.e., the directory chunks) as a multidimensional data synopsis that naturally supports hierarchies. Indeed, if we include in each directory chunk entry, apart from the physical chunk-id (§12.4) pointing to some chunk-subtree, also the number of existing data points in the data chunks of this subtree, then we have created a 100% accurate data synopsis. Moreover, due to the structure of this synopsis, where chunks are created directly from the dimension hierarchies, we can answer all of the above critical questions merely by using small variations of the MDRangeSelect (§8.2.3) and MDRangeGroup (§8.2.5) algorithms. In the case of a memory constraint, where the synopsis does not fit in main memory, we can start compromising on the accuracy in favor of reducing the storage cost. In fact, we are working on an algorithm that exploits the bucket-region formation technique (§3.2.1) in order to reduce the synopsis size, by gradually decreasing the accuracy of the chunks from the most detailed levels (maximum chunking depth) to the ones in the upper levels.
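
The counting idea can be sketched as follows; the structures and names are hypothetical simplifications of the directory chunks described in the thesis. Each directory entry carries the number of data points in the subtree it points to, so the selectivity of a hierarchical restriction is answered by summing counts over the qualifying entries, without touching the data chunks at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified directory entry: besides pointing to a
// chunk-subtree (omitted here), it stores the number of existing data
// points in the data chunks of that subtree.
struct DirEntry {
    int count;                        // data points under this subtree
    std::vector<DirEntry> children;   // next chunking depth (empty at leaf)
};

// Number of qualifying cells for a restriction selecting child entries
// [lo, hi] of one directory chunk (a one-level example). The answer is
// exact, since the counts are maintained rather than sampled.
int qualifyingCells(const DirEntry& dir, std::size_t lo, std::size_t hi) {
    int total = 0;
    for (std::size_t i = lo; i <= hi && i < dir.children.size(); ++i)
        total += dir.children[i].count;
    return total;
}
```

A memory-constrained variant would truncate the deepest levels of this count tree, trading accuracy for size, in the spirit of the bucket-region-based reduction mentioned above.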

15.2.2 Support for “Advanced” Dimensions

A major issue in the efficient evaluation of an ad hoc star query, as we have seen, is the dimension data processing. In our abstract processing plan (§6.4), we see dimension data playing an important role in the h-surrogate processing phase, as well as in the main-execution phase, where the retrieved fact table data are joined with the dimension tables (a.k.a. the residual join). The physical model for representing a dimension that we have described (§7.3.1) is sufficient for the most typical cases of dimensions. However, in the presence of dimensions with multiple hierarchies or with unbalanced hierarchies, it fails to effectively support the entailed processing. Therefore, for the case of these “advanced” dimensions, we investigate the exploitation of a multidimensional structure as an alternative storage base. In particular, we examine the feasibility of using the CUBE File for the storage of dimension data as well. Intuitively, the multidimensional nature of this structure, its explicit support for hierarchies, and the fact that due to the local depth construct (§3.2.2) it can support unbalanced trees, make it an ideal candidate for this task.

15.2.3 Query-Driven Chunking

Another interesting research direction is the exploitation of a workload-based chunking method. Utilizing an algorithm similar to the one used for query-driven bucket-region formation (Figure 11), we could decide at each depth whether chunking will be applied along a dimension, thus increasing the granularity of the search capabilities provided by the directory chunks, or not, making the directory coarser but at the same time increasing the clustering factor. Moreover, the same approach could be exploited after the initial (not query-driven but solely hierarchy-based) construction of the CUBE File, as a means of fine-tuning the structure with respect to the incoming user queries. In this case, we would have to define operations on the CUBE File directory such as make_finer or make_coarser. Of course, such operations could trigger changes in the CUBE File structure, e.g., making the chunk-tree unbalanced, and thus an investigation is due in order to define them appropriately.


16 Bibliography

[Bay97] R. Bayer. The universal B-Tree for multi-dimensional Indexing: General Con-

cepts. WWCA ’97. Tsukuba, Japan, LNCS, Springer Verlag, March, 1997.

[BM72] Rudolf Bayer and E. McCreight: Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, pages 173-189, 1972.

[CD97a] Surajit Chaudhuri, Umeshwar Dayal: An Overview of Data Warehousing and

OLAP Technology. SIGMOD Record 26(1): 65-74 (1997)

[CD97b] S. Chaudhuri, U. Dayal: Data Warehousing and OLAP for Decision Support

(Tutorial). SIGMOD Conference 1997: 507-508

[CDN+94] Carey, M., DeWitt, D., Naughton, J., Solomon, M., et al.: Shoring Up Persistent Applications. Proc. of the 1994 ACM SIGMOD Conference, Minneapolis, MN, May 1994.

[CI98] C. Y. Chan, Y. E. Ioannidis: Bitmap Index Design and Evaluation. SIGMOD

Conference 1998: 355-366

[CI99] C. Y. Chan, Y. E. Ioannidis: Hierarchical Prefix Cubes for Range-Sum Queries.

VLDB 1999: 675-686

[Col96] G. Colliat, Olap relational and multidimensional database systems, in: SIGMOD

Record, 25(3) (September 1996) 64-69.

[CS94] S. Chaudhuri, K. Shim: Including Group-By in Query Optimization. VLDB

1994: 354-366

[Dat97] C. J. Date: A Guide to the SQL Standard. Addison Wesley, 1997.

[DRSN98] P. Deshpande, K. Ramasamy, A. Shukla, J. Naughton: Caching Multidimensional Queries Using Chunks, in: Proc. ACM SIGMOD Int. Conf. on Management of Data (1998) 259-270.


[FMB99] Robert Fenk, Volker Markl, Rudolf Bayer: Improving Multidimensional Range

Queries of non rectangular Volumes specified by a Query Box Set. Proc. of In-

ternational Symposium on Database, Web and Cooperative Systems (DWA-

COS), Baden-Baden, Germany, 1999

[FNPS79] Ronald Fagin, Jürg Nievergelt, Nicholas Pippenger, H. Raymond Strong: Ex-

tendible Hashing - A Fast Access Method for Dynamic Files. TODS 4(3): 315-

344 (1979)

[FR91] Christos Faloutsos, Yi Rong: DOT: A Spatial Access Method Using Fractals.

ICDE 1991: 152-159

[GBLP96] Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh: Data Cube: A

Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-

Total. ICDE 1996: 152-159

[GG98] Volker Gaede, Oliver Günther: Multidimensional Access Methods. ACM Com-

puting Surveys 30(2): 170-231 (1998)

[GHQ95] A. Gupta, V. Harinarayan, D. Quass: Aggregate-Query Processing in Data

Warehousing Environments. VLDB Conference 1995: 358-369

[GKT+00] Dimitrios Gunopulos, George Kollios, Vassilis J. Tsotras, Carlotta Domeniconi

: Approximating Multi-Dimensional Aggregate Range Queries over Real At-

tributes. SIGMOD Conference 2000: 463-474

[GM95] A. Gupta, I. S. Mumick: Maintenance of Materialized Views: Problems, Tech-

niques, and Applications. Data Engineering Bulletin 18(2): 3-18 (1995)

[GNU03] GNU is NOT UNIX, 2003. Available at: http://www.gnu.org

[GR93] J. Gray, A. Reuter: Transaction Processing: Concepts and Techniques (Morgan Kaufmann, 1993).

[Gra93] G. Graefe: Query Evaluation Techniques for Large Databases. ACM Computing Surveys 25(2), 1993.

[GUW00] H. Garcia-Molina, J.D. Ullman, J. Widom: Database System Implementation.

Prentice Hall, New Jersey 2000.

[Hin85] Klaus Hinrichs: Implementation of the Grid File: Design Concepts and Experi-

ence. BIT 25(4): 569-592 (1985)

[HRU96] V. Harinarayan, A. Rajaraman, J.D. Ullman, Implementing Data Cubes Effi-

ciently, in: Proc. ACM SIGMOD Intl Conf. On Management of Data (1996)

205-227.


[IP95] Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Solutions to Diverse

Database Estimation Problems. Data Engineering Bulletin 18(3): 10-18 (1995)

[Jag90] H. V. Jagadish: Linear Clustering of Objects with Multiple Attributes. SIGMOD

Conference 1990: 332-342

[KDB03] Knowledge and Database Systems laboratory at the National Technical Univer-

sity of Athens (NTUA), dept. of Electrical and Computer Engineering, Com-

puter Science division. http://www.dblab.ece.ntua.gr.

[Kim96] R. Kimball. The Data Warehouse Toolkit. John Wiley & Sons, New York.

1996.

[KR88] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language,

Second Edition, Prentice Hall, Inc., 1988. ISBN 0-13-110362-8 (paperback), 0-

13-110370-9 (hardback).

[KR98] Y. Kotidis, N. Roussopoulos, An Alternative Storage Organization for ROLAP

Aggregate Views Based on Cubetrees, in: Proc. ACM SIGMOD Intl Conf. On

Management of Data (1998): 249-258.

[KS01] N. Karayannidis, and T. Sellis: SISYPHUS: A Chunk-Based Storage Manager

for OLAP Cubes, Proceedings of the 3rd International Workshop on Design and

Management of Data Warehouses (DMDW'2001), Interlaken, Switzerland,

June 2001.

[KS03] N. Karayannidis and T. Sellis: SISYPHUS: The Implementation of a Chunk-

Based Storage Manager for OLAP Data Cubes, to appear in the Data and

Knowledge Engineering journal, 2003.

[KTS+02] N. Karayannidis, A. Tsois, T. Sellis, R. Pieringer, V. Markl, F. Ramsak, R.

Fenk, K. Elhardt, and R. Bayer: Processing Star-Queries on Hierarchically-

Clustered Fact-Tables, in: Proc. VLDB ‘2002 Hong Kong, China.

[KTV+03] N. Karayannidis, A. Tsois, P. Vassiliadis, and T. Sellis: Design of the ERATOSTHENES OLAP Server, in Advances in Informatics – Post-Proceedings of the 8th Panhellenic Conference in Informatics, Lecture Notes in Computer Science, Vol. 2563, Springer Verlag, 2003.

[KVT+01] N. Karayannidis, P. Vassiliadis, A. Tsois, and T. Sellis, ERATOSTHENES: De-

sign and Architecture of an OLAP System, in the Proceedings of the 8th Panhel-

lenic Conference on Informatics, Nicosia, Cyprus, November 2001.

[LMS94] A. Y. Levy, I. S. Mumick, Y. Sagiv: Query Optimization by Predicate Move-Around. VLDB Conference 1994: 96-107

[LNX02] The Linux Operating System 2002, available at: http://www.linux.org.

[MRB99] V. Markl, F. Ramsak, and R. Bayer, Improving OLAP Performance by Multi-

dimensional Hierarchical Clustering, in: Proc. IDEAS ‘99 (Montreal, Canada,

1999) 165-177.

[MySQ02] MySQL: The world’s most popular open source database 2002, available at:

http://www.mysql.com.

[NHS84] J. Nievergelt, H. Hinterberger, K. C. Sevcik, The Grid File: An Adaptable,

Symmetric Multikey File Structure, in: TODS 9(1) (1984) 38-71

[ODBC02] UnixODBC 2002, available at: http://www.odbc.org.

[OG95] P. E. O'Neil, G. Graefe: Multi-Table Joins Through Bitmapped Join Indices.

SIGMOD Record 24(3): 8-11 (1995)

[OLR99] OLAP Report. Database Explosion. 1999. Available at:

http://www.olapreport.com/DatabaseExplosion.htm .

[OQ97] P. E. O'Neil, D. Quass: Improved Query Performance with Variant Indexes.

SIGMOD Conference 1997: 38-49

[Ora01] Oracle® 8i Documentation, 2001.

[PER+03] R. Pieringer, K. Elhardt, F. Ramsak, V. Markl, R. Fenk, R. Bayer, N. Karayan-

nidis, A. Tsois, T. Sellis: Combining Hierarchy Encoding and Pre-Grouping:

Intelligent Grouping in Star Join Processing, to appear in Proc. of ICDE 2003,

Bangalore, India

[PG99] Viswanath Poosala, Venkatesh Ganti: Fast Approximate Answers to Aggregate

Queries on a Data Cube. SSDBM 1999: 24-33

[PI97] Viswanath Poosala, Yannis E. Ioannidis : Selectivity Estimation Without the

Attribute Value Independence Assumption. VLDB 1997: 486-495

[Ram02] Frank Ramsak, Towards a general-purpose, multidimensional index: Integra-

tion, Optimization, and Enhancement of UB-Trees. Ph.D. Thesis, TU München,

July 2002

[OM84] Jack A. Orenstein, T. H. Merrett: A Class of Data Structures for Associative

Searching. PODS 1984: 181-190

[PIH+96] Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Im-

proved Histograms for Selectivity Estimation of Range Predicates. SIGMOD

Conference 1996: 294-305


[Ram98] Raghu Ramakrishnan: Database Management Systems. McGraw-Hill, 1998, pg. 104.

[Reg85] Mireille Régnier: Analysis of Grid File Algorithms. BIT 25(2): 335-357 (1985)

[RKR97] N. Roussopoulos, Y. Kotidis, and M. Roussopoulos: Cubetree: Organization of and Bulk Incremental Updates on the Data Cube, in: Proc. ACM SIGMOD International Conference on Management of Data (Tucson, Arizona, May 1997) 89-99.

[RMF+00] Frank Ramsak, Volker Markl, Robert Fenk, Martin Zirkel, Klaus Elhardt, Rudolf Bayer: Integrating the UB-Tree into a Database System Kernel. VLDB 2000: 263-272.

[Rou98] N. Roussopoulos: Materialized Views and Data Warehouses. SIGMOD Record 27(1): 21-26 (1998)

[Sag94] Hans Sagan: Space-Filling Curves. Springer Verlag, 1994.

[Sam01] C. Samios: Design and Implementation of Query Evaluation Algorithms for

Multidimensional Data, Diploma Thesis, National Technical University of Ath-

ens, 2001.

[Sar97] S. Sarawagi: Indexing OLAP Data. Data Engineering Bulletin 20(1): 36-43

(1997)

[SDJL96] D. Srivastava, S. Dar, H. V. Jagadish, A. Y. Levy: Answering Queries with Ag-

gregation Using Views. VLDB Conference 1996: 318-329

[Ses98] P. Seshadri. PREDATOR: Design and Implementation, available at:

http://www.cs.cornell.edu/Info/Projects/PREDATOR/designdoc.html, 1998.

[SMA99] Storage Manager Architecture. CS Dept., Univ. of Wisconsin-Madison, 1999. Available at: ftp://ftp.cs.wisc.edu/shore/current/smdoc.pdf.

[SRH90] M. Stonebraker, L. A. Rowe, M. Hirohama, The Implementation of Postgres, in:

TKDE 2(1) (1990) 125-142.

[SS94] S. Sarawagi and M. Stonebraker: Efficient Organization of Large Multidimensional Arrays, in: Proc. of the 11th Int. Conf. on Data Eng. (1994) 326-336.

[SSM99] The Shore Project Group. The Shore Storage Manager Programming Interface.

CS Dept., Univ. of Wisconsin-Madison, 1999. Available at:

ftp://ftp.cs.wisc.edu/shore/current/ssmapi.pdf.

[STL94] Standard Template Library Programmer's Guide. Available at:

http://www.sgi.com/tech/stl/, Hewlett-Packard Company 1994.


[Str97] B. Stroustrup. The C++ Programming Language (3rd edition). Addison Wesley

Longman, Reading, MA. 1997.

[SVAS99] The Shore Project Group. Writing Value-Added Servers with the Shore Storage

Manager. CS Dept., Univ. of Wisconsin-Madison, 1999. Available at:

ftp://ftp.cs.wisc.edu/shore/current/ssmvas.pdf.

[TBHC00] The TransBase HyperCube® relational database system 2000, available at:

http://www.transaction.de.

[TKS+02] Aris Tsois, Nikos Karayannidis, Timos K. Sellis, Dimitri Theodoratos: Cost-

based optimization of aggregation star queries on hierarchically clustered data

warehouses. DMDW 2002.

[TKS01] A. Tsois, N. Karayannidis, and T. Sellis: MAC: Conceptual data modeling for

OLAP, Proceedings of the 3rd International Workshop on Design and Manage-

ment of Data Warehouses (DMDW'2001), Interlaken, Switzerland, June 2001.

[TT01] D. Theodoratos, A. Tsois: Heuristic Optimization of OLAP Queries in Multidi-

mensionally Hierarchically Clustered Databases. DOLAP 2001.

[VS00] P. Vassiliadis, S. Skiadopoulos. Modelling and Optimization Issues for Multi-

dimensional Databases. In Proc. CAiSE '00 (Stockholm, Sweden, June 2000)

482-497.

[VWI98] Jeffrey Scott Vitter, Min Wang, Balakrishna R. Iyer : Data Cube Approximation

and Histograms via Wavelets. CIKM 1998: 96-104

[WB98] M.C. Wu, A. P. Buchmann: Encoded Bitmap Indexing for Data Warehouses.

ICDE 1998: 220-230

[Wei95] M. A. Weiss: Data Structures and Algorithm Analysis. The Benjamin/Cummings Publishing Company Inc., 1995, pg. 351-359.

[WK91] Kyu-Young Whang, Ravi Krishnamurthy: The Multilevel Grid File - A Dy-

namic Hierarchical Multidimensional File Structure. DASFAA 1991: 449-459

[WOS01] K. Wu, E. J. Otoo, A. Shoshani: A Performance Comparison of Bitmap Indexes. CIKM 2001: 559-561

[WSB98] Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and

Performance Study for Similarity-Search Methods in High-Dimensional Spaces.

VLDB 1998: 194-205

[Wu99] Ming-Chuan Wu: Query Optimization for Selections Using Bitmaps. SIGMOD

Conference 1999: 227-238


[YL94] W. P. Yan, P-Å. Larson: Performing Group-By before Join. ICDE 1994: 89-100

[YL95] W. P. Yan, P.-Å. Larson: Eager Aggregation and Lazy Aggregation. VLDB

Conference 1995

[ZDN97] Yihong Zhao, Prasad Deshpande, Jeffrey F. Naughton: An Array-Based Algo-

rithm for Simultaneous Multidimensional Aggregates. SIGMOD Conference

1997: 159-170
