Parallel Multi-Dimensional ROLAP Indexing. Andrew Rau-Chaplin, Faculty of Computer Science, Dalhousie University. Joint work with Frank Dehne, Carleton Univ., and Todd Eavis, Dalhousie Univ.

Page 1: Parallel Multi-Dimensional ROLAP Indexing

Parallel Multi-Dimensional ROLAP Indexing

Andrew Rau-Chaplin

Faculty of Computer Science

Dalhousie University

Joint work with

Frank Dehne, Carleton Univ.

Todd Eavis, Dalhousie Univ.

Page 2: Parallel Multi-Dimensional ROLAP Indexing

Data Warehousing for Decision Support

Operational data collected into DW

DW used to support multi-dimensional views

Views form the basis of OLAP processing

Our focus: the OLAP server

[Figure: data warehouse architecture. External sources and operational databases are extracted, cleaned, transformed, loaded and refreshed (data cleaning and integration) into the data warehouse and data marts (data storage, with a metadata repository, monitoring and administration). OLAP servers (OLAP engines) serve the output to front-end tools for query/reports, analysis and data mining.]

Page 3: Parallel Multi-Dimensional ROLAP Indexing

Multi-dimensional views

Collection of feature attributes

Aggregate along one or more measure attributes

Reduce the granularity by “collapsing” dimensions

Points generated by:

distributive functions (e.g., sum)

algebraic functions (e.g., average)

holistic functions (e.g., median)

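[Figure: example data cube over Make (Chevy, Ford), Colour (Red, White, Blue) and Year (1990, 1991, 1992, 1993), with the aggregated views By Make, By Colour, By Year, By Make & Colour, By Make & Year, and By Colour & Year.]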
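The three function classes differ in what must be kept in order to roll a view up further. The following is a minimal sketch in Python, assuming a handful of made-up fact rows (the `rows` data and the `group_by` helper are illustrative and not from the slides): a sum can be rolled up from partial sums, an average from (sum, count) pairs, while a median needs the raw values.

```python
from collections import defaultdict
from statistics import median

# Hypothetical fact rows: (make, colour, year, measure). Not from the slides.
rows = [
    ("Chevy", "Red", 1990, 5),
    ("Chevy", "Blue", 1990, 3),
    ("Ford", "Red", 1991, 4),
    ("Ford", "Red", 1990, 8),
    ("Chevy", "Red", 1991, 2),
]

def group_by(rows, dims):
    """Collect the measure values of each group, keyed by the kept dimensions."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[d] for d in dims)].append(row[3])
    return groups

# "By Make & Colour" view, then collapse Colour to obtain "By Make".
by_make_colour = group_by(rows, dims=(0, 1))
by_make = group_by(rows, dims=(0,))

# Distributive: partial sums of the finer view can simply be added up.
sum_by_make_colour = {k: sum(v) for k, v in by_make_colour.items()}
rolled_up_sum = defaultdict(int)
for (make, _colour), s in sum_by_make_colour.items():
    rolled_up_sum[(make,)] += s

# Algebraic: the average is derived from a fixed-size summary (sum, count).
summary = {k: (sum(v), len(v)) for k, v in by_make.items()}
avg_by_make = {k: s / n for k, (s, n) in summary.items()}

# Holistic: the median needs all raw values of the group.
median_by_make = {k: median(v) for k, v in by_make.items()}
```

Here `rolled_up_sum` equals the sums computed directly on the "By Make" groups, which is the property that lets coarser views be derived from finer ones.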

Page 4: Parallel Multi-Dimensional ROLAP Indexing

Data Cube Generation

Proposed by Gray et al. in 1995

Can be generated “manually” from a relational DB but this is very inefficient

Exploit the relationship between cuboids to compute all 2^d cuboids

In OLAP environments, we typically pre-compute these views to improve query response time

ABC

AB AC BC

A C B

ALL
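As a rough illustration of exploiting the relationship between cuboids, the sketch below computes every one of the 2^d views from an already computed parent with one more dimension instead of rescanning the base data. The `aggregate` and `build_cube` helpers and the sample data are assumptions for illustration, the measure is a plain sum, and the parent is chosen arbitrarily rather than by cost.

```python
from collections import defaultdict
from itertools import combinations

def aggregate(parent, keep):
    """Roll a cuboid up: keep only the key positions in `keep`, summing the measure."""
    out = defaultdict(int)
    for key, value in parent.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)

def build_cube(base, dims):
    """Compute all 2^d cuboids, each from a parent that has exactly one more dimension."""
    cube = {dims: base}
    for k in range(len(dims) - 1, -1, -1):
        for view in combinations(dims, k):
            # Any already computed parent works; take the first one found.
            parent = next(p for p in list(cube)
                          if len(p) == k + 1 and set(view) <= set(p))
            keep = [parent.index(d) for d in view]
            cube[view] = aggregate(cube[parent], keep)
    return cube

# Hypothetical base cuboid ABC: (make, colour, year) -> measure.
base = {
    ("Chevy", "Red", 1990): 5,
    ("Chevy", "Blue", 1990): 3,
    ("Ford", "Red", 1991): 4,
    ("Ford", "Red", 1990): 8,
    ("Chevy", "Red", 1991): 2,
}
cube = build_cube(base, ("A", "B", "C"))
```

With d = 3 this yields the 8 views of the lattice shown on the slide, from ABC down to ALL (the empty tuple).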

Page 5: Parallel Multi-Dimensional ROLAP Indexing

Existing Parallel Results

Goil & Choudhary

MOLAP solution

in-memory structures

global partition + d communication rounds

distributed views

Limitations

Memory for multi-dimensional arrays

expensive communication for larger d

J. of Data Mining & Knowledge Discovery 1(4), 1997

Page 6: Parallel Multi-Dimensional ROLAP Indexing

Our Approach

ROLAP solution

Construct and cost the data cube lattice

Find a “least cost” spanning tree

Partition the spanning tree over the processors equally, construct views and distribute

Can handle partial cubes

Limitations

What about indexing?

ABCD

ABC ABD ACD BCD

AB AC AD BC BD CD

A B C D

All

CCGrid’01 + J. Dist. & Parallel Databases 11(2), 2001
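The lattice costing and partitioning steps can be sketched roughly as follows. This is a simplified stand-in, not the algorithm of the cited papers: view sizes are estimated as the product of assumed dimension cardinalities capped at the record count, each view picks the smallest parent, and views are assigned greedily to the least loaded processor without keeping subtrees together. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import prod

def lattice(dims):
    """All views of the data cube lattice, from the full view down to ALL."""
    return [frozenset(c) for k in range(len(dims), -1, -1)
            for c in combinations(dims, k)]

def estimate_sizes(dims, cardinality, n_records):
    """Assumed cost model: rows <= min(record count, product of cardinalities)."""
    return {v: min(n_records, prod(cardinality[d] for d in v)) for v in lattice(dims)}

def least_cost_tree(dims, size):
    """Map each view to its cheapest parent (a view with one more dimension)."""
    tree = {}
    for view in lattice(dims):
        parents = [view | {d} for d in dims if d not in view]
        if parents:
            tree[view] = min(parents, key=lambda p: (size[p], sorted(p)))
    return tree

def partition(tree, size, p):
    """Greedy balancing: the cost of a view is the size of the parent it scans."""
    loads = [0] * p
    parts = [[] for _ in range(p)]
    for view in sorted(tree, key=lambda v: -size[tree[v]]):
        i = loads.index(min(loads))
        parts[i].append(view)
        loads[i] += size[tree[view]]
    return parts, loads

dims = ("A", "B", "C")
size = estimate_sizes(dims, {"A": 100, "B": 10, "C": 5}, n_records=1000)
tree = least_cost_tree(dims, size)
parts, loads = partition(tree, size, p=2)
```

In this toy setting view A is computed from AC rather than AB, because AC is estimated to be smaller.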

Page 7: Parallel Multi-Dimensional ROLAP Indexing

Parallel Multi-dimensional Indexing

Query specifies a range on multiple dimensions

Forms a hypercube in the point space
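A range on each dimension is an axis-aligned box in the point space, so query processing reduces to two simple tests. The sketch below is illustrative (the function names are not from the slides): one test checks whether a point lies in the box, the other whether two boxes overlap, which is what an index uses to skip whole regions.

```python
def point_in_box(point, low, high):
    """True if every coordinate lies within the queried range of its dimension."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, low, high))

def boxes_intersect(low_a, high_a, low_b, high_b):
    """True if two axis-aligned boxes overlap in every dimension."""
    return all(la <= hb and lb <= ha
               for la, ha, lb, hb in zip(low_a, high_a, low_b, high_b))

# Hypothetical 3-dimensional query: a range on each of A, B and C.
query_low, query_high = (2, 0, 5), (4, 9, 7)
inside = point_in_box((3, 1, 6), query_low, query_high)
outside = point_in_box((5, 1, 6), query_low, query_high)
```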

Page 8: Parallel Multi-Dimensional ROLAP Indexing

General Approach

No multidimensional index is universally successful

Exploit domain specific information and the features of a particular index

OLAP

Data is provided up front

Updates are batch oriented

Page 9: Parallel Multi-Dimensional ROLAP Indexing

Design Goals

A framework for distributed high-performance indexing of ROLAP cubes

Practical to implement

Low communication volume

Fully adapted to external memory (disks)

No shared disk required

Incrementally maintainable

Efficient for high D spatial searches

Scalable in terms of data size, dimensions, processors

Page 10: Parallel Multi-Dimensional ROLAP Indexing

Challenge

How to order and partition data such that

Number of records retrieved per node is as balanced as possible

Minimize the number of disk seeks required in answering a query

[Figure: view ABC partitioned across processors P1, P2, P3, P4]

Page 11: Parallel Multi-Dimensional ROLAP Indexing

Indexing the Data Cube

Combine the strengths of a space-filling curve and an r-tree index

Use Hilbert curve to load buckets

Index buckets with r-tree

Update indexes with merge/sort
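The combination can be sketched in two dimensions as follows, under clearly stated simplifications: points are sorted by their Hilbert curve position, packed into fixed-size buckets, and each bucket is summarised by its bounding box. Only a single level of bounding boxes is built here in place of a full r-tree, and the function names, grid size and bucket capacity are assumptions for illustration.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def pack(points, n, capacity):
    """Load buckets in Hilbert order and record each bucket's bounding box."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    buckets = []
    for i in range(0, len(ordered), capacity):
        pts = ordered[i:i + capacity]
        xs, ys = zip(*pts)
        buckets.append(((min(xs), min(ys)), (max(xs), max(ys)), pts))
    return buckets

def range_query(buckets, low, high):
    """Visit only buckets whose bounding box overlaps the query box."""
    hits = []
    for b_low, b_high, pts in buckets:
        if all(bl <= h and l <= bh for bl, bh, l, h in zip(b_low, b_high, low, high)):
            hits.extend(p for p in pts
                        if all(l <= c <= h for c, l, h in zip(p, low, high)))
    return hits

points = [(x, y) for x in range(8) for y in range(8) if (x + y) % 3 == 0]
buckets = pack(points, n=8, capacity=4)
result = range_query(buckets, low=(2, 1), high=(5, 6))
```

Because neighbouring positions on the curve are neighbouring cells in space, each bucket tends to cover a compact region, which keeps the bounding boxes small.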

Page 12: Parallel Multi-Dimensional ROLAP Indexing

Space Filling Curves & Striping

Page 13: Parallel Multi-Dimensional ROLAP Indexing

Query Retrieval

[Figure: view ABC partitioned across processors P1, P2, P3, P4]

Page 14: Parallel Multi-Dimensional ROLAP Indexing

Example

Original space: 8 points to be reported

Processor 1 reports: 2 consecutive blocks & 4 points

Processor 2 reports: 2 consecutive blocks & 4 points
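A toy reconstruction of this kind of outcome, assuming round-robin striping of blocks in curve order (the slide does not spell out the striping rule, and the numbers below are chosen only to mirror its counts): records are represented by their rank along the curve, grouped into blocks of two, and dealt out to two processors.

```python
def stripe(blocks, p):
    """Deal blocks out round-robin: block i goes to processor i mod p."""
    return [blocks[i::p] for i in range(p)]

# 16 records identified by their rank along the space-filling curve (assumed).
records = list(range(16))
block_size = 2
blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
local = stripe(blocks, p=2)

# A query whose answer is a contiguous run of 8 curve positions.
query = range(4, 12)

report = []
for proc_blocks in local:
    # Local positions of the blocks that contain at least one answer point.
    hit_positions = [i for i, b in enumerate(proc_blocks)
                     if any(r in query for r in b)]
    points = sum(1 for b in proc_blocks for r in b if r in query)
    report.append((hit_positions, points))
```

Each processor ends up reading local blocks 1 and 2, that is, two consecutive blocks holding four answer points, so the work is balanced and each node needs a single seek.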

Page 15: Parallel Multi-Dimensional ROLAP Indexing

The Parallel Framework

A single view is partitioned across p processors

Partial Hilbert/r-tree indexes are computed locally

Queries are answered concurrently

Queries answered individually or “piggy-backed”
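One way to picture concurrent and piggy-backed answering is sketched below. Threads stand in for cluster nodes, a linear scan stands in for the local Hilbert/r-tree search, and "piggy-backing" is interpreted here as answering a batch of queries in one pass over each partition; that interpretation and all names are assumptions rather than details from the slides.

```python
from concurrent.futures import ThreadPoolExecutor

def local_search(partition, queries):
    """One pass over a local partition answers a whole batch of range queries."""
    hits = [[] for _ in queries]
    for point in partition:
        for qi, (low, high) in enumerate(queries):
            if all(lo <= x <= hi for x, lo, hi in zip(point, low, high)):
                hits[qi].append(point)
    return hits

def answer(partitions, queries):
    """Search every partition concurrently, then merge the partial results per query."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda part: local_search(part, queries), partitions))
    return [sorted(p for partial in partials for p in partial[qi])
            for qi in range(len(queries))]

# Hypothetical view split over two "processors", and two piggy-backed queries.
partitions = [[(0, 0), (2, 2)], [(1, 1), (3, 3)]]
queries = [((0, 0), (1, 1)), ((2, 2), (3, 3))]
results = answer(partitions, queries)
```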

Page 16: Parallel Multi-Dimensional ROLAP Indexing

The Virtual Data Cube

Problem: Full cube often too large to materialize

Solution: Use surrogate views
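The idea of a surrogate view can be illustrated with a small sketch: a view that was not materialized is answered from a materialized view containing all of its dimensions, aggregating away the extra ones at query time. Picking the smallest such view is an assumption made here for illustration, as are the function names and sample data; the measure is a plain sum.

```python
from collections import defaultdict

def pick_surrogate(view, materialised):
    """Choose the smallest materialised view that contains all requested dimensions."""
    candidates = [v for v in materialised if set(view) <= set(v)]
    return min(candidates, key=lambda v: len(materialised[v]))

def answer_from_surrogate(view, materialised):
    """Aggregate the surrogate down to the requested view at query time."""
    surrogate = pick_surrogate(view, materialised)
    keep = [surrogate.index(d) for d in view]
    out = defaultdict(int)
    for key, value in materialised[surrogate].items():
        out[tuple(key[i] for i in keep)] += value
    return surrogate, dict(out)

# Hypothetical partial cube: only ABC and AB are stored.
materialised = {
    ("A", "B", "C"): {
        ("Chevy", "Red", 1990): 5,
        ("Chevy", "Blue", 1990): 3,
        ("Ford", "Red", 1991): 4,
        ("Ford", "Red", 1990): 8,
        ("Chevy", "Red", 1991): 2,
    },
    ("A", "B"): {("Chevy", "Red"): 7, ("Chevy", "Blue"): 3, ("Ford", "Red"): 12},
}
used_for_a, view_a = answer_from_surrogate(("A",), materialised)
used_for_bc, view_bc = answer_from_surrogate(("B", "C"), materialised)
```

View A is served from AB, while BC has to fall back to the full view ABC because no smaller stored view contains both B and C.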

Page 17: Parallel Multi-Dimensional ROLAP Indexing

Surrogate Processing

Page 18: Parallel Multi-Dimensional ROLAP Indexing

Other issues…

Dimension ordering

Query piggybacking

Batch updating

Managing hierarchies of views

Page 19: Parallel Multi-Dimensional ROLAP Indexing

Experimental Results

Machine

17 node cluster

Node = 1.8 GHz Xeon, 1 GB RAM, 2 * 40 GB IDE drives, running Linux

Interconnect = Intel Fast Ethernet switch

Test Data

10 dimensions and 1,000,000 records

Page 20: Parallel Multi-Dimensional ROLAP Indexing

RCUBE index Construction

Output: ~640 million rows, 16 Gigabytes

Page 21: Parallel Multi-Dimensional ROLAP Indexing

Distributed Query Resolution

Test: Random queries returning ~15% of points (10 experiments per point)

Page 22: Parallel Multi-Dimensional ROLAP Indexing

Disk blocks retrieved vs. Disk Seeks

Test: Random queries returning 5-15% of points (15 experiments per point)

Page 23: Parallel Multi-Dimensional ROLAP Indexing

Distributed Query Resolution in Surrogate Group-bys

Page 24: Parallel Multi-Dimensional ROLAP Indexing

Thank You

Questions?