
Multi-Dimensional Clustering: A High-Level Overview

High-level overview of the multi-dimensional table clustering feature of DB2.

Page 1: Multi-Dimensional Clustering: A High-Level Overview

© 2009 IBM Corporation

Multi-Dimensional Clustering: A High-Level Overview

Zoran Kulina

DB2 CE Kernel Development

MDC Purpose

One of three methods for partitioning data in DB2 (the others being range partitioning and database partitioning).

Allows flexible, continuous and automatic clustering of data along multiple dimensions.

Primarily intended for data warehousing and large database systems; can also be used in OLTP environments.

Enables a table to be physically clustered on more than one key (or dimension) simultaneously.

MDC Concepts

Block

– MDC version of an extent

– Contiguous set of pages on disk

– The smallest allocation unit of an MDC table

Block index

– Created automatically

– Points to blocks of data rather than individual rows

– Cannot enforce uniqueness

– Cannot be dropped
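
To make the block/row distinction concrete, here is a minimal Python sketch, assuming a toy table in which each block holds a fixed number of rows (the hypothetical ROWS_PER_BLOCK) and rows with different key values never share a block. build_indexes is an illustrative helper, not a DB2 API: it builds a row-level index and a block index over the same rows so their entry counts can be compared.

```python
from collections import defaultdict

ROWS_PER_BLOCK = 4  # hypothetical; a real block holds an extent's worth of pages


def build_indexes(keys):
    """keys[i] is the key value of row i. Returns (row_index, block_index)."""
    row_index = defaultdict(list)    # key -> ids of every row with that key
    block_index = defaultdict(list)  # key -> ids of the blocks holding that key
    current = {}                     # key -> (current block id, rows placed in it)
    next_block = 0
    for row_id, key in enumerate(keys):
        row_index[key].append(row_id)
        block_id, used = current.get(key, (None, ROWS_PER_BLOCK))
        if used == ROWS_PER_BLOCK:   # no block yet for this key, or it is full
            block_id, used = next_block, 0
            next_block += 1
            block_index[key].append(block_id)
        current[key] = (block_id, used + 1)
    return dict(row_index), dict(block_index)


rows = ["Canada"] * 10 + ["Mexico"] * 5
row_idx, blk_idx = build_indexes(rows)
print(sum(len(v) for v in row_idx.values()))  # 15 row-index entries
print(blk_idx)                                # {'Canada': [0, 1, 2], 'Mexico': [3, 4]}
```

With 15 rows the row-level index needs 15 entries while the block index needs 5; in this model the gap grows with the number of rows per block.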

MDC Concepts

Dimension block index

– One per dimension

– Used to find the blocks that contain a given value of that dimension

Composite block index

– One per table or partition

– Contains all dimension columns

– Used to maintain clustering of data during insert or update
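
The insert-time role of the composite block index can be sketched as follows. This is a toy model under stated assumptions: ToyMdcTable, the two-row block capacity and the in-memory dictionaries are hypothetical stand-ins for DB2's on-disk structures. A row goes into an existing block of its cell when one has room; only otherwise is a new block allocated and recorded in the composite and dimension block indexes.

```python
from collections import defaultdict

ROWS_PER_BLOCK = 2  # hypothetical block capacity, kept tiny for the example


class ToyMdcTable:
    """Toy MDC table: blocks are lists of rows, block indexes are dictionaries."""

    def __init__(self, dims):
        self.dims = dims                                      # dimension column names
        self.composite = defaultdict(list)                    # cell key -> block ids
        self.dim_index = {d: defaultdict(set) for d in dims}  # value -> block ids
        self.blocks = []                                      # block id -> rows
        self.index_updates = 0  # times a new block was added to the block indexes

    def insert(self, row):
        cell = tuple(row[d] for d in self.dims)
        # Composite block index lookup: reuse a block of this cell that has room.
        for block_id in self.composite[cell]:
            if len(self.blocks[block_id]) < ROWS_PER_BLOCK:
                self.blocks[block_id].append(row)
                return block_id
        # No room in the cell: allocate a new block and record it in every index.
        block_id = len(self.blocks)
        self.blocks.append([row])
        self.composite[cell].append(block_id)
        for d in self.dims:
            self.dim_index[d][row[d]].add(block_id)
        self.index_updates += 1
        return block_id


table = ToyMdcTable(("year", "nation"))
rows = [{"year": 1997, "nation": "Canada"}] * 3 + [{"year": 1997, "nation": "Mexico"}]
for r in rows:
    table.insert(r)

print(len(table.blocks), table.index_updates)  # 3 3
print(dict(table.dim_index["year"]))          # {1997: {0, 1, 2}}
```

Four inserted rows caused only three block index updates here; with larger blocks, most inserts in this model land in an existing block and leave the block indexes untouched.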

MDC Concepts

Block map

– Maintains usage status information for blocks (extents)

– Facilitates quick lookup of empty blocks in MDC tables

[Figure: block map for a table's extents 0, 1, 2, ... Each entry is X (reserved), F (free, no bits set) or U (in use, data assigned to a cell); the in-use extents hold data for cells such as (East, 1996), (North, 1996), (North, 1997) and (South, 1999).]
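
A block map can be pictured as one status entry per extent. The Python sketch below assumes a plain list of the legend letters (X, F, U) and a simple first-fit scan; DB2's real encoding and search strategy may differ, and find_free_block, claim_block and release_block are hypothetical helpers.

```python
RESERVED, FREE, IN_USE = "X", "F", "U"  # legend letters from the figure


def find_free_block(block_map):
    """Return the first free extent number, or None if none is free (first-fit)."""
    for extent, status in enumerate(block_map):
        if status == FREE:
            return extent
    return None


def claim_block(block_map):
    """Mark the first free extent as in use (assigned to a cell); return its number."""
    extent = find_free_block(block_map)
    if extent is not None:
        block_map[extent] = IN_USE
    return extent


def release_block(block_map, extent):
    """Return an in-use extent to the free pool, e.g. after its cell is emptied."""
    if block_map[extent] != IN_USE:
        raise ValueError(f"extent {extent} is not in use")
    block_map[extent] = FREE


bmap = list("XFUUUFUF")   # hypothetical statuses for extents 0-7
print(claim_block(bmap))  # 1
release_block(bmap, 3)
print("".join(bmap))      # XUUFUFUF
```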

MDC Concepts

Dimension

– Ordered set of one or more columns (clustering keys) of the table

– Axis along which data is organized in an MDC table

– Example: dimensions for nation, colour and year

[Figure: a cube with a year dimension, a nation dimension and a colour dimension; each cell of the cube holds the rows for one value combination, e.g. (1997, Canada, blue), (1997, Mexico, yellow) or (1998, Canada, yellow).]
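
Viewing each dimension as an axis also shows why the number of distinct values per dimension matters. The sketch below uses illustrative rows modeled on the cube labels on this slide and compares the number of possible cells (the product of the per-dimension value counts) with the value combinations that actually occur in the data, which are the only ones that become cells.

```python
from itertools import product
from math import prod

dims = ("year", "nation", "colour")
rows = [  # illustrative rows modeled on the cube labels
    (1997, "Canada", "blue"),
    (1997, "Mexico", "yellow"),
    (1997, "Mexico", "blue"),
    (1997, "Canada", "yellow"),
    (1998, "Mexico", "yellow"),
    (1998, "Canada", "yellow"),
]

# Distinct values along each axis (dimension) of the cube.
axes = {d: sorted({row[i] for row in rows}) for i, d in enumerate(dims)}

possible = prod(len(values) for values in axes.values())  # every value combination
populated = set(rows)                                     # combinations present in data
missing = [c for c in product(*axes.values()) if c not in populated]

print(axes)
print(possible, len(populated))  # 8 6
print(missing)                   # [(1998, 'Canada', 'blue'), (1998, 'Mexico', 'blue')]
```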

MDC Concepts

Slice

– Portion of the table that contains all the rows that have a specific dimension value (e.g. nation = ‘Canada’)

[Figure: the same year/nation/colour cube with the Canada slice highlighted.]
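
A slice read can be sketched with a toy dimension block index for the nation dimension. The block ids and the rows_by_block layout are hypothetical; the point is that the query visits only the blocks listed under 'Canada' instead of scanning every row.

```python
# Hypothetical blocks: block id -> rows stored in that block.
rows_by_block = {
    0: [(1997, "Canada", "blue")],
    1: [(1997, "Mexico", "yellow")],
    2: [(1997, "Canada", "yellow")],
    3: [(1998, "Canada", "yellow")],
}
NATION = 1  # position of the nation column in each row tuple

# Build the dimension block index for nation: value -> set of block ids.
nation_index = {}
for block_id, rows in rows_by_block.items():
    for row in rows:
        nation_index.setdefault(row[NATION], set()).add(block_id)


def slice_rows(value):
    """Return every row in the slice nation = value, reading only its blocks."""
    return [row for b in sorted(nation_index.get(value, ())) for row in rows_by_block[b]]


print(sorted(nation_index["Canada"]))  # [0, 2, 3]
print(slice_rows("Canada"))
```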

MDC Concepts

Cell

– Portion of the table that contains all rows sharing the same unique combination of dimension values

– Intersection of one slice from each dimension (e.g. all records where year = 2002, nation = 'Canada' and colour = 'yellow')

[Figure: the year/nation/colour cube with the cell for (1997, Canada, yellow) highlighted.]

Each cell contains one or more blocks.
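
Because a cell is the intersection of one slice per dimension, its blocks can be found by intersecting the block-id sets from the three dimension block indexes. The Python sketch below assumes small hypothetical indexes (block 6 is an extra block so that one cell spans more than one block); cell_blocks is an illustrative helper, not a DB2 interface.

```python
# Dimension block indexes: value -> ids of the blocks holding that value.
# Blocks: 0=(1997,Canada,blue) 1=(1997,Mexico,yellow) 2=(1997,Canada,yellow)
#         3=(1998,Canada,yellow) 4=(1997,Mexico,blue) 5=(1998,Mexico,yellow)
#         6=(1997,Canada,yellow), a second block of the same cell as block 2
year_index = {1997: {0, 1, 2, 4, 6}, 1998: {3, 5}}
nation_index = {"Canada": {0, 2, 3, 6}, "Mexico": {1, 4, 5}}
colour_index = {"blue": {0, 4}, "yellow": {1, 2, 3, 5, 6}}


def cell_blocks(year, nation, colour):
    """Blocks of one cell: the intersection of its year, nation and colour slices."""
    return (
        year_index.get(year, set())
        & nation_index.get(nation, set())
        & colour_index.get(colour, set())
    )


print(sorted(cell_blocks(1997, "Canada", "yellow")))  # [2, 6]
print(sorted(cell_blocks(1998, "Canada", "blue")))    # [] (no rows, so no blocks)
```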

MDC Syntax

ORGANIZE BY clause in CREATE TABLE

    CREATE TABLE mdctable (
        Year   INT,
        Nation CHAR(25),
        Colour VARCHAR(10),
        ...
    )
    ORGANIZE BY (Year, Nation, Colour)

This MDC table will have four block indexes:

– Three dimension block indexes: Year, Nation and Colour

– One composite block index: (Year, Nation, Colour)
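
The index count on this slide follows a simple rule for a table with more than one dimension: one dimension block index per dimension plus one composite block index over all dimension columns. The hypothetical block_indexes helper below encodes that rule for single-column dimensions only; single-dimension tables and multi-column dimensions are deliberately left out of this sketch.

```python
def block_indexes(dimensions):
    """List the block indexes implied by ORGANIZE BY (d1, d2, ...), as column tuples."""
    if len(dimensions) < 2:
        raise ValueError("sketch only covers tables with two or more dimensions")
    dimension_indexes = [(d,) for d in dimensions]  # one per dimension
    composite_index = tuple(dimensions)               # all dimension columns
    return dimension_indexes + [composite_index]


indexes = block_indexes(["Year", "Nation", "Colour"])
print(len(indexes))  # 4
for cols in indexes:
    print(cols)
```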

MDC Syntax

DB2_MDC_ROLLOUT registry variable

– 1, TRUE, ON, YES, IMMEDIATE (default)

– 0, FALSE, OFF, NO

– DEFER

CURRENT MDC ROLLOUT MODE special register (applies to DELETE statements)

– SET CURRENT MDC ROLLOUT MODE IMMEDIATE CLEANUP

– SET CURRENT MDC ROLLOUT MODE NONE

– SET CURRENT MDC ROLLOUT MODE DEFERRED CLEANUP
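
The registry variable accepts several spellings for the same setting. The sketch below collapses the values listed on this slide into three canonical settings, assuming exact (case-sensitive) matching; rollout_setting and the canonical names are hypothetical, and the special register is not modeled.

```python
# Alias lists copied from the slide; the canonical keys are illustrative names.
ROLLOUT_ALIASES = {
    "IMMEDIATE": ("1", "TRUE", "ON", "YES", "IMMEDIATE"),
    "OFF": ("0", "FALSE", "OFF", "NO"),
    "DEFER": ("DEFER",),
}
DEFAULT_SETTING = "IMMEDIATE"  # the default marked on the slide


def rollout_setting(value=None):
    """Return the canonical setting for a DB2_MDC_ROLLOUT value; None means unset."""
    if value is None:
        return DEFAULT_SETTING
    for setting, aliases in ROLLOUT_ALIASES.items():
        if value in aliases:
            return setting
    raise ValueError(f"unrecognized DB2_MDC_ROLLOUT value: {value!r}")


for v in (None, "YES", "0", "DEFER"):
    print(v, "->", rollout_setting(v))
```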

MDC Benefits

Improved query performance

– Block indexes are much smaller than row-level indexes, because they hold one entry per block rather than one per row

– Data is guaranteed to be clustered

– Prefetching is more efficient with MDC tables

Reduced logging

– On insert, block indexes are updated (and logged) only when a new block is needed, not for every row

– Mass deletes (rollouts) of entire cells log less data than regular deletes

MDC Benefits

Reduced table maintenance

– Clustering maintained automatically

– No need for reorg except to reclaim space

Reduced application dependence on clustering indexes

– Queries do not need to reference columns in a particular order to benefit from clustering

MDC Usage Considerations

Performance

– Best suited for data warehouses, where queries are complex and long-running

– Good for OLTP environments, but some update operations on MDC tables may take longer than on regular tables

Disk space

– MDC tables take more space than equivalent regular tables, because each cell occupies at least one whole block, which may be only partly filled

Table design

– A poor choice of dimensions (clustering keys) can waste disk space and yield no performance gain
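
The disk-space and table-design risks can be quantified with a small worked example. It assumes a hypothetical capacity of 1,000 rows per block and made-up cell sizes; the point is that every populated cell consumes at least one whole block, so spreading the same rows over many sparse cells multiplies the space used.

```python
import math

ROWS_PER_BLOCK = 1000  # hypothetical: depends on page size, extent size, row width


def space_report(rows_per_cell):
    """Return (blocks used, fraction of block capacity actually filled)."""
    blocks = sum(math.ceil(n / ROWS_PER_BLOCK) for n in rows_per_cell)
    rows = sum(rows_per_cell)
    return blocks, rows / (blocks * ROWS_PER_BLOCK)


# Coarse dimensions: 4 cells of 25,000 rows each.
print(space_report([25_000] * 4))   # (100, 1.0)
# Too fine-grained: the same 100,000 rows spread over 10,000 cells of 10 rows.
print(space_report([10] * 10_000))  # (10000, 0.01)
```

Both layouts hold 100,000 rows, but the fine-grained one uses 100 times as many blocks at 1% utilization.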
