High-level overview of the multi-dimensional table clustering feature of DB2.
© 2009 IBM Corporation
Multi-Dimensional Clustering: A High-Level Overview
Zoran Kulina
DB2 CE Kernel Development
MDC Purpose
One of the three methods for partitioning data in DB2 (others being range and database partitioning).
Allows flexible, continuous and automatic clustering of data along multiple dimensions.
Primarily intended for data warehousing and large database systems; can also be used in OLTP environments.
Enables a table to be physically clustered on more than one key (or dimension) simultaneously.
MDC Concepts
Block
– MDC version of an extent
– Consecutive set of pages on disk
– The smallest allocation unit of an MDC table
Block index
– Created automatically
– Points to blocks of data rather than to individual rows
– Cannot enforce uniqueness
– Cannot be dropped
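To make the block/row distinction concrete, here is a toy Python model, not DB2's actual storage format: rows are packed into fixed-size blocks so that each block holds a single key value, and the sketch compares a block index (one entry per block) with a row-level index (one entry per row). `ROWS_PER_BLOCK` and `load_clustered` are hypothetical names chosen for illustration; in DB2 the block size is the extent size of the table space.

```python
from collections import defaultdict

# Hypothetical block capacity, for illustration only (not a DB2 setting).
ROWS_PER_BLOCK = 4

def load_clustered(rows, key):
    """Place rows into blocks so that every block holds rows of a single key value.

    Returns (blocks, block_index, row_index):
      blocks      -- list of blocks, each a list of rows
      block_index -- key value -> block IDs (one entry per block)
      row_index   -- key value -> (block ID, slot) pairs (one entry per row)
    """
    blocks = []
    block_index = defaultdict(list)
    row_index = defaultdict(list)
    open_block = {}  # key value -> ID of the block currently being filled
    for row in rows:
        k = row[key]
        bid = open_block.get(k)
        if bid is None or len(blocks[bid]) == ROWS_PER_BLOCK:
            blocks.append([])
            bid = len(blocks) - 1
            open_block[k] = bid
            block_index[k].append(bid)  # block index grows only when a block is allocated
        blocks[bid].append(row)
        row_index[k].append((bid, len(blocks[bid]) - 1))  # row index grows on every row
    return blocks, block_index, row_index

rows = [{"nation": n, "amount": i} for i, n in enumerate(["Canada", "Mexico"] * 8)]
blocks, block_index, row_index = load_clustered(rows, "nation")
block_entries = sum(len(v) for v in block_index.values())
row_entries = sum(len(v) for v in row_index.values())
print(block_entries, row_entries)  # → 4 16
```

With 16 rows and four rows per block, the block index needs 4 entries where a row-level index needs 16; the gap widens as blocks get larger.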
MDC Concepts
Dimension block index
– One per dimension
– Used to access dimension data
Composite block index
– One per table or partition
– Contains all dimension columns
– Used to maintain clustering of data during insert and update
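The sketch below, again a simplified Python model rather than DB2's implementation, shows how the two kinds of block index divide the work: on insert, the composite block index locates the blocks of the target cell, and a new block is registered in every block index only when the cell has no block with room left. `ToyMDCTable` and `ROWS_PER_BLOCK` are hypothetical names.

```python
from collections import defaultdict

DIMENSIONS = ("year", "nation", "colour")  # mirrors ORGANIZE BY (Year, Nation, Colour)
ROWS_PER_BLOCK = 2  # hypothetical block capacity, for illustration only

class ToyMDCTable:
    """Toy model of an MDC table's block indexes (not DB2's implementation)."""

    def __init__(self, dimensions):
        self.dimensions = dimensions
        self.blocks = []                    # block ID -> list of rows
        self.composite = defaultdict(list)  # cell (all dimension values) -> block IDs
        self.dim_index = {d: defaultdict(list) for d in dimensions}  # one per dimension

    def insert(self, row):
        cell = tuple(row[d] for d in self.dimensions)
        # Composite block index: find the cell's blocks and reuse one that has room.
        for bid in self.composite[cell]:
            if len(self.blocks[bid]) < ROWS_PER_BLOCK:
                self.blocks[bid].append(row)
                return bid
        # No room in the cell: allocate a block and register it in every block index.
        self.blocks.append([row])
        bid = len(self.blocks) - 1
        self.composite[cell].append(bid)
        for d, value in zip(self.dimensions, cell):
            self.dim_index[d][value].append(bid)
        return bid

t = ToyMDCTable(DIMENSIONS)
for values in [(1997, "Canada", "yellow"), (1997, "Canada", "yellow"),
               (1997, "Canada", "yellow"), (1998, "Mexico", "blue")]:
    t.insert(dict(zip(DIMENSIONS, values)))
print(len(t.blocks), 1 + len(t.dim_index))  # → 3 4  (blocks, block indexes)
```

The third (1997, Canada, yellow) row finds the cell's first block full, so a second block is allocated; the dimension block indexes change only at that moment.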
MDC Concepts
Block map
– Maintains usage status information for blocks (extents)
– Facilitates quick lookup of empty blocks in MDC tables
Figure: Block map example (one entry per extent in the table)
Extent   Block map   Data stored
0        X           Reserved
1        F           (empty)
2        U           East, 1996
3        U           North, 1996
4        U           North, 1997
5        F           (empty)
6        U           South, 1999
7        F           (empty)
...
Legend: X = reserved; F = free (no bits set); U = in use (data assigned to a cell)
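A minimal sketch of block-map-driven allocation follows, assuming single-character statuses (the X/F/U codes of the block map figure) instead of DB2's internal bit flags; `find_free_block` and `allocate_block` are hypothetical helper names. The point it illustrates is that a free block anywhere in the table can be reused before the table is extended.

```python
# Status codes taken from the block map legend (simplified to one character each).
RESERVED, FREE, IN_USE = "X", "F", "U"

def find_free_block(block_map):
    """Return the index of the first free block, or None if none is free."""
    for i, status in enumerate(block_map):
        if status == FREE:
            return i
    return None

def allocate_block(block_map):
    """Reuse a free block if one exists; otherwise extend the table by one block."""
    i = find_free_block(block_map)
    if i is None:
        block_map.append(IN_USE)
        return len(block_map) - 1
    block_map[i] = IN_USE
    return i

# Example map for extents 0-7: reserved, free, in use x3, free, in use, free.
block_map = list("XFUUUFUF")
print(allocate_block(block_map))  # → 1
print(allocate_block(block_map))  # → 5
print("".join(block_map))         # → XUUUUUUF
```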
MDC Concepts
Dimension
– Ordered set of one or more columns (clustering keys) of the table
– Axis along which data is organized in an MDC table
– Example: dimensions for nation, color, and year
[Figure: a cube with year, colour, and nation dimensions; sample cells include (1997, Canada, blue), (1997, Mexico, yellow), (1997, Mexico, blue), (1997, Canada, yellow), (1998, Mexico, yellow), and (1998, Canada, yellow)]
MDC Concepts
Slice
– Portion of the table that contains all the rows that have a specific dimension value (e.g. nation = ‘Canada’)
[Figure: the same cube with the Canada slice highlighted]
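A slice can be read directly off a dimension block index: it is the list of blocks stored under one dimension value. The toy Python sketch below assumes a hypothetical assignment of blocks to cells (`block_cells`), loosely modeled on the cube figure; `dimension_block_index` is an illustrative helper, not a DB2 API.

```python
from collections import defaultdict

# Hypothetical block assignment: block ID -> cell (year, nation, colour).
block_cells = {
    0: (1997, "Canada", "blue"),
    1: (1997, "Mexico", "yellow"),
    2: (1997, "Mexico", "blue"),
    3: (1997, "Canada", "yellow"),
    4: (1998, "Mexico", "yellow"),
    5: (1998, "Canada", "yellow"),
    6: (1997, "Canada", "yellow"),  # a cell may span more than one block
}

def dimension_block_index(block_cells, position):
    """Build a dimension block index: dimension value -> block IDs in ascending order."""
    index = defaultdict(list)
    for bid, cell in sorted(block_cells.items()):
        index[cell[position]].append(bid)
    return index

nation_index = dimension_block_index(block_cells, position=1)
canada_slice = nation_index["Canada"]
print(canada_slice)  # → [0, 3, 5, 6]
```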
MDC Concepts
Cell
– Portion of the table that contains rows having the same unique set of dimension values
– Intersection of slices from each dimension (e.g. all records where year=1997, nation='Canada', and color='yellow')
[Figure: the same cube with the cell for (1997, Canada, yellow) highlighted]
Each cell contains one or more blocks.
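Because every block belongs to exactly one cell, the blocks of a cell can be found by intersecting the slices selected by each dimension value. The Python sketch below is a toy model under that assumption; `block_cells`, `dim_index`, and `cell_blocks` are hypothetical names, and in DB2 the optimizer decides how block indexes are actually combined for a query.

```python
from collections import defaultdict

DIMENSIONS = ("year", "nation", "colour")

# Hypothetical block assignment: block ID -> cell (year, nation, colour).
block_cells = {
    0: (1997, "Canada", "blue"),
    1: (1997, "Mexico", "yellow"),
    2: (1997, "Canada", "yellow"),
    3: (1998, "Canada", "yellow"),
    4: (1997, "Canada", "yellow"),
}

# One dimension block index per dimension: value -> set of block IDs (a slice).
dim_index = {d: defaultdict(set) for d in DIMENSIONS}
for bid, cell in block_cells.items():
    for d, value in zip(DIMENSIONS, cell):
        dim_index[d][value].add(bid)

def cell_blocks(**values):
    """Intersect the slices selected by each given dimension value (at least one)."""
    slices = [dim_index[d][v] for d, v in values.items()]
    return sorted(set.intersection(*slices))

print(cell_blocks(year=1997, nation="Canada", colour="yellow"))  # → [2, 4]
```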
MDC Syntax
ORGANIZE BY clause in CREATE TABLE
CREATE TABLE mdctable (
    Year   INT,
    Nation CHAR(25),
    Colour VARCHAR(10),
    ...
)
ORGANIZE BY (Year, Nation, Colour)
This MDC table will have four block indexes:
– Three dimension block indexes: Year, Nation and Colour
– One composite block index: (Year, Nation, Colour)
MDC Syntax
DB2_MDC_ROLLOUT registry variable
– 1, TRUE, ON, YES, IMMEDIATE (default)
– 0, FALSE, OFF, NO
– DEFER
CURRENT MDC ROLLOUT MODE special register (overrides the registry setting for DELETE statements)
– SET CURRENT MDC ROLLOUT MODE IMMEDIATE
– SET CURRENT MDC ROLLOUT MODE NONE
– SET CURRENT MDC ROLLOUT MODE DEFERRED
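The difference between immediate and deferred rollout cleanup can be sketched as a toy Python model. It assumes that a rollout frees whole blocks, that immediate cleanup removes row-level (RID) index entries as part of the DELETE, and that deferred cleanup leaves those entries for a later background step, with the blocks becoming reusable only afterwards. The `R` status code, the function names, and the mode strings are hypothetical; the NONE mode (an ordinary row-by-row delete) is not modeled.

```python
FREE, ROLLED_OUT = "F", "R"  # hypothetical block statuses ("U" = in use)

def rollout(block_status, block_cells, rid_index, predicate, mode):
    """Toy MDC rollout: delete every block whose cell matches `predicate`.

    block_status -- block ID -> status character
    block_cells  -- block ID -> cell tuple
    rid_index    -- list of (row key, block ID) entries of a row-level index
    mode         -- "IMMEDIATE" or "DEFERRED" (hypothetical stand-ins for the rollout modes)
    """
    targets = [bid for bid, cell in block_cells.items() if predicate(cell)]
    for bid in targets:
        del block_cells[bid]  # whole blocks go away, not individual rows
        block_status[bid] = FREE if mode == "IMMEDIATE" else ROLLED_OUT
    if mode == "IMMEDIATE":
        # RID index entries are removed as part of the DELETE itself.
        rid_index[:] = [e for e in rid_index if e[1] not in targets]
    return targets

def deferred_cleanup(block_status, rid_index):
    """Background step: purge RID entries of rolled-out blocks, then free those blocks."""
    pending = {bid for bid, s in block_status.items() if s == ROLLED_OUT}
    rid_index[:] = [e for e in rid_index if e[1] not in pending]
    for bid in pending:
        block_status[bid] = FREE
    return sorted(pending)

status = {0: "U", 1: "U", 2: "U"}
cells = {0: (1997, "Canada"), 1: (1998, "Canada"), 2: (1997, "Mexico")}
rids = [("r1", 0), ("r2", 0), ("r3", 1), ("r4", 2)]
print(rollout(status, cells, rids, lambda c: c[0] == 1997, "DEFERRED"))  # → [0, 2]
print(len(rids))                       # → 4 (stale entries remain until cleanup)
print(deferred_cleanup(status, rids))  # → [0, 2]
print(rids)                            # → [('r3', 1)]
```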
MDC Benefits
Improved query performance
– Block indexes are much smaller than row-level indexes
– Data is guaranteed to be clustered
– Prefetching is more efficient with MDC tables
Reduced logging
– Row inserts are logged as usual, but block indexes are updated (and logged) only when a new block is allocated to a cell
– Mass deletes (rollouts) of entire cells log less data than regular deletes
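The index-logging saving can be checked with back-of-the-envelope arithmetic. The Python sketch below assumes one index entry per new block in a block index, versus one entry per row in a row-level index; `rows_per_block`, the cell sizes, and `index_entries_written` are hypothetical, and the data rows themselves are still logged in both cases.

```python
import math

def index_entries_written(rows_per_cell, rows_per_block):
    """Count index entries created while loading rows into MDC cells.

    rows_per_cell  -- mapping cell -> number of rows inserted into it
    rows_per_block -- rows that fit in one block (hypothetical figure)
    Returns (entries in one block index, entries in one row-level index).
    """
    block_entries = sum(math.ceil(n / rows_per_block) for n in rows_per_cell.values())
    row_entries = sum(rows_per_cell.values())
    return block_entries, row_entries

cells = {(1997, "Canada"): 10_000, (1997, "Mexico"): 2_500}
print(index_entries_written(cells, rows_per_block=1_000))  # → (13, 12500)
```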
MDC Benefits
Reduced table maintenance
– Clustering maintained automatically
– No REORG needed to restore clustering; reorganize only to reclaim space
Reduced application dependence on clustering indexes
– Each dimension has its own block index, so queries need not reference the clustering columns in any particular order to benefit
MDC Usage Considerations
Performance
– Best suited for data warehouses, where queries are complex and long-running
– Usable in OLTP environments, but some update operations on MDC tables may take longer than on regular tables (for example, updating a dimension column moves the row to a different cell)
Disk space
– MDC tables take more space than equivalent regular tables, because every non-empty cell occupies at least one full block
Table design
– A poor choice of dimensions (clustering keys) may lead to wasted disk space and no performance gain
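Why sparse cells waste space follows from the allocation rule: each non-empty cell takes at least one whole block. The sketch below estimates blocks used and the average fill ratio for the same 1,000 rows spread over few or many cells; the row counts, `rows_per_block`, and `space_estimate` are hypothetical and meant only to show the trend.

```python
import math

def space_estimate(rows_per_cell, rows_per_block):
    """Return (blocks used, average block fill ratio) when each cell gets whole blocks."""
    blocks = sum(math.ceil(n / rows_per_block) for n in rows_per_cell.values())
    fill = sum(rows_per_cell.values()) / (blocks * rows_per_block)
    return blocks, fill

# Same 1,000 rows: 4 dense cells versus 500 sparse cells (hypothetical workloads).
coarse = {cell: 250 for cell in range(4)}
fine = {cell: 2 for cell in range(500)}

blocks, fill = space_estimate(coarse, rows_per_block=100)
print(blocks, round(fill, 2))  # → 12 0.83
blocks, fill = space_estimate(fine, rows_per_block=100)
print(blocks, round(fill, 2))  # → 500 0.02
```

Choosing dimensions with fewer, well-populated cells (or coarsening a column, such as year instead of date) keeps blocks full.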