1. Stack It & Pack It: Partitioning And Compression For Warehouses / VLDB
Jeff Moss
2. Who Dunnit?
3. Agenda
- Squeeze your data with data segment compression
4. My Background
- 13 years of Oracle experience
- Blog: http://oramossoracle.blogspot.com/
- Focused on warehousing / VLDB since 1998
- Produces BBC Radio 1 Top 40 chart and many more
- 2-billion-row sales fact table
- Currently working with Eon UK (Powergen)
- 4 TB production warehouse, 8 TB total storage
5. What Is Data Segment Compression?
- Compresses data by eliminating repeated column values within each block
- Reduces the space required for a segment
- But only if there are appropriate repeats!
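As a rough illustration, the symbol-table idea can be modelled in a few lines of Python. This is a simplified sketch, not Oracle's on-disk format: like the example on slide 7, it gives every distinct value in the block one symbol-table entry and turns each row into a list of symbol numbers. The function names are hypothetical, and the character counts ignore the space the references themselves take.

```python
"""Toy model of block-level symbol-table compression (not Oracle's real format)."""


def compress_block(rows):
    """Store each distinct value once; encode each row as 1-based symbol numbers."""
    symbols = []  # the block's symbol table
    index = {}    # value -> symbol number
    encoded_rows = []
    for row in rows:
        refs = []
        for value in row:
            if value not in index:
                symbols.append(value)
                index[value] = len(symbols)
            refs.append(index[value])
        encoded_rows.append(refs)
    return symbols, encoded_rows


def decompress_block(symbols, encoded_rows):
    """Rebuild the original rows by looking each reference up in the symbol table."""
    return [tuple(symbols[ref - 1] for ref in refs) for refs in encoded_rows]


# The three example rows from slide 7 (ID, DESCRIPTION, CONTACT TYPE, OUTCOME, FOLLOWUP).
rows = [
    ("100", "Call to discuss bill amount", "TEL", "NO", "YES"),
    ("101", "Call to discuss new product", "MAIL", "NO", "N/A"),
    ("102", "Call to discuss new product", "TEL", "YES", "N/A"),
]
symbols, encoded = compress_block(rows)
print(encoded)  # [[1, 2, 3, 4, 5], [6, 7, 8, 4, 9], [10, 7, 3, 5, 9]]

raw_chars = sum(len(value) for row in rows for value in row)
symbol_chars = sum(len(value) for value in symbols)
print(raw_chars, symbol_chars)  # 116 78, ignoring the bytes used by the references
```

Most of the saving here comes from the long DESCRIPTION value that two rows share, which is why the number and length of repeats matter so much in the slides that follow.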
6. Where Can Data Segment Compression Be Used?
- Can be used with a number of segment types
- Indexes, but they have row-level compression
- Tables that are part of a cluster
7. How Does Segment Compression Work?
[Figure: a compressed database block and the example rows it stores]
- Block Common Header (20 bytes)
- Transaction Header (24 bytes fixed + 24 bytes per ITL)
- Data Header (14 bytes)
- Compressed Data Header (16 bytes, variable)
- Table Directory (8 bytes)
- Row Directory (2 bytes per row)
- Symbol Table
- Row Data Area
- Tail (4 bytes)
- Example rows (ID, DESCRIPTION, CONTACT TYPE, OUTCOME, FOLLOWUP):
  100, Call to discuss bill amount, TEL, NO, YES
  101, Call to discuss new product, MAIL, NO, N/A
  102, Call to discuss new product, TEL, YES, N/A
- Symbol table entries: 1 = 100, 2 = Call to discuss bill amount, 3 = TEL, 4 = NO, 5 = YES, 6 = 101, 7 = Call to discuss new product, 8 = MAIL, 9 = N/A, 10 = 102
- Row data area (symbol references per row): 1 2 3 4 5 / 6 7 8 4 9 / 10 7 3 5 9
8. What Affects Compression?
- I asked, but support wouldn't play ball!
- Anything which affects block overhead
- Interested Transaction Lists (INITRANS)
- Number of repeats (in the block)
- Length of column value(s)
9. Compression v Block Size
- 200K rows, non-ASSM, uniform local extents
- Larger blocks = more chance of repeats in any given block
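A toy count makes the block-size effect concrete. The sketch assumes that a larger block simply holds more rows, and uses an invented 200-row column that cycles through 20 values; `repeats_per_block` is a hypothetical helper. Only repeats within the same block count, because the symbol table is per block.

```python
"""Toy model: how many values in each block repeat an earlier value in that block."""


def repeats_per_block(values, rows_per_block):
    """Split values into consecutive blocks and total the intra-block repeats."""
    total_repeats = 0
    for start in range(0, len(values), rows_per_block):
        block = values[start:start + rows_per_block]
        # Every value beyond the first occurrence could become a symbol reference.
        total_repeats += len(block) - len(set(block))
    return total_repeats


# Invented column: 200 rows cycling through 20 distinct values.
column = [i % 20 for i in range(200)]

# Assumption: a bigger block holds proportionally more rows.
for rows_per_block in (10, 20, 50, 100):
    print(rows_per_block, repeats_per_block(column, rows_per_block))
```

With this data, blocks of 10 or 20 rows find no repeats at all, while 50-row and 100-row blocks find 120 and 160 respectively.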
10. Compression v ITL
- 10K rows, non-ASSM, uniform local extents
- More ITLs = more overhead = fewer repeats
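The ITL cost can be put into rough numbers using the header sizes quoted on slide 7. This is back-of-envelope arithmetic, not a measurement: the 16-byte compressed data header is treated as fixed although the slide marks it variable, the 8 KB block size is an assumption, the row directory (2 bytes per row) is left out, and the constant and function names are hypothetical.

```python
"""Rough per-block overhead arithmetic using the header sizes quoted on slide 7."""

COMMON_HEADER = 20
TRANSACTION_HEADER_FIXED = 24
BYTES_PER_ITL = 24
DATA_HEADER = 14
COMPRESSED_DATA_HEADER = 16  # variable in reality; fixed here for simplicity
TABLE_DIRECTORY = 8
TAIL = 4


def fixed_overhead(itl_slots):
    """Header bytes consumed before any symbol-table or row data is stored."""
    return (COMMON_HEADER + TRANSACTION_HEADER_FIXED + BYTES_PER_ITL * itl_slots
            + DATA_HEADER + COMPRESSED_DATA_HEADER + TABLE_DIRECTORY + TAIL)


def usable_bytes(block_size, itl_slots):
    """Space left for the row directory, symbol table and row data."""
    return block_size - fixed_overhead(itl_slots)


for itl in (1, 2, 10, 50):
    print(itl, usable_bytes(8192, itl))
```

In this model each extra ITL slot removes 24 bytes that could otherwise hold symbol-table entries or rows, so 50 slots cost 1,176 bytes more than 1 slot.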
11. Compression v Number Of Columns
- 500K rows, non-ASSM, uniform local extents
- Same amount of data to store
- More columns = more overhead = fewer repeats
12. Compression v PCTFREE
- 200K rows, non-ASSM, uniform local extents
- Higher PCTFREE = less usable space per block = fewer repeats
13. Compression v NDV
- 200K rows, non-ASSM, uniform local extents
- Higher NDV (number of distinct values) = fewer repeats
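To see the NDV effect independently of ordering, assume (this is not from the slides) that each row in a block picks its value independently and uniformly from `ndv` values. The expected number of distinct values among r rows is then ndv * (1 - (1 - 1/ndv)**r), and every other row is a repeat. The helper names are hypothetical.

```python
"""Expected intra-block repeats when rows draw values uniformly from NDV choices."""


def expected_distinct(rows_per_block, ndv):
    """Expected number of distinct values among rows_per_block independent uniform draws."""
    return ndv * (1 - (1 - 1 / ndv) ** rows_per_block)


def expected_repeats(rows_per_block, ndv):
    """Expected rows whose value already occurs earlier in the block."""
    return rows_per_block - expected_distinct(rows_per_block, ndv)


# Assumed 100 rows per block; NDV grows from 1 to 100,000.
for ndv in (1, 10, 100, 1000, 100000):
    print(ndv, round(expected_repeats(100, ndv), 1))
```

Under this assumption a 100-row block expects about 37 repeats at NDV 100 but fewer than 5 at NDV 1,000, which is the trend the slide reports.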
14. Compression v Column Length
- 80K rows, non-ASSM, uniform local extents
- Minimum 6 characters for compression
- Longer values = bigger compression savings
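Why longer values save more can be shown with a back-of-envelope cost model. The costs here are assumptions, not Oracle internals: each occurrence pays `ref_bytes` for a reference instead of the value, and the symbol-table entry pays the value once plus `entry_overhead` bytes. The break-even length depends entirely on these assumed costs, so this sketch does not reproduce the 6-character minimum measured in the tests above.

```python
"""Hypothetical cost model: bytes saved by storing a repeated value once per block."""


def bytes_saved(value_length, occurrences, ref_bytes=2, entry_overhead=2):
    """Uncompressed bytes minus compressed bytes for one value repeated in a block.

    ref_bytes and entry_overhead are assumed costs, not Oracle's real figures.
    """
    uncompressed = occurrences * value_length
    compressed = value_length + entry_overhead + occurrences * ref_bytes
    return uncompressed - compressed


for length in (1, 2, 4, 6, 10, 30):
    print(length, bytes_saved(length, occurrences=20))
```

With 20 occurrences the saving grows by 19 bytes for every extra character, while very short values cost more in references than they save.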
15. Compression v Ordering
- Colocate data to maximise compression benefits
- Minimise the total space required by the segment
- Identify most compressible column(s)
- We know how the data is to be queried
- Then the next most compressible column(s)
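The uniform-versus-colocated picture on this slide can be reproduced with a small model that counts symbol-table entries (one per distinct value per block); fewer entries means more repeats were found. The second helper tries ORDER BY each column in turn and keeps the best. It is similar in spirit to what the mgmt_p_get_max_compress_order procedure on the next slide reports, but it is a hypothetical sketch rather than that procedure's code, it counts entries rather than blocks, and the rows-per-block values and sample data are invented.

```python
"""Toy model: symbol-table entries a segment needs for a given row order."""


def symbol_entries(rows, column, rows_per_block):
    """Sum, over consecutive blocks, of the distinct values of column in each block."""
    total = 0
    for start in range(0, len(rows), rows_per_block):
        block = rows[start:start + rows_per_block]
        total += len({row[column] for row in block})
    return total


def best_order_column(rows, columns, rows_per_block):
    """Try ORDER BY each candidate column; return (best column, entries per candidate)."""
    results = {}
    for col in columns:
        # Python's sort is stable, so ties keep their input order.
        ordered = sorted(rows, key=lambda row: row[col])
        results[col] = sum(symbol_entries(ordered, c, rows_per_block) for c in columns)
    return min(results, key=results.get), results


# The slide's picture: values 1 to 5, four of each, with an assumed 4 rows per block.
uniform = [{"status": i % 5 + 1} for i in range(20)]        # 1 2 3 4 5 1 2 3 4 5 ...
colocated = sorted(uniform, key=lambda row: row["status"])  # 1 1 1 1 2 2 2 2 ...
print(symbol_entries(uniform, "status", 4), symbol_entries(colocated, "status", 4))  # 20 5

# Invented sample: every region/product combination twice, 10 rows per block.
sales_rows = [{"region": r, "product": p}
              for _ in range(2) for r in range(2) for p in range(10)]
print(best_order_column(sales_rows, ["region", "product"], 10))
# ('product', {'region': 44, 'product': 20})
```

In the invented sample, ordering by the higher-NDV column wins: the 2-value column stays cheap whatever the order, while colocating the 10-value column removes most of its entries.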
[Figure: the values 1 to 5 uniformly distributed (1 2 3 4 5 1 2 3 4 5 ...) versus colocated (1 1 1 1 2 2 2 2 ... 5 5 5 5)]
16. Get Max Compression Order Package
- PROCEDURE mgmt_p_get_max_compress_order
    Argument Name                  Type                    In/Out Default?
    ------------------------------ ----------------------- ------ --------
    P_TABLE_OWNER                  VARCHAR2                IN     DEFAULT
    P_PARTITION_NAME               VARCHAR2                IN     DEFAULT
    P_SAMPLE_SIZE                  NUMBER                  IN     DEFAULT
    P_PREFIX_COLUMN1               VARCHAR2                IN     DEFAULT
    P_PREFIX_COLUMN2               VARCHAR2                IN     DEFAULT
    P_PREFIX_COLUMN3               VARCHAR2                IN     DEFAULT
- Example call:
    BEGIN
      mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT'
                                   ,p_table_name  => 'BIG_TABLE');
    END;
    /
Running mgmt_p_get_max_compress_order...
----------------------------------------------------------------------------------------------------
Table: BIG_TABLE Sample Size: 10000 Unique Run ID: 25012006232119
ORDER BY Prefix:
----------------------------------------------------------------------------------------------------
Creating MASTER Table: TEMP_MASTER_25012006232119
Creating COLUMN Table 1: COL1
Creating COLUMN Table 2: COL2
Creating COLUMN Table 3: COL3
----------------------------------------------------------------------------------------------------
The output below lists each column in the table and the number of
blocks/rows and space used when the table data is ordered by only
that column or, where a prefix has been specified, by the prefix
and then that column. From this one can determine whether there is
a specific ORDER BY which can be applied to the data to maximise
compression within the table whilst, where a prefix is present,
ordering the data as efficiently as possible for the most common
access path(s).
----------------------------------------------------------------------------------------------------
NAME                        COLUMN BLOCKS   ROWS SPACE_GB
=========================== ====== ====== ====== ========
TEMP_COL_001_25012006232119 COL1      290  10000    .0022
TEMP_COL_002_25012006232119 COL2      345  10000    .0026
TEMP_COL_003_25012006232119 COL3      555  10000    .0042
17. Pros & Cons
- Speeds up backup/recovery
- Improves query response time
- Decreases time to perform some DML
- Bulk inserts may be quicker
18. Pros & Cons
- Can only be used with direct-path operations
- e.g. serial inserts using INSERT /*+ APPEND */
- Increases time to perform some DML
19. Data Warehousing Specifics
- Star schemas compress better than normalized schemas (1)
- Fact tables and summaries in a star schema
- Transaction tables in a normalized schema
(1) Table Compression in Oracle 9iR2: A Performance Analysis
20. Things To Watch Out For
- ORA-39726: Unsupported add/drop column operation on compressed tables
- Uncompressing the table and trying again still gives ORA-39726!
- After UPDATEs, the updated rows are no longer compressed
- Use appropriate physical design settings
- PCTFREE 0: pack each block
- Large block size: reduce overhead / increase repeats per block
- Minimise INITRANS: reduce overhead
- Order data for best compression / access path
21. A Funny Thing
- Block dump trace files still show 9iR2 even in 10g
releases
- ALTER SYSTEM DUMP DATAFILE x BLOCK y;
Thanks to Julian Dyke for the block dumping information: http://www.juliandyke.com
22. What Is Partitioning?
- "Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions." (Oracle Database Concepts Manual, 10gR2)
- Numerous improvements since
- Subpartitioning adds another level of decomposition
- Partitions and Subpartitions are logical containers
23. Partition To Tablespace Mapping
- Partitions map to tablespaces
- A partition can only be in one tablespace
- A tablespace can hold many partitions
- Highest granularity is one tablespace per partition
- Lowest granularity is one tablespace for all the partitions
[Figure: monthly partitions P_JAN_2005 to P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 to T_Q1_2006; legend: Read / Write, Read Only]
24. Read Only Tablespaces
- Reduced space use via compression
25. Why Partition? - Performance
- Improved query performance
    SELECT SUM(sales)
    FROM   part_tab
    WHERE  sales_date BETWEEN DATE '2005-01-01' AND DATE '2005-06-30';
[Figure: Sales Fact Table partitioned by month, JAN to DEC; the query only needs the JAN to JUN partitions. Source: Oracle 10gR2 Data Warehousing Manual]
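Partition pruning for a query like the one above can be sketched as an interval-overlap test. This is a simplified model, not the optimizer's logic: it assumes monthly range partitions for 2005, each with an inclusive lower bound and an exclusive upper bound (as with VALUES LESS THAN), and a BETWEEN predicate on the partition key. The function names are hypothetical.

```python
"""Toy model of range-partition pruning for a BETWEEN predicate on the partition key."""
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def monthly_partitions(year):
    """(name, lower_inclusive, upper_exclusive) for each month, like VALUES LESS THAN."""
    parts = []
    for month in range(1, 13):
        lower = date(year, month, 1)
        upper = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        parts.append((MONTHS[month - 1], lower, upper))
    return parts


def prune(partitions, low, high):
    """Keep only partitions whose [lower, upper) range overlaps low <= key <= high."""
    return [name for name, lower, upper in partitions
            if lower <= high and low < upper]


scanned = prune(monthly_partitions(2005), date(2005, 1, 1), date(2005, 6, 30))
print(scanned)  # ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN']
```

Half the partitions are never read, and the saving holds however many months of history the table accumulates.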
26. Why Partition? - Manageability
- Use a rolling window approach
- ALTER TABLE ... ADD/SPLIT/DROP PARTITION
- Build a new dataset in a staging table
- Add indexes and constraints
- Then swap the staging table for a partition on the target
- ALTER TABLE ... EXCHANGE PARTITION
- Table partition move, e.g. to compress data
- Local index partition rebuild
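The rolling window with a staging-table swap can be sketched as follows. This is a toy model under stated assumptions: `RollingWindowTable` and `load_month` are hypothetical names, a Python list stands in for each partition's segment, and "exchange" just swaps which list a partition name points at, mirroring how EXCHANGE PARTITION swaps segments rather than copying rows.

```python
"""Toy model of a rolling window maintained with partition exchange."""


class RollingWindowTable:
    """Hypothetical stand-in for a range-partitioned table.

    Each partition name maps to a list standing in for its segment;
    dict insertion order is partition order, oldest first.
    """

    def __init__(self, retention):
        self.retention = retention  # number of partitions to keep online
        self.partitions = {}

    def add_partition(self, name):
        # Like ALTER TABLE ... ADD PARTITION: a new, empty partition at the top end.
        self.partitions[name] = []

    def exchange_partition(self, name, staging):
        # Like ALTER TABLE ... EXCHANGE PARTITION: swap segment references, copy no rows.
        old_segment = self.partitions[name]
        self.partitions[name] = staging
        return old_segment  # the staging table now owns the old (here empty) segment

    def drop_oldest(self):
        # Like ALTER TABLE ... DROP PARTITION on partitions beyond the retention period.
        while len(self.partitions) > self.retention:
            del self.partitions[next(iter(self.partitions))]


def load_month(table, name, new_rows):
    """Build a staging dataset, swap it in, then age out the oldest partition."""
    staging = list(new_rows)  # indexes, constraints and stats would be built here
    table.add_partition(name)
    table.exchange_partition(name, staging)
    table.drop_oldest()


sales = RollingWindowTable(retention=3)
for month, new_rows in [("P_JAN_2005", [1, 2]), ("P_FEB_2005", [3]),
                        ("P_MAR_2005", [4, 5]), ("P_APR_2005", [6])]:
    load_month(sales, month, new_rows)

print(list(sales.partitions))  # ['P_FEB_2005', 'P_MAR_2005', 'P_APR_2005']
```

The expensive work (building the dataset and its indexes) happens on the staging table, so the partitioned table only sees quick metadata-style operations.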
27. Why Partition? - Scalability
- Partitioning is generally consistent and predictable
- Assuming an appropriate partitioning key is used and the data is evenly distributed across the key
- Scalable backups: read-only tablespaces can be ignored, and so can the partitions in them
- Pruning allows consistent query performance
28. Why Partition? - Availability
- Offline data impact minimised
[Figure: the partition-to-tablespace layout from slide 23, monthly partitions P_JAN_2005 to P_MAR_2006 in tablespaces T_Q1_2005 to T_Q1_2006; legend: Read / Write, Read Only]
29. Fact Table Partitioning: Transaction Date v Load Date
- Load Date
  - Each load deals with only 1 partition
  - No use to end user queries!
- Transaction Date
  - But still uses EXCHANGE PARTITION
  - Useful to end user queries
  - Allows full pruning capability
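The example rows on this slide can be placed into monthly partitions both ways to show the difference. The rows (transaction date, customer, load date) are copied from the slide's example; the month-of-date partitioning function, the single-year assumption and the helper names are hypothetical.

```python
"""Place the slide's example rows into monthly partitions by load date and by transaction date."""
from collections import defaultdict
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# (transaction date, customer, load date) from the slide; all dates are in 2005.
ROWS = [
    (date(2005, 1, 7), "Customer 1", date(2005, 1, 9)),
    (date(2005, 1, 15), "Customer 2", date(2005, 1, 17)),
    (date(2005, 1, 22), "Customer 3", date(2005, 2, 1)),
    (date(2005, 2, 2), "Customer 4", date(2005, 2, 5)),
    (date(2005, 2, 26), "Customer 5", date(2005, 2, 28)),
    (date(2005, 3, 6), "Customer 2", date(2005, 3, 7)),
    (date(2005, 3, 12), "Customer 3", date(2005, 3, 15)),
    (date(2005, 1, 21), "Customer 7", date(2005, 4, 4)),
    (date(2005, 4, 9), "Customer 9", date(2005, 4, 10)),
]


def partition_rows(rows, key):
    """Group rows into monthly partitions named after the month of key(row)."""
    parts = defaultdict(list)
    for row in rows:
        parts[MONTHS[key(row).month - 1]].append(row)
    return dict(parts)


def partitions_holding(parts, predicate):
    """Partitions, in month order, containing at least one row matching predicate."""
    return [name for name in MONTHS
            if any(predicate(row) for row in parts.get(name, []))]


def is_january(row):
    return row[0].month == 1  # end users filter on transaction date


by_load = partition_rows(ROWS, key=lambda row: row[2])
by_tran = partition_rows(ROWS, key=lambda row: row[0])
print(partitions_holding(by_load, is_january))  # ['JAN', 'FEB', 'APR']
print(partitions_holding(by_tran, is_january))  # ['JAN']
```

With load-date partitioning the January transactions end up in three partitions, and since the query filters on a column that is not the partition key, the optimizer cannot prune any of them; with transaction-date partitioning they all sit in one.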
[Figure: the same sample rows in January to April partitions, by load date and by transaction date. Customer 7's 21-JAN-2005 transaction, loaded on 04-APR-2005, lands in the April partition by load date but in the January partition by transaction date.]
  Tran Date    Customer    Load Date
  07-JAN-2005  Customer 1  09-JAN-2005
  15-JAN-2005  Customer 2  17-JAN-2005
  22-JAN-2005  Customer 3  01-FEB-2005
  02-FEB-2005  Customer 4  05-FEB-2005
  26-FEB-2005  Customer 5  28-FEB-2005
  06-MAR-2005  Customer 2  07-MAR-2005
  12-MAR-2005  Customer 3  15-MAR-2005
  21-JAN-2005  Customer 7  04-APR-2005
  09-APR-2005  Customer 9  10-APR-2005
30. Watch out for
- Partition exchange and table statistics (1)
- Partition-level stats are exchanged along with the data
- but Global stats are NOT!
- Affects queries accessing multiple partitions
- Gather stats on the staging table prior to EXCHANGE
- Gather stats on the partitioned table using GLOBAL
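A toy model shows why the global figures go stale. The structures are hypothetical: per-partition row counts stand in for partition-level statistics, `global_num_rows` stands in for what the last GLOBAL gather recorded, and the exchange swaps in the staging table's pre-gathered count without touching the global figure.

```python
"""Toy model of why EXCHANGE PARTITION leaves global statistics stale."""


class PartitionedTableStats:
    """Hypothetical stand-in for a partitioned table's optimizer statistics."""

    def __init__(self, partition_rows):
        self.partition_num_rows = dict(partition_rows)  # partition-level stats
        self.global_num_rows = None                     # table-level (global) stats

    def gather_global(self):
        # Stand-in for gathering GLOBAL stats: recompute the table-level figure.
        self.global_num_rows = sum(self.partition_num_rows.values())

    def exchange_partition(self, name, staging_num_rows):
        # The partition takes the staging table's stats; global stats are not touched.
        old = self.partition_num_rows[name]
        self.partition_num_rows[name] = staging_num_rows
        return old


stats = PartitionedTableStats({"P_JAN_2005": 1000, "P_FEB_2005": 0})
stats.gather_global()
stats.exchange_partition("P_FEB_2005", 1200)  # staging stats gathered before EXCHANGE
print(stats.global_num_rows, sum(stats.partition_num_rows.values()))  # 1000 2200
stats.gather_global()                         # re-gather GLOBAL after the exchange
print(stats.global_num_rows)                  # 2200
```

Until the global gather runs again, any query costed from table-level statistics sees 1,000 rows instead of 2,200.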
(1) Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
31. Partitioning Feature: Characteristic Reason Matrix
- Features: Partition Truncation, Exchange Partition, Archiving, Pruning (Partition Elimination), Partition-wise joins, Parallel DML, Local Indexes, Read Only Partitions
- Characteristics: Availability, Scalability, Manageability, Performance
32. Questions?
33. References: Papers
- Table Compression in Oracle 9iR2: A Performance Analysis
- Table Compression in Oracle 9iR2: An Oracle White Paper
- Scaling To Infinity, Partitioning In Oracle Data Warehouses,
Tim Gorman
- Decision Speed: Table Compression In Action
34. References: Online Presentation / Code
- http://www.oramoss.demon.co.uk/presentations/stackitandpackit.ppt
- http://www.oramoss.demon.co.uk/Code/mgmt_p_get_max_compression_order.prc
- http://www.oramoss.demon.co.uk/Code/test_dml_performance_delete.sql
- http://www.oramoss.demon.co.uk/Code/test_dml_performance_insert.sql
- http://www.oramoss.demon.co.uk/Code/test_dml_performance_update.sql
- http://www.oramoss.demon.co.uk/Code/test_block_size_compression.sql
- http://www.oramoss.demon.co.uk/Code/test_column_length_compression.sql
- http://www.oramoss.demon.co.uk/Code/test_itl_compression.sql
- http://www.oramoss.demon.co.uk/Code/test_ndv_compression.sql
- http://www.oramoss.demon.co.uk/Code/test_num_cols_compression.sql
- http://www.oramoss.demon.co.uk/Code/test_pctfree_compression.sql