24
Star Transformations Tony Hasler, UKOUG Birmingham 2012 Tony Hasler, Anvil Computer Services Ltd.

Star Transformations Tony Hasler, UKOUG Birmingham 2012 Tony Hasler, Anvil Computer Services Ltd

Embed Size (px)

Citation preview

Star TransformationsTony Hasler, UKOUG Birmingham 2012

Tony Hasler, Anvil Computer Services Ltd.

Who is Tony Hasler?

Google Tony Hasler! My blog contains all the material from this presentation on

the front page

Tony Hasler http://tonyhasler.wordpress.com

Acknowledgments

SQL Tuning by Dan Tow Optimizer team blog:

https://blogs.oracle.com/optimizer/entry/star_transformation

(Sunil Chakkappen 2010)Examples Based on11.2.0.3 Enterprise Edition

Problem statement: Distributed Filters

Generally, we want to access very large tables using an index on the most selective filter we can find.−small (ish) tables may be accessed by a full-table scan

using multi-block reads−Partitioning by column X is an alternative to creating an

index on X (and a local index on Y is an alternative to a multi-column index on X,Y)

This is because we want to avoid reading data from a table only to reject it later on.

The most selective filter may or may not be a join condition However, sometimes a combination of filters provides

much stronger selectivity than any one filter. When combinations of filters involve join conditions they

are called Distributed Filters (by Dan Tow).Tony Hasler http://tonyhasler.wordpress.com

Query using SH example schema

SELECT prod_name,cust_first_name, time_id, amount_sold

FROM customers c, products p, sales s WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id AND c.cust_last_name = 'Everett' AND p.prod_category = 'Electronics';

NOTE: For convenience I will refer to the SALES table as a FACT table and the CUSTOMERS and PRODUCTS tables as DIMENSION tables but star transformations are NOT restricted to star schemas!

Tony Hasler http://tonyhasler.wordpress.com

What join order do we want?

Tony Hasler http://tonyhasler.wordpress.com

Product (13)

Sales (116,267 /918,843 )

Customers (115)

Product (115)

Sales (740 /918,843 )

Customers (80)

Customers (80)

Product (1,040)

Sales (115 /918,843 )

They are all bad!

Wouldn’t it be nice if….

Tony Hasler http://tonyhasler.wordpress.com

Product (13)

SALES_PROD_IDX (116,267 /918,843 )

Customers (115)

SALES_CUST_IDX (740 /918,843 )

Merged bitmap (115/918,843)

SALES (115/918,843 )

Rewrite attempt 1:

SELECT time_id, amount_sold FROM sales WHERE cust_id = 4117 AND prod_id = 20;

This query returns too few rows because we are only picking one customer and one product

This query returns too few columns because we don’t have the details from the dimension tables.

Tony Hasler http://tonyhasler.wordpress.com

Enterprise edition execution plan

-------------------------------------------------------------| Id | Operation | Name |-------------------------------------------------------------| 0 | SELECT STATEMENT | || 1 | PARTITION RANGE ALL | || 2 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES || 3 | BITMAP CONVERSION TO ROWIDS | || 4 | BITMAP AND | ||* 5 | BITMAP INDEX SINGLE VALUE | SALES_CUST_BIX ||* 6 | BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX |-------------------------------------------------------------

Here we see selection filters being combined in a straightforward way.

Tony Hasler http://tonyhasler.wordpress.com

Rewrite attempt 2:

SELECT time_id, amount_sold FROM sales s WHERE s.cust_id IN (SELECT cust_id FROM customers WHERE cust_last_name = 'Everett') AND s.prod_id IN (SELECT prod_id FROM products WHERE prod_category = 'Electronics');

This query returns fewer columns than the original query because the columns from the dimension tables are missing

This query returns the same number of rows as the original query providing that CUST_ID is a unique key in CUSTOMERS and PROD_ID is unique within PRODUCTS.

Tony Hasler http://tonyhasler.wordpress.com

Enterprise edition execution plan---------------------------------------------------------------| Id | Operation | Name |---------------------------------------------------------------| 0 | SELECT STATEMENT | || 1 | VIEW | VW_ST_EABF22F5 || 2 | NESTED LOOPS | || 3 | PARTITION RANGE ALL | || 4 | BITMAP CONVERSION TO ROWIDS| || 5 | BITMAP AND | || 6 | BITMAP MERGE | || 7 | BITMAP KEY ITERATION | || 8 | BUFFER SORT | ||* 9 | TABLE ACCESS FULL | CUSTOMERS ||* 10 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX || 11 | BITMAP MERGE | || 12 | BITMAP KEY ITERATION | || 13 | BUFFER SORT | ||* 14 | VIEW | index$_join$_051 ||* 15 | HASH JOIN | ||* 16 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX || 17 | INDEX FAST FULL SCAN| PRODUCTS_PK ||* 18 | BITMAP INDEX RANGE SCAN| SALES_PROD_BIX || 19 | TABLE ACCESS BY USER ROWID | SALES |---------------------------------------------------------------

Tony Hasler http://tonyhasler.wordpress.com

Star transformations make an appearance Multi-column indexes, and the INDEX_COMBINE (and the

deprecated AND_EQUAL) hints can be used to deal with combined selection filters

Star transformations are used to solve the distributed filter problem when at least one of the filter predicates is a join condition with another table

Requires enterprise edition and the initialisation parameter set to TRUE or TEMP_DISABLE

They are frequently used with “star” or “snowflake” schemas, hence the name, but can be used elsewhere

BITMAP KEY ITERATION is the signature of a star transformation

BITMAP KEY ITERATION runs a loop performing lookups on indexes (usually but not necessarily bitmap indexes)

The BITMAP MERGE operation merges the output of the loop

In the example, the BITMAP AND operation combines the results of two BITMAP KEY ITERATION operations

Tony Hasler http://tonyhasler.wordpress.com

Rewrite 3: Getting our missing columns back

WITH q1 AS (SELECT /*+ no_merge */ time_id, amount_sold, prod_id, cust_id FROM sales s WHERE s.cust_id IN (SELECT cust_id FROM customers WHERE cust_last_name = 'Everett') AND s.prod_id IN (SELECT prod_id FROM products WHERE prod_category = 'Electronics'))SELECT /*+ leading(p s c) use_hash(c) use_hash(p) swap_join_inputs(c) no_swap_join_inputs(s) */ prod_name,cust_first_name,time_id,amount_sold FROM customers c, products p, q1 s WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id AND p.prod_category = 'Electronics';

Note: The hints and the last predicate are for demonstration purposes only

Tony Hasler http://tonyhasler.wordpress.com

Execution plan for rewrite 3:------------------------------------------------------------------| Id | Operation | Name |------------------------------------------------------------------| 0 | SELECT STATEMENT | ||* 1 | HASH JOIN | || 2 | TABLE ACCESS FULL | CUSTOMERS ||* 3 | HASH JOIN | || 4 | TABLE ACCESS BY INDEX ROWID | PRODUCTS ||* 5 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX || 6 | VIEW | || 7 | VIEW | VW_ST_EABF22F5 || 8 | NESTED LOOPS | || 9 | PARTITION RANGE ALL | || 10 | BITMAP CONVERSION TO ROWIDS| || 11 | BITMAP AND | || 12 | BITMAP MERGE | || 13 | BITMAP KEY ITERATION | || 14 | BUFFER SORT | ||* 15 | TABLE ACCESS FULL | CUSTOMERS ||* 16 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX || 17 | BITMAP MERGE | || 18 | BITMAP KEY ITERATION | || 19 | BUFFER SORT | ||* 20 | VIEW | index$_join$_060 ||* 21 | HASH JOIN | ||* 22 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX || 23 | INDEX FAST FULL SCAN| PRODUCTS_PK ||* 24 | BITMAP INDEX RANGE SCAN| SALES_PROD_BIX || 25 | TABLE ACCESS BY USER ROWID | SALES |------------------------------------------------------------------

Tony Hasler http://tonyhasler.wordpress.com

Same as rewrite 2

Thoughts so far?

Do we really need to rewrite our simple query in such a complex way?

Answer: If you set STAR_TRANSFORMATION_ENABLED=TEMP_DISABLE then the un-hinted execution plan for the original query is (almost) identical to that of rewrite 3!!!

But why do we need to access the CUSTOMERS table twice? Couldn’t we cache the columns from the select list first time round?

Answer: You will cache if STAR_TRANSFORMATION_ENABLED=TRUE

Tony Hasler http://tonyhasler.wordpress.com

Execution plan for original query when STAR_TRANSFORMATION =TRUE-----------------------------------------------------------------------| Id | Operation | Name |-----------------------------------------------------------------------| 0 | SELECT STATEMENT | || 1 | TEMP TABLE TRANSFORMATION | || 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6609_38DC9C ||* 3 | TABLE ACCESS FULL | CUSTOMERS ||* 4 | HASH JOIN | || 5 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6609_38DC9C ||* 6 | HASH JOIN | || 7 | TABLE ACCESS BY INDEX ROWID | PRODUCTS ||* 8 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX || 9 | VIEW | VW_ST_62EEF96F || 10 | NESTED LOOPS | || 11 | PARTITION RANGE ALL | || 12 | BITMAP CONVERSION TO ROWIDS| || 13 | BITMAP AND | || 14 | BITMAP MERGE | || 15 | BITMAP KEY ITERATION | || 16 | BUFFER SORT | || 17 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6609_38DC9C ||* 18 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX || 19 | BITMAP MERGE | || 20 | BITMAP KEY ITERATION | || 21 | BUFFER SORT | ||* 22 | VIEW | index$_join$_051 ||* 23 | HASH JOIN | ||* 24 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX || 25 | INDEX FAST FULL SCAN| PRODUCTS_PK ||* 26 | BITMAP INDEX RANGE SCAN| SALES_PROD_BIX || 27 | TABLE ACCESS BY USER ROWID | SALES |-----------------------------------------------------------------------

Tony Hasler http://tonyhasler.wordpress.com

So is that it?

Star transformations are only a good idea if the selectivity improvement outweighs the cost of the bitmap operations.

Star transformations may occasionally be used in mixed OLTP environments by converting b-tree indexes to bitmaps

But execution plans using star transformations in a data warehouse can almost always be optimized further:−Denormalize: e.g. adding columns from CUSTOMERS and

PRODUCTS to the SALES table. You can then create multi-column indexes or use INDEX_COMBINE operations

−Or if you don’t like wasting disk space….

Tony Hasler http://tonyhasler.wordpress.com

Bitmap join indexes

ALTER TABLE products MODIFY CONSTRAINT products_pk VALIDATE;

ALTER TABLE customers MODIFY CONSTRAINT customers_pk VALIDATE;

CREATE BITMAP INDEX sales_prod_category_bjx ON sales (p.prod_category) FROM products p, sales s WHERE s.prod_id = p.prod_id LOCAL; CREATE BITMAP INDEX sales_cust_ln_bjx ON sales (c.cust_last_name) FROM customers c, sales s WHERE c.cust_id = s.cust_id LOCAL;

Tony Hasler http://tonyhasler.wordpress.com

Execution plan with bitmap join indexes

-------------------------------------------------------------------------| Id | Operation | Name |-------------------------------------------------------------------------| 0 | SELECT STATEMENT | || 1 | NESTED LOOPS | || 2 | NESTED LOOPS | || 3 | HASH JOIN | || 4 | TABLE ACCESS BY INDEX ROWID | PRODUCTS || 5 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX || 6 | PARTITION RANGE ALL | || 7 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES || 8 | BITMAP CONVERSION TO ROWIDS | || 9 | BITMAP AND | || 10 | BITMAP INDEX SINGLE VALUE | SALES_CUST_LN_BJX || 11 | BITMAP INDEX SINGLE VALUE | SALES_PROD_CATEGORY_BJX || 12 | INDEX UNIQUE SCAN | CUSTOMERS_PK || 13 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS |-------------------------------------------------------------------------

Tony Hasler http://tonyhasler.wordpress.com

What about standard edition users????

CREATE TABLE sales2AS SELECT * FROM sales;

CREATE INDEX sales2_cust_id_idx ON sales2 (cust_id);

CREATE INDEX sales2_prod_id_idx ON sales2 (prod_id);

Tony Hasler http://tonyhasler.wordpress.com

Standard edition alternative

WITH q1 AS (SELECT /*+ no_merge */ s.ROWID rid FROM customers c, sales2 s WHERE s.cust_id = c.cust_id AND c.cust_last_name = 'Everett') ,q2 AS (SELECT /*+ no_merge */ s.ROWID rid FROM products p, sales2 s WHERE s.prod_id = p.prod_id AND prod_category = 'Electronics')SELECT /*+ no_star_transformation leading(q1 q2 s)

use_nl(s) use_nl(p) use_nl(c) */ prod_name,cust_first_name,time_id,amount_sold FROM q1,q2,sales2 s,products p,customers c WHERE q1.rid = q2.rid AND q1.rid = s.ROWID AND s.cust_id = c.cust_id AND s.prod_id = p.prod_id ;

Tony Hasler http://tonyhasler.wordpress.com

Standard edition execution plan (simulated)---------------------------------------------------------------| Id | Operation | Name |---------------------------------------------------------------| 0 | SELECT STATEMENT | || 1 | NESTED LOOPS | || 2 | NESTED LOOPS | || 3 | NESTED LOOPS | || 4 | NESTED LOOPS | ||* 5 | HASH JOIN | || 6 | VIEW | || 7 | NESTED LOOPS | ||* 8 | TABLE ACCESS FULL | CUSTOMERS ||* 9 | INDEX RANGE SCAN | SALES2_CUST_ID_IDX || 10 | VIEW | || 11 | NESTED LOOPS | ||* 12 | VIEW | index$_join$_003 ||* 13 | HASH JOIN | ||* 14 | INDEX RANGE SCAN | PRODUCTS_PROD_CAT_IX || 15 | INDEX FAST FULL SCAN | PRODUCTS_PK ||* 16 | INDEX RANGE SCAN | SALES2_PROD_ID_IDX || 17 | TABLE ACCESS BY USER ROWID| SALES2 ||* 18 | TABLE ACCESS BY INDEX ROWID| PRODUCTS ||* 19 | INDEX UNIQUE SCAN | PRODUCTS_PK ||* 20 | INDEX UNIQUE SCAN | CUSTOMERS_PK || 21 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS |---------------------------------------------------------------

Tony Hasler http://tonyhasler.wordpress.com

Some final pieces of trivia.. The deprecated STAR hint has nothing to do with star

transformations….DON’T USE IT. STAR_TRANSFORMATION and

NO_STAR_TRANSFORMATION hints can be used to control star transformations.

The FACT hint can be used to specify the table to which the star transformation is applied (although it is usually obvious).

The fact table can also be supplied as an argument to the STAR_TRANSFORMATION hint (undocumented, unsupported etc.)

Enabled integrity constraints are not strictly required on the dimension table(s). However, if they are absent an extra join may be seen to ensure the row count is accurate.

Enabled and validated Integrity constraints are required on dimension tables used in bitmap join indexes.

Bitmaps from BITMAP KEY ITERATION operations can be combined with any other bitmaps including those from bitmap join indexes

Tony Hasler http://tonyhasler.wordpress.com

Summary Star transformations are used to solve the problem of

distributed filters when at least one of the filters is a join predicate

Star transformations can be recognised by the presence of the BITMAP KEY ITERATION operation in an execution plan.

Star transformations are usually found using bitmap indexes in “star” schemas in a data warehouse but can occasionally be useful in mixed OLTP environments with traditional b-tree indexes and in non-star schemas.

Star transformations, like all bitmap combination operations, relies on the fact that the cost of the operation is outweighed by increased selectivity.

STAR_TRANSFORMATION_ENABLED=TEMP_DISABLE is to avoid bugs but should rarely be needed these days.

Performance of queries using star transformations in a data warehouse can almost always be improved by the use of bitmap join indexes.

Tony Hasler http://tonyhasler.wordpress.com

Questions

Tony Hasler http://tonyhasler.wordpress.com