Excessive Temp Space Usage From Parallel Operations
The Sources of Temp Space Usage
• SORT ORDER BY (PGA)
• SORT GROUP BY (PGA)
• HASH GROUP BY (PGA)
• WINDOW SORT (analytic functions) (PGA)
• HASH JOIN (PGA and join order)
• HASH JOIN BUFFERED (PX related, needs more research)
• BUFFER SORT (PX related, excessive)
• PX SEND BROADCAST (PX BROADCAST distribution, excessive)
How to Identify SQL Statements with Temp Space Issues
• Use view V$SQL (or AWR DBA_HIST_SQLSTAT).
• Check the DIRECT_WRITES column and compare its value with DISK_READS.
• If DIRECT_WRITES is significant and the query does not perform a direct-path load, the query is very likely using a large amount of temp space.
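The check above can be sketched in Python as a filter over rows already fetched from V$SQL. This is a minimal sketch under stated assumptions: the `SqlStats` type, the `flag_temp_heavy` and `write_read_ratio` functions, and the default threshold of 1,000,000 direct writes are hypothetical choices for illustration, not values from this analysis, and direct-load statements are assumed to have been excluded by the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SqlStats:
    # Hypothetical container for columns fetched from V$SQL.
    sql_id: str
    disk_reads: int
    direct_writes: int


def flag_temp_heavy(rows: List[SqlStats], min_direct_writes: int = 1_000_000) -> List[SqlStats]:
    """Return statements with significant direct writes, largest first.

    Assumes direct-path load statements were already filtered out, because
    their direct writes go to the target segment rather than to temp.
    The threshold is an illustrative assumption, not an Oracle default.
    """
    flagged = [r for r in rows if r.direct_writes >= min_direct_writes]
    return sorted(flagged, key=lambda r: r.direct_writes, reverse=True)


def write_read_ratio(row: SqlStats) -> float:
    # Direct writes per disk read; guard against division by zero.
    return row.direct_writes / max(row.disk_reads, 1)


rows = [
    SqlStats("6jbvpvurr02rh", 2_545_487, 5_481_060),  # the UAD example
    SqlStats("hypothetical01", 900_000, 1_200),        # made-up comparison row
]
for r in flag_temp_heavy(rows):
    print(r.sql_id, round(write_read_ratio(r), 2))
```

For the UAD statement the ratio is about 2.15, meaning it had written roughly twice as many blocks directly as it had read.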
V$SQL Example (UAD)
SQL_ID               6jbvpvurr02rh
ELAPSED TIME (SEC)   4,129
IO WAIT TIME (SEC)   2,494
DISK_READS           2,545,487
DIRECT_WRITES        5,481,060
Note: DIRECT_WRITES is much greater than DISK_READS because, when V$SQL was checked, the query was still writing data to temp space and had not yet read it back.
Locate the Source of Temp Space Usage
• For 11g, try V$SQL_PLAN_MONITOR, column WORKAREA_MAX_TEMPSEG.
• For 11g and 10g, try V$SQL_WORKAREA_ACTIVE, column TEMPSEG_SIZE.
• A significant value in either metric identifies the execution plan steps that use a large amount of temp space.
Example Using V$SQL_PLAN_MONITOR
SQL_ID            6jbvpvurr02rh
SQL_EXEC_ID       16777217
PLAN ID           17
PLAN PARENT ID    12
OPERATION         HASH JOIN
READ REQUESTS     0
WRITE REQUESTS    365,794
TEMP SPACE (MB)   45,732
Note: READ REQUESTS (PHYSICAL_READ_REQUESTS) is 0 because the query was still building the hash table from the first row source and had not yet read anything back from temp.
Example Using V$SQL_WORKAREA_ACTIVE
SQL_ID: 6jbvpvurr02rh
Operation   Plan Id   SID    Temp Space (MB)
HASH JOIN   17        1042   11,435
HASH JOIN   17        1107   11,433
HASH JOIN   17        1156   11,432
HASH JOIN   17        1223   11,432
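Each PX slave has its own work area, so V$SQL_WORKAREA_ACTIVE reports one row per slave (SID), and the temp usage of a plan step is the sum across those rows. The sketch below assumes the rows were already fetched as (operation, plan id, SID, temp MB) tuples; the view's TEMPSEG_SIZE column is in bytes, so the MB conversion is assumed to have been done already. The function name `temp_by_plan_step` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (operation, plan id, SID, temp space in MB), as in the table above.
WorkareaRow = Tuple[str, int, int, int]


def temp_by_plan_step(rows: List[WorkareaRow]) -> List[Tuple[int, str, int]]:
    """Sum temp space per plan step across all PX slaves (SIDs).

    Returns (plan id, operation, total MB), largest total first, so the step
    responsible for most of the temp usage comes first.
    """
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for operation, plan_id, _sid, temp_mb in rows:
        totals[(plan_id, operation)] += temp_mb
    return sorted(
        ((plan_id, op, mb) for (plan_id, op), mb in totals.items()),
        key=lambda t: t[2],
        reverse=True,
    )


rows = [
    ("HASH JOIN", 17, 1042, 11_435),
    ("HASH JOIN", 17, 1107, 11_433),
    ("HASH JOIN", 17, 1156, 11_432),
    ("HASH JOIN", 17, 1223, 11_432),
]
print(temp_by_plan_step(rows))  # [(17, 'HASH JOIN', 45732)]
```

The 45,732 MB total equals the TEMP SPACE that V$SQL_PLAN_MONITOR reported for plan Id 17, which suggests both views describe the same four-slave hash join.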
Analyze the Plan
1. The temp space usage comes from plan Id 17: HASH JOIN.
2. Since the join spills to temp space, the first row source (Id 19-35) must be very large.
3. There is a “PX SEND BROADCAST” for the first row source. It amplifies the temp space usage by a factor equal to the DOP, in this case DOP = 4.
4. When the row source of a HASH JOIN is already very large, BROADCAST PX distribution makes the join much harder.
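The amplification in item 3 is simple arithmetic: with BROADCAST every consumer slave receives the complete first row source, whereas HASH distribution sends each row to exactly one slave. The hypothetical helpers below compare the rows delivered to the consumer slaves under the two methods. It is a rough model that ignores the probe side and data skew, not Oracle's costing.

```python
def rows_received(build_rows: int, dop: int, method: str) -> int:
    """Total rows received by all consumer PX slaves for the first row source.

    BROADCAST: each of the `dop` slaves gets a full copy.
    HASH: rows are partitioned, so each row lands on exactly one slave.
    """
    if method == "BROADCAST":
        return build_rows * dop
    if method == "HASH":
        return build_rows
    raise ValueError(f"unsupported distribution method: {method}")


def per_slave_rows(build_rows: int, dop: int, method: str) -> float:
    # Average rows each slave must hold in its hash table (assumes no skew).
    return rows_received(build_rows, dop, method) / dop


print(rows_received(112_679_920, 4, "BROADCAST"))  # 450719680
print(per_slave_rows(112_679_920, 4, "HASH"))      # 28169980.0
```

With the 112,679,920 rows observed for this query at DOP 4, BROADCAST delivers 450,719,680 rows, and every slave builds a hash table over the full input instead of about a quarter of it.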
Using the Real-Time Monitor (V$SQL_PLAN_MONITOR)
1. By plan step 20, the first row source had produced 112,679,920 rows. Plan step 19, “PX SEND BROADCAST”, multiplied this by the DOP of 4 to 450,719,680 rows, which made the join much harder.
2. BROADCAST is intended for distributing small row sources, and that is what Oracle estimated for this query: 10,421 rows for the first row source. Since Oracle estimated 2.9M rows for the second row source, it judged this join order to be the better one.
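Item 2 can be illustrated with a deliberately simplified model, an assumption for illustration and not Oracle's cost model: count the rows shipped between PX slave sets. BROADCAST ships the first row source DOP times and leaves the second in place; HASH-HASH ships both row sources once. The 2.9M figure is the optimizer's estimate for the second row source; its actual size is not given here, so the sketch reuses the estimate in both cases.

```python
from typing import Dict


def shipped_rows(first_rows: int, second_rows: int, dop: int) -> Dict[str, int]:
    """Rows sent through PX table queues under each distribution (toy model)."""
    return {
        "BROADCAST": first_rows * dop,     # copy the first input to every slave
        "HASH": first_rows + second_rows,  # repartition both inputs once
    }


def cheaper_method(first_rows: int, second_rows: int, dop: int) -> str:
    costs = shipped_rows(first_rows, second_rows, dop)
    return min(costs, key=costs.get)


DOP = 4
SECOND_EST = 2_900_000  # optimizer estimate for the second row source

print(cheaper_method(10_421, SECOND_EST, DOP))       # BROADCAST (estimated first input)
print(cheaper_method(112_679_920, SECOND_EST, DOP))  # HASH (observed first input)
```

Under this model the 10,421-row estimate makes BROADCAST look about 70 times cheaper, while the observed row count reverses the choice, which is the misestimate described in the next section.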
The Root Cause
1. Excessive temp space usage with BROADCAST PX distribution is usually the result of a bad cardinality estimate for the first row source.
2. The root cause is either inaccurate table stats or Oracle's inability to estimate JOIN cardinality.
3. In this case, both are to blame:
• The fact table involved does not have global stats.
• There is no explicit partition range that would let Oracle use partition-level stats.
• The multi-column range partition scheme complicates cardinality estimates, join estimates and partition pruning.
• Bloom filters are disabled on the UAD DB, which makes partition pruning by join almost impossible.
4. The workaround is to add two hints:
• Dynamic sampling hint: dynamic_sampling(2). Note that no table alias is used, so it applies to all tables involved. The purpose is to get better cardinality estimates.
• OPT_PARAM('_bloom_filter_enabled' 'true') to enable bloom filters for join-related partition pruning.
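Both hints go into a single hint comment placed immediately after the SELECT keyword. As a hedged sketch, the hypothetical Python helper `add_hints` below inserts them into a statement's text; it only handles statements that begin with SELECT and do not already carry a hint comment. The table `fact_tab` in the example is made up.

```python
import re
from typing import List

HINTS = ["dynamic_sampling(2)", "OPT_PARAM('_bloom_filter_enabled' 'true')"]


def add_hints(sql: str, hints: List[str]) -> str:
    """Insert a /*+ ... */ hint comment right after the leading SELECT.

    Raises ValueError if the statement does not start with SELECT or already
    has a hint comment (merging existing hints is out of scope here).
    """
    match = re.match(r"\s*select\b", sql, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("statement must start with SELECT")
    rest = sql[match.end():]
    if rest.lstrip().startswith("/*+"):
        raise ValueError("statement already contains a hint comment")
    return f"{sql[:match.end()]} /*+ {' '.join(hints)} */{rest}"


print(add_hints("SELECT f.col1 FROM fact_tab f", HINTS))
# SELECT /*+ dynamic_sampling(2) OPT_PARAM('_bloom_filter_enabled' 'true') */ f.col1 FROM fact_tab f
```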
PX BUFFER SORT Example
1. BUFFER SORT in PX appears when the operations on one row source (table) are not parallelized while the rest of the query runs in parallel. The BUFFER SORT operation occurs where the query switches from serial to parallel execution. Its temp space usage can be identified with v$sql_workarea_active or v$sql_plan_monitor or, if the query completed long ago, by examining the plan itself.
2. In the case above (DIRECT MARKETING, SEM), the query ran with DOP 32, but the operation on the major row source, the fact table AGG_BY_SPACEID_KWOID_7D, ran serially.
The Impact of BUFFER SORT
• If the BUFFER SORT is on the major row source and results in significant temp space usage, it roughly triples the IO requests: on top of the original read, there is one extra round of writes to temp and one extra round of reads from temp.
• More interestingly, the whole query runs in parallel, even with a very high DOP, while the slowest operation, reading a very large table, runs serially. This is essentially a waste of PX resources.
• The workaround is to identify the operations running serially (in the plan, those operations have empty TQ and IN-OUT columns) and see whether parallel hints can be added to the appropriate tables. This not only makes the PX operation more efficient but also reduces temp space usage.
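The rule in the last bullet, operations whose TQ and IN-OUT columns are both empty, can be applied mechanically to plan lines as shown by DBMS_XPLAN. The sketch below is a minimal illustration under assumptions: the `PlanLine` type and `serial_candidates` function are hypothetical, only lines below the PX COORDINATOR are checked (lines above it run in the query coordinator and are expected to be serial), and the sample plan with tables FACT_TAB and DIM_TAB is made up, not the DIRECT MARKETING plan.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PlanLine:
    id: int
    parent_id: Optional[int]
    operation: str
    object_name: str = ""
    tq: str = ""      # TQ column of the plan display, e.g. Q1,01
    in_out: str = ""  # IN-OUT column, e.g. PCWP, P->P, S->P


def serial_candidates(plan: List[PlanLine]) -> List[PlanLine]:
    """Return lines under a PX COORDINATOR whose TQ and IN-OUT are both empty."""
    by_id: Dict[int, PlanLine] = {line.id: line for line in plan}
    coordinators = {line.id for line in plan if line.operation.startswith("PX COORDINATOR")}

    def below_coordinator(line: PlanLine) -> bool:
        # Walk up the parent chain until a coordinator or the root is reached.
        parent = line.parent_id
        while parent is not None:
            if parent in coordinators:
                return True
            parent = by_id[parent].parent_id
        return False

    return [
        line for line in plan
        if below_coordinator(line) and not line.tq.strip() and not line.in_out.strip()
    ]


# Hypothetical plan: the fact table is read serially, then broadcast (S->P).
plan = [
    PlanLine(0, None, "SELECT STATEMENT"),
    PlanLine(1, 0, "PX COORDINATOR"),
    PlanLine(2, 1, "PX SEND QC (RANDOM)", ":TQ10001", "Q1,01", "P->S"),
    PlanLine(3, 2, "HASH JOIN", "", "Q1,01", "PCWP"),
    PlanLine(4, 3, "BUFFER SORT", "", "Q1,01", "PCWC"),
    PlanLine(5, 4, "PX RECEIVE", "", "Q1,01", "PCWP"),
    PlanLine(6, 5, "PX SEND BROADCAST", ":TQ10000", "", "S->P"),
    PlanLine(7, 6, "TABLE ACCESS FULL", "FACT_TAB"),
    PlanLine(8, 3, "PX BLOCK ITERATOR", "", "Q1,01", "PCWC"),
    PlanLine(9, 8, "TABLE ACCESS FULL", "DIM_TAB", "Q1,01", "PCWP"),
]
for line in serial_candidates(plan):
    print(line.id, line.operation, line.object_name)
# 7 TABLE ACCESS FULL FACT_TAB
```

Each flagged object is a candidate for a PARALLEL hint, as suggested in the last bullet.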