
Chapter 10: Joins and Subqueries

Joins & Subqueries

Joins
  – Methods to combine data from multiple tables
  – Optimizer information can be limited based on
      – Algorithms used
      – Knowledge of data

Subqueries
  – Complex by nature
  – Difficult for Optimizer to determine best plan

Types of Joins

Equi-join (equality condition, i.e. “=”)
Non-equi or Theta (non-equality, e.g. “<>”, BETWEEN)
Cross (Cartesian, i.e. no join condition)
Outer (includes rows that have no match in the other table)
  – Left
  – Right
  – Full
Self (joining table to itself)
Hierarchical (type of self-join)
Anti (rows from one table without match from other)
Semi (rows from one table with a match in the other, each returned only once)

Join Methods

Nested Loops
  – Performs a search of the inner table for each row found in the outer table
  – Optimizer will generally choose it only if an index exists on the inner table
  – Nested table scan: scan of the entire inner table for each outer table row if there is no index on the inner table
  – Generally least effective join method

Sort-Merge
  – Each table sorted by value of the join columns
  – After sort, data merged
  – Best when
      – Large amount of data needed
      – No index on inner table
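The two methods above can be sketched in a few lines of Python. This is only an illustration of the algorithms, not of Oracle's implementation; the `employees` and `departments` rows and all function names are hypothetical, and a dictionary stands in for an index on the inner table.

```python
from collections import defaultdict

# Hypothetical sample rows: (department_id, name)
employees = [(10, "Ann"), (20, "Bob"), (10, "Cy"), (30, "Di")]
departments = [(10, "Sales"), (20, "IT"), (40, "Ops")]

def nested_loops_scan(outer, inner):
    """Nested table scan: the whole inner table is read for every outer row."""
    return [(o, i) for o in outer for i in inner if o[0] == i[0]]

def nested_loops_indexed(outer, inner):
    """Nested loops with an index lookup on the inner table's join column."""
    index = defaultdict(list)  # stands in for a pre-existing index
    for i in inner:
        index[i[0]].append(i)
    return [(o, i) for o in outer for i in index.get(o[0], [])]

def sort_merge(left, right):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                result.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return result

scan_rows = nested_loops_scan(employees, departments)
indexed_rows = nested_loops_indexed(employees, departments)
merged_rows = sort_merge(employees, departments)
```

All three functions return the same matches. The difference is the work done: the scan version compares every pair of rows, the indexed version does one lookup per outer row, and sort-merge pays for two sorts up front.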

Join Methods (cont.)

Hash
  – Hash table built for one of the tables
  – Hash table used to find matching rows in other table
  – Also good for large amounts of data
  – Can be similar in performance to sort-merge
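A minimal Python sketch of the build and probe phases described above, assuming an equi-join on the first column of each row. The `hash_join` function and the sample rows are hypothetical, and the sketch ignores what happens when the hash table does not fit in memory.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key=lambda row: row[0]):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build_rows:              # build phase: reads all of one table
        table[key(b)].append(b)
    joined = []
    for p in probe_rows:              # probe phase: one lookup per row
        for b in table.get(key(p), []):
            joined.append((b, p))
    return joined

# Hypothetical sample rows: (department_id, name)
departments = [(10, "Sales"), (20, "IT"), (40, "Ops")]
employees = [(10, "Ann"), (20, "Bob"), (10, "Cy"), (30, "Di")]

joined = hash_join(departments, employees)
```

Building on the smaller input (here `departments`) keeps the hash table small. The lookup relies on key equality, so this approach covers equi-joins only.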

Choosing Join Method

See Table 10-1 (p. 296)

Sort-Merge/Hash vs. Nested Loops
  – Nested Loops
      – Better response time
      – Smaller amounts of data
      – Indexes needed
  – Sort-Merge/Hash
      – Better throughput
      – Larger amounts of data
      – More memory needed for sorting or building hash table
      – Better with parallel operations (especially Hash)

Choosing Join Method (cont.)

Sort-Merge vs. Hash
  – Hash
      – Generally performs better than Sort-Merge
      – Only applicable to equi-joins
      – Only has to process all of one table (creating the hash table)
  – Sort-Merge
      – Applies to more situations than Hash
      – Applies to equi and non-equi joins
      – Both tables processed (sorted)
      – More memory and CPU generally needed
      – Outperforms Hash if data is pre-sorted

Choosing Join Method (cont.)

When joining A to B       NL          SM/Hash
Both are small            Depends     Yes
Small subset from B       Yes         No
Want first rows quickly   Yes         No
Want all rows quickly     Depends     Depends
FTS of A / parallelism    Yes, if..   Yes
Limited memory            Yes         Maybe not

Optimizing Joins

Nested Loops
  – Ensure index is on inner table
  – Join column should be selective (high cardinality, i.e. many distinct values)

Sort-Merge & Hash
  – Needs enough memory in PGA to perform well
  – Best if entire structure constructed in memory
      – Avoid “multi-pass” operations to disk
  – Sort-Merge is the most resource intensive
      – Two sorted tables
      – Merge operation

Avoiding Joins

Maintaining denormalized data from one table to another
  – Requires application process to copy data
  – Data integrity needs to be carefully maintained

Storing tables in index cluster
  – Reduces IO by combining into single segment
  – SIZE parameter must be set appropriately
  – Full table scan (FTS) operations still slow
  – Rarely used

Creating Materialized Views

Creating bitmap join index

Avoiding Joins (cont.)

Creating Materialized Views
  – Allows transparent query rewrite
  – Keeps transaction data in log tables
  – Avoids join overhead for frequently used queries

Creating bitmap join index
  – Efficient method of matching values between indexes
  – Higher frequency of locking can occur

Join Order

Optimizer calculates join possibilities
  – Number of possible join orders is the factorial of the number of tables being joined
  – Only two tables joined in single operation
  – Temporary result sets created for three or more tables
  – Let optimizer decide join order, but..
      – Ensure statistics are current
      – Create histograms where appropriate
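To give a feel for how quickly the factorial grows, the short Python sketch below counts the orderings for a few table counts. The function name is hypothetical, and the count covers only the order in which tables are joined, not the choice of join method at each step.

```python
import math

def join_order_count(n_tables):
    """Number of distinct orderings of n tables (n factorial)."""
    return math.factorial(n_tables)

counts = {n: join_order_count(n) for n in (2, 3, 5, 10)}
# counts == {2: 2, 3: 6, 5: 120, 10: 3628800}
```

With ten tables there are already more than 3.6 million orderings, which is why the optimizer depends on current statistics to discard most of them quickly.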

Join Order (cont.)

If you don’t trust the optimizer
  – The driving table (first table in join)
      – Should be most selective
      – Should have most efficient WHERE clause
  – Eliminate rows from final result set as early as possible during join operations
      – Try to process filtering conditions early on in the join
  – For small tables with indexes
      – Use nested loops join
      – Ensure all columns of WHERE clause are indexed

Outer Joins

Rows are returned from one table in a join even if there are no matching rows in the other table

Three types
  – Left Outer Join (keeps rows from the left table that have no match in the right table)
  – Right Outer Join (keeps rows from the right table that have no match in the left table)
  – Full Outer Join (keeps unmatched rows from both tables)

Optimizer joins table with missing rows last

Specified with
  – Proprietary Oracle syntax (+)
  – ANSI syntax (e.g. LEFT OUTER JOIN, etc.)

Inner Join
  – Shows only matching rows from both tables
  – This is the “default”
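The difference between an inner join and a left outer join can be seen on a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle and shows ANSI syntax only, since the (+) notation is Oracle-specific. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER, last_name TEXT,
                            department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT'), (30, 'Ops');
    INSERT INTO employees VALUES (1, 'King', 10), (2, 'Chen', 20);
""")

# Inner join: only departments that have at least one employee.
inner_rows = conn.execute("""
    SELECT d.department_name, e.last_name
    FROM departments d
    JOIN employees e ON e.department_id = d.department_id
    ORDER BY d.department_id
""").fetchall()

# Left outer join: every department, with NULL where no employee matches.
left_rows = conn.execute("""
    SELECT d.department_name, e.last_name
    FROM departments d
    LEFT OUTER JOIN employees e ON e.department_id = d.department_id
    ORDER BY d.department_id
""").fetchall()
```

Here `inner_rows` holds two rows, while `left_rows` also contains `('Ops', None)` for the department with no employees.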

Star Joins

Common in the data warehouse

Star schema consists of
  – Large fact table containing detailed rows and foreign keys
  – Dimension tables that categorize fact items (e.g. time, product, etc.)

Oracle’s default approach is to:
  – Query all dimensions to retrieve foreign key values
  – Merge dimension result sets using Cartesian join
  – Use the resulting foreign keys to identify fact table rows

Requires many concatenated indexes

Star Transformation

Cartesian join approach has drawbacks
  – Assumes small dimension tables, which may not be true
  – Concatenated index requirements across all dimension keys may not be practical

Oracle created “Star Transformation” optimization
  – Uses bitmap indexes on fact table
  – Requires setting parameter STAR_TRANSFORMATION_ENABLED=TRUE
  – Also can use OPT_PARAM hint
  – Can validate star transformation via the execution plan
  – Easier to configure and manage
  – Supports widest range of possible WHERE clause conditions
  – Possible lock overhead with bitmap indexes still applies

Hierarchical Joins

Special case of self-join
Column in table points to the primary key of another row in the same table
Next row points to a further row and so on
Cascading effect
Avoid indexes in execution plan

Subqueries

A subquery is a SELECT statement contained within another SQL statement

Types include
  – Simple
  – Correlated
  – Anti-join
  – Semi-join

Simple Subqueries

Inner query makes no reference to parent query

Example: count the employees with the lowest salary

    SELECT COUNT(*)
    FROM employees
    WHERE salary = (SELECT MIN(salary) FROM employees);

Each query can and should be tuned independently
Generally use more resources than running queries separately within a program

Correlated Subqueries

Subquery refers to values in the parent query
Subquery is logically executed once for each row returned by the parent query
Usually accomplished via a join method

    SELECT employee_id, first_name, last_name, salary
    FROM employees a
    WHERE salary = (SELECT MIN(salary)
                    FROM employees b
                    WHERE b.department_id = a.department_id);

Can generate inefficient plans
Consider rewriting as joins or using analytic functions
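As a sketch of the join rewrite suggested above, the following Python snippet runs the correlated form and one possible join form against a small in-memory SQLite database and checks that they agree. SQLite stands in for Oracle here, the sample rows are hypothetical, and the derived table `m` is just one of several possible rewrites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, last_name TEXT,
                            salary INTEGER, department_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'King', 9000, 10), (2, 'Chen', 4000, 10),
        (3, 'Lee',  5000, 20), (4, 'Diaz', 5000, 20), (5, 'Ota', 7000, 20);
""")

# Correlated form: the subquery is logically re-evaluated for each outer row.
correlated = conn.execute("""
    SELECT employee_id, last_name, salary
    FROM employees a
    WHERE salary = (SELECT MIN(salary)
                    FROM employees b
                    WHERE b.department_id = a.department_id)
    ORDER BY employee_id
""").fetchall()

# Join form: compute each department's minimum once, then join to it.
join_rewrite = conn.execute("""
    SELECT a.employee_id, a.last_name, a.salary
    FROM employees a
    JOIN (SELECT department_id, MIN(salary) AS min_salary
          FROM employees
          GROUP BY department_id) m
      ON m.department_id = a.department_id
     AND m.min_salary = a.salary
    ORDER BY a.employee_id
""").fetchall()
```

Both queries return the lowest-paid employees per department, including ties. Whether the rewrite is actually faster depends on the data and the plan the optimizer picks, so compare execution plans before adopting it.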

Anti-join Subqueries

As named, is the opposite of a join
  – Returns rows in one table that do not match rows from another
  – Expressed with ‘NOT IN’ or ‘NOT EXISTS’ subquery
  – Example: Google customers who are not Microsoft customers

    SELECT COUNT(*)
    FROM google_customers
    WHERE (cust_first_name, cust_last_name)
          NOT IN (SELECT cust_first_name, cust_last_name
                  FROM microsoft_customers);

Optimizer generally uses HASH JOIN ANTI method
May be beneficial to add index to subquery table
Avoid NOT IN unless join keys are NOT NULL
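The warning about NOT IN and nullable keys comes from SQL's three-valued logic: if the subquery returns even one NULL, NOT IN can never evaluate to true. The sketch below demonstrates this with Python's sqlite3 module as a stand-in for Oracle. The single-column tables are a simplified, hypothetical version of the slide's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE google_customers (cust_last_name TEXT);
    CREATE TABLE microsoft_customers (cust_last_name TEXT);
    INSERT INTO google_customers VALUES ('Adams'), ('Baker');
    INSERT INTO microsoft_customers VALUES ('Baker'), (NULL);
""")

# NOT IN: the NULL in the subquery makes the predicate unknown for 'Adams'.
not_in_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers
    WHERE cust_last_name NOT IN (SELECT cust_last_name
                                 FROM microsoft_customers)
""").fetchone()[0]

# NOT EXISTS: a NULL simply fails to match, so 'Adams' is counted.
not_exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    WHERE NOT EXISTS (SELECT 1
                      FROM microsoft_customers m
                      WHERE m.cust_last_name = g.cust_last_name)
""").fetchone()[0]
```

The NOT IN query counts 0 rows and the NOT EXISTS query counts 1. Declaring the join keys NOT NULL, or filtering NULLs out of the subquery, removes the discrepancy.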

Semi-join Subqueries

Expressed as ‘WHERE IN’ or ‘WHERE EXISTS’ subquery

    SELECT COUNT(*)
    FROM google_customers
    WHERE (cust_first_name, cust_last_name)
          IN (SELECT cust_first_name, cust_last_name
              FROM microsoft_customers);

Returns rows from first table only once
  – Even if there is more than one matching row in the second table
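The "only once" behaviour is what separates a semi-join from an ordinary join. A short sqlite3 sketch (a stand-in for Oracle, using hypothetical single-column tables) makes the difference visible when the second table holds duplicate matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE google_customers (cust_last_name TEXT);
    CREATE TABLE microsoft_customers (cust_last_name TEXT);
    INSERT INTO google_customers VALUES ('Adams'), ('Baker');
    INSERT INTO microsoft_customers VALUES ('Baker'), ('Baker'), ('Clark');
""")

# Semi-join: 'Baker' is counted once, however many matches exist.
semi_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    WHERE EXISTS (SELECT 1
                  FROM microsoft_customers m
                  WHERE m.cust_last_name = g.cust_last_name)
""").fetchone()[0]

# Ordinary inner join: 'Baker' appears once per matching row.
join_count = conn.execute("""
    SELECT COUNT(*)
    FROM google_customers g
    JOIN microsoft_customers m ON m.cust_last_name = g.cust_last_name
""").fetchone()[0]
```

The semi-join counts 1 row while the inner join counts 2, because the join repeats 'Baker' for each duplicate in `microsoft_customers`.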