Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Exploring Advanced SQL Techniques Using Analytic Functions

Exploring Advanced SQL Techniques Using Analytic Functions


Zohar Elkayam www.realdbamagic.com

Twitter: @realmgic


Who am I?

• Zohar Elkayam, CTO at Brillix
• DBA, team leader, database trainer, public speaker, and a senior consultant for over 18 years
• Oracle ACE Associate
• Involved with Big Data projects since 2011
• Blogger: www.realdbamagic.com and www.ilDBA.co.il

About Brillix

• Brillix is a leading company that specializes in Data Management
• We provide consulting, training, and professional services for various Databases, Security, NoSQL, and Big Data solutions
• Providing the Brillix Big Data Experience Center

Agenda: Advanced SQL

• “Basic” aggregation: Rollup, Cube, and Grouping Sets
• Analytic functions
  • Reporting Functions
  • Ranking Functions
  • Inter-row Functions
  • Using the Window clause
• Oracle 12c new features overview
  • Top-N queries
  • Pattern matching

Advanced Aggregation Functions
More than just group by…

Basics

• Group functions return a single row for each group of rows
• We can run group functions only when we group the rest of the columns together using the GROUP BY clause
• Common group functions: SUM, MIN, MAX, AVG, etc.
• We can filter out rows after aggregation if we use the HAVING clause

GROUP BY With the ROLLUP and CUBE Operators

• Use ROLLUP or CUBE with GROUP BY to produce superaggregate rows by cross-referencing columns
• ROLLUP grouping produces a result set containing the regular grouped rows and the subtotal and grand total values
• CUBE grouping produces a result set containing the rows from ROLLUP and cross-tabulation rows

Using the ROLLUP Operator

• ROLLUP is an extension of the GROUP BY clause
• Use the ROLLUP operation to produce cumulative aggregates, such as subtotals

SELECT [column,] group_function(column). . .
FROM table
[WHERE condition]
[GROUP BY [ROLLUP] group_by_expression]
[HAVING having_expression]
[ORDER BY column];

Using the ROLLUP Operator: Example

SELECT department_id, job_id, SUM(salary)
FROM hr.employees
WHERE department_id < 60
GROUP BY ROLLUP(department_id, job_id);

The output contains three kinds of rows:
1. Total by DEPARTMENT_ID and JOB_ID
2. Total by DEPARTMENT_ID
3. Grand total
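To make the three row levels concrete, here is a minimal Python sketch that emulates what GROUP BY ROLLUP(department_id, job_id) computes. The sample rows and the `rollup_sum` helper are hypothetical and only illustrate the semantics; this is not how Oracle executes the query.

```python
from collections import defaultdict

# Hypothetical sample rows: (department_id, job_id, salary)
rows = [
    (10, "AD_ASST", 4400),
    (20, "MK_MAN", 13000),
    (20, "MK_REP", 6000),
    (50, "ST_CLERK", 2500),
    (50, "ST_CLERK", 2600),
]

def rollup_sum(rows):
    """Emulate SUM(salary) ... GROUP BY ROLLUP(department_id, job_id).

    None stands in for the NULL that marks a rolled-up column.
    """
    totals = defaultdict(int)
    for dept, job, salary in rows:
        totals[(dept, job)] += salary    # 1: total by department and job
        totals[(dept, None)] += salary   # 2: total by department
        totals[(None, None)] += salary   # 3: grand total
    return dict(totals)

result = rollup_sum(rows)
for key, total in result.items():
    print(key, total)
```

Each input row contributes to exactly three totals, which is why ROLLUP can be computed in a single pass over the data.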

Using the CUBE Operator

• CUBE is an extension of the GROUP BY clause
• You can use the CUBE operator to produce cross-tabulation values with a single SELECT statement

SELECT [column,] group_function(column)...
FROM table
[WHERE condition]
[GROUP BY [CUBE] group_by_expression]
[HAVING having_expression]
[ORDER BY column];

Using the CUBE Operator: Example

SELECT department_id, job_id, SUM(salary)
FROM hr.employees
WHERE department_id < 60
GROUP BY CUBE (department_id, job_id);

The output contains four kinds of rows:
1. Grand total
2. Total by JOB_ID
3. Total by DEPARTMENT_ID and JOB_ID
4. Total by DEPARTMENT_ID

GROUPING SETS

• The GROUPING SETS syntax is used to define multiple groupings in the same query
• All groupings specified in the GROUPING SETS clause are computed and the results of individual groupings are combined with a UNION ALL operation
• Grouping set efficiency:
  • Only one pass over the base table is required
  • There is no need to write complex UNION statements
  • The more elements GROUPING SETS has, the greater the performance benefit

GROUPING SETS: Example

SELECT department_id, job_id, manager_id, AVG(salary)
FROM hr.employees
GROUP BY GROUPING SETS
  ((department_id, job_id), (job_id, manager_id));

Composite Columns

• A composite column is a collection of columns that are treated as a unit.
  ROLLUP (a, (b, c), d)
• Use parentheses within the GROUP BY clause to group columns, so that they are treated as a unit while computing ROLLUP or CUBE operators.
• When used with ROLLUP or CUBE, composite columns cause aggregation to skip certain levels. In the example above, no subtotal is computed for (a, b) alone.
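The levels that a composite column skips can be listed mechanically. The following Python sketch expands ROLLUP and CUBE column lists into their grouping sets, with a composite column written as a tuple. The helper names `rollup_sets`, `cube_sets`, and `flatten` are hypothetical and exist only for this illustration.

```python
from itertools import combinations

def flatten(items):
    """Expand composite columns (given as tuples) into plain column names."""
    out = []
    for item in items:
        out.extend(item if isinstance(item, tuple) else (item,))
    return tuple(out)

def rollup_sets(*cols):
    """Grouping sets produced by ROLLUP(cols): every prefix, longest first."""
    return [flatten(cols[:n]) for n in range(len(cols), -1, -1)]

def cube_sets(*cols):
    """Grouping sets produced by CUBE(cols): every subset of the columns."""
    return [flatten(c) for n in range(len(cols), -1, -1)
            for c in combinations(cols, n)]

plain = rollup_sets("a", "b", "c", "d")
composite = rollup_sets("a", ("b", "c"), "d")
print(plain)      # five levels, including ('a', 'b')
print(composite)  # four levels, ('a', 'b') is skipped
```

The empty tuple in each result stands for the grand total. Comparing `plain` with `composite` shows that treating (b, c) as a unit removes the (a, b) subtotal level.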

Composite Columns: Example

SELECT department_id, job_id, manager_id, SUM(salary)
FROM hr.employees
GROUP BY ROLLUP(department_id, (job_id, manager_id));

Analytic Functions
Let’s analyze our data!

Overview of SQL for Analysis and Reporting

• Oracle has enhanced SQL's analytical processing capabilities by introducing a family of analytic SQL functions
• These analytic functions enable you to calculate and perform:
  • Reporting operations
  • Rankings and percentiles
  • Moving window calculations
  • Inter-row calculations (LAG/LEAD, FIRST/LAST etc.)
  • Pivoting operations (11g)
  • Pattern matching (12c)
  • Linear regression and predictions

Why Use Analytic Functions?

• Ability to see one row from another row in the results
• Avoid self-join queries
• Summary data in detail rows
• Slice and dice within the results
• Performance improvement, in some cases

Concepts Used in Analytic Functions

• Result set partitions: These are created and available to any aggregate results such as sums and averages. The term “partitions” is unrelated to the table partitions feature.
• Window: For each row in a partition, you can define a sliding window of data, which determines the range of rows used to perform the calculations for the current row.
• Current row: Each calculation performed with an analytic function is based on a current row within a partition. It serves as the reference point determining the start and end of the window.

Reporting Functions

• We can use aggregate functions as analytic functions (e.g. SUM, AVG, MIN, MAX, COUNT etc.)
• Each row gets the aggregate value for its partition without the need for a GROUP BY clause, so we can have several different groupings on the same row
• We get the raw data along with the aggregated value
• Use ORDER BY to get cumulative aggregations

Report Functions

SELECT last_name, salary, department_id,
       ROUND(AVG(salary) OVER (PARTITION BY department_id), 2) A,
       COUNT(*) OVER (PARTITION BY manager_id) B,
       SUM(salary) OVER (PARTITION BY department_id ORDER BY salary) C,
       MAX(salary) OVER () D
FROM hr.employees;
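The same idea can be tried without an Oracle instance. The sketch below runs a cut-down version of the query through Python's built-in `sqlite3` module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions). The `emp` table and its five rows are made up for the illustration and are not the real hr.employees data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (last_name TEXT, department_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("Hartstein", 20, 13000), ("Fay", 20, 6000),
    ("Hunold", 60, 9000), ("Ernst", 60, 6000), ("Lorentz", 60, 4200),
])

# Every detail row is kept; each analytic column adds an aggregate next to it.
report = conn.execute("""
    SELECT last_name, salary,
           AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
           SUM(salary) OVER (PARTITION BY department_id ORDER BY salary) AS running_sum,
           MAX(salary) OVER () AS overall_max
    FROM emp
    ORDER BY department_id, salary
""").fetchall()

for row in report:
    print(row)
```

Adding ORDER BY inside OVER is what turns the plain department total into a running sum, while the empty OVER () treats the whole result set as one partition.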

Ranking Functions

Using the Ranking Functions

• A ranking function computes the rank of a record compared to other records in the data set based on the values of a set of measures. The types of ranking function are:
  • RANK and DENSE_RANK functions
  • ROW_NUMBER function
  • PERCENT_RANK function
  • NTILE function

Working with the RANK Function

• The RANK function calculates the rank of a value in a group of values, which is useful for top-N and bottom-N reporting.
• When using the RANK function, ascending is the default sort order, which you can change to descending.
• Rows with equal values for the ranking criteria receive the same rank.
• Oracle Database then adds the number of tied rows to the tied rank to calculate the next rank. For example, two rows tied at rank 2 are followed by rank 4.

RANK ( ) OVER ( [query_partition_clause] order_by_clause )

Using the RANK Function: Example

SELECT department_id, last_name, salary,
       RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) "Rank"
FROM employees
WHERE department_id = 60
ORDER BY department_id, "Rank", salary;

RANK and DENSE_RANK Functions: Example

DENSE_RANK ( ) OVER ([query_partition_clause] order_by_clause)

SELECT department_id, last_name, salary,
       RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) "Rank",
       DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) "Drank"
FROM employees
WHERE department_id = 60
ORDER BY department_id, last_name, salary DESC, "Rank" DESC;

Working with the ROW_NUMBER Function

• The ROW_NUMBER function assigns a sequential number to each row in a group of values, in the order given by the ORDER BY clause.
• When using the ROW_NUMBER function, ascending is the default sort order, which you can change to descending.
• Rows with equal values for the ranking criteria receive different numbers; the order among tied rows is not deterministic.

ROW_NUMBER ( ) OVER ( [query_partition_clause] order_by_clause )

ROW_NUMBER vs. ROWNUM

• ROWNUM is a pseudo column, ROW_NUMBER is an actual function
• ROWNUM is calculated when the result returns to the client
• ROWNUM requires sorting of the entire dataset in order to return an ordered list
• ROW_NUMBER will only sort the required rows, thus giving better performance

Using the PERCENT_RANK Function

• Uses rank values in its numerator and returns the percent rank of a value relative to a group of values
• PERCENT_RANK of a row is calculated as follows:

  (rank of row in its partition - 1) / (number of rows in the partition - 1)

• The range of values returned by PERCENT_RANK is 0 to 1, inclusive. The first row in any set has a PERCENT_RANK of 0. The return value is NUMBER. Its syntax is:

  PERCENT_RANK () OVER ([query_partition_clause] order_by_clause)

Using the PERCENT_RANK Function: Example

SELECT department_id, last_name, salary,
       PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS pr
FROM hr.employees
ORDER BY department_id, pr, salary;

Working with the NTILE Function

• Not really a rank function
• Divides an ordered data set into a number of buckets indicated by expr and assigns the appropriate bucket number to each row
• The buckets are numbered 1 through expr

NTILE ( expr ) OVER ([query_partition_clause] order_by_clause)

Summary of Ranking Functions

• Different ranking functions may return different results if the data has ties

SELECT last_name, salary, department_id,
       ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) A,
       RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) B,
       DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) C,
       PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) D,
       NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) E
FROM hr.employees;
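To see how the ranking functions diverge on ties, the sketch below runs them side by side on four made-up rows using Python's `sqlite3` module (assuming SQLite 3.25 or later). Unlike the query above, ROW_NUMBER and NTILE get `last_name` as an extra tie-breaker here, purely so the output is repeatable; the table and names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (last_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [
    ("Hunold", 9000), ("Ernst", 6000), ("Austin", 6000), ("Lorentz", 4200),
])

ranks = conn.execute("""
    SELECT last_name,
           ROW_NUMBER()   OVER (ORDER BY salary DESC, last_name) AS rn,
           RANK()         OVER (ORDER BY salary DESC)            AS rnk,
           DENSE_RANK()   OVER (ORDER BY salary DESC)            AS drnk,
           PERCENT_RANK() OVER (ORDER BY salary DESC)            AS prnk,
           NTILE(2)       OVER (ORDER BY salary DESC, last_name) AS bucket
    FROM emp
    ORDER BY salary DESC, last_name
""").fetchall()

for row in ranks:
    print(row)
```

Austin and Ernst tie at 6000: ROW_NUMBER gives them 2 and 3, RANK gives both 2 and then jumps to 4, DENSE_RANK gives both 2 and continues with 3, and PERCENT_RANK gives both (2 - 1) / (4 - 1).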

Inter-row Analytic Functions

Using the LAG and LEAD Analytic Functions

• LAG provides access to more than one row of a table at the same time without a self-join.
• Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset before that position; LEAD does the same for a row after that position.
• If you do not specify the offset, its default is 1.
• If the offset goes beyond the scope of the window, the optional default value is returned. If you do not specify the default, its value is NULL.

{LAG | LEAD}(value_expr [, offset ] [, default ]) OVER ([ query_partition_clause ] order_by_clause)

Using the LAG and LEAD Analytic Functions: Example

SELECT time_id,
       TO_CHAR(SUM(amount_sold), '9,999,999') AS SALES,
       TO_CHAR(LAG(SUM(amount_sold), 1) OVER (ORDER BY time_id), '9,999,999') AS LAG1,
       TO_CHAR(LEAD(SUM(amount_sold), 1) OVER (ORDER BY time_id), '9,999,999') AS LEAD1
FROM sales
WHERE time_id >= TO_DATE('10-OCT-2000', 'DD-MON-YYYY')
  AND time_id <= TO_DATE('14-OCT-2000', 'DD-MON-YYYY')
GROUP BY time_id;
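A small runnable sketch of the same pattern, using Python's `sqlite3` module (assuming SQLite 3.25 or later) and a made-up `sales` table. For simplicity it aggregates in a common table expression first and applies LAG and LEAD to the daily totals, and it passes 0 as the optional default to LEAD to show the difference from LAG's NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (time_id TEXT, amount_sold INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2000-10-10", 100), ("2000-10-10", 50),
    ("2000-10-11", 200), ("2000-10-12", 75),
])

daily = conn.execute("""
    WITH totals AS (
        SELECT time_id, SUM(amount_sold) AS sales
        FROM sales
        GROUP BY time_id
    )
    SELECT time_id,
           sales,
           LAG(sales, 1)     OVER (ORDER BY time_id) AS lag1,   -- no default: NULL
           LEAD(sales, 1, 0) OVER (ORDER BY time_id) AS lead1   -- default: 0
    FROM totals
    ORDER BY time_id
""").fetchall()

for row in daily:
    print(row)
```

The first day has no previous row, so `lag1` is NULL (None in Python); the last day has no following row, so `lead1` falls back to the supplied default.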

Using FIRST_VALUE/LAST_VALUE

• Returns the first/last value in an ordered set of values
• If the first value in the set is null, then the function returns NULL unless you specify IGNORE NULLS. This setting is useful for data densification.

FIRST_VALUE (expr [ IGNORE NULLS ]) OVER (analytic_clause)

LAST_VALUE (expr [ IGNORE NULLS ]) OVER (analytic_clause)

Using FIRST_VALUE Analytic Function Example

SELECT department_id, last_name, salary,
       FIRST_VALUE(last_name) OVER (ORDER BY salary ASC ROWS UNBOUNDED PRECEDING) AS lowest_sal,
       LAST_VALUE(last_name) OVER (ORDER BY salary ASC
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS highest_sal
FROM (SELECT * FROM employees WHERE department_id = 30 ORDER BY employee_id)
ORDER BY department_id, last_name, salary;
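The explicit window on LAST_VALUE in the query above matters. The sketch below, again using Python's `sqlite3` module (assuming SQLite 3.25 or later) with three made-up rows, compares LAST_VALUE under the default window with LAST_VALUE over the whole partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (last_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [
    ("Colmenares", 2500), ("Khoo", 3100), ("Raphaely", 11000),
])

first_last = conn.execute("""
    SELECT last_name,
           FIRST_VALUE(last_name) OVER (ORDER BY salary) AS lowest_sal,
           LAST_VALUE(last_name)  OVER (ORDER BY salary) AS last_default,
           LAST_VALUE(last_name)  OVER (ORDER BY salary
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS highest_sal
    FROM emp
    ORDER BY salary
""").fetchall()

for row in first_last:
    print(row)
```

With only an ORDER BY, the window ends at the current row, so `last_default` simply echoes each row's own name. Extending the window to UNBOUNDED FOLLOWING is what makes LAST_VALUE return the highest earner on every row.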

Using NTH_VALUE Analytic Function

• Returns the N-th value in an ordered set of values
• Different default window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

NTH_VALUE (measure_expr, n) [ FROM { FIRST | LAST } ] [ { RESPECT | IGNORE } NULLS ] OVER (analytic_clause)

Using NTH_VALUE Analytic Function Example

SELECT prod_id, channel_id, MIN(amount_sold),
       NTH_VALUE(MIN(amount_sold), 2) OVER (PARTITION BY prod_id ORDER BY channel_id
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) nv
FROM sh.sales
WHERE prod_id BETWEEN 13 AND 16
GROUP BY prod_id, channel_id;

Using the LISTAGG Function

• For a specified measure, LISTAGG orders data within each group specified in the ORDER BY clause and then concatenates the values of the measure column
• WARNING: Output is limited to 4000 characters; a longer result raises an error at runtime

LISTAGG(measure_expr [, 'delimiter']) WITHIN GROUP (order_by_clause) [OVER query_partition_clause]

Using the LISTAGG Function Example

SELECT department_id "Dept", hire_date "Date", last_name "Name",
       LISTAGG(last_name, ', ') WITHIN GROUP (ORDER BY hire_date, last_name)
         OVER (PARTITION BY department_id) AS "Emp_list"
FROM hr.employees
WHERE hire_date < DATE '2003-09-01'
ORDER BY "Dept", "Date", "Name";


Window Functions

Window Functions

• The windowing_clause gives some analytic functions a further degree of control over this window within the current partition
• The windowing_clause can only be used if an order_by_clause is present
• The windows are always limited to the current partition
• Generally, the default window is the entire work set unless said otherwise

Windowing Clause Useful Usages

• Cumulative aggregation
• Sliding average over preceding and/or following rows
• Using the RANGE parameter to filter aggregation records

Windows can be by RANGE or ROWS

RANGE BETWEEN start_point AND end_point
ROWS BETWEEN start_point AND end_point

Possible values for start_point and end_point:

UNBOUNDED PRECEDING    The window starts at the first row of the partition. Only available for start points.
UNBOUNDED FOLLOWING    The window ends at the last row of the partition. Only available for end points.
CURRENT ROW            The window starts or ends at the current row.
value_expr PRECEDING   A physical or logical offset before the current row. When used with RANGE, can also be an interval literal.
value_expr FOLLOWING   As above, but an offset after the current row.

Shortcuts

• Useful shortcuts for the windowing clause:

Shortcut                   Equivalent to
ROWS UNBOUNDED PRECEDING   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS 10 PRECEDING          ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
ROWS CURRENT ROW           ROWS BETWEEN CURRENT ROW AND CURRENT ROW (1 row)
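The windowing usages listed earlier (cumulative aggregation, sliding average) and the shortcut forms can be tried on a tiny data set. This sketch uses Python's `sqlite3` module, assuming SQLite 3.25 or later, which accepts the same ROWS shortcuts; the `daily` table and its values are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO daily VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 60), (4, 30)])

windows = conn.execute("""
    SELECT day,
           -- shortcut for BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           SUM(amount) OVER (ORDER BY day ROWS UNBOUNDED PRECEDING) AS cumulative,
           -- sliding average over the previous, current, and next row
           AVG(amount) OVER (ORDER BY day
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sliding_avg,
           -- shortcut for BETWEEN 1 PRECEDING AND CURRENT ROW
           SUM(amount) OVER (ORDER BY day ROWS 1 PRECEDING) AS last_two
    FROM daily
    ORDER BY day
""").fetchall()

for row in windows:
    print(row)
```

At the edges of the partition the window simply shrinks: the sliding average for day 1 covers only days 1 and 2, and `last_two` for day 1 covers only that day.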

Oracle 12c New Feature Overview
Just a couple; we could talk for hours about all the new features.

What’s New in Oracle 12c

• Top-N queries and pagination: returning the top N rows of a query
  • Syntactic sugar: just a syntax enhancement, not a performance enhancement
• Pattern matching: new MATCH_RECOGNIZE syntax for finding patterns across rows

Top-N Examples

SELECT last_name, salary
FROM hr.employees
ORDER BY salary
FETCH FIRST 4 ROWS ONLY;

SELECT last_name, salary
FROM hr.employees
ORDER BY salary
FETCH FIRST 4 ROWS WITH TIES;

SELECT last_name, salary
FROM hr.employees
ORDER BY salary DESC
FETCH FIRST 10 PERCENT ROWS ONLY;
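The difference between ROWS ONLY and ROWS WITH TIES is easiest to see on data where the fourth and fifth rows share a sort key. The Python sketch below emulates both variants; the `fetch_first` helper and the sample rows are hypothetical and only mirror the semantics of the clause.

```python
def fetch_first(rows, n, key, with_ties=False):
    """Emulate ORDER BY key FETCH FIRST n ROWS {ONLY | WITH TIES}."""
    ordered = sorted(rows, key=key)
    top = ordered[:n]
    if with_ties and top:
        last = key(top[-1])
        # WITH TIES keeps extra rows whose sort key equals the n-th row's key.
        for row in ordered[n:]:
            if key(row) != last:
                break
            top.append(row)
    return top

# Hypothetical (last_name, salary) rows.
employees = [("Olson", 2100), ("Markle", 2200), ("Philtanker", 2200),
             ("Landry", 2400), ("Gee", 2400), ("Perkins", 2500)]

def by_salary(row):
    return row[1]

only = fetch_first(employees, 4, by_salary)
ties = fetch_first(employees, 4, by_salary, with_ties=True)
print(len(only), len(ties))
```

With ROWS ONLY exactly four rows come back, and which of the two 2400 earners makes the cut is arbitrary; WITH TIES returns both, so the result can be longer than the requested count.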

What is Pattern Matching?

• A new syntax that allows us to identify and group rows with consecutive values
• Consecutive in this regard means row after row
• Uses regular-expression-like syntax to find patterns
• Finds complex behavior we couldn’t find before, or needed PL/SQL for

Example: Pages in a Book

• Our goal: find uninterrupted sequences in a book
• This can be useful for detecting missing records or sequential behavior

(source: “Database 12c Row Pattern Matching” (OOW2014 session), by Stew Ashton).

Pattern Matching Example

The clauses listed in processing order, with the role of each step:

SELECT *
FROM book_pages                          -- 1. Define input
MATCH_RECOGNIZE (                        -- 2. Pattern matching
  ORDER BY page                          -- 3. Order input
  PATTERN (A B*)                         -- 4. Process pattern
  DEFINE B AS page = PREV(page)+1        -- 5. Using defined conditions
  ONE ROW PER MATCH                      -- 6. Output: rows per match
  MEASURES A.page firstpage,
           LAST(page) lastpage,
           COUNT(*) cnt                  -- 7. Output: columns per row
  AFTER MATCH SKIP PAST LAST ROW         -- 8. Where to go after match?
);

The same query written in the clause order the syntax expects:

SELECT *
FROM book_pages
MATCH_RECOGNIZE (
  ORDER BY page
  MEASURES A.page firstpage,
           LAST(page) lastpage,
           COUNT(*) cnt
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (A B*)
  DEFINE B AS page = PREV(page)+1
);

And the Result…

 FIRSTPAGE   LASTPAGE        CNT
---------- ---------- ----------
         1          3          3
         5          7          3
        10         15          6
        42         42          1
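For readers without a 12c instance, the matching logic can be emulated in a few lines of Python. This is only a sketch of what PATTERN (A B*) with DEFINE B AS page = PREV(page)+1 does; the `page_ranges` helper is hypothetical, and the page list is an assumed input chosen so that it reproduces the result shown above.

```python
def page_ranges(pages):
    """Emulate PATTERN (A B*) with DEFINE B AS page = PREV(page) + 1.

    Returns one (firstpage, lastpage, cnt) tuple per match.
    """
    matches = []
    run = []
    for page in sorted(pages):            # ORDER BY page
        if run and page == run[-1] + 1:   # row qualifies as B
            run.append(page)
        else:                             # row starts a new match as A
            if run:
                matches.append((run[0], run[-1], len(run)))
            run = [page]
    if run:
        matches.append((run[0], run[-1], len(run)))
    return matches

# Assumed contents of book_pages, consistent with the result on the slide.
book_pages = [1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14, 15, 42]
print(page_ranges(book_pages))
```

Each match starts with any row (A) and greedily extends while the next page is exactly one more than the previous one (B*), which corresponds to ONE ROW PER MATCH with AFTER MATCH SKIP PAST LAST ROW.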

Q&A

Summary

• We talked about advanced aggregation clauses, multi-dimensional aggregation, and how utilizing them can save us time and effort
• Analytic functions are really important both for performance and for code clarity
• We saw how ranking functions work and how to use windows
• We explored some Oracle 12c enhancements; more information about them can be found in my blog: www.realdbamagic.com

Thank You!

Zohar Elkayam
Twitter: @realmgic
www.realdbamagic.com