Window Functions for Data Science MARK TABLADILLO PH.D. MICROSOFT MVP – ATLANTA, GA FEBRUARY 2016

Page 2: Window functions for Data Science

Abstract

Window functions are powerful analytic functions built into SQL Server. SQL Server 2005 introduced the core window ranking functions, and SQL Server 2012 added time and statistical percentage window functions. These functions allow for advanced variable creation, and are of direct benefit to people creating features for data science. This talk will also recommend further reading on this topic.

Required for Certification

Page 3: Window functions for Data Science

What is a “Window”?
Answer: A set of rows defined by the OVER clause

[Diagram: Set One, Set Two, and Set Three shown twice, once labeled ORDER BY and once labeled PARTITION BY]
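
As a minimal sketch of how the OVER clause defines a window, the example below runs a query through Python's built-in sqlite3 module. SQLite (3.25 or later) is used here only as a stand-in for SQL Server, on the assumption that the OVER syntax shown is the same in both; the table `sales` and its columns are hypothetical.

```python
import sqlite3

# Hypothetical sample data: three "sets" of rows, named after the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (set_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Set One", 10), ("Set One", 30), ("Set Two", 20),
     ("Set Two", 5), ("Set Three", 40)],
)

# PARTITION BY splits the rows into independent windows;
# ORDER BY sequences the rows inside each window.
query = """
SELECT set_name,
       amount,
       ROW_NUMBER() OVER (PARTITION BY set_name ORDER BY amount) AS pos_in_set,
       ROW_NUMBER() OVER (ORDER BY amount)                       AS pos_overall
FROM sales
ORDER BY set_name, amount
"""
window_rows = conn.execute(query).fetchall()
for row in window_rows:
    print(row)
```

The first ROW_NUMBER restarts at 1 inside each set, while the second numbers all rows as a single window.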

Page 4: Window functions for Data Science

What is a “function”?

Function    Example
Ranking     ROW_NUMBER, RANK, DENSE_RANK, NTILE
Aggregate   MIN, MAX, AVG, SUM, COUNT, STDEV, STDEVP, VAR, VARP, CHECKSUM_AGG, COUNT_BIG
Analytic    LAG, LEAD, FIRST_VALUE, LAST_VALUE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, CUME_DIST

GROUP BY not required

Page 5: Window functions for Data Science

Demo: Ranking
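
The demo code is not reproduced in this transcript. As a stand-in, here is a minimal sketch of the four ranking functions, run through Python's sqlite3 module (SQLite 3.25 or later, assumed to behave like SQL Server for these functions). The table `scores` and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 90), ("B", 85), ("C", 85), ("D", 70)],  # B and C tie
)

query = """
SELECT name,
       score,
       -- name is added as a tiebreaker so the numbering is repeatable
       ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
       RANK()       OVER (ORDER BY score DESC)       AS rnk,        -- gaps after ties
       DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,  -- no gaps
       NTILE(2)     OVER (ORDER BY score DESC, name) AS half        -- two equal buckets
FROM scores
ORDER BY score DESC, name
"""
ranking_rows = conn.execute(query).fetchall()
for row in ranking_rows:
    print(row)
```

The tied rows B and C share a RANK and DENSE_RANK of 2; RANK then skips to 4 for D, while DENSE_RANK continues with 3.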

Page 8: Window functions for Data Science

Order please

Page 9: Window functions for Data Science

Demo: Aggregate
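
The demo code is not reproduced in this transcript. As a stand-in, here is a minimal sketch of aggregate functions used as window functions, which keeps every detail row without a GROUP BY. It runs through Python's sqlite3 module (SQLite 3.25 or later, assumed to match SQL Server for this syntax); the table `orders` and the derived column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("ann", 30), ("bob", 20)],
)

query = """
SELECT customer,
       amount,
       SUM(amount) OVER (PARTITION BY customer)                AS customer_total,
       amount * 1.0 / SUM(amount) OVER (PARTITION BY customer) AS share_of_customer,
       AVG(amount) OVER ()                                     AS overall_avg
FROM orders
ORDER BY customer, amount
"""
aggregate_rows = conn.execute(query).fetchall()
for row in aggregate_rows:
    print(row)
```

Each order row carries its customer's total and the overall average alongside the original amount, which is the kind of derived variable the abstract describes as feature creation.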

Page 12: Window functions for Data Science

Framing: Order Matters

Term                                Operators
ROWS = Physical Operator (faster)   UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING, N PRECEDING, N FOLLOWING, CURRENT ROW
RANGE = Logical Operator (slower)   UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING, CURRENT ROW (RANGE?)

Page 13: Window functions for Data Science

Default Frame

First row of partition to current row
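
A minimal sketch of the default frame in action: when an aggregate has ORDER BY in its OVER clause and no explicit frame, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which yields a running total. The example uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server; the table `daily` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

query = """
SELECT day,
       -- ORDER BY with no frame: default frame, first row to current row
       SUM(amount) OVER (ORDER BY day) AS running_default,
       -- the same frame written out explicitly
       SUM(amount) OVER (ORDER BY day
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS running_explicit,
       -- no ORDER BY: the window is the whole partition
       SUM(amount) OVER () AS grand_total
FROM daily
ORDER BY day
"""
default_frame_rows = conn.execute(query).fetchall()
for row in default_frame_rows:
    print(row)
```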

Page 14: Window functions for Data Science

[Diagram: ROWS framing over rows 1 2 3 4 5 6 7 8 9, marking Current Row, Unbounded Preceding, Unbounded Following, 4 Preceding, 2 Following, 1 Preceding, and 1 Following]
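
A minimal sketch of ROWS frames, using the nine rows and the two offsets pairs named on the slide (1 preceding to 1 following, and 4 preceding to 2 following). It runs through Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server; the table `t` and column `n` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 10)])

query = """
SELECT n,
       SUM(n) OVER (ORDER BY n ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sum_1p_1f,
       SUM(n) OVER (ORDER BY n ROWS BETWEEN 4 PRECEDING AND 2 FOLLOWING) AS sum_4p_2f
FROM t
ORDER BY n
"""
rows_frame = conn.execute(query).fetchall()
for row in rows_frame:
    print(row)
```

Near the edges of the partition the frame simply shrinks: row 1 has no preceding row, so its 1-preceding-to-1-following sum covers only rows 1 and 2.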

Page 15: Window functions for Data Science

[Diagram: RANGE framing over rows 1 2 3 4 5 6 7 8 9 with values 12 23 34 45 50 50 50 50 65, marking Current Row, Unbounded Preceding, and Unbounded Following]

RANGE
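
A minimal sketch of how ROWS and RANGE differ when the ordering column has ties, using the nine values from the slide. It assumes the slide's window is ordered by those values, and it runs through Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server; the table `t` and column `val` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(v,) for v in (12, 23, 34, 45, 50, 50, 50, 50, 65)],
)

query = """
SELECT val,
       -- ROWS counts physical rows: each tied 50 gets its own running sum
       SUM(val) OVER (ORDER BY val
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS rows_sum,
       -- RANGE works on values: all tied 50s are peers of the current row
       SUM(val) OVER (ORDER BY val
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
FROM t
ORDER BY val, rows_sum
"""
rows_vs_range = conn.execute(query).fetchall()
for row in rows_vs_range:
    print(row)
```

With RANGE, every row with value 50 reports the same sum, because the frame extends through the last peer of the current row.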

Page 16: Window functions for Data Science

Demo: Framing

Page 19: Window functions for Data Science

Demo: Analytic
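
The demo code is not reproduced in this transcript. As a stand-in, here is a minimal sketch of several analytic functions, run through Python's sqlite3 module (SQLite 3.25 or later). SQLite is assumed to match SQL Server for the functions shown; it does not provide PERCENTILE_CONT or PERCENTILE_DISC, so those are omitted. The table `readings` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 20), (4, 40)],
)

query = """
SELECT day,
       value,
       LAG(value)  OVER (ORDER BY day) AS prev_value,   -- NULL on the first row
       LEAD(value) OVER (ORDER BY day) AS next_value,   -- NULL on the last row
       FIRST_VALUE(value) OVER (ORDER BY day) AS first_value,
       -- LAST_VALUE needs a frame that reaches the end of the partition;
       -- with the default frame it would return the current row's value
       LAST_VALUE(value) OVER (ORDER BY day
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING) AS last_value,
       CUME_DIST() OVER (ORDER BY value) AS cume_dist
FROM readings
ORDER BY day
"""
analytic_rows = conn.execute(query).fetchall()
for row in analytic_rows:
    print(row)
```

LAG and LEAD give lagged and leading features directly, without the self-join they would otherwise require.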

Page 20: Window functions for Data Science

Caveats Exist

But if window functions alone cannot do the job, then something else can

Page 21: Window functions for Data Science

Logical Alternatives

Common Table Expressions (CTEs)

CROSS APPLY
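
One common caveat is that a window function cannot appear in a WHERE clause, so filtering on its result needs a wrapper such as a CTE. The sketch below shows that pattern (latest purchase per customer) through Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server. CROSS APPLY is specific to SQL Server and is not shown; the table `purchases` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("ann", 1, 10), ("ann", 3, 30), ("bob", 2, 20)],
)

query = """
WITH ranked AS (
    SELECT customer,
           day,
           amount,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY day DESC) AS rn
    FROM purchases
)
SELECT customer, day, amount
FROM ranked
WHERE rn = 1          -- filter on the window result outside the CTE
ORDER BY customer
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```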

Page 22: Window functions for Data Science

Nondeterministic Functions = Not one unique way to do the job
https://msdn.microsoft.com/en-us/library/ms178091.aspx
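
One practical instance of this, offered as an illustration rather than as the slide's own example: ROW_NUMBER ordered by a column with ties may number the tied rows in either order, and adding a unique tiebreaker makes the result repeatable. The sketch uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server; the table `players` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("x", 50), ("y", 50), ("z", 40)],  # x and y tie
)

# Ambiguous: x and y share a score, so either may be numbered 1.
ambiguous_rows = conn.execute(
    "SELECT name, score, ROW_NUMBER() OVER (ORDER BY score DESC) AS rn FROM players"
).fetchall()

# Repeatable: name breaks the tie, so the numbering is fully specified.
stable_rows = conn.execute(
    """
    SELECT name, score, ROW_NUMBER() OVER (ORDER BY score DESC, name) AS rn
    FROM players
    ORDER BY rn
    """
).fetchall()

print(ambiguous_rows)
print(stable_rows)
```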

Page 25: Window functions for Data Science

Code

Available at https://github.com/marktab/windowfunctions2016