ISO/IEC JTC1 SC32
SQL/OLAP
Sang-Won Lee
Let’s e-Wha!
Email: [email protected]
http://home.ewha.ac.kr/~swlee
Jul. 12th, 2001
Contents
Introduction to OLAP and SQL Issues
Current OLAP Solutions
SQL/OLAP
Future OLAP Trends
OLAP
On-Line Analytical Processing
– E.F. Codd coined the term “OLAP” ([1])
– Multi-dimensional data model
– vs. On-Line Transaction Processing
– vs. Data warehouse
Data Warehouse Architecture
Multi-dimensional Data Model
Sales(prod_id, store_id, time_id, qty, amt)
[Figure: a SALES cube with MARKET and TIME dimensions, seen through the Regional Mgr., Financial Mgr., Product Mgr. and ad hoc views]
Dimensions: Product, Store, Time
Hierarchies:
– Product -> Category -> Industry
– Store -> City -> State -> Country
– Date -> Month -> Quarter -> Year
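A minimal Python sketch of a roll-up along the Store -> City -> State hierarchy, assuming hypothetical lookup dictionaries (`STORE_CITY`, `CITY_STATE`) in place of real dimension tables and a few invented fact rows. Each roll-up step re-aggregates the previous level's totals through a child-to-parent mapping:

```python
from collections import defaultdict

# Hypothetical dimension lookups for the Store -> City -> State hierarchy.
STORE_CITY = {"s1": "San Jose", "s2": "San Jose", "s3": "Los Angeles", "s4": "Austin"}
CITY_STATE = {"San Jose": "CA", "Los Angeles": "CA", "Austin": "TX"}

# Invented fact rows: (store_id, amt).
FACTS = [("s1", 100), ("s2", 50), ("s3", 70), ("s4", 30), ("s1", 20)]


def roll_up(totals: dict, parent_of: dict) -> dict:
    """Re-aggregate totals one level up the hierarchy (e.g. store -> city)."""
    out = defaultdict(int)
    for member, value in totals.items():
        out[parent_of[member]] += value
    return dict(out)


# Finest grain first: total amt per store.
store_totals = defaultdict(int)
for store_id, amt in FACTS:
    store_totals[store_id] += amt

city_totals = roll_up(store_totals, STORE_CITY)
state_totals = roll_up(city_totals, CITY_STATE)
print(city_totals)   # {'San Jose': 170, 'Los Angeles': 70, 'Austin': 30}
print(state_totals)  # {'CA': 240, 'TX': 30}
```

Drilling down goes the other way, back to the finer-grained totals the roll-up started from.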
Multi-dimensional Data Model(2)
Operations
– roll-up/drill-down
– slice/dice
– pivot
– ranking
– comparisons
– drill-across
– etc.
Examples
– for each state, show me the top 10 products based on total sales
– what is the percentage growth of Jan-99 total sales over Jan-98 total sales?
– for each product, show me the quantity shipped and sold
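The first example is a per-group top-N query. A small Python sketch of its semantics, with invented `(state, product, amt)` rows and N = 2 instead of 10 to keep the data short:

```python
import heapq
from collections import defaultdict

# Invented rows: (state, product, amt).
SALES = [
    ("CA", "p1", 500), ("CA", "p2", 300), ("CA", "p3", 900), ("CA", "p1", 200),
    ("TX", "p1", 100), ("TX", "p4", 400), ("TX", "p2", 250),
]


def top_n_per_group(rows, n):
    """For each state, the n products with the largest total sales, best first."""
    totals = defaultdict(lambda: defaultdict(int))
    for state, product, amt in rows:
        totals[state][product] += amt  # total sales per (state, product)
    return {
        state: heapq.nlargest(n, by_product.items(), key=lambda kv: kv[1])
        for state, by_product in totals.items()
    }


result = top_n_per_group(SALES, 2)
print(result)  # {'CA': [('p3', 900), ('p1', 700)], 'TX': [('p4', 400), ('p2', 250)]}
```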
Database Back in the OLAP Game
(History of SQL evolution in the 1990s: the OLAP area)
Requirements from industry (’95 ~ ’96)
– R. Kimball, “Why Decision Support Fails and How to Fix it?” ([2]); see also [3], [4]
Reactions from researchers (’96)
– Jim Gray et al., “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub Totals” ([7,8])
– D. Chatziantoniou, K. Ross, “Querying Multiple Features in Relational Databases” ([9])
Commercial DBMSs and SQL standards (’98 ~ )
– commercial products: e.g. Oracle, “Analytical Functions for Oracle8i”, Oct. 1999
– SQL standards: ANSI X3H2-96-205(R3): Super Sets (The Cube and Beyond); ANSI NCITS H2-99-154: Introduction to OLAP; see also [6]
OLAP Operations
Many business operations were hard or impossible to express in SQL:
– multiple aggregations
– comparisons(with aggregation)
– reporting features
Be prepared for a serious performance penalty
Client and middleware tools provide the necessary functionality
– OLAP server: ROLAP vs. MOLAP
Multiple Aggregations
Create a 2-dimensional spreadsheet that shows the sum of sales by make as well as by color of car
Each subtotal requires a separate aggregate query
[Figure: Cross Tab of sales with colors RED, WHITE, BLUE as rows and makes Chevy, Ford as columns, with By Make, By Color and Sum totals]
SELECT color, make, SUM(amt) FROM sales GROUP BY color, make
UNION ALL
SELECT color, NULL, SUM(amt) FROM sales GROUP BY color
UNION ALL
SELECT NULL, make, SUM(amt) FROM sales GROUP BY make
UNION ALL
SELECT NULL, NULL, SUM(amt) FROM sales;
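A runnable check of the UNION ALL formulation, using Python's built-in `sqlite3` module and an in-memory `sales(color, make, amt)` table with invented rows. It returns all four kinds of cross-tab cells, but it scans `sales` once per subtotal:

```python
import sqlite3

# In-memory table with the slide's columns; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (color TEXT, make TEXT, amt INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("RED", "Chevy", 10), ("RED", "Ford", 20), ("WHITE", "Chevy", 5), ("BLUE", "Ford", 15)],
)

# One GROUP BY per kind of cell, glued together with UNION ALL;
# NULL marks the dimension that has been aggregated away.
CROSS_TAB = """
SELECT color, make, SUM(amt) FROM sales GROUP BY color, make
UNION ALL
SELECT color, NULL, SUM(amt) FROM sales GROUP BY color
UNION ALL
SELECT NULL, make, SUM(amt) FROM sales GROUP BY make
UNION ALL
SELECT NULL, NULL, SUM(amt) FROM sales
"""
rows = conn.execute(CROSS_TAB).fetchall()
for row in rows:
    print(row)
```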
Comparisons
Example: last year’s sales vs. this year’s sales for each product
– requires a self-join
VIEW:
CREATE OR REPLACE VIEW v_sales AS
SELECT prod_id, year, SUM(qty) AS sale_sum
FROM sales
GROUP BY prod_id, year;
QUERY:
SELECT curr.prod_id, curr.year AS cur_year,
       curr.sale_sum AS cur_sales, last.sale_sum AS last_sales
FROM v_sales curr, v_sales last
WHERE curr.prod_id = last.prod_id
  AND curr.year = last.year + 1;
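A sketch that runs the view and the self-join in SQLite through Python's `sqlite3` module. It assumes a `year` column directly on `sales` (the slide does not show where `year` comes from), uses invented rows, and names the second alias `prev` rather than `last`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (prod_id TEXT, year INTEGER, qty INTEGER);
CREATE VIEW v_sales AS
    SELECT prod_id, year, SUM(qty) AS sale_sum
    FROM sales
    GROUP BY prod_id, year;
""")
# Invented rows; p1 has two 1998 rows that the view sums together.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("p1", 1998, 10), ("p1", 1998, 5), ("p1", 1999, 20),
     ("p2", 1998, 7), ("p2", 1999, 3), ("p2", 2000, 9)],
)

# Self-join: pair each product-year with the same product's previous year.
rows = conn.execute("""
SELECT curr.prod_id, curr.year, curr.sale_sum AS cur_sales, prev.sale_sum AS last_sales
FROM v_sales curr JOIN v_sales prev
  ON curr.prod_id = prev.prod_id AND curr.year = prev.year + 1
ORDER BY curr.prod_id, curr.year
""").fetchall()
print(rows)  # [('p1', 1999, 20, 15), ('p2', 1999, 3, 7), ('p2', 2000, 9, 3)]
```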
Reporting Features
It was too complex to express:
– rank (top 10) and N-tile (“top 30%” of all products)
– median, mode, …
– running total, moving average, cumulative totals
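A plain-Python sketch of what several of these reporting functions compute, with invented per-product totals. `rank` mimics SQL's RANK() (ties share a rank, then a gap) and `ntile` mimics NTILE(n) (n buckets whose sizes differ by at most one); both helpers are written here for illustration, not taken from any library:

```python
import statistics
from itertools import accumulate

# Invented per-product sales totals.
SALES = {"p1": 50, "p2": 80, "p3": 20, "p4": 80, "p5": 40, "p6": 10}


def rank(totals):
    """Like RANK() OVER (ORDER BY v DESC): ties share a rank, followed by a gap."""
    return {k: 1 + sum(other > v for other in totals.values()) for k, v in totals.items()}


def ntile(totals, n):
    """Like NTILE(n) OVER (ORDER BY v DESC): n buckets, sizes differing by at most one."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    size, extra = divmod(len(ordered), n)
    buckets, pos = {}, 0
    for b in range(1, n + 1):
        width = size + (1 if b <= extra else 0)  # the first `extra` buckets get one more row
        for key in ordered[pos:pos + width]:
            buckets[key] = b
        pos += width
    return buckets


ranks = rank(SALES)
top_3 = [k for k, r in ranks.items() if r <= 3]   # "top 10" on a small data set
tertiles = ntile(SALES, 3)
median = statistics.median(SALES.values())
running_total = list(accumulate([5, 3, 7, 2]))   # invented daily sales
print(ranks, top_3, tertiles, median, running_total)
```

With ties, which tied row lands in which bucket is arbitrary in SQL; here it follows Python's stable sort.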
Reporting Features(2)
Example: a moving average (over a 3-day window) of total sales for each product for 2000
VIEW:
CREATE OR REPLACE VIEW v_sales AS
SELECT prod_id, time_id, SUM(qty) AS sale_sum
FROM sales
GROUP BY prod_id, time_id;
QUERY:
SELECT e.prod_id, e.time_id, AVG(s.sale_sum) AS mvg_avg
FROM v_sales s, v_sales e
WHERE s.prod_id = e.prod_id
  AND s.time_id BETWEEN e.time_id - 2 AND e.time_id
GROUP BY e.prod_id, e.time_id;
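The self-join can be exercised in SQLite via Python's `sqlite3`, assuming `time_id` is an integer day number so that `time_id - 2` means two days earlier (the rows are invented). Because the window is defined by values rather than row counts, a day with no sales simply narrows it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (prod_id TEXT, time_id INTEGER, qty INTEGER);
CREATE VIEW v_sales AS
    SELECT prod_id, time_id, SUM(qty) AS sale_sum
    FROM sales
    GROUP BY prod_id, time_id;
""")
# Invented rows; time_id is assumed to be a day number.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("p1", 1, 1), ("p1", 1, 2), ("p1", 2, 6), ("p1", 3, 9), ("p1", 4, 12), ("p2", 1, 4)],
)

# For each (product, day) e, average the same product's daily sums s over days e-2 .. e.
rows = conn.execute("""
SELECT e.prod_id, e.time_id, AVG(s.sale_sum) AS mvg_avg
FROM v_sales s JOIN v_sales e
  ON s.prod_id = e.prod_id
 AND s.time_id BETWEEN e.time_id - 2 AND e.time_id
GROUP BY e.prod_id, e.time_id
ORDER BY e.prod_id, e.time_id
""").fetchall()
print(rows)  # [('p1', 1, 3.0), ('p1', 2, 4.5), ('p1', 3, 6.0), ('p1', 4, 9.0), ('p2', 1, 4.0)]
```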
OLAP Servers
Processing MD (multi-dimensional) queries efficiently
ROLAP
[Figure: several OLAP clients connect to an OLAP engine, which queries a relational database (star or snowflake schema)]
Meta-data: maps the warehouse schema into an MD model
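A sketch of the meta-data's job, under explicitly stated assumptions: the `METADATA` and `JOIN_KEYS` dictionaries, the star-schema table names (`store`, `time`, `product`) and the shape of the generated SQL are all hypothetical. They are chosen only to show how a ROLAP engine can translate a multi-dimensional request into relational SQL:

```python
# Hypothetical meta-data: each (dimension, level) maps to a (table, column) of a star schema.
METADATA = {
    ("Store", "State"): ("store", "state"),
    ("Store", "City"): ("store", "city"),
    ("Time", "Year"): ("time", "year"),
    ("Time", "Month"): ("time", "month"),
    ("Product", "Category"): ("product", "category"),
}
# Hypothetical join keys from the sales fact table to each dimension table.
JOIN_KEYS = {"store": "store_id", "time": "time_id", "product": "prod_id"}


def to_sql(levels, measure="amt"):
    """Translate an MD request (a list of dimension levels) into star-join SQL."""
    cols = [f"{table}.{col}" for table, col in (METADATA[lv] for lv in levels)]
    tables = sorted({METADATA[lv][0] for lv in levels})
    joins = " AND ".join(f"sales.{JOIN_KEYS[t]} = {t}.{JOIN_KEYS[t]}" for t in tables)
    select = ", ".join(cols)
    return (
        f"SELECT {select}, SUM(sales.{measure}) "
        f"FROM sales, {', '.join(tables)} "
        f"WHERE {joins} "
        f"GROUP BY {select}"
    )


# "Total sales by state and year" as the engine might see it.
sql = to_sql([("Store", "State"), ("Time", "Year")])
print(sql)
```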
ROLAP(2)
Example: Oracle Discoverer 4i leverages Oracle 8i
– 8i: the biggest SQL improvements in a decade!
– more powerful analysis using new analytic functions
– query redirection (rewrite) to shared materialized views (MVs)
– 100% automated summary management
MOLAP
A multidimensional database (MDDB) stores data in a series of array structures, indexed to provide optimal access time to any element in the array.
Example: Oracle Express stores arrays of data
[Figure: a 3 × 3 × 3 array with PRODUCT, MONTH and CITY axes (positions 0–2 each); its cells are numbered 0–26 and stored in that order as one linear array]
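Array storage gives fast access because a cell's position is computed, not searched. A minimal sketch, assuming the 3 × 3 × 3 cube is laid out row-major as (CITY, MONTH, PRODUCT) with PRODUCT varying fastest; the figure does not pin down the axis order, so treat the strides as an assumption:

```python
from math import prod

# ASSUMED layout: row-major (CITY, MONTH, PRODUCT), so PRODUCT varies fastest.
SHAPE = (3, 3, 3)


def strides(shape):
    """Row-major strides: the last axis steps by 1, each earlier axis by the block after it."""
    result, step = [], 1
    for extent in reversed(shape):
        result.append(step)
        step *= extent
    return tuple(reversed(result))


def offset(index, shape=SHAPE):
    """Linear position of cell (city, month, product): pure arithmetic, no search."""
    for i, extent in zip(index, shape):
        if not 0 <= i < extent:
            raise IndexError(f"{index} is outside {shape}")
    return sum(i * s for i, s in zip(index, strides(shape)))


cube = list(range(prod(SHAPE)))  # the linear array 0..26 from the figure
print(strides(SHAPE))            # (9, 3, 1)
print(cube[offset((1, 0, 2))])   # 11: city 1, month 0, product 2
```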
Proposed SQL Constructs
Multiple aggregations
– Gray et al., “Cube and Roll-Up” [7,8]
Comparison
– Chatziantoniou and Ross, “Group By Column Variable” [9]
SELECT subscriber, r.login_time
FROM log
GROUP BY subscriber: r
SUCH THAT r.spent_time = max(spent_time)
Reporting
– Red Brick provides SQL extensions in RISQL: rank, tertile, ratio-to-report, etc.
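The column-variable query pairs each subscriber with the login time of the session(s) whose `spent_time` equals that subscriber's maximum. A Python sketch of that meaning on invented log rows; in standard SQL the same answer typically needs a join against a `GROUP BY subscriber` subquery:

```python
from collections import defaultdict

# Invented log rows: (subscriber, login_time, spent_time).
LOG = [
    ("alice", "09:00", 30), ("alice", "13:00", 45),
    ("bob", "10:00", 20), ("bob", "18:00", 20),
    ("carol", "08:00", 5),
]


def login_at_longest_session(log):
    """For each subscriber, every row r with r.spent_time = max(spent_time) of its group."""
    groups = defaultdict(list)
    for subscriber, login_time, spent_time in log:
        groups[subscriber].append((login_time, spent_time))
    result = []
    for subscriber, rows in groups.items():
        longest = max(spent for _, spent in rows)  # the group's aggregate
        # r ranges over the group's rows that satisfy the SUCH THAT condition.
        result.extend((subscriber, login) for login, spent in rows if spent == longest)
    return result


print(login_at_longest_session(LOG))
# [('alice', '13:00'), ('bob', '10:00'), ('bob', '18:00'), ('carol', '08:00')]
```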
The Data CUBE Relational Operator Generalizes Group By and Aggregates
[Figure: The Data Cube and The Sub-Space Aggregates. Panels: Aggregate (Sum); Group By (with total) by color (RED, WHITE, BLUE); Cross Tab of make (Chevy, Ford) by color, with By Make, By Color and Sum; Data Cube over make, color and year (1990–1993), with By Make, By Color, By Year, By Make & Color, By Make & Year, By Color & Year and Sum. Source: [7]]
Getting Sub-totals: ROLLUP Operation
SELECT year, brand, SUM(qty)
FROM sales
GROUP BY ROLLUP (year, brand);

YEAR  BRAND   SUM(qty)
1996  Ford       250
1996  Honda      300
1996  Toyota     450
1997  Ford       300
…
1996            1000
1997            1200
                2200
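ROLLUP (year, brand) is shorthand for the grouping sets (year, brand), (year) and (), one for each prefix of the column list. A Python sketch using the slide's figures, with `None` standing in for the NULLs that mark rolled-up columns:

```python
from collections import defaultdict

KEYS = ("year", "brand")
# Per-(year, brand) quantities from the slide.
SALES = [
    {"year": 1996, "brand": "Ford", "qty": 250},
    {"year": 1996, "brand": "Honda", "qty": 300},
    {"year": 1996, "brand": "Toyota", "qty": 450},
    {"year": 1997, "brand": "Ford", "qty": 300},
    {"year": 1997, "brand": "Honda", "qty": 350},
    {"year": 1997, "brand": "Toyota", "qty": 550},
]


def group_by(rows, grouping, keys=KEYS):
    """SUM(qty) GROUP BY `grouping`; columns not grouped on come back as None (NULL)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] if k in grouping else None for k in keys)] += row["qty"]
    return dict(totals)


def rollup(rows, keys=KEYS):
    """Union of the GROUP BYs over each prefix of `keys`: (year, brand), (year,), ()."""
    result = {}
    for n in range(len(keys), -1, -1):
        result.update(group_by(rows, keys[:n], keys))
    return result


totals = rollup(SALES)
print(totals[(1996, None)], totals[(1997, None)], totals[(None, None)])  # 1000 1200 2200
```

SQL's GROUPING() function exists because such NULLs are ambiguous when the data itself contains NULLs; this sketch ignores that case.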
Getting Cross-tabs: CUBE Operation
SELECT year, brand, SUM(amount)
FROM sales
GROUP BY CUBE (year, brand);

YEAR  BRAND   SUM(AMOUNT)
1996  Ford       250
...
1996  Toyota     450
1997  Ford       300
...
1997            1200
                2200
      Ford       550
      Honda      650
      Toyota    1000
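CUBE (year, brand) instead produces all 2^n subsets of its columns, here (year, brand), (year), (brand) and (). The same style of Python sketch, again with the slide's figures and `None` for NULL:

```python
from collections import defaultdict
from itertools import combinations

KEYS = ("year", "brand")
# Per-(year, brand) quantities from the slide.
SALES = [
    {"year": 1996, "brand": "Ford", "qty": 250},
    {"year": 1996, "brand": "Honda", "qty": 300},
    {"year": 1996, "brand": "Toyota", "qty": 450},
    {"year": 1997, "brand": "Ford", "qty": 300},
    {"year": 1997, "brand": "Honda", "qty": 350},
    {"year": 1997, "brand": "Toyota", "qty": 550},
]


def group_by(rows, grouping, keys=KEYS):
    """SUM(qty) GROUP BY `grouping`; columns not grouped on come back as None (NULL)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] if k in grouping else None for k in keys)] += row["qty"]
    return dict(totals)


def cube(rows, keys=KEYS):
    """Union of the GROUP BYs over every subset of `keys`."""
    result = {}
    for n in range(len(keys), -1, -1):
        for grouping in combinations(keys, n):
            result.update(group_by(rows, grouping, keys))
    return result


grouping_sets = [g for n in range(len(KEYS), -1, -1) for g in combinations(KEYS, n)]
totals = cube(SALES)
print(grouping_sets)  # [('year', 'brand'), ('year',), ('brand',), ()]
print(totals[(None, "Ford")], totals[(None, "Honda")], totals[(None, "Toyota")])  # 550 650 1000
```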
Flexible Grouping: GROUPING SETS Operator
SELECT year, brand, color, SUM(qty)
FROM sales
GROUP BY GROUPING SETS ((year, brand), (brand, color), ());

YEAR  BRAND   COLOR  SUM(QTY)
1996  Ford             250    <- Year, Brand
1996  Honda            300
1996  Toyota           450
1997  Ford             300
1997  Honda            350
1997  Toyota           550
      Ford    Blue     400    <- Brand, Color
      Ford    Red      150
      Honda   Blue     650
      Toyota  Red      700
      Toyota  White    300
                      2200    <- Grand total
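GROUPING SETS is exactly the union of the listed GROUP BYs. SQLite, for example, has no GROUPING SETS syntax, so this sketch spells the query out with UNION ALL via Python's `sqlite3`; the base rows are invented, chosen so that the totals match the slide's result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, brand TEXT, color TEXT, qty INTEGER)")
# Invented base rows, chosen so the subtotals match the slide's result.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1996, "Ford", "Blue", 250), (1997, "Ford", "Blue", 150), (1997, "Ford", "Red", 150),
        (1996, "Honda", "Blue", 300), (1997, "Honda", "Blue", 350),
        (1996, "Toyota", "Red", 450), (1997, "Toyota", "Red", 250), (1997, "Toyota", "White", 300),
    ],
)

# GROUPING SETS ((year, brand), (brand, color), ()) written out as three GROUP BYs.
rows = conn.execute("""
SELECT year, brand, NULL AS color, SUM(qty) FROM sales GROUP BY year, brand
UNION ALL
SELECT NULL, brand, color, SUM(qty) FROM sales GROUP BY brand, color
UNION ALL
SELECT NULL, NULL, NULL, SUM(qty) FROM sales
""").fetchall()
for row in rows:
    print(row)
```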
LAG Operator
TIMEKEY  SALES  SALES_LAST_YEAR  SALES_CHANGE
98-1     1100   -                -
…        …      …                …
99-1     1200   1100              100
99-2     1500   1450               50
99-3     1700   1350              350
99-4     1600   1700             -100
99-5     1800   1600              200
99-6     1500   1450               50
99-7     1300   1250               50
99-8     1400   1200              200

SELECT timekey, sales,
       LAG(sales, 12) OVER (ORDER BY timekey) AS sales_last_year,
       sales - LAG(sales, 12) OVER (ORDER BY timekey) AS sales_change
FROM sales;
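LAG looks back a fixed number of rows in the ORDER BY sequence. A Python sketch of that rule on an invented series, with offset 2 standing in for the slide's 12; like the SQL function, it counts rows, so a missing month would silently shift the comparison:

```python
from typing import List, Optional, Sequence


def lag(values: Sequence[int], offset: int = 1, default: Optional[int] = None) -> List[Optional[int]]:
    """Like LAG(value, offset) OVER (ORDER BY ...): the value `offset` rows earlier."""
    return [values[i - offset] if i >= offset else default for i in range(len(values))]


# Invented series, already in timekey order; offset 2 stands in for the slide's 12.
sales = [10, 12, 15, 11]
sales_last = lag(sales, 2)
sales_change = [None if prev is None else cur - prev for cur, prev in zip(sales, sales_last)]
print(sales_last)    # [None, None, 10, 12]
print(sales_change)  # [None, None, 5, -1]
```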
Moving Average
SELECT time_id,
       AVG(SUM(qty)) OVER (ORDER BY time_id
                           RANGE INTERVAL '2' DAY PRECEDING) AS mvg_avg_sales
FROM sales
GROUP BY time_id;
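RANGE INTERVAL '2' DAY PRECEDING frames the current row together with every row whose date is at most two days earlier, however many rows that is. A Python sketch of that frame with invented daily totals and a gap on Jan 3, where a ROWS 2 PRECEDING frame would give a different answer:

```python
from datetime import date, timedelta


def range_moving_avg(daily, days=2):
    """Mimics AVG(x) OVER (ORDER BY d RANGE INTERVAL 'days' DAY PRECEDING):
    the frame holds every row dated within [d - days, d], however many rows that is."""
    ordered = sorted(daily.items())
    result = {}
    for d, _ in ordered:
        frame = [v for other, v in ordered if d - timedelta(days=days) <= other <= d]
        result[d] = sum(frame) / len(frame)
    return result


# Invented daily SUM(qty) per time_id; there is no row for Jan 3.
daily = {date(2000, 1, 1): 10, date(2000, 1, 2): 20, date(2000, 1, 4): 40, date(2000, 1, 5): 50}
mvg = range_moving_avg(daily)
print(mvg[date(2000, 1, 4)])  # 30.0: averages Jan 2 and Jan 4 only
# A ROWS 2 PRECEDING frame would instead average Jan 1, 2 and 4 for that row.
```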
SQL/OLAP
Why enhance the RDBMS for OLAP calculations?
– Performance
– Scalability
– Simpler SQL development
– Productivity
Operation          Before   After   Improvement
Rollup               8.32    1.42          486%
Functional Index     8.62    0.91          847%
Top 10               4.26    1.02          318%
Moving window       43.62    4.97          778%
Cumulative window   45.55    3.36         1256%
Lead and lag       175.01    4.96         3428%
[Chart: % Improvement for Rollup, Functional Index, Top 10, Moving window, Cumulative window, and Lead and lag]
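The Improvement figures are consistent with measuring the gain relative to the new time, (Before − After) / After × 100%. A quick Python check against the table's numbers (units as in the original slide, which does not state them):

```python
# Before/after figures copied from the table.
TIMINGS = {
    "Rollup": (8.32, 1.42),
    "Functional Index": (8.62, 0.91),
    "Top 10": (4.26, 1.02),
    "Moving window": (43.62, 4.97),
    "Cumulative window": (45.55, 3.36),
    "Lead and lag": (175.01, 4.96),
}


def improvement(before, after):
    """Percentage gain relative to the new time: (before - after) / after, rounded."""
    return round((before - after) / after * 100)


for name, (before, after) in TIMINGS.items():
    print(f"{name:18} {improvement(before, after)}%")
```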
Database Back in the OLAP Game
Materialized views
Index techniques: e.g. bitmap (join) index
Partitioning: e.g. range/hash/list
Query optimization: e.g. star query optimization
......
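Bitmap indexes, listed above, turn multi-predicate filters into bitwise arithmetic. A minimal Python sketch using integers as bitsets over invented rows; production implementations typically also compress the bitmaps:

```python
from collections import defaultdict

# Invented rows of a small fact table: (region, product).
ROWS = [("east", "tv"), ("west", "tv"), ("east", "radio"), ("east", "tv"), ("west", "radio")]


def build_bitmap_index(values):
    """One bitmap per distinct value; bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def row_ids(bitmap):
    """Decode a bitmap back into the matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


region_idx = build_bitmap_index(r for r, _ in ROWS)
product_idx = build_bitmap_index(p for _, p in ROWS)

# WHERE region = 'east' AND product = 'tv' becomes a single bitwise AND.
hits = region_idx["east"] & product_idx["tv"]
print(row_ids(hits))  # [0, 3]
```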
Future OLAP Trends
To be or not to be?
OLAP API:
– OLE DB for OLAP
– JOLAP
References
[1] E.F. Codd et al., “Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate,” available from Arborsoft’s Web site (http://www.arborsoft.com)
[2] R. Kimball, “Why Decision Support Fails and How to Fix it?” SIGMOD Record, Sep. 1995
[3] R. Kimball, “The Problem with Comparisons,” DBMS Magazine, Jan. 1996 (also available from http://www.rkimball.com/html/articles.html)
[4] R. Kimball, “SQL Roadblocks and Pitfalls,” DBMS Magazine, Feb. 1996 (also available from http://www.rkimball.com/html/articles.html)
[5] R. Winter, “Database Back in the OLAP Game,” Intelligent Enterprise Magazine, Dec. 1998 (available from http://www.intelligententerprise.com)
[6] R. Winter, “SQL-99’s New OLAP Functions,” Intelligent Enterprise Magazine, Jan. 2000 (available from http://www.intelligententerprise.com)
[7] Jim Gray et al., “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub Totals,” Proceedings of the International Conference on Data Engineering, pp. 152–159, 1996
[8] Jim Gray et al., “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub Totals,” Data Mining and Knowledge Discovery Journal, Vol. 1, No. 1, 1997
[9] D. Chatziantoniou, K. Ross, “Querying Multiple Features in Relational Databases,” Proc. of VLDB Conf., 1996