29
OLAP

OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

  • View
    222

  • Download
    1

Embed Size (px)

Citation preview

Page 1: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

OLAP

Page 2: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Overview

• Traditional database systems are tuned to many, small, simple queries.

• Some new applications use fewer, more time-consuming, analytic queries.

• New architectures have been developed to handle analytic queries efficiently.

Page 3: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

The Data Warehouse

• The most common form of data integration.– Copy sources into a single DB (warehouse) and try to keep it

up-to-date.

– Usual method: periodic reconstruction of the warehouse, perhaps overnight.

– Frequently essential for analytic queries.

Page 4: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

OLTP

• Most database operations involve On-Line Transaction Processing (OTLP).– Short, simple, frequent queries and/or modifications,

each involving a small number of tuples.

– Examples: Answering queries from a Web interface, sales at cash registers, selling airline tickets.

Page 5: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

OLAP

• On-Line Analytical Processing (OLAP, or “analytic”) queries are, typically:– Few, but complex queries --- may run for hours.

– Queries do not depend on having an absolutely up-to-date database.

Page 6: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Common Architecture

• Databases at store branches handle OLTP.• Local store databases copied to a central warehouse

overnight.• Analysts use the warehouse for OLAP.

Page 7: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Star Schemas

• A star schema is a common organization for data at a warehouse. It consists of:

1. Fact table : a very large accumulation of facts such as sales. Often “insert-only.”

2. Dimension tables : smaller, generally static information about the entities involved in the facts.

Page 8: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Example: Star Schema

• Suppose we want to record in a warehouse information about every car sale: the serial number, the date of sale, the dealer who sold the car, the day, the time, and the price charged.

• The fact table is a relation:

Sales(serialNo, date, dealer, price)

Page 9: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Example -- Continued

• The dimension tables include information about the autos, dealers, and days “dimensions”:

Autos(serialNo, model, color)

Dealers(name, city, state)

Days(day, week, month, year)

(5, 27, 7, 2000)

Day dimension table probably not stored.

Page 10: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Visualization – Star Schema

Dimension Table (Day) Dimension Table (etc.)

Dimension Table (Dealer)Dimension Table (Autos)

Fact Table - Sales

Dimension Attrs. Dependent Attrs.

Page 11: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Dimensions and Dependent Attributes

• Two classes of fact-table attributes:1. Dimension attributes : the key of a dimension table.

2. Dependent attributes : a value determined by the dimension attributes of the tuple.

• price is the dependent attribute of our example Sales relation.

• It is determined by the combination of dimension attributes.

Page 12: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Approaches to Building Warehouses

1. ROLAP = “relational OLAP”: Tune a relational DBMS to support star schemas.

2. MOLAP = “multidimensional OLAP”: Use a specialized DBMS with a model such as the “data cube.”

Page 13: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Data Cubes

• Keys of dimension tables are the dimensions of a hypercube.– Example: for the Sales data, the three dimensions are

serialNo, date, and dealer.

• Dependent attributes (e.g., price) appear at the points of the cube.

Page 14: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Visualization -- Data Cubes

price

dealer

car

date

Page 15: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Slicing and Dicing

• Slice– focus on particular partitions along (one or more) dimension

i.e., focusing on a particular slice of the cube• WHERE clause in SQL

• Dice– partitions the cube into smaller subcubes and the points in

each subcube are aggregated• GROUP BY clause in SQL

Page 16: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Slicing and Dicing in SQL

SELECT grouping-attributes and aggregations

FROM fact table joined with (zero or more) dimension tables

WHERE certain attributes are compared with constants /* slicing */

GROUP BY grouping-attributes /* dicing */

Page 17: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Slicing and Dicing Example

• Suppose a particular car model, say ‘Gobi’, is not selling as well as anticipated. How to analyze?

• Maybe it’s the color…

• Slice for ‘Gobi.’ Dice for color.

SELECT color, SUM(price)

FROM Sales NATURAL JOIN Autos

WHERE model = 'Gobi'

GROUP BY color;

• Doesn’t show anything interesting.

Page 18: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Slicing and Dicing Example

• Suppose the previous query doesn't tell us much; each color produces about the same revenue.

• Since the query does not dice for time, we only see the total over all time for each color. – We may thus issue a revised query that also partitions time by

month.

SELECT color, month, SUM(price)

FROM (Sales NATURAL JOIN Autos) JOIN Days ON date = day

WHERE model = 'Gobi'

GROUP BY color, month;

Page 19: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Slicing and Dicing Example

• We might discover that red Gobis havn’t sold well recently. • Does this problem exists at all dealers, or only some

dealers have had low sales of red Gobis? • Let’s dice on dealer dimension as well.

SELECT dealer, month, SUM(price)

FROM (Sales NATURAL JOIN Autos) JOIN Days ON date = day

WHERE model = 'Gobi' AND color = 'red‘

GROUP BY month, dealer;

Page 20: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Slicing and Dicing Example

• At this point, we find that the sales per month for red Gobis are so small that we cannot observe any trends easily.

• Thus, we decide that it was a mistake to dice by month. A better idea would be to partition only by years, and look at only the last two years (2006 and 2007).

SELECT dealer, year, SUM(price)

FROM (Sales NATURAL JOIN Autos) JOIN Days ON date = day

WHERE model = 'Gobi' AND color = 'red' AND

(year = 2001 OR year = 2002)

GROUP BY year, dealer;

Page 21: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Drill-Down and Roll-Up

• Previous examples illustrate two common patterns in sequences of queries that slice-and-dice the data cube.

• Drill-down is the process of partitioning more finely and/or focusing on specific values in certain dimensions. – Each of the example steps except the last is an instance of

drill-down.

• Roll-up is the process of partitioning more coarsely. – The last step, where we grouped by years instead of months

to eliminate the effect of randomness in the data, is an example of roll-up.

Page 22: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Marginals

• The data cube also includes aggregation (typically SUM) along the margins of the cube.

• The marginals include aggregations over one dimension, two dimensions,…

Page 23: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Visualization --- Data Cube w/Aggregation

price

dealer

car

dateSU

M o

ver

all D

ays

Page 24: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Cube operator• CUBE(F) is an augmented table for fact table F

– tuples (or points) added in CUBE(F)• have a value, denoted * to each dimension

– * represents aggregation along that dimension

Page 25: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Example

Sales(serialNo, date, dealer, price)

Sales(model, color, date, dealer, val, cnt)

serialNo dimension not well suited for the cube as summing the price over all dates, or over all dealers, but keeping the serial number fixed has no effect.

We will replace the serial number by model and color to which the serial number connects.

Also, we will have as independent attributes, val for sum of prices and cnt for number of cars sold.

Page 26: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Example: CUBE(Sale)

• ('Gobi', 'red', '2001-05-21', 'Friendly Fred', 45000, 2)– On May 21, 2001, dealer Friendly Fred sold two red Gobis

for a total of $45,000.

– Hypothetical tuple that would be in both Sales and CUBE(Sales).

• ('Gobi', *, '2001-05-21', 'Friendly Fred', 152000, 7) – On May 21, 2001, Friendly Fred sold seven Gobis of all

colors, for a total price of $152,000.

– Note that this tuple is in CUBE(Sales) but not in Sales.

Page 27: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

Example: CUBE(Sale)

• ('Gobi', *, '2001-05-21', *, 2348000, 100)– On May 21, 2001, there were 100 Gobis sold by all the

dealers, and the total price of those Gobis was $2,348,000.

• ('Gobi', *, *, *, 1339800000, 58000)– Over all time, dealers, and colors, 58,000 Gobis have been

sold for a total price of $1,339,800,000.

• (*, *, *, *, 3521727000, 198000)– Total sales of all models in all colors, over all time at all

dealers is 198,000 cars for a total price of $3,521,727,000.

Page 28: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

CUBE Helps Answering Queries

SELECT color, AVG(price)

FROM Sales

WHERE model = 'Gobi'

GROUP BY color;

is answered by looking for all tuples of CUBE(Sales) with the form

('Gobi', c, *, *, v, n)

where c is any specific color.

v and n will be the sum and number of sales of Gobis in that color.

The average price, is v/n.

The answer to the query is the set of (c; v/n) pairs obtained from all

('Gobi', c, *, *, v, n) tuples.

Page 29: OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries

CUBE in SQL

SELECT model, color, date, dealer, SUM(val) AS v, SUM(cnt) AS nFROM Sales GROUP BY model, color, date, dealerWITH CUBE;

Suppose we materialize this into SalesCube.Then the previous query is rewritten into:

SELECT color, SUM(v)/SUM(n)FROM SalesCubeWHERE model = 'Gobi' AND

date IS NULL AND dealer IS NULL;