Data Warehousing
Analytic Applications and Business Intelligence
Spring Term 2020
Dr. Andreas Geppert


© Andreas Geppert Spring Term 2020 Slide 2

Outline of the Course

• Introduction
• DWH Architecture
• DWH design and multi-dimensional data models
• Extract, Transform, Load (ETL)
• Metadata
• Data Quality
• Analytic Applications and Business Intelligence
• Implementation and Performance


Outline

1. Analytic Applications

– Classifications and Architecture

– Semantic Models

2. Query Languages: SQL

3. Reporting

4. Query Languages: MDX

5. OLAP

6. Visualization

7. Dashboards and Scorecards

8. Big Data

9. (Data Mining)

Credit Suisse DWH Reference Architecture V5

[Figure: layered architecture. Recoverable labels: Data Sources; Staging Area; Integration, Aggregation, Calculation; Domain Integration and Enrichment; Data Marts; Reporting and Analysis Services (Selection, Aggregation, Calculation); Front End (GUI: Reporting, OLAP, Data Mining); Federated Integration; Reference/Master Data; (Meta)data Management. Legend: integration/enrichment logic; extract, transform, load logic; logic (no ETL); data flow; relational database; multidimensional database; file.]

Analytic Systems

• Concept-oriented Systems
– Balanced Scorecard
– Planning and Budgeting
– Consolidation
– Value-oriented Management
• Generic Systems
– Ad-hoc Analysis Systems: Free OLAP Analysis, Guided OLAP Analysis, Free Data Retrieval (SQL, MDX)
– Reporting Systems: Interactive Reporting Platforms, Generated Reports
– Model-based Analysis Systems: Decision Support Systems, Expert Systems, Data Mining

BI Tools and (IT) Skills Specialization

[Figure: BI tools and the user segments that typically use them]
• IT Developers: Production Reporting Tools, Statistics
• Analysts & Information Workers: BI Spreadsheets, OLAP, Business Query
• Executives & Managers: Dashboards, Interactive Fixed Reports, Scorecards
• Front-Line Workers: Embedded BI, BI Search
• Customers, Suppliers, Regulators: Published Reports

Source: Howson 2008

Decisions: Frequency and Economic Impact

[Figure: economic impact of individual decisions (vertical axis) versus frequency of decision (horizontal axis)]
• High-impact, infrequent decisions. Ex.: M&A, capital investment, strategic market positioning
• Medium-impact, medium-frequency decisions. Ex.: product development and pricing, customer segmentation
• Low-impact, frequent decisions. Ex.: loan request, cross-sell offers, customer upgrade

Source: Howson 2008

BI System Architecture

[Figure: BI system architecture. Recoverable labels: Tool, Tool, Spreadsheet (front ends); Semantic Layer; BI Server (Caching, Optimization, Security, Workflow); DWH, Cube, ERP, CRM, OLTP, Spreadsheet (data sources).]

Semantic Models

• Especially for business users and in ad-hoc reporting, the skills needed to access relational data models via SQL cannot be expected
• Semantic models (semantic layers) form an intermediate layer between database/DWH and users:
– Semantic models are closer to business language and terminology than relational models and star schemas
– Semantic models abstract from database structures: joins, aggregations, etc. can be hidden in the mapping of the semantic layer onto database structures
– Ideally, all the required BI tools integrate with the semantic layer
• Examples: Business Objects Universes, OBIEE
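The idea of hiding joins and aggregations behind business terms can be sketched in a few lines. This is only an illustration under stated assumptions, not how Business Objects or OBIEE work internally: the dictionary `SEMANTIC_MODEL` and the function `build_query` are hypothetical, and the table and column names are borrowed from the star query example of this deck.

```python
# Hypothetical semantic model: business terms mapped onto database structures.
SEMANTIC_MODEL = {
    "fact": "sales_fact f",
    "measures": {"Sales": "sum(f.unit_sales)"},
    "dimensions": {
        # business term: (column, table, join condition)
        "Product": ("p.product_name", "product p", "f.product_id = p.product_id"),
        "Store": ("s.store_name", "store s", "f.store_id = s.store_id"),
    },
}


def build_query(measure, dimensions):
    """Translate business terms into SQL, hiding joins and aggregation."""
    cols, tables, joins = [], [SEMANTIC_MODEL["fact"]], []
    for name in dimensions:
        col, table, join = SEMANTIC_MODEL["dimensions"][name]
        cols.append(col)
        tables.append(table)
        joins.append(join)
    measure_sql = SEMANTIC_MODEL["measures"][measure] + " as " + measure.lower()
    sql = "select " + ", ".join(cols + [measure_sql])
    sql += " from " + ", ".join(tables)
    if joins:
        sql += " where " + " and ".join(joins)
    if cols:
        sql += " group by " + ", ".join(cols)
    return sql


# The user only names business terms; the SQL is generated.
query = build_query("Sales", ["Product", "Store"])
print(query)
```

The user asks for "Sales by Product and Store"; the join conditions and the group-by clause come from the mapping, not from the user.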

Outline

1. Analytic Applications
– Classifications and Architecture
– Semantic Models
2. Query Languages: SQL
– Star Joins
– Super groups
– Aggregate and analysis functions
– Local grouping
3. Reporting
4. Query Languages: MDX
5. OLAP
6. Visualization
7. Dashboards and Scorecards
8. Big Data

Example Schema

[Figure: example schema diagram]

Query Patterns: Star Queries

• Queries against star schemas
• Join the fact table with some or all of the dimension tables
  → Star queries, star join
• Typically with restrictions on the dimension tables
• Typically also with grouping and aggregation of the resulting table

Example: Star Queries

• Sales per product, store, and day

select p.product_name, s.store_name, t.the_date,
       sum(f.unit_sales) as sales
from sales_fact f, store s, product p, time_by_day t
where f.product_id = p.product_id
  and f.store_id = s.store_id
  and f.time_id = t.time_id
group by p.product_name, s.store_name, t.the_date;


Grouping in SQL

• Traditionally:
– group-by clause
– Per query, there is a fixed set of grouping criteria
• Suboptimal for flexible grouping
– Along multiple dimensions
– On multiple levels of a dimension hierarchy
• All combinations of grouping attributes over G1,...,Gn?
– 2^n queries with corresponding grouping criteria
– (product, store, date) →
  [(product, store, date), (store, date), (product, date), (product, store),
   (product), (store), (date), ()]
  → Super groups
– Grouping sets, rollup, cube

Grouping Sets

• Groups along multiple grouping criteria
• In a single query!
  → Grouping sets
  → Explicit listing of all grouping criteria
• Example: sum of sales per
– product,
– day,
– product and day
  in one and the same query

Grouping Sets: Sample Data

PRODUCT   SALES_DATE SALES_CNT
--------- ---------- ---------
Cornetto  06.06.06          15
Magnum    06.06.06          25
(null)    06.06.06           5
Cornetto  07.07.06          22
Magnum    07.07.06          33
(null)    07.07.06           6
Cornetto  (null)            44
Magnum    (null)             9

Grouping Sets: Example

select product, sales_date, sum(sales_cnt) total
from kiosk
group by grouping sets ((product),
                        (sales_date),
                        (product, sales_date));
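For comparison, the same result can be produced without grouping sets by writing one query per grouping criterion and combining them with UNION ALL, which is the verbose formulation that grouping sets replace. The sketch below runs this on the sample data in SQLite via Python's sqlite3 module; SQLite is used here only as an assumption for illustration (it does not support GROUPING SETS itself), and the course's own database system may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table kiosk (product text, sales_date text, sales_cnt integer)")
conn.executemany(
    "insert into kiosk values (?, ?, ?)",
    [
        ("Cornetto", "06.06.06", 15), ("Magnum", "06.06.06", 25), (None, "06.06.06", 5),
        ("Cornetto", "07.07.06", 22), ("Magnum", "07.07.06", 33), (None, "07.07.06", 6),
        ("Cornetto", None, 44), ("Magnum", None, 9),
    ],
)

# One SELECT per grouping criterion, combined with UNION ALL.
rows = conn.execute("""
    select product, null as sales_date, sum(sales_cnt) as total
      from kiosk group by product
    union all
    select null, sales_date, sum(sales_cnt)
      from kiosk group by sales_date
    union all
    select product, sales_date, sum(sales_cnt)
      from kiosk group by product, sales_date
""").fetchall()

for row in rows:
    print(row)
```

Note that two different rows come out as (null, null): one sums the rows whose product is missing in the data, the other the rows whose date is missing. This ambiguity is what the grouping function resolves.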

Grouping Sets: Example (2)

• Possible result:

PRODUCT  SALES_DATE TOTAL
-------- ---------- -----
Magnum   (null)         9
Cornetto (null)        44
(null)   06.06.06       5
Magnum   06.06.06      25
Cornetto 06.06.06      15
(null)   07.07.06       6
Magnum   07.07.06      33
Cornetto 07.07.06      22
(null)   (null)        53
(null)   06.06.06      45
(null)   07.07.06      61
(null)   (null)        11
Magnum   (null)        67
Cornetto (null)        81

Grouping Function

• Queries with super groups generate null values
– Meaning: "all"
– A set of values cannot be represented in first normal form
• Additionally, there may be null values in the data
– Meaning: not existing, unknown
  → In the query result, the meaning of NULL (-) is not obvious
  → Grouping function
  → Shows whether the row is the result of grouping over the column
     0: the value (possibly NULL) comes from the data
     1: NULL because of grouping ("all")

The Grouping Function: Example

• Similar query to the one above

select product, grouping(product) as prodgrp,
       sales_date, grouping(sales_date) as dategrp,
       sum(sales_cnt) as total
from kiosk
group by grouping sets ((product),
                        (sales_date),
                        (product, sales_date));

The Grouping Function: Example (2)

• Result:

product  prodgrp sales_date dategrp total
-------- ------- ---------- ------- -----
Cornetto       0 06.06.06         0    15
Cornetto       0 07.07.06         0    22
Cornetto       0 (null)           0    44
Cornetto       0 (null)           1    81
Magnum         0 06.06.06         0    25
Magnum         0 07.07.06         0    33
Magnum         0 (null)           0     9
Magnum         0 (null)           1    67
(null)         0 06.06.06         0     5
(null)         1 06.06.06         0    45
(null)         1 07.07.06         0    61
(null)         0 07.07.06         0     6
(null)         1 (null)           0    53
(null)         0 (null)           1    11

The Grouping Function: Example (3)

• Query as above, with explicit labels for the «all» values:

select decode(grouping(product), 1, 'All Products', product),
       decode(grouping(sales_date), 1, 'All Dates', sales_date),
       sum(sales_cnt) as total
from kiosk
group by grouping sets ((product),
                        (sales_date),
                        (product, sales_date));
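The labeling logic can also be written with a standard CASE expression instead of decode. The following sketch emulates the query in SQLite via Python's sqlite3 module; this is an assumption made for illustration only. SQLite offers neither grouping sets nor the grouping function, so each branch of a UNION ALL sets the flags `prodgrp` and `dategrp` by hand, in the role that grouping(product) and grouping(sales_date) play above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table kiosk (product text, sales_date text, sales_cnt integer)")
conn.executemany(
    "insert into kiosk values (?, ?, ?)",
    [
        ("Cornetto", "06.06.06", 15), ("Magnum", "06.06.06", 25), (None, "06.06.06", 5),
        ("Cornetto", "07.07.06", 22), ("Magnum", "07.07.06", 33), (None, "07.07.06", 6),
        ("Cornetto", None, 44), ("Magnum", None, 9),
    ],
)

# Flags: 1 = column was aggregated away ("all"), 0 = value comes from the data.
rows = conn.execute("""
    with g as (
        select product, 0 as prodgrp, null as sales_date, 1 as dategrp,
               sum(sales_cnt) as total
          from kiosk group by product
        union all
        select null, 1, sales_date, 0, sum(sales_cnt)
          from kiosk group by sales_date
        union all
        select product, 0, sales_date, 0, sum(sales_cnt)
          from kiosk group by product, sales_date
    )
    select case when prodgrp = 1 then 'All Products' else product end,
           case when dategrp = 1 then 'All Dates' else sales_date end,
           total
      from g
""").fetchall()

for row in rows:
    print(row)
```

NULLs that were already in the data stay NULL in the output, while NULLs introduced by grouping are replaced by the labels.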

The Grouping Function: Example (4)

• Result:

product      date      total
------------ --------- -----
All Products 06.06.06     45
All Products 07.07.06     61
All Products (null)       53
Cornetto     06.06.06     15
Cornetto     07.07.06     22
Cornetto     (null)       44
Cornetto     All Dates    81
Magnum       06.06.06     25
Magnum       07.07.06     33
Magnum       (null)        9
Magnum       All Dates    67
(null)       06.06.06      5
(null)       07.07.06      6
(null)       All Dates    11

The Cube Operator

• Grouping with all possible combinations?
• G1...Gn → 2^n criteria with grouping sets
  → Abbreviation: the cube operator
• cube(G1...Gn) ≡ grouping sets(2^{G1...Gn}), i.e. all subsets of {G1...Gn}
• Example: cube(A, B) ≡ grouping sets((A,B), (A), (B), ())
• ( ): grand total
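The expansion of cube into grouping sets is simply the power set of the grouping attributes. A minimal sketch, with the helper name `cube` chosen here for illustration:

```python
from itertools import combinations


def cube(*attrs):
    """Return the grouping sets that cube(attrs) abbreviates: all subsets."""
    return [
        combo
        for size in range(len(attrs), -1, -1)  # largest subsets first
        for combo in combinations(attrs, size)
    ]


sets = cube("product", "store", "date")
for grouping_set in sets:
    print(grouping_set)
print(len(sets), "grouping sets")  # 2^3 = 8
```

For three attributes this yields eight grouping sets, ending with the empty set for the grand total.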

The Cube Operator: Example

• Sales grouped by:
– Product family
– Customer country
– Product family and customer country
– And overall sum (grand total)

select product_family, country, sum(store_sales)
from sales_fact, product, product_class pc, customer c
where ...
group by cube(pc.product_family, c.country);

The Cube Operator: Example (2)

PRODUCT_FAMILY COUNTRY        SALES
-------------- ------- ------------
Drink          Canada      14256.53
Drink          Mexico      57991.18
Drink          USA        150349.99
Drink          -          222597.70
Food           Canada     124649.63
Food           Mexico     568431.52
Food           USA       1376761.54
Food           -         2069842.69
Non-Consumable Canada      30845.95
Non-Consumable Mexico     137558.68
Non-Consumable USA        329025.81
Non-Consumable -          497430.44
-              Canada     169752.11
-              Mexico     763981.38
-              USA       1856137.34
-              -         2789870.83

The Rollup Operator

• Often we are not interested in all possible grouping criteria
• But mainly in all aggregates along a (subset of a) dimension hierarchy
• That is, we would like to see a stepwise rollup
  → Rollup operator
• For n attributes, computes n grouping combinations + grand total
• rollup(Family, Department, Product) ≡
  grouping sets((Family, Department, Product),
                (Family, Department), (Family), ())
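Where cube produces all subsets, rollup produces only the prefixes of the attribute list, which is why it yields n + 1 grouping sets instead of 2^n. A minimal sketch, with the helper name `rollup` chosen here for illustration:

```python
def rollup(*attrs):
    """Return the grouping sets that rollup(attrs) abbreviates: all prefixes."""
    return [tuple(attrs[:size]) for size in range(len(attrs), -1, -1)]


sets = rollup("Family", "Department", "Product")
for grouping_set in sets:
    print(grouping_set)
print(len(sets), "grouping sets")  # n + 1 = 4
```

The order of the attributes matters for rollup: it should follow the dimension hierarchy from the coarsest to the finest level.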

The Rollup Operator: Example

• Sum of sales per
– Product family and product department
– Product family
– Overall (grand total)

select product_family, product_department, sum(store_sales)
from sales_fact f, product p, product_class pc
where ...
group by rollup(pc.product_family,
                pc.product_department);

The Rollup Operator: Example (2)

PRODUCT_FAMILY PRODUCT_DEPARTMENT       SALES
-------------- ------------------- ----------
Drink          Alcoholic Beverages   68118.28
Drink          Beverages            123181.16
Drink          Dairy                 31298.26
Drink          -                    222597.70
Food           Baked Goods          107589.71
Food           Baking Goods         185836.08
Food           Breakfast Foods       36523.92
Food           Canned Foods         183554.39
Food           Dairy                152413.99
Food           Eggs                  43091.95
Food           Frozen Foods         287099.75
...            ...                        ...
Food           -                   2069842.69
Non-Consumable Health and Hygiene   144139.47
Non-Consumable Household            283380.38
Non-Consumable Periodicals           40860.55
Non-Consumable -                    497430.44
-              -                   2789870.83


New Aggregation Functions in SQL

• Traditional aggregate functions
– sum, count, min, max, avg
– Operate on entire columns or on partitions resulting from a traditional group-by clause ("global grouping")
• This is often not sufficient for analytic queries
– For instance, aggregates (or, more generally, calculated values) should be computed based on a subset of the result set relative to single tuples (rows)
• "Local" grouping
– Other terms: window functions, OLAP functions, new aggregate functions
• Examples:
– Ranking: top seller
– Numbering of rows
– Position of elements in a list

Rank Functions

• Rank functions
– Assign a numeric value to each individual row
– Rank = position of the element in a sorted list
– Note that the position of a tuple in a list obtained by an order-by clause is only implicit!
• Rank operator
– Sort criterion is specified in a (new) order clause
– If two or more elements tie, they receive the same rank and the following ranks are not assigned (gaps)
• denserank (dense_rank in standard SQL): no gaps in ranks
• Numbering function:
– Enumerates tuples
– Never assigns equal numbers to tuples
– row_number

Rank Functions: Examples

A   B   C   rank() over(order by c)   denserank() over(order by c)   row_number() over(order by c)
a1  b1  9   7                         6                              7
a2  b2  8   5                         5                              5
a3  b1  8   5                         5                              6
a4  b2  6   4                         4                              4
a5  b1  5   3                         3                              3
a6  b2  4   2                         2                              2
a7  b1  3   1                         1                              1

(The numbering of the two tied rows a2 and a3 by row_number is arbitrary.)
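The three functions can be tried out on the sample table A, B, C. The sketch below uses SQLite through Python's sqlite3 module, which is an assumption for illustration: it requires SQLite 3.25 or later for window functions and uses the standard spelling dense_rank. The table name `t` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a text, b text, c integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("a1", "b1", 9), ("a2", "b2", 8), ("a3", "b1", 8), ("a4", "b2", 6),
     ("a5", "b1", 5), ("a6", "b2", 4), ("a7", "b1", 3)],
)

rows = conn.execute("""
    select a, c,
           rank()       over (order by c) as rnk,        -- gaps after ties
           dense_rank() over (order by c) as dense_rnk,  -- no gaps
           row_number() over (order by c) as row_num     -- always distinct
      from t
     order by a
""").fetchall()

for row in rows:
    print(row)
```

The tied rows a2 and a3 (c = 8) share rank 5; rank then continues with 7 for a1, dense_rank with 6, and row_number assigns 5 and 6 to the tied rows in an unspecified order.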

Rank Functions: Examples

• The list of products, ordered by total sales, including the position in the sorted list

select p.product_name,
       f.storeSales,
       rank() over(order by f.storeSales desc)
         as salesRank
from productSalesV f join
     product p on f.product_id = p.product_id
order by salesRank asc;

Rank Functions: Examples (2)

PRODUCT_NAME                    STORESALES SALESRANK
------------------------------- ---------- ---------
Carrington Turkey TV Dinner       11753.84         1
Big Time Apple Cinnamon Waffles   11585.34         2
CDR Vegetable Oil                 11493.60         3
High Quality 60 Watt Lightbulb    11408.76         4
Ebony Lettuce                     11371.86         5
...
Super Columbian Coffee              245.00      1558
Top Measure Chardonnay Wine         240.12      1559

Rank Functions: Examples (3)

• The ten best-selling products

select *
from table(select p.product_name,
                  f.storeSales,
                  rank() over(order by f.storeSales desc)
                    as salesRank
           from productSalesV ...) sr
where salesRank <= 10
order by salesRank;


Local Grouping

• Calculate aggregates based on partitions given by the context of single tuples
  → «Local» grouping
  → Each individual tuple defines an aggregate (possibly together with other tuples)
• Comparisons to other aggregates are possible
– For instance, change in monthly sales compared to the previous year
• Moving average
– For instance, product sales averaged over the previous, current, and following month
• Cumulated sums
– For instance, monthly sums added up to the current month (year-to-date)

Local Grouping (2)

• Hard to formulate (if possible at all) with traditional SQL
– Possibly multiple SQL statements are required
  → Local grouping
  → One output tuple per input tuple
  → Local partitioning criterion
  → Local ordering criterion
  → Window size

Local Partitioning

• Partition is defined based on the «current» tuple
• Aggregates will be calculated over this partition
• partition-by clause
• Typically used for pre-aggregated data
• Often used to compute ratios (ratio-to-report)

Local Partitioning (2)

A   B   C   sum(c) over(partition by b)   sum(c) over(partition by b order by a)
a1  b1  6   15                            6
a2  b2  5   11                            5
a3  b1  5   15                            11
a4  b2  4   11                            9
a5  b1  3   15                            14
a6  b2  2   11                            11
a7  b1  1   15                            15
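The two variants can be reproduced on the sample table: without an order clause every row sees the sum of its whole partition, with an order clause the sum runs only up to the current row. The sketch uses SQLite via Python's sqlite3 module (an assumption for illustration; SQLite 3.25 or later is needed for window functions), and the table name `t` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a text, b text, c integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("a1", "b1", 6), ("a2", "b2", 5), ("a3", "b1", 5), ("a4", "b2", 4),
     ("a5", "b1", 3), ("a6", "b2", 2), ("a7", "b1", 1)],
)

rows = conn.execute("""
    select a, b, c,
           sum(c) over (partition by b)            as part_sum,    -- whole partition
           sum(c) over (partition by b order by a) as running_sum  -- up to current row
      from t
     order by a
""").fetchall()

for row in rows:
    print(row)
```

Each input tuple yields exactly one output tuple, in contrast to a global group-by, which would collapse the table into one row per value of b.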

Local Partitioning: Example

• Sum of sales per day plus ratio to monthly (percent) and yearly sales (per mille)

select theDate, storeSales,
       sum(storeSales) over(partition by month(theDate))
         as mSales,
       100 * storeSales /
         sum(storeSales) over(partition by month(theDate))
         as monPcnt,
       sum(storeSales) over(partition by year(theDate))
         as ySales,
       1000 * storeSales /
         sum(storeSales) over(partition by year(theDate))
         as yearPmil
from timeSalesV;

Local Partitioning: Example (2)

thedate    storesales     msales monpcnt      ysales yearpmil
---------- ---------- ---------- ------- ----------- --------
01/01/2003    1139.06  228289.43    0.00   890572.78     1.00
...
01/05/2003    2877.56  228289.43    1.00   890572.78     3.00
...
02/15/2003    4939.80  217254.31    2.00   890572.78     5.00
02/16/2003    3195.47  217254.31    1.00   890572.78     3.00
...
01/02/2004    6379.30  228289.43    2.00  1899298.05     3.00
...
02/01/2004    6706.19  217254.31    3.00  1899298.05     3.00
...

Local Grouping: Calculation of Ratios

• Calculation of ratios using ratio_to_report
• Determines the relative share that a value contributes to a sum

select …
       ratio_to_report(sales)
         over (partition by month(theDate)) * 100
         as monPcnt
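Where ratio_to_report is not available, the same share can be computed as value divided by the windowed sum. The sketch below does this in SQLite via Python's sqlite3 module (an assumption for illustration; SQLite has no ratio_to_report and needs version 3.25 or later for window functions). The table `sales`, its columns, and the figures are hypothetical, and the partition is formed per calendar month by taking the year-month prefix of an ISO date string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (the_date text, store_sales real)")
conn.executemany(
    "insert into sales values (?, ?)",
    [("2003-01-01", 25.0), ("2003-01-05", 75.0),
     ("2003-02-15", 40.0), ("2003-02-16", 60.0)],
)

# Emulation of ratio_to_report(store_sales) over (partition by month) * 100.
rows = conn.execute("""
    select the_date,
           100.0 * store_sales
             / sum(store_sales) over (partition by substr(the_date, 1, 7))
             as mon_pcnt
      from sales
     order by the_date
""").fetchall()

for row in rows:
    print(row)
```

Within each partition the percentages add up to 100.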

Windows

• Windows can be moved over an (intermediate) table
• Aggregates will be computed over the data «visible through the window»
• Window (size) is defined in terms of the current tuple T
  → Window clause
• Options:
– Position based:
   n (or all, none) tuples before T (in the specified sort order)
   n (or all, none) tuples after T (in the specified sort order)
– Value based

Windows (2)

A   B   C
a1  b1  6
a2  b2  5
a3  b1  5
a4  b2  4
a5  b1  3
a6  b2  2
a7  b1  1

partition by b order by a
rows between 1 preceding and 1 following

Moving Average: Example

• Three-month average of the sum of sales

select monat, jahr, storeSales,
       avg(storeSales)
         over(partition by jahr
              order by monat
              rows between 1 preceding and 1 following)
         as avg_3_mon
from monthSalesV;
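The query can be checked against the first months of 2003 from the result tables in this deck. The sketch runs it in SQLite via Python's sqlite3 module, which is an assumption for illustration (SQLite 3.25 or later); `monthSalesV` is created here as a plain table holding only five months, whereas in the course it is a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table monthSalesV (monat integer, jahr integer, storeSales real)")
conn.executemany(
    "insert into monthSalesV values (?, ?, ?)",
    [(1, 2003, 70923.85), (2, 2003, 71214.04), (3, 2003, 80282.83),
     (4, 2003, 66043.94), (5, 2003, 70556.32)],
)

rows = conn.execute("""
    select monat, jahr, storeSales,
           avg(storeSales)
             over (partition by jahr
                   order by monat
                   rows between 1 preceding and 1 following)
             as avg_3_mon
      from monthSalesV
     order by jahr, monat
""").fetchall()

for monat, jahr, sales, avg_3_mon in rows:
    print(monat, jahr, sales, round(avg_3_mon, 2))
```

At the edges of a partition the window is truncated: the value for month 1 is the average of months 1 and 2 only, because there is no preceding row within the year.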

Moving Average: Example (2)

MONAT JAHR STORESALES  AVG_3_MON
----- ---- ---------- ----------
    1 2003   70923.85   71068.94
    2 2003   71214.04   74140.24
    3 2003   80282.83   72513.60
    4 2003   66043.94   72294.36
...
    9 2003   69036.89   68672.30
   10 2003   64944.22   72704.42
   11 2003   84132.16   79524.97
   12 2003   89498.53   86815.34
    1 2004  157365.58  151702.92
    2 2004  146040.27  153100.34
    3 2004  155895.17  150537.91
...

Cumulated Sums

• Individual tuples contribute successively to the computation of a sum
• Sum in step 1 (per partition!) = attribute value of the first tuple
• Sum in step n+1 (per partition!) = sum of step n + attribute value of the (n+1)st tuple
• Result tuples represent cumulated sums

Cumulated Sums: Example

• Monthly sales and sum of sales up to and including the current month

select monat, jahr, storeSales,
       sum(storeSales) over(partition by jahr
                            order by monat
                            rows unbounded preceding) cum_Sales
from monthSalesV;
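The per-partition restart of the cumulated sum can be seen on a few months taken from the result tables in this deck. The sketch runs the query in SQLite via Python's sqlite3 module, which is an assumption for illustration (SQLite 3.25 or later); `monthSalesV` is created here as a plain table with only some of the months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table monthSalesV (monat integer, jahr integer, storeSales real)")
conn.executemany(
    "insert into monthSalesV values (?, ?, ?)",
    [(1, 2003, 70923.85), (2, 2003, 71214.04), (3, 2003, 80282.83),
     (1, 2004, 157365.58), (2, 2004, 146040.27)],
)

rows = conn.execute("""
    select monat, jahr, storeSales,
           sum(storeSales) over (partition by jahr
                                 order by monat
                                 rows unbounded preceding) as cum_Sales
      from monthSalesV
     order by jahr, monat
""").fetchall()

for monat, jahr, sales, cum_sales in rows:
    print(monat, jahr, sales, round(cum_sales, 2))
```

Because the partition is the year, the sum starts again with the January value in 2004.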

Cumulated Sums: Example (2)

MONAT JAHR STORESALES  CUM_SALES
----- ---- ---------- ----------
    1 2003   70923.85   70923.85
    2 2003   71214.04  142137.89
    3 2003   80282.83  222420.72
    4 2003   66043.94  288464.66
    5 2003   70556.32  359020.98
...
   11 2003   84132.16  801074.25
   12 2003   89498.53  890572.78
    1 2004  157365.58  157365.58
    2 2004  146040.27  303405.85
    3 2004  155895.17  459301.02
    4 2004  149678.31  608979.33
...

Processing of Analytic Queries

• Data filtering (WHERE)
• (Global) grouping (GROUP BY)
• Filtering of aggregates (HAVING)
• Computation of analytic functions
– Each analytic function is computed separately
– Creation of partitions
– Sorting of partitions
– Application of ranking or aggregate functions
• Sorting of the final result (ORDER BY)
• Note that WHERE and HAVING are applied before the computation of analytic functions
  → Results of analytic functions cannot be referred to in these clauses
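A consequence of this processing order is that filtering on a rank requires a nested query: the analytic function is computed in the inner query, and the outer WHERE filters its result. The sketch below shows both the rejected and the working form in SQLite via Python's sqlite3 module, which is an assumption for illustration (SQLite 3.25 or later; the exact error raised by other database systems will differ). The table `t` and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a text, c integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [("a1", 9), ("a2", 8), ("a3", 8), ("a4", 6), ("a5", 5)],
)

# Not allowed: WHERE is evaluated before analytic functions are computed.
try:
    conn.execute(
        "select a from t where rank() over (order by c desc) <= 3"
    ).fetchall()
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("rejected:", exc)

# Allowed: compute the rank in a subquery, filter in the outer query.
top3 = [
    row[0]
    for row in conn.execute("""
        select a
          from (select a, rank() over (order by c desc) as r from t) sr
         where r <= 3
         order by r, a
    """)
]
print(top3)
```

This is the same pattern as in the query for the ten best-selling products.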


Reporting

• Pre-defined, created and distributed periodically
• Or defined, created, and consumed on demand
• A report generally consists of data and layout
• Data are typically obtained through database queries
• Layout can be tabular and/or graphical
• Reporting is often distinguished according to purpose or domain
– Management reporting
– Performance reporting
– Financial reporting
– Regulatory reporting
– Technical reporting (e.g. availability reporting)


© Andreas Geppert Spring Term 2020 Slide 56

Reporting: Standard Reports

• Pre-defined
• Created regularly (e.g., month end)
• Distributed to consumers
• Replaces traditional, paper-based reporting
• Possibly with very high requirements and expectations regarding layout and look-and-feel («pixel-perfect reports»)
• Reports are often defined by specialized IT staff
• Reports are typically developed as regular software projects


© Andreas Geppert Spring Term 2020 Slide 57

Reporting: Parameterized Reports

• Very similar to standard reports
• Report and database query contain formal parameters which are instantiated at report generation time
• Actual parameters are then specific to individual report consumers
– Access to such reports often needs to be restricted to a small group of consumers
• Example: net new assets per relationship manager
• (Very basic) drill-down can be implemented by linking parameterized reports
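To make the idea of formal and actual parameters concrete, here is a minimal sketch using Python's `sqlite3` module. The table `asset_flows`, its columns, and the function `generate_report` are assumptions made up for this illustration; a real reporting tool would manage the parameters and access restrictions itself.

```python
import sqlite3

# Report query with two formal parameters, :manager and :year.
REPORT_QUERY = """
    SELECT relationship_manager, SUM(amount) AS net_new_assets
    FROM asset_flows
    WHERE relationship_manager = :manager AND booking_year = :year
    GROUP BY relationship_manager
"""

def generate_report(conn, manager, year):
    """Bind the actual parameters at report generation time and run the query."""
    return conn.execute(REPORT_QUERY, {"manager": manager, "year": year}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE asset_flows (relationship_manager TEXT, booking_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO asset_flows VALUES (?, ?, ?)",
    [("Mike", 2019, 500.0), ("Mike", 2019, -120.0), ("Mike", 2020, 80.0), ("John", 2019, 300.0)],
)

report_for_mike = generate_report(conn, "Mike", 2019)
print(report_for_mike)
```

The same report definition serves every relationship manager; only the bound values differ per consumer.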


© Andreas Geppert Spring Term 2020 Slide 58

Reporting: Phases

• Report definition
– Tool-based
– Should ideally be possible without deep database or IT skills
– Graphical interface, report editor
– Metadata support is essential
• Report creation
– Data are extracted by executing queries
– Depending on the report definition and database query, the reporting tool can also perform some kind of processing


© Andreas Geppert Spring Term 2020 Slide 59

Reporting: Phases (2)

• Formatting
– Report is formatted according to the layout definition
– Tables and/or charts
• Publishing and distribution
– Report can be stored on the reporting tool’s infrastructure
– Consumers can be notified about availability of the report
– Reports may also be distributed directly via email


© Andreas Geppert Spring Term 2020 Slide 60

Reporting: Ad-hoc Reports

• Not pre-defined, satisfy an urgent, one-off information need
• Reporting phases collapse; in particular, there is no time gap between report definition and generation
• In order to provide the required agility, ad-hoc reports cannot be developed with the same rigorous project approach as standard/parameterized reports
• Typically ad-hoc reports should be definable by end users, or at least by power users
– Implies ease-of-use requirements for the reporting platform and tool
– Layout requirements are typically less strict


© Andreas Geppert Spring Term 2020 Slide 61

Outline

1. Analytic Applications

– Classifications and Architecture

– Semantic Models

2. Query Languages: SQL

3. Reporting

4. Query Languages: MDX

5. OLAP

6. Visualization

7. Dashboards and Scorecards

8. Big Data


© Andreas Geppert Spring Term 2020 Slide 62

MDX

• Multidimensional Expressions
• Initially proposed by Microsoft
• Query language of SQL Server Analysis Services (previously OLAP Services)
• In the meantime also supported by other multidimensional database systems (e.g. Essbase, Mondrian, Alphablox, ...)


© Andreas Geppert Spring Term 2020 Slide 63

Structure of MDX Statements

• SELECT (axis dimensions)
– Columns: set of elements ON COLUMNS
– Rows: set of elements ON ROWS
– ... plus ON PAGES, SECTIONS, CHAPTERS, ...
• FROM (cube specification)
– Typically a reference to a single cube
– In principle a multi-dimensional join between cubes is possible as well
• WHERE (“slicer” dimensions)
– Restriction of the data range
• Measures of a cube
– Elements of the mandatory dimension “Measures”
– Standard aggregation operators are defined at schema level (sum, min, max, count)


© Andreas Geppert Spring Term 2020 Slide 64

Sample MDX-Query

select {Produkt.Abteilung.Members} on Columns,
       {Standort.Kanton.Members} on Rows
from KioskSales
where (Measures.[Anzahl Verkäufe])

Lebensmittel Schreibwaren Zeitschriften

Zürich 2 2

Aargau 2 2 5

Uri 2 2


© Andreas Geppert Spring Term 2020 Slide 65

Set Expressions

• Enumeration
– {USA, CA, SF, SJ, Aargau}
• Element expressions
– Schweiz.CHILDREN: returns cantons {ZH, AR, AG, ...}
– ZH.PARENT: returns Switzerland
– DESCENDANTS(Schweiz, Cities): descendants on level Cities
– Time.Quarter.MEMBERS: enumeration of all elements of a dimension hierarchy level
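The element expressions above all navigate a dimension hierarchy. The following Python sketch mimics their meaning on a tiny, made-up geography hierarchy; the dictionary `HIERARCHY` and the helper functions are hypothetical and only approximate what an MDX engine does.

```python
# Hypothetical toy hierarchy: member -> (parent, level)
HIERARCHY = {
    "Schweiz": (None, "Country"),
    "ZH": ("Schweiz", "Canton"),
    "AG": ("Schweiz", "Canton"),
    "Zürich": ("ZH", "City"),
    "Winterthur": ("ZH", "City"),
    "Aarau": ("AG", "City"),
}

def children(member):
    """Analogue of member.CHILDREN: members one level below."""
    return [m for m, (p, _) in HIERARCHY.items() if p == member]

def parent(member):
    """Analogue of member.PARENT."""
    return HIERARCHY[member][0]

def descendants(member, level):
    """Analogue of DESCENDANTS(member, level): all descendants on the given level."""
    found = []
    for child in children(member):
        if HIERARCHY[child][1] == level:
            found.append(child)
        else:
            found.extend(descendants(child, level))
    return found

def members(level):
    """Analogue of level.MEMBERS: every member of one hierarchy level."""
    return [m for m, (_, lvl) in HIERARCHY.items() if lvl == level]

print(children("Schweiz"), parent("ZH"), descendants("Schweiz", "City"), members("Canton"))
```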


© Andreas Geppert Spring Term 2020 Slide 66

Set Expressions (2)

• Creation of sets
GENERATE({USA, Schweiz}, DESCENDANTS(Geography.CURRENT, Cities))
– Enumerates all cities in the USA and Switzerland
• Nesting sets
CROSSJOIN({USA, Schweiz}, {Mike, John}):
{(USA, Mike), (USA, John), (Schweiz, Mike), (Schweiz, John)}
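As a rough analogy in Python, CROSSJOIN corresponds to a Cartesian product and GENERATE to evaluating a set expression once per member and concatenating the results. The `CITIES` lookup and both function names are assumptions for this sketch; dropping duplicates in `generate` is modelled on the default behaviour of MDX Generate without the ALL flag.

```python
from itertools import product

def crossjoin(set_a, set_b):
    """Pair every element of set_a with every element of set_b, in order."""
    return list(product(set_a, set_b))

# Hypothetical lookup standing in for DESCENDANTS(<country>, Cities)
CITIES = {"USA": ["SF", "SJ"], "Schweiz": ["Zürich", "Aarau"]}

def generate(members, set_for_member):
    """Evaluate set_for_member for each member and concatenate, dropping duplicates."""
    result = []
    for member in members:
        for element in set_for_member(member):
            if element not in result:
                result.append(element)
    return result

pairs = crossjoin(["USA", "Schweiz"], ["Mike", "John"])
cities = generate(["USA", "Schweiz"], lambda country: CITIES[country])
print(pairs)
print(cities)
```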


© Andreas Geppert Spring Term 2020 Slide 67

Sample MDX-Query with Crossjoin

select Produkt.Abteilung.Members on Columns,
       Crossjoin(Standort.Kanton.Members,
                 {Datum.[2002].Januar,
                  Datum.[2002].Februar}) on Rows
from KioskSales
where (Measures.[Anzahl Verkäufe])

Lebensmittel Schreibwaren Zeitschriften
Zürich Januar 1
Zürich Februar 1
Aargau Januar 1 1
Aargau Februar 2 2
Uri Januar 1
Uri Februar 1 1


© Andreas Geppert Spring Term 2020 Slide 68

Set Expressions (3)

• Relative reference
– Zeit.[1999].LastChild: fourth quarter of 1999
– [1999].NextMember: 2000
– [1990]:[2000]: [1990], ..., [2000]
• Level functions
– Schweiz.LEVEL: returns Country
– Zeit.LEVELS(1): returns Year (counting top down)


© Andreas Geppert Spring Term 2020 Slide 69

Special Functions

• TOPCOUNT, TOPPERCENT, TOPSUM
SELECT {[Anzahl Verkäufe]} ON COLUMNS,
       {TOPCOUNT(Schweiz.CHILDREN, 5, Sales)} ON ROWS
FROM KioskSales
WHERE ([Anzahl Verkäufe], [2005])
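The intended semantics of TOPCOUNT and TOPSUM can be sketched in a few lines of Python. The sales figures per canton are invented, and the functions only approximate the MDX behaviour (sort descending by a measure, then cut off by count or by cumulated total).

```python
# Hypothetical sales per canton
sales = {"ZH": 900, "BE": 700, "AG": 400, "UR": 50}

def topcount(members, count, measure):
    """Return the `count` members with the highest measure value."""
    return sorted(members, key=measure, reverse=True)[:count]

def topsum(members, threshold, measure):
    """Return the top members whose cumulated measure reaches at least `threshold`."""
    result, total = [], 0
    for member in sorted(members, key=measure, reverse=True):
        if total >= threshold:
            break
        result.append(member)
        total += measure(member)
    return result

top_two = topcount(list(sales), 2, sales.get)
top_1700 = topsum(list(sales), 1700, sales.get)
print(top_two, top_1700)
```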


© Andreas Geppert Spring Term 2020 Slide 70

Special Functions (2)

• FILTER
– In a WHERE clause, only slicers can be specified
– For the specification of predicates, filters have to be used
SELECT FILTER({Schweiz.CHILDREN},
              ([2005], Sales) > 500) ON COLUMNS,
       Quarters.MEMBERS ON ROWS
FROM KioskSales
WHERE ([Anzahl Verkäufe], [2005])
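The difference between a slicer and a filter can be illustrated with a toy cube in Python: a slicer fixes a coordinate, whereas FILTER evaluates a predicate for each member of a set. The cube contents and the helper `mdx_filter` are hypothetical.

```python
# Hypothetical cube cells keyed by (canton, year, measure)
cube = {
    ("ZH", 2005, "Sales"): 900,
    ("AG", 2005, "Sales"): 400,
    ("UR", 2005, "Sales"): 50,
    ("ZH", 2004, "Sales"): 300,
}

def mdx_filter(members, predicate):
    """Keep the members for which the predicate holds, preserving their order."""
    return [m for m in members if predicate(m)]

# Analogue of FILTER({Schweiz.CHILDREN}, ([2005], Sales) > 500)
big_cantons = mdx_filter(
    ["ZH", "AG", "UR"],
    lambda canton: cube.get((canton, 2005, "Sales"), 0) > 500,
)
print(big_cantons)
```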


© Andreas Geppert Spring Term 2020 Slide 71

Summary: MDX

• Powerful language for the specification of OLAP queries
– Top-level structure similar to SQL
– Set expressions provide elegant ways to operate on dimension hierarchies
• Functionality
– Many OLAP functions
– Derived measures (WITH clause)
– Presentation aspects (on rows/columns) as part of queries


© Andreas Geppert Spring Term 2020 Slide 72

Outline

1. Analytic Applications

– Classifications and Architecture

– Semantic Models

2. Query Languages: SQL

3. Reporting

4. Query Languages: MDX

5. OLAP

6. Visualization

7. Dashboards and Scorecards

8. Big Data


© Andreas Geppert Spring Term 2020 Slide 73

OLAP: "Definition"

• Rules by Codd and others
• 12 rules for the evaluation of OLAP products
• Later extended with 6 further features
• Re-grouped into four groups:
– Basic features
– Special features
– Reporting features
– Dimension control


© Andreas Geppert Spring Term 2020 Slide 74

OLAP: Codd’s Rules

1. Multidimensional conceptual view
2. Transparency
– Transparency of the architecture and the database environment (data origin)
3. Accessibility
– Integration of heterogeneous schemas and data
4. Consistent reporting performance
– No performance degradation when the number of dimensions grows or the database size increases
5. Client/server architecture
– Logically and physically


© Andreas Geppert Spring Term 2020 Slide 75

OLAP: Codd’s Rules (2)
6. Generic dimensionality
7. Dynamic management of sparse cubes
8. Multi-user mode
9. Unrestricted operations across dimensions
10. Intuitive data manipulation
11. Flexible reporting
12. Unrestricted dimensions and aggregation


© Andreas Geppert Spring Term 2020 Slide 76

OLAP: Definition by the OLAP Council

• On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
• OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities including:
– calculations and modeling applied across dimensions, through hierarchies and/or across members
– trend analysis over sequential time periods
– slicing subsets for on-screen viewing
– drill-down to deeper levels of consolidation
– reach-through to underlying detail data
– rotation to new dimensional comparisons in the viewing area
• OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP Server.


© Andreas Geppert Spring Term 2020 Slide 77

OLAP: FASMI-Test

• Coined by Nigel Pendse
• Fast Analysis of Shared Multi-dimensional Information
• Fast
– Analytic queries must be executed efficiently
– Especially when queries are ad-hoc and interactive
– Balance between pre-computation (risk of database explosion) and on-the-fly computation (risk of poor performance)


© Andreas Geppert Spring Term 2020 Slide 78

OLAP: FASMI-Test (2)

• Analysis
– Relevant business logic and statistical analysis use cases are supported
– Comprehensible for end users
– Ad-hoc calculations and analysis
• Shared
– Multi-user access
– Security
• Multi-dimensional
– Full support for dimensions and hierarchies, including parallel hierarchies
• Information
– Ability to handle large data volumes
– Measured by the volume of input data, not by the storage consumed


© Andreas Geppert Spring Term 2020 Slide 79

MOLAP: General Architecture

• Support for analysis of multi-dimensional data
• Multi-dimensional structures as storage objects
[Figure: MOLAP architecture with client, server, and data store]


© Andreas Geppert Spring Term 2020 Slide 80

MOLAP

• Physical storage of cubes: nested arrays
• Arrays contain the finest granularity required for analysis
• Designed for multi-dimensional analysis
• Multidimensional OLAP (MOLAP)
• Efficient execution of analytic queries possible (depending on query and design)
• With fine granularity, most of the cube cells are empty (typically > 95%)
– Compression of sparse dimensions and subcubes
– Complex physical design
• Scalability becomes a problem (performance degrades with growing cubes)
• Storage of the finest granularity is then no longer possible
– Use coarser granularity
– Analysis of detail data no longer possible
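The sparsity problem can be made tangible with a small Python sketch that stores the same cube once as nested arrays and once as a dictionary holding only the non-empty cells. The cube shape and cell values are made up, and real MOLAP servers use far more elaborate compression schemes than a plain dictionary.

```python
def make_dense_cube(shape, empty=None):
    """Nested arrays with one slot for every combination of dimension members."""
    if len(shape) == 1:
        return [empty] * shape[0]
    return [make_dense_cube(shape[1:], empty) for _ in range(shape[0])]

def density(cells, shape):
    """Fraction of cube cells that actually hold a value."""
    total = 1
    for size in shape:
        total *= size
    return len(cells) / total

shape = (10, 5, 20)                                   # products x stores x days
sparse_cells = {(0, 1, 2): 5.0, (3, 0, 7): 2.5}       # only the non-empty cells

dense = make_dense_cube(shape)
for (product_idx, store_idx, day_idx), value in sparse_cells.items():
    dense[product_idx][store_idx][day_idx] = value

print(dense[3][0][7], density(sparse_cells, shape))
```

In this toy cube 1000 slots are allocated for two values, which is why sparse dimensions and subcubes are compressed in practice.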


© Andreas Geppert Spring Term 2020 Slide 81

ROLAP: High-level Architecture

[Figure: ROLAP architecture with client, server, and data store]


© Andreas Geppert Spring Term 2020 Slide 82

ROLAP (2)

• Use relational database systems as storage system
• Map multidimensional structures onto relational tables (star schema)
• Management of and queries against multi-dimensional structures are implemented with SQL
• Relational OLAP (ROLAP)
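A minimal sketch of this mapping, using Python's `sqlite3` module: one fact table references two dimension tables, and a multi-dimensional question becomes a join with GROUP BY. All table names, column names, and data are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, department TEXT);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, canton TEXT);
    CREATE TABLE fact_sales (
        product_id  INTEGER REFERENCES dim_product(product_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        quantity    INTEGER
    );
    INSERT INTO dim_product  VALUES (1, 'Lebensmittel'), (2, 'Zeitschriften');
    INSERT INTO dim_location VALUES (1, 'Zürich'), (2, 'Aargau');
    INSERT INTO fact_sales   VALUES (1, 1, 2), (2, 1, 1), (2, 2, 5), (2, 2, 1);
""")

# Sales quantity per canton and department, answered with a star join
rows = conn.execute("""
    SELECT l.canton, p.department, SUM(f.quantity) AS quantity
    FROM fact_sales f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_location l ON l.location_id = f.location_id
    GROUP BY l.canton, p.department
    ORDER BY l.canton, p.department
""").fetchall()
print(rows)
```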


© Andreas Geppert Spring Term 2020 Slide 83

ROLAP (3)

• Unrestricted number of dimensions
• Management of very large data volumes possible (many TB)
– Good scalability
• Skills and experience are often available
• Analysis at detail level possible
• Execution of complex analytic queries not always possible
– See section on SQL
• Performance (query response time) often worse than with MOLAP
– Usage of caches to improve performance (reduce I/O)
• Extensions of relational systems
– New operations (some of them already standardized)
– Improved implementations and progress regarding optimizers and access paths (Teradata, DB2, Oracle, SQL Server)


© Andreas Geppert Spring Term 2020 Slide 84

HOLAP: High-level Architecture

[Figure: HOLAP architecture with client, server, and data store]


© Andreas Geppert Spring Term 2020 Slide 85

HOLAP (2)

• Hybrid OLAP
• Tries to combine the advantages of ROLAP and MOLAP
• Multi-dimensional DBS
– Stores aggregates (coarse granularity)
– Ability to analyze detail data
• Drill-through to relational tables
• Cubes and tables can be stored in the same or in different database systems


© Andreas Geppert Spring Term 2020 Slide 86

Outline

1. Analytic Applications

– Classifications and Architecture

– Semantic Models

2. Query Languages: SQL

3. Reporting

4. Query Languages: MDX

5. OLAP

6. Visualization

7. Dashboards and Scorecards

8. Big Data


© Andreas Geppert Spring Term 2020 Slide 87

Data Visualization

Data visualization
• encompasses all sorts of visual representation supporting the exploration, investigation, and communication of data (S. Few)
Visualization of information
• (vs. scientific visualization): “the use of computer-supported, interactive, visual representations of abstract data to amplify cognition” (Card, Mackinlay, Shneiderman)


© Andreas Geppert Spring Term 2020 Slide 88

Data Visualization

• See Few 2009
[Figure: chart of monthly values (Jan to Dec) for the series Inland (domestic) and Ausland (abroad), y axis from 0 to 4000]


© Andreas Geppert Spring Term 2020 Slide 89

Visual Perception

• Text and tables are read and processed sequentially, value by value
• Graphs can be consumed as a whole
• Characteristics that can be perceived particularly easily:
– Position (2D)
– Length
– Width
– Area
– Form
– Color
– Orientation
• “Pre-attentive attributes of visual perception”


© Andreas Geppert Spring Term 2020 Slide 90

Visual Perception: Basic Elements

■ Points
– Two-dimensional position
■ Lines
– Two-dimensional position + connections
■ Columns
– Height or length
■ Boxes
[Figure "Points": point chart with categories A to E on the x axis and values from 0 to 6 on the y axis]


© Andreas Geppert Spring Term 2020 Slide 91

Visual Perception: Basic Elements

[Figures "Lines" and "Columns": a line chart and a column chart, each with categories A to E on the x axis and values from 0 to 6 on the y axis]


© Andreas Geppert Spring Term 2020 Slide 92

Visualization Best Practices

• Graph types
• Dimensionality
• Trellis charts (small multiples)


© Andreas Geppert Spring Term 2020 Slide 93

Charts

• Bar and column charts
– In many cases the best-suited way to visualize quantitative information
– Comparisons, maximum and minimum are easy to detect
– In order to enable reasonable comparisons, the x axis must intersect the y axis at 0
• Line charts
– Well-suited to visualize the trend of quantitative information over time
– Can be meaningfully combined with column charts


© Andreas Geppert Spring Term 2020 Slide 94

Charts (2)

• Pie charts
– The chart type that is most often misused
– Well suited (if at all) for visualizing proportions
– Comparisons are often difficult, because arcs or areas have to be compared
• Maps
– Visualization of geo-coded data
– Visualization of geographical concentration
• Scatterplots
– Represent the relationship (or the lack thereof) between two variables
– Identification of trends, clusters, outliers


© Andreas Geppert Spring Term 2020 Slide 95

Charts (3)

• Bubble charts
– Not a chart type in its own right
– Represent additional information in other charts such as maps or scatterplots
• Heatmaps
– Colored visualization of the relationship between two variables
– A third variable can be represented via the size (area) of the rectangles


© Andreas Geppert Spring Term 2020 Slide 96

Dimensionality

• Two-dimensional charts are preferred
• Three-dimensional charts are often problematic and hard to read


© Andreas Geppert Spring Term 2020 Slide 97

Trellis Charts

• A Trellis chart (small multiples) is a series of multiple, small, similar charts
• The series allows one to visualize an additional variable or dimension


© Andreas Geppert Spring Term 2020 Slide 98

Outline

1. Analytic Applications

– Classifications and Architecture

– Semantic Models

2. Query Languages: SQL

3. Reporting

4. Query Languages: MDX

5. OLAP

6. Visualization

7. Dashboards and Scorecards

8. Big Data


© Andreas Geppert Spring Term 2020 Slide 99

Dashboards

• Dashboards contain several related indicators or reports
• Usually together with advanced visualization elements
– Maps
– Reports with content-based formatting
– Traffic light visualization
– Speedometer etc.
• "A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance" (Stephen Few, Information Dashboard Design)


© Andreas Geppert Spring Term 2020 Slide 100

Dashboard Example: WEBeMars

• Source: web-based Emergency Medicine Analysis & Reporting System (http://www.edims.net/webemars.php)


© Andreas Geppert Spring Term 2020 Slide 101

Dashboard: Sample Metrics

• Sales
– Orders
– Invoices
– Sales pipeline
– Number of orders
– Sales prices
• Marketing
– Market share
– Campaign success
– Customer demographics
• Finances
– Turnover
– Costs
– Profit
• HR
– Employee satisfaction
– Attrition
– Number of open positions
• Tech Support
– Number of support calls
– Number of closed cases
– Customer satisfaction
– Duration of calls
• Delivery
– Delivery times
– Backlog
– Inventory
• Production
– Number of produced units
– Production times
– Number of defects
• Web Services
– Number of visitors
– Number of page hits


© Andreas Geppert Spring Term 2020 Slide 102

Dashboard: Comparisons of Data

• The same measure at the same point in time in the past
• The same measure at a different point in time in the past
• The current target for the measure
• Relationship to a target in the future
• A past prediction of the measure
• A typical/standard value for the measure
• An extrapolation of the measure into the future
• Another version of the measure
• A different but related measure


© Andreas Geppert Spring Term 2020 Slide 103

Dashboards: Bad Practices

• Distribution of information across multiple screens
• Missing context
• Excessive detail or precision
• Inadequate metrics
• Inadequate representation
• Redundancy
• Inadequate design
• Inadequate coding of quantitative data
• Inadequate emphasis of important information
• Useless decoration
• Inadequate use of colors


© Andreas Geppert Spring Term 2020 Slide 104

Balanced Scorecards

• Concept-oriented business intelligence applications
• Objective: balanced control and steering of the enterprise and its constituent parts
• A Balanced Scorecard supports measurement, communication and control of strategic enterprise targets
• Balanced view of four areas
– Finance
– Customers
– Processes (operations)
– Learning and development (people)


© Andreas Geppert Spring Term 2020 Slide 105

Scorecards: Approach

• Definition of strategic goals
• Assignment of at least one key figure to each goal
• Definition of target values for each key figure
• Definition of actions to achieve goals
• Measurement of goal achievement
• Break-down of BSC for subordinate organizational units
• Extension onto functional units like HR, IT


© Andreas Geppert Spring Term 2020 Slide 106

Scorecard Tools

• Recommendations of the Balanced Scorecard Initiative
• BSC design
– Implementation of a BSC containing the BSC approach
• Strategy communication
– Documentation and communication of the BSC elements: goals, target values, key figures
• Monitoring of implementation
– Monitoring of measures
• Feedback and adaptation
– Reporting of key figures
– Status of target fulfillment using advanced visualization (see Dashboards)
– Possibility to comment


© Andreas Geppert Spring Term 2020 Slide 107

Outline

1. Analytic Applications

– Classifications and Architecture

– Semantic Models

2. Query Languages: SQL

3. Reporting

4. Query Languages: MDX

5. OLAP

6. Visualization

7. Dashboards and Scorecards

8. Big Data


© Andreas Geppert Spring Term 2020 Slide 108

Big Data in the Press

• In 2011, McKinsey estimated that Big Data can contribute …
• … $300 bn potential annual value to US health care
• … €250 billion potential annual value to Europe’s public sector administration
• Source: J. Manyika et al.: Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, May 2011


© Andreas Geppert Spring Term 2020 Slide 109

Big Data Motivation

• processing and analysis of data has traditionally been done in relational databases, using SQL as “inter-galactic data speak”
• in some application areas there is a tremendous increase in data volumes
– RDBMS are not easily able to ingest such data volumes
– Facebook applications create up to several dozens of TB new data each day (!)
– each airplane generates several TB of data on a single flight
• data “types” are not handled well by RDBMS
– particularly machine-generated data (web logs, sensor data), social media
– those data often are un- or semi-structured or of varying structure
• analysis and processing styles are not very well supported by RDBMS/SQL
– analysis of semi- and unstructured data
– graph analysis, natural language processing
• data might not be “worth” being stored in a relational database
• The 451 Group: SPRAIN requirements are not very well met by current RDBMS
– SPRAIN: Scalability, Performance, Relaxed consistency, Agility, Intricacy, Necessity
• emergence of Big Data (and NoSQL)

Initial Big Data Technology: Map/Reduce and Hadoop

• Map/Reduce was invented and first implemented and used at Google

• Hadoop is an Apache project implementing a runtime environment for Map/Reduce

• in Hadoop, Map and Reduce functions can be implemented in Java (other languages are supported as well)

[Figure: Hadoop stack with the components HDFS, Map/Reduce, Hive, HBase, and Pig]

Map/Reduce and Hadoop

• Map/Reduce is (one of) the most prominent approaches to processing Big Data

• It is a highly scalable approach for processing large data volumes in parallel

• Map/Reduce can run on clusters consisting of thousands of commodity servers

• the map phase is executed in parallel and computes sets of key/value pairs

• the reduce phase (also executed in parallel) combines pairs with equal keys

Map/Reduce: Example Word Count

[Figure: word count data flow with the stages input, map, shuffle, reduce, result]

• input (two texts)

– “In a traditional database, data (the records in a table) are stored row-wise (i.e., the primary key together with all the other attributes values). This is efficient for requests that retrieve one or only a few rows, require all attributes, or are write-intensive. …”

– “NoSQL database systems are DBMS or other kinds of data management systems that do not offer a SQL-interface, or at least not only. In general, NoSQL systems store key/value pairs and the access to data is primarily via the primary key (i.e., no scans, range queries, etc.).”

• map (one output per input text)

– database/1, column/5, data/5

– database/2, data/6

• shuffle

– database/(1,2), column/(5), data/(5,6)

• reduce (result)

– column/5, data/11, database/3
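The data flow of the word count example can be sketched in a few lines of Python. This is only a single-process illustration of the map, shuffle, and reduce phases, not Hadoop code; the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are chosen for this sketch, and the tokenization rule in `map_phase` (lower-cased runs of letters) is an assumption. The map outputs fed into the shuffle are the key/value pairs shown on the slide.

```python
import re
from collections import defaultdict


def map_phase(text):
    """Map: emit one (word, count) pair per distinct word of one input text.

    Assumption: words are lower-cased runs of the letters a-z.
    """
    counts = defaultdict(int)
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] += 1
    return list(counts.items())


def shuffle_phase(map_outputs):
    """Shuffle: collect the values of all map outputs under their key."""
    groups = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: combine all values that share a key (here: sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


# Map outputs as shown on the slide, one list per input text.
slide_map_outputs = [
    [("database", 1), ("column", 5), ("data", 5)],
    [("database", 2), ("data", 6)],
]

slide_groups = shuffle_phase(slide_map_outputs)
slide_result = reduce_phase(slide_groups)

print(slide_groups)  # database -> [1, 2], column -> [5], data -> [5, 6]
print(slide_result)  # database -> 3, column -> 5, data -> 11
```

In a real cluster each call to `map_phase` would run on a different node, and the shuffle would move pairs over the network so that all values for one key arrive at the same reducer.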

Big Data Evaluation from a Database Perspective

• Initial Hadoop approaches raise challenges that (relational) databases addressed long ago for relational query operators

• how to parallelize operations

– communication between phases becomes a critical cost factor

• algorithm design

– application implementors need to design algorithms (or at least evaluate candidates) from a complexity and cost perspective

• abstraction and declarativeness missing
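For contrast, the following sketch shows what the declarative counterpart looks like: the aggregation step of the word count example from earlier in this section, stated as a single SQL query. It uses Python's built-in `sqlite3` module merely as a stand-in for a relational DBMS; the table name `word_counts` and its columns are hypothetical and chosen for this illustration only.

```python
import sqlite3

# In-memory database as a stand-in for a relational DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_counts (doc_id INTEGER, word TEXT, cnt INTEGER)"
)

# Per-document counts taken from the word count example.
conn.executemany(
    "INSERT INTO word_counts VALUES (?, ?, ?)",
    [
        (1, "database", 1),
        (1, "column", 5),
        (1, "data", 5),
        (2, "database", 2),
        (2, "data", 6),
    ],
)

# The query states WHAT is wanted; grouping strategy, parallelization,
# and cost-based choices are left to the DBMS.
totals = conn.execute(
    "SELECT word, SUM(cnt) FROM word_counts GROUP BY word ORDER BY word"
).fetchall()
conn.close()

print(totals)  # [('column', 5), ('data', 11), ('database', 3)]
```

With hand-written Map/Reduce code, the grouping and summation that `GROUP BY` expresses here have to be implemented and tuned by the application developer.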

DWH and Big Data: Delineation

• first-generation Big Data/Hadoop vendors aimed at replacing data warehouses

– not realistic (and no longer claimed)

• Hadoop & Co as massively scalable and parallel ETL engines

[Figure: Map/Reduce and SQL / BI]

DWH and Big Data: Delineation (2)

• DWH and Big Data as complements for different use cases

• each of them is used for use cases it can handle well (see below)

• An often overlooked aspect is the typically much better data quality in data warehouses

• Both together form a data lake

[Figure: SQL / BI and Big Data]

see M. Selvage: Decision Point for Logical Data Warehouse Implementation Style. Research ID G00250883, Gartner, May 2013

General Big Data Use Cases

• typical use cases

– customer analysis

– product analysis (sentiment analysis, opinion mining)

– graph analysis

– monitoring, analysis, and planning of operations

– security

– legal and compliance

– stock and fund performance predictions

• data used for such use cases

– social media

– log data (web and other logs)

– sensor data

– RFID events

– location data

– weather data (historical and forecast)