25
1 Enterprise Information Systems Umberto Nanni Master Degree Programme in Management Engineering Enterprise Information Systems Umberto Nanni DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE ANTONIO RUBERTI Introduction to Business Intelligence and Data Warehousing

Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

1Enterprise Information SystemsUmberto Nanni

Master Degree Programme in

Management Engineering

Enterprise Information Systems

Umberto Nanni

DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE

ANTONIO RUBERTI

Introduction to

Business Intelligence and

Data Warehousing

Page 2: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

2Enterprise Information SystemsUmberto Nanni

Business Intelligence Architecture

goals results

managementsystem

operationalsystem

ETL systems

externaldata

sources

servicesystems

ERPsystems

internet /extranet

rep

ort

ing

OLA

P

dat

a m

inin

g

Datawarehouse

Datamart-1 Datamart-2 Datamart-3

KPI DSS MKT HRCRM …

managementsystem

operational system

Page 3: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

3Enterprise Information SystemsUmberto Nanni

What is Data Warehousing

Collection of methods, technologies and tools to

assist the “knowledge worker” (manager,

analyst) to conduct data analysis aimed at

supporting decision-making and/or improving

the management of information assets

Page 4: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

4Enterprise Information SystemsUmberto Nanni

What is a Data Warehouse

A data warehouse is a collection of data

• integrated (far beyond the organization)

• consistent (despite the heterogeneous origin)

• focused (an interest area is defined)

• historical (over a consistent timeframe)

• permanent (never delete your data!)

Page 5: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

5Enterprise Information SystemsUmberto Nanni

Purpose of a Data Warehouse

A Data Warehouse helps (allows) you:

• to take decisions

• to identify and interpret phenomena

• to make predictions about the future

• to control a complex system

Page 6: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

6Enterprise Information SystemsUmberto Nanni

Value and quantity of information

value

quantity

strategicinformation

primaryinformation

sources

reports

selectedinformation

BD

$$$$

competitors

marketing

prices

sales

logistics

Page 7: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

7Enterprise Information SystemsUmberto Nanni

OLTP & OLAP

OLTP - On-Line Transaction Processing– realm of (write and / or read) transactions, recovery,

consistency

– many, fast and frequent operations

– high level of concurrency

– access to a small amount of data

– on-the-fly data update

OLAP - On-Line Analytical Processing– read only

– few operations

– low level of concurrency

– access to huge amounts of data

– historical but essentially static data

Page 8: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

8Enterprise Information SystemsUmberto Nanni

Separation between:Operational Database & Data Warehouse

• different computational load

• different needs:

– DB: dynamic data, asynchronous updates

– DW: static data, periodic updates

• integration with business activity:

– DB: supporting operations (focused, timely)

– DW: supporting decisions (descriptive, historical)

• data collection:

– DB: minimal

– DW: maximal

Page 9: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

9Enterprise Information SystemsUmberto Nanni

Two issues with different perspectives

• Data redundancy

– OLTP (DB): to avoid, bringing to inconsistency and/or inefficiency on updates

– OLAP (DW): redundancy avoids recomputation and shorten response time

• Indexing

– OLTP (DB): good when you search – bad when you update... you need some trade-off

– OLAP (DW): the more, the best

Page 10: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

10Enterprise Information SystemsUmberto Nanni

Some Data Warehouse Systems

• Oracle

• IBM InfoSphere

• Microsoft SQL-Server 2014 – Analysis Services

• Sybase IQ

• Hyperion (bought by Oracle)

• Teradata (division of NCR)

• Netezza – Cognos (bought by IBM)

• Business Objects (bought by SAP)

• ...

Page 11: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

11Enterprise Information SystemsUmberto Nanni

A comparison by Gartner

Mark A. Beyer, Roxane EdjlaliMagic Quadrant for Data WarehouseDatabase Management SystemsGartner RAS Core Research Note G00255860, 07 March 2014

SAP

IBMMicrosoft

1010dataAmazon Web Services HP

Kognitio

Pivotal (Greeplum)

Cloudera

Exasol

ActianInfobright

InfiniDB (formerly Calpont)

MarkLogic

TeradataOracle

2014

Page 12: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

12Enterprise Information SystemsUmberto Nanni

2010

A comparison by Gartner (some years ago)

Donald Feinberg, Mark A. BeyerMagic Quadrant for Data WarehouseDatabase Management SystemsGartner RAS Core Research Note G00173535, 28 January 2010

Page 13: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

13Enterprise Information SystemsUmberto Nanni

Architectures for Datawarehousing: issues

• separating OLTP & OLAP

• scalability

• extensibility

• security

• administrability

Page 14: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

14Enterprise Information SystemsUmberto Nanni

Architecture for Datawarehousing:

‒ determined by design choices

‒ determined by / determines the choice of a

software system

‒ determines the cost and makes possible

future integration (quantitative and / or

qualitative)

‒ affects the cost of data processing

Page 15: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

15Enterprise Information SystemsUmberto Nanni

Data Mart

Collection of data focused on particular user profile or

on particular target analysis

Alternatives:

1. dependent Data Mart: it is a subset and/or an aggregation of

data in the primary DW

→ DM extracted from a DW

2. independent Data Mart: it is a subset and/or an aggregation

of data in the operational DB

→ DW=Ui(DMi), that is, DW is a set of DM

3. hybrid solution, combining 1, 2

Page 16: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

16Enterprise Information SystemsUmberto Nanni

DW architecture: 1 Level

• there is only an operational DW

• virtual DB (no OLTP-OLAP separation)

• data coincident with DB operational

• difficult integration with other sources

sources warehouse analysis

data - level 1 middleware

(copy of)operational

DB

externalsources

Page 17: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

17Enterprise Information SystemsUmberto Nanni

DW architecture: 2 Levels – dependent DMs

• data sources complemented with external sources• running on dedicated software platform• ETL: Extraction, Transformation, Loading• materialization of the DW• materialization of Data Marts

operBD

extBD

sources warehouse analysisfeeding

DW

DataMart

DataMart

ETL

data - level 1 data - level 2

Page 18: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

18Enterprise Information SystemsUmberto Nanni

sources warehouse analysisfeeding

DW architecture: 2 Levels – independent DMs

• Data Mart are materialized by feeding

• DW = union of DMs

operBD

extBD

DataMart

DataMart

ETL

data - level 1 data - level 2

Page 19: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

19Enterprise Information SystemsUmberto Nanni

DW architecture: 3 Levels

• a level of "reconciled" data (operational data store) is introduced

• separation into two phases of ETL activities:1. extraction / transformation

2. loading

operBD

extBD

DW

DataMart

DataMart

ET(L)

reconcilieddata

loading

data - level 1 data - level 2 data - level 3

sources warehouse analysisfeeding

Page 20: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

20Enterprise Information SystemsUmberto Nanni

ETL: Extraction, Transformation, Loading

• extraction

• cleaning - validation - filtering

• transformation

• loading

Operational Data, External Data

Reconciled Data

Data Warehouse

Page 21: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

21Enterprise Information SystemsUmberto Nanni

Extraction

• initial extraction:

– targeted at the creation of the DW

• furter extractions:

– static (replaces the whole DW)

– incremental

• log (journal)

• timestamp

Page 22: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

22Enterprise Information SystemsUmberto Nanni

Cleaning

• changing VALUES

• duplicates

• inconsistencies

– domain violation

– functional dependency violation

• null values

• misuse of fields

• spelling

• abbreviations (not homogeneous)

Page 23: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

23Enterprise Information SystemsUmberto Nanni

Transformation

• changing FORMATS:

• misalignment of formats

• field overloading

• unhomogeneous coding

Page 24: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

24Enterprise Information SystemsUmberto Nanni

Loading

• Refresh:

ex-novo load of the whole DW

• Update:

differential updates

Page 25: Enterprise Information Systemsnanni/Didattica/MatDid/EIS/slides/EIS... · InfiniDB (formerly Calpont) MarkLogic Teradata Oracle 2014. Umberto Nanni Enterprise Information Systems

25Enterprise Information SystemsUmberto Nanni

Metadata

• internal metadata

– concerning the administration of the DW (i.e., sources, transformations,

schemas, users, etc..)

• external metadata

– interesting for users (e.g., measurement units, possible combinations)

• STANDARDs

• CWM - Common Warehouse Model (OMG), defined by:

– UML (Unified Modeling Language)

– XML (eXtensible Markup Language)

– XMI (XML Metadata Interchange)

OMG = Object Management Group: CORBA (Common Object Request Broker Architecture), UML

(Unified Modeling Language), MDA (Model-Driven Architecture)