43
Information & Knowledge Management - Class 3 Marielba Zacarias Prof. Auxiliar DEEI FCT I, Gab 2.69, Ext. 7749 Data-warehousing [email protected] http://w3.ualg.pt/~mzacaria

Gic2011 aula3-ingles

Embed Size (px)

Citation preview

Information & Knowledge Management - Class 3

Marielba ZacariasProf. Auxiliar DEEI

FCT I, Gab 2.69, Ext. 7749Data-warehousing

[email protected]://w3.ualg.pt/~mzacaria

Summary

Data-warehouses

The architected environment

Design Process

Data-modeling schemas

Data Warehousing

Data collection for analysis and reporting taks

Historical data

Stored in a distinct environment from operational data

Structure different from data-bases

Why

Operational and analitical data have different requirements in terms of

usage (frequency, response time)

hardware

software

structure

Data-warehousing Users

6

Before Data-Warehouses....The “spider web”

7

operationalAtomic

dwDept.dw

individualdw

Detaileddaily

current valueHigh access prob.

Application oriented

More granularTemporalIntegrated

Subject orientedSumarized

derived,Some primitive

Typical of MarketingEngineeringProductionAccounting

temporalAd-hoc

HeuristicNão-repetitive

Oriented to PC orworkstations

The “arquitected” environment”

“data-marts”

8

J. Jones123 Main St.Credit - AA

1986-87J. Jones

456 High St.Credit - B

1987-89J. Jones

456 High St.Credit - A

1989 – pte.J. Jones

123 Main St.Credit - AA

Jan – 4101Fev – 4209Mar- 4175Apr - 4215

Clientes Desde 1982Com saldos

> 5,000 e crédito

>= B

operacional Atomicodw Dept. individual

MonthlySales?

Jones Credit?

Jones Credit

History?

Client typesin analysis?

Type of questions

9

ProductionEnvironment

Operationalenvironment

AnaliticalEnvironment

Architected Environment

Data-warehouse designRequirement Gatherings

Physical Environment Setup

Data Modeling

ETL

OLAP Cube Design

Front End Development

Report Development

Performance Tuning

Query Optimization

Quality Assurance

Rolling out to Production

Production Maintenance

Incremental Enhancements

Requirements Gathering

Take into account users

Executive with little time and knowledge about technical terms

Interviews, JAD sessions

User Reporting/Analysis Requirements

Hardware, training requirements

Data source identification

Concrete project plan

Physical Environment Setup

Setup Servers, DBMS and databases, ETL, OLAP Cubes and reporting services

Create three environments

development, testing, production

Data-modeling

Depends on initial data source identification

Conceptual, logical and physical data modeling

Should be related to the information

architecture!!!!

Data ModelingDimensional Approach

Transactional data is partitioned in facts

Numeric transaction data

products ordered, price

Dimensions

provide context for facts

order date, customer name, product number, location info, salesperson

Dimensional Approaches

Star

Fact table (typically a transaction)

Dimensions (context of the transaction)

Snowflake

Dimensions indirectly linked to fact tables

Star Metaphor

Star Schema

Relational model

Star schema

Snow-flake schema

OLAP Cube DesignSpecification of detailed reporting needs in terms of the multi-dimensional structure previously defined (star or snowflake), but regarded as a n-dimensional cube

star/snowflake and cubes are pretty much the same thing

cubes are more appropriate for not IT users

The Cube Metaphor

Slicing

Dicing

Rotating

ETL

Extraction

Transformation

Loading

SQL Server Integration Services

SQL Server Integration Examples

SQL ServerIntegration Examples II

Qualitative data

Description term ActionId

team meeting 18

hr distribution 19

project list 19

team meeting 19

hr distribution 26

project list 26

claims application 27

claims application 28

cards application maintenance 29

claims application integration 30

hr distribution 31

project list 31

claims application 34

claims application 35

hr distribution 36

project list 36

SQL ServerIntegration Examples III

Fuzzy Transformations

Front-end developmentFront-ends range from

in-house development with scripting languages php, asp, or perl

to off-the-shelf products such as Crystal Reports or higher-end products such as Actuate

OLAP vendors also offer front-ends of their own

Report DevelopmentDerived from requirements

Main point of contact between the data-warehouse and users

User customization

Report Delivery (web, e-mail, sms, file formats)

Access privileges

Performance Tuning

ETL

Query Processing

Users loose interest after 30 sec!

Query optimization

Report Delivery

Query Optimization

Understand how your DBMS executes queries

Store intermediate results in temporary tables

Query Optimization tips

Use indexes

Partition tables (vertically and horizontally)

De-normalize (less joins)

Server Tuning

Quality Assurance

Test plan with quality criteria for data

Critical success factor

Often overlooked

Performed by people with knowledge of the business data not data-warehouses

Resistance

Rolling to production

Seems easy but..

Putting everyone online may take a full week in some cases

Online access can be as simple as sending a link by e-mail

Production MaintenanceBackup and recovery processes

Crisis Management

Monitoring end-user usage

Capture runaways queries before whole system is slowed down

To measure usage for ROI calculations and future enhancements

Incremental enhancements

Accomplish small changes such as changing original geographical designations

A company may add new sales regions

No matter how simple, never do them directly in production environment

Architected environment

Architected Enviroment

Architected Environment

Architected environment

Tools for unstructured information management

Content Management Systems

Record Management Systems

Digital Image Management Systems

Digital Asset Management Systems

Digital Imaging Systems