46
Lecture 1 Themes in this session An introduction to the data warehouse Decision making in organisations The use of data and information in organisations

Lecture 1 Themes in this session An introduction to the data warehouse Decision making in organisations The use of data and information in organisations

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

Lecture 1

Themes in this session

• An introduction to the data warehouse• Decision making in organisations• The use of data and information in

organisations

An introduction to the data warehouse

The Data Warehouse

Definition: ” A single, complete and consistent source of data obtained from a variety of sources and made available to end users in a way that they can understand and use in a business context” - Barry Devlin

The data warehouse is an architecture for the provision of information !The data contained in the data warehouse is...– subject oriented– integrated– non-volatile– time variant– transformed– accessible

Subject-oriented

PayrollSystem

PurchasingSystem

SalesSystem

Operational Systems

EmployeeData

VendorData

CustomerData

Informational Systems

Integrated

Operational Systems Informational Systems

CustomerData

OrderSystem

BillingSystem

MarketingSystem

Time variant

Operational Systems Informational Systems

CustomerData

OrderSystem

60-90 days 5-10 years

Non-volatile

Informational Systems

Operational Systems

OrderSystem

Create

Delete

Insert

Update

CustomerData

Load Access

The Data Related Problems Behind the Data Warehouse

Data in organisations often has the following characteristics...– massive volume – dispersed– difficult to access– badly integrated– complex data structures– not suitable for high level business queries

The Information Needs Behind the Data Warehouse

Organisations need information which is...– more holistic in its coverage of the business– selected and enriched– easily accessible– more easily understandable– of a high quality– directly applicable to the decision situation

The integration problem

DBMSDB

DBMSDB

DBMSDB

DBMSDB

End userapplication A

End userapplication B

End userapplication C

The integrated data warehouse

DB

DW DBMSDB

End userapplication A

End userapplicatio

n B

End userapplication C

DB

DB

DB

ExternalDB

Extraction

Characteristics of the integrated data warehouse

• Contained data represents a holistic view of the organisation

• All contained data is for use in ”knowledge discovery”– current data – historical data– aggregated data

• Data usually time-stamped• Data often pre-processed for expected queries• Does not burden operational systems• Own database management environment

Some important terminology

• granularity - the level of detail of data contained in the data warehouse

• aggregation - a summation of detailed data to a less detailed level

• de-normalisation - the abandonment of the rules of normalisation and allowance of repetitive storage of data

• dimension - a business perspective from which data is looked upon

A generic five-tier data warehouse architecture

DB

DB

DB

DB

Extraction DW DBMSDB

Middle-ware

LegacyLayer

Extraction

Layer

Database

Layer

MiddlewareLayer

ApplicationsLayer

Data Flow

The legacy layer

Common legacy layer characteristics...• online source data often in OLTP systems• often dispersed• discrepancies between different data sources• often contains inconsistencies• ”twinkling databases”• already taxed by the normal operations of the

organisation• number of different types of data structures present

DBDB

DBDB

DBDB

The extraction layer

Often the most complex layer in the architecture, involves...

• data selection• retrieval and conversion of data• data cleansing and data scrubbing• data enrichment• data aggregation

ExtractionDB

DBDB

DBDB

DB

The database layer

The database layer is characterised by...• storage of data• processing of queries• data warehouse DBMS

– relational database management system (RDBMS)– modified relational database management system– multi-dimensional database management system

(MDDBMS)– object database management system (ODBMS)

• the storage of metadata

DW DBMS

DB

The middleware layer

A technological platform for accessing the database, involves the distribution and control of enquiries and resulting data flows...

• client-server• LAN • WAN• Intranet

DW DBMS

DBMiddle-

ware

The application layer

Interaction with system users through a number of decision support applications...

• standard reporting• ad-hoc query processing• statistical analysis• simulations - ”what if”• OLAP• data mining

Middle-ware

The data mart issue

A small-subject oriented warehouse

• provides information for a specific sub-group of the enterprise’s business analyses

• far narrower scope than an enterprise wide data warehouse

• far smaller data volumes• more manageable• faster and easier to fine tune the design of the data mart• allows a “piecemeal” approach to some of the enormous

integration problems involved in creating an enterprise wide data model

Data mart evolution

PayrollSystem

PurchasingSystem

SalesSystem

Operational Systems

OrderSystem

BillingSystem

MarketingSystem

CustomerData

Data Mart

Data marts based on an enterprise wide data warehouse

PayrollSystem

PurchasingSystem

SalesSystem

Operational Systems

OrderSystem

BillingSystem

MarketingSystem

Data Mart

DataWarehouse

Avoid the data mart explosion

DataMart

DataMart

DataMart

Alternative DBMS’s for the database layer

A closer look at the three of the four major DBMS alternatives, namely...

• relational database management system (RDBMS)• modified relational database management system• multi-dimensional database management system

(MDDBMS)

DW DBMS

DB

The relational database management system (RDBMS)

• based on standard, normalised relational tables• known technology, many supporting applications,

portable• standard query interface (SQL)• supports easy summations and calculations• can support very large databases• can be slow when processing complex queries• established suppliers

DW DBMS

DB

The modified RDBMS

• uses star-join schema based data structure • expanded SQL, good for business queries• provides a more readily understandable interface• specially designed for quick access and fast

calculations• highly indexed• often used in data marts (ie, dw with a limited

subject area)• good market reputation• demands good knowledge of users information needs

DW DBMS

DB

Star-join schema in a modified RDBMS

Time Dimension

Promotion Dimension

Product Dimension

Store Dimension

Sales Fact

time_keytime_attributa Atime_attribute B

promotion_keypromo_attribute Apromo_attribute B

product_keyproduct_attribute Aproduct_attribute B

store_keystore_attribute Astore_atribute B

time_keypromotion_keystore_keyproduct_keydollar_salesunits_salesdollar_costcustomer_count

Star schema representing a data perspective

Dimension table or single dimensionaldata subject

Fact table ormultidimensional datasubject

The Multi-Dimensional DBMS (MDDBMS)

• Uses a ”metacube” as standard data structure• data stored as an array with any number of possible

dimension• Optimised for OLAP applications• often only compatible with proprietary systems• no branch standards• demands new competence• can have bad performance with large data volumes

DW DBMS

DB

Metadata

Data about data

Main functions are to give...• data definitions• the origin of data• the structure of data• rules for the selection and transfer of data• qualitative and quantitative data about data

The metadata repository

An integrated complete source of metadata

• is at the heart of the data warehouse architecture• supports the information needs of...

– system developers– data administrators– system administrators– users– applications on the data warehouse

• very complex data structure• must contain full version history• must always be up to date

Decision making in organisations

The basics of organisational decision making

• decisions involve choices between alternative courses of action in a given decision situation

• decision making relates to organisational purposes and should serve the interests of the organisation

• decisions are made on behalf of the organisation by a decision making unit, these can be anything from an individual to an entire organisation

• decision making occurs in a specific environment which has a great impact on the decision making process and the choices made

Decision making models

• Provide a framework for interpreting decision making in an organisation

• help articulate the organisational aspects of the environment in which decision making occurs

• can help suggest what form of decision support is best suited to an organisation

4 organisational decision making models

• The rational model• The political/competitive model• The garbage can model• The program model

Rational

Garbagecan

Program

Politicalcompetitive

The rational model

• Involves organisational units using information rationally in order to make choices on behalf of the organisation

• decisions based on sound analysis of possible outcomes and their expected value to the organisation

• decision unit chooses those options that lead to the outcome with the maximum expected value

• the model is widely spread but there is some scepticism as to the applicability of the model in complex, organisational decision situationsRational

Garbagecan

Program

Politicalcompetitive

The political/competitive model

• Decision making occurs in an environment where decisions are consequences of strategies and tactics of decision units trying to influence decisions so that they result in choices favourable to themselves

• different stakeholders in the same decision process have different goals

• decision units promote the choice of options that they believe will result in an outcome which best serves their own purposes

• the interests of the organisation are often subordinate to the interests of the decision

units involved in the processRational

Garbagecan

Program

Politicalcompetitive

The garbage can model

• Organisational decisions are consequences of intersections of problems looking for solutions, solutions looking for problems, and opportunities for making decisions

• participants in the decision making process look for opportunities to apply tried and tested solutions to problems they are confronted with

• the organisational memory records which solutions are available and to what sorts of problems they were applicable

• this model highlights the role of chance and timing in organisational decision making and is quite different from the rational model

Rational

Garbagecan

Program

Politicalcompetitive

The program model

• A model based on the presumption that decision makers are to some extent incapable and unwilling to take decisions

• decision making is affected by standard operating procedures, group norms, and action-constraining organisational programs

• these standard practices result in a “programming” of decision makers

• decision making behaviour at time t will probably be the same as decision making behaviour at time t-1

Rational

Garbagecan

Program

Politicalcompetitive

Problems with organisational decision making

• limited rationality in the choice of alternatives• inconsistent preferences for decision outcomes• chaotic, fragmented, and coincidental decision

processes• decision making as a sacred ritual• conflict of interest

The organisational decision making environment

• Available knowledge - more and increasing– knowledge will become more available and more

proliferous

• Complexity - more and increasing– specialization and diversity in systems

accompanied by increasing interdependence

• Turbulence - more and increasing– the rate of change in the environment is

continually increasing

The difficulties of managerial problem solving

• managerial problems are by nature indefinite– not easily identified and in some cases not even possible

to agree on what the problem is

• most managerial problems are interconnected– acting towards the solution of one problem may adversely

affect another problem, the sub-optimisation problem

• managerial problems come in an infinite variety of shapes and sizes

• managerial problems most often have no clear cut solution

• the factors which affect managerial problems are often unbounded

The use of data and information in organisations

Retrieving information

• Scanning (exploration)– general browsing of several data sources in order

to spot variations or find a specific piece of information

• Focused search (exploitation)– searching for data in order to answer a specific

question or solve a given problem

The rational use of information

• Information’s main role is reducing uncertainty – ignorance about future consequences of current

actions– ignorance of the knowledge possessed by others

and their probable actions

• The cost of information should never exceed the value of the benefits it delivers– this can /theoretically) be achieved by comparing

the situation with a specific piece of information as compared to the situation without that same information

The real (ab)use of information

• Information overload and the distribution of worthless information

• Information is used for ex-ante substantiation of decisions

• Information is power and power is rarely shared• Information seeking is a ritual where the

organisations decision making history decided which information is “right”

• Information can be imbued with multiple meanings depending on the intentions of the person citing the information

• Soft is best

Some of the problems of providing information

• What is the organisations “culture” of information use?

• What decision situations are going to be supported by the information and which factors are “relevant” to these decisions?

• How will the decision situations evolve and how will this affect the information requirements?

• How can the information be presented so as to be minimally unambiguous and maximally accessible ?