Data Warehousing, Data Mining & Data Visualisation Introduction


Data: it's big, so grab it, store it, analyse it, make it accessible... mine, warehouse and visualise... use the pictures in your mind and others will see it your way!


Page 1: Data Warehousing, Data Mining & Data Visualisation

Data Warehousing, Data Mining & Data Visualisation

Introduction


Data Warehousing


What is a Data Warehouse?

• A data warehouse is a database used for reporting and analysis.

• The data stored in the warehouse is uploaded from the operational systems.

• The data may pass through an operational data store for additional operations before it is used in the data warehouse for reporting.


A data-processing database? Wholesaling Data?


Benefits of a Data Warehouse

A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

• Maintain data history.

• Integrate data from multiple source systems.

• Improve data quality.

• Present the organisation's information consistently.

• Provide a single common data model for all data of interest regardless of the data's source.

• Restructure the data so that it makes sense to the business users.

• Restructure the data so that it delivers excellent query performance, even for complex analytic queries.

• Add value to operational business applications.


History of Data Warehousing

• 1990: Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.

• 1991: Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.

• 1992: Bill Inmon publishes the book Building the Data Warehouse.

• 1995: The Data Warehousing Institute, a not-for-profit organisation that promotes data warehousing, is founded.

• 1996: Ralph Kimball publishes the book The Data Warehouse Toolkit.

• 2000: Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.


Dimensional v Normalised

There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalised approach.

• Supporters of the dimensional approach, referred to as “Kimballites”, follow Ralph Kimball’s view that the data warehouse should be modelled using a Dimensional Model (DM). For example, a sales transaction can be broken up into facts, such as the number of products ordered and the price paid for the products, and into dimensions, such as order date, customer name, product number, order ship-to and bill-to locations, and the salesperson responsible for receiving the order.

• Supporters of the normalised approach, also called the 3NF model, are referred to as “Inmonites”. They follow Bill Inmon's view that the data warehouse should be modelled using Peter Chen’s Entity-Relationship (ER) model with which, of course, we are all familiar!
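To make the dimensional approach concrete, the sales example can be sketched as a small star schema: one fact table holding the measures, joined to dimension tables by key. This is only an illustrative sketch using Python's built-in sqlite3 module; every table name, column name and data value here is an assumption made for the example, not something prescribed by Kimball.

```python
import sqlite3

# Hypothetical star schema for the sales example: one fact table
# surrounded by dimension tables. All names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_number TEXT);
CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE fact_sales (
    customer_id      INTEGER REFERENCES dim_customer(customer_id),
    product_id       INTEGER REFERENCES dim_product(product_id),
    date_id          INTEGER REFERENCES dim_date(date_id),
    quantity_ordered INTEGER,  -- fact: number of products ordered
    price_paid       REAL      -- fact: price paid for the products
);
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Acme Ltd"), (2, "Bolt plc")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "P-100"), (2, "P-200")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(1, "2012-01-15"), (2, "2012-02-03")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 3, 30.0), (1, 2, 2, 1, 25.0), (2, 1, 2, 2, 20.0)],
)


def sales_by_customer(connection):
    """Total price paid per customer: a typical fact-by-dimension query."""
    rows = connection.execute("""
        SELECT c.customer_name, SUM(f.price_paid)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.customer_name
        ORDER BY c.customer_name
    """).fetchall()
    return dict(rows)


totals = sales_by_customer(conn)
```

The same fact table could be grouped by product or by date simply by joining a different dimension, which is the main attraction of this layout for reporting.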


Kimball’s Bottom Up Design

• In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes.

• Data marts contain, primarily, dimensions and facts.

• Facts can contain atomic data and, if necessary, summarised data.

• A single data mart often models a specific business area such as "Sales" or "Production".

• These data marts can eventually be integrated to create a comprehensive data warehouse.


Inmon’s Top Down Design

Inmon states that the data warehouse is:

• Subject-oriented: The data in the data warehouse is organised so that all the data elements relating to the same real-world event or object are linked together.

• Non-volatile: Data in the data warehouse are never over-written or deleted — once committed, the data are static, read-only, and retained for future reporting.

• Integrated: The data warehouse contains data from most or all of an organisation's operational systems and these data are made consistent.


Hybrid Design

• Data warehouse (DW) solutions often resemble a hub-and-spoke architecture.

• Legacy systems feeding the DW solution often include customer relationship management (CRM) and enterprise resource planning (ERP) solutions, generating large amounts of data.

• To consolidate these various data models, and facilitate the extract, transform, load (ETL) process, DW solutions often make use of an operational data store (ODS).
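The extract, transform, load steps can be sketched in a few lines. The example below is a toy illustration only: the two "exports" stand in for a CRM and an ERP system, the in-memory `staged` value loosely plays the role of an ODS, and all field names, delimiters and values are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical exports from two source systems with different conventions.
CRM_EXPORT = "cust_id,name\n1, alice smith \n2,BOB JONES\n"
ERP_EXPORT = "customer;amount\n1;10.50\n2;4.00\n1;2.25\n"


def extract(text, delimiter):
    """Extract: read raw rows from a source-system export."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(crm_rows, erp_rows):
    """Transform: make keys, types and name formatting consistent."""
    customers = [(int(r["cust_id"]), r["name"].strip().title()) for r in crm_rows]
    orders = [(int(r["customer"]), float(r["amount"])) for r in erp_rows]
    return customers, orders


def load(conn, customers, orders):
    """Load: write the cleaned rows into warehouse tables."""
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
staged = transform(extract(CRM_EXPORT, ","), extract(ERP_EXPORT, ";"))
load(warehouse, *staged)

# A report query over the consolidated data.
report = warehouse.execute(
    "SELECT c.name, SUM(o.amount) FROM orders o "
    "JOIN customer c USING (customer_id) "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
```

Real ETL tooling adds scheduling, error handling and incremental loads; the point here is only the separation of the three steps.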


Data Warehouse Appliances

• IBM Netezza

• Oracle Exadata

• Kognitio 360

• Teradata


Demystifying the Data Warehouse

http://www.youtube.com/watch?v=mgEugd5kZgk&feature=related


Data Mining

Knowledge Discovery in Databases (KDD)


What is Data Mining?

• Data mining is the analysis step of the Knowledge Discovery in Databases (KDD) process.

• It is a relatively young and interdisciplinary field of computer science.

• It is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems.


The KDD Process

The knowledge discovery in databases (KDD) process is commonly defined in five stages:

(1) Selection

(2) Preprocessing

(3) Transformation

(4) Data Mining

(5) Interpretation/Evaluation
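The five stages can be read as a pipeline in which each stage feeds the next. The sketch below walks a toy set of shopping baskets through all five; the records, the choice of pair counting as the mining step, and the 0.5 support threshold are all assumptions made for illustration, not part of the KDD definition.

```python
from collections import Counter
from itertools import combinations

# Toy transaction records; all values are invented for illustration.
RAW = [
    {"id": 1, "store": "A", "items": "bread,milk"},
    {"id": 2, "store": "A", "items": "bread,milk,eggs"},
    {"id": 3, "store": "B", "items": None},  # record with missing data
    {"id": 4, "store": "A", "items": "milk,eggs"},
    {"id": 5, "store": "B", "items": "bread,milk"},
]


def selection(records):
    """(1) Selection: keep only the field relevant to the question."""
    return [r["items"] for r in records]


def preprocessing(values):
    """(2) Preprocessing: drop records with missing values."""
    return [v for v in values if v]


def transformation(values):
    """(3) Transformation: turn each string into a sorted tuple of items."""
    return [tuple(sorted(v.split(","))) for v in values]


def data_mining(baskets):
    """(4) Data mining: count how often each pair of items co-occurs."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts


def evaluation(counts, n_baskets, min_support=0.5):
    """(5) Interpretation/evaluation: keep pairs seen in enough baskets."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


baskets = transformation(preprocessing(selection(RAW)))
patterns = evaluation(data_mining(baskets), len(baskets))
```

Here `patterns` maps each frequently co-occurring pair of items to its support, that is, the fraction of baskets containing both items.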


The CRISP-DM Process

The CRoss Industry Standard Process for Data Mining (CRISP-DM) defines six phases:

(1) Business Understanding

(2) Data Understanding

(3) Data Preparation

(4) Modelling

(5) Evaluation

(6) Deployment

The simplified process is (1) Pre-processing, (2) Data mining and (3) Results validation.


Spatial Data Mining

• Spatial data mining is the application of data mining methods to spatial data.

• Spatial data mining applies the same functions as data mining in general, with the end objective of finding patterns in geography.

• So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions and approaches to visualisation and data analysis.

• The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasises the importance of developing data driven inductive approaches to geographical analysis and modelling.
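As a very small illustration of a data-driven, inductive approach to geographic data, the sketch below bins point locations into grid cells and reports the cells where points cluster. It is a deliberately simple stand-in for real spatial mining methods: the coordinates, the cell size and the `min_count` threshold are all assumptions chosen for the example.

```python
from collections import Counter

# Invented (x, y) event locations in arbitrary map units.
POINTS = [(0.2, 0.3), (0.8, 0.1), (0.4, 0.9), (5.1, 5.2), (5.6, 5.9), (9.5, 0.5)]


def grid_cell(point, cell_size):
    """Map a coordinate to the (column, row) of the grid cell containing it."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def hotspots(points, cell_size=1.0, min_count=2):
    """Return the grid cells that contain at least `min_count` points."""
    counts = Counter(grid_cell(p, cell_size) for p in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}


found = hotspots(POINTS)
```

With this data, two cells qualify as hotspots, while the isolated point at (9.5, 0.5) is ignored.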


Build a KPI Dashboard in 5 Minutes

http://www.youtube.com/watch?v=D4S_uIIZyN0&feature=related

Build a KPI Dashboard in 5 minutes with no programming in Excel 2010


Data Visualisation

Choose 6 of the Keywords in the above!


Data Visualisation Defined

Data visualisation is the study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information" (Friendly, 2008).


Tufte and Data Visualisation

‘The success of visualisation is based on deep knowledge and care about the substance and the quality, relevance and integrity of the content.’ (Tufte, 1983)


5 Principles of Graphic Display

1. Above all else, show the data.

2. Maximise the data-ink ratio.

3. Erase non-data-ink.

4. Erase redundant data-ink.

5. Revise and edit.
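The second principle depends on what "data-ink ratio" means. Tufte's definition can be written as the formula below; the wording inside the formula paraphrases his definition.

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```

As a purely hypothetical example, a chart in which 30 of every 40 units of ink encode data has a data-ink ratio of 30/40 = 0.75, so a quarter of its ink could be erased without losing information.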


The Beauty of Data Visualisation

http://www.youtube.com/watch?v=pLqjQ55tz-U

David McCandless


Gapminder

A Data Mining & Data Visualisation Tool


Hans Rosling

• The Gapminder application is the brain-child of Hans Rosling.

• He thought of the title when he heard the prompt ‘mind the gap’ on the London Underground.

• He is Professor of International Health at Karolinska Institute, Stockholm, Sweden.

• He is a Doctor of Medicine and a Doctor of Philosophy.


Hans uses Gapminder

http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

http://www.ted.com/talks/hans_rosling_reveals_new_insights_on_poverty.html


Gapminder Desktop

Gapminder Desktop allows you to show animated statistics from your own laptop. In short:

• Use Gapminder World without internet access.

• Save a list of your own favourite graphs.

• Updates automatically when new data is available.


Tableau Desktop


Gephi


VOSViewer


Hjalmar Gislason

"Falling in Love with Data"

http://www.youtube.com/watch?v=fOg0QHUI-lM&feature=plcp


20 Top Tools for Data Visualisation

http://m.netmagazine.com/features/top-20-data-visualisation-tools


And another angle…

http://deverell.computing.dundee.ac.uk/~cjmartin/dataVis.m4v