Big Data, Analytics and 4th Generation Data Warehousing

Martyn Jones

Big Data Spain 2015

agenda

∙ Imperatives.
∙ Data value chains.
∙ Resources.
∙ 4th Generation Data Warehousing.
∙ Analytics Data Store / Big Data.
∙ Information Supply Framework.

Friday 16th 12:30 - 13:15 #BDS15

Room 25 – Technical

Quote, Unquote

“It is not the consciousness of men that determines their being, but, on the contrary, their social being that determines their consciousness.”

Karl Marx

business background

Media presence

Twitter @GoodStratTweet

http://www.goodstrat.com http://www.linkedin.com/grp/home?gid=8338976

http://www.itworld.com/blog/it-circus/

Quote, Unquote

“Do Big Data initiatives require a business case? If so, have you ever seen one?” –Joseph Cotter, UK

“Big data - reinventing the wheel every day with a new and slightly different value for Pi.” – Karl Snowsill, Australia

“The Big Data Contrarians - A place where you can find a way to cut through BIG Bull…” – Sanjay Pandey, Canada

“If you had all the answers in the world... what would your question be?” – Yves de Hondt, Belgium

“Big Data in bite size sessions - walk this way !!” – Steve Scholes, MBA, UK

“The only sane spot in the Big Data asylum.” – Dominic Vincent Ligot, Philippines

“Enforcing strict limits on koolaid consumption” – Gary Anderson, USA

the ages of data

B.C., Life of Brian, A.D.

Change, Insight, Potentially useful

Simplicity

Abundant

Volume, Velocity, Variety

framework

Obtain, Integrate, Analyse, Present

DATA

the road to Big Data success…

Strategic

Tactical

Operational

Analytics

Architected

Managed

Integration

Data

scope

BIZ, DATA, DW, BIG DATA, STATS, PRES

Business Imperatives: A good place to start

what’s important to business?

BE NOTICED

CASH FLOW

what else is important to business?

Market share

Differentiation

Ability to execute

Liquidity

Profitability

Time and place utility

React to competitive threats

Enhance service scope

Improving customer service

Respond to price pressure

Segmentation of n

Addressing short-term attention spans

Ability to respond to irrationality

Be noticed

Cash flow

Risk

Legislation

No press

Bad press

Customer centricity

Front office empowerment

Excellence

Channel excellence

Operational excellence

Product excellence

Cultures

IT business value

Base protection

Expansion

Diversification

Consolidation

Augmented Competitive Forces

Competition from within the industry: rivalry with existing competitors

Suppliers: bargaining power

Buyers: bargaining power

Replacements: threat of replacement product or service

Potential entrants: threat of new entrants

Pressure groups

Media

Government

Power to change the game

Exposure

Sources: Michael Porter; Martyn R Jones and others

McKinsey 7S Framework

Culture

differentiated capabilities

operating models

Customer segments

Channels

Products

Services

Organisational design

Processes

Data & information

Physical assets

Development

Deployment

Organisational design

Performance management

Information technology

Business model

Operating model

People model

Customers

Systems People

Processes Organisation

objectives

1. Information awareness corresponding to areas of operation and spheres of control

2. Comprehensive data and information supply framework

3. Continually seek to maintain and then improve data’s contribution to business

Business data everywhere: Where, when, what, who, why... how?

Data

Internal, External, Shared

Past, Present, Future

Data

Operational, Big Data, Dark Data

Online, Archived, Unmanaged

Data

Archives

Social Media

Documents

Machine Log

Media

Sensor

Business Applications

Data Storage

Public Web

Activities, Abstractions and Relations

Velocity, Volume, Variety, Adequacy, Ambiguity, Small, Precision, Availability, Accuracy, Relevance, Persistence, Reliability, Value, Obtuseness, Smart, Complexity, Utility, Descriptiveness, Big

Data

Facets of Big Data

Facets of Data

Big Data

Internet of Things

Cloud

Statistics

Data Warehousing

Presentation

Data Supply Framework

The Data Warehouse: 25 years... of sometimes getting it right

Enterprise Data Warehousing – AS IS

Subject oriented

Integrated

Time variant

Non-volatile

Strategic decision making

Subject oriented

Operational Systems: Purchasing, HR, Credit, Order Processing, Marketing, Sales, Logistics, Billing

Data Warehouse: Arrangements, Products, Party, Time, Geography, Transactions

Integrated

Operational Systems:

Euro Account Customer: Customer: Village Bank GmbH; Country code: D

Mutual Fund Customer: Customer: Village Bankers; Region: Westphalia

NTIP Customer: Customer: Village Bank International; Country: Germany

Data Warehouse:

Account:
Number   Customer   Type
230956   441353     Euro
010555   441353     MF
291284   441353     NTIP

Party: Number: 100441353; Name: Village Bank GmbH; Country: Germany
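The integration idea on this slide can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical cross-reference table (`PARTY_XREF`) that maps each operational system's customer name to a single warehouse party number; the record layout, the `Account` class and the `integrate` function are assumptions made for the example, not something the deck specifies.

```python
from dataclasses import dataclass

# Source records as they might arrive from the three operational systems.
# The field names are hypothetical; the values come from the slide.
SOURCE_CUSTOMERS = [
    {"system": "Euro", "account": "230956", "customer": "Village Bank GmbH"},
    {"system": "MF", "account": "010555", "customer": "Village Bankers"},
    {"system": "NTIP", "account": "291284", "customer": "Village Bank International"},
]

# Assumed cross-reference: every source name resolves to one party number.
PARTY_XREF = {
    "Village Bank GmbH": "441353",
    "Village Bankers": "441353",
    "Village Bank International": "441353",
}


@dataclass(frozen=True)
class Account:
    number: str
    party: str
    account_type: str


def integrate(records, xref):
    """Resolve each source record to a warehouse account keyed by party."""
    return [Account(r["account"], xref[r["customer"]], r["system"]) for r in records]


accounts = integrate(SOURCE_CUSTOMERS, PARTY_XREF)
for acc in accounts:
    print(acc.number, acc.party, acc.account_type)
```

In practice the cross-reference would come from a matching or master data process rather than a hard-coded dictionary; the point is only that three differently named customers end up under one party key.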

Time variant

Operational Systems / Data Warehouse

Trading Activity Snapshots:

Date         Security    Amount
2006.09.01   MartyBank   79.000.000
2006.09.02   MartyBank   92.000.000
2006.09.03   MartyBank   44.000.000
2006.09.04   MartyBank   39.000.000
2006.09.05   MartyBank   80.000.000

[Chart: Trading Activity: MartyBank, value axis 0 to 100]
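Time variance can be illustrated with the snapshot rows from this slide. The sketch below assumes the amounts use a full stop as a thousands separator (so 79.000.000 is read as 79,000,000) and introduces a hypothetical `as_of` helper that returns the latest snapshot on or before a given date; neither is stated in the deck.

```python
from datetime import date

# Snapshot rows from the slide: (snapshot date, security, amount).
snapshots = [
    (date(2006, 9, 1), "MartyBank", 79_000_000),
    (date(2006, 9, 2), "MartyBank", 92_000_000),
    (date(2006, 9, 3), "MartyBank", 44_000_000),
    (date(2006, 9, 4), "MartyBank", 39_000_000),
    (date(2006, 9, 5), "MartyBank", 80_000_000),
]


def as_of(rows, security, when):
    """Return the amount of the latest snapshot on or before `when`, or None."""
    candidates = [(d, amount) for d, sec, amount in rows if sec == security and d <= when]
    return max(candidates)[1] if candidates else None


print(as_of(snapshots, "MartyBank", date(2006, 9, 3)))
```

Because each snapshot is kept rather than overwritten, the same question can be asked for any past date and still give the answer that was true at the time.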

Non-volatile

Operational Systems (Order Processing): Create, Replace, Update, Delete

Data Warehouse (Orders): Write, Read, Read, Read, Read, Read

Strategic decision support

Supporting strategy formulation, choice and execution

Data Warehousing 2.0

[Architecture diagram. Labels recoverable from the figure:]

Data Sources: Structured Data, Unstructured Data, Internal

Staging: Staged Data

ETL: Extract, Transform, Load

ODS

EDW: Structured Data, Unstructured

Data Marts

Report Repository: Reports & Extracts, Stats

Service: Data selection and representation; Data analytics; Report set and extract creation; Push / Pull Technology; Visualisation; Annotation

Users: Internal, Clients, Other stakeholders

Metadata, Workflow/Process Control and CIW Management

The Data Warehouse: 25 years... of sometimes getting it right… and wrong

Enterprise Data Warehousing – AS A BODGE

Get data

Dump data

Query data

Visualise data

Wonder why it’s not meeting expectations

Enterprise Data Warehousing – AS A BODGE

DW BODGER TEAM: We built a data dog house using Oracle and IBM technology and we called it a data warehouse

HADOOP TEAM: We can do data warehousing too and it will be cheaper, faster and smarter

Data Supply Framework: A data architecture for data sourcing, transformation, integration, storage, search, analysis and presentation

Data Supply Framework

All digital data

Data logistics

Operational applications

Operational Data Store

Data Warehouse

Business Intelligence

All data processing, enrichment and information creation

All information and data consumers

All information consumers

Published by goodstrat.com. Martyn Richard Jones 2015 – martynjones.eu. Cambriano Energy 2015 - http://www.cambriano.es

Data Supply Framework

Internal digital data

External digital data

Data logistics

Operational applications

Operational Data Store

Data Warehouse

Data Marts

Analytics Data Store

Statistical Analysis

Scenarios

Business Intelligence

Primary data flow

Secondary data flow

[Detailed Data Supply Framework diagram. Labels recoverable from the figure:]

OLTP, Message Adapter, Message Queue, Staging, ODS

ETL, T/ETL, TL

EDW, DM (three data marts)

Complex data, Event Data, Infrastructure Data

Event Appliance, Signal Appliance, Message Adapter, Message Queue

Staging & Reduction, ET(A)L

ADS, Statistical analysis, Write back

Scenario 1, Scenario 2, Scenario 3

Data Supply Framework

Data Sources, 4th Generation Data Warehousing, Core Statistics

Core Data Sourcing: Comprehensive data acquisition and transformation

DW 3.0 Information Supply Framework

Core Data Sourcing

• Most business data is highly structured
• Most business Big Data is web related
• There is a growing collection of tools for capturing, transforming and moving both
• The closer to the money that your data is, the higher its potential value

4th Generation Data Warehousing: Providing a solid foundation for strategic, tactical and operational decision making

Enterprise Data Warehousing – 4 GEN

Subject oriented

Integrated

Time variance & time perspectives

Constrained volatility

Classification schema

Rule based transformation

Strategic, tactical & operational support

4th Generation EDW

Interpretation

Prediction

Diagnosis

Design

Planning

Monitoring

Debugging

Repairing

Instruction

Control

Strategy

Tactics

Operations

Using, applying and measuring

Big Data

Predictive Analytics

Outcomes

E(A)TL

EDW 4.0

Using, applying and measuring

Big Data

Predictive analytics: Select predictions; Define trackable actions; Apply outcomes and actions to EDW 4

Descriptive analytics: Select findings; Combine with trackable actions; Apply outcomes and actions to EDW 4

Run campaign

Accumulate campaign Big Data

Analyse campaign and performance of Big Data analytics

Forecasts and results – from all perspectives

[Chart: Cambriano Big Data Campaign 2015-2016. Monthly series from 01/15 to 06/16; value axis from -400 to 500. Series: Forecast, Actual, Strategy, BD Costs, Benefit.]

Values, Relativity, Dimensions, Hierarchies, Structures, Past, Future

Using, applying and measuring

• Combining Big Data analytics with Data Warehousing 4.0
• Planning and managing initiatives
• Measuring, analysing and reporting the effectiveness of business initiatives
• Measuring, analysing and reporting the tangible contribution of the Big Data analytics process to the creation of business value

Big Data and Core Statistics: A multi-faceted data theatre for ad-hoc, speculative and immediate operational analytics

DSF 4.0 Data Value Chains

DATA               | INFORMATION             | KNOWLEDGE
Requires context   | Requires interpretation | Requires wisdom
Relevant           | Correct                 | Usable
Irrelevant         | Incorrect               | Useless
Meaningless        | Misleading              | Wrong
Value?             | Value?                  | Value?

DSF 4.0 Data Assets in MOSCOW

RISK, ASSET, SECURE, BAU, Assurance

Highest | High   | Medium/Low | Very low/None
MUST    | SHOULD | COULD      | WON’T
Yes     | Yes    | Maybe      | Maybe/No
Yes     | Yes    | Yes        | Maybe/No
Yes     | Yes    | Yes        | Maybe/No

DSF 4.0 Data Supply Framework

All digital data: Internal digital data, External digital data

Data logistics

Operational applications

OLTP Applications

Operational Data Store

Data Warehouse

Data Marts

Analytics Data Store

Statistical Analysis

Scenarios

Business Intelligence

‘What if’ analysis

MIS / Reporting

Visualisation

Publication

Primary data flow

Secondary data flow

Statistics

Data Science

Big Data

Small Data

Smart Data

This Data

That Data

This department

That department

The other department

Messing with data

Map Fatten

Map Reduce

Retrospect

Reports

Alerts

Visualisation

Analytics

DSF 4.0 Data Supply Framework

Data Sources – This element covers all the current sources, varieties and volumes of data available that may be used to support the processes of 'challenge identification', 'option definition' and decision making, including statistical analysis and scenario generation.

Core Data Warehousing – This is a suggested evolution path of the DW 2.0 model. It extends the Inmon paradigm to include not only unstructured and complex data but also the information and outcomes derived from statistical analysis performed outside the 4th generation Data Warehousing landscape.

Core Statistics – This element covers the core body of statistical competence, especially, but not only, with regard to evolving data volumes, data velocity and speed, data quality and data variety.

INTO THE ZONE!

Complex Data – This is unstructured data, or data with a highly complex structure, contained in documents and other complex data artefacts, such as multimedia documents.

Event Data – This is an aspect of Enterprise Process Data, typically at a fine-grained level of abstraction. It comprises the business process logs, the internet web activity logs and other similar sources of event data. The volumes generated by these sources tend to be higher than those of other data, and are the ones currently associated with the term Big Data, covering as it does the masses of information generated by tracking even the most minor piece of 'behavioural data' from, for example, someone casually surfing a web site.

Infrastructure Data – This aspect includes data that could well be described as signal data: continuous, high-velocity streams of potentially highly volatile data that might be processed through complex event correlation and analysis components.

Event Appliance – This puts the dynamic data collation, selection and reduction functionality as close to the point of event data generation as physically possible.

Signal Appliance – This puts the dynamic data collation, selection and reduction functionality as close to the point of continuous streaming data generation as physically possible.
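The collation, selection and reduction that an appliance performs near the source can be sketched as follows. This is a minimal illustration under assumptions: the event layout (`user`, `type`), the set of event types worth keeping and the `reduce_events` function are all hypothetical, and a real appliance would work on a live stream rather than an in-memory list.

```python
from collections import Counter


def reduce_events(events, keep_types):
    """Select relevant events and collapse them into counts per (user, type)."""
    counts = Counter(
        (event["user"], event["type"])
        for event in events
        if event["type"] in keep_types
    )
    return dict(counts)


# Hypothetical raw behavioural events captured at the point of generation.
raw_events = [
    {"user": "u1", "type": "page_view"},
    {"user": "u1", "type": "page_view"},
    {"user": "u1", "type": "mouse_move"},
    {"user": "u2", "type": "purchase"},
]

reduced = reduce_events(raw_events, keep_types={"page_view", "purchase"})
print(reduced)
```

Four raw events become two summary rows before anything is shipped downstream, which is the reason for placing the reduction as close to the source as possible.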

Distributed Inter Process Communication – Different forms of messaging allow high volumes of data to be transmitted in near real time.
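The message adapter and message queue pattern can be shown in miniature. The sketch below uses Python's standard `queue.Queue` and two threads as an in-process stand-in; it is not a distributed message broker, and the producer, consumer and sentinel convention are assumptions chosen purely for illustration.

```python
import queue
import threading


def producer(q, items):
    """Adapter side: push each item onto the queue, then signal completion."""
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: no more messages


def consumer(q, received):
    """Receiving side: take messages off the queue until the sentinel arrives."""
    while True:
        item = q.get()
        if item is None:
            break
        received.append(item)


message_queue = queue.Queue(maxsize=100)
received = []
sender = threading.Thread(target=producer, args=(message_queue, range(5)))
receiver = threading.Thread(target=consumer, args=(message_queue, received))
sender.start()
receiver.start()
sender.join()
receiver.join()
print(received)
```

The queue decouples the two sides: the producer does not wait for each message to be processed, and the bounded size provides back-pressure when the consumer falls behind.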

Staging and Reduction – Traditional data staging combined with in-line data reduction.

ET(A)L – Extending ETL to include data analytics components tightly integrated into parallel ETL job streams.
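One way to picture ET(A)L is an ordinary extract, transform, load chain with an analytics step placed inside it. The sketch below is a sequential toy, not the parallel job streams the slide refers to; the comma-separated input, the field names and the "above average" flag are assumptions made for illustration.

```python
from statistics import mean


def extract(raw_lines):
    """Split each comma-separated line into fields."""
    return [line.split(",") for line in raw_lines]


def transform(rows):
    """Give the fields names and types."""
    return [{"customer": customer, "amount": float(amount)} for customer, amount in rows]


def analyse(rows):
    """The (A) step: derive an analytic attribute while the data is in flight."""
    average = mean(row["amount"] for row in rows)
    return [{**row, "above_average": row["amount"] > average} for row in rows]


def load(rows, target):
    """Append the enriched rows to the target store."""
    target.extend(rows)
    return target


warehouse = []
raw = ["a,10", "b,30", "c,20"]
load(analyse(transform(extract(raw))), warehouse)
print(warehouse)
```

The derived flag is loaded alongside the source fields, so downstream consumers receive the analytic result without a separate post-load pass.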

ADS – The Analytics Data Store.
1. Statistics oriented
2. Integrated by focus area
3. Variable volatility
4. Time variant

Statistical Analysis – Qualitative analysis. Diagnostic analysis, predictive analysis, speculative analysis, data mining, data exploration, modelling.

Scenarios and outcomes – 1. Snapshots of the outcomes of scenario analysis, that is, the process of analysing possible future events by generating alternative possible outcomes. 2. Captured outcomes of statistical analysis.
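Scenario analysis in this sense can be reduced to a very small example: the same baseline projected under alternative assumptions, with each outcome captured as a snapshot. The compound-growth model, the scenario names and the rates below are invented for illustration and do not come from the deck.

```python
def run_scenarios(baseline, growth_rates, periods):
    """Project a baseline value under each named growth-rate assumption."""
    return {
        name: round(baseline * (1 + rate) ** periods, 2)
        for name, rate in growth_rates.items()
    }


# Hypothetical assumptions: three alternative growth rates over three periods.
scenarios = run_scenarios(
    baseline=100.0,
    growth_rates={"pessimistic": -0.05, "expected": 0.02, "optimistic": 0.10},
    periods=3,
)
for name, outcome in scenarios.items():
    print(name, outcome)
```

Each entry in the result is one alternative outcome; storing the dictionary together with the assumptions that produced it gives the kind of snapshot the slide describes.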

Write back – The ability to append data, update data and enrich data within the Analytics Data Store, and to provide scenario data to the Core Data Warehousing.
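The three write-back operations, plus the feed of scenario data to the warehouse, can be sketched with a toy in-memory store. The class name, method names and the `scenario` flag are hypothetical; a real Analytics Data Store would sit on one of the storage technologies rather than a dictionary.

```python
class AnalyticsDataStore:
    """Toy in-memory stand-in for an ADS; keys and fields are hypothetical."""

    def __init__(self):
        self.rows = {}

    def append(self, key, record):
        """Add a new record; refuse to overwrite an existing key."""
        if key in self.rows:
            raise KeyError(f"{key} already exists")
        self.rows[key] = dict(record)

    def update(self, key, **changes):
        """Change existing attributes of a record."""
        self.rows[key].update(changes)

    def enrich(self, key, **derived):
        """Add derived attributes without overwriting what is already there."""
        for name, value in derived.items():
            self.rows[key].setdefault(name, value)

    def scenarios_for_warehouse(self):
        """Return the records flagged as scenario data, ready to hand to the EDW."""
        return [{"key": k, **r} for k, r in self.rows.items() if r.get("scenario")]


ads = AnalyticsDataStore()
ads.append("s1", {"scenario": True, "forecast": 120})
ads.append("raw1", {"scenario": False, "clicks": 7})
ads.update("s1", forecast=125)
ads.enrich("s1", confidence="medium")
feed = ads.scenarios_for_warehouse()
print(feed)
```

Only the record flagged as a scenario is offered to Core Data Warehousing; the raw working data stays in the store.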

DSF 4.0 – Core Statistics: Analytics Data Store

DSF 4.0 – Analytics Data Store

Distributed File System: Non-relational distributed file storage / NoSQL

DFS (including ‘refactoring’ of Unix primitives)

Unix File Store: POSIX compliant

POSIX compliant Unix / Linux primitives

Document DBMS

Graph DBMS

Key-Value DBMS

Object DBMS

In-memory Column Oriented Relational DBMS

Relational DBMS (MPP/SMP/Hybrid)

Relational DBMS

DSF 4.0 – Analytics Data Store - Technologies
DSF 4.0 – What’s important?

Data Warehouse

Business Intelligence

Operational Data Store

Analytics Data Store

Statistical Analysis

Dark Data

Big Data

Internet of Things

Knowledge Management

Structured Intellectual Capital

Cloud

Summary: A good place to end, for now

DSF 4.0 Perspectives

Look back: From now, From then, From before, From the future

Look at now

Look at near +/-

Look forward: From now, From before, From the future

Multiple worlds and universes

DSF 4.0 Perspectives

What we got right

What we can do better

What we can retry at another time

What we can drop

DSF 4.0 Perspectives – Look Back

[Timeline: 2009 to 2020. Perspectives: From before, From then, From now, From the future. Labels: Dimensions, Classification, Data.]

Summary

• Never open up too many data fronts at the same time

• Iterate and take baby steps

• Use agile where it makes sense

• Keep everything as close to the business as possible

• Involve the business – continuously

Summary

• Consider everything

• Question everything

• Never stop hypothesising

• Never stop testing

• For every initiative have a business imperative

• Make continuous engagement and involvement a goal

Many thanks
