ESSnet ON MICRO DATA LINKING AND DATA WAREHOUSING IN STATISTICAL PRODUCTION

RESULTS OF STOCKTAKING, CONCLUSIONS OF FIRST YEAR

Pieter Vlag, Senior Statistical Researcher, Statistics Netherlands


Page 2

ESSnet DWH: Main conclusions first year 2

Contents

• Answers to the questionnaire

• Results of the visit to Statistics Finland

• Results of the visit to CSO-Ireland

• Conclusions of the ESSnet DWH group

• Implications for work in 2012/2013

Page 3

Questionnaire

• Sent to all National Statistical Institutes (NSIs) of the ESS and to Switzerland

• 24 NSIs responded

• The response is representative (no specific group of countries is missing)

• For interpretation, the questions are divided into three groups:

- opportunities/barriers

- implementation

- definition of a DataWareHouse

Page 4

Answers to the questionnaire (opportunities/barriers)

• Do you think that the results of this ESSnet are useful for your work?

Page 5

Answers to the questionnaire (opportunities/barriers)

• What do/did you see as the main motivation to start a DWH in your business statistics systems?

(More than one answer per NSI possible)

Page 6

Answers to the questionnaire (opportunities/barriers)

• What do you see as the main general methodological barriers to implementing an integrated system?

(More than one answer per NSI possible)

Page 7

Answers to the questionnaire (opportunities/barriers)

• What do you see as the main technical methodological barriers to implementing an integrated system?

(More than one answer per NSI possible)

Page 8

Answers to the questionnaire (opportunities/barriers)

• What do you see as the main IT barriers to implementing an integrated system?

(More than one answer per NSI possible)

Page 9

Answers to the questionnaire (implementation)

No NSI answered ‘YES’ to all four of these questions:

- Do you have a single coherent system which covers most of your data in the production of business statistics?

- Is your metadata currently integrated into your data systems?

- Is your data input for current needs integrated into your data systems?

- Are your current output requirements integrated into your data systems?

CONCLUSION: No NSI has a finished DWH system

Page 10

Answers to the questionnaire (implementation)

On the other hand, the answers suggest that all responding NSIs are at one of these stages:

• considering developing an integrated data warehouse system,

• developing a data warehouse system, or

• implementing parts of a (prototype) data warehouse system.

Page 11

Answers to the questionnaire (first conclusions)

NSIs

- recognise the opportunities of DWH systems;

- consider the high investment, or investment-related issues, as the most important barrier;

- are considering or developing DWH systems;

- mention similar methodological and IT issues;

- expect “sharing knowledge and experiences” as the outcome of this ESSnet.

Hence, there is a business case for this ESSnet.

Page 12

Answers to the questionnaire (definition of a DWH)

The questionnaire presented two extremes:

- Data model

- Process model

Page 13

Questionnaire (‘process model’ DWH)

In the “process model” perspective, the DWH is primarily a set of databases that store the data between the statistical data-processing steps. Statistical processing (weighting, consistency) is done outside the DWH. The DWH system is not primarily designed to produce flexible output, but rather to harmonise the statistical processes.

[Figure: The ‘process model’ perspective. Known inputs (Input 1, Input 2) go through known production processes, exploiting synergies or experience, with the data warehouse between the processes; known outputs (Output 1, Output 2, Output 3). Metadata attached to the production process.]

Page 14

Questionnaire (‘data model’ DWH)


[Figure: The ‘data model’ perspective. Heterogeneous (unknown?) inputs, namely surveys, admin data and register data in non-standard definitions and multiple formats, generated by heterogeneous processes, become cleaned, coherent data sources (standard format, standard process). The data warehouse stores and processes these data, with coherency work and registers. A common extraction process (standard format, standard process, standard variables) provides tailored production of micro and aggregate data (aggregate statistics, microdata, time series) as heterogeneous (unknown?) outputs. Metadata attached to data items.]

In the “data model” perspective, the DWH is primarily a unit for storing, processing and linking all available data, irrespective of where they have come from or where they are going to. Data acquisition is driven by the availability of sources; output production is driven by the availability of data in the store. Business registers and metadata are even more important in this model than in regular statistical processes, because they are essential for storing, processing and linking the data and for producing flexible output.
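To make the ‘data model’ perspective more concrete, the sketch below stores every value as a data item with some metadata attached (only its source here), links items per business register unit irrespective of source, and lets the store decide which output variables can be produced. The `DataItem` structure, the unit identifiers and the rule "a variable is available if every linked unit has it" are hypothetical illustrations, not part of the ESSnet design.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class DataItem:
    """One stored value with its metadata attached (hypothetical structure)."""
    unit_id: str   # business register identifier, the key used for linking
    variable: str  # e.g. "turnover"
    value: float
    source: str    # metadata: "survey", "admin" or "register"


def link_by_unit(items: List[DataItem]) -> Dict[str, List[DataItem]]:
    """Link all items to their business register unit, whatever their source."""
    linked: Dict[str, List[DataItem]] = {}
    for item in items:
        linked.setdefault(item.unit_id, []).append(item)
    return linked


def available_variables(items: List[DataItem]) -> Set[str]:
    """Output is driven by the store: variables present for every linked unit."""
    per_unit = [{i.variable for i in unit_items}
                for unit_items in link_by_unit(items).values()]
    return set.intersection(*per_unit) if per_unit else set()


items = [
    DataItem("NL001", "turnover", 120.0, "survey"),
    DataItem("NL001", "employees", 10.0, "register"),
    DataItem("NL002", "turnover", 80.0, "admin"),
    DataItem("NL002", "employees", 4.0, "register"),
    DataItem("NL002", "exports", 5.0, "survey"),
]
print(sorted(available_variables(items)))  # ['employees', 'turnover']
```

Here turnover comes from a survey for one unit and from admin data for the other, yet both are usable for output once linked; "exports" is not, because it is only available for one unit.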

Page 15

Answers to the questionnaire (definition of a DWH)

- How would you describe your single conceptual approach?

Page 16

Answers to the questionnaire (definition of a DWH)

But the answers to this question conflicted with

- follow-up inquiries

- follow-up visits

HENCE,

- the presented models were open to multiple interpretations

- a stricter definition of a statistical DWH system was needed.

Page 17

Main conclusion from visit to Statistics Finland (figure)

[Figure: Inputs (Input I, Input II, etc.) feed a processing base; the integrated statistical data are then passed to the actual DWH, which produces outputs (Output I, Output II, etc.).]

Page 18

Main conclusion from visit to Statistics Finland (in words)

The Statistical DataWareHouse consists of two parts:

• A processing (data)base in which all input data used are processed and integrated.

• A publication (data)base, used for (micro)analyses and for calculating the aggregates (for publication).

* Data are transferred to the publication base after they have been approved in the processing database.

In contrast to the DataWareHouse concept at commercial enterprises, the processing part is emphasized much more at NSIs.

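The two-part structure seen at Statistics Finland can be sketched as follows, assuming records are keyed by a unit identifier and that approval is an explicit step. The class and method names are hypothetical; the point is only that aggregates are computed from the publication base, which receives data only after approval in the processing base.

```python
from typing import Dict, Set


class StatisticalDWH:
    """Hypothetical sketch: a processing base plus a publication base."""

    def __init__(self) -> None:
        self.processing_base: Dict[str, Dict[str, float]] = {}
        self.approved: Set[str] = set()
        self.publication_base: Dict[str, Dict[str, float]] = {}

    def load(self, unit_id: str, record: Dict[str, float]) -> None:
        """Store (or overwrite) a record in the processing base; it needs (re)approval."""
        self.processing_base[unit_id] = dict(record)
        self.approved.discard(unit_id)

    def approve(self, unit_id: str) -> None:
        """Mark a record in the processing base as approved."""
        if unit_id not in self.processing_base:
            raise KeyError(f"unknown unit {unit_id}")
        self.approved.add(unit_id)

    def transfer(self) -> int:
        """Copy approved records to the publication base; return how many were copied."""
        for unit_id in self.approved:
            self.publication_base[unit_id] = dict(self.processing_base[unit_id])
        return len(self.approved)

    def total(self, variable: str) -> float:
        """Aggregate for publication, computed from the publication base only."""
        return sum(r.get(variable, 0.0) for r in self.publication_base.values())


dwh = StatisticalDWH()
dwh.load("NL001", {"turnover": 120.0})
dwh.load("NL002", {"turnover": 80.0})
dwh.approve("NL001")
print(dwh.transfer(), dwh.total("turnover"))  # 1 120.0
```

In this sketch a record that is reloaded after transfer keeps its last approved version in the publication base until it is approved and transferred again; whether a real system should behave like that is a design choice.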

Page 19

Main conclusion from visit to CSO-Ireland (figure)

[Figure: Inputs (Input I, Input II, etc.) pass through several processing bases (Proc. base) before the integrated statistical data reach the actual DWH, which produces outputs (Output I, Output II, etc.). Annotation: "Architecture for data processing (depending on the data?)".]

Page 20

Main conclusion from visit to CSO-Ireland (in words)

CSO has two integrated processing systems:

- An older one, in which data are stored after each processing step. This system is used for survey data.

- A newer one (to be implemented), in which admin data are stored only once, after all processing steps have been performed.

One reason for reducing the number of stored datasets might be that admin data need less extensive data cleaning. Hence, the nature of the data (survey or admin data) might be a factor when defining a business architecture for the integrated processing system.

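The difference between the two CSO systems is essentially a storage policy. A minimal sketch, assuming each processing step is a function from a list of values to a list of values and that the policy depends only on whether the data come from a survey or an administrative source (the step names and the policy rule are hypothetical):

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[List[float]], List[float]]]


def store_every_step(source_type: str) -> bool:
    """Hypothetical policy: survey data are stored after each step, admin data once."""
    return source_type == "survey"


def run_pipeline(data: List[float], steps: List[Step],
                 source_type: str) -> List[Tuple[str, List[float]]]:
    """Run the steps in order and return the snapshots that would be stored."""
    stored: List[Tuple[str, List[float]]] = []
    per_step = store_every_step(source_type)
    for name, step in steps:
        data = step(data)
        if per_step:
            stored.append((name, list(data)))
    if not per_step:
        stored.append(("final", list(data)))
    return stored


steps: List[Step] = [
    ("edit", lambda xs: [x for x in xs if x >= 0]),  # drop implausible negative values
    ("scale", lambda xs: [x * 1000 for x in xs]),    # e.g. thousands of euros to euros
]
print(len(run_pipeline([1.5, -2.0, 3.0], steps, "survey")))  # 2
print(run_pipeline([1.5, -2.0, 3.0], steps, "admin"))        # [('final', [1500.0, 3000.0])]
```

Both paths produce the same final data; the survey path simply keeps an intermediate snapshot per step, which is what makes intensive, step-by-step cleaning traceable.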

Page 21

Main conclusion of the ESSnet DWH group (in figure)

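[Figure: Inputs (Input I, Input II, etc.) are linked to the statistical business register (Stat BR, population frame), cleaned, and imputed and aggregated (Imp. + aggr.) before the integrated data enter the actual DWH, which produces outputs (Output I, Output II, etc.). Labels: "Integrated systems" (processing issues) and "DWH" (confidentiality issues); the inputs and outputs themselves are marked "out of scope".]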

Page 22

General conclusion of the ESSnet DWH group (in words)

A Statistical DataWareHouse consists of two parts:

Part I: A processing phase in which statistical input data are

- at a first stage, linked to the Business Register;

- at a second stage, cleaned (between data sources);

- at a third stage, made consistent between the sources by imputing missing data and correcting for inconsistencies between the sources,

before being transferred to the actual DataWareHouse.

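The three stages of Part I can be illustrated with a small sketch. It assumes each record has a `unit_id` and a single `turnover` value, uses "negative turnover is invalid" as the only cleaning rule, and resolves conflicts by preferring survey values over admin values while using the admin value to impute a missing survey value. These rules are hypothetical stand-ins for real editing and imputation methods; units with no valid value in any source stay `None`, i.e. still to be imputed by some other method.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Record = Dict[str, Any]


def link_to_br(records: List[Record], br: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Stage 1: split records into those linked to a BR unit and those that are not."""
    linked = [r for r in records if r["unit_id"] in br]
    unmatched = [r for r in records if r["unit_id"] not in br]
    return linked, unmatched


def clean(records: List[Record]) -> List[Record]:
    """Stage 2 (hypothetical rule): flag negative or non-numeric turnover as missing."""
    cleaned: List[Record] = []
    for r in records:
        value = r.get("turnover")
        valid = isinstance(value, (int, float)) and value >= 0
        cleaned.append({**r, "turnover": value if valid else None})
    return cleaned


def make_consistent(survey: List[Record], admin: List[Record],
                    br: Set[str]) -> Dict[str, Optional[float]]:
    """Stage 3: one value per BR unit; survey values win, admin values impute gaps."""
    result: Dict[str, Optional[float]] = {unit: None for unit in br}
    for source in (admin, survey):  # survey is applied last, so it overrides admin
        for r in source:
            if r["turnover"] is not None:
                result[r["unit_id"]] = r["turnover"]
    return result


br = {"A", "B", "C"}
survey = [{"unit_id": "A", "turnover": 100.0},
          {"unit_id": "B", "turnover": -5.0},
          {"unit_id": "X", "turnover": 7.0}]
admin = [{"unit_id": "A", "turnover": 90.0},
         {"unit_id": "B", "turnover": 40.0}]

survey_linked, survey_unmatched = link_to_br(survey, br)
admin_linked, _ = link_to_br(admin, br)
integrated = make_consistent(clean(survey_linked), clean(admin_linked), br)
print(sorted(integrated.items()))  # [('A', 100.0), ('B', 40.0), ('C', None)]
```

Unit X is not in the BR and is set aside at stage 1; unit B's negative survey value is removed at stage 2 and replaced by the admin value at stage 3.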

Page 23

General conclusion of the ESSnet DWH group (in words)

A Statistical DataWareHouse consists of two parts:

Part II: An actual DataWareHouse from which flexible aggregated data and microdata, meant for output, can be generated. These generated aggregated data and microdata do not themselves belong to the Statistical DataWareHouse system. The data in this DataWareHouse are completely integrated; in theory, the interpretation of (the quality of) these data should be independent of the input source.

Part II is more recognisable to commercial enterprises.

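Because the data in Part II are fully integrated, output can be generated flexibly by any classification present in the microdata. A minimal sketch, assuming integrated records are dictionaries with hypothetical `nace`, `region` and `turnover` fields; the generated tables are returned to the caller and are not stored back in the warehouse, in line with the text above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def aggregate(microdata: List[Dict[str, Any]], by: str, variable: str) -> Dict[Any, float]:
    """Sum `variable` over the integrated microdata, grouped by any classification `by`."""
    totals: Dict[Any, float] = defaultdict(float)
    for record in microdata:
        totals[record[by]] += record[variable]
    return dict(totals)


def extract(microdata: List[Dict[str, Any]], variables: List[str]) -> List[Dict[str, Any]]:
    """Microdata output: a copy of the records restricted to the requested variables."""
    return [{v: record[v] for v in variables} for record in microdata]


integrated_microdata = [
    {"unit_id": "A", "nace": "C", "region": "North", "turnover": 100.0},
    {"unit_id": "B", "nace": "C", "region": "South", "turnover": 200.0},
    {"unit_id": "C", "nace": "G", "region": "North", "turnover": 50.0},
]
print(aggregate(integrated_microdata, "nace", "turnover"))    # {'C': 300.0, 'G': 50.0}
print(aggregate(integrated_microdata, "region", "turnover"))  # {'North': 150.0, 'South': 200.0}
```

The same stored microdata answer both an activity breakdown and a regional breakdown; no separate production process is needed per table.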

Page 24

Main conclusion of the ESSnet DWH group (static SBR, or SBR as an integral part of the DWH)

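[Figure: Inputs (Input I, Input II, etc.) pass through the SBR, cleaning, and imputation + aggregation (Imp. + aggr.) to integrated data in the actual DWH, which produces outputs (Output I, Output II, etc.), with a feedback loop to the SBR. Annotation: "SBR preferably an integral part: feedback from other sources, but with moderation".]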

Page 25

Main conclusion of the ESSnet DWH group (metadata)

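[Figure: Inputs (Input I, Input II, etc.) pass through the SBR, cleaning, and imputation + aggregation to integrated data in the actual DWH, which produces outputs (Output I, Output II, etc.); a "confidentiality issues" label. The flow is annotated with three types of metadata: input descriptions, process (step) descriptions and (output) variable descriptions.]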

Page 26

[Figure: Relationship with DWH architectural models (e.g. Kimball). Sources layer: Revenue Agency, Chambers of Commerce, Surveys 1 to N, SBR, Customs Agency, employees data. Integration layer: alimentation (extraction, transformation, loading) into staging data, SBR, domain estimation, universe/census and primary microdata. Data access layer: data marts. Interpretation and data analysis layer: institutional output, dashboards, analysis, reporting, data mining. A metadata column runs alongside the layers. The figure also marks an "independent process", the "integrated systems" and the "actual DWH".]

Page 27

Implications for work in 2012/2013

Metadata

- Fitting the statistical DWH into current metadata models

- Keep it manageable!

Methodology

- Fitting current (ESSnet) methodology into the statistical DWH:

1. data linking and feedback to the BR

2. (selective?) editing + (repeated) weighting

3. data confidentiality

IT and Architecture

- Fitting ‘methodology’ into an ‘adapted’ GSBPM model

- Relating the ‘adapted’ GSBPM to the statistical DWH architecture

Page 28

Summary

• A business case for the ESSnet DWH exists

• Questionnaire: apologies for the confusing DWH-model extremes

• Visits to Finland and Ireland were useful for feedback, ideas, etc.

• A statistical DWH model has been developed, consisting of

- part 1: integrated systems

- part 2: actual DataWareHouse

• A statistical DWH differs from a commercial DWH, as it places more emphasis on part 1

• Actions defined for 2012/2013 on

- metadata

- methodology

- IT and Architecture

Page 29

Statistical System in the Netherlands 29

Thank you for your attention!

Questions?