Standardising & industrialising “end to end” flows of statistical metadata within the statistical production process Initial practical steps at the ABS Helen Toole Jennifer Mitchell Alistair Hamilton


Standardising & industrialising “end to end” flows of statistical metadata within the statistical production process

Initial practical steps at the ABS

Helen Toole, Jennifer Mitchell, Alistair Hamilton

Structure of presentation

• Context
– ABS IMTP (Information Management Transformation Program)
– Nascent international progress toward “industry” architecture for production of official statistics

• Metadata Registry/Repository (MRR) & Statistical Workflow Management (SWM)
– Vision
– Proof of Concept (PoC)

• Metadata “Census”

• Learnings & next steps


IMTP Vision

An environment in which Australian Governments and the Community can easily find, access, and combine statistical information which can then be used confidently as an evidence base for policy, to target service delivery and to inform decision making

IMTP

• Drivers: changed needs, changed expectations, changed opportunities (eg data deluge) & threats (eg maintain relevance)

• Standardising & industrialising production is fundamental to delivering suites of outputs that are extensive, timely, flexible, sustainable and readily integrated for multifaceted analysis
– A necessary enabler, although not sufficient on its own

• Required transformation is multifaceted
– Business model, business process & practice, business applications, organizational structure, wider national statistical system

• Producers face shared challenges, & opportunities, internationally
– Strategic vision of the High-level group for strategic developments in business architecture in statistics
– The case for an international statistical innovation program - Transforming national and international statistics systems

Business Process & Information

• Enterprises require “process-centric” & “data-centric” perspectives on their core business
– This point is explored in more detail in GSIM presentation (Session VI)

• In our industry various classes of information are
– the core product (eg statistics), and
– the core raw material (eg data)

• “Process” & “Information” are pillars for standardisation & industrialisation within IMTP
– Appears fundamentally similar to thinking from Statistics Netherlands around steady states and transformations, including information/metadata to describe & drive transformation

• Relevance of METIS CMF (Common Metadata Framework) Part C – Metadata and the Statistical Business Process

Strategic Vision

From the High Level Group on Business Architecture in Statistics (HLG-BAS)

The road to industrialisation & standardisation

[Diagram. Labels shown: GSBPM; GSIM; Common Reference Model; Conceptual; Practical; DDI; SDMX; Simulated Forms Builder; Simulated Registry Search; MRR; SWM; Harmonised Methods; Harmonised Technology; Semantic Reference Model]

Model (for Proof of Concept) of how vision might be actualised locally

Unresolved discussion in ABS : Where would CORA + CORE constructs be positioned?

MRR + SWM: future business context

Integrated “Statisticians’ Workbench” for Statistical Production (“Process Dashboard”)

Applications and services supporting statistical production

Statistical Workflow Management System (SWM)
[Enables metadata driven processes & ensures efficient flows of metadata in production process]

Metadata* Registry/Repository (MRR)
[Register & store all metadata used (input, output, guide, enabler) in statistical production process]

* More generally “Statistical Information” – including data and metadata

Diagram borrows heavily from Statistics Sweden’s presentation at MSIS 2011: Tentative anatomy of a new generation of IT-architecture to support GSBPM-processes

MRR + SWM Conceptual Diagram

[Diagram. Components shown: Access / User Management; Corporate Directory; BP Instance Repository; Process Execution Engine; Rules Engine; Rules Repository; Resolution Service; ID Service; Schema Repository; Centralised Data Repository; Centralised Metadata Repository; Metadata Registry; Services Registry; Metadata Repositories; Data Repositories; Other Services. Grouping labels: Statistical Workflow; MRR]

Challenges in reaching the future state

• Must support needs of 100+ statistical business processes spanning all statistical subject-matter domains
– How can we ensure the information models supported, and services provided, by the MRR will meet the future needs of each of these production processes?
  • The statistical business processes are necessarily heterogeneous in statistical frameworks, methodologies, required outputs

• Which existing needs and methods will need to be supported in future?
– Many existing needs and methods will be harmonised during transformation

• It is not feasible to transform every single statistical business process and every single application from “As Is” to “To Be” at the same time
– How to maintain consistency and integration during the period of transition where “legacy” processes and applications (with “legacy” information requirements) need to be supported alongside processes and applications transformed to a “standardised and industrialised” basis?

• Maintenance of business continuity (timely and quality assured delivery of agreed statistical outputs to the nation) cannot be risked during transition

• Require
– extensive analysis (eg thorough understanding of “As Is” and “To Be”)
– testing (eg Proofs of Concept)
– etc (stakeholder communication and engagement, co-ordinated planning and project management,…)

[Diagram repeated from the “road to industrialisation & standardisation” slide, here labelled: Common Generic Industrialised Statistics]

10/11 MRR Proof of Concept

[Diagram. Labels shown, in order: Create common frame; Select sample; Common Frame; Create survey frame; QEWS Frame; Forms design and approval; Forms; Load sample to PIMS; Label files; Dispatch; Significance Editing; Time series analyses; NAB and FAS sign off; Time series to PPW; Collection and IFU; Data; Paradata (Collection Information); Clean Data; Time Series Databases; Published Data; Sample. Legend: Business Process Steps; Business Output and Input Artefacts; Derivation Processes]

MRR Proof of Concept 2010/11

Core case study was elements of statistical business process for Quarterly Business Indicators Survey (QBIS)

Simplified End-to-end Process For QBIS

[Diagram: the same QBIS process steps and artefacts, from “Create common frame” through to “Published Data”]

Statistical Workflow Management

Metadata Registry and Repository

Ultimately want the metadata in the MRR to drive reuse in the above processes in conjunction with rules and processes stored in the SWM
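The idea of metadata in the MRR driving a process step, in conjunction with rules held in the SWM, can be sketched roughly as follows. This is a minimal illustration only: the `MRR` and `SWM_RULES` dictionaries, the `run_step` function, the step name and the codes are all hypothetical stand-ins and do not describe the ABS systems.

```python
# Hypothetical stand-in for the MRR: registered metadata keyed by identifier.
MRR = {
    "codelist:industry": {"01": "Industry group one", "02": "Industry group two"},
}

# Hypothetical stand-in for SWM rules: each step names the metadata it needs.
SWM_RULES = {
    "edit_industry_code": {"variable": "industry", "codelist": "codelist:industry"},
}


def run_step(step_name, record):
    """Apply one metadata-driven edit.

    The rule says which variable to check and which code list applies;
    the registry supplies the valid codes. Nothing is hard-coded in the step.
    """
    rule = SWM_RULES[step_name]
    valid_codes = MRR[rule["codelist"]]
    value = record.get(rule["variable"])
    return {"step": step_name, "value": value, "valid": value in valid_codes}
```

Under this sketch, changing the registered code list changes the behaviour of every step that refers to it, which is the kind of reuse the slide describes.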

Proof of Concept : Supported object types*

[Diagram. Object types shown: Resource Packages; Study Unit (collection cycle); Universe (population/scope); Concepts; Variables; Question scheme (modules/parts); Questions; Standard Question Wording; Categories; Codes; Interviewer instructions; Sequencing; Data sets; Process metrics; Collection Instrument. Examples shown: QBIS 2010 quarter 3; ANZSIC 06 (industry classification). Legend: object type was supported in MRR / object type was simulated (not fully modelled)]

* Relationships (eg between objects) are also an object type in their own right
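One way to read the footnote is that a relationship is registered and identified like any other object. A minimal sketch under that assumption follows; the `Registry` and `RegisteredObject` classes, the role name and the example names are hypothetical and are not the Proof of Concept design.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # simple identifier source, for illustration only


@dataclass(frozen=True)
class RegisteredObject:
    object_type: str  # e.g. "Question", "Variable", "Relationship"
    name: str
    object_id: int = field(default_factory=lambda: next(_next_id))


class Registry:
    def __init__(self):
        self.objects = {}  # object_id -> RegisteredObject
        self.links = {}    # relationship object_id -> (source_id, role, target_id)

    def register(self, object_type, name):
        obj = RegisteredObject(object_type, name)
        self.objects[obj.object_id] = obj
        return obj

    def relate(self, source, role, target):
        # The relationship is itself registered, so it gets its own identifier.
        rel = self.register("Relationship", f"{source.name} {role} {target.name}")
        self.links[rel.object_id] = (source.object_id, role, target.object_id)
        return rel

    def targets(self, source, role):
        # Follow all relationships of the given role starting at `source`.
        return [
            self.objects[t]
            for (s, r, t) in self.links.values()
            if s == source.object_id and r == role
        ]
```

Because the relationship has its own identifier, it could in principle be versioned, searched and reused in the same way as a question or a variable.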

Metadata Census

• Initially conceived 2010.Q2 as project to understand all existing Metadata Stores in ABS
– Identify & analyse all Metadata Stores
– Classify types of metadata
– Identify what types of metadata are kept in which stores for which applications

• Synthesise findings and provide empirical “bottom up” input to
– MRR requirements and design
– International “OCMIMF” collaboration project which is developing the Generic Statistical Information Model (GSIM)

• Maintain currency of information gathered
– reference when planning and managing transformation

What was Found (1)

• ABS has hundreds of systems/applications which:
– Store data
– Store metadata about data
– Store metadata associated with data in other systems
– Run processes across data/metadata in systems
– Duplicate data/metadata in other systems

• Production of comprehensive, integrated findings from the Metadata Census was not feasible within the given time and resource allocation

Example of primary “As Is” applications for one stream of production

What was Found (2)

• Inconsistent use of terminology within ABS

• Inconsistent modelling (at conceptual, logical and physical levels) of some types of metadata
– eg classifications
– eg concepts -> QDT “Properties”, CPCF Mat “Properties”, CPCF Mat “Concepts”, DER/QDT “Concepts”, etc.

• Inconsistent and insufficient support for versioning of metadata

• Loss and redefinition of metadata throughout statistical process
– One example
  • A variable in the ABS Input Data Warehouse has meaning from previously defined metadata around concepts, questions and qualifiers.
  • During processing, in particular when moving data to the ABS Output Data Warehouse, links to the earlier metadata are not carried through, and so are lost.
  • As these links are lost, the metadata is redefined repeatedly throughout the statistical process.
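The difference between losing and carrying the links can be illustrated with a small sketch. Everything here is hypothetical (the `InputVariable` class, the identifiers and the two transfer functions); it only shows the general pattern of passing references through a transformation instead of dropping them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputVariable:
    name: str
    concept_id: str   # reference to a previously defined concept
    question_id: str  # reference to the question that collected the value


def to_output_without_links(var):
    # Only the name survives, so meaning must be redefined downstream.
    return {"name": var.name}


def to_output_with_links(var):
    # The references travel with the variable, so nothing needs redefining.
    return {
        "name": var.name,
        "concept_id": var.concept_id,
        "question_id": var.question_id,
    }


def can_trace_to_question(output_var):
    return "question_id" in output_var
```

In the first function a downstream user has no path back to the concept or question; in the second, the same identifiers can be resolved against wherever that metadata is registered.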

Second phase

• Focus in depth on metadata for a specific statistical business process
– QBIS used as example

• Collate information in order to create an object model describing objects that would be registered in the MRR during the Proof of Concept
– GSIM has not yet reached a level of detail and common agreement which would provide a “top down” path for describing these objects
  • Current target for GSIM to reach required level of agreed detail is December 2012
– The model to be used in the meantime is termed the ABS Transitional Model
  • Anticipate ABS Transitional Model will be fundamentally compatible with GSIM in most regards
  • Anticipate adjusting ABS model for alignment with GSIM where appropriate
– Model for Proof of Concept was small in scope (the six primary object types)
  • ABS Transitional Model expected to grow to 20-40 object types by June 2012
– Design of ABS Transitional Model (and GSIM) recognises SDMX and DDI as valuable technical standards supporting implementation & interoperability
  • Seek to support “crosswalks” to information models underpinning SDMX and DDI where these are relevant and fit for purpose
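A crosswalk of this kind can be thought of as a lookup from a local object type to the corresponding element in each standard, with gaps where no fit-for-purpose mapping exists. The sketch below is illustrative only: the local type names, the `CROSSWALK` table and the target element names are assumptions that would need to be checked against the DDI and SDMX specifications and against the ABS Transitional Model itself.

```python
# Hypothetical crosswalk table: local object type -> {standard: element name}.
# Target element names are assumptions to verify against each specification.
CROSSWALK = {
    "Question": {"DDI": "QuestionItem"},
    "Variable": {"DDI": "Variable"},
    "Concept": {"DDI": "Concept", "SDMX": "Concept"},
    "Codes": {"DDI": "CodeScheme", "SDMX": "Codelist"},
}


def crosswalk(object_type, standard):
    """Return the target element name, or None where no mapping is recorded."""
    return CROSSWALK.get(object_type, {}).get(standard)
```

Returning `None` rather than forcing a mapping reflects the qualification on the slide: a crosswalk is supported only where the target model is relevant and fit for purpose.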


Example : Questions (Based on DDI Model)


Only a few objects

Already many relationships!

Early work in progress on opportunities for rationalisation and reuse

Metadata Census : Conclusions

• Things can get very complicated very quickly

• Comparing ‘To Be’ to what currently exists makes it even more so.
– Especially as what currently exists is inconsistent

• Practical analysis of statistical information (primarily metadata) flows throughout the statistical business process is invaluable input
– GSBPM is a key reference point for processes
– GSIM will be a key reference point for information

• Practical analyses in the meantime help build a better GSIM!