25
Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat Grazia Di Bella, Simone Ambroselli (Istat, Italy) Q2014 - European Conference on Quality in Official Statistics (Q2014) Vienna, 2 – 5 June 2014

Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

  • Upload
    trapper

  • View
    32

  • Download
    4

Embed Size (px)

DESCRIPTION

Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat Grazia Di Bella, Simone Ambroselli (Istat, Italy ) Q 2014 - European Conference on Quality in Official Statistics (Q2014) Vienna , 2 – 5 June 2014 . Prologue - PowerPoint PPT Presentation

Citation preview

Page 1: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Grazia Di Bella, Simone Ambroselli (Istat, Italy)

Q2014 - European Conference on Quality in Official Statistics (Q2014)

Vienna, 2 – 5 June 2014

Page 2: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Dealing with an increase of the use of the Administrative Data (AD) for

statistical purposes has become a common condition for the majority of

the NSIs in the last decade.

In Istat the administrative data sets acquired for statistical uses has

increased from 90 in the 2009 to 230 in the 2013 as many statistical

processes currently use them or are planning to review the production

processes in this direction.

Q2014, Vienna

Prologue

The use of administrative data increases

Page 3: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Central level coordination

A dedicated office named ADA (Administrative Data Acquisition and integration, under the Censuses and Statistical Registers Directorate) is responsible for the following tasks:

• acquiring AD• storing AD• integrating AD (Integrated System of Microdata - SIM)• evaluating AD quality• make AD and their metadata available to internal statistics

producers

Action

AD Management strategy for efficiency and quality [1]

Q2014, Vienna

Page 4: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

AdvantagesIt allows to:

• better ensure compliance with the legislation on the confidentiality of the data

• optimize timeliness and efficiency in acquiring AD and in making them accessible to users within the institute

• unify common data treatments• provide a common description of the AD quality through a Quality

Report Card• facilitate the management of relationships with AD provider • activate those necessary feedback to improve AD quality, in

collaboration with AD producers.

AD Management strategy for efficiency and quality [2]

Q2014, Vienna

Page 5: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

AD Management strategy for efficiency and quality [3]

Acquisition procedures

Integrated System of Microdata Repository

SIM

AD

quality evaluation

Statistical processes using AD

Dissemination to statistics users

ADA

func

tions

Page 6: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

ADA centralized functions

A. Collection of AD requirements from statistics producers

B. Formulation of AD requests for each AD holder and for each AD source

C. AD acquisition

Acquisition procedures

1.5 Check data availability

3.1 Build collection instrument

4.3 Run collection

4.4 Finalize collection

1.1 Identify data needs (considering potential of AD)

GSBPM

2.3 Design collection

General Statistical Business Process Model

Q2014, Vienna

Page 7: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

AD Management strategy for efficiency and quality [3]

Acquisition procedures

Integrated System of Microdata Repository

SIM

AD

quality evaluation

Statistical processes using AD

Dissemination to statistics users

ADA

func

tions

Page 8: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Def.: Repository of integrated administrative microdata to support the statistical production processes

Goals

• Make the AD accessible in a uniform way to users within the institute

• Avoid duplicate work

Titolo intervento, nome cognome relatore – Luogo, data

Integrated System of Microdata Repository - SIM

1

Page 9: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Titolo intervento, nome cognome relatore – Luogo, data

1

D. Formal Concept Analysis/ identification of objects and relations

E. Loading data into tables

F. AD Integration

G. Recoding

H. Dissemination to Istat statistics producers

Statistical processes using AD

Dissemination to statistics users

ADA centralized functions

5.2 Classify and code

5.1 Integrate data

GSBPM

SIM

General Statistical Business Process Model

Page 10: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

The step of integration refers to the process of linkage among objects recorded in different sources: individuals, economic units, places (in progress). Each object entering the SIM is recognized/identified with a unique and stable (over time) ID number.

Depending on the linking variable(s) available, a suitable integration strategy and a set of algorithms are applied.

Data integration process feeds the development of the DBs for the integration of each subsystem.

The DBs for integration are warehouses of microdata useful to guarantee a unified view of the specific object under analysis showing information available in the different sources.

Titolo intervento, nome cognome relatore – Luogo, data

AD Integration

1

Page 11: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

ETL process of AD

ADS 1 ADS 2ADS 2 ADS N

Identificationand Integration

Support in the development of the thematic DBs for statistical processes

Physical structures

Virtual structures

Statistical processes

Statistical Registers

StatisticalInformationSystems

Metadata system

Quality (QRCA) DBs for the

integration for each subsystem

SIM border

Page 12: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

ADS 1

ADS 2

ADS 3

ADS I

ADS N

SUBSYSTEM OF INTEGRATION

SIM: the DBs for integration

Structure• the ID of the sources• the serial number internal to all the

sources in which the object is recognized• the ID of the object in the subsystem of

integration• the variables used for the integration for

all the sources in which the object is present

• the different kind of record linkage used to enter in the DB

• the time reference for linkage validity

The presence of the unique ID determines a spider web structure of relationships able to guarantee that every source is connected with the DB for integration and, at the same time, with all the others that are part of the same subsystem of integration.

Q2014, Vienna

Page 13: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

SIM RELATIONSHIPS

AMONG INDIVIDUALS

SIM ECONOMIC

UNITSSIM UNITS

SIM PLACES

SIM RELATIONSHIPS

SIM RELATIONSHIPS

BEETWEEN INDIVIDUALS

AND ECONOMIC UNITS

INDIVIDUAL ID

ECONOMIC UNIT ID

INDIVIDUAL ID –FAMILY ID

ECONOMIC UNIT ID – LOCAL

UNITS ID

INDIVIDUAL ID – ECONOMIC

UNIT ID

INDIVIDUAL ID –

INDIVIDUAL PLACES ID

ECONOMIC UNIT ID –

LOCAL UNITS ID

SIM: the subsystems

Q2014, Vienna

SIM PLACES INDIVIDUALS

SIM INDIVIDUALS

SIM RELATIONSHIPS

AMONG ECONOMIC

UNITS

SIM PLACES ECONOMIC

UNITS

Page 14: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

AD Management strategy for efficiency and quality [3]

Acquisition procedures

Integrated System of Microdata Repository

SIM

AD

quality evaluation

Statistical processes using AD

Dissemination to statistics users

ADA

func

tions

Page 15: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Survey data

Register data

Input

Data Treatment (transformation

function)

Administrative data

Q

Q

User statistics oriented

Producer statistics oriented

AD quality evaluation in the Statistical production process

Data Treatment (transformation

function)

Statistical Output

Quality Report Card for Administrative Data

“AD quality” is considered in relation to the AD reuse for statistical purposes, taking

into account that the AD are not primarily produced for statistical purposes.

Page 16: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Quality Report Card for Administrative data – QRCA objectives

• Assess the AD quality in terms of input of the statistical production process for its potential usability Usability analysis

• Monitoring AD for two main reasons: a) regulatory changes may induce discontinuity producing significant impacts on the statistics production; b) before AD enter into statistical production process an analysis must be carried out to verify the presence of unexpected lack of quality AD monitoring function

• Check AD compliance with respect to the requests and support the loading data process. Where appropriate define alert / warning to optimize the timing of the data acquisition and release Data supply monitoring function

For Istat potential users

For Istat current users

For the AD acquisition process

Q2014, Vienna

Page 17: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

The AD quality framework considers a hierarchical and multidimensional

approach including issues directly connected with the AD quality and those

information for the AD management process aimed at improving the

statistical AD quality/usability

AD quality framework

The AD quality framework adopted is based on that originally defined by Statistics Netherlands [1] and then developed within the international BlueEts project, WP4 [2].

[1] Daas et al. (2009) Checklist for the Quality evaluation of AD Sources. Discussion paper 09042, Statistics Netherlands.

[2] Daas et al. (2011) Reports on methods preferred for the quality indicators of administrative data sources, Deliverable 4.2 of Workpackage 4 of the BLUE-ETS project. CBS, Netherlands, SSB, Norway, Istat, Italy, SCB, Sweden.

Q2014, Vienna

Page 18: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Titolo intervento, nome cognome relatore – Luogo, data

AD quality framework for AD supplied

1

HYPERDIMENSION DIMENSION

SOURCE F1. Information on ADSs

information needed to manage the AD acquisition process with the aim of evaluating and improving the quality of the acquired data

F2. AD source relevance

F3. Privacy e security in using AD

F4. Arrangements for the data delivery

F5. Relationships and feedback with administrative source holder

METADATA M1. Clarity and interpretability

Information for the quality assessment at the conceptual level and description of the acquisition procedures

M2. Comparability

M3. Data acquisition/treatment (by AD source holder)

DATA D1. Technical checks

Quality evaluation indicators for AD supplied

D2. Integrability

D3. Accuracy

D4. Completeness

D5. Time-related dimension

Page 19: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

Adapting the Quality framework and the QRCA to the Istat ADA

process

Implementing the QRCA through interoperability among ADA

processes

With the purpose of complying the appropriate efficiency and

timeliness, a system that allows making the AD quality evaluation as

automated as possible is being planning. Following the OECD Core

principles for metadata management, the strategy aims to take

advantage of all the available metadata from the production process

using AD, that is to make metadata “active” to the greatest extent

possible for supporting the QRCA production.

QRCA and the ADA process

Q2014, Vienna

Page 20: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

The implementation of the Source Hyperdimension quality indicators takes advantage of all the information used for the AD acquisition procedures. For the moment this information is managed in a not fully automatic way but Istat is proceeding in this direction storing and organizing it according with the AD quality framework in view of its reuse in the process of quality assessment. With respect to the Relevance quality Dimension, a specific Report for each main AD source is being finalized. It will provide information about all their statistical uses (derived automatically from the Acquisition procedures metadata) and about the compliance with the Istat requirements in terms of quality, timeliness, contents (derived from a very short questionnaire that could be submitted to AD source users).

Implementation of the Source Hyperdimension

Q2014, Vienna

Page 21: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

The Clarity quality Dimension considers metadata that should be available from the data source holder. In this regard, a procedure is in place which should allow the acquisition of the metadata together with data.

But a strong collaboration with the AD holder has to be expected. In most cases definitions are deduced by free-form metadata available. To make the system more efficient these definitions could be shared among AD source users in the QRCA.

In addition, the phase D. “Formal Concept Analysis/ identification of objects and relations” of the ADS, and the consequent data loading in the relational database, can allow to automatically identify the set of objects /entities to be evaluated.

Titolo intervento, nome cognome relatore – Luogo, data

Implementation of the Metadata Hyperdimension [1]

1

Page 22: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

About the Comparability it should be possible to define a bridge linking the statistical units with the corresponding administrative units and the output statistical variables with the administrative ones used in the production process. Following the strategy of the interoperability among systems, new Istat Unified Metadata System (SUM) [8] could support the QRCA production. In case of processes already using AD, starting from the reference metadata, describing content and quality of statistical data produced and disseminated by Istat through the I.Stat Dissemination System, the traceability task (one of the SUM objectives), should be pursued retaining metadata processes.

Implementation of the Metadata Hyperdimension [2]

Q2014, Vienna

Page 23: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

ETL process metadata may support the Technical checks dimension (dataset readability, convertibility and compliance with AD requested). The AD integration process is documented in SIM then, for the Integrability quality dimension, it is possible to reuse metadata to compute indicators describing the quality of the Linking variable and, in general, the quality of the record linkage procedures. Using unique and stable (over time) ID number for objects in SIM, Objects Alignment and Comparability indicators may be implemented comparing AD from different sources in SIM or AD with the main Statistical Registers. With respect to the latter, coverage indicators of the Completeness quality dimension, where possible, may be implemented too. Linking the same object in a dataset over time, using the SIM ID number, may produce Dynamic of objects indicators in the Time-related dimension.

Implementation of the Data Hyperdimension

Q2014, Vienna

Page 24: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

many indicators may be implemented automatically!

others are not derivable using interoperabilityit is the case of indicators of consistency checks (Accuracy dimension) for the implementation of which, an external intervention has to be considered for the check rules definition and for the AD source Relevance evaluation, concerning the compliance with the Istat requirements in terms of quality, timeliness, contents.the AD source users may provide this information insofar as they have an interest to share with others information on AD used. With this aim some “AD source users groups” are setting up in Istat for the most important data source holder.

Epilogue

Q2014, Vienna

Page 25: Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat

THANKS!