Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat
Grazia Di Bella, Simone Ambroselli (Istat, Italy)
Q2014 - European Conference on Quality in Official Statistics (Q2014)
Vienna, 2 – 5 June 2014
Prologue
The use of administrative data increases
The growing use of Administrative Data (AD) for statistical purposes has become common to most NSIs over the last decade.
At Istat, the number of administrative data sets acquired for statistical use rose from 90 in 2009 to 230 in 2013, as many statistical processes now use them or plan to redesign their production in this direction.
AD Management strategy for efficiency and quality [1]
Action
Central level coordination
A dedicated office named ADA (Administrative Data Acquisition and integration, under the Censuses and Statistical Registers Directorate) is responsible for the following tasks:
• acquiring AD
• storing AD
• integrating AD (Integrated System of Microdata - SIM)
• evaluating AD quality
• making AD and their metadata available to internal statistics producers
AD Management strategy for efficiency and quality [2]
Advantages
Central coordination makes it possible to:
• better ensure compliance with the legislation on data confidentiality
• optimize timeliness and efficiency in acquiring AD and in making them accessible to users within the institute
• unify common data treatments
• provide a common description of AD quality through a Quality Report Card
• facilitate the management of relationships with AD providers
• activate the feedback needed to improve AD quality, in collaboration with AD producers.
AD Management strategy for efficiency and quality [3]
[Diagram of ADA functions. Labels: Acquisition procedures; Integrated System of Microdata Repository (SIM); AD quality evaluation; Statistical processes using AD; Dissemination to statistics users]
Acquisition procedures
ADA centralized functions
A. Collection of AD requirements from statistics producers
B. Formulation of AD requests for each AD holder and for each AD source
C. AD acquisition
Related GSBPM (Generic Statistical Business Process Model) sub-processes
1.1 Identify data needs (considering potential of AD)
1.5 Check data availability
2.3 Design collection
3.1 Build collection instrument
4.3 Run collection
4.4 Finalize collection
Integrated System of Microdata Repository - SIM
Definition: a repository of integrated administrative microdata that supports the statistical production processes
Goals
• Make the AD accessible in a uniform way to users within the institute
• Avoid duplicate work
SIM
ADA centralized functions
D. Formal Concept Analysis / identification of objects and relations
E. Loading data into tables
F. AD Integration
G. Recoding
H. Dissemination to Istat statistics producers
Related GSBPM (Generic Statistical Business Process Model) sub-processes
5.1 Integrate data
5.2 Classify and code
AD Integration
Integration is the linkage of objects recorded in different sources: individuals, economic units, places (in progress). Each object entering the SIM is identified by an ID number that is unique and stable over time.
Depending on the linking variable(s) available, a suitable integration strategy and set of algorithms are applied.
The data integration process feeds the DBs for integration of each subsystem.
The DBs for integration are warehouses of microdata that guarantee a unified view of the specific object under analysis, showing the information available in the different sources.
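The idea of a unique ID that stays stable as new sources arrive can be sketched as follows. This is a minimal illustration, not the SIM implementation: it assumes exact (deterministic) linkage on a single linking variable, and the class `IdRegistry`, the field `tax_code`, the ID format and the sample records are all hypothetical.

```python
from itertools import count


class IdRegistry:
    """Hands out one ID per object and returns the same ID on every later lookup."""

    def __init__(self, prefix="IND"):
        self._prefix = prefix
        self._ids = {}            # normalized linking key -> assigned ID
        self._next = count(1)

    def resolve(self, linking_key):
        # Normalize so that trivial formatting differences do not split an object.
        key = linking_key.strip().upper()
        if key not in self._ids:
            self._ids[key] = f"{self._prefix}{next(self._next):06d}"
        return self._ids[key]


registry = IdRegistry()

# Two hypothetical administrative sources sharing the linking variable "tax_code".
source_a = [{"tax_code": "rssmra80a01h501u", "income": 21000}]
source_b = [
    {"tax_code": "RSSMRA80A01H501U ", "municipality": "Roma"},
    {"tax_code": "VRDLGU75B02F205X", "municipality": "Milano"},
]

# For each object ID, record the sources in which the object was recognized.
linked = {}
for source_name, records in (("A", source_a), ("B", source_b)):
    for rec in records:
        linked.setdefault(registry.resolve(rec["tax_code"]), []).append(source_name)
```

When the linking variable is missing or unreliable, a different strategy (for example probabilistic linkage on name and date of birth) would be needed; the registry only covers the exact-match case.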
[Diagram of the SIM architecture. Labels: ETL process of AD (ADS 1, ADS 2, ..., ADS N); Identification and Integration; DBs for the integration for each subsystem; Metadata system; Quality (QRCA); Physical structures; Virtual structures; Support in the development of the thematic DBs for statistical processes (Statistical processes, Statistical Registers, Statistical Information Systems); SIM border]
SIM: the DBs for integration
[Diagram: subsystem of integration connected to ADS 1, ADS 2, ADS 3, ..., ADS I, ..., ADS N]
Structure
• the ID of the sources
• the serial number internal to all the sources in which the object is recognized
• the ID of the object in the subsystem of integration
• the variables used for the integration for all the sources in which the object is present
• the different kinds of record linkage used to enter the DB
• the time reference for linkage validity
The unique ID produces a spider-web structure of relationships, which guarantees that every source is connected with the DB for integration and, at the same time, with all the other sources that are part of the same subsystem of integration.
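The structure listed above, and the way the shared object ID connects every source to all the others, can be sketched as a simple record type. This is an illustration under stated assumptions: the class `IntegrationRecord`, its field names, the function `same_object_elsewhere` and the sample rows are hypothetical and do not reflect the actual SIM schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IntegrationRecord:
    source_id: str          # ID of the source
    source_serial: int      # serial number of the record inside that source
    object_id: str          # ID of the object in the subsystem of integration
    linking_values: tuple   # values of the variables used for the integration
    linkage_type: str       # kind of record linkage used to enter the DB
    valid_from: date        # time reference for linkage validity


def same_object_elsewhere(db, source_id, source_serial):
    """Follow the shared object ID from one source record to the other sources."""
    object_ids = {
        r.object_id
        for r in db
        if r.source_id == source_id and r.source_serial == source_serial
    }
    return sorted(
        (r.source_id, r.source_serial)
        for r in db
        if r.object_id in object_ids and r.source_id != source_id
    )


db = [
    IntegrationRecord("ADS1", 17, "IND000001", ("RSSMRA80A01H501U",),
                      "deterministic", date(2013, 1, 1)),
    IntegrationRecord("ADS2", 4, "IND000001", ("ROSSI", "MARIO", "1980-01-01"),
                      "probabilistic", date(2013, 1, 1)),
    IntegrationRecord("ADS3", 92, "IND000002", ("VRDLGU75B02F205X",),
                      "deterministic", date(2013, 1, 1)),
]

matches = same_object_elsewhere(db, "ADS1", 17)
```

Because every row carries the object ID, no pairwise link tables between sources are needed: any two sources in the same subsystem are joined through the DB for integration.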
SIM: the subsystems
[Diagram of the SIM subsystems and the ID pairs that link them]
Subsystems
• SIM INDIVIDUALS
• SIM ECONOMIC UNITS
• SIM PLACES
• SIM PLACES INDIVIDUALS
• SIM PLACES ECONOMIC UNITS
• SIM RELATIONSHIPS AMONG INDIVIDUALS
• SIM RELATIONSHIPS AMONG ECONOMIC UNITS
• SIM RELATIONSHIPS BETWEEN INDIVIDUALS AND ECONOMIC UNITS
IDs and ID pairs
• INDIVIDUAL ID
• ECONOMIC UNIT ID
• INDIVIDUAL ID – FAMILY ID
• ECONOMIC UNIT ID – LOCAL UNITS ID
• INDIVIDUAL ID – ECONOMIC UNIT ID
• INDIVIDUAL ID – INDIVIDUAL PLACES ID
AD quality evaluation in the Statistical production process
[Diagram. Labels: Input (Survey data, Register data, Administrative data); Data Treatment (transformation function); Statistical Output; quality (Q) markers: Producer statistics oriented, User statistics oriented]
Quality Report Card for Administrative Data
“AD quality” is considered in relation to the reuse of AD for statistical purposes, bearing in mind that AD are not primarily produced for statistical purposes.
Quality Report Card for Administrative data – QRCA objectives
• Usability analysis (for Istat potential users): assess AD quality as an input to the statistical production process, in terms of its potential usability.
• AD monitoring function (for Istat current users): monitor AD for two main reasons: a) regulatory changes may induce discontinuities with a significant impact on statistics production; b) before AD enter the statistical production process, an analysis must be carried out to detect any unexpected lack of quality.
• Data supply monitoring function (for the AD acquisition process): check AD compliance with the requests and support the data loading process. Where appropriate, define alerts / warnings to optimize the timing of data acquisition and release.
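The data supply monitoring function (checking a delivery against the request and raising alerts or warnings) might look like the sketch below. It is only an illustration: the function `check_supply`, the dictionary keys, the choice of which condition counts as an alert or a warning, and the sample request are assumptions, not Istat rules.

```python
from datetime import date


def check_supply(request, delivery, today):
    """Compare a delivery with the AD request and return a list of messages."""
    messages = []
    if delivery is None:
        # Nothing received yet: only lateness can be reported.
        if today > request["due_date"]:
            messages.append("ALERT: delivery overdue")
        return messages
    missing = sorted(set(request["variables"]) - set(delivery["variables"]))
    if missing:
        messages.append("WARNING: missing variables: " + ", ".join(missing))
    if delivery["received_on"] > request["due_date"]:
        messages.append("WARNING: delivered after due date")
    if delivery["reference_year"] != request["reference_year"]:
        messages.append("ALERT: wrong reference year")
    return messages


request = {
    "variables": ["tax_code", "income", "municipality"],
    "due_date": date(2014, 3, 31),
    "reference_year": 2013,
}
delivery = {
    "variables": ["tax_code", "income"],
    "received_on": date(2014, 4, 10),
    "reference_year": 2013,
}

late_and_incomplete = check_supply(request, delivery, date(2014, 4, 10))
not_delivered = check_supply(request, None, date(2014, 4, 1))
```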
AD quality framework
The AD quality framework takes a hierarchical and multidimensional approach. It covers the issues directly connected with AD quality as well as the information that the AD management process needs in order to improve the statistical quality/usability of AD.
The framework adopted is based on the one originally defined by Statistics Netherlands [1] and then developed within the international BLUE-ETS project, WP4 [2].
[1] Daas et al. (2009) Checklist for the Quality evaluation of AD Sources. Discussion paper 09042, Statistics Netherlands.
[2] Daas et al. (2011) Reports on methods preferred for the quality indicators of administrative data sources, Deliverable 4.2 of Workpackage 4 of the BLUE-ETS project. CBS, Netherlands, SSB, Norway, Istat, Italy, SCB, Sweden.
AD quality framework for AD supplied
SOURCE hyperdimension: information needed to manage the AD acquisition process with the aim of evaluating and improving the quality of the acquired data
• F1. Information on ADSs
• F2. AD source relevance
• F3. Privacy and security in using AD
• F4. Arrangements for the data delivery
• F5. Relationships and feedback with administrative source holder
METADATA hyperdimension: information for the quality assessment at the conceptual level and description of the acquisition procedures
• M1. Clarity and interpretability
• M2. Comparability
• M3. Data acquisition/treatment (by AD source holder)
DATA hyperdimension: quality evaluation indicators for AD supplied
• D1. Technical checks
• D2. Integrability
• D3. Accuracy
• D4. Completeness
• D5. Time-related dimension
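The hierarchy of hyperdimensions and dimensions lends itself to a nested mapping from which an empty report card can be generated and filled step by step. The sketch below is illustrative only: the names `FRAMEWORK`, `empty_report_card` and `pending_dimensions`, and the shape of the stored indicator values, are assumptions rather than the QRCA design.

```python
# Hyperdimension -> {dimension code: dimension name}, as listed in the framework.
FRAMEWORK = {
    "SOURCE": {
        "F1": "Information on ADSs",
        "F2": "AD source relevance",
        "F3": "Privacy and security in using AD",
        "F4": "Arrangements for the data delivery",
        "F5": "Relationships and feedback with administrative source holder",
    },
    "METADATA": {
        "M1": "Clarity and interpretability",
        "M2": "Comparability",
        "M3": "Data acquisition/treatment (by AD source holder)",
    },
    "DATA": {
        "D1": "Technical checks",
        "D2": "Integrability",
        "D3": "Accuracy",
        "D4": "Completeness",
        "D5": "Time-related dimension",
    },
}


def empty_report_card(source_name):
    """Create a card with one empty slot (None) per dimension."""
    card = {"source": source_name}
    for hyper, dims in FRAMEWORK.items():
        card[hyper] = {code: None for code in dims}
    return card


def pending_dimensions(card):
    """List the dimension codes that have not been evaluated yet."""
    return [
        code
        for hyper in FRAMEWORK
        for code, value in card[hyper].items()
        if value is None
    ]


card = empty_report_card("ADS 1")
card["DATA"]["D1"] = {"readable": True}   # hypothetical indicator value
pending = pending_dimensions(card)
```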
QRCA and the ADA process
Adapting the quality framework and the QRCA to the Istat ADA process
Implementing the QRCA through interoperability among ADA processes
To achieve adequate efficiency and timeliness, a system that automates AD quality evaluation as far as possible is being planned. Following the OECD Core principles for metadata management, the strategy aims to exploit all the metadata available from the production process using AD, that is, to make metadata “active” to the greatest extent possible in support of QRCA production.
Implementation of the Source Hyperdimension
The quality indicators of the Source hyperdimension draw on all the information used in the AD acquisition procedures. At present this information is not managed in a fully automatic way, but Istat is moving in this direction, storing and organizing it according to the AD quality framework so that it can be reused in quality assessment. For the Relevance quality dimension, a specific report for each main AD source is being finalized. It will provide information about all the statistical uses of the source (derived automatically from the acquisition procedures metadata) and about compliance with the Istat requirements in terms of quality, timeliness and contents (derived from a very short questionnaire that could be submitted to AD source users).
Implementation of the Metadata Hyperdimension [1]
The Clarity quality dimension considers metadata that should be available from the data source holder. A procedure is in place that should allow the metadata to be acquired together with the data.
However, this requires strong collaboration with the AD holder. In most cases definitions are deduced from the free-form metadata available. To make the system more efficient, these definitions could be shared among AD source users in the QRCA.
In addition, phase D, “Formal Concept Analysis / identification of objects and relations”, applied to the ADS, and the subsequent loading of the data into the relational database make it possible to identify automatically the set of objects/entities to be evaluated.
Implementation of the Metadata Hyperdimension [2]
For Comparability, it should be possible to define a bridge linking the statistical units with the corresponding administrative units, and the output statistical variables with the administrative variables used in the production process. In line with the strategy of interoperability among systems, the new Istat Unified Metadata System (SUM) [8] could support QRCA production. For processes already using AD, traceability (one of the SUM objectives) should be pursued by retaining process metadata, starting from the reference metadata that describe the content and quality of the statistical data produced and disseminated by Istat through the I.Stat Dissemination System.
Implementation of the Data Hyperdimension
ETL process metadata may support the Technical checks dimension (dataset readability, convertibility and compliance with the AD requested). The AD integration process is documented in SIM, so for the Integrability quality dimension the metadata can be reused to compute indicators describing the quality of the linking variable and, more generally, of the record linkage procedures. Because objects in SIM carry an ID number that is unique and stable over time, Objects Alignment and Comparability indicators may be implemented by comparing AD from different sources in SIM, or AD with the main Statistical Registers. In the latter case, coverage indicators of the Completeness quality dimension may also be implemented where possible. Linking the same object in a dataset over time through the SIM ID number may produce Dynamic of objects indicators in the Time-related dimension.
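Three of the indicator families mentioned for the Data hyperdimension reduce to simple set operations once every object carries a SIM ID. The sketch below is a hedged illustration: the function names, the field `sim_id`, the exact indicator definitions and the sample IDs are assumptions, not the formulas used in the QRCA.

```python
def linkage_rate(records):
    """Integrability: share of source records that received a SIM ID."""
    linked = sum(1 for r in records if r.get("sim_id") is not None)
    return linked / len(records)


def coverage(source_ids, register_ids):
    """Completeness: share of register objects also found in the source."""
    return len(source_ids & register_ids) / len(register_ids)


def dynamics(ids_prev, ids_curr):
    """Time-related dimension: objects entering, leaving and persisting."""
    return {
        "entered": len(ids_curr - ids_prev),
        "exited": len(ids_prev - ids_curr),
        "persisted": len(ids_prev & ids_curr),
    }


# Hypothetical deliveries of the same source for two reference years.
records_2013 = [{"sim_id": "E1"}, {"sim_id": "E2"}, {"sim_id": None}, {"sim_id": "E4"}]
ids_2013 = {r["sim_id"] for r in records_2013 if r["sim_id"] is not None}
ids_2012 = {"E1", "E3"}
register_ids = {"E1", "E2", "E3", "E5"}   # hypothetical statistical register

rate = linkage_rate(records_2013)
cov = coverage(ids_2013, register_ids)
dyn = dynamics(ids_2012, ids_2013)
```

All three functions rely on the ID being stable over time: if an object were re-numbered between deliveries, it would be counted as one exit and one entry instead of one persisting object.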
Epilogue
Many indicators may be implemented automatically!
Others cannot be derived through interoperability:
• consistency check indicators (Accuracy dimension), whose implementation requires an external intervention to define the check rules
• the evaluation of AD source Relevance, concerning compliance with the Istat requirements in terms of quality, timeliness and contents.
AD source users may provide this information insofar as they have an interest in sharing information on the AD they use. To this end, “AD source users groups” are being set up in Istat for the most important data source holders.
THANKS!