ESSnet SCFE deliverable D4-1 - European Commission · SCFE ESSnet SCFE DELIVERABLE D4-1 Initial list of services that are candidates for re-use in ESS Project acronym: Project title:


SHARING COMMON FUNCTIONALITIES ESSNET

ESSnet SCFE DELIVERABLE D4-1

Initial list of services that are candidates for re-use in ESS

Project acronym:

SCFE

Project title:

“Sharing common functionalities in the ESS”

Name(s), title(s) and organisation of the author(s):

Zvone Klun

Petra Blažič

Dr. Mojca Noč Razinger

Tomaž Špeh

Statistical Office of the Republic of Slovenia

Tel: +386 1 241 64 00

e-mail: [email protected]

This document is licensed under a Creative Commons License: Attribution-ShareAlike 4.0 International


TITRE DU DOCUMENT

1. Introduction

1.1. Purpose

1.2. References

2. Setting up initial list of services

2.1. Analysis of work already done by other statistical organisations

2.2. Development of the tool for assessing statistical services’ costs/benefit

2.2.1. Cost Benefit Model

2.2.2. Multi criteria analysis

2.2.2.1. Select (sub) criteria and develop corresponding weights

2.2.2.2. Rating system

2.2.2.3. Rate each initiative using the ratings and weightings identified

2.2.2.4. Collate all information and analyze

2.3. Data collection and analysis of results

2.3.1. Survey methodology

2.3.2. Key findings of the survey

2.4. Setting up initial list of shareable statistical services

Appendix A: The detailed report of the survey performed

A.1 Survey process

A.1.1 Respondents selection

A.1.2 Questionnaire

A.1.3 Response rate and respondents

A.1.4 Evaluation of responses

A.2 The results

A.2.1 Evaluation of differences between IT staff and others

A.2.2 Evaluation of differences by the type of state (members, EFTA, candidates, etc.)

A.2.3 Evaluation of the open questions

A.3 Conclusion

A.4 The questionnaire content


1. INTRODUCTION

The objective of WP4 (Identification of re-usable services and analysis of requirements) was to identify services that are candidates for re-use in the ESS. This document presents the services identified.

Within this activity, the work done in the UNECE CSPA project, by the ESS EA TF and on ESS standardisation, as well as work already done by other statistical organisations, was reviewed in order to set up an initial list of services that are candidates for re-use in the ESS. It was important to identify clearly how each statistical organisation would benefit from reusing such a statistical service. The analysis addressed how "capability candidates" can be identified and offered through service operations in an SOA environment. An initial cost/benefit analysis of re-use was performed for each of these services.

Because of the complexity of SOA, estimating the costs and benefits of SOA-based software development and reuse is more difficult than for traditional software development, and traditional cost/benefit estimation approaches are inadequate for complex service-oriented systems. Therefore, a simple, understandable and easy-to-use tool for assessing the costs and benefits of statistical services was developed.

The tool is meant to help drive the statistical organisation forward rather than act purely as a classification framework. It is based on the ESS EA Reference Framework and the CSPA standard, and it takes into account other industries’ SOA best practices, roadmaps and maturity models where appropriate. It requires the least work to get the best result and allows initial include/exclude decisions to be made early. It presents views and information in simple language and surfaces both benefits and risks to aid decision making. It considers the current environment, ongoing projects and transformation roadmaps, can be applied effectively at any point in a roadmap and in any organisational situation, and supports varying levels of statistical service granularity. The tool was discussed with the other members of the consortium before being distributed within the statistical community in order to gather information for setting up the initial list of shareable statistical services.

1.1. PURPOSE

The purpose of this document is to describe in detail the work done to identify and select candidate services for re-use in the ESS. Its primary intention is to describe the process of identifying the candidate statistical services to be assessed, the development of the cost/benefit assessment methodology, the survey that was performed and the analysis of its results.

1.2. REFERENCES

Generic Statistical Business Process Model (GSBPM)

The GSBPM provides a reference framework for classifying and understanding the statistical production activities of an NSI. It covers the entire production cycle of official statistics, from the gathering of user needs, through design and build, collection, production and dissemination, to evaluation. The GSBPM is a common reference framework for all NSIs and is widely used within them. It is the key instrument used to identify and define services.

Generic Statistical Information Model (GSIM)

The GSIM provides a reference framework and conceptual information objects for statistics. One of the

key aspects of the GSIM is that it provides a common language to describe statistical information,

therefore enabling sharing and modernisation. The GSIM is used to describe the conceptual inputs and

outputs of statistical services.


Common Statistical Production Architecture (CSPA)

The CSPA provides a framework, principles and guidelines for developing statistical services. Its aim is to foster international collaboration in developing and sharing interoperable, reusable statistical services. The CSPA is based on the principles of Service Oriented Architecture (SOA) and builds on the GSBPM and the GSIM in order to define the statistical context for the SOA approach.

ESS Enterprise Architecture Reference Framework (ESS EARF)

The ESS EARF provides a series of artefacts that support and guide the implementation of Vision 2020.

The ESS EARF provides a Capability Model and a series of application Building Blocks for the ESS, as well

as related architectural design principles. The ESS uses the EARF for governance of programmes and

projects, ensuring that the deliverables of these are aligned with the EARF artefacts.

ESS Statistical Production Reference Architecture (SPRA)

The SPRA expands the Information System Architecture of the ESS EARF. It provides principles, examples

and guidance on how the different application building blocks interrelate and what services they support.

The SPRA can be used to guide the identification and definition of statistical services; priority should be

given to starting from the business architecture domain and the GSBPM.

2. SETTING UP INITIAL LIST OF SERVICES

The objective of WP4 (Identification of re-usable services and analysis of requirements) was to identify services that are candidates for re-use in the ESS. The subtasks carried out for this deliverable were:

- Analysis of work done in the UNECE CSPA project, ESS EA TF and ESS work on standardisation as

well as work already done by other statistical organisations

- Development of the tool for assessing statistical services’ costs/benefit. The tool was discussed

with the members of the consortium before being distributed within the statistical community

- Data collection and analysis of results

- Setting up the initial list of shareable statistical services

The subtasks are further described in the following sections.

2.1. ANALYSIS OF WORK ALREADY DONE BY OTHER STATISTICAL ORGANISATIONS

SURS (the Statistical Office of the Republic of Slovenia) reviewed the existing service lists from the UNECE CSPA project, the ESS EA TF and ESS work on standardisation, as well as work already done by other statistical organisations, e.g. ONS. Using input from the ESS EARF, ESS SPRA, GSBPM, GSIM, CSPA and SOA, the initial service list was created. Comments from the group members were also taken into account.

2.2. DEVELOPMENT OF THE TOOL FOR ASSESSING STATISTICAL SERVICES’ COSTS/BENEFIT

The AAA framework was selected as the most appropriate tool for assessing the costs and benefits of statistical services. AAA stands for Attractiveness (or return), Achievability (or risk) and Affordability, the criteria on which candidate services for development are selected. The assessment is strategic in nature and focuses on the business needs of the ESS countries. Its outcome is a prioritised list of the most attractive statistical services, which can then be developed (as described in WP1 deliverable D1-1, Chapter 2.6). Statistical services can be made available as generic solutions either by replicating solutions in the national statistical production process or by exposing them as statistical services on the ESS SOA platform(s).

The tool answers the following questions:

- Which statistical services are the right ones to invest in or develop in the next SERV2 project?

- Which are the most important/wanted statistical services?

- Which statistical services must be resourced first?

The model enables a multi-criteria cost/benefit analysis of each candidate statistical service in the initial services list, in order to identify the most important services, which will be developed first. It could also be used when designing and constructing the system, as well as for testing and implementation.

2.2.1. COST BENEFIT MODEL

The assessment is based on the AAA framework where:

- Attractiveness qualifies and quantifies the benefit claims and assesses the contribution to

strategic and operational objectives

- Achievability evaluates the likelihood that the objectives can be achieved within the stated

financial, resource and timescale constraints

- Affordability considers whether the implementation (development, reuse) costs are reasonable relative to the realisable benefits

Any number of metrics, grouped under the three main headings, can be analysed.

Attractiveness covers improvements in quality and efficiency, strategic impact and contribution to strategic objectives, confidence in the benefits forecast, stakeholder commitment to the changes, etc. In the applied method, attractiveness was measured by whether or not the service was selected.

Achievability covers likelihood/confidence of delivery, capacity and competence, adequacy of resource provision, and mitigation of corporate risk. In the applied method the question “How likely the service corresponds to strategic and architecture goals of your organisation?” was used.

Affordability considers implementation costs, operational costs, alternatives certainty, and legal compliance. In the applied method the question “How likely the service reuse/implementation costs will be reasonable?” was used.
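To make the three measures above concrete, the following Python sketch aggregates raw questionnaire answers into per-service scores and ranks the services by attractiveness. It is an illustration under stated assumptions, not the project's own processing: the `Response` structure, the 1-5 answer scale for the two "How likely" questions, averaging only over respondents who answered, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Response:
    """One respondent's answers about one service (hypothetical structure)."""
    service: str
    selected: bool                 # attractiveness: was the service selected?
    achievability: Optional[int]   # strategic/architecture question (assumed 1-5)
    affordability: Optional[int]   # reasonable-costs question (assumed 1-5)


def aaa_scores(responses, n_respondents):
    """Per-service scores: share of respondents selecting it, plus mean answers where given."""
    by_service = {}
    for r in responses:
        by_service.setdefault(r.service, []).append(r)

    scores = {}
    for service, rs in by_service.items():
        ach = [r.achievability for r in rs if r.achievability is not None]
        aff = [r.affordability for r in rs if r.affordability is not None]
        scores[service] = {
            "attractiveness": sum(r.selected for r in rs) / n_respondents,
            "achievability": mean(ach) if ach else None,
            "affordability": mean(aff) if aff else None,
        }
    return scores


def top_n(scores, n):
    """Service names ranked by attractiveness, highest first."""
    return sorted(scores, key=lambda s: scores[s]["attractiveness"], reverse=True)[:n]


# Hypothetical answers from three respondents about two services.
responses = [
    Response("Service A", True, 4, 3),
    Response("Service A", True, 5, 4),
    Response("Service A", False, None, None),
    Response("Service B", False, None, None),
    Response("Service B", True, 2, 2),
    Response("Service B", False, None, None),
]
scores = aaa_scores(responses, n_respondents=3)
print(top_n(scores, 10))  # ['Service A', 'Service B']
```

Calling `top_n(scores, 10)` on real survey data would give a ten-service shortlist of the kind reported in the key findings; whether the project averaged over all respondents or only those who selected a service is not stated here.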

An approach based on multi-criteria analysis (MCA) was chosen for the assessment.

For the purposes of the SERV project, the following metrics were considered.


Picture 1: Assessing a candidate statistical service

2.2.2. MULTI CRITERIA ANALYSIS

Multi-criteria analysis is suitable for evaluating projects with multiple, often conflicting, objectives. The main strength of MCA is that important benefits which cannot readily be quantified in monetary terms are still included in the evaluation.

Multi-criteria analysis makes it possible to prioritise a group of candidate services on the basis of a set of weighted criteria and sub-criteria, using decision conferencing to debate and agree on the following steps:

1. Select criteria and sub-criteria
2. Develop corresponding weights
3. Agree on the rating system
4. Rate each initiative using the ratings and weightings identified
5. Collate all information and analyse

2.2.2.1. SELECT (SUB) CRITERIA AND DEVELOP CORRESPONDING WEIGHTS

Many organisations group their criteria under two or three main headings, such as Attractiveness, Achievability and Affordability. The following picture shows an example of defining the relative weights of three criteria by pairwise comparison.


Picture 2: An example of the definition of the relative weight of three criteria by using pairwise comparison

Criterion 1 is compared with each of the other criteria to decide whether it is more important, less important or comparable. The following scores are proposed:

- 4: much more important

- 2: more important

- 1: comparable

- 1/2: less important

- 1/4: much less important

In the example, criterion 1 (Attractiveness) is more important than both criterion 2 (Achievability) and criterion 3 (Affordability). Adding the scores row by row gives the total score of each row. Finally, the relative weight of each criterion is calculated by dividing its row total by the sum of all row totals.
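The row-total procedure can be sketched in Python as follows. This is a minimal sketch, assuming the full square comparison matrix is filled in, with the diagonal (a criterion compared with itself) scored 1 and each pair of entries reciprocal (2 against 1/2, 4 against 1/4). `relative_weights` is a hypothetical helper, and the example matrix is illustrative, not the values shown in Picture 2. Exact fractions are used so that weights such as 1/4 are not rounded.

```python
from fractions import Fraction

# Scores proposed in the text for "row criterion compared with column criterion".
MUCH_MORE = Fraction(4)
MORE = Fraction(2)
COMPARABLE = Fraction(1)
LESS = Fraction(1, 2)
MUCH_LESS = Fraction(1, 4)


def relative_weights(matrix):
    """Relative weight of each criterion: its row total divided by the total of all rows."""
    size = len(matrix)
    for i in range(size):
        for j in range(size):
            # If i is "more important" (2) than j, j must be "less important" (1/2) than i.
            # For i == j this requires the diagonal to be 1.
            if matrix[i][j] * matrix[j][i] != 1:
                raise ValueError(f"scores for criteria {i} and {j} are not reciprocal")
    row_totals = [sum(row) for row in matrix]
    grand_total = sum(row_totals)
    return [row_total / grand_total for row_total in row_totals]


# Illustrative matrix (not the values of Picture 2): Attractiveness is "more important"
# than the other two criteria, which are comparable with each other.
criteria = ["Attractiveness", "Achievability", "Affordability"]
matrix = [
    [COMPARABLE, MORE, MORE],
    [LESS, COMPARABLE, COMPARABLE],
    [LESS, COMPARABLE, COMPARABLE],
]

for name, weight in zip(criteria, relative_weights(matrix)):
    print(f"{name}: {float(weight):.2f}")
# Attractiveness: 0.50
# Achievability: 0.25
# Affordability: 0.25
```

Because each weight is a row total over the grand total, the weights always sum to 1. The invented scores here give 0.5 for attractiveness rather than the 0.6 used later in the text; different pairwise scores lead to different weights, but the procedure is the same.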

The following picture does the same for the four sub-criteria for achievability.

Picture 3: An example of sub-criteria for achievability


2.2.2.2. RATING SYSTEM

The next step, after defining the relative weights of the criteria, is to agree on the rating system, i.e. how the contribution of each sub-criterion is scored. Depending on the criterion, the rating system can be set, for example, as:

- 0: no contribution, 5: some contribution, 10: high contribution

- 0: desirable, 5: highly desirable, 10: mission critical

The following picture shows the rating system for the sub-criteria of attractiveness for the service named Time series.

Picture 4: Rating system for sub-criteria of attractiveness

2.2.2.3. RATE EACH INITIATIVE USING THE RATINGS AND WEIGHTINGS IDENTIFIED

All the ingredients of the model are now in place. The next picture shows the result: the three main headings (attractiveness, achievability and affordability) with their relative weights, all sub-criteria with their weights, the scores of the main headings and the overall score of the candidate service. For example, the sub-criterion Strategic impact & contribution is rated 10; multiplied by its relative weight of 0.4, this gives a weighted score of 4. The sum of all weighted sub-criteria is then multiplied by the relative weight of attractiveness, 0.6. Since every attractiveness sub-criterion is rated 10 and the sub-criterion weights sum to 1, the weighted sub-criteria sum to 10 and the attractiveness score is 6.
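The same arithmetic as a short Python sketch: each sub-criterion's rating on the 0/5/10 scale is multiplied by its relative weight, the products are summed, and the sum is multiplied by the weight of the main heading. Only the weights 0.4 (strategic impact) and 0.6 (attractiveness) come from the text; the other three sub-weights are hypothetical placeholders chosen so that all four sum to 1, and `heading_score` is an illustrative helper name.

```python
# Rating scale from the rating-system step: no / some / high contribution.
RATING = {"no": 0, "some": 5, "high": 10}


def heading_score(heading_weight, sub_criteria):
    """Score of one main heading (hypothetical helper).

    sub_criteria maps a sub-criterion name to (relative weight, rating);
    the relative weights within a heading are expected to sum to 1.
    """
    weights = [weight for weight, _ in sub_criteria.values()]
    if abs(sum(weights) - 1) > 1e-9:
        raise ValueError("sub-criterion weights must sum to 1")
    weighted_sum = sum(weight * rating for weight, rating in sub_criteria.values())
    return heading_weight * weighted_sum


# Time series example: every attractiveness sub-criterion is rated "high" (10).
# 0.4 for strategic impact is from the text; the other sub-weights are hypothetical.
attractiveness = {
    "Improves efficiency": (0.2, RATING["high"]),
    "Strategic impact & contribution": (0.4, RATING["high"]),
    "Confidence in benefits forecast": (0.2, RATING["high"]),
    "Stakeholder commitment": (0.2, RATING["high"]),
}

weight, rating = attractiveness["Strategic impact & contribution"]
print(weight * rating)                     # weighted sub-criterion: 4.0
print(heading_score(0.6, attractiveness))  # attractiveness score: 6.0
```

With all ratings at 10 the result is 6 whatever the individual sub-weights are, as long as they sum to 1; the sub-weights only matter once the ratings differ.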

Attractiveness                      Contribution         Time series
                                    No    Some   High    Score
Improves efficiency                 0     5      10      10
Strategic impact & contribution     0     5      10      10
Confidence in benefits forecast     0     5      10      10
Stakeholder commitment              0     5      10      10


Picture 5: Assessment by the AAA framework

2.2.2.4. COLLATE ALL INFORMATION AND ANALYZE

The model can be used to prioritise all candidate services. The next picture gives an example of a bubble chart of candidate services, which is a clear and concise way to communicate the key findings. All candidate services are shown on the chart, and the upper right corner is the most favourable area. The chart shows that the most important candidate service is Time series, based on its attractiveness (horizontal axis), achievability (vertical axis) and affordability (bubble size).
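Reading the chart amounts to picking out the candidates that fall in the upper right corner. The sketch below assumes all three scores are on the model's 0-10 scale, treats 5 as the boundary of the favourable corner and orders the remaining candidates by affordability (bubble size); the threshold, this ordering rule, the `Candidate` type and the example scores are hypothetical choices for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    attractiveness: float  # horizontal axis
    achievability: float   # vertical axis
    affordability: float   # bubble size


def upper_right(candidates, threshold=5.0):
    """Candidates above the threshold on both axes, largest bubble first (hypothetical rule)."""
    favourable = [
        c for c in candidates
        if c.attractiveness > threshold and c.achievability > threshold
    ]
    return sorted(favourable, key=lambda c: c.affordability, reverse=True)


candidates = [
    Candidate("Service A", 6.0, 7.5, 3.0),
    Candidate("Service B", 8.0, 6.0, 4.5),
    Candidate("Service C", 4.0, 9.0, 5.0),  # left half: not attractive enough
]
print([c.name for c in upper_right(candidates)])  # ['Service B', 'Service A']
```

A fixed midpoint is the simplest boundary; in practice the corner could equally be defined relative to the median scores of all candidates.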

Picture 6: The result of assessment


2.3. DATA COLLECTION AND ANALYSIS OF RESULTS

The AAA evaluation tool was first presented at the project meeting in Lisbon in July 2016. The complete tool was presented again and tested by the consortium at the December 2016 meeting in Ljubljana. The survey of the Member States was carried out from the end of February to March 2017. The analysis of the results is presented below; a detailed description is given in Appendix A.

The survey was addressed to the National Statistical Institutes (hereinafter NSIs). Its main purpose was to identify re-usable services that could be useful for the production process in NSIs.

2.3.1. SURVEY METHODOLOGY

The questionnaire covered the attractiveness, achievability and affordability of a selected list of 32 services, listed in the following table.

Nr. Service

Sub-process according to GSBPM

Description (source GSBPM)

1 Manage requirements

1. Specify needs (1.6 Prepare business case)

Enables description of the "As-Is" business process, with information on how current statistics are produced, highlighting any inefficiencies and issues to be addressed. The proposed "To-Be" solution details how the statistical business process will be developed to produce the new or revised statistics. An assessment of costs and benefits, as well as of any external constraints, is included.

2 Data sources management

2. Design (2.2 Design variable descriptions, 2.3 Design collection, 2.5 Design processing and analysis)

Specifies all relevant metadata, ready for use later in the statistical business process. Includes definitions of statistical variables to be collected via the collection instrument, as well as auxiliary variables in the processes. Preparation of metadata descriptions of collected and derived variables and classifications is a necessary precondition for subsequent phases. Includes any formal agreements relating to data supply, such as memoranda of understanding, and confirmation of the legal basis for data collection.

3 Questionnaire generator

2.3 Design collection, 3.1 Build collection instrument

Includes the design of collection instruments, questions and response templates. It is enabled by tools such as question libraries (to facilitate the reuse of questions and related attributes), questionnaire tools (to enable quick and easy compilation of questions into formats suitable for cognitive testing). It connects the questionnaire to the statistical metadata system, so that metadata can be more easily captured in the collection phase.

4 Coding / using machine learning

2.5 Design processing and analysis, 4.3 Run collection, 5.2 Classify and code

Codes the input data. The routine assigns numerical codes to text responses according to a pre-determined classification scheme, or classifies numbers into grades.

5 Design workflow

2.6 Design production systems and workflows, 3.7 Finalise production systems

Enables the design of the workflow from data collection to dissemination. It gives an overview of all the processes required within the whole statistical production process and enables definitions of who will be responsible for what and when. Special definitions for the specifics of individual processes are also possible.

6 Web questionnaire - visualization

3.1 Build collection instrument

The questionnaire is generated or built based on the design specifications created during the "Design" phase.


7 Sample allocation

4.1 Create frame and select sample

Defines the sample size in each stratum according to different types of allocation (proportional, optimal - Neyman, uniform).

8 Sample selection

4.1 Create frame and select sample

Selects the sample for this iteration of the collection (GSBPM). Selects the sample from the sampling frame based on the chosen type of sampling design, taking the sampling frame and the allocation (or the sample size) as input data files and providing a list of selected units as output.

9 Interviewer workload management

4.3 Run collection

Includes the allocation of providers to interviewers, changes of interviewers and reallocation of providers to interviewers. It records when and how providers were contacted.

10 Manage response burden

4.3 Run collection, 4.4 Finalise collection

Includes the management of the providers involved in the current collection, ensuring that the relationship between the statistical organisation and data providers remains positive. The service tracks how many surveys each provider was selected for. Some process variables, among them the time taken to complete the questionnaire, are tracked by the system. The information can be aggregated to higher levels.

11 Structural data validation

4.3 Run collection, 5.3 Review and validate

Provides basic validation of the structure and integrity of the information received (right format of files and expected fields). Formal control includes checking of length, format and classification.

12 Content data validation

4.3 Run collection, 5.3 Review and validate

Provides validation of the content based on the validation rules. The service displays units and variables that do not meet the conditions. There is also a possibility to define different types of errors (warning, error, etc.).

13 Record linking

5.1 Integrate data

Matching / record linkage routines, with the aim of linking micro or macro data from different sources.

14 SDMX Coding and Transform

5.2 Classify and code

The business outcome of using this service is that an existing dataset not fit for particular needs can easily be re-coded to be fit for that purpose.

15 Administrative data encryption

5.2 Classify and code

Provides the mechanism that converts data so that the unit cannot be recognised.

16 Identification service

5.2 Classify and code

Provides identification of enterprises at the global level.

17 Outlier detection

5.3 Review and validate

Finds outliers based on predefined rules.

18 Imputations

5.4 Edit and impute

If data are considered incorrect, missing or unreliable, new values are inserted according to different methods. The steps include determining whether to add or change data, selecting the method to be used, adding or changing data values, writing the data values back to the data set and flagging them as changed.


19 Error correction

5.4 Edit and impute

If data are considered incorrect, missing or unreliable, new values are inserted according to deterministic rules. The steps include determining whether to add or change data, adding or changing data values, writing the data values back to the data set and flagging them as changed.

20 Weights calculation

5.6 Calculate weights

Weights can be used to "gross-up" results to make them representative of the target population, to adjust for non-response in total enumerations, or to adjust to population values using auxiliary population variables.

21 Aggregation

5.7 Calculate aggregates

Creates aggregate data and population totals from microdata or lower-level aggregates.

22 Standard error estimation

5.7 Calculate aggregates

Estimates standard errors for aggregate data based on the sampling design, estimator and weights.

23 Tabulation

6.1 Prepare draft outputs

Tables in different forms are prepared from the database of aggregate data, taking into account code lists, the names and titles of the tables, statistics and domains.

24 Graphical analysis

6.2 Validate outputs

Graphical presentation of the data, aimed at exploring cross-sectional as well as longitudinal data distributions.

25 Macrodata validation

6.2 Validate outputs

Validation of already aggregated data. Statistical outputs resulting from the aggregation process are validated according to a predefined set of validation rules.

26 Disclosure control

6.4 Apply disclosure control

Ensures that the data (and metadata) to be disseminated do not breach the appropriate rules on confidentiality. This may include checks for primary and secondary disclosure, as well as the application of data suppression or perturbation techniques.

27 Microdata access (confidentiality on the fly)

6.4 Apply disclosure control

In this service the confidentiality routine is applied dynamically when the data items are retrieved, after any selection. As a result, the data returned are of the highest quality, because the effect of the confidentiality treatment is not compounded.

28 Seasonal adjustment / Time series processing

6.5 Finalise outputs

The impact of season and calendar is eliminated from the time series if the impact is characteristic and relevant.

29 Geospatial visualisation

7.2 Produce dissemination products

Offers an interactive cartographic window to visualise a selection of statistical data on thematic maps, with a spatial querying tool to delineate user-defined areas of interest for analysis and display of statistical data. The created views can be shared, exported as pictures or downloaded as raw geospatial data sets. The service offers time series of official statistics presented on administrative units and grids, which makes it a powerful tool for monitoring the past development of a particular phenomenon and suggesting future trends.


30 Statistical chart generator - statistical data visualization

7.2 Produce dissemination products

Offers interactive charts visualising a selection of statistical data, with a querying tool to delineate user-defined areas of interest for analysis and display of statistical data. The created charts can be shared, exported as pictures or downloaded. The service offers time series of official statistics presented on charts, which makes it a powerful tool for monitoring the past development of a particular phenomenon and suggesting future trends.

31 Release management

7.3 Manage release of dissemination products

Includes managing the timing of the release. It also includes the provision of products to customers (ministers, researchers, other) and managing access to confidential data by authorised user groups.

32 Metadata dissemination

7.3 Manage release of dissemination products

Provides dissemination of information on the source, concept, definition, methodology and details on collection, processing, interpretation and dissemination as well as availability of data.

Table 1: The list of the services offered within the questionnaire
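Service 7 (Sample allocation) refers to three standard allocation rules. For a total sample size n, H strata, stratum population sizes N_h (with N = Σ N_h) and stratum standard deviations S_h of the study variable, they are:

- proportional: n_h = n · N_h / N
- uniform: n_h = n / H
- Neyman (optimal): n_h = n · N_h S_h / Σ_k N_k S_k

The Python sketch below is a hypothetical illustration of such a service, not one of the services developed in the project; `allocate` and the example strata are invented for the purpose. It rounds with the largest-remainder method so that the stratum sizes add up to n, and it does not cap n_h at N_h, which a real service would also need to handle when Neyman allocation over-samples a small stratum.

```python
import math


def allocate(n, sizes, std_devs=None, method="proportional"):
    """Split a total sample size n across strata (hypothetical helper).

    sizes    -- population size N_h of each stratum
    std_devs -- standard deviation S_h of the study variable per stratum
                (only needed for Neyman allocation)
    method   -- "proportional", "neyman" or "uniform"
    """
    if method == "proportional":
        shares = list(sizes)                                   # n_h proportional to N_h
    elif method == "neyman":
        if std_devs is None:
            raise ValueError("Neyman allocation needs stratum standard deviations")
        shares = [N * S for N, S in zip(sizes, std_devs)]      # n_h proportional to N_h * S_h
    elif method == "uniform":
        shares = [1] * len(sizes)                              # same size in every stratum
    else:
        raise ValueError(f"unknown allocation method: {method}")

    total = sum(shares)
    exact = [n * share / total for share in shares]

    # Largest-remainder rounding: floor every value, then give the units
    # still missing to the strata with the largest fractional parts.
    alloc = [math.floor(x) for x in exact]
    missing = n - sum(alloc)
    by_remainder = sorted(range(len(exact)), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:missing]:
        alloc[h] += 1
    return alloc


sizes = [500, 300, 200]    # N_h (illustrative)
std_devs = [10, 20, 40]    # S_h (illustrative)
for method in ("proportional", "neyman", "uniform"):
    print(method, allocate(100, sizes, std_devs, method))
# proportional [50, 30, 20]
# neyman [26, 32, 42]
# uniform [34, 33, 33]
```

The Neyman result shows the point of optimal allocation: the smallest stratum receives the largest sample because its variable is the most dispersed.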

At the end of the questionnaire, two open questions were asked: the first on additional services that were not on the list and would be useful for the statistical office, and the second on already existing services that would also be good candidates for sharing. Respondents were strongly advised to discuss their answers with the staff responsible for strategic decisions on the development of the statistical production architecture. The intended respondents were IT directors.

The survey was launched on 22 February 2017. All communication was performed electronically. The reporting units were ITDG members: EU Member States, EFTA members, and candidate and potential candidate countries for EU membership. Two reminders were sent. The deadline of the survey was 15 March 2017, but the questionnaire remained available to respondents until 24 March 2017. Of the 38 countries included in the survey, 27 (71%) responded. The respondents comprised 15 IT directors, 4 IT experts, 3 directors of methodology and 5 others.

2.3.2. KEY FINDINGS OF THE SURVEY

Of the services available for selection, the ten most attractive were: Disclosure control, Record linking, SDMX coding and transform, Microdata access (confidentiality on the fly), Content data validation, Imputations, Error correction, Questionnaire generator, Seasonal adjustment/Time series processing and Web questionnaire - visualization. The chart below shows these services according to the other two criteria (affordability and achievability). The bubble size represents attractiveness, and its value is shown next to the data label.


Picture 7: Zoom view on the top ten selected services

The respondents did not consider the Identification service and the Release management service useful.

As additional services that are not on the list and could contribute to the goals of their organisations, the respondents and further communication with NSIs identified:

- CATI supporting platforms

The respondents also named the existing services in the NSI that could be shared or that are considered to

be a good candidate for sharing:

- PC-AXIS

- Administrative data encryption (GSBPM 5.2)

- Selective data editing (GSBPM 4.3 and 5.4)

- IRIA (design, build, debug, run and manage all kinds of surveys) (GSBPM: fully supported 3.1, 3.2, 3.4, 4.2, 4.3, 4.4; partially supported 2.1, 2.2, 2.3, 3.5, 3.6, 3.7)

- ATINE (generator of data processing applications for structural, aperiodic and small surveys) (GSBPM: 5 except 5.2)

- Symmetric encryption keys management

- Services under development in ESSnet WP3: seasonal adjustment, questionnaire generator, and

metadata dissemination


2.4. SETTING UP INITIAL LIST OF SHAREABLE STATISTICAL SERVICES

The initial list of the most important services within the ESS, ranked according to the results of the questionnaire, is presented in the following table:

Service Sub-process according to GSBPM

Description (source GSBPM) Nr. Partially compliant existing services (sources GitHub and WP3, WP4, WP5 ESSnet SCFE)

Disclosure control

6.4 Apply disclosure control

Ensures that the data (and metadata) to be disseminated do not breach the appropriate rules on confidentiality. This may include checks for primary and secondary disclosure, as well as the application of data suppression or perturbation techniques.

26 R packages sdcMicro, sdcTable, simPop, (also: Tau-Argus and Mu-Argus), sdcTools

Record linking 5.1 Integrate data

Matching / record linkage routines, with the aim of linking micro or macro data from different sources.

13 R packages RecordLinkage, stringdist and fuzzyjoin, CDA, RELAIS, ATINE

SDMX Coding and Transform

5.2. Classify and code

The business outcome of using this service is that an existing dataset not fit for particular needs can easily be re-coded to be fit for that purpose.

14 R package rsdmx, StatMiner, SDMX Converter, SDMX-RI, SDMX-JSON, JSON-Stat, SDMX Framework

Microdata access (confidentiality on the fly)

6.4. Apply disclosure control

In this service, the confidentiality routine is applied dynamically when the data items are retrieved, after any selection has been made. As a result, the returned data are of the highest possible quality, because the effects of the confidentiality treatment are not compounded.

27

Content data validation

4.3 Run collection, 5.3 Review and validate

Provides validation of the content based on validation rules. The service displays the units and variables that do not meet the conditions. Different types of errors (warning, error, etc.) can also be defined.

12 R packages validate, errorlocate, IDEV, eSTATISTIK.core, EUSurvey, IRIA, Selective data editing

Imputations 5.4 Edit and impute

If data are considered incorrect, missing or unreliable, new values are inserted according to different methods. The steps include the determination of whether to add or change data, selecting the method to be used, adding or changing data values, writing of data values back to the data set and flagging them as changed.

18 R packages simputation, deductive, ATINE, Selective data editing

Error correction 5.4 Edit and impute

If data are considered incorrect, missing or unreliable, new values are inserted according to deterministic rules. The rules include the determination of whether to add or change data, adding or changing data values, writing of data values back to the data set and flagging them as changed.

19 R package deductive, ATINE, Selective data editing


Questionnaire generator

2.3 Design collection, 3.1 Build collection instrument

Includes the design of collection instruments, questions and response templates. It is enabled by tools such as question libraries (to facilitate the reuse of questions and related attributes) and questionnaire tools (to enable quick and easy compilation of questions into formats suitable for cognitive testing). It connects the questionnaire to the statistical metadata system, so that metadata can be more easily captured in the collection phase.

3 EUSurvey, IRIA, questionnaire generator

Seasonal adjustment / Time series processing

6.5 Finalise outputs

Seasonal and calendar effects are removed from the time series where these effects are characteristic and relevant.

28 DEMETRA+, JDemetra+, X-13ARIMA-SEATS, GENESIS, Seasonal Adjustment Toolkit, seasonal adjustment

Web questionnaire - visualization

3.1 Build collection instrument

The questionnaire is generated or built based on the design specifications created during the "Design" phase.

6 EUSurvey, IRIA

Coding / using machine learning

2.5 Design processing and analysis, 4.3 Run collection, 5.2 Classify and code

Codes the input data. The routine assigns numerical codes to text responses according to a predetermined classification scheme, or classifies numbers into grades.

4

Structural data validation

4.3 Run collection, 5.3 Review and validate

Provides basic validation of the structure and integrity of the information received (correct file format and expected fields). Formal checks cover length, format and classification.

11 IDEV, eSTATISTIK.core, EUSurvey, IRIA

Macrodata validation

6.2. Validate outputs

Validates data that have already been aggregated. Statistical outputs resulting from the aggregation process are validated against a predefined set of validation rules.

25 R package validate

Outlier detection 5.3 Review and validate

Finds outliers in line with predefined rules.

17 R packages extremevalues, SeleMix, ATINE

Weights calculation

5.6 Calculate weights

Weights can be used to "gross-up" results to make them representative of the target population, to adjust for non-response in total enumerations, or to adjust results to known population values of auxiliary variables.

20 R packages calibrateSSB, survey, ATINE


Geospatial visualisation

7.2 Produce dissemination products

Offers an interactive cartographic window to visualise a selection of statistical data on thematic maps, with a spatial querying tool to delineate user-defined areas of interest for analysis and display of statistical data. The created views can be shared, exported as pictures or downloaded as raw geospatial data sets. The service offers time series of official statistics presented on administrative units and grids, which makes it a powerful tool for monitoring the past development of a particular phenomenon and suggesting future trends.

29 R packages tmap, cartomap, gvSIG

Standard error estimation

5.7 Calculate aggregates

Estimates standard errors for aggregate data based on the sampling design, estimator and weights.

22 R packages survey, hbsae, rsae, ReGenesees System, ATINE

Design workflow 2.6 Design production systems and workflows, 3.7 Finalise production systems

A service in which the workflow from data collection to dissemination can be designed. It provides an overview of all the processes required within the whole statistical production process and enables definitions of who will be responsible for what and when. Process-specific definitions are also possible.

5 CORA

Administrative data encryption

5.2. Classify and code

Provides a mechanism that converts data so that the unit can no longer be recognised.

15 Administrative data encryption, Symmetric encryption keys management

Tabulation 6.1 Prepare draft outputs

Tables in different forms are prepared from the database of aggregate data, taking into account code lists, table names and titles, statistics and domains.

23

Metadata dissemination

7.3 Manage release of dissemination products

Provides dissemination of information on the source, concept, definition, methodology and details on collection, processing, interpretation and dissemination as well as availability of data.

32 PC-AXIS, metadata dissemination


Statistical chart generator - statistical data visualization

7.2 Produce dissemination products

Offers interactive charts visualising a selection of statistical data, with a querying tool to delineate user-defined areas of interest for analysis and display of statistical data. The created charts can be shared, exported as pictures or downloaded. The service offers time series of official statistics presented on charts, which makes it a powerful tool for monitoring the past development of a particular phenomenon and suggesting future trends.

30 R packages tabplot, treemap, GENESIS, PC-Axis, EUSurvey, gvSIG

Manage response burden

4.3 Run collection, 4.4 Finalise collection

Includes the management of the providers involved in the current collection, ensuring that the relationship between the statistical organisation and data providers remains positive. The service tracks the number of surveys in which each provider was selected. The system also tracks some process variables, among them the time taken to complete the questionnaire. The information can be aggregated to higher levels.

10 eSTATISTIK.core, IRIA

Graphical analysis

6.2. Validate outputs

Graphical presentation of the data, aiming at exploring cross-sectional as well as longitudinal data distribution.

24 R package VIM, EUSurvey

Data sources management

2. Design (2.2 Design variable descriptions, 2.3 Design collection, 2.5 Design processing and analysis)

Specifies all relevant metadata, ready for use later in the statistical business process. Includes definitions of statistical variables to be collected via the collection instrument, as well as auxiliary variables in the processes. Preparation of metadata descriptions of collected and derived variables and classifications is a necessary precondition for subsequent phases. Includes any formal agreements relating to data supply, such as memoranda of understanding, and confirmation of the legal basis for data collection.

2 IRIA

Aggregation 5.7 Calculate aggregates

Creates aggregate data and population totals from microdata or lower-level aggregates.

21 R packages calibrateSSB, survey, EUSurvey, ReGenesees System, ATINE


Manage requirements

1. Specify needs (1.6 Prepare business case)

Enables description of the "As-Is" business process, with information on how current statistics are produced, highlighting any inefficiencies and issues to be addressed. The proposed "To-Be" solution details how the statistical business process will be developed to produce the new or revised statistics. An assessment of costs and benefits, as well as any external constraints, is also included.

1

Sample allocation

4.1 Create frame and select sample

Defines the sample size in each stratum according to different types of allocations (proportional, optimal - Neyman, uniform).

7 MAUSS-R

Interviewer workload management

4.3 Run collection

Includes the allocation of the providers to the interviewers, changes of interviewers and reallocation of the providers to the interviewers. It records when and how providers were contacted.

9 IDEV, eSTATISTIK.core, IRIA

Sample selection 4.1 Create frame and select sample

Selects the sample for this iteration of the collection (GSBPM). The sample is drawn from the sampling frame based on the selected type of sampling design. The output is a list of selected units; the sampling frame and the allocation (or the sample size) are input data files.

8 R package sampling

Identification service

5.2. Classify and code

Provides identification of enterprises at the global level.

16

Release management

7.3 Manage release of dissemination products

Includes managing the timing of the release. It also includes the provision of products to customers (ministers, researchers, other) and managing access to confidential data by authorised user groups.

31

Table 2: Initial list of the statistical services
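Several services in Table 2 rest on simple, well-defined computations. As an illustration, the sample allocation service (Nr. 7) distributes a total sample size n across strata: proportional allocation sets each stratum's sample size n_h in proportion to its population size N_h, Neyman (optimal) allocation sets it in proportion to N_h·S_h, where S_h is the stratum standard deviation of the study variable, and uniform allocation gives every stratum the same size. The Python sketch below is not taken from MAUSS-R or any other listed tool; the function name `allocate`, the assumption that S_h is known (for example from a previous survey round) and the largest-remainder rounding are our own choices.

```python
import math


def allocate(stratum_sizes, stratum_sds, n, method="proportional"):
    """Split a total sample size n across strata (illustrative sketch).

    stratum_sizes -- population size N_h of each stratum
    stratum_sds   -- assumed standard deviation S_h of the study variable
                     per stratum (only used by Neyman allocation)
    method        -- "proportional", "neyman" or "uniform"
    """
    if method == "proportional":
        shares = list(stratum_sizes)                                   # n_h ~ N_h
    elif method == "neyman":
        shares = [N * S for N, S in zip(stratum_sizes, stratum_sds)]   # n_h ~ N_h * S_h
    elif method == "uniform":
        shares = [1] * len(stratum_sizes)                              # equal n_h
    else:
        raise ValueError(f"unknown allocation method: {method}")

    total = sum(shares)
    exact = [n * s / total for s in shares]
    # Round down, then give the remaining units to the strata with the
    # largest fractional parts so that the allocation sums exactly to n.
    alloc = [math.floor(x) for x in exact]
    remaining = n - sum(alloc)
    by_remainder = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in by_remainder[:remaining]:
        alloc[i] += 1
    return alloc


sizes = [500, 300, 200]  # hypothetical stratum population sizes
sds = [10, 20, 40]       # hypothetical stratum standard deviations
print(allocate(sizes, sds, 100, "proportional"))  # [50, 30, 20]
print(allocate(sizes, sds, 100, "neyman"))        # [26, 32, 42]
```

With these made-up inputs, Neyman allocation shifts units towards the most variable stratum. The sketch does not cap n_h at N_h, which a production service would need to do when Neyman allocation assigns more units than a stratum contains.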

Additional candidates identified within the ESS for the next round of prioritisation are:

- CATI supporting platforms

- Quality management of the product

- Production workbench dashboard

Existing services identified as shareable or as good candidates for sharing:

- PC-AXIS


- Administrative data encryption (GSBPM 5.2)

- Selective data editing (GSBPM 4.3 and 5.4)

- IRIA (design, build, debug, run and manage all kinds of surveys) (GSBPM: fully supported 3.1, 3.2,

3.4, 4.2, 4.3, 4.4, partially supported 2.1, 2.2, 2.3, 3.5, 3.6, 3.7)

- ATINE (generator of data processing applications for structural, aperiodic and small surveys)

(GSBPM: 5 except 5.2)

- Symmetric encryption keys management

- Services under development in ESSnet WP3: seasonal adjustment, questionnaire generator, and

metadata dissemination
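Selective data editing, listed above, prioritises records for manual review according to their expected impact on the estimates. A common way to do this, assumed here and not necessarily the approach of the Spanish implementation, is a local score function: the weighted absolute difference between the reported value and an anticipated value (for instance the value from the previous period), divided by an estimate of the total. The field names, the example records and the threshold in this Python sketch are hypothetical.

```python
def selective_editing(records, threshold):
    """Return (id, score) for records whose local score exceeds threshold,
    highest score first (illustrative sketch).

    Each record is a dict with the hypothetical keys:
      "id", "weight" (design weight), "reported" (value received) and
      "expected" (anticipated value, e.g. from the previous period).
    """
    # Reference estimate of the total, built from the anticipated values.
    total = sum(r["weight"] * r["expected"] for r in records)
    flagged = []
    for r in records:
        # Weighted deviation from the anticipated value, relative to the total:
        # roughly the share of the estimate that an error here could affect.
        score = r["weight"] * abs(r["reported"] - r["expected"]) / total
        if score > threshold:
            flagged.append((r["id"], score))
    # Highest-impact records first, so reviewers start with them.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


records = [
    {"id": "A", "weight": 10, "reported": 120, "expected": 100},
    {"id": "B", "weight": 5, "reported": 50, "expected": 52},
    {"id": "C", "weight": 20, "reported": 300, "expected": 100},
]
for record_id, score in selective_editing(records, threshold=0.05):
    print(record_id, round(score, 3))
```

Here records C and A are flagged, in that order, while B's small deviation (a score of about 0.003) stays below the threshold and would be accepted without manual review.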


APPENDIX A: THE DETAILED REPORT OF THE SURVEY PERFORMED

A.1 SURVEY PROCESS

The survey was developed for the VIP.SERV ESSnet project on sharing common functionalities in the ESS, specifically for Work Package 4 - Identification of re-usable services and analysis of requirements.

Data were collected exclusively with an electronic questionnaire on 1ka (https://www.1ka.si/). The questionnaire was tested within the Statistical Office of the Republic of Slovenia and within the project group in December 2016, and then corrected. Data were collected over about one month, from the questionnaire launch on 22 February 2017 to the end of data collection on 24 March 2017. Two reminders were sent during the data collection period, the first on 10 March 2017 and the second on 15 March 2017. The deadline of the survey was 15 March 2017, but the questionnaire remained available online until 24 March 2017. The timeline of the data collection is shown in the picture below. No additional measures were taken to increase the response rate. The results were analysed within a week.

Picture A.1: Time plan for the survey

A.1.1 RESPONDENTS SELECTION

The survey covered 38 countries. The questionnaire was sent to 17 ITDG members or observers. Where we lacked the e-mail address of the ITDG member or observer, we sent the questionnaire to the DIME member (7 respondents). In the remaining cases, we had to find the address on the NSI's home page. For Liechtenstein, we could not find an appropriate e-mail address.

A.1.2 QUESTIONNAIRE

The questionnaire was divided into three sections:

a. Section 1 encompassed the identification of the respondent, i.e. country and position within the NSI (IT, general methodology or other).

b. Section 2 offered the list of services with the corresponding GSBPM sub-processes. The respondents were asked to select at least the five most attractive services for their NSIs. The services were listed in a different order for each respondent to avoid order bias. In the next question, the respondents rated the achievability and affordability of each selected service on three grades: "no", "some" and "very".


c. Section 3 asked respondents to list other services that were not on the list but would be useful for their NSI. Its second question asked them to list services that already exist in their NSI and could be good candidates for sharing.

d. The final question asked for an e-mail address to which a "thank you" message could be sent.

The detailed content of the questionnaire is given in Section A.4.

A.1.3 RESPONSE RATE AND RESPONDENTS

Of the 38 NSIs invited to participate in the survey, 71% responded. The response rate by country type is presented in the table below:

Country type | Number of NSIs surveyed | Response rate

EU member | 28 | 79%

EU candidate | 5 | 60%

EU potential | 2 | 50%

EFTA member* | 3 | 33%

Total | 38 | 71%

* excluding Liechtenstein

Table A.1: Response rate by the type of respondents
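The rates in Table A.1 can be cross-checked against the 27 respondents reported earlier in this document. The number of responding NSIs per country type is not stated in the report; the counts in the sketch below are worked back from the published percentages (for instance, of all possible counts out of 28, only 22 rounds to 79%) and should be read as an assumption. They add up to 27, and 27 of 38 gives the overall 71%.

```python
# NSIs surveyed per country type, from Table A.1 (Liechtenstein excluded).
surveyed = {"EU member": 28, "EU candidate": 5, "EU potential": 2, "EFTA member": 3}
# Responding NSIs per type: NOT stated in the report; worked back from the
# published percentages (e.g. 22 / 28 = 78.6%, shown as 79%).
responded = {"EU member": 22, "EU candidate": 3, "EU potential": 1, "EFTA member": 1}


def rate_percent(n_responded, n_surveyed):
    """Response rate rounded to whole percent."""
    return round(100 * n_responded / n_surveyed)


rates = {t: rate_percent(responded[t], surveyed[t]) for t in surveyed}
overall = rate_percent(sum(responded.values()), sum(surveyed.values()))

print(rates)    # {'EU member': 79, 'EU candidate': 60, 'EU potential': 50, 'EFTA member': 33}
print(overall)  # 71
```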

The target response population was ITDG members. In terms of position within the NSI, responses were collected not only from IT directors and IT experts, but also from directors of methodology and other staff. No general methodologists were among the respondents.

Picture A.2: Number of responses


A.1.4 EVALUATION OF RESPONSES

Responses were collected according to the AAA (Attractiveness, Achievability, Affordability) framework. For attractiveness, a service received 10 points when a respondent selected it and 0 points otherwise. Achievability and affordability were rated on three grades: 0 points for "no", 5 points for "some" and 10 points for "very". The results presented in

this report are not weighted.
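The scoring rule above can be written out as a short Python sketch. It assumes that "normalised according to the number of responses" (Section A.2) means dividing each summed score by the number of respondents; this is consistent with Table A.2, where the smallest non-zero attractiveness value, 0.37, equals one selection (10 points) divided by 27 respondents. The `Answer` record layout and the function name `aaa_scores` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SELECTION_POINTS = 10                            # attractiveness: points for a selection
GRADE_POINTS = {"no": 0, "some": 5, "very": 10}  # achievability / affordability grades


@dataclass
class Answer:
    """One respondent's answer about one service (hypothetical layout)."""
    selected: bool
    achievable: Optional[str] = None  # "no", "some" or "very"; asked only if selected
    affordable: Optional[str] = None


def aaa_scores(answers: List[Answer], n_respondents: int) -> Tuple[float, float, float, float]:
    """Attractive, achievable, affordable and total score for one service,
    each divided by the number of respondents and rounded to two decimals."""
    chosen = [a for a in answers if a.selected]
    raw = (
        SELECTION_POINTS * len(chosen),
        sum(GRADE_POINTS[a.achievable] for a in chosen if a.achievable),
        sum(GRADE_POINTS[a.affordable] for a in chosen if a.affordable),
    )
    normalised = [x / n_respondents for x in raw]
    attractive, achievable, affordable = (round(x, 2) for x in normalised)
    return attractive, achievable, affordable, round(sum(normalised), 2)


# One respondent selects the service and answers "no" twice; 26 do not select it.
answers = [Answer(True, "no", "no")] + [Answer(False) for _ in range(26)]
print(aaa_scores(answers, 27))  # (0.37, 0.0, 0.0, 0.37)
```

The example reproduces the pattern of the Sample selection row in Table A.2 (0.37 for attractiveness and total, nothing for the other two criteria). The total is computed from the unrounded scores, which is why some totals in Table A.2 differ by 0.01 from the sum of the rounded columns.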

A.2 THE RESULTS

Service ratings from all respondents, calculated according to the method described in Section A.1.4, are presented in the following table. The results are normalised according to the number of responses. Services are listed in decreasing order of the Total column:

Service | Nr. in the initial list | Attractive | Achievable | Affordable | Total

Disclosure control | 26 | 4.44 | 3.89 | 2.78 | 11.1

Record linking | 13 | 4.44 | 3.70 | 2.78 | 10.9

SDMX Coding and transform | 14 | 4.07 | 3.89 | 2.22 | 10.2

Microdata access (confidentiality on the fly) | 27 | 4.07 | 3.52 | 2.22 | 9.81

Content data validation | 12 | 4.07 | 3.33 | 2.04 | 9.44

Imputations | 18 | 3.70 | 2.78 | 2.59 | 9.07

Error correction | 19 | 3.70 | 2.78 | 2.41 | 8.89

Questionnaire generator | 3 | 3.33 | 2.78 | 2.04 | 8.15

Seasonal adjustment/Time series processing | 28 | 3.33 | 2.22 | 1.85 | 7.41

Web questionnaire - visualization | 6 | 2.96 | 2.59 | 1.67 | 7.22

Coding/using machine learning | 4 | 2.96 | 2.22 | 1.85 | 7.04

Structural data validation | 11 | 2.59 | 2.41 | 1.67 | 6.67

Macrodata validation | 25 | 2.96 | 2.22 | 1.48 | 6.67

Outlier detection | 17 | 2.22 | 2.04 | 1.48 | 5.74

Weights calculation | 20 | 2.22 | 2.04 | 1.30 | 5.56

Geospatial visualisation | 29 | 2.22 | 1.85 | 1.30 | 5.37

Standard error estimation | 22 | 2.22 | 1.48 | 1.11 | 4.81

Design workflow | 5 | 1.85 | 1.67 | 1.11 | 4.63

Administrative data encryption | 15 | 1.85 | 1.67 | 1.11 | 4.63

Tabulation | 23 | 1.85 | 1.67 | 1.11 | 4.63

Metadata dissemination | 32 | 1.85 | 1.67 | 0.93 | 4.44

Statistical chart generator - statistical data visualization | 30 | 1.48 | 1.48 | 0.93 | 3.89

Manage response burden | 10 | 1.48 | 1.30 | 0.74 | 3.52

Graphical analysis | 24 | 1.11 | 0.93 | 0.93 | 2.96

Data sources management | 2 | 1.11 | 0.93 | 0.74 | 2.78

Aggregation | 21 | 1.11 | 0.93 | 0.56 | 2.59

Manage requirements | 1 | 0.74 | 0.56 | 0.37 | 1.67

Sample allocation | 7 | 0.74 | 0.37 | 0.37 | 1.48

Interviewer workload management | 9 | 0.37 | 0.19 | 0.19 | 0.74

Sample selection | 8 | 0.37 | 0.00 | 0.00 | 0.37

Identification service | 16 | 0.00 | 0.00 | 0.00 | 0.00

Release management | 31 | 0.00 | 0.00 | 0.00 | 0.00

Table A.2: Service ratings from the survey

The two most attractive services are disclosure control and record linking. The two most achievable services are disclosure control and SDMX coding and transform. The two most affordable services are the same as the two most attractive ones. For the remaining services, the order differs between criteria. None of the respondents selected the identification service or release management.


Picture A.3: Service ranking


A.2.1 EVALUATION OF DIFFERENCES BETWEEN IT STAFF AND OTHERS

The responses from IT directors and IT experts were combined and compared with the answers from the other types of respondents. For both groups, we selected the top 10 services. Four services (disclosure control, record linking, microdata access (confidentiality on the fly) and error correction) appear in both lists, but in different positions. This shows that different divisions within an NSI differ on which services are the best choice.

Rank | IT directors and IT staff | Others (director of methodology, statistician, other)

1 | Disclosure control | Record linking

2 | SDMX Coding and transform | Imputations

3 | Content data validation | Error correction

4 | Questionnaire generator | Microdata access (confidentiality on the fly)

5 | Web questionnaire - visualization | Design workflow

6 | Record linking | Disclosure control

7 | Microdata access (confidentiality on the fly) | Seasonal adjustment/Time series processing

8 | Outlier detection | Weights calculation

9 | Error correction | Standard error estimation

10 | Coding/using machine learning | Tabulation

Table A.3: Differences between IT staff and others
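The comparison in Table A.3 amounts to intersecting two ranked lists and reporting each common service's position in both. The Python sketch below does this for the two top-10 lists; the function name `compare_rankings` is our own.

```python
it_staff = [
    "Disclosure control", "SDMX Coding and transform", "Content data validation",
    "Questionnaire generator", "Web questionnaire - visualization", "Record linking",
    "Microdata access (confidentiality on the fly)", "Outlier detection",
    "Error correction", "Coding/using machine learning",
]
others = [
    "Record linking", "Imputations", "Error correction",
    "Microdata access (confidentiality on the fly)", "Design workflow",
    "Disclosure control", "Seasonal adjustment/Time series processing",
    "Weights calculation", "Standard error estimation", "Tabulation",
]


def compare_rankings(list_a, list_b):
    """Map each service present in both ranked lists to (rank in a, rank in b)."""
    rank_b = {service: pos + 1 for pos, service in enumerate(list_b)}
    return {service: (pos + 1, rank_b[service])
            for pos, service in enumerate(list_a) if service in rank_b}


for service, (rank_it, rank_other) in compare_rankings(it_staff, others).items():
    print(f"{service}: {rank_it} vs {rank_other}")
```

It returns four services: disclosure control (1st vs 6th), record linking (6th vs 1st), microdata access (7th vs 4th) and error correction (9th vs 3rd).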

A.2.2 EVALUATION OF DIFFERENCES BY THE TYPE OF STATE (MEMBERS, EFTA, CANDIDATES, ETC.)

We also tested whether differences exist by type of state. The top 10 services for each type are shown in the following table. Because the number of respondents from countries that are not EU members is small, no significant conclusions can be drawn from their results.

Rank | EU members | EU candidate | EU potential | EFTA

1 | Record linking | Disclosure control | Error correction | Questionnaire generator

2 | Disclosure control | Error correction | Imputations | SDMX Coding and transform

3 | Content data validation | Questionnaire generator | Content data validation | Microdata access (confidentiality on the fly)

4 | SDMX Coding and transform | Record linking | Seasonal adjustment/Time series processing | Web questionnaire - visualization

5 | Microdata access (confidentiality on the fly) | SDMX Coding and transform | Standard error estimation | Administrative data encryption

6 | Imputations | Microdata access (confidentiality on the fly) | Record linking |

7 | Macrodata validation | Imputations | Coding/using machine learning |

8 | Seasonal adjustment/Time series processing | Structural data validation | |

9 | Error correction | Coding/using machine learning | |

10 | Structural data validation | Outlier detection | |

Table A.4: Differences by the type of state

A.2.3 EVALUATION OF THE OPEN QUESTIONS

At the end of the questionnaire, two open questions were asked.

a. The first question asked respondents to name additional services that would contribute to the goals of their NSI. Two answers were received:

- From a candidate country from the IT Director: Online questionnaire generator; CATI supporting platforms

- From a potential candidate country from the Director of Methodology: Graphical analysis, Metadata dissemination, Administrative data encryption

b. The second question asked respondents to name existing services in their NSI that could be shared. The answers provided by IT directors and IT experts are collected in the table below:

NSI | Response

Portugal | Administrative data encryption (GSBPM 5.2) - The application CDA (Administrative data coding) creates an encrypted code for an ID that identifies a record in a dataset. It can be used for national IDs, driving licences, etc. This encrypted ID (sha64) should be shared between organisations over time when providing administrative information. This makes record linkage possible while maintaining privacy.

Spain | 1. Selective data editing (GSBPM 4.3 and 5.4): process to efficiently select the records that should be checked. Stand-alone. 2. IRIA (GSBPM: fully supported 3.1, 3.2, 3.4, 4.2, 4.3, 4.4; partially supported 2.1, 2.2, 2.3, 3.5, 3.6, 3.7). As a service. Design, build, debug, run and manage all kinds of surveys. 3. ATINE (GSBPM: 5 except 5.2): generator of data processing applications for structural, aperiodic and small surveys.

France | The 3 services already addressed by the ESSnet in WP3: seasonal adjustment, questionnaire generator, metadata dissemination.

United Kingdom | The UK ONS is currently re-architecting its IT estate around a set of common platforms to deliver the phases of the GSBPM, i.e. collect, manage and disseminate. This is based on a micro-service architecture to share services internally, and externally as required. In the future, once work has been completed to develop and deliver these platforms, it may be possible to share on a wider basis or to consume services as described in this questionnaire.

Sweden | The PC-axis platform could probably become a good shared service.

Switzerland | Symmetric encryption keys management

Table A.5: Evaluation of the open questions
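The Portuguese answer describes pseudonymising identifiers so that administrative sources can be linked without exposing the identifiers themselves, and the Swiss answer concerns the management of the keys such schemes depend on. The sketch below illustrates the idea with keyed hashing (HMAC-SHA-256 from the Python standard library) followed by exact linkage on the resulting pseudonyms. This is an assumption: the answer names the hash only as "sha64", and the sketch is not the CDA implementation. Keyed hashing is one-way, so strictly speaking it pseudonymises rather than encrypts; the identifiers, values and key in the example are made up.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes) -> str:
    """Keyed SHA-256 digest (HMAC) of an identifier, as 64 hex characters."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_file(records, key):
    """Replace the identifier in each (identifier, value) pair by its pseudonym.
    Each data provider would run this on its own file before sending it."""
    return [(pseudonymise(identifier, key), value) for identifier, value in records]


def link(file_a, file_b):
    """Exact record linkage on pseudonyms: returns (pseudonym, value_a, value_b)."""
    index_b = {pseudo: value for pseudo, value in file_b}
    return [(pseudo, value, index_b[pseudo]) for pseudo, value in file_a if pseudo in index_b]


shared_key = b"example-shared-key"  # hypothetical; real keys need secure management
tax = pseudonymise_file([("1234567890123", 52000), ("9876543210987", 31000)], shared_key)
health = pseudonymise_file([("1234567890123", "GP-17"), ("5555555555555", "GP-02")], shared_key)
for pseudo, income, practice in link(tax, health):
    print(pseudo[:12], income, practice)
```

Because both files are pseudonymised with the same key, the shared person is matched even though neither identifier leaves its source. Anyone holding the key can recompute pseudonyms for candidate identifiers, which is why key management, the subject of the Swiss answer, matters as much as the hash function.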

A.3 CONCLUSION

The results of this survey show that there is interest in services for disclosure control, record linking, SDMX coding and transform, microdata access (confidentiality on the fly) and content data validation. There was no interest in the identification service or release management.


Most of the services needed by the respondents were already on the list; only a few additional services were proposed. Some countries listed their own solutions that could be shared.

The differences between the three AAA criteria show that respondents are somewhat reserved about the affordability of the services: in Table A.2, affordability receives the lowest score, or ties for the lowest, for every service.

The results also suggest that there are differences in opinion between different divisions within the same

NSI that would need to be addressed in the future.


A.4 THE QUESTIONNAIRE CONTENT

ESSnet - WP4: Identification of Re-usable Services and Analysis of Requirements

Q1 - Please select your respective country.

AL - Albania

AT - Austria

BA - Bosnia and Herzegovina

BE - Belgium

BG - Bulgaria

CH - Switzerland

CY - Cyprus

CZ - Czech Republic

DE - Germany

DK - Denmark

EE - Estonia

EL - Greece

ES - Spain

ESTAT - Eurostat

FI - Finland

FL - Liechtenstein

FR - France

HR - Croatia

HU - Hungary

IE - Ireland

IS - Iceland

IT - Italy

Kosovo - Kosovo

LT - Lithuania

LU - Luxembourg

LV - Latvia

ME - Montenegro

MK - The former Yugoslav Republic of Macedonia

MT - Malta

NL - The Netherlands

NO - Norway

PL - Poland

PT - Portugal

RO - Romania

RS - Serbia

SE - Sweden

SI - Slovenia


SK - Slovakia

TR - Turkey

UK - United Kingdom

Q2 - What is your current job title?

IT director

IT expert

Director of methodology

General methodologist

Statistician

Other

Q3 - Select at least the five most attractive statistical services that your organization would like to reuse. You can find detailed definitions of the services in the file attached to the invitation letter.

Select at least 5 services.

Manage requirements (GSBPM 1.6)

Data sources management (GSBPM 2.2, 2.3 and 2.5)

Questionnaire generator (GSBPM 2.3 and 3.1)

Coding/using machine learning (GSBPM 2.5, 4.3 and 5.2)

Design workflow (GSBPM 2.6 and 3.7)

Web questionnaire - visualization (GSBPM 3.1)

Sample allocation (GSBPM 4.1)

Sample selection (GSBPM 4.1)

Interviewer workload management (GSBPM 4.3)

Manage response burden (GSBPM 4.3 and 4.4)

Structural data validation (GSBPM 4.3 and 5.3)

Content data validation (GSBPM 4.3 and 5.3)

Record linking (GSBPM 5.1)

SDMX Coding and transform (GSBPM 5.2)

Administrative data encryption (GSBPM 5.2)

Identification service (GSBPM 5.2)

Outlier detection (GSBPM 5.3)

Imputations (GSBPM 5.4)

Error correction (GSBPM 5.4)

Weights calculation (GSBPM 5.6)

Aggregation (GSBPM 5.7)

Standard error estimation (GSBPM 5.7)

Tabulation (GSBPM 6.1)

Graphical analysis (GSBPM 6.2)

Macrodata validation (GSBPM 6.2)

Disclosure control (GSBPM 6.4)

Microdata access (confidentiality on the fly) (GSBPM 6.4)

Seasonal adjustment/Time series processing (GSBPM 6.5)

Geospatial visualisation (GSBPM 7.2)

Statistical chart generator - statistical data visualization (GSBPM 7.2)

Release management (GSBPM 7.3)

Metadata dissemination (GSBPM 7.3)

Q4 - Please assess the preselected services according to achievability and affordability.


ACHIEVABLE: How likely is it that the service corresponds to the strategic and architecture goals of your organisation? (No / Some / Very)

AFFORDABLE: How likely is it that the service reuse/implementation costs will be reasonable? (No / Some / Very)

Manage requirements - Enables description of

"As-Is" business process with information on how

current statistics are produced, highlighting any

inefficiencies and issues to be addressed.

Data sources management - Specifies all relevant

metadata, ready for use later in the statistical

business process.

Questionnaire generator - Includes the design of

collection instruments, questions and response

templates.

Coding/using machine learning - Codes the input

data.

Design workflow - The service where the

workflow from data collection to dissemination

can be designed.

Web questionnaire - visualization - The

questionnaire is generated or built based on the

design specifications created during the "Design"

phase.

Sample allocation - Defines the sample size in

each stratum according to different types of

allocations (proportional, optimal - Neyman,

uniform).

Sample selection - Selects the sample from the

sampling frame based on the selected type of

sampling design.

Interviewer workload management - Includes the

allocation of the providers to the interviewers,

changes of interviewers and reallocation of the

providers to the interviewers.

Manage response burden - Includes the management of the providers involved in the current collection, ensuring that the relationship between the statistical organisation and data providers remains positive.

Structural data validation - Provides basic validation of the structure and integrity of the information received (correct file format and expected fields).

Content data validation - Provides validation of the content based on validation rules.

Record linking - Matching/record linkage routines, with the aim of linking micro or macro data from different sources.

SDMX Coding and Transform - The business outcome of using this service is that an existing dataset that does not fit particular needs can easily be re-coded to fit that purpose.

Administrative data encryption - Provides a mechanism that converts data so that the unit cannot be recognised.

Identification service - Provides identification of the enterprise at the global level.

Outlier detection - Finds outliers according to predefined rules.

Imputations - If data are considered incorrect, missing or unreliable, new values are inserted according to different methods.

Error correction - If data are considered incorrect, missing or unreliable, new values are inserted according to deterministic rules.

Weights calculation - Weights can be used to "gross up" results to make them representative of the target population, to adjust them for non-response in total enumerations, or to adjust them to known population values of auxiliary variables.


Aggregation - Creates aggregate data and population totals from microdata or lower-level aggregates.

Standard error estimation - Estimates standard errors for aggregate data based on the sampling design, estimator and weights.

Tabulation - Prepares tables in various forms from the database of aggregate data.

Graphical analysis - Graphical presentation of the data, aimed at exploring cross-sectional as well as longitudinal data distributions.

Macrodata validation - Validation of already aggregated data.

Disclosure control - Ensures that the data (and metadata) to be disseminated do not breach the applicable confidentiality rules.

Microdata access (confidentiality on the fly) - The confidentiality routine is applied dynamically when data items are retrieved, after any selection.

Seasonal adjustment/Time series processing - Seasonal and calendar effects are removed from the time series when these effects are characteristic and relevant.

Geospatial visualisation - Offers an interactive cartographic window to visualise a selection of statistical data on thematic maps, with a spatial querying tool to delineate user-defined areas of interest for analysis and display of statistical data.

Statistical chart generator - statistical data visualization - Offers interactive charts visualising a selection of statistical data, with a querying tool to delineate user-defined areas of interest for analysis and display of statistical data.


Release management - Includes managing the timing of the release.

Metadata dissemination - Provides dissemination of information on the source, concepts, definitions and methodology, details of collection, processing, interpretation and dissemination, as well as the availability of data.
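The Sample allocation service listed above names three allocation rules. As a hedged illustration only (not part of the questionnaire and not an SCFE implementation), the standard formulas are: proportional n_h = n * N_h / N, Neyman n_h = n * N_h * S_h / sum_k(N_k * S_k), and uniform n_h = n / H, where N_h is the stratum population size, S_h the stratum standard deviation of the study variable and H the number of strata. The function names `allocate` and `_round_to_total`, and the use of largest-remainder rounding to obtain integer sizes that sum exactly to n, are our own assumptions.

```python
from __future__ import annotations

from math import floor
from typing import Optional, Sequence


def _round_to_total(shares: Sequence[float], n: int) -> list[int]:
    """Round real-valued stratum allocations to integers summing to n.

    Assumption: largest-remainder rounding; other rounding rules exist.
    """
    base = [floor(s) for s in shares]
    remainder = n - sum(base)
    # Hand out the leftover units to the strata with the largest fractional
    # parts (sorted() is stable, so ties go to the earlier stratum).
    order = sorted(range(len(shares)), key=lambda h: shares[h] - base[h], reverse=True)
    for h in order[:remainder]:
        base[h] += 1
    return base


def allocate(n: int, sizes: Sequence[int], sds: Optional[Sequence[float]] = None,
             method: str = "proportional") -> list[int]:
    """Allocate a total sample size n across strata (hypothetical helper).

    sizes: population size N_h of each stratum.
    sds:   standard deviation S_h of the study variable per stratum,
           required only for Neyman (optimal) allocation.
    """
    if method == "uniform":
        weights = [1.0] * len(sizes)               # n_h = n / H
    elif method == "proportional":
        weights = [float(N) for N in sizes]        # n_h proportional to N_h
    elif method == "neyman":
        if sds is None:
            raise ValueError("Neyman allocation needs stratum standard deviations")
        weights = [N * S for N, S in zip(sizes, sds)]  # n_h proportional to N_h * S_h
    else:
        raise ValueError(f"unknown allocation method: {method}")
    total = sum(weights)
    shares = [n * w / total for w in weights]
    return _round_to_total(shares, n)


if __name__ == "__main__":
    strata = [500, 300, 200]
    print(allocate(100, strata, method="proportional"))           # [50, 30, 20]
    print(allocate(100, strata, method="uniform"))                # [34, 33, 33]
    print(allocate(100, strata, [10, 20, 40], method="neyman"))   # [26, 32, 42]
```

In this sketch, Neyman allocation shifts sample towards the smallest but most variable stratum (42 units instead of 20 under proportional allocation), which is why it is described as optimal for minimising the variance of the estimated total at a fixed n.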

Q5 - Suggest other services for reuse that could contribute to the goals of your organisation.

Q39 - Is there any service in your organisation that could be shared, or that you consider a good candidate for sharing? Specify the name of the service, classify it according to GSBPM and add a short description of the service.

email - Please enter your email address.