SHARING COMMON FUNCTIONALITIES ESSNET

ESSnet SCFE DELIVERABLE D4-2

GRAPHAN – Graphical data analyses service requirements

Project acronym:

SCFE

Project title:

“Sharing common functionalities in the ESS”

Name(s), title(s) and organization of the author(s):

Rudi Seljak

Zvone Klun

Simon Pelicon

Tomaž Špeh

Statistical Office of the Republic of Slovenia

Tel: +386 1 241 64 00

e-mail: [email protected]

This document is licensed under a Creative Commons License: Attribution-ShareAlike 4.0 International

https://creativecommons.org/licenses/by-sa/4.0/legalcode

1. Introduction
1.1. Purpose
1.2. References
2. Service description
2.1. Business Function Identification
2.1.1. Service name
2.1.2. Service version
2.1.3. Business Process - GSBPM
2.1.4. Service description
2.1.5. Purpose - Business Goals
Figure 1: Graphical analysis service scope
2.2. Outcomes
2.3. Input, Output metadata
2.3.1. GSIM objects
2.3.1.1. Input
2.3.1.2. Output
2.4. Description of the selected methods
2.4.1. Method 1
2.4.2. Method 2
2.4.3. Method 3
2.5. Pseudo code
3. Specific requirements
3.1. User interfaces
3.2. Use case diagrams
3.3. Activity diagram
3.4. Class diagram
3.5. Design constraints
3.6. State diagram - REST service interface

1. INTRODUCTION

The objective of WP4 - Identification of re-usable services and analysis of requirements was to identify

services that can be candidates for re-use in the ESS and to analyse the functional and technical

requirements of one service for re-use in at least 3 ESS members. This document contains a complete list

of requirements taking into account similarities and differences of requirements among the 3 ESS

members.

The theoretical framework from the first task of WP4 was applied in practice to the selected statistical service GRAPHAN - Graphical data analysis service. The main criteria for selecting the service were its potential for wide use across statistical domains and organisations, and the existence of a firm theoretical framework for its methodological description. The selection of the service was discussed and agreed by the ESSnet partners, TF SERV and Eurostat.

Functional and technical requirements of the selected statistical service were analysed on the basis of the

mutually agreed methodology. The main principle was that the analyses should be based on the practical

experiences of statistical organisations. In the process of setting up the harmonised methodology the

already available methodological documents were studied and the results of already performed projects

and studies were taken into account.

The Netherlands (CBS), Hungary (HCSO) and Croatia (Croatian Bureau of Statistics) participated in the process of analysing the functional and technical requirements. Their proposals and comments were taken into consideration in the final document.

1.1. PURPOSE

The purpose of this document is to give a detailed description of the requirements for the Graphical data analysis service. It sets out the purpose of the service and the complete specification for its development. It also explains system constraints, interfaces and potential interactions with other external services. The document is primarily intended to be proposed to statistical institutions for review and coordination of requirements, and to serve the development team as a reference for developing the first version of the system. It includes the pseudo code for developing the service methods and the REST API specification for developing the REST service.

1.2. REFERENCES

Generic Statistical Business Process Model (GSBPM)

The GSBPM provides a reference framework for classifying and understanding the statistical production

activities of an NSI. It covers the entire production cycle of official statistics, including their evaluation and

gathering of user needs, the design and build aspects, and the collection, production and dissemination of

statistics. The GSBPM is a common reference framework for all NSIs and it is widely used within them. It is

the key instrument used to identify and define services.

Generic Statistical Information Model (GSIM)

The GSIM provides a reference framework and conceptual information objects for statistics. One of the

key aspects of the GSIM is that it provides a common language to describe statistical information,

therefore enabling sharing and modernisation. The GSIM is used to describe the conceptual inputs and

outputs of statistical services.

Common Statistical Production Architecture (CSPA)

The CSPA provides a framework, principles and guidelines to develop statistical services. The aims of the

CSPA are to foster international collaboration to develop and share interoperable, reusable statistical

services. The CSPA is based on the principles of Service Oriented Architecture (SOA) and builds on the

GSBPM and the GSIM in order to define the statistical context for the SOA approach.

ESS Enterprise Architecture Reference Framework (ESS EARF)

The ESS EARF provides a series of artefacts that support and guide the implementation of Vision 2020.

The ESS EARF provides a Capability Model and a series of application building blocks for the ESS, as well

as related architectural design principles. The ESS uses the EARF for governance of programmes and

projects, ensuring that the deliverables of these are aligned with the EARF artefacts.

ESS Statistical Production Reference Architecture (SPRA)

The SPRA expands the Information System Architecture of the ESS EARF. It provides principles, examples

and guidance on how the different application building blocks interrelate and what services they support.

The SPRA can be used to guide the identification and definition of statistical services; priority should be

given to starting from the business architecture domain and the GSBPM.

2. SERVICE DESCRIPTION

2.1. BUSINESS FUNCTION IDENTIFICATION

2.1.1. SERVICE NAME

GRAPHAN - Graphical data analysis service

2.1.2. SERVICE VERSION

Version 1.0 - Initial release

2.1.3. BUSINESS PROCESS - GSBPM

GSBPM 5.0 – Sub-process 5.3 [Review and Validate]

GSBPM 6.0 – Sub-process 6.3 [Validate outputs]

2.1.4. SERVICE DESCRIPTION

The aim of this service is to provide a tool for graphical representation of the data, giving statisticians a detailed insight into the data distribution and leading to the detection of deviations and suspicious values or patterns. This procedure is mainly intended for detecting irregularities on the macro level; it can therefore be classified as part of the so-called macro editing stage of the process. The tool only detects suspicious and potentially erroneous values; other tools should be used for data correction.

On the general level the service should provide two basic types of analyses: cross-sectional and

longitudinal. The cross-sectional analysis explores the data of only one survey instance, mostly exploring

the univariate or multivariate data distribution. On the other hand, the data set that is explored with the

longitudinal analysis consists of the data from several survey instances. This analysis hence focuses on the

longitudinal aspect of the data distribution, aiming at detecting the irregularities in the temporal data

distribution.

2.1.5. PURPOSE - BUSINESS GOALS

The graphical analysis as an activity can be placed in different stages of the statistical process. It can for

instance be used at the very beginning of the data processing cycle, exploring the raw incoming data, or at

the very end of the processing, when the final aggregated data are verified before the tabulation and

dissemination activities. The main goal is to provide visual representation of the data, mostly on the

aggregated level, but it can also be used for visual representation of the microdata for the selected unit

(e.g. to explore its movement through time) or for visual representation of both levels (micro and macro)

together (e.g. to compare temporal movement of the microdata of the selected unit and the temporal

movement of the aggregate).

The basis for the graphical analyses is the incoming set of microdata. The visual representation should be

enabled on the level of the whole data set or on the level of the selected statistical domain.

Although this activity is classified as one of the “macro editing activities”, it does not aim to validate the aggregates directly (e.g. with macro validation rules or, more specifically, VTL validation rules). Another service should be defined and specified for this purpose.

Although this activity can be well used to (visually) detect outlying values in the data distribution, its aim

is not to explicitly list the outlying values by using the calculation procedure. Another tool, where the

different outlier detection methods will be incorporated, should be used for that purpose.

FIGURE 1: GRAPHICAL ANALYSIS SERVICE SCOPE

2.2. OUTCOMES

The outcome of the graphical data analysis service is a business function providing survey statisticians with a flexible and user-friendly tool that gives quick insight into the data distribution and consequently enables quick detection of possible irregularities in the data on the micro and the macro level. The service should be a “methods-based service”, meaning that it should encompass several different methods for graphical visualisation. The tool should be open in the sense that new methods can easily be added.

2.3. INPUT, OUTPUT METADATA

2.3.1. GSIM OBJECTS

2.3.1.1. INPUT

The service will use the following inputs:

● Input microdata dataset to be analysed

● Structural metadata (name and description of the table, variables, etc.)

● Processing metadata (processing rules)

2.3.1.2. OUTPUT

The service should provide the following outputs:

● Machine readable and presentable charts and/or other (required) images

● Output tables with accompanying results (e.g. analysed aggregates, correlation coefficients,

regression coefficients, etc.)

2.4. DESCRIPTION OF THE SELECTED METHODS

2.4.1. METHOD 1

Notation: M1

Title: Scatter plot for selected variables

Type of analysis: Cross-sectional

Description: A scatter plot for two selected variables is plotted. The scatter plot can be plotted for the

variables’ values in the entire input dataset or only for the selected domain determined by the selected

categorical variable and its unique value. The range of values for which the scatter plot is plotted can

additionally be limited by the given logical expression.
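The data preparation behind method M1 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: records are plain dictionaries, the logical expression is passed as a Python function, the sample variable names (turnover, employees, region) are invented, and the covariance uses the sample divisor n - 1, which the requirements do not prescribe. The parameter names follow the pseudo code in section 2.5.

```python
def select_rows(rows, log_cond=None, dom_var=None, dom_cat=None):
    """Apply the optional logical condition, then the optional domain filter."""
    if log_cond is not None:
        rows = [r for r in rows if log_cond(r)]
    if dom_var is not None:
        rows = [r for r in rows if r[dom_var] == dom_cat]
    return rows


def covariance_matrix(rows, var1, var2):
    """2 x 2 covariance matrix of two variables (assumed divisor: n - 1)."""
    n = len(rows)
    if n < 2:
        raise ValueError("at least two units are needed")
    xs = [r[var1] for r in rows]
    ys = [r[var2] for r in rows]
    mx, my = sum(xs) / n, sum(ys) / n

    def cov(a, ma, b, mb):
        return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)

    c12 = cov(xs, mx, ys, my)
    return [[cov(xs, mx, xs, mx), c12], [c12, cov(ys, my, ys, my)]]


# Hypothetical microdata; variable names are illustrative only.
rows = [
    {"id": 1, "region": "A", "turnover": 10.0, "employees": 1},
    {"id": 2, "region": "A", "turnover": 20.0, "employees": 2},
    {"id": 3, "region": "A", "turnover": 30.0, "employees": 3},
    {"id": 4, "region": "B", "turnover": 500.0, "employees": 4},
]
selected = select_rows(rows, log_cond=lambda r: r["turnover"] > 0,
                       dom_var="region", dom_cat="A")
points = [(r["turnover"], r["employees"]) for r in selected]  # scatter plot data
covmat = covariance_matrix(selected, "turnover", "employees")
```

The list `points` is what a plotting library would receive, while `covmat` corresponds to the accompanying output table.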

2.4.2. METHOD 2

Notation: M2

Title: Bar chart of values of statistics in the selected domain categories

Type of analysis: Cross-sectional

Description: For the selected statistic and the selected domain (determined by the categorical variable), a bar chart is created that presents the value of the statistic for each category of the domain variable. The values of the selected statistic are calculated from the input dataset according to the provided process metadata. The range of values for which the bar chart is plotted can additionally be limited by the given logical expression.
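As an illustration of the calculation behind method M2, the following Python sketch computes one value of the statistic per domain category. It is a minimal sketch, not the service implementation: records are assumed to be dictionaries, the function name and the sample variable names are hypothetical, and only TOTAL, AVERAGE and RATIO OF TOTALS are covered (MEDIAN and CHAINED INDEX are left out).

```python
from collections import defaultdict


def aggregate_by_domain(rows, stat_type, dom_var, var1, var2=None, w=None):
    """Return {domain category: value of the statistic}."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[dom_var]].append(r)

    def total(units, var):
        # Without a weight variable every unit counts with weight 1.
        return sum((u[w] if w else 1) * u[var] for u in units)

    def weight_sum(units):
        return sum(u[w] for u in units) if w else len(units)

    out = {}
    for cat, units in groups.items():
        if stat_type == "TOTAL":
            out[cat] = total(units, var1)
        elif stat_type == "AVERAGE":
            out[cat] = total(units, var1) / weight_sum(units)
        elif stat_type == "RATIO OF TOTALS":
            out[cat] = total(units, var1) / total(units, var2)
        else:
            raise ValueError(f"unsupported STAT_TYPE: {stat_type}")
    return out


# Hypothetical microdata; variable names are illustrative only.
rows = [
    {"region": "A", "turnover": 10.0, "costs": 5.0, "weight": 2.0},
    {"region": "A", "turnover": 30.0, "costs": 15.0, "weight": 1.0},
    {"region": "B", "turnover": 8.0, "costs": 2.0, "weight": 4.0},
]
totals = aggregate_by_domain(rows, "TOTAL", "region", "turnover")
weighted_totals = aggregate_by_domain(rows, "TOTAL", "region", "turnover", w="weight")
weighted_avg = aggregate_by_domain(rows, "AVERAGE", "region", "turnover", w="weight")
ratios = aggregate_by_domain(rows, "RATIO OF TOTALS", "region", "turnover", var2="costs")
```

Each resulting dictionary maps a domain category to one bar of the chart.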

2.4.3. METHOD 3

Notation: M3

Title: Line chart of values of statistics with and without selected unit

Type of analysis: Longitudinal

Description: For the selected statistic, selected time period, selected domain (determined by the categorical variable) and selected unit, a line chart is created that presents, for each survey reference period (inside the given upper and lower time limits), the value of the statistic with and without the selected unit. The values of the selected statistic are calculated from the input dataset according to the provided process metadata. The range of values for which the line chart is plotted can additionally be limited by the given logical expression.
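The comparison at the heart of method M3 can be sketched in Python as follows. This is an illustrative sketch only: it covers the unweighted TOTAL, assumes that reference periods are stored as ISO-style strings that sort chronologically, and uses a hypothetical function name and invented sample data.

```python
def series_with_and_without(rows, var, var_ref, date_s, date_e, var_id, id_unit):
    """Per reference period: total of `var` with and without the selected unit."""
    periods = sorted({r[var_ref] for r in rows if date_s <= r[var_ref] <= date_e})
    out = []
    for t in periods:
        # Filter into new lists so the full data set stays available
        # for the next reference period.
        in_period = [r for r in rows if r[var_ref] == t]
        without = [r for r in in_period if r[var_id] != id_unit]
        out.append((t,
                    sum(r[var] for r in in_period),
                    sum(r[var] for r in without)))
    return out


# Hypothetical microdata; variable names are illustrative only.
rows = [
    {"period": "2023-01", "id": 1, "turnover": 10.0},
    {"period": "2023-01", "id": 2, "turnover": 5.0},
    {"period": "2023-02", "id": 1, "turnover": 90.0},
    {"period": "2023-02", "id": 2, "turnover": 6.0},
    {"period": "2023-03", "id": 2, "turnover": 7.0},
]
series = series_with_and_without(rows, "turnover", "period",
                                 "2023-01", "2023-02", "id", 1)
```

Each tuple in `series` supplies one point on each of the two lines; a large gap between the two values in a period (here in 2023-02) points to a strong influence of the selected unit.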

2.5. PSEUDO CODE

OBTAIN the respective process metadata
READ the input data set
IF METHOD=M1 THEN
    OBTAIN the process metadata for method M1
    ● VAR1: Variable 1
    ● VAR2: Variable 2
    ● LOG_COND: Logical condition to reduce the dataset to be analysed (optional)
    ● DOM_VAR: Domain variable (optional)
    ● DOM_CAT: Value of the domain variable (category) to determine domain »cell« where the analyses will be performed (optional)
    IF (LOG_COND Is Not Null) THEN
        DATASET → DATASET (Where LOG_COND)
    END IF
    IF (DOM_VAR Is Not Null) THEN
        DATASET → DATASET (Where DOM_VAR=DOM_CAT)
    END IF
    PLOT scatter plot from DATASET
    CREATE output image of scatter plot
    COMPUTE covariance matrix
    CREATE output table COVMAT
                VAR1      VAR2
        VAR1    Var1      Cov1,2
        VAR2    Cov2,1    Var2
    CREATE output table OUTTABLE (IDENT, VAR1, VAR2)
END IF

IF METHOD=M2 THEN
    OBTAIN the process metadata for method M2
    ● STAT_TYPE: Type of the statistical aggregate to be presented; select from the following list: TOTAL, AVERAGE, MEDIAN, RATIO OF TOTALS, CHAINED INDEX
    ● IF STAT_TYPE IN (“TOTAL”, “AVERAGE”, “MEDIAN”, “CHAINED INDEX”) THEN OBTAIN
        ■ VAR1: Variable 1
    ● ELSE IF STAT_TYPE=“RATIO OF TOTALS” THEN OBTAIN
        ■ VAR1: Variable 1 (numerator)
        ■ VAR2: Variable 2 (denominator)
    ● LOG_COND: Logical condition to reduce the dataset to be analysed (optional)
    ● DOM_VAR: Domain variable
    ● W: Weight (optional)
    IF NOT (LOG_COND Is Null) THEN
        DATASET → DATASET (Where LOG_COND)
    END IF
    CALCULATE values of the statistics in the DATASET for each domain category
    ($x_i$: value of VAR1 for unit $i$; $y_i$: value of VAR2 for unit $i$; $n$: number of units of the category in the (reduced) DATASET; $w_i$: weight of unit $i$)
    IF STAT_TYPE=“TOTAL” and W is Null THEN
        FOR each category from DOM_VAR
            $STAT\_VAL = \sum_{i=1}^{n} x_i$
        END FOR
    END IF
    IF STAT_TYPE=“TOTAL” and W is Not Null THEN
        FOR each category from DOM_VAR
            $STAT\_VAL = \sum_{i=1}^{n} w_i x_i$
        END FOR
    END IF
    IF STAT_TYPE=“AVERAGE” and W is Null THEN
        FOR each category from DOM_VAR
            $STAT\_VAL = \frac{1}{n} \sum_{i=1}^{n} x_i$
        END FOR
    END IF
    IF STAT_TYPE=“AVERAGE” and W is Not Null THEN
        FOR each category from DOM_VAR
            $STAT\_VAL = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
        END FOR
    END IF
    IF STAT_TYPE=“RATIO OF TOTALS” and W is Null THEN
        FOR each category from DOM_VAR
            $STAT\_VAL = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i}$
        END FOR
    END IF
    IF STAT_TYPE=“RATIO OF TOTALS” and W is Not Null THEN
        FOR each category from DOM_VAR
            $STAT\_VAL = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i y_i}$
        END FOR
    END IF
    CREATE output table OUTTABLE (DOM, DOM_CAT, STAT_VAL)
    PLOT bar chart from OUTTABLE
    CREATE output image of bar chart
END IF

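IF METHOD=M3 THEN
    OBTAIN the process metadata for method M3
    ● STAT_TYPE: Type of the statistical aggregate to be presented; select from the following list: TOTAL, AVERAGE, RATIO OF TOTALS, CHAINED INDEX
    ● IF STAT_TYPE IN (“TOTAL”, “AVERAGE”, “CHAINED INDEX”) THEN OBTAIN
        ■ VAR1: Variable 1
    ● ELSE IF STAT_TYPE=“RATIO OF TOTALS” THEN OBTAIN
        ■ VAR1: Variable 1 (numerator)
        ■ VAR2: Variable 2 (denominator)
    ● LOG_COND: Logical condition to reduce the dataset to be analysed (optional)
    ● DOM_VAR: Domain variable
    ● DOM_CAT: Value of the domain variable (category) to determine domain »cell« where the analyses will be performed (optional)
    ● VAR_REF: Date variable, which provides the reference period of the survey
    ● DATE_S: Value of the variable VAR_REF that represents the starting reference period of the time series to be presented
    ● DATE_E: Value of the variable VAR_REF that represents the ending reference period of the time series to be presented
    ● VAR_ID: Variable that represents the unique (in one reference period) identifier of the units
    ● ID_UNIT: Identification of the selected unit
    ● W: Weight (optional)
    IF NOT (LOG_COND Is Null) THEN
        DATASET → DATASET (Where LOG_COND)
    END IF
    IF NOT (DOM_VAR Is Null) THEN
        DATASET → DATASET (Where DOM_VAR=DOM_CAT)
    END IF
    ($x_i$: value of VAR1 for unit $i$; $y_i$: value of VAR2 for unit $i$; $w_i$: weight of unit $i$; $D_t$: units in DATASET_T; $D_t^{-}$: units in DATASET_T_EXCL)
    FOR each value t of the variable VAR_REF (where VAR_REF>=DATE_S and VAR_REF<=DATE_E)
        DATASET_T → DATASET (Where VAR_REF=t)
        DATASET_T_EXCL → DATASET (Where VAR_REF=t AND VAR_ID<>ID_UNIT)
        IF STAT_TYPE=“TOTAL” THEN
            IF W is Null THEN
                $STAT\_WITH = \sum_{i \in D_t} x_i$
                $STAT\_WITHOUT = \sum_{i \in D_t^{-}} x_i$
            END IF
            IF W is Not Null THEN
                $STAT\_WITH = \sum_{i \in D_t} w_i x_i$
                $STAT\_WITHOUT = \sum_{i \in D_t^{-}} w_i x_i$
            END IF
        END IF
        IF STAT_TYPE=“AVERAGE” THEN
            IF W is Null THEN
                $STAT\_WITH = \frac{1}{|D_t|} \sum_{i \in D_t} x_i$
                $STAT\_WITHOUT = \frac{1}{|D_t^{-}|} \sum_{i \in D_t^{-}} x_i$
            END IF
            IF W is Not Null THEN
                $STAT\_WITH = \frac{\sum_{i \in D_t} w_i x_i}{\sum_{i \in D_t} w_i}$
                $STAT\_WITHOUT = \frac{\sum_{i \in D_t^{-}} w_i x_i}{\sum_{i \in D_t^{-}} w_i}$
            END IF
        END IF
        IF STAT_TYPE=“RATIO OF TOTALS” THEN
            IF W is Null THEN
                $STAT\_WITH = \frac{\sum_{i \in D_t} x_i}{\sum_{i \in D_t} y_i}$
                $STAT\_WITHOUT = \frac{\sum_{i \in D_t^{-}} x_i}{\sum_{i \in D_t^{-}} y_i}$
            END IF
            IF W is Not Null THEN
                $STAT\_WITH = \frac{\sum_{i \in D_t} w_i x_i}{\sum_{i \in D_t} w_i y_i}$
                $STAT\_WITHOUT = \frac{\sum_{i \in D_t^{-}} w_i x_i}{\sum_{i \in D_t^{-}} w_i y_i}$
            END IF
        END IF
    END FOR
    CREATE output table OUTTABLE (REF_PERIOD, DOM, DOM_CAT, STAT_WITH, STAT_WITHOUT)
    PLOT two line charts from OUTTABLE
    CREATE output image of line chart
END IF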

3. SPECIFIC REQUIREMENTS

This section contains all of the functional and quality requirements of the system. It gives detailed

description of the system and all its features.

3.1. USER INTERFACES

● Login and select Microdata & Method:

● Parameters for selected method:

● Example of result:

3.2. USE CASE DIAGRAMS

3.3. ACTIVITY DIAGRAM

3.4. CLASS DIAGRAM

3.5. DESIGN CONSTRAINTS

The graphical analysis service is designed as a passive and stateless service following the principles of a

REST design pattern. Key principles governing the service architecture design are as follows.

• Service loose coupling: Architectural design excludes direct interaction between services to minimize

dependencies.

• Service abstraction and autonomy: The service is a standalone component and internal logic is hidden

from the outside world. Services have control over the logic they encapsulate.

• Service statelessness: The service is separated from its state data whenever possible, which makes it scalable. This reduces the resources consumed by the service, as the actual state data management is delegated to an external component or to an architectural extension. By reducing resource consumption, the service can handle more requests in a reliable manner.

• Service granularity: The service is designed to perform its function with an optimal scope and on the

right granular level. The service has no functions in scope beyond the execution of graphical analyses.

• Service reusability: The design and documentation of the service promote shareability and reusability in line with the CSPA principles.

The envisioned high level target architecture is presented in the diagram below.

The main underlying architectural principles for this software are object-orientation and REST (and I18N

principles), for these reasons:

● Object-orientation means thinking in business objects, actors, “nouns”. This is also known as

“domain-driven design”. It mimics real-world thinking and is most typically the main underlying

principle of the programming language in use, thus easy to implement.

● REST is an architecture principle that reduces service interfaces (as in e.g. web client – backend

server communication) to CRUD (create / read / update / delete) operations on business objects,

actors, “nouns”.

● I18N (internationalization / localization) typically means preparing the software to work with different UI languages.
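The combination of REST-style CRUD operations and statelessness can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the chart resource fields, the status values, the `ChartStore` class standing in for the external state component and the `handle` function are hypothetical and are not taken from the REST API specification of the service.

```python
import itertools


class ChartStore:
    """Stand-in for the external component that holds chart state."""

    def __init__(self):
        self.charts = {}
        self._ids = itertools.count(1)

    def new_id(self):
        return next(self._ids)


def handle(store, verb, chart_id=None, body=None):
    """Stateless dispatcher: each call receives everything it needs."""
    if verb == "POST":                       # create
        chart_id = store.new_id()
        store.charts[chart_id] = {**(body or {}), "id": chart_id,
                                  "status": "created"}
        return 201, store.charts[chart_id]
    if chart_id not in store.charts:
        return 404, None
    if verb == "GET":                        # read
        return 200, store.charts[chart_id]
    if verb == "PUT":                        # update
        store.charts[chart_id].update(body or {})
        return 200, store.charts[chart_id]
    if verb == "DELETE":                     # delete
        del store.charts[chart_id]
        return 204, None
    return 405, None


store = ChartStore()
created = handle(store, "POST", body={"method": "M1", "var1": "turnover",
                                      "var2": "employees"})
updated = handle(store, "PUT", 1, {"status": "running"})
deleted = handle(store, "DELETE", 1)
missing = handle(store, "GET", 1)
```

Because `handle` keeps no data between calls, any number of identical service instances could process requests against the same store, which is the scalability argument made for service statelessness.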

To apply these principles, actual software requirements specifications are composed by using commonly

used UML diagrams plus UI mockups. For each aspect of the system (e.g. for each use case) artefacts are

presented below.

3.6. STATE DIAGRAM - REST SERVICE INTERFACE

CRUD chart operation

Create new chart

Start/Stop chart analysis

Show chart or chart table data
