Eurostat business process (data processing) & CVD
October 2007
2
Content
Shown today:
1. Eurostat business process (data processing) & CVD – main presentation shown today
   1. Proposed business model
   2. Its correspondence to CVD architecture
For reference:
1. List of sub-processes and sub-sub-processes
2. CVD modules and their relation to business sub-processes and sub-sub-processes
3. CVD modules brief description
Implementation modes and availability schedule
3
Eurostat business process (data processing)
Legend:
• Process
• Sub-process (or sub-sub-process)
• Sub-process (or sub-sub-process) without software development component
4
Processes
1 Collect
2 Validate
3 Analyse
4 Disseminate
5 Manage meta-information
1. Proposal for discussion
2. Pick-and-choose & mix and match
• No order in execution, although the numbering follows typical order logic.
5
CVD ARCHITECTURE
[Architecture diagram. Elements shown: data files; EDAMIS; CVD MANAGER; MH; NUI; DL; TDS (statistical data and metadata); Internet Portal; Reference Environment data; pre-treated data, validated data and processed data; three groups of BB, DR / DL and domain-specific software; ASSIST (user support).]
6
Notes
• Each BB can be run in batch mode (CVD) and in interactive mode (stand-alone)
• TDS, EDAMIS, NUI, MH, DL are (or will be) compulsory
• BBs are a set of tools to mix and match
• Domain-specific software: procedures that are unique to a few statistical applications, for which the benefit of developing a generalised solution is considered nonexistent
7
1 Collect
  1.1 Set up collection
  1.2 Run collection
  1.3 Load data
  1.4 Cooperate with providers
2 Validate
  2.1 Edit
  2.2 Detect & treat outliers
  2.3 Impute
  2.4 Derive new variables
  2.5 Integrate and load data
3 Analyse
  3.1 Acquire domain intelligence
  3.2 Produce statistics or indicators
  3.3 Check quality
  3.4 Interpret and explain
  3.5 Prepare tables for dissemination
4 Disseminate
  4.1 Produce products
  4.2 Manage customer queries
5 Manage meta-information
8
CVD ARCHITECTURE AND BUSINESS PROCESSES
[Architecture diagram. The CVD architecture elements (data files, EDAMIS, CVD MANAGER, MH, TDS, NUI, DL, Internet Portal, Reference Environment data, BBs, DR / DL, domain-specific software, ASSIST) are overlaid with the process labels COLLECT, VALIDATE, ANALYSE, DISSEMINATE and MANAGE META-INFORMATION.]
9
Summary of CVD modules & business process
+
Module especially designed for the sub-process
Module designed for other sub-process but could be used for this sub-process as well if the functionalities are appropriate
Modules that are used throughout many processes
Other uses may be possible in specific cases
10
[Matrix of CVD modules against sub-processes; the column positions of the + marks are not recoverable here.]
Columns (CVD modules): TDS, CVD MANAGER, EDAMIS, loader BB, reader BB, editing BB, outliers BB, imputation BB, derivation BB, economic indices BB, GSAST, seasonal adj BB, ANALYTICAL, confidentiality BB, NUI, ASSIST, MH
Rows (processes and sub-processes):
1. COLLECT
  1.1. Set up collection
  1.2. Run collection
  1.3. Load data +
2. VALIDATE
  2.1. Edit + + + +
  2.2. Detect and treat outliers + + + +
  2.3. Impute + + + +
  2.4. Derive + + +
  2.5. Integrate and load data + +
3. ANALYSE
  3.2. Produce statistics or indicators + + +
  3.3. Check quality + + + + + + +
  3.4. Interpret and explain + + + +
  3.5. Prepare tables for dissemination + + + +
4. DISSEMINATE
  4.1. Produce products
  4.2. Manage customer queries
5. MANAGE META-INFORMATION
11
END OF MAIN MODULE
12
Navigation
Previous slide
• Most boxes and frames in the presentation contain links
• Names of the CVD modules and BBs are usually link-enabled
• So are the names of the processes
13
START OF SUB-SUB-PROCESS LIST
14
COLLECT
From data arrival to data ready for processing (raw data)
1.1 Set up collection
  1.1.1 Produce collection strategy and schedule
  1.1.2 Allocate collection responsibilities
  1.1.3 Configure collection systems
  1.1.4 Set up collection security
  1.1.5 Run collection test
  1.1.6 Train staff on collection
  1.1.7 Maintain provider information
1.2 Run collection
  1.2.1 Contact provider with pre-collection information
  1.2.2 Request data
  1.2.3 Collect data
  1.2.4 Follow up non-responses
  1.2.5 Monitor & report on collection
1.3 Load data
  1.3.1 Receive electronic data
  1.3.2 Pre-validate data
  1.3.3 Load data & metadata to data environments
1.4 Cooperate with providers
  1.4.1 Manage provider relationship
  1.4.2 Manage provider burden across surveys
15
VALIDATE
From raw, collected data to validated data
2.1 Edit
  2.1.1 Resolve versioning
  2.1.2 Auto edit variables
  2.1.3 Manually edit variables
2.2 Detect and treat outliers
  2.2.1 Detect outliers
  2.2.2 Treat outliers
2.3 Impute
  2.3.1 Revise existing data
  2.3.2 Identify items for special treatment
  2.3.3 Run imputation
  2.3.4 Evaluate imputation results
2.4 Derive new variables
  2.4.1 Derive variables / indicators
2.5 Integrate & load data
  2.5.1 Prepare & load data
  2.5.2 Evaluate quality of incoming data
  2.5.3 Provide feedback to providers
16
ANALYSE
From validated data to analysed data and tables
3.1 Acquire domain intelligence
  3.1.1 Collect external information
  3.1.2 Collect internal data & information
  3.1.3 Research data sources & methodology
  3.1.4 Manage domain knowledge
  3.1.5 Evaluate & synthesise knowledge
  3.1.6 Produce reports
3.2 Produce statistics & indicators
  3.2.1 Produce statistics & indicators
  3.2.2 Produce seasonal adjustment
  3.2.3 Prepare microdata files
  3.2.4 Produce quality measures for statistics
3.3 Check quality
  3.3.1 Check non-sampling errors
  3.3.2 Compare with previous periods
  3.3.3 Confront with other data sources
  3.3.4 Assess quality measures against quality standards
  3.3.5 Verify against expectations & intelligence
3.4 Interpret and explain
  3.4.1 Analyse time series dimension
  3.4.2 Carry out in-depth statistical analysis
  3.4.3 Identify story / commentary to the data
  3.4.4 Approve explanation and statistics
3.5 Prepare tables for dissemination
  3.5.1 Apply confidentiality rules
  3.5.2 Carry out edit and consistency checks
  3.5.3 Finalise tables
  3.5.4 Approve tables
17
DISSEMINATE
From tables and analysis to customised disseminated products
4.1 Produce products
  4.1.1 Set up for production
  4.1.2 Transfer data from internal to external environment
  4.1.3 Media relations
  4.1.4 Other DG / NSI relations
  4.1.5 Lift embargo and release products
4.2 Manage customer queries
  4.2.1 Review and record customer query
  4.2.2 Allocate query
  4.2.3 Analyse and resolve query
  4.2.4 Get customer feedback
18
MANAGE META-INFORMATION
5.1 Determine information and explanation
5.2 Produce information and explanation
5.3 Prepare metadata for repository
5.4 Load repositories
5.5 Appraise the long-term value of metadata
19
END OF SUB-SUB-PROCESS LIST
START OF BB & SUB-PROCESS CROSS REFERENCE
20
1.1 Set up collection
• EDAMIS
1.1.1 Produce collection strategy and schedule
1.1.2 Allocate collection responsibilities
1.1.3 Configure collection systems
1.1.4 Set up collection security
1.1.5 Run collection test
1.1.6 Train staff on collection
1.1.7 Maintain provider information
21
1.2 Run collection
• EDAMIS
1.2.1 Contact provider with pre-collection information
1.2.2 Request data
1.2.3 Collect data
1.2.4 Follow up non-responses
1.2.5 Monitor & report on collection
22
1.3 Load data
• EDAMIS
• Editing BB / EDAMIS
• Loader BB
1.3.1 Receive electronic data
1.3.2 Pre-validate data
1.3.3 Load data & metadata to data environments
23
2.1 Edit
• EDAMIS
• Reader BB
• Editing BB
• GSAST
• Loader BB
2.1.1 Resolve versioning
2.1.2 Auto edit variables
24
2.2 Detect and treat outliers
• Outliers BB • Reader BB • GSAST
• Derivation BB • GSAST • Loader BB
2.2.1 Detect outliers
2.2.2 Treat outliers
25
2.3 Impute
• Imputation BB
• Reader BB
• Derivation BB
• GSAST
• Loader BB
2.3.1 Revise existing data
2.3.2 Identify items for special treatment
2.3.3 Run imputation
2.3.4 Evaluate imputation results
26
2.4 Derive new variables
• Derivation BB
• Reader BB
• GSAST
• Loader BB
2.4.1 Derive variables / indicators
27
2.5 Integrate and load data
• Loader BB
• GSAST
• Editing BB
2.5.1 Prepare & load data
2.5.2 Evaluate quality of incoming data
28
3.2 Produce statistics or indicators
• Reader BB • Derivation BB • GSAST • Economic indices BB
• Seasonal adjustment BB • GSAST
• GSAST • Derivation BB
• GSAST • Derivation BB
• Loader BB
3.2.1 Produce statistics & indicators
3.2.2 Produce seasonal adjustment
3.2.3 Prepare microdata files
3.2.4 Produce quality measures for statistics
29
3.3 Check quality
• Editing BB
• Derivation BB
• ANALYTICAL
• GSAST
• Reader BB
• Outliers BB
• Economic indices BB
• NUI
3.3.1 Check non-sampling errors
3.3.2 Compare with previous periods
3.3.3 Confront with other data sources
3.3.4 Assess quality measures against quality standards
30
3.4 Interpret and explain
• Analytical BB
• GSAST
• Reader BB
• Seasonal adjustment BB
• Economic indices BB
• NUI
3.4.1 Analyse time series dimension
3.4.2 Carry out in-depth statistical analysis
3.4.3 Identify story / commentary to the data
31
3.5 Prepare tables for dissemination
• Confidentiality BB • Reader BB
• Editing BB • GSAST
• Derivation BB • Loader BB
3.5.1 Apply confidentiality rules
3.5.2 Carry out edit and consistency checks
3.5.3 Finalise tables
32
4.1 Produce products
• NUI
4.1.1 Set up for production
4.1.2 Transfer data from internal to external environment
4.1.5 Lift embargo and release products
33
4.2 Manage customer queries
• ASSIST
4.2.1 Review and record customer query
4.2.2 Allocate query
4.2.3 Analyse and resolve query
4.2.4 Get customer feedback
34
5 Manage meta-information
• MH
5.3 Prepare metadata for repository
5.4 Load repositories
35
END OF BB & SUB-PROCESS CROSS REFERENCE
START OF BB DESCRIPTION
36
Target Data Storage (TDS)
• not software
• unique structure of the database
• contains both statistical data and metadata (all kinds)
• uniqueness allows coherence rules to be implemented for the data and metadata throughout the CVD processes
• structure allows new types of metadata (in terms of how they are used) to be added
37
CVD MANAGER
To implement a workflow approach for the production process
• based on a design of the particular production process
To control and schedule the invoking of the CVD components within the various stages of the statistical production process
At each stage of the production process, it will interact with the human domain manager or with a software component to:
• Launch software
• Control output
• Request input
• Provide status reports on the whole process and its individual components
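The workflow idea described for the CVD MANAGER can be illustrated with a minimal sketch. It is not the actual CVD MANAGER design: the `Stage` and `WorkflowManager` names, the status strings and the three toy stage functions are all assumptions made for illustration. The sketch only shows stages being launched in order, output being passed on as the next input, and a status report covering the whole process.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stage:
    name: str                      # e.g. "validate" (hypothetical stage name)
    run: Callable[[Dict], Dict]    # hypothetical component: takes input, returns output


@dataclass
class WorkflowManager:
    stages: List[Stage]
    status: Dict[str, str] = field(default_factory=dict)

    def execute(self, data: Dict) -> Dict:
        """Launch each stage in order, feeding the output of one into the next."""
        for stage in self.stages:
            self.status[stage.name] = "running"
            try:
                data = stage.run(data)
            except Exception as exc:
                # Record the failure and stop; later stages stay "pending".
                self.status[stage.name] = f"failed: {exc}"
                break
            self.status[stage.name] = "done"
        return data

    def report(self) -> List[str]:
        """Status report on the whole process and its individual stages."""
        return [f"{s.name}: {self.status.get(s.name, 'pending')}" for s in self.stages]


# Toy components standing in for real CVD modules (assumptions, not real APIs).
def collect(d: Dict) -> Dict:
    return {**d, "rows": [1, 2, None]}


def validate(d: Dict) -> Dict:
    if None in d["rows"]:
        raise ValueError("missing value")
    return d


def analyse(d: Dict) -> Dict:
    return {**d, "total": sum(d["rows"])}


manager = WorkflowManager([Stage("collect", collect),
                           Stage("validate", validate),
                           Stage("analyse", analyse)])
manager.execute({})
print(manager.report())
```

In a real setting the failed stage is where the manager would request input from the human domain manager before resuming.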
38
EDAMIS
• supports the transmission of statistical data from Member States to Eurostat
• ensures secure and well monitored transmission of data through a single reception point
• delivery of data to production environments
• user access management
• links to structural metadata
• basic validation
• format conversion
39
Loader BB
• loads data and reference metadata in update or replace mode, while ensuring their coherence with existing metadata
• algorithm contains coherence rules
• can be used any time during the processing for both data and reference metadata
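The difference between update and replace mode, combined with a coherence rule, can be sketched as follows. This is a simplified illustration rather than the Loader BB algorithm: the record layout, the `load` function and the single rule (every code must exist in the reference metadata) are assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, float]   # (classification code, period, value), hypothetical layout


def load(store: Dict[Tuple[str, str], float],
         records: Iterable[Record],
         valid_codes: Set[str],
         mode: str = "update") -> List[Record]:
    """Load records into `store`; return the records rejected by the coherence rule."""
    if mode not in ("update", "replace"):
        raise ValueError("mode must be 'update' or 'replace'")
    records = list(records)
    # Coherence rule (assumed): a record is accepted only if its code is known metadata.
    rejected = [r for r in records if r[0] not in valid_codes]
    accepted = [r for r in records if r[0] in valid_codes]
    if mode == "replace":
        store.clear()              # replace: existing content is dropped first
    for code, period, value in accepted:
        store[(code, period)] = value   # update: overwrite matching keys, keep the rest
    return rejected


valid_codes = {"DE", "FR"}
store = {("DE", "2007"): 1.0, ("FR", "2007"): 2.0}

rejected = load(store, [("DE", "2007", 1.5), ("XX", "2007", 9.0)], valid_codes, mode="update")
after_update = dict(store)

load(store, [("FR", "2008", 3.0)], valid_codes, mode="replace")
after_replace = dict(store)
print(rejected, after_update, after_replace)
```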
40
Reader BB
• reads data and metadata (in various formats) and assembles them for further processing by other BBs
• can be used any time during the processing for both data and metadata
41
Editing BB
• executes editing rules optionally with reference data (lookup tables)
• intra-cell, intra-record (horizontal) and inter-record (vertical) rules
• reports on the rules execution
• allows interactive review of messages
• can be provided to Member States (MS) for editing at source
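The three rule levels can be made concrete with a small sketch. The field names (`unit`, `country`, `employees`, `men`, `women`), the lookup table and the rules themselves are hypothetical; the point is only to show one rule of each kind and a report on the execution.

```python
from typing import Dict, List, Set

VALID_COUNTRIES: Set[str] = {"DE", "FR"}   # reference data (lookup table), assumed


def run_edits(records: List[Dict]) -> List[str]:
    """Apply one rule per level and return the messages produced."""
    messages: List[str] = []
    for i, rec in enumerate(records):
        # Rule using reference data: the code must exist in the lookup table.
        if rec["country"] not in VALID_COUNTRIES:
            messages.append(f"record {i}: unknown country {rec['country']}")
        # Intra-cell rule: a single value checked on its own.
        if rec["employees"] < 0:
            messages.append(f"record {i}: employees must be non-negative")
        # Intra-record (horizontal) rule: values within one record must agree.
        if rec["men"] + rec["women"] != rec["employees"]:
            messages.append(f"record {i}: men + women != employees")
    # Inter-record (vertical) rule: unit identifiers must be unique across records.
    seen: Set[str] = set()
    for i, rec in enumerate(records):
        if rec["unit"] in seen:
            messages.append(f"record {i}: duplicate unit {rec['unit']}")
        seen.add(rec["unit"])
    return messages


records = [
    {"unit": "A", "country": "DE", "employees": 10, "men": 6, "women": 4},
    {"unit": "B", "country": "ZZ", "employees": -1, "men": 0, "women": 0},
    {"unit": "A", "country": "FR", "employees": 5, "men": 2, "women": 3},
]
messages = run_edits(records)
for m in messages:
    print(m)
```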
42
Outliers BB
• basic and statistical methods to identify outliers
• methods:
  – Hidiroglou-Berthelot and σ-gap
  – top and bottom: number or percentiles and conditions
• reports on the execution
• in future: multidimensional distance measures
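As an illustration of the first method, here is a minimal sketch of the Hidiroglou-Berthelot approach for period-to-period ratios. It follows the commonly published form of the method, not the Outliers BB code: the function name is hypothetical, the parameter values `u`, `a` and `c` are illustrative defaults, and all values are assumed to be strictly positive.

```python
from statistics import median, quantiles
from typing import List, Sequence


def hb_outliers(prev: Sequence[float], curr: Sequence[float],
                u: float = 0.5, a: float = 0.05, c: float = 4.0) -> List[int]:
    """Return the indices of units flagged as outliers (assumes positive values)."""
    ratios = [x1 / x0 for x0, x1 in zip(prev, curr)]
    r_med = median(ratios)
    # Transform the ratios so that increases and decreases are treated symmetrically.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]
    # Weight by unit size so that large units are examined more strictly.
    e = [si * max(x0, x1) ** u for si, x0, x1 in zip(s, prev, curr)]
    e_med = median(e)
    e_q1, _, e_q3 = quantiles(e, n=4)
    # Interquartile distances, with a floor to avoid near-zero intervals.
    d_q1 = max(e_med - e_q1, abs(a * e_med))
    d_q3 = max(e_q3 - e_med, abs(a * e_med))
    lower, upper = e_med - c * d_q1, e_med + c * d_q3
    return [i for i, ei in enumerate(e) if ei < lower or ei > upper]


prev = [100, 200, 150, 300, 250, 120, 180, 220]
curr = [105, 210, 155, 310, 260, 125, 900, 230]   # unit 6 jumps from 180 to 900
flagged = hb_outliers(prev, curr)
print(flagged)
```

The "top and bottom" option is simpler: sort the values and flag a fixed number or percentile of units at each end, subject to conditions.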
43
Imputation BB
• t.b.d. (note: possibly based on the BANFF software; any system should be very similar to BANFF)
• Implementation of various mathematical imputation methods
• last BB to be developed
• Scope not yet established
44
Derivation BB
• Derives new variables optionally with reference data (lookup tables)
• intra-cell, intra-record (horizontal) and inter-record (vertical) derivations
• reports on the execution
• allows interactive review of messages
• Uses the same engine (subset) as editing BB
45
Economic indices BB
Calculates indices used in economics
– Weighted arithmetic mean
– Weighted geometric mean
– Weighted harmonic mean
– Laspeyres
– Paasche
– Lowe
– Edgeworth
– Bowley
– Fisher
– Laspeyres (Geometric)
– Paasche (Geometric)
– Törnqvist-Theil
– Laspeyres (harmonic)
– Paasche (harmonic)
– Chain index
– EKS(-S)
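Four of the listed formulas are sketched below in their standard textbook form, for a basket with base-period prices and quantities `p0`, `q0` and current-period prices and quantities `p1`, `q1`. The function signatures and the sample data are assumptions for illustration and say nothing about the interface of the Economic indices BB.

```python
from math import exp, log, sqrt
from typing import Sequence

Vec = Sequence[float]


def laspeyres(p0: Vec, p1: Vec, q0: Vec) -> float:
    """Price index with base-period quantities as weights."""
    return sum(a * q for a, q in zip(p1, q0)) / sum(a * q for a, q in zip(p0, q0))


def paasche(p0: Vec, p1: Vec, q1: Vec) -> float:
    """Price index with current-period quantities as weights."""
    return sum(a * q for a, q in zip(p1, q1)) / sum(a * q for a, q in zip(p0, q1))


def fisher(p0: Vec, p1: Vec, q0: Vec, q1: Vec) -> float:
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


def tornqvist(p0: Vec, p1: Vec, q0: Vec, q1: Vec) -> float:
    """Geometric mean of price relatives, weighted by average expenditure shares."""
    v0 = sum(a * q for a, q in zip(p0, q0))
    v1 = sum(a * q for a, q in zip(p1, q1))
    log_index = 0.0
    for a0, a1, b0, b1 in zip(p0, p1, q0, q1):
        share = 0.5 * (a0 * b0 / v0 + a1 * b1 / v1)
        log_index += share * log(a1 / a0)
    return exp(log_index)


p0, p1 = [1.0, 2.0], [2.0, 3.0]
q0, q1 = [10.0, 5.0], [8.0, 6.0]
L = laspeyres(p0, p1, q0)
P = paasche(p0, p1, q1)
F = fisher(p0, p1, q0, q1)
T = tornqvist(p0, p1, q0, q1)
print(L, P, F, T)
```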
46
GSAST
• Generic system for treating micro-data and for operations on micro- and macro-data from surveys
• Based on SAS base, BI server and Enterprise guide functionalities
• Also for unique or unusual processing requirements
47
Seasonal adjustment BB
Calculates seasonally adjusted time series.
• Based on the X-12-ARIMA and TRAMO/SEATS methods
48
ANALYTICAL
• various mathematical and visual analysis and review of the data
• visualisation through graphing of the data
• statistical analysis
(the exact scope not yet determined – possibly through SAS)
49
Confidentiality BB
• performs confidentiality verification of tables
• applies various masking techniques assuring confidentiality of published statistics
• Based on CBS μ-ARGUS and τ-ARGUS
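To give an idea of what confidentiality verification of a table involves, the sketch below applies two widely used primary suppression rules, a minimum frequency rule and a dominance rule, to the cells of a table. It is not the ARGUS implementation and does not cover secondary suppression; the function name and both threshold values are assumptions.

```python
from typing import Dict, List, Optional


def suppress(cells: Dict[str, List[float]],
             min_contributors: int = 3,
             dominance_share: float = 0.85) -> Dict[str, Optional[float]]:
    """Map each cell to its total, or to None when the cell must be suppressed."""
    out: Dict[str, Optional[float]] = {}
    for label, contributions in cells.items():
        total = sum(contributions)
        # Minimum frequency rule: too few contributors could be identified.
        too_few = len(contributions) < min_contributors
        # Dominance rule: one contributor accounts for most of the cell total.
        dominated = total > 0 and max(contributions) / total > dominance_share
        out[label] = None if (too_few or dominated) else total
    return out


cells = {
    "A": [10, 12, 9, 11],   # safe
    "B": [50, 40],          # only two contributors
    "C": [95, 2, 3],        # largest contributor holds 95 % of the total
}
published = suppress(cells)
print(published)
```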
50
NUI
To provide access to the statistical reference databases of Eurostat.
• single tool for all data and metadata
• based on the principles of graphical tools
• highly interactive operation
• metadata is presented to the user
• shows relation of different types of metadata
• can be used inside Eurostat
51
ASSIST
• User support tool
• Parallel to e-mail system (with attachments)
• Service request
• Request follow-up
• Searchable, central public knowledge database
• Decentralised help centres / persons
• Sub-systems by subject matter, geography or any other classification
• Access management (to appropriate parts of the system by administrative privileges or subject matter)
52
MH – metadata handler (1 of 3)
System for handling all the production aspects of classifications, associations and other statistical metadata.
• Updates on the Nomenclatures Codes
• Updates on label values of Nomenclature Codes
• Updates on the Relations between codes
• Updates on label values of Relations
• Export classifications, relations to files
• Create aggregates from relationships
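Creating aggregates from relationships can be pictured as rolling values up a child-to-parent relation between codes. The sketch below is an illustration only: the `aggregate` function, the codes and the relation are hypothetical, and the relation is assumed to contain no cycles.

```python
from collections import defaultdict
from typing import Dict


def aggregate(values: Dict[str, float], parent_of: Dict[str, str]) -> Dict[str, float]:
    """Add each leaf value to its own code and to every ancestor code."""
    totals: Dict[str, float] = defaultdict(float)
    for code, value in values.items():
        totals[code] += value
        parent = parent_of.get(code)
        while parent is not None:          # walk up the relation (assumed acyclic)
            totals[parent] += value
            parent = parent_of.get(parent)
    return dict(totals)


# Hypothetical relation between codes: child -> parent.
parent_of = {"DE": "EU", "FR": "EU", "EU": "WORLD", "US": "WORLD"}
values = {"DE": 10.0, "FR": 5.0, "US": 7.0}
totals = aggregate(values, parent_of)
print(totals)
```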
53
MH – metadata handler (2 of 3)
• Check Relationship Completeness
• Footnotes on labels
• Materialized View classifications & relationships: allows creating a subset of a classification or a relation by defining:
  – Selection rules (wildcard expressions)
  – SQL statements (SQL Generation wizard)
• Dictionary
• Automatic Creation and Update of relationships
  – Creation through other existing relationships
  – Update through Successor/Predecessor
• Multidimensional Nomenclatures
  – Simple or Subkeys as code
54
MH – metadata handler (3 of 3)
Allows management of:
• Dataset trees
• Composite and normal datasets (creation, update, etc.)
• Visibility and accessibility flags on objects (datasets, dictionaries, classifications, etc.)
• Classification's default attribute
• Transposition of datasets (micro-data)
• Implementation list (dictionary)
• Access Control Lists
• Methods
• Confidentiality scripts
• Attachments
• Cell attachments (footnotes)
• Presence table
55
END OF BB DESCRIPTION