Eurostat Unit B5 – Statistical Information Technologies
SDMX Basics – October 2011
SDMX Basics
Core Elements
Information Model
Data Structure Definition (DSD)
SDMX-ML Messages
Major changes in SDMX v 2.1
THE SDMX COMPONENTS
Technical Specifications: The SDMX Information Model
Guidelines to Harmonise Content: The Content Oriented Guidelines (COG)
Tools: IT Architectures for data exchange, SDMX compliant tools
THE SDMX INFORMATION MODEL
The SDMX Information Model is a meta-model describing the objects involved in the collection, the dissemination and the publication of aggregated statistics and related metadata.
The abstract model is like a structured set of containers.
Everything in SDMX is model-driven: all messages and interfaces are implementations of the information model.
SDMX INFORMATION MODEL – SCOPE
Diagram elements: Data & Metadata Flows, Structure Definition, Category Scheme, Category, Constraint, Provision Agreement, Data Provider, Data & Metadata Set
SDMX Information Model
STATISTICAL DATA & METADATA
Statistical Data (Figures)
– Time series data representation
– Cross-sectional data representation
Statistical Metadata (Identifiers, Descriptors)
– Structural metadata
Statistical Metadata (Methodology, Quality)
– Reference metadata
STATISTICAL DATA & METADATA
Two different ways to represent data

Statistical data - Cube
Dimensions of the cube: Time (2005, 2006, 2007), Country (AT, ES, FR, IT), Tourism activity (A100, B010, B020)

Time series: Number of tourist campsites - France - annual data
time/activity    B010
2005             8174
2006             8138
2007             8052

Cross-section for 2006: Number of tourist campsites - national - 2006
geo/activity     B010
AT               542
ES               1216
FR               8138
IT               2510
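The two representations can be seen as two slices of the same cube. The sketch below is only an illustration: it stores the cells shown on the slide in a dictionary keyed by (time, country, activity) and extracts each slice by fixing two of the three dimensions. The function names are made up for this example.

```python
# Cells of the cube that appear on the slide: (time, country, activity) -> value.
cube = {
    (2005, "FR", "B010"): 8174,
    (2006, "FR", "B010"): 8138,
    (2007, "FR", "B010"): 8052,
    (2006, "AT", "B010"): 542,
    (2006, "ES", "B010"): 1216,
    (2006, "IT", "B010"): 2510,
}


def time_series(cube, country, activity):
    """Fix country and activity, let time vary."""
    return {t: v for (t, c, a), v in sorted(cube.items())
            if c == country and a == activity}


def cross_section(cube, year, activity):
    """Fix time and activity, let country vary."""
    return {c: v for (t, c, a), v in sorted(cube.items())
            if t == year and a == activity}


fr_campsites = time_series(cube, "FR", "B010")
campsites_2006 = cross_section(cube, 2006, "B010")
```

Both slices contain the cell for France in 2006 (8138), which is where the time series and the cross-section intersect.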
STATISTICAL DATA - TIME SERIES REPRESENTATION
STATISTICAL DATA - CROSS-SECTIONAL REPRESENTATION
STRUCTURAL METADATA: Introduction
From a number to statistical data
11353511
STRUCTURAL METADATA
CONCEPTS
Concepts identify and describe data. A concept is used as a:
– Dimension, Attribute or Measure in a DSD, to define a data set's structure
– Attribute in an MSD, to define the structure of a metadata set
STRUCTURAL METADATA
From a statistical table to its descriptor concepts

Number of touristic establishments in Italy, annual data
Time \ Indicator    A100 (Hotels and similar)    B010 (Tourist Campsites)    B020 (Holiday dwellings)
2002A00             33411                        2374                        61479
2003A00             33480                        2530                        58526
2004A00             33518                        2529                        56586
2005A00             33527                        2411                        68385
2006A00             33768                        2510                        68376
2007A00             34058                        2587                        61810
STRUCTURAL METADATA – CONCEPTS AND ROLES
STRUCTURAL METADATA: DATA STRUCTURE DEFINITION
To easily exchange and process data, we first define a standard container based on the structure of the real statistical table: the Data Structure Definition (DSD).
DSD diagram: Concepts (UNIT, TIME_PERIOD, COUNTRY, OBSERVATIONS) are given the roles of Dimensions, Attributes and Measures, and are linked to Code lists.
The DSD can be seen as a "logical container" for a specific set of data that we want to exchange. It includes the concepts that represent the data, gives them roles (Dimension, Measure, Attribute) and links them to code lists.
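The idea of a "logical container" can be sketched as a small data structure. This is only an illustration, not the SDMX object model: the class names, the identifier TOURISM_DEMO and the role given to each concept are assumptions made for the example. The concept names come from the diagram, and the code list names CL_COUNTRY and CL_UNIT are borrowed from the demography example later in the deck.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    concept_id: str
    role: str                        # "Dimension", "Attribute" or "Measure"
    code_list: Optional[str] = None  # None for non-coded concepts


@dataclass
class DataStructureDefinition:
    dsd_id: str
    components: list = field(default_factory=list)

    def by_role(self, role):
        """Return the concept IDs that play the given role, in declaration order."""
        return [c.concept_id for c in self.components if c.role == role]


dsd = DataStructureDefinition(
    dsd_id="TOURISM_DEMO",  # hypothetical identifier
    components=[
        Component("COUNTRY", "Dimension", "CL_COUNTRY"),
        Component("TIME_PERIOD", "Dimension"),
        Component("OBSERVATIONS", "Measure"),
        Component("UNIT", "Attribute", "CL_UNIT"),
    ],
)

dimensions = dsd.by_role("Dimension")
```

Keeping the role and the code list next to each concept mirrors the three things the DSD is said to do: list the concepts, give them roles and link them to code lists.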
ELEMENTS OF A DATA STRUCTURE DEFINITION
SDMX INFORMATION MODEL - DATA SET
SDMX does not introduce any new concept for statisticians. It just provides a framework for what statisticians already know.
Diagram elements: DSD (code lists, table structure), Dataset (observations)
The SDMX dataset is a standard container in which statistical data are represented together with the structural metadata, according to the DSD.
Now you have an easy way to exchange and process data and metadata automatically.
SDMX INFORMATION MODEL - DATA SET
Diagram elements: Data Set, Key, Group Key, Key Values, Time Period, Observation Value, Attribute Value, Attribute attachment, Time series, Cross-section
REFERENCE METADATA
SDMX INFORMATION MODEL - METADATA SET
Reference Metadata Set
Concepts
SDMX INFORMATION MODEL – DATA & METADATA FLOW
Diagram elements: Data & Metadata Flows, Structure Definition, Category Scheme, Category, Constraint, Provision Agreement, Data Provider, Data & Metadata Set
SDMX INFORMATION MODEL – CATEGORIES
SDMX IM – DATA PROVIDERS & PROVISION AGREEMENT
Production and dissemination of Statistical data
Production and dissemination of Reference Metadata
SDMX IM - CONSTRAINTS
Diagram elements: Data & Metadata Flows, Constraint, Provision Agreement
Example: A data provider can restrict their reporting of monthly data to only some months.
Example: A data provider can restrict their reporting of data to subsets of statistical cubes.
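A constraint of the second kind can be pictured as a set of allowed values per concept. The sketch below is a simplified illustration, not the SDMX constraint syntax: the dictionary layout, the concept name ACTIVITY and the function name are assumptions. The figures are the 2006 campsite counts from the cube example.

```python
def satisfies(observation, constraint):
    """True if every constrained concept takes one of its allowed values."""
    return all(observation.get(concept) in allowed
               for concept, allowed in constraint.items())


# Hypothetical constraint: the provider reports campsites (B010)
# for France and Italy only.
constraint = {"COUNTRY": {"FR", "IT"}, "ACTIVITY": {"B010"}}

observations = [
    {"COUNTRY": "FR", "ACTIVITY": "B010", "TIME_PERIOD": "2006", "OBS_VALUE": 8138},
    {"COUNTRY": "ES", "ACTIVITY": "B010", "TIME_PERIOD": "2006", "OBS_VALUE": 1216},
    {"COUNTRY": "IT", "ACTIVITY": "B010", "TIME_PERIOD": "2006", "OBS_VALUE": 2510},
]

# Keep only the part of the cube that the constraint allows.
reported = [o for o in observations if satisfies(o, constraint)]
```

Concepts that the constraint does not mention, such as TIME_PERIOD here, are left unrestricted.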
SDMX IM - SUMMARY
IT ARCHITECTURES FOR DATA EXCHANGE
SDMX REGISTRY
SDMX REGISTRY DEMONSTRATION
SDMX Data Structure Definition (DSD)
COMPLIANCE & IMPLEMENTATION
Generally, the following four steps are needed:
1. Preparation: The statisticians from the organisations involved in the data exchange describe the data and the different dataflows, datasets and provision agreements.
2. Compliance: All the necessary objects are created according to the SDMX Technical Specifications.
3. Implementation: The objects are put into practice. Standard software is installed and configured to use the DSDs. The exchange process is set up and tested.
4. Production: The objects are used in the production process. SDMX implementation is achieved when the data and metadata exchanges within the domain are carried out according to SDMX-compliant specifications.
CREATE ALL THE NECESSARY OBJECTS
Define the DSD
– List of concepts (Concept scheme)
– Roles of concepts (Dimension, Attribute, Measure)
– Code lists
Provide the related Dataflows (e.g. STSRTD_TURN_M, DEMOGRAPHY_RQ)
THE STEPS TO BUILD A DATA STRUCTURE DEFINITION
– Identification of the descriptor concepts for the data
– Choose the type of data representation (Time Series and Cross-sectional)
– Choice of Cross Domain code lists or definition of specific code lists for coded concepts
– Definition of the text format for non coded concepts
– Definition of the concept role (Dimension, Attribute or Measure)
– Define Dimensions for Time Series and Cross-sectional data representation
– Define Attributes with the attachment levels for Time Series and Cross-sectional data representation
– Define Time Series primary measure and/or Cross-sectional measures with their measure concepts
– Create the defined artefacts in an SDMX Data Structure Definition tool (e.g. DSW)
1- IDENTIFICATION OF THE DESCRIPTOR CONCEPTS
2 – DEFINE THE CODE LISTS
3- CHOOSE THE TYPE OF DATA REPRESENTATION: TIME SERIES (TS) / CROSS-SECTIONAL (CS)

Statistical data - Cube
Dimensions of the cube: Time (2005, 2006, 2007), Country (AT, ES, FR, IT), Tourism activity (A100, B010, B020)

Time series slice: 1250, 1216, 1220

Cross-sectional slice for 2006: Number of tourist campsites - national - 2006
geo/activity     B010
AT               542
ES               1216
FR               8138
IT               2510
DATA REPRESENTATION – TIME SERIES
DATA REPRESENTATION – CROSS-SECTIONAL
4- DEFINE ROLES OF CONCEPTS AND LIST OF CONCEPTS
5 – DEFINE GROUPS AND ATTRIBUTE ATTACHMENTS
6 – DEFINE THE VIEW OF THE DATA STRUCTURE
EXAMPLE: STS SAMPLE DATASET
Table 1. Deflated turnover index (on volume of sales) for retail trade for Greece (no adjustment). Reference period: January 2002 to March 2003.
(monthly data - Base year: 2000)

Year    Month        Turnover index    Status         Confidentiality
2002    January      84.5              actual         free
2002    February     85.6              actual         free
2002    March        95.4              actual         free
2002    April        106.2             actual         free
2002    May          98.0              actual         free
2002    June         95.3              actual         free
2002    July         105.4             actual         free
2002    August       107.1             actual         free
2002    September    105.2             actual         free
2002    October      109.4             actual         free
2002    November     104.5             actual         free
2002    December     111.9             actual         free
2003    January      89.1              provisional    free
2003    February     88.3              provisional    free
2003    March        96.1              provisional    free

Source: National Statistical Service of Greece
Data prepared to be transmitted to the European Commission (including EUROSTAT)
Roles marked on the table: Dimensions, Primary Measure, Attributes
EXAMPLE: STS SAMPLE DATASET
Concepts identified on the sample table: STS_INDICATOR, TITLE, STS_ACTIVITY, REFERENCE_AREA, FREQ, STS_BASE_YEAR, ADJT
Further concepts identified on the sample table: OBS_STATUS, OBS_VALUE, REFERENCE_PERIOD, OBS_CONF, STS_INSTITUTION
EXAMPLE: STS SAMPLE DATASET
IDENTIFYING CONCEPTS AND GROUPING SERIES IN CSV FILES
M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS5201;1;2000;200203;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200204;93.0;A;F
M;GR;N;TOTV;NS5201;1;2000;200205;60.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200206;78.2;A;F
M;GR;N;TOTV;NS5201;1;2000;200207;89.9;A;F

Reading one record, M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F:
Dimensions (Group): M;GR;N;TOTV;NS5201;1;2000
Reference Period: 200201
Primary Measure: 88.8
Attributes: A;F
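Splitting a record into its concepts is mechanical once the column order is known. The sketch below assumes the column order implied by the slides (seven dimensions, then reference period, value, status and confidentiality). The function name and the role labels are illustrative only.

```python
# Column order assumed from the record layout shown on the slides.
COLUMNS = ["FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR", "STS_ACTIVITY",
           "STS_INSTITUTION", "STS_BASE_YEAR", "TIME_PERIOD", "OBS_VALUE",
           "OBS_STATUS", "OBS_CONF"]

# Every column not listed here is treated as a dimension.
ROLES = {"TIME_PERIOD": "time", "OBS_VALUE": "measure",
         "OBS_STATUS": "attribute", "OBS_CONF": "attribute"}


def parse_record(line):
    """Split one semicolon-separated record and sort its values by role."""
    values = line.strip().split(";")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    parsed = {"dimension": {}, "time": {}, "measure": {}, "attribute": {}}
    for concept, value in zip(COLUMNS, values):
        parsed[ROLES.get(concept, "dimension")][concept] = value
    return parsed


record = parse_record("M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F")
```

Records that share the same dimension values belong to the same series, which is what makes grouping possible.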
DSD OF DATAFLOW STSRTD_IND_M
(derived from the list of variables, values, codes, roles and footnotes of the table)

Role | Concept | Concept ID | Example of value | Remark | Code List | Attachment level
Dimension | frequency | FREQ | M | Monthly | CL_FREQ |
Dimension | reference area | REF_AREA | GR | Greece | CL_AREA_EE |
Dimension | adjustment | ADJUSTMENT | N | No | CL_ADJUSTMENT |
Dimension | type of index | STS_INDICATOR | TOVV | Turnover deflated (volume of sales) | CL_STS_INDICATOR |
Dimension | activity | STS_ACTIVITY | NS5201 | Retail trade | CL_STS_ACTIVITY |
Dimension | type of institution | STS_INSTITUTION | 1 | 1=NSI or 2=National Bank | CL_STS_INSTITUTION |
Dimension | base year | STS_BASE_YEAR | 2000 | | CL_STS_BASE_YEAR |
Dimension | reference period | TIME_PERIOD | 200201 | CCYYMM | |
Measure | turnover index | OBS_VALUE | 108.6 | observation | |
Attribute | status | OBS_STATUS | A | actual data | CL_OBS_STATUS | Obs
Attribute | confidentiality | OBS_CONF | F | Free of publication | CL_OBS_CONF | Obs
Attribute | time duration set | TIME_FORMAT | P1M | ISO8601 | CL_TIME_FORMAT | Series
Attribute | Title | TITLE | | | | Group
Attribute | decimals | DECIMALS | 1 | One | CL_DECIMALS | Group
STRUCTURE OF THE DATASET FOR TIME SERIES
Attributes can be attached to groups.

Group of series
REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="NS5201" STS_INSTITUTION="1" STS_BASE_YEAR="2000"
Attributes and attachment level: group: DECIMAL="1" TITLE="Retail trade"
Series:
M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS5201;1;2000;200203;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200204;93.0;A;F

Group of series
REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="N15220" STS_INSTITUTION="1" STS_BASE_YEAR="2000"
Attributes and attachment level: group: DECIMAL="1" TITLE="Retail sale of food"
Series:
M;GR;N;TOTV;N15220;1;2000;200201;60.8;A;F
M;GR;N;TOTV;N15220;1;2000;200202;78.2;A;F
M;GR;N;TOTV;N15220;1;2000;200203;89.9;A;F
STRUCTURE OF THE DATASET FOR TIME SERIES
Attributes can be attached to series.

Definition of Series 1
FREQ="M" REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="NS0006" STS_INSTITUTION="1" STS_BASE_YEAR="2000"
Attributes and attachment level: series: TIME_FORMAT="P1M"
Series 1:
M;GR;N;TOTV;NS0006;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS0006;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS0006;1;2000;200203;88.8;A;F

Definition of Series 2
FREQ="M" REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="N14500" STS_INSTITUTION="1" STS_BASE_YEAR="2000"
Attributes and attachment level: series: TIME_FORMAT="P1M"
Series 2:
M;GR;N;TOTV;N14500;1;2000;200201;60.8;A;F
M;GR;N;TOTV;N14500;1;2000;200202;78.2;A;F
M;GR;N;TOTV;N14500;1;2000;200203;89.9;A;F
STRUCTURE OF THE DATASET FOR TIME SERIES
Attributes can be attached to observations.

Definition of Series 1
FREQ="M" REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="NS0006" STS_INSTITUTION="1" STS_BASE_YEAR="2000"
Attributes and attachment level: series: TIME_FORMAT="P1M"

Definition of Observation 1
TIME_PERIOD="200201" OBS_VALUE="88.8" OBS_STATUS="A" OBS_CONF="F"
Definition of Observation 2
TIME_PERIOD="200202" OBS_VALUE="84.7" OBS_STATUS="A" OBS_CONF="F"
Definition of Observation 3
TIME_PERIOD="200203" OBS_VALUE="88.8" OBS_STATUS="A" OBS_CONF="F"

CSV:
M;GR;N;TOTV;NS0006;1;2000;200201;88.8;A;F    (Observation 1)
M;GR;N;TOTV;NS0006;1;2000;200202;84.7;A;F    (Observation 2)
M;GR;N;TOTV;NS0006;1;2000;200203;88.8;A;F    (Observation 3)
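The nesting of series and observations can be rebuilt from the flat CSV records. The sketch below is an illustration under stated assumptions: the first seven fields form the series key, TIME_FORMAT="P1M" is attached at series level as on the slide, and the remaining fields belong to the observation. The dictionary layout and the function name are invented for the example.

```python
SERIES_KEY = ["FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR",
              "STS_ACTIVITY", "STS_INSTITUTION", "STS_BASE_YEAR"]


def build_series(lines):
    """Group flat records into series, each holding its own observations."""
    series = {}
    for line in lines:
        fields = line.split(";")
        key = tuple(fields[:len(SERIES_KEY)])
        entry = series.setdefault(key, {
            "key": dict(zip(SERIES_KEY, key)),
            "attributes": {"TIME_FORMAT": "P1M"},  # attached to the series
            "observations": [],
        })
        entry["observations"].append({
            "TIME_PERIOD": fields[7],
            "OBS_VALUE": fields[8],
            "OBS_STATUS": fields[9],  # attached to the observation
            "OBS_CONF": fields[10],   # attached to the observation
        })
    return list(series.values())


records = [
    "M;GR;N;TOTV;NS0006;1;2000;200201;88.8;A;F",
    "M;GR;N;TOTV;NS0006;1;2000;200202;84.7;A;F",
    "M;GR;N;TOTV;NS0006;1;2000;200203;88.8;A;F",
    "M;GR;N;TOTV;N14500;1;2000;200201;60.8;A;F",
]
result = build_series(records)
```

The series key and the series-level attribute are stored once per series instead of being repeated on every record, which is the practical benefit of attaching attributes at the highest level where they are constant.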
EXAMPLE 2: DEMOGRAPHY SAMPLE DATASET
Roles marked on the sample table: Measures, Attributes, Dimensions
EXAMPLE 2: DEMOGRAPHY SAMPLE DATASET
Concepts identified on the sample table: TITLE, TIME_PERIOD, TAB_NUM, REV_NUM, OBS_STATUS, FREQ, COUNTRY
Dimensions attached to the dataset level
Dimensions attached to the group level
EXAMPLE 2: DEMOGRAPHY SAMPLE DATASET
Concepts identified on the sample table: OBS-VALUE, DEMO, SEX, UNIT
Measure Dimension: MALE, FEMALE, TOTAL
Dimensions attached to the observation level
DSD FOR DATAFLOW: DEMOGRAPHY_RQ

Role | Attachment level | Concept | Concept ID | Code List | Values
Dimension | | reference period | TIME_PERIOD | | 2005
Dimension | | reporting country | COUNTRY | CL_COUNTRY | Fi (for Finland)
Dimension | | sex | SEX | CL_SEX | M (male), F (Female)
Dimension | | demographic characteristic | DEMO | CL_DEMO | # of births, etc.
Dimension | | frequency | FREQ | CL_FREQ | A (for annual)
Cross-sectional Measure | | Male | MALE | | number of persons
Cross-sectional Measure | | Female | FEMALE | | number of persons
Cross-sectional Measure | | Total | TOTAL | | number of persons
Attribute | dataset | dataset title | TITLE | | Title of the exchanged dataset
Attribute | dataset | dataset version | REV_NUM | | 1st revision
Attribute | dataset | reference table | TAB_NUM | | RQFI05V1
Attribute | Section (Series) | unit of value | UNIT | CL_UNIT | PERS (for persons)
Attribute | observation | observation status | OBS_STATUS | CL_OBS_STATUS | provisional data
Attribute | series | time duration set | TIME_FORMAT | CL_TIME_FORMAT | P1M
STRUCTURE OF THE DATASET FOR CROSS SECTIONAL
Attributes and attachment level

Dataset
Dimension attached to dataset: COUNTRY="FI"

Group
Dimensions attached to group: REF_PERIOD="2005" FREQ="A"
Attribute attached to group: TIME_FORMAT="P1Y"

Section
Attributes attached to sections: DECI="0" UNIT="PERS" UNIT_MULT="0"

Observations (cross-sectional measure, dimensions attached to observation, attribute attached to observation)
FEMALE OBS_VALUE="35" DEMO="ADJT" OBS_STATUS="P"
MALE OBS_VALUE="29400" DEMO="LBIRTHST" OBS_STATUS="P"
TOTAL OBS_VALUE="8986" DEMO="NETMT" OBS_STATUS="P"
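The cross-sectional levels (dataset, group, section, observation) can be sketched as nested dictionaries. This is an illustrative layout, not the SDMX-ML cross-sectional format: the keys "groups", "sections", "observations" and "measure" are names chosen for the example, while the values are those shown on the slide. Flattening the structure shows how each observation inherits the values attached at the levels above it.

```python
dataset = {
    "COUNTRY": "FI",  # attached to the dataset
    "groups": [{
        "REF_PERIOD": "2005", "FREQ": "A", "TIME_FORMAT": "P1Y",  # group level
        "sections": [{
            "DECI": "0", "UNIT": "PERS", "UNIT_MULT": "0",  # section level
            "observations": [
                {"measure": "FEMALE", "OBS_VALUE": "35",
                 "DEMO": "ADJT", "OBS_STATUS": "P"},
                {"measure": "MALE", "OBS_VALUE": "29400",
                 "DEMO": "LBIRTHST", "OBS_STATUS": "P"},
                {"measure": "TOTAL", "OBS_VALUE": "8986",
                 "DEMO": "NETMT", "OBS_STATUS": "P"},
            ],
        }],
    }],
}


def flatten(dataset):
    """Yield one flat row per observation, inheriting values from upper levels."""
    top = {k: v for k, v in dataset.items() if k != "groups"}
    for group in dataset["groups"]:
        group_values = {k: v for k, v in group.items() if k != "sections"}
        for section in group["sections"]:
            section_values = {k: v for k, v in section.items()
                              if k != "observations"}
            for obs in section["observations"]:
                yield {**top, **group_values, **section_values, **obs}


rows = list(flatten(dataset))
```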
CREATION OF THE DSD
THE SDMX OBJECTS RELATED TO THE DATA STRUCTURE
– Organisation Schemes
– DSDs
– Concept Schemes
– Category Schemes
– DataFlows
– Code lists
CREATION OF THE DSD: DATA STRUCTURE WIZARD
DSW – "standalone" desktop application (replaced KeyFamily AccessDB tool)
Offline version of Eurostat's SDMX Registry
Maintenance of SDMX v2.0 data and metadata structures (create, modify, delete, query)
Import/Export SDMX-ML structures (validate structure messages)
Import/Export GESMES/TS structure files
Reporting of structures
Advanced search features
Export metadata for use with the GENEDI tool
Data Authoring (building SDMX-ML sample datasets)
Interaction with any SDMX v2.0 compliant Registry
Query SDMX v2.0 Registry
Submit data structures to SDMX v2.0 Registry
Diagram: import/export of SDMX-ML messages between the DSW and the SDMX Registry
Example - DSD import / creation using the DSW
LIVE DEMONSTRATION - DSD IMPORT / CREATION USING THE DATA STRUCTURE WIZARD
EXERCISE: CREATION OF THE DSD: FISH_CATCH_A
DATA STRUCTURE DEFINITION
ID: FISH_CATCH_A
Name: Catches for all fishing areas
Version: 1.0
AgencyID: ESTAT
Valid From:
Valid To:
DIMENSIONS
Position in Key | Concept ID | Concept Name | Concept Scheme (ID, version, agency) | Representation: code list (ID, version, agency) | Dimension Type
1 | FREQ | Frequency | CS_FISHERIES 1.0 ESTAT | CL_FREQ 1.1 ESTAT | Frequency
2 | REPORTING_AREA | Country ISO3 codes (extended) | CS_FISHERIES 1.0 ESTAT | CL_REPORTING_AREA 1.0 ESTAT |
3 | PRODUCTION_AREA | Production Area (from major area to sub-unit) | CS_FISHSTAT 1.0 FAO | CL_PRODUCTION_AREA 1.0 FAO |
4 | SPECIES | ASFIS Species Alpha 3 Code | CS_FISHSTAT 1.0 FAO | CL_SPECIES 1.0 FAO |
TIME | TIME_PERIOD | Reference year | CS_FISHERIES 1.0 ESTAT | |
MEASURES
Type | Concept ID | Concept Name | Concept Scheme (ID, version, agency) | Representation | Measure Dimension Code
Primary | OBS_VALUE | Value of the measure | CS_FISHERIES 1.0 ESTAT | N/A | N/A

ATTRIBUTES
Attachment Level | Concept ID | Concept Name | Concept Scheme (ID, version, agency) | Representation: code list (ID, version, agency) | Attribute Type | Assignment Status
Observation | UNIT | unit | CS_FISHERIES 1.0 ESTAT | CL_UNIT 1.1 ESTAT | | C
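The "Position in Key" column fixes the order in which dimension values make up a series key. The sketch below is a minimal illustration of that idea. The dot separator and the code values used in the call are placeholders chosen for the example, not values taken from the exercise or from the real code lists.

```python
# Dimensions of FISH_CATCH_A with their position in the key, as in the exercise.
KEY_POSITIONS = {"FREQ": 1, "REPORTING_AREA": 2, "PRODUCTION_AREA": 3, "SPECIES": 4}


def series_key(values, separator="."):
    """Join dimension values in key order; every dimension must have a value."""
    missing = [d for d in KEY_POSITIONS if d not in values]
    if missing:
        raise KeyError(f"missing dimension values: {missing}")
    ordered = sorted(KEY_POSITIONS, key=KEY_POSITIONS.get)
    return separator.join(values[d] for d in ordered)


# Placeholder codes, supplied in arbitrary order on purpose.
key = series_key({
    "SPECIES": "COD",
    "FREQ": "A",
    "PRODUCTION_AREA": "27",
    "REPORTING_AREA": "FRA",
})
```

TIME_PERIOD is left out of the key because the exercise lists it as the time dimension rather than giving it a numbered position.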
USEFUL LINKS
SDMX Converter
Data Structure Wizard
SDMX Technical Standard v2.0 (http://www.sdmx.org/index.php?page_id=16)
Help-desk: [email protected]
SDMX-ML Messages
SYNTAXES FOR SDMX MESSAGES
Based on a common Information Model
– SDMX-EDI (GESMES/TS)
  • EDIFACT syntax
  • Time-series oriented – One format for Data Sets
– SDMX-ML
  • XML syntax
  • Four different formats for Data Sets
  • Easier validation (XML based)
SDMX DATA COMMON HEADERS
Element | Example
id | TEST0000
test | true
truncated | false
name | FISH_AQ_TEST
prepared | 2010-01-30T09:30:47+01:00
senderid | ESTAT
sendername | Eurostat
sendercontactname | G. Smith
sendercontactdepartment | Statistics
sendercontactrole | Response
sendercontacttelephone | 0210 2222222
sendercontactfax | 0210 00010999
sendercontactx400 |
sendercontacturi | www.sdmx.org
sendercontactemail | [email protected]
receiverid | NSI_GB
receivername | CSO
receivercontactname | P. Mustermann
receivercontactdepartment | Statistics
receivercontactrole | Statistician
receivercontacttelephone | 02101234567
receivercontactfax | 02103810999
receivercontactx400 |
receivercontacturi | www.sdmx.org
receivercontactemail | [email protected]
datasetagency | ESTAT
datasetid | FISH_AQX
datasetaction | Append
extracted | 2010-01-30T09:30:47+01:00
reportingbegin | 2008-01-01T00:00:00
reportingend | 2008-12-31T00:00:00
source | DH
lang | en
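The date-time elements of the header (prepared, extracted, reportingbegin, reportingend) follow the year-month-day pattern of ISO 8601. The sketch below checks such values with the Python standard library before a message is sent. The helper name and the reduced header dictionary are assumptions made for the example. A value with day and month swapped, such as 2010-30-01, is rejected.

```python
from datetime import datetime


def is_iso_datetime(text):
    """True if text parses as an ISO 8601 date-time (year-month-day order)."""
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


# A reduced header holding only the elements checked here.
header = {
    "id": "TEST0000",
    "prepared": "2010-01-30T09:30:47+01:00",
    "reportingbegin": "2008-01-01T00:00:00",
    "reportingend": "2008-12-31T00:00:00",
}

DATE_ELEMENTS = ("prepared", "reportingbegin", "reportingend")
invalid = [name for name in DATE_ELEMENTS if not is_iso_datetime(header[name])]
```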
SDMX DATA MESSAGES
Equivalent representations for reporting Datasets
Version 2.0: 4 data messages, each with a distinct format.
• GenericData
• CompactData
• CrossSectionalData
• UtilityData
Version 2.1: there are now 4 data messages, which are based on two general formats:
• GenericData
  o GenericTimeSeriesData
• StructureSpecificData
  o StructureSpecificTimeSeriesData
Slide label: "Phased out"
EXAMPLE OF GENERIC SDMX-ML MESSAGE
EXAMPLE OF COMPACT SDMX-ML MESSAGE
EXAMPLE OF CROSS-SECTIONAL SDMX-ML MESSAGE
CONVERSIONS SDMX V2.0
Equivalent formats, based on the same IM:
– Generic SDMX-ML
– Cross-sectional SDMX-ML
– Compact SDMX-ML
Can be expanded to other formats (e.g. CSV, GESMES)
Exceptions: if a Cross-Sectional DSD does NOT contain a time dimension
SDMX CONVERTER
Read the input message
Parsing: populate the data model of the tool (based on the SDMX v2.0 information model)
Write the converted message: uses the data model to write the output message in the required target format.
Information retrieved from the Registry:
– The data flow ID is used to retrieve the data flow definition from the Registry.
– The DSD ID, version and agencyID are retrieved from the data flow definition and are used to acquire the DSD.
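The read, parse and write stages can be sketched as a tiny pipeline. This is not the SDMX Converter's code, and the XML it writes is not schema-valid SDMX-ML: the element names DataSet, Series and Obs, the function names and the fixed column order are assumptions chosen for illustration. The Registry lookup of the DSD is left out.

```python
import xml.etree.ElementTree as ET

# Assumed column order of the input records: seven dimensions, then the observation.
DIMENSIONS = ["FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR",
              "STS_ACTIVITY", "STS_INSTITUTION", "STS_BASE_YEAR"]
OBS_FIELDS = ["TIME_PERIOD", "OBS_VALUE", "OBS_STATUS", "OBS_CONF"]


def read_csv(text):
    """Read the input and populate a simple model: series key -> observations."""
    model = {}
    for line in text.strip().splitlines():
        fields = line.split(";")
        key = tuple(fields[:len(DIMENSIONS)])
        model.setdefault(key, []).append(
            dict(zip(OBS_FIELDS, fields[len(DIMENSIONS):])))
    return model


def write_xml(model):
    """Write the model in a target format (illustrative XML, one Series per key)."""
    root = ET.Element("DataSet")
    for key, observations in model.items():
        series = ET.SubElement(root, "Series", dict(zip(DIMENSIONS, key)))
        for obs in observations:
            ET.SubElement(series, "Obs", obs)
    return ET.tostring(root, encoding="unicode")


csv_text = """M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F"""
xml_text = write_xml(read_csv(csv_text))
```

Because the writer only sees the intermediate model, a different target format could be added by writing another function over the same model, which is the point of parsing into a data model first.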
SDMX CONVERTER MAIN FUNCTIONALITY
Possible conversions, between any of the following formats:
– CSV
– Compact SDMX-ML
– Generic SDMX-ML
– Utility SDMX-ML
– Cross-sectional SDMX-ML (marked * on one side of the diagram)
– SDMX-EDI (GESMES/TS)
Main use: Conversion CSV / Compact SDMX-ML
SDMX training session on basic principles, Major Changes in version 2.1
Fabien JACQUET
Select the input file
Select the output file
Select the input and output formats
Select the DSD on the local drive
Identify a DSD to download from the SDMX Registry
Identify a dataflow linked to the DSD to download from the SDMX Registry
Select / manage headers for CSV input formats
Select mapping / transcoding tables
CSV parameters
GESMES representation for GESMES output formats
Load / save the current settings
XML parameters for SDMX output formats
Conversion Example
Major changes in SDMX v 2.1
Overview of the changes
Structural Metadata
– Data Structure Definition (DSD)
– Metadata Structure Definition (MSD)
– Constraint
– Code List
– Organisation Scheme
– Categorising Structures
– Process
– Provision Agreement
– Transformations and Expressions
Data Set
– Message Changes
– Structured Data Mechanism Revised
Metadata Set
– Message Changes
– Alignment of Formats
– Structured Metadata Mechanism Revised
Data Structure Definition (DSD)
Support for non-time-series data structures
Measure Dimension
Version 2.0 DSD diagram: Concepts; Dimensions and Measure dimension; Attributes; Measures; Code lists
Version 2.1 DSD diagram: Concepts; Dimensions; Measure Dimension; Attributes; Primary Measure; Concept Scheme; Code lists
Constraint
Version 2.0 and Version 2.1 compared
Diagram labels: Maintainable artefact, Constraint, Registry, Dataflow, Provision agreement, Code list, DSD
Code List
Version 2.1
Diagram labels: Common Code list, Constraint 1, Constraint 2, Partial, DSD
Categorising Structures
Version 2.0 and Version 2.1 compared
Diagram labels: Category Scheme, Category, Reference, Categorisation, Data/Metadata flow, Code list, Provision agreement, DSD, Only, Maintainable artefact
Data Set
Message Changes
Version 2.0: 4 data messages, each with a distinct format.
• GenericData
• CompactData
• CrossSectionalData
• UtilityData
Version 2.1: there are now 4 data messages, which are based on two general formats:
• GenericData
  o GenericTimeSeriesData
• StructureSpecificData
  o StructureSpecificTimeSeriesData
Slide label: "Phased out"