Dataset description: DCAT and other vocabularies

Valeria Pesce
Secretariat of the Global Forum on Agricultural Research (GFAR) and Secretariat of the Global Open Data for Agriculture and Nutrition (GODAN) initiative

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories


Data and datasets

DATA

Wikipedia: “Data are values of qualitative or quantitative variables, belonging to a set of items. Data as an abstract concept can be viewed as the lowest level of abstraction, from which information and then knowledge are derived.”

DATASETS

Wikipedia: A dataset (or data set) is a collection of data. (Narrow definition) “…the contents of a single database table, or a single statistical data matrix, where each column of the table represents a particular variable, and each row corresponds to a given member of the dataset in question. […] Nontabular datasets can take the form of marked up strings of characters, such as an XML file.”

W3C Government Linked Data Working Group (DCAT vocabulary), http://www.w3.org/TR/vocab-dcat/#class--dataset: “A collection of data, published or curated by a single source, and available for access or download in one or more formats.”

Single dataset

• As long as you consider a dataset alone, you may not need structured metadata about the dataset for further re-use, as long as the dataset uses sector standards and is documented in some way.
• When managing a single dataset or a few homogeneous datasets between colleagues, it may be enough to use sector-specific standards (e.g. Multi-crop descriptors and Darwin Core for germplasm, INSPIRE for soil / geo, “Minimum Set” recommendations for observations, sector code lists, vocabularies) or application-specific standards.

Datasets in repositories

But what happens when you have a big data repository with heterogeneous datasets?

Or you have few datasets, but you want to contribute them to a huge data repository / catalog where many other datasets are?

Users will want to find your dataset among many others, possibly together with datasets with similar data using the same standards, measures, syntax.

They will use tools to search for datasets.

Then dataset metadata (or a machine-readable dataset description) becomes important.

Why machine-readable descriptions?

• Data will be re-used by applications. Others will search and make use of your data through tools.
  Datasets have to be found by applications. Datasets have to be understood by applications.
• Datasets should be managed in data repositories / data catalogs. Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets.
• Data catalogs themselves are implemented as applications, so they need machine-readable dataset metadata.

[Diagram: from the single datum to the data catalog]

Dataset (each cell is a datum; each row is a record, “member” or observation):

Ref (ID / URI)   Dimension2   Dimension3   Dimension4
Value11          Value12      Value13      Value14
Value21          Value22      Value23      Value24
Value31          Value32      Value33      Value34
Value41          Value42      Value43      Value44

Datasets are kept in a file system or in a data repository.

Data catalog (also a dataset):

DatasetRef   Metadata1   Metadata2   Metadata3
Ref1         Address1    Value12     Value13
Ref2         Address2    Value22     Value23

Tabular only for the sake of simplification: it could be triples or other data structures.

Other labels in the diagram: CatalogID, Value1, Value2, Value3; Search, Export, Data type.

We only focus on the data catalog level.

Dataset metadata

So what metadata do applications need to find in data catalogs?

Dataset metadata for applications

General metadata about the dataset:
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…
5) the conditions for re-use (rights, licenses)
6) provenance, versions
7) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage)
8) time and space slices, subsets
9) the “dimensions” and “variables” covered by the dataset
10) the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)

WHY dimensions and semantics?

Example of search by researcher:

“I’m doing research on plant phenotypes: give me all datasets of crop phenotypic data that include the dimensions of time, geographic location and height, plus units of measure used for time and height, where geographic location is expressed as coordinates (because my software only processes coordinates).”
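The researcher's query can be sketched as a filter over catalog records that declare their dimensions. This is only an illustration: the record layout, the class and function names, and the three sample records are invented for the example and do not come from any real catalog or vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Dimension:
    unit: Optional[str] = None    # unit of measure, if declared
    syntax: Optional[str] = None  # encoding of the values, e.g. "coordinates"


@dataclass
class DatasetRecord:
    title: str
    theme: str
    dimensions: Dict[str, Dimension] = field(default_factory=dict)


def matches(record, theme, required, syntax):
    """True if the record is on `theme`, declares every dimension in
    `required`, and encodes each dimension named in `syntax` as requested."""
    if record.theme != theme:
        return False
    if any(name not in record.dimensions for name in required):
        return False
    for name, wanted in syntax.items():
        dim = record.dimensions.get(name)
        if dim is None or dim.syntax != wanted:
            return False
    return True


# Hypothetical catalog content, made up for the example.
catalog = [
    DatasetRecord("Wheat trial A", "crop phenotypes", {
        "time": Dimension(unit="day"),
        "location": Dimension(syntax="coordinates"),
        "height": Dimension(unit="cm"),
    }),
    DatasetRecord("Maize trial B", "crop phenotypes", {
        "time": Dimension(unit="year"),
        "location": Dimension(syntax="place name"),
        "height": Dimension(unit="m"),
    }),
    DatasetRecord("Soil survey C", "soil", {
        "location": Dimension(syntax="coordinates"),
    }),
]

hits = [r for r in catalog
        if matches(r, "crop phenotypes",
                   required=("time", "location", "height"),
                   syntax={"location": "coordinates"})]

# The researcher also asked for the units used for time and height.
units = {r.title: {n: r.dimensions[n].unit for n in ("time", "height")}
         for r in hits}
```

Without dimension-level metadata in the catalog, none of these conditions could be checked before downloading and inspecting every dataset.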

Data serializations

• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data…
• In many dataset description models, the metadata about the data serialization is attached to the dataset.
• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is not attached to the dataset but rather to related entities called “distributions”.

Distribution metadata for applications

Applications have to find metadata about the actual “serializations” or “distributions” of the dataset to understand:

1. where to retrieve it: URL (data dump, service…)
2. the necessary technical specifications to retrieve and parse a distribution of the dataset:
   - format (file format, data format)
   - protocol, API parameters…

And, if different for different distributions, again:

3. the conditions for re-use (rights, licenses)
4. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies), if different semantics for different distributions
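A minimal sketch of the second model (metadata attached to distributions), assuming a simple in-memory layout. The class names, field names and example.org URLs are hypothetical. A distribution carries its own URL, format and protocol, and overrides the dataset's license only when it differs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Distribution:
    url: str                       # where to retrieve it
    media_type: str                # format
    protocol: str = "HTTP download"
    license: Optional[str] = None  # set only when it differs from the dataset's


@dataclass
class Dataset:
    identifier: str
    license: str
    distributions: List[Distribution] = field(default_factory=list)


def pick_distribution(dataset, supported_formats):
    """Return (distribution, effective license) for the first distribution
    the application can parse, or (None, None) if there is none."""
    for dist in dataset.distributions:
        if dist.media_type in supported_formats:
            return dist, dist.license or dataset.license
    return None, None


# Hypothetical dataset with two distributions.
soil = Dataset(
    identifier="http://example.org/dataset/soil",
    license="http://example.org/licenses/open",
    distributions=[
        Distribution("http://example.org/soil.rdf", "application/rdf+xml"),
        Distribution("http://example.org/soil.csv", "text/csv",
                     license="http://example.org/licenses/attribution"),
    ],
)

chosen, effective_license = pick_distribution(soil, {"text/csv"})
```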

WHY protocols and API params?

Example: some datasets are available behind a service, e.g. RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries, or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets.

Use case: I have an application that can fetch from several SOAP web services (protocol) at once, automatically, if it knows the parameters required by the SOAP service and the required syntax and type of the parameters.
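The use case can be illustrated with a simpler protocol than SOAP. The sketch below assumes a plain HTTP GET service whose parameters are declared in the distribution metadata. The service description, its endpoint and its parameter names are all hypothetical. The point is that the application checks and builds the request from the declared parameters alone.

```python
from urllib.parse import urlencode

# Hypothetical machine-readable description of a service-backed distribution.
service = {
    "endpoint": "http://example.org/observations",
    "protocol": "HTTP GET",
    "parameters": {
        "country": {"type": str, "required": True},
        "year": {"type": int, "required": True},
        "crop": {"type": str, "required": False},
    },
}


def build_request(service, **values):
    """Validate `values` against the declared parameters and return the
    request URL; raise if a parameter is undeclared, missing or mistyped."""
    declared = service["parameters"]
    unknown = set(values) - set(declared)
    if unknown:
        raise ValueError(f"undeclared parameters: {sorted(unknown)}")
    for name, spec in declared.items():
        if name not in values:
            if spec["required"]:
                raise ValueError(f"missing required parameter: {name}")
            continue
        if not isinstance(values[name], spec["type"]):
            raise TypeError(f"{name} must be {spec['type'].__name__}")
    # Sort the parameters so the resulting URL is deterministic.
    return service["endpoint"] + "?" + urlencode(sorted(values.items()))


url = build_request(service, country="IT", year=1990)
```

The same loop could run over many such descriptions, which is what lets one application query several services at once.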

General issue with all metadata

Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.

- The value should be standardized, possibly a URI
- The value should be part of an authority list / code list

And… there is no authority “value vocabulary” or code list for many of these values.
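As a sketch of what standardization means in practice, free-text “format” values can be mapped onto URIs from a code list. The code list, the synonym table and the example.org URIs below are invented for the illustration. As noted above, a shared authority list often does not exist, so a catalog would have to pick or maintain one.

```python
from typing import Optional

# Hypothetical code list: one URI per accepted format value.
FORMAT_CODE_LIST = {
    "csv": "http://example.org/formats/csv",
    "rdf/xml": "http://example.org/formats/rdf-xml",
    "netcdf": "http://example.org/formats/netcdf",
}

# Hypothetical spelling variants seen in submitted metadata.
SYNONYMS = {
    "comma separated values": "csv",
    "text/csv": "csv",
    "rdf xml": "rdf/xml",
}


def standardize_format(value: str) -> Optional[str]:
    """Return the code-list URI for a free-text format value,
    or None when the value is not covered by the code list."""
    key = " ".join(value.strip().lower().split())  # normalize case and spaces
    key = SYNONYMS.get(key, key)
    return FORMAT_CODE_LIST.get(key)


csv_uri = standardize_format("  Comma  Separated Values ")
unknown = standardize_format("shapefile")
```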

No out-of-the-box solution

• Do existing data catalog tools normally cover these metadata? NO
• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO

BUT, by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites” (from the W3C page on “Best Practices for Publishing Linked Data”).

Dataset description vocabularies

Let’s see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability.

Semantic interoperability

In this presentation we cover only RDF vocabularies, with special focus on semantic interoperability.

Dataset metadata have been managed in several ways before semantic technologies: see NetCDF or HDF5 structures and various hierarchical array-based structures, used especially in observations datasets, which include dataset metadata at the top and data arrays below.

Dataset description vocabularies

• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about dataset (collection) and related distributions
• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset
• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services

Definition of “dataset” in DCAT

Definition given by the W3C Government Linked Data Working Group:

A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.

The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.

Examples of distributions include a downloadable CSV file, an API or an RSS feed.
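To make the dataset / distribution split concrete, the sketch below writes a small DCAT description in Turtle for one dataset with the three kinds of distribution just mentioned. The dataset, its example.org URIs and the helper function are hypothetical. The terms used (dcat:Dataset, dcat:Distribution, dcat:distribution, dcat:accessURL, dcat:mediaType, dct:title) are DCAT and Dublin Core terms. The function does no string escaping, so it assumes titles without quotation marks.

```python
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def dataset_to_turtle(uri, title, distributions):
    """Serialize one dataset and its distributions as Turtle text."""
    predicates = [f'dct:title "{title}"']
    if distributions:
        targets = ", ".join(f"<{d['uri']}>" for d in distributions)
        predicates.append(f"dcat:distribution {targets}")
    blocks = [f"<{uri}> a dcat:Dataset ;\n    "
              + " ;\n    ".join(predicates) + " ."]
    for d in distributions:
        blocks.append(
            f"<{d['uri']}> a dcat:Distribution ;\n"
            f"    dcat:accessURL <{d['access_url']}> ;\n"
            f"    dcat:mediaType \"{d['media_type']}\" ."
        )
    return PREFIXES + "\n" + "\n\n".join(blocks) + "\n"


# Hypothetical dataset offered as a CSV file, an API and an RSS feed.
turtle = dataset_to_turtle(
    "http://example.org/dataset/prices",
    "Crop prices",
    [
        {"uri": "http://example.org/dist/csv",
         "access_url": "http://example.org/prices.csv",
         "media_type": "text/csv"},
        {"uri": "http://example.org/dist/api",
         "access_url": "http://example.org/api/prices",
         "media_type": "application/json"},
        {"uri": "http://example.org/dist/rss",
         "access_url": "http://example.org/prices.rss",
         "media_type": "application/rss+xml"},
    ],
)
```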


DCAT model

1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) Format
12) Protocols, parameters: NO

DCAT and DCAT-AP

The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.

It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.

The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.

A diagram of the full DCAT-AP specification is on the next slide.

Full DCAT-AP

[Diagram of the full DCAT-AP specification; highlighted areas: versions, rights and provenance, standards, rights, format, relation]

More than DCAT:
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO

Limitations of DCAT

It doesn’t cover:

• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets

Combining DCAT with other vocabularies

“Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.” (from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures.

• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
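The three component roles can be sketched as a small data structure. This is not the Data Cube RDF vocabulary itself, only an in-memory analogue with hypothetical class names and made-up observation values. It shows how a structure definition tells an application which fields identify an observation and which carry the measured value.

```python
from dataclasses import dataclass
from typing import List

ROLES = ("dimension", "measure", "attribute")


@dataclass(frozen=True)
class Component:
    name: str
    role: str  # one of ROLES

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")


@dataclass
class StructureDefinition:
    components: List[Component]

    def names(self, role):
        return [c.name for c in self.components if c.role == role]

    def missing(self, observation):
        """Dimensions and measures that the observation fails to provide."""
        needed = self.names("dimension") + self.names("measure")
        return [n for n in needed if n not in observation]

    def key(self, observation):
        """The dimension values, which together identify one observation."""
        return tuple(observation[n] for n in self.names("dimension"))


structure = StructureDefinition([
    Component("time", "dimension"),
    Component("region", "dimension"),
    Component("gender", "dimension"),
    Component("life expectancy", "measure"),
    Component("unit", "attribute"),
])

# Illustrative observation; the values are invented.
observation = {"time": "2005", "region": "region-1", "gender": "female",
               "life expectancy": 80.0, "unit": "years"}
observation_key = structure.key(observation)
gaps = structure.missing({"time": "2005", "life expectancy": 80.0})
```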

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, also non-statistical.

More than DCAT-AP:
1) dimensions and semantics: YES
2) slices, subsets: YES

VOID model

Labels in the diagram: dct:license, wv:norms, wv:waiver

More than DCAT-AP:
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES

Complementing DCAT

• For dimensions, semantics of dimensions, slicing
  • DataCube
  • DDI
• For API aspects
  • VOID (linked data)
  • Web services descriptions (Hydra, (WSDL, WADL))
• For relations to organizations, projects, publications, funding…
  • CERIF for datasets
  • VIVO Datastar

Many vocabularies

Vocabularies with relations to DCAT or same model
• DCAT-AP and other extensions
  • W3C HCLS dataset descriptions
  • DataID: extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information
  • GeoDCAT-AP: geospatial extension of DCAT
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org

Other vocabularies
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE

Examples of application of DCAT

• CKAN data catalog tool (more in your workshop)
• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances
• CIARD RING: federated data catalog managed by GFAR

CIARD RING

Datasets in the RING dataset hub

• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published

Sample dataset record

RDF of the record:

<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
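An application could read such a record roughly as follows. This sketch uses Python's standard XML parser rather than a real RDF library, so it only works for this flat RDF/XML shape. The record is abridged. The enclosing rdf:RDF element and its namespace declarations are added here as an assumption, since the slide shows only the inner element.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Abridged record; the rdf:RDF wrapper and xmlns declarations are assumed.
RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree addresses namespaced attributes as "{namespace}localname".
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

root = ET.fromstring(RECORD)
node = root.find("rdf:Description", NS)

record = {
    "uri": node.get(RDF_ABOUT),
    "types": [t.get(RDF_RESOURCE) for t in node.findall("rdf:type", NS)],
    "title": node.findtext("dct:title", namespaces=NS),
    "distributions": [d.get(RDF_RESOURCE)
                      for d in node.findall("dcat:distribution", NS)],
}

# The record is typed with several vocabularies at once.
is_dcat_dataset = "http://www.w3.org/ns/dcat#Dataset" in record["types"]
```

Typing the same resource as a DCAT, VoID and Schema.org dataset lets consumers of any of those vocabularies recognise the record.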

References

1. National Research Council (US), Space Science Board (1986). Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Academies, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with 18th International World Wide Web Conference (WWW 09).
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki

Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info

Thank you

Valeria Pesce

valeria.pesce@fao.org

Page 2: Dataset description: DCAT and other vocabularies

Data and datasets

WikipedialdquoData are values of qualitative or quantitative variables belonging to a set of items Data as an abstract concept can be viewed as the lowest level of abstraction from which information and then knowledge are derivedrdquo

Wikipedia A dataset (or data set) is a collection of data(Narrow definition) hellipthe contents of a single database table or a single statistical data matrix where each column of the table represents a particular variable and each row corresponds to a given member of the dataset in question [hellip]Nontabular datasets can take the form of marked up strings of characters such as an XML file

W3C Government Linked Data Working Group (DCAT vocabulary) httpwwww3orgTRvocab-dcatclass--dataset

A collection of data published or curated by a single source and available for access or download in one or more formats

DATA

DATASETS

Single dataset

bull As long as you consider a dataset alone you may not need structured metadata about the dataset for further re-use as long as the dataset uses sector standards and is documented in some way

bull When managing a single dataset or a few homogeneous datasets between colleagues it may be enough to use sector-specific standards (eg Multi-crop descriptors and Darwin Core for germplasm INSPIRE for soilgeo ldquoMinimum Setrdquo recommendations for observations sector code lists vocabularies) or application-specific standards

Datasets in repositories

But what happens when you have a big data repository with heterogeneous datasets

Or you have few datasets but you want to contribute them to a huge data repositorycatalog where many other datasets are

Users will want to find your dataset among many others possibly together with datasets with similar data using the same standards measures syntax

They will use tools to search for datasets

Then dataset metadata (or a machine-readable dataset description)

becomes important

Why machine-readable descriptions

bull Data will be re-used by applicationsOthers will search and make use of your data through tools

Datasets have to be found by applications Datasets have to be understood by applications

bull Datasets should be managed in data repositories data catalogs Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets

bull Data catalogs themselves are implemented as applications so they need machine-readable dataset metadata

Ref ID URI Dimension2 Dimension3 Dimension4

Value11 Value12 Value13 Value14

Value21 Value22 Value23 Value24

Value31 Value32 Value33 Value34

Value41 Value42 Value43 Value44

Dataset

Datum

Record ldquomemberrdquo observation

File system

Data repository

DatasetRef Matadata1 Metadata2 Metadata3

Ref1 Address1 Value12 Value13

Ref2 Address2 Value22 Value23

Data catalog (also a dataset)

Tabular only for the sake of simplification it could be triples or other data structures

CatalogID Value1 Value2 Value3

Search ExportData type

We only focus on the data catalog level

Dataset metadata

So what metadata do applications need to find in data catalogs

Dataset metadata for applications

1) General metadata about the dataset1) identifier(s)2) who is responsible for it3) when and where the data were collected4) relations to organizations persons publications software

projects fundinghellip5) the conditions for re-use (rights licenses)6) provenance versions7) the specific coverage of the dataset (type of data thematic

coverage geographic coverage)

8) time and space slices subsets 9) the ldquodimensionsrdquo and ldquovariablesrdquo covered by the dataset10) the semantics of the dimensions (units of measure time

granularity syntax reference taxonomies)

WHY dimensions and semantics

Example of search by researcher

Irsquom doing research on plant phenotypes give me all datasets of crop phenotypic data that include the dimensions of time geographic location and height plus units of measure used for time and height where geographic location is expressed as coordinates (because my software only processes coordinates)

Data seralizations

bull The metadata above refer to the collection as a whole but additional technical metadata refer to the different ldquoserializationsrdquo of the datahellip

bull In many dataset description models the metadata about the data serialization is attached to the dataset

bull In other dataset description models information about the data serializations is not considered inherent to the nature and content of the data collection so it is not attached to the dataset but rather to related entities called ldquodistributionsrdquo

Distribution metadata for applications

Applications have to find metadata about the actual ldquoserializationsrdquo or ldquodistributionsrdquo of the dataset to understand

1 Where to retrieve it URL (data dump servicehellip)

2 the necessary technical specifications to retrieve and parse a

distribution of the dataset

- format (file format data format)

- protocol API parametershellip

And if different for different distributions again3 the conditions for re-use (rights licenses)

4 the semantics of the dimensions (units of measure time granularity syntax reference taxonomies) if different semantics for different distributions

WHY protocols and API params

Example

Some datasets are available behind a service eg RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets

Use case I have an application that can fetch from several SOAP web services (protocol) at once automatically if it knows the parameters required by the SOAP service and the required syntax and type of the parameters

General issue with all metadata

Standardization of the values eg for ldquothematic coveragerdquo or ldquodimensionsrdquo of datasets ldquoformatrdquo or ldquoprotocol usedrdquo of distributions etc

- The value should be standardized possibly a URI

- The value should be part of an authority list code list

Andhellip There is no authority ldquovalue vocabularyrdquo or code list for many of these values

No out-of-the-box solution

bull Do existing data catalog tools normally cover these metadataNO

bull Do existing metadata vocabularies (RDF and not) cover all these metadata Or do they adopt the same modelNO

BUT by using even basic metadata to describe datasets in data catalogs ldquopublishers increase discoverability and enable applications to consume metadata from multiple catalogs This further enables decentralized publishing of catalogs and facilitates federated dataset search across sitesrdquo(from W3C page on ldquoBest Practices for Publishing Linked Datardquo)

Dataset description vocabularies

Letrsquos see if the main vocabularies to describe datasets provide themetadata we think are needed

for full interoperability

Semantic interoperability

In this presentation we cover only RDF vocabularies with special focus on semantic interoperability

Dataset metadata have been managed in several ways before semantic technologies see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets ndashincluding dataset metadata at the top and data arrays below

Dataset description vocabularies

bull DCAT vocabularybull RDF vocabulary for describing any dataset

bull Datasets can be standalone or part of a ldquocatalogrdquo

bull Metadata about dataset (collection) and related distributions

bull DataCube vocabularybull RDF vocabulary for describing statistical datasets

bull Useful for attaching metadata about the ldquodata structurerdquo of a dataset

bull VOID vocabularybull RDF vocabulary for expressing metadata about RDF datasets

bull Useful especially for metadata related to RDF data services

Definition of ldquodatasetrdquo in DCAT

Definition given by the W3C Government Linked Data Working Group

A dataset is ldquoa collection of data published or curated by a single source and available for access or download in one or more formatsrdquo

The ldquoinstancesrdquo of the dataset ldquoavailable for access or download in one or more formatsrdquo are called ldquodistributionsrdquo A dataset can have many distributions

Examples of distributions include a downloadable CSV file an API or an RSS feed

c

DCAT model

1) identifier(s)2) who is responsible for it3) when and where the data were

collected4) relations to organizations persons

publications software projects fundinghellip NO

5) the conditions for re-use (rights licenses)

6) provenance version NO7) coverage of the dataset8) dimensions and semantics NO9) slices subsets NO10) URL11) Format12) Protocols parameters NO

DCAT and DCAT-AP

The DCAT Application profile for data portals in

Europe (DCAT-AP) is an extension of DCAT

It combines DCAT with the W3C Asset Description Metadata

Schema (ADMS) vocabulary plus classes and properties from

Dublin Core SKOS and Vcard in an Application profile

The elaboration of the DCAT-AP was a joint initiative of DG

CONNECT the EU Publications Office and the ISA Programme

A diagram of the full DCAT-AP specification is on the next slide

Full DCAT AP

versions

rights and provenance

standards

rights

format

relation

1) who is responsible for it MORE2) relations to organizations projects publications funding Partly

3) the conditions for re-use (rights licenses) MORE4) provenance version YES5) protocols parameters NO6) dimensions and semantics NO

More than DCAT

Limitations of DCAT

It doesnrsquot cover

bull semantic relations to organisations persons software projects fundinghellip

bull dimensions and variables and syntax semantics of dimensions and variables

bull protocols and parameters for datasets available through APIs

bull time and space slices subsets

Combining DCAT with other vocabularies

ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo

(from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions attributes and measures

bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)

bull The measure components represent the phenomenon being observed (eg life expectancy)

bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset also non-statistical

1) dimensions and semantics YES2) slices subsets YES

More than DCAT-AP

VOID model

c

dctlicensewvnorms wvwaiver

3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES

More than DCAT-AP

Complementing DCAT

bull For dimensions semantics of dimensions slicingbull DataCube

bull DDI

bull For API aspectsbull VOID (linked data)

bull Web services descriptions (Hydra (WSDL WADL))

bull For relations to organizations projects publications fundinghellipbull CERIF for datasets

bull VIVO Datastar

Many vocabulariesVocabularies with relations to DCAT or same model

bull DCAT-AP and other extensions

bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT

bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)

bull Schemaorg

Other vocabulariesbull DataCite and re3data

bull CERIF for datasets

bull VIVO Datastar

bull INSPIRE

Examples of application of DCAT

bull CKAN data catalog tool(more in your workshop)

bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances

bull CIARD RING federated data catalogmanaged by GFAR

CIARD RING

Datasets in the RING dataset hub

bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)

bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)

bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)

bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary



Single dataset

• As long as you consider a dataset alone, you may not need structured metadata about the dataset for further re-use, as long as the dataset uses sector standards and is documented in some way.

• When managing a single dataset or a few homogeneous datasets between colleagues, it may be enough to use sector-specific standards (e.g. Multi-crop descriptors and Darwin Core for germplasm, INSPIRE for soil/geo, “Minimum Set” recommendations for observations, sector code lists, vocabularies) or application-specific standards.

Datasets in repositories

But what happens when you have a big data repository with heterogeneous datasets?

Or you have few datasets, but you want to contribute them to a huge data repository / catalog where many other datasets are?

Users will want to find your dataset among many others, possibly together with datasets with similar data, using the same standards, measures, syntax.

They will use tools to search for datasets.

Then dataset metadata (or a machine-readable dataset description) becomes important.

Why machine-readable descriptions?

• Data will be re-used by applications. Others will search and make use of your data through tools.
  Datasets have to be found by applications. Datasets have to be understood by applications.

• Datasets should be managed in data repositories / data catalogs. Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets.

• Data catalogs themselves are implemented as applications, so they need machine-readable dataset metadata.

[Diagram, three levels:
- Dataset: a table with columns Ref ID / URI, Dimension2, Dimension3, Dimension4 and rows Value11 … Value44. A single cell is a datum; a row is a record, “member” or observation.
- File system / data repository: where the datasets are held.
- Data catalog (also a dataset): a table with columns DatasetRef, Metadata1, Metadata2, Metadata3 and rows such as Ref1, Address1, Value12, Value13. Other labels in the diagram: CatalogID, Search, Export, Data type.
Tabular only for the sake of simplification: it could be triples or other data structures.]

We only focus on the data catalog level.

Dataset metadata

So what metadata do applications need to find in data catalogs?

Dataset metadata for applications

General metadata about the dataset:
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…
5) the conditions for re-use (rights, licenses)
6) provenance, versions
7) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage)
8) time and space slices, subsets
9) the “dimensions” and “variables” covered by the dataset
10) the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)

WHY dimensions and semantics?

Example of search by researcher

I’m doing research on plant phenotypes: give me all datasets of crop phenotypic data that include the dimensions of time, geographic location and height, plus units of measure used for time and height, where geographic location is expressed as coordinates (because my software only processes coordinates).
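The researcher's request can be read as a filter over catalog records that expose dimensions and their semantics. The following is only a minimal sketch of that idea in Python: the record layout (`Dimension`, `DatasetRecord`), the field names and the three sample records are assumptions made for illustration, not part of DCAT, DataCube or any catalog tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Dimension:
    name: str                      # e.g. "time", "location", "height"
    unit: Optional[str] = None     # unit of measure, if declared
    syntax: Optional[str] = None   # how values are expressed, e.g. "coordinates"


@dataclass
class DatasetRecord:
    identifier: str
    theme: str
    dimensions: Dict[str, Dimension] = field(default_factory=dict)


def find_datasets(catalog: Iterable[DatasetRecord], theme: str,
                  required: List[str],
                  syntax_constraints: Dict[str, str]) -> List[DatasetRecord]:
    """Keep records on `theme` that declare every required dimension and
    express each constrained dimension in the requested syntax."""
    hits = []
    for record in catalog:
        if record.theme != theme:
            continue
        if any(name not in record.dimensions for name in required):
            continue
        matches = True
        for name, syntax in syntax_constraints.items():
            dim = record.dimensions.get(name)
            if dim is None or dim.syntax != syntax:
                matches = False
                break
        if matches:
            hits.append(record)
    return hits


def dims(*items: Dimension) -> Dict[str, Dimension]:
    return {d.name: d for d in items}


# Hypothetical catalog content, for illustration only.
CATALOG = [
    DatasetRecord("ds-1", "crop phenotypes", dims(
        Dimension("time", unit="day"),
        Dimension("location", syntax="coordinates"),
        Dimension("height", unit="cm"))),
    DatasetRecord("ds-2", "crop phenotypes", dims(
        Dimension("time", unit="year"),
        Dimension("location", syntax="place name"),
        Dimension("height", unit="m"))),
    DatasetRecord("ds-3", "soil", dims(
        Dimension("location", syntax="coordinates"))),
]

hits = find_datasets(CATALOG, "crop phenotypes",
                     required=["time", "location", "height"],
                     syntax_constraints={"location": "coordinates"})
for record in hits:
    units = {name: record.dimensions[name].unit for name in ("time", "height")}
    print(record.identifier, units)
```

Such a query is only possible if the catalog records carry the dimensions, their units and their syntax in machine-readable form, which is the point of the slide.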

Data serializations

• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data…

• In many dataset description models, the metadata about the data serialization is attached to the dataset.

• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is not attached to the dataset but rather to related entities called “distributions”.

Distribution metadata for applications

Applications have to find metadata about the actual “serializations” or “distributions” of the dataset to understand:

1. where to retrieve it: URL (data dump, service…)
2. the necessary technical specifications to retrieve and parse a distribution of the dataset
   - format (file format, data format)
   - protocol, API, parameters…

And, if different for different distributions, again:

3. the conditions for re-use (rights, licenses)
4. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies), if different semantics for different distributions
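As an illustration of why an application needs these fields, here is a minimal Python sketch in which a client picks the distributions it can both retrieve and parse. The classes, the field names and the example.org URLs are hypothetical and do not come from any specific vocabulary; the fallback from distribution license to dataset license mirrors the "if different for different distributions" rule above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Distribution:
    url: str                                   # where to retrieve it
    data_format: str                           # file format / data format
    protocol: str = "HTTP download"            # how to retrieve it
    parameters: Dict[str, str] = field(default_factory=dict)  # name -> type
    license: Optional[str] = None              # set only if it differs from the dataset's


@dataclass
class Dataset:
    identifier: str
    license: str
    distributions: List[Distribution] = field(default_factory=list)


def usable_distributions(dataset: Dataset, formats: Set[str],
                         protocols: Set[str]) -> List[Distribution]:
    """Distributions the client can both retrieve (protocol) and parse (format)."""
    return [d for d in dataset.distributions
            if d.data_format in formats and d.protocol in protocols]


def effective_license(dataset: Dataset, distribution: Distribution) -> str:
    """Distribution-level conditions override the dataset-level ones."""
    return distribution.license or dataset.license


# Hypothetical dataset with one file dump and one query service.
soil = Dataset("soil-db", license="CC-BY", distributions=[
    Distribution("http://example.org/soil.csv", "CSV"),
    Distribution("http://example.org/sparql", "RDF", protocol="SPARQL",
                 parameters={"query": "string"}),
])

chosen = usable_distributions(soil, formats={"RDF"}, protocols={"SPARQL"})
for d in chosen:
    print(d.url, d.parameters, effective_license(soil, d))
```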

WHY protocols and API params?

Example

Some datasets are available behind a service: e.g. RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries, or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets.

Use case: I have an application that can fetch from several SOAP web services (protocol) at once, automatically, if it knows the parameters required by the SOAP service and the required syntax and type of the parameters.

General issue with all metadata

Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.

- The value should be standardized, possibly a URI
- The value should be part of an authority list / code list

And… there is no authority “value vocabulary” or code list for many of these values.
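To make the two requirements concrete, the sketch below checks a metadata value against a code list and, where possible, replaces a free-text label with its URI. It is only an illustration: the code list and its example.org URIs are invented placeholders, precisely because (as noted above) no authority list exists for many of these values.

```python
from typing import Dict, Optional, Tuple

# Hypothetical code list for "format": URI -> preferred label.
FORMAT_CODES: Dict[str, str] = {
    "http://example.org/code/format/csv": "CSV",
    "http://example.org/code/format/rdf-xml": "RDF/XML",
}


def normalize(value: str, code_list: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Classify a metadata value against a code list.

    Returns ("uri", uri) if the value already is a URI from the list,
    ("label", uri) if it matches a label and can be replaced by that URI,
    ("unknown", None) otherwise.
    """
    if value in code_list:
        return ("uri", value)
    wanted = value.strip().lower()
    for uri, label in code_list.items():
        if label.lower() == wanted:
            return ("label", uri)
    return ("unknown", None)


results = [normalize(v, FORMAT_CODES)
           for v in ("http://example.org/code/format/csv", " csv ", "spreadsheet")]
for status, uri in results:
    print(status, uri)
```

Values that come back as "unknown" are the ones that cannot be compared across catalogs, which is the interoperability problem the slide points at.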

No out-of-the-box solution

• Do existing data catalog tools normally cover these metadata? NO

• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO

BUT by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites.” (from W3C page on “Best Practices for Publishing Linked Data”)

Dataset description vocabularies

Let’s see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability.

Semantic interoperability

In this presentation we cover only RDF vocabularies, with special focus on semantic interoperability.

Dataset metadata have been managed in several ways before semantic technologies: see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets, including dataset metadata at the top and data arrays below.

Dataset description vocabularies

• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about dataset (collection) and related distributions

• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset

• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services

Definition of “dataset” in DCAT

Definition given by the W3C Government Linked Data Working Group:

A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.

The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.

Examples of distributions include a downloadable CSV file, an API or an RSS feed.
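The dataset / distribution split can be shown with a few triples. The sketch below builds a small Turtle description by plain string formatting, using only the Python standard library. It uses the DCAT terms `dcat:Dataset`, `dcat:distribution`, `dcat:Distribution` and `dcat:accessURL` plus `dct:title`; the example.org URIs, the titles and the helper names are assumptions for illustration, and a real application would more likely use an RDF library.

```python
from typing import List, Tuple

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def literal(text: str) -> str:
    """Quote a plain string as a Turtle literal (escapes backslash and quote only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dataset_to_turtle(uri: str, title: str,
                      distributions: List[Tuple[str, str, str]]) -> str:
    """Describe one dataset and its distributions.

    Each distribution is a tuple (distribution URI, label, access URL).
    """
    if not distributions:
        raise ValueError("this sketch expects at least one distribution")
    links = ", ".join(f"<{d_uri}>" for d_uri, _, _ in distributions)
    blocks = [
        f"<{uri}> a dcat:Dataset ;\n"
        f"    dct:title {literal(title)} ;\n"
        f"    dcat:distribution {links} ."
    ]
    for d_uri, label, access_url in distributions:
        blocks.append(
            f"<{d_uri}> a dcat:Distribution ;\n"
            f"    dct:title {literal(label)} ;\n"
            f"    dcat:accessURL <{access_url}> ."
        )
    return PREFIXES + "\n" + "\n\n".join(blocks) + "\n"


# Hypothetical dataset with two distributions: a CSV file and an API.
turtle = dataset_to_turtle(
    "http://example.org/dataset/soil",
    "Example soil dataset",
    [
        ("http://example.org/dataset/soil/csv", "CSV download",
         "http://example.org/files/soil.csv"),
        ("http://example.org/dataset/soil/api", "API",
         "http://example.org/api/soil"),
    ],
)
print(turtle)
```

One dataset node links to two distribution nodes, each with its own access URL, which is the one-to-many relation described on the slide.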

DCAT model

1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) Format
12) Protocols, parameters: NO

DCAT and DCAT-AP

The DCAT Application profile for data portals in Europe (DCAT-AP) is an extension of DCAT.

It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an Application profile.

The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.

A diagram of the full DCAT-AP specification is on the next slide.

Full DCAT AP

[Diagram of the full DCAT-AP model, with callouts: versions; rights and provenance; standards; rights; format; relation]

More than DCAT:
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO

Limitations of DCAT

It doesn’t cover:

• semantic relations to organisations, persons, software, projects, funding…

• dimensions and variables, and syntax / semantics of dimensions and variables

• protocols and parameters for datasets available through APIs

• time and space slices, subsets

Combining DCAT with other vocabularies

“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”

(from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures.

• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)

• The measure components represent the phenomenon being observed (e.g. life expectancy)

• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
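The three kinds of components can be sketched as a small structure definition against which observations are checked. This is an illustration in plain Python, not the DataCube vocabulary itself (where the corresponding terms are `qb:DataStructureDefinition`, `qb:DimensionProperty`, `qb:MeasureProperty` and `qb:AttributeProperty`); the component names and sample observations are assumptions, and treating every dimension and measure as mandatory is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Structure:
    dimensions: Tuple[str, ...]        # identify what is observed
    measures: Tuple[str, ...]          # the observed phenomenon
    attributes: Tuple[str, ...] = ()   # units, scaling, observation status


def problems(structure: Structure, observation: Dict[str, object]) -> List[str]:
    """List missing dimensions/measures and components the structure does not declare."""
    issues = []
    for name in structure.dimensions + structure.measures:
        if name not in observation:
            issues.append(f"missing {name}")
    known = set(structure.dimensions + structure.measures + structure.attributes)
    for name in observation:
        if name not in known:
            issues.append(f"unknown component {name}")
    return issues


# Structure following the slide's example (life expectancy by time, region, gender).
LIFE_EXPECTANCY = Structure(
    dimensions=("time", "region", "gender"),
    measures=("life_expectancy",),
    attributes=("unit", "status"),
)

good = {"time": "2010", "region": "Wales", "gender": "female",
        "life_expectancy": 82.1, "unit": "years", "status": "estimated"}
bad = {"time": "2010", "region": "Wales",
       "life_expectancy": 78.1, "source": "survey"}

print(problems(LIFE_EXPECTANCY, good))
print(problems(LIFE_EXPECTANCY, bad))
```

Because the structure is declared separately from the data, a catalog can answer questions about dimensions and units without opening the dataset.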

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, also non-statistical.

More than DCAT-AP:
1) dimensions and semantics: YES
2) slices, subsets: YES

VOID model

[Diagram of the VOID model, with callouts: dct:license, wv:norms, wv:waiver]

More than DCAT-AP:
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES

Complementing DCAT

• For dimensions, semantics of dimensions, slicing
  • DataCube
  • DDI

• For API aspects
  • VOID (linked data)
  • Web services descriptions (Hydra (WSDL, WADL))

• For relations to organizations, projects, publications, funding…
  • CERIF for datasets
  • VIVO Datastar

Many vocabularies

Vocabularies with relations to DCAT or same model

• DCAT-AP and other extensions

• W3C HCLS dataset descriptions; DataID (extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information); GeoDCAT-AP (geospatial extension of DCAT)

• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)

• Schema.org

Other vocabularies

• DataCite and re3data

• CERIF for datasets

• VIVO Datastar

• INSPIRE

Examples of application of DCAT

• CKAN data catalog tool (more in your workshop)

• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances

• CIARD RING federated data catalog, managed by GFAR

CIARD RING

Datasets in the RING dataset hub

• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)

• A dataset is identified by uniform type of content, uniform data structure (dimensions, metadata set, encoding, reference value lists)

• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)

• The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary
  (a “RING DCAT profile” will be published)

Sample dataset record

RDF of the record:

<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000) </dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
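A record of this shape can be read by an application with very little code. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened, made-up record wrapped in an `rdf:RDF` element with namespace declarations, which the slide excerpt omits. ElementTree is an XML parser, not an RDF parser, so this only works for the flat `rdf:Description` layout shown here; the example.org URIs and the title are placeholders.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Shortened, hypothetical record in the same style as the sample above.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dct="{DCT}" xmlns:dcat="{DCAT}">
  <rdf:Description rdf:about="http://example.org/node/1">
    <rdf:type rdf:resource="{DCAT}Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>Example soil dataset</dct:title>
    <dcat:distribution rdf:resource="http://example.org/distribution/1"/>
  </rdf:Description>
</rdf:RDF>"""


def describe(xml_text: str) -> dict:
    """Extract URI, types, title and distribution links from the first description."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{RDF}}}Description")
    resource = f"{{{RDF}}}resource"
    return {
        "uri": node.get(f"{{{RDF}}}about"),
        "types": [t.get(resource) for t in node.findall(f"{{{RDF}}}type")],
        "title": node.findtext(f"{{{DCT}}}title"),
        "distributions": [d.get(resource)
                          for d in node.findall(f"{{{DCAT}}}distribution")],
    }


record = describe(SAMPLE)
print(record)
```

The same resource is typed with several vocabularies at once, so a client that only understands one of them can still recognise the record as a dataset.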


Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat

• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final

• DataCube: http://purl.org/linked-data/cube

• VOID: http://rdfs.org/ns/void-guide

• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html

• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology

• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables

• CKAN: http://ckan.org

• Dataverse: http://dataverse.org

• Datahub: http://datahub.io

• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture

• Re3data: http://www.re3data.org

• OpenAIRE: https://www.openaire.eu

• CIARD RING: http://ring.ciard.info

Dataset description and DCAT

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories

Thank you

Valeria Pesce

valeria.pesce@fao.org

Page 5: Dataset description: DCAT and other vocabularies

Why machine-readable descriptions?

• Data will be re-used by applications: others will search and make use of your data through tools.
  - Datasets have to be found by applications.
  - Datasets have to be understood by applications.

• Datasets should be managed in data repositories and data catalogs. Data catalogs have to provide enough dataset metadata to applications to allow them to find and understand datasets.

• Data catalogs themselves are implemented as applications, so they need machine-readable dataset metadata.

[Figure: three levels of data management. A dataset is shown as a table whose rows are records (“members”, observations) identified by a Ref ID / URI, whose columns are dimensions and whose cells are single data values (datum). Datasets are stored in a file system or data repository. A data catalog (itself also a dataset) lists, for each dataset reference, its address and metadata, and offers search, export and data-type functions. The tabular form is used only for the sake of simplification: the data could be triples or other data structures.]

We only focus on the data catalog level.

Dataset metadata

So what metadata do applications need to find in data catalogs?

Dataset metadata for applications

General metadata about the dataset:
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…
5) the conditions for re-use (rights, licenses)
6) provenance, versions
7) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage)
8) time and space slices, subsets
9) the “dimensions” and “variables” covered by the dataset
10) the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
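The ten items above can be read as the fields of a catalog record. The following is only a sketch: the `DatasetDescription` class, its field names and the sample values are hypothetical and are not taken from DCAT or any other vocabulary.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetDescription:
    """Hypothetical catalog record mirroring the ten metadata items listed above."""
    identifiers: list = field(default_factory=list)          # 1) identifier(s)
    responsible: str = ""                                     # 2) who is responsible for it
    collected: str = ""                                       # 3) when and where collected
    relations: list = field(default_factory=list)             # 4) organizations, projects, funding...
    reuse_conditions: str = ""                                # 5) rights, licenses
    provenance: str = ""                                      # 6) provenance, versions
    coverage: dict = field(default_factory=dict)              # 7) type, thematic, geographic coverage
    slices: list = field(default_factory=list)                # 8) time and space slices, subsets
    dimensions: list = field(default_factory=list)            # 9) dimensions and variables
    dimension_semantics: dict = field(default_factory=dict)   # 10) units, granularity, taxonomies


def missing_metadata(record: DatasetDescription) -> list:
    """Return the names of the fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = DatasetDescription(
    identifiers=["http://example.org/dataset/1"],  # made-up URI
    responsible="Example Institute",               # made-up organization
    dimensions=["time", "location", "height"],
)
print(missing_metadata(record))
```

A catalog could run a check like `missing_metadata` at registration time to tell the publisher which descriptions an application would not be able to find.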

WHY dimensions and semantics?

Example of a search by a researcher:

I’m doing research on plant phenotypes: give me all datasets of crop phenotypic data that include the dimensions of time, geographic location and height, plus the units of measure used for time and height, where geographic location is expressed as coordinates (because my software only processes coordinates).
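The researcher's request can only be answered automatically if the catalog records dimensions and their semantics in a structured way. A minimal sketch, assuming a hypothetical in-memory catalog (all titles, dimension names and units below are invented for illustration):

```python
# Hypothetical catalog entries; nothing here comes from a real catalog.
CATALOG = [
    {
        "title": "Wheat phenotypes A",
        "theme": "crop phenotypic data",
        "dimensions": {
            "time": {"unit": "day"},
            "location": {"syntax": "coordinates"},
            "height": {"unit": "cm"},
        },
    },
    {
        "title": "Maize phenotypes B",
        "theme": "crop phenotypic data",
        "dimensions": {
            "time": {"unit": "year"},
            "location": {"syntax": "place name"},
            "height": {"unit": "m"},
        },
    },
    {
        "title": "Soil map C",
        "theme": "soil",
        "dimensions": {"location": {"syntax": "coordinates"}},
    },
]


def find_datasets(catalog, theme, required_dimensions, location_syntax):
    """Return (title, units) pairs for the datasets that satisfy the query."""
    matches = []
    for entry in catalog:
        dims = entry["dimensions"]
        if entry["theme"] != theme:
            continue
        if not all(name in dims for name in required_dimensions):
            continue
        if dims.get("location", {}).get("syntax") != location_syntax:
            continue
        # Report the unit of measure of every required dimension that declares one.
        units = {name: dims[name]["unit"]
                 for name in required_dimensions if "unit" in dims[name]}
        matches.append((entry["title"], units))
    return matches


results = find_datasets(CATALOG, "crop phenotypic data",
                        ["time", "location", "height"], "coordinates")
print(results)
```

Only the first dataset matches: the second one expresses location as place names, and the third is not phenotypic data.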

Data serializations

• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data…

• In many dataset description models, the metadata about the data serialization is attached to the dataset.

• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is not attached to the dataset but rather to related entities called “distributions”.

Distribution metadata for applications

Applications have to find metadata about the actual “serializations” or “distributions” of the dataset to understand:

1. where to retrieve it: URL (data dump, service…)
2. the necessary technical specifications to retrieve and parse a distribution of the dataset:
   - format (file format, data format)
   - protocol, API, parameters…

And, if different for different distributions, again:

3. the conditions for re-use (rights, licenses)
4. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies), if the semantics differ between distributions
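The split between dataset-level and distribution-level metadata can be sketched as two record types, where a distribution overrides the re-use conditions only when they differ. The class names, fields and URLs below are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Distribution:
    """Hypothetical description of one serialization of a dataset."""
    url: str                                         # 1. where to retrieve it
    data_format: str                                 # 2. format (file format, data format)
    protocol: str = "HTTP GET"                       # 2. protocol / API
    parameters: dict = field(default_factory=dict)   # 2. API parameters, if any
    rights: Optional[str] = None                     # 3. set only when it differs from the dataset


@dataclass
class Dataset:
    title: str
    rights: str
    distributions: list = field(default_factory=list)

    def rights_for(self, dist: Distribution) -> str:
        """Distribution-level rights win; otherwise fall back to the dataset's."""
        return dist.rights or self.rights


dataset = Dataset(
    title="Example dataset",
    rights="http://example.org/licenses/open",  # placeholder URI
    distributions=[
        Distribution(url="http://example.org/data.csv", data_format="CSV"),
        Distribution(
            url="http://example.org/sparql",
            data_format="RDF",
            protocol="SPARQL",
            rights="http://example.org/licenses/restricted",  # placeholder URI
        ),
    ],
)
for dist in dataset.distributions:
    print(dist.url, dist.data_format, dist.protocol, dataset.rights_for(dist))
```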

WHY protocols and API parameters?

Example

Some datasets are available behind a service: e.g. RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries, or national research institutes provide access to datasets behind a web service that accepts parameters to filter the datasets.

Use case: I have an application that can fetch from several SOAP web services (protocol) at once, automatically, if it knows the parameters required by each SOAP service and the required syntax and type of the parameters.
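What “knowing the parameters” could mean in practice is sketched below. The sketch is protocol-agnostic (it does not build a SOAP message), and the parameter description format and the parameter names are assumptions made for the example, not part of any vocabulary discussed here.

```python
# Hypothetical machine-readable description of the parameters of one service.
SERVICE_PARAMETERS = {
    "country": {"type": str, "required": True},
    "year": {"type": int, "required": True},
    "crop": {"type": str, "required": False},
}


def check_request(spec, arguments):
    """Return a list of problems found in `arguments`; empty if the call is valid."""
    problems = []
    for name, rule in spec.items():
        if name not in arguments:
            if rule["required"]:
                problems.append(f"missing required parameter: {name}")
            continue
        if not isinstance(arguments[name], rule["type"]):
            problems.append(f"wrong type for {name}: expected {rule['type'].__name__}")
    for name in arguments:
        if name not in spec:
            problems.append(f"unknown parameter: {name}")
    return problems


print(check_request(SERVICE_PARAMETERS, {"country": "IT", "year": 2015}))
print(check_request(SERVICE_PARAMETERS, {"year": "2015", "region": "x"}))
```

With such a description published in the catalog, a client can validate its request for each service before sending it, instead of hard-coding one integration per service.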

General issue with all metadata

Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.

- The value should be standardized, possibly a URI
- The value should be part of an authority list / code list

And… there is no authority “value vocabulary” or code list for many of these values.
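A sketch of what standardizing a value against a code list looks like. The code list below is invented: the `example.org` URIs are placeholders, precisely because, as noted above, no authority list exists for many of these values.

```python
# Hypothetical code list for distribution formats; the URIs are placeholders.
FORMAT_CODE_LIST = {
    "csv": "http://example.org/formats/csv",
    "rdf/xml": "http://example.org/formats/rdf-xml",
    "netcdf": "http://example.org/formats/netcdf",
}


def standardize(value, code_list):
    """Map a free-text value to its URI in the code list, or None if it is not listed."""
    return code_list.get(value.strip().lower())


print(standardize(" CSV ", FORMAT_CODE_LIST))
print(standardize("Excel", FORMAT_CODE_LIST))
```

Values that come back as `None` are the ones that stay ambiguous for applications until a shared list covers them.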

No out-of-the-box solution

• Do existing data catalog tools normally cover these metadata? NO

• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO

BUT by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites” (from the W3C page on “Best Practices for Publishing Linked Data”).

Dataset description vocabularies

Let’s see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability.

Semantic interoperability

In this presentation we cover only RDF vocabularies, with a special focus on semantic interoperability.

Dataset metadata were managed in several ways before semantic technologies: see NetCDF or HDF5 structures and various hierarchical array-based structures, used especially in observation datasets, which include dataset metadata at the top and data arrays below.

Dataset description vocabularies

• DCAT vocabulary
  - RDF vocabulary for describing any dataset
  - Datasets can be standalone or part of a “catalog”
  - Metadata about the dataset (collection) and related distributions

• DataCube vocabulary
  - RDF vocabulary for describing statistical datasets
  - Useful for attaching metadata about the “data structure” of a dataset

• VOID vocabulary
  - RDF vocabulary for expressing metadata about RDF datasets
  - Useful especially for metadata related to RDF data services

Definition of “dataset” in DCAT

Definition given by the W3C Government Linked Data Working Group:

A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.

The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.

Examples of distributions include a downloadable CSV file, an API or an RSS feed.
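To make the dataset/distribution relation concrete, the sketch below writes a dataset with its distributions as Turtle text. It uses the DCAT terms `dcat:Dataset`, `dcat:distribution`, `dcat:Distribution`, `dcat:mediaType` and `dcat:downloadURL`; the function itself, the `example.org` URIs and the choice of Turtle are assumptions for illustration. It expects at least one distribution and does no escaping of the title.

```python
def to_turtle(dataset_uri, title, distributions):
    """Serialize a dataset and its distributions as Turtle text.

    `distributions` is a non-empty list of (uri, media_type_uri, download_url) tuples.
    """
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a dcat:Dataset ;",
        f'    dct:title "{title}" ;',
    ]
    # One dataset can point to many distributions.
    links = " ,\n        ".join(f"<{uri}>" for uri, _, _ in distributions)
    lines.append(f"    dcat:distribution {links} .")
    for uri, media_type_uri, download_url in distributions:
        lines += [
            "",
            f"<{uri}> a dcat:Distribution ;",
            f"    dcat:mediaType <{media_type_uri}> ;",
            f"    dcat:downloadURL <{download_url}> .",
        ]
    return "\n".join(lines)


turtle = to_turtle(
    "http://example.org/dataset/1",
    "Example dataset",
    [
        ("http://example.org/dataset/1/csv",
         "https://www.iana.org/assignments/media-types/text/csv",
         "http://example.org/files/data.csv"),
        ("http://example.org/dataset/1/rdf",
         "https://www.iana.org/assignments/media-types/application/rdf+xml",
         "http://example.org/files/data.rdf"),
    ],
)
print(turtle)
```

For a distribution that is an API or a feed rather than a downloadable file, `dcat:accessURL` would be the more fitting property than `dcat:downloadURL`.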

DCAT model

Coverage of the required metadata (NO = not covered by DCAT):
1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) format
12) protocols, parameters: NO

DCAT and DCAT-AP

The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.

It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.

The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.

A diagram of the full DCAT-AP specification is on the next slide.

Full DCAT-AP

[Figure: diagram of the full DCAT-AP specification, with callouts for versions, rights and provenance, standards, rights, format and relation]

More than DCAT:
1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO

Limitations of DCAT

It doesn’t cover:

• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets

Combining DCAT with other vocabularies

“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”

(from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures.

• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors, and metadata such as the status of the observation (e.g. estimated, provisional)
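The three component roles can be illustrated with a small structure definition and a check of observations against it. This is a plain-Python sketch, not an implementation of the DataCube vocabulary: the component names reuse the examples above, and attributes are treated as optional here as a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    role: str  # "dimension", "measure" or "attribute"


# Hypothetical structure definition built from the examples in the text.
STRUCTURE = [
    Component("time", "dimension"),
    Component("region", "dimension"),
    Component("gender", "dimension"),
    Component("life_expectancy", "measure"),
    Component("unit", "attribute"),
    Component("status", "attribute"),
]


def validate_observation(structure, observation):
    """Return (missing, unknown) component names for one observation.

    Every dimension and measure must be present; attributes are optional in this sketch.
    """
    required = [c.name for c in structure if c.role in ("dimension", "measure")]
    known = {c.name for c in structure}
    missing = [name for name in required if name not in observation]
    unknown = sorted(set(observation) - known)
    return missing, unknown


complete = {"time": "2010", "region": "IT", "gender": "F",
            "life_expectancy": 84.1, "unit": "years"}
incomplete = {"time": "2010", "life_expectancy": 80.0, "height": 3}
print(validate_observation(STRUCTURE, complete))
print(validate_observation(STRUCTURE, incomplete))
```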

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.

More than DCAT-AP:
1) dimensions and semantics: YES
2) slices, subsets: YES

VOID model

[Figure: diagram of the VOID model; rights are expressed with dct:license, wv:norms and wv:waiver]

More than DCAT-AP:
3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES

Complementing DCAT

• For dimensions, semantics of dimensions, slicing
  - DataCube
  - DDI

• For API aspects
  - VOID (linked data)
  - Web services descriptions (Hydra (WSDL, WADL))

• For relations to organizations, projects, publications, funding…
  - CERIF for datasets
  - VIVO Datastar

Many vocabularies

Vocabularies with relations to DCAT or the same model
• DCAT-AP and other extensions
• W3C HCLS dataset descriptions; DataID (extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information); GeoDCAT-AP (geospatial extension of DCAT)
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org

Other vocabularies
• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE

Examples of application of DCAT

• CKAN data catalog tool (more in your workshop)
• Data catalogs
  - data.gov, data.gov.uk, data.gov.au
  - Africa Open Data, Indonesia Data Portal
  - EU Data Portal
  - More here: https://ckan.org/instances
• CIARD RING federated data catalog, managed by GFAR

CIARD RING

Datasets in the RING dataset hub

• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published

Sample dataset record

RDF of the record:

<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
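A minimal sketch of how an application could read such a record with Python's standard library. Two things are assumptions: the `rdf:RDF` wrapper with its namespace declarations (the slide shows only the `rdf:Description` element), and the punctuation of the URIs, written out here in the form such a record would normally carry. Only three properties of the record are kept to keep the example short.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Shortened copy of the record, wrapped in rdf:RDF so that it parses on its own.
RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree addresses namespaced attributes as "{namespace-uri}localname".
RDF = "{" + NS["rdf"] + "}"


def summarize(rdf_xml):
    """Extract the URI, types, title and distribution links of the first description."""
    root = ET.fromstring(rdf_xml)
    node = root.find("rdf:Description", NS)
    return {
        "uri": node.get(RDF + "about"),
        "types": [t.get(RDF + "resource") for t in node.findall("rdf:type", NS)],
        "title": node.findtext("dct:title", namespaces=NS),
        "distributions": [d.get(RDF + "resource")
                          for d in node.findall("dcat:distribution", NS)],
    }


summary = summarize(RECORD)
print(summary)
```

A production client would more likely use an RDF library than an XML parser, since the same graph can be serialized as XML in several equivalent ways; the XML parser is used here only to stay within the standard library.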

References

1. National Research Council (US), Space Science Board (1986). Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Academies, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with the 18th International World Wide Web Conference (WWW 09).
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki

Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info

Dataset description and DCAT

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories

Thank you

Valeria Pesce
valeria.pesce@fao.org


Page 7: Dataset description: DCAT and other vocabularies

Dataset metadata

So what metadata do applications need to find in data catalogs

Dataset metadata for applications

1) General metadata about the dataset1) identifier(s)2) who is responsible for it3) when and where the data were collected4) relations to organizations persons publications software

projects fundinghellip5) the conditions for re-use (rights licenses)6) provenance versions7) the specific coverage of the dataset (type of data thematic

coverage geographic coverage)

8) time and space slices subsets 9) the ldquodimensionsrdquo and ldquovariablesrdquo covered by the dataset10) the semantics of the dimensions (units of measure time

granularity syntax reference taxonomies)

WHY dimensions and semantics

Example of search by researcher

Irsquom doing research on plant phenotypes give me all datasets of crop phenotypic data that include the dimensions of time geographic location and height plus units of measure used for time and height where geographic location is expressed as coordinates (because my software only processes coordinates)

Data seralizations

bull The metadata above refer to the collection as a whole but additional technical metadata refer to the different ldquoserializationsrdquo of the datahellip

bull In many dataset description models the metadata about the data serialization is attached to the dataset

bull In other dataset description models information about the data serializations is not considered inherent to the nature and content of the data collection so it is not attached to the dataset but rather to related entities called ldquodistributionsrdquo

Distribution metadata for applications

Applications have to find metadata about the actual ldquoserializationsrdquo or ldquodistributionsrdquo of the dataset to understand

1 Where to retrieve it URL (data dump servicehellip)

2 the necessary technical specifications to retrieve and parse a

distribution of the dataset

- format (file format data format)

- protocol API parametershellip

And if different for different distributions again3 the conditions for re-use (rights licenses)

4 the semantics of the dimensions (units of measure time granularity syntax reference taxonomies) if different semantics for different distributions

WHY protocols and API params

Example

Some datasets are available behind a service eg RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets

Use case I have an application that can fetch from several SOAP web services (protocol) at once automatically if it knows the parameters required by the SOAP service and the required syntax and type of the parameters

General issue with all metadata

Standardization of the values, e.g. for "thematic coverage" or "dimensions" of datasets, "format" or "protocol used" of distributions, etc.

- The value should be standardized, possibly a URI
- The value should be part of an authority list / code list

And... there is no authority "value vocabulary" or code list for many of these values.
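As a small illustration of what standardized values buy an application, the sketch below maps free-text format labels to URIs from a code list. File formats are one of the cases where an authority list does exist (the IANA media type registry); the lookup table and the function name are assumptions made for this example.

```python
from typing import Optional

# Illustrative code list: normalized label -> URI in the IANA registry.
FORMAT_CODE_LIST = {
    "csv": "https://www.iana.org/assignments/media-types/text/csv",
    "json": "https://www.iana.org/assignments/media-types/application/json",
    "rdf/xml": "https://www.iana.org/assignments/media-types/application/rdf+xml",
}


def standardize_format(label: str) -> Optional[str]:
    """Return the code-list URI for a free-text label, or None if unknown."""
    return FORMAT_CODE_LIST.get(label.strip().lower())


labels = ["CSV", " json ", "Excel"]
standardized = {label: standardize_format(label) for label in labels}
```

Labels that are not in the code list come back as `None`, which is the situation described above for values such as "protocol used", where no shared list is available to map to.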

No out-of-the-box solution

• Do existing data catalog tools normally cover these metadata? NO

• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO

BUT by using even basic metadata to describe datasets in data catalogs, "publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites." (from the W3C page on "Best Practices for Publishing Linked Data")

Dataset description vocabularies

Let's see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability.

Semantic interoperability

In this presentation we cover only RDF vocabularies, with a special focus on semantic interoperability.

Dataset metadata were managed in several ways before semantic technologies: see NetCDF or HDF5 structures and the various hierarchical, array-based structures used especially in observation datasets, which include dataset metadata at the top and data arrays below.

Dataset description vocabularies

• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a "catalog"
  • Metadata about the dataset (collection) and related distributions

• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the "data structure" of a dataset

• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services

Definition of "dataset" in DCAT

Definition given by the W3C Government Linked Data Working Group:

A dataset is "a collection of data, published or curated by a single source, and available for access or download in one or more formats".

The "instances" of the dataset "available for access or download in one or more formats" are called "distributions". A dataset can have many distributions.

Examples of distributions include a downloadable CSV file, an API or an RSS feed.
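A minimal sketch of this dataset/distribution split, written out as DCAT in Turtle syntax by a small Python helper. The helper itself, the example URIs and the title are assumptions for illustration; the DCAT terms used (`dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:downloadURL`, `dcat:accessURL`, `dcat:mediaType`) and `dct:title` are real vocabulary terms. The helper does not escape quotes in literals, so it is suitable only for simple values.

```python
def dataset_to_turtle(uri, title, distributions):
    """Serialize one dataset and its distributions as DCAT Turtle.

    distributions: list of (distribution URI, property, target URL, media type)
    where property is "dcat:downloadURL" for a file or "dcat:accessURL"
    for a service or feed.
    """
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{uri}> a dcat:Dataset ;",
        f'    dct:title "{title}" ;',
    ]
    links = " ,\n        ".join(f"<{d[0]}>" for d in distributions)
    lines.append(f"    dcat:distribution {links} .")
    for dist_uri, prop, target, media_type in distributions:
        lines += [
            "",
            f"<{dist_uri}> a dcat:Distribution ;",
            f"    {prop} <{target}> ;",
            f'    dcat:mediaType "{media_type}" .',
        ]
    return "\n".join(lines)


# Hypothetical dataset with a CSV download and an RSS feed.
turtle = dataset_to_turtle(
    "http://example.org/dataset/1",
    "Example dataset",
    [
        ("http://example.org/dataset/1/csv", "dcat:downloadURL",
         "http://example.org/files/data.csv", "text/csv"),
        ("http://example.org/dataset/1/feed", "dcat:accessURL",
         "http://example.org/feed", "application/rss+xml"),
    ],
)
```

Printing `turtle` shows one `dcat:Dataset` resource linked to two `dcat:Distribution` resources, each carrying its own URL and format.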


DCAT model

1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding...: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) Format
12) Protocols, parameters: NO

DCAT and DCAT-AP

The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.

It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.

The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.

A diagram of the full DCAT-AP specification is on the next slide.

Full DCAT-AP

[Diagram: full DCAT-AP model, with callouts for versions, rights and provenance, standards, rights, format, relation]

1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO

More than DCAT

Limitations of DCAT

It doesn't cover:

• semantic relations to organisations, persons, software, projects, funding...
• dimensions and variables, and the syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets

Combining DCAT with other vocabularies

"Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format."

(from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures.

• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation...)

• The measure components represent the phenomenon being observed (e.g. life expectancy)

• The attribute components enable specification of the units of measure, any scaling factors, and metadata such as the status of the observation (e.g. estimated, provisional)
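To connect this structure to the researcher's search (datasets covering time, geographic location and height, with units for time and height, and location expressed as coordinates), here is a hedged sketch of how an application could filter datasets once such structure metadata is available. The data model is a simplification invented for this example, not the Data Cube vocabulary itself: units are folded into each component instead of being modeled as separate attribute components, and the dataset names and values are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Component:
    kind: str                     # "dimension" or "measure"
    unit: Optional[str] = None    # unit of measure, where it applies
    syntax: Optional[str] = None  # how values are expressed


# Hypothetical structure descriptions for two datasets.
structures: Dict[str, Dict[str, Component]] = {
    "dataset-a": {
        "time": Component("dimension", unit="day"),
        "location": Component("dimension", syntax="coordinates"),
        "height": Component("measure", unit="cm"),
    },
    "dataset-b": {
        "time": Component("dimension", unit="year"),
        "location": Component("dimension", syntax="place name"),
        "height": Component("measure", unit="m"),
    },
}


def matches(structure: Dict[str, Component]) -> bool:
    """True if the dataset covers time, location and height, states units
    for time and height, and expresses location as coordinates."""
    required = ("time", "location", "height")
    if not all(name in structure for name in required):
        return False
    has_units = all(structure[name].unit for name in ("time", "height"))
    return has_units and structure["location"].syntax == "coordinates"


usable = [name for name, s in structures.items() if matches(s)]
```

Only `dataset-a` passes the filter, because `dataset-b` expresses location as place names; this is the kind of decision that cannot be automated when dimensions and their semantics are missing from the dataset description.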

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.

1) dimensions and semantics: YES
2) slices, subsets: YES

More than DCAT-AP

VOID model

[Diagram: VOID model; license-related properties shown: dct:license, wv:norms, wv:waiver]

3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES

More than DCAT-AP

Complementing DCAT

• For dimensions, semantics of dimensions, slicing
  • DataCube
  • DDI

• For API aspects
  • VOID (linked data)
  • Web services descriptions (Hydra (WSDL, WADL))

• For relations to organizations, projects, publications, funding...
  • CERIF for datasets
  • VIVO Datastar

Many vocabularies

Vocabularies with relations to DCAT or the same model

• DCAT-AP and other extensions
• W3C HCLS dataset descriptions; DataID (extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information); GeoDCAT-AP (geospatial extension of DCAT)
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org

Other vocabularies

• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE

Examples of application of DCAT

• CKAN data catalog tool (more in your workshop)

• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances

• CIARD RING federated data catalog, managed by GFAR

CIARD RING

Datasets in the RING dataset hub

• Datasets can be registered as standalone sources or as part of a "collection" (DCAT model)
• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different "distributions" (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a "RING DCAT profile" will be published

Sample dataset record

RDF of the record:

<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
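As a rough illustration of how an application could consume such a record, the sketch below reads an abridged copy of it with Python's standard XML parser. Several things here are assumptions: the `rdf:RDF` wrapper and the namespace declarations are added so the fragment parses on its own, the punctuation of the URIs is reconstructed, and only a few of the record's properties are included. A real application would more likely use a full RDF library than plain XML parsing.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Abridged record, wrapped so that it is a well-formed standalone document.
record = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dcat="{DCAT}" xmlns:dct="{DCT}">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(record)
description = root.find(f"{{{RDF}}}Description")

# ElementTree addresses namespaced names as "{namespace-uri}local-name".
types = [t.get(f"{{{RDF}}}resource")
         for t in description.findall(f"{{{RDF}}}type")]
title = description.findtext(f"{{{DCT}}}title")
distributions = [d.get(f"{{{RDF}}}resource")
                 for d in description.findall(f"{{{DCAT}}}distribution")]
is_dcat_dataset = f"{DCAT}Dataset" in types
```

Because the record is typed with several vocabularies at once, a DCAT-aware client and a VOID-aware client can each recognise the same resource through the type they understand.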

References

1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ

2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).

3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with the 18th International World Wide Web Conference (WWW 09).

4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf

5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page

6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki

Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info

Dataset description and DCAT

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories

Thank you

Valeria Pesce
valeria.pesce@fao.org



ltschemaurl rdfresource=httpsoilmapsentecraititabancadati1htmlgt

ltdctpublisher rdfresource=httpringciardnetnode19510gt

ltschemapublisher rdfresource=httpringciardnetnode19510gt

ltdctissued rdfdatatype=xsdgYeargt1990ltns1issuedgt

ltdctspatial rdfresource=httpringciardnettaxonomy_term326gt

ltdctspatialgtNationalltns1spatialgt

ltdcattheme rdfresource=httpringciardnettaxonomy_term2052gt

ltschemaabout rdfresource=httpringciardnettaxonomy_term2052gt

ltdctconformsTo rdfresource=httpringciardnetnode19239gt

ltdctidentifiergthttpringciardnetnode19517rdfltns1identifiergt

ltschemacontentLocation rdfresource=httpringciardnettaxonomy_term326gt

ltdcttype rdfresource=httpringciardnettaxonomy_term81gt

ltdcatcatalog rdfresource=httpringciardnetnode19436gt

ltdcatdistribution rdfresource=httpringciardnetfield_collection_item5055gt

ltschemadistribution rdfresource=httpringciardnetfield_collection_item5055gt

ltvoiddataDump rdfresource=httpringciardnetfield_collection_item5055gt

ltdctrights rdfresource=httpringciardnettaxonomy_term2053gt

ltrdfDescriptiongt

References1 Issues and Recommendations Associated with Distributed Computation and Data

Management Systems for the Space SciencesNational Research Council (US) Space Science BoardNational Academies 1986 - 111 paginehttpsbooksgooglecoukbooksid=h4krAAAAYAAJ

2 Sacchi S Wickett K M amp Renear A H (2010) Dataset definitions Champaign IL Center for Informatics Research in Science and Scholarship (Rep No CIRSSDATACONS--20101VER01+DCDC)

3 Alexander K Cyganiak R Hausenblas M amp Zhao J (2009) Describing Linked Datasets-On the Design and Usage of voiD theVocabulary of Interlinked Datasets In Linked Data on the Web Workshop (LDOW 09) in conjunction with 18th International World Wide Web Conference (WWW 09)

4 Renear A H Sacchi S Wickett K M (2010) Definitions of Dataset in the Scientific and Technical Literature httpmailasistorgasist2010proceedingsproceedingsASIST_AM10submissions240_Final_Submissionpdf

5 W3C Government Linked Data Working Grouphttpwwww3org2011gldwikiMain_Page

6 UK Gov Linked Data Working Group LD registryhttpsgithubcomderukl-registry-pocwiki

Vocabularies and catalogsbull DCAT httpwwww3orgTRvocab-dcat

bull DCAT AP httpsjoinupeceuropaeuassetdcat_application_profileasset_releasedcat-application-

profile-data-portals-europe-final

bull DataCube httppurlorglinked-datacube

bull VOID httprdfsorgnsvoid-guide

bull DDI-RDF Discovery Vocabulary httprdf-vocabularyddiallianceorgdiscoveryhtml

bull VIVO Datastar httpsourceforgenetprojectsvivofilesDatastar20ontology

bull CERIF for datasets httpscerif4datasetswordpresscomc4d-deliverables

bull CKAN httpckanorg

bull Dataverse httpdataverseorg

bull Datahub httpdatahubio

bull DataCite httpsearchdataciteorguiq=subject3Aagriculture

bull Re3data httpwwwre3dataorg

bull OpenAIRE httpswwwopenaireeu

bull CIARD RING httpringciardinfo

Dataset description and DCAT

DRTCISI-ICSUCODATAInternational Workshop on

Open Data Repositories

Thank you

Valeria Pesce

valeriapescefaoorg

Page 10: Dataset description: DCAT and other vocabularies

Data serializations

• The metadata above refer to the collection as a whole, but additional technical metadata refer to the different “serializations” of the data…

• In many dataset description models, the metadata about the data serialization is attached to the dataset.

• In other dataset description models, information about the data serializations is not considered inherent to the nature and content of the data collection, so it is attached not to the dataset but to related entities called “distributions”.

Distribution metadata for applications

Applications have to find metadata about the actual “serializations” or “distributions” of the dataset to understand:

1. Where to retrieve it: URL (data dump, service…)

2. The technical specifications necessary to retrieve and parse a distribution of the dataset:

   - format (file format, data format)

   - protocol, API, parameters…

And, if they differ between distributions, again:

3. The conditions for re-use (rights, licenses)

4. The semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
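The four items above can be pictured as one record per distribution. The following is only a minimal sketch: the class name, field names, the `"HTTP download"` default and the `example.org` URLs are all hypothetical and are not taken from DCAT or any other vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Distribution:
    """One concrete serialization of a dataset (hypothetical field names)."""
    url: str                         # 1. where to retrieve it
    data_format: str                 # 2. file format / data format
    protocol: str = "HTTP download"  # 2. how the data are accessed
    parameters: Dict[str, str] = field(default_factory=dict)  # 2. API parameter -> expected type
    license: Optional[str] = None    # 3. only needed if it differs from the dataset's license
    dimension_semantics: Dict[str, str] = field(default_factory=dict)  # 4. dimension -> unit or code list


def missing_for_retrieval(dist: Distribution) -> List[str]:
    """List what an application would still need before it can fetch the data."""
    missing = []
    if not dist.url:
        missing.append("url")
    if not dist.data_format:
        missing.append("data_format")
    # A service-backed distribution is not usable without its parameter description.
    if dist.protocol != "HTTP download" and not dist.parameters:
        missing.append("parameters")
    return missing


csv_dump = Distribution(url="http://example.org/soil.csv", data_format="text/csv")
api = Distribution(url="http://example.org/api", data_format="application/json",
                   protocol="REST API")
print(missing_for_retrieval(csv_dump))
print(missing_for_retrieval(api))
```

A plain file dump is usable with a URL and a format alone, while the service-backed distribution is flagged until its parameters are described.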

WHY protocols and API params?

Example

Some datasets are available behind a service: e.g. RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries, or national research institutes provide access to datasets behind a web service that accepts parameters to filter the datasets.

Use case: I have an application that can fetch from several SOAP web services (protocol) at once, automatically, if it knows the parameters required by each SOAP service and the required syntax and type of the parameters.
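To illustrate why parameter metadata matters, here is a small sketch of a client that builds a request only from a machine-readable description of the service. The description format (`endpoint`, `parameters`, `required`) and the `example.org` endpoint are assumptions made for this example; only the `query` and `default-graph-uri` parameter names come from the SPARQL protocol.

```python
from urllib.parse import urlencode

# Hypothetical machine-readable description of a service-backed distribution.
service = {
    "endpoint": "http://example.org/sparql",
    "protocol": "SPARQL",
    "parameters": {"query": str, "default-graph-uri": str},
    "required": ["query"],
}


def build_request_url(description: dict, values: dict) -> str:
    """Check the supplied values against the declared parameters, then build a GET URL."""
    declared = description["parameters"]
    for name in description["required"]:
        if name not in values:
            raise ValueError(f"missing required parameter: {name}")
    for name, value in values.items():
        if name not in declared:
            raise ValueError(f"undeclared parameter: {name}")
        if not isinstance(value, declared[name]):
            raise TypeError(f"{name} must be of type {declared[name].__name__}")
    return description["endpoint"] + "?" + urlencode(values)


url = build_request_url(service, {"query": "SELECT * WHERE { ?s ?p ?o } LIMIT 10"})
print(url)
```

Without the `parameters` and `required` entries the client could not tell which values to send, which is the gap the slide points at.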

General issue with all metadata

Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions, etc.

- The value should be standardized, possibly a URI

- The value should be part of an authority list / code list

And… there is no authority “value vocabulary” or code list for many of these values
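As a rough illustration of value standardization, the sketch below maps free-text format labels to URIs from a code list. The IANA media type registry is used here only as an example of an authority list; the mapping table and the function name are assumptions for this example.

```python
# Illustrative code list: free-text labels -> URIs from an authority list.
FORMAT_CODE_LIST = {
    "csv": "https://www.iana.org/assignments/media-types/text/csv",
    "json": "https://www.iana.org/assignments/media-types/application/json",
    "rdf/xml": "https://www.iana.org/assignments/media-types/application/rdf+xml",
}


def standardize(label: str, code_list: dict):
    """Return the URI for a label, or None when the code list has no entry."""
    return code_list.get(label.strip().lower())


print(standardize(" CSV ", FORMAT_CODE_LIST))
print(standardize("soil profile table", FORMAT_CODE_LIST))
```

The second call returns `None`, which is the situation described above: for many values there is simply no code list to map to.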

No out-of-the-box solution

• Do existing data catalog tools normally cover these metadata? NO

• Do existing metadata vocabularies (RDF and not) cover all these metadata? Or do they adopt the same model? NO

BUT by using even basic metadata to describe datasets in data catalogs, “publishers increase discoverability and enable applications to consume metadata from multiple catalogs. This further enables decentralized publishing of catalogs and facilitates federated dataset search across sites.” (from W3C page on “Best Practices for Publishing Linked Data”)

Dataset description vocabularies

Let’s see if the main vocabularies to describe datasets provide the metadata we think are needed for full interoperability.

Semantic interoperability

In this presentation we cover only RDF vocabularies, with special focus on semantic interoperability.

Dataset metadata were managed in several ways before semantic technologies: see NetCDF or HDF5 structures and various hierarchical array-based structures, used especially in observation datasets, which include dataset metadata at the top and data arrays below.

Dataset description vocabularies

• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about dataset (collection) and related distributions

• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset

• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services

Definition of “dataset” in DCAT

Definition given by the W3C Government Linked Data Working Group:

A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.

The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.

Examples of distributions include a downloadable CSV file, an API or an RSS feed.
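The dataset/distribution split can be sketched by generating the corresponding triples by hand. This is an illustrative sketch only: the `example.org` URIs and the dataset title are made up, literal escaping is omitted, and only a handful of DCAT and Dublin Core terms are used (`dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:downloadURL`, `dcat:accessURL`, `dct:title`).

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def dataset_triples(dataset_uri, title, distributions):
    """Yield N-Triples lines for one dataset and its distributions.

    `distributions` maps a distribution URI to a (property, URL) pair, where
    property is "downloadURL" for a file and "accessURL" for a service or feed.
    The title is written as-is (no literal escaping in this sketch).
    """
    yield f"<{dataset_uri}> <{RDF_TYPE}> <{DCAT}Dataset> ."
    yield f'<{dataset_uri}> <{DCT}title> "{title}" .'
    for dist_uri, (prop, url) in distributions.items():
        yield f"<{dataset_uri}> <{DCAT}distribution> <{dist_uri}> ."
        yield f"<{dist_uri}> <{RDF_TYPE}> <{DCAT}Distribution> ."
        yield f"<{dist_uri}> <{DCAT}{prop}> <{url}> ."


lines = list(dataset_triples(
    "http://example.org/dataset/soil",
    "Soil observations",
    {
        "http://example.org/dataset/soil/csv": ("downloadURL", "http://example.org/files/soil.csv"),
        "http://example.org/dataset/soil/api": ("accessURL", "http://example.org/api/soil"),
        "http://example.org/dataset/soil/rss": ("accessURL", "http://example.org/feeds/soil.rss"),
    },
))
print("\n".join(lines))
```

One dataset resource carries the descriptive metadata, while each of the three distributions (CSV file, API, RSS feed) gets its own resource and URL.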

DCAT model

1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) format
12) protocols, parameters: NO

DCAT and DCAT-AP

The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.

It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.

The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.

A diagram of the full DCAT-AP specification is on the next slide.

Full DCAT AP

[Diagram of the full DCAT-AP specification; callouts: versions, rights and provenance, standards, rights, format, relation]

More than DCAT:

1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO

Limitations of DCAT

It doesn’t cover:

• semantic relations to organisations, persons, software, projects, funding…

• dimensions and variables, and the syntax / semantics of dimensions and variables

• protocols and parameters for datasets available through APIs

• time and space slices, subsets

Combining DCAT with other vocabularies

“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”

(from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures.

• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)

• The measure components represent the phenomenon being observed (e.g. life expectancy)

• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
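The three component types can be sketched as a simple structure definition that observations are checked against. This is a simplified stand-in, not the Data Cube vocabulary itself: the class, the component names and the sample values are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StructureDefinition:
    """Simplified stand-in for a Data Cube data structure definition."""
    dimensions: List[str]   # identify what is observed (time, region, ...)
    measures: List[str]     # the observed phenomenon
    attributes: List[str]   # units, scaling factors, observation status

    def problems(self, observation: Dict[str, object]) -> List[str]:
        """Report missing dimension/measure values and undeclared keys."""
        issues = [f"missing {name}" for name in self.dimensions + self.measures
                  if name not in observation]
        known = set(self.dimensions + self.measures + self.attributes)
        issues += [f"undeclared {name}" for name in observation if name not in known]
        return issues


dsd = StructureDefinition(
    dimensions=["time", "region", "gender"],
    measures=["life_expectancy"],
    attributes=["unit", "status"],
)
# Sample values below are made up for illustration.
ok = {"time": "2010", "region": "Wales", "gender": "female",
      "life_expectancy": 82.1, "unit": "years", "status": "estimated"}
bad = {"time": "2010", "life_expectancy": 80.0, "source": "survey"}
print(dsd.problems(ok))   # []
print(dsd.problems(bad))  # ['missing region', 'missing gender', 'undeclared source']
```

Attributes are treated as optional here, while every dimension and measure must be present; that choice is an assumption of this sketch.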

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.

More than DCAT-AP:

1) dimensions and semantics: YES
2) slices, subsets: YES

VOID model

[Diagram of the VoID model; licensing callout: dct:license, wv:norms, wv:waiver]

More than DCAT-AP:

3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES

Complementing DCAT

• For dimensions, semantics of dimensions, slicing
  • DataCube
  • DDI

• For API aspects
  • VOID (linked data)
  • Web services descriptions (Hydra, (WSDL, WADL))

• For relations to organizations, projects, publications, funding…
  • CERIF for datasets
  • VIVO Datastar

Many vocabularies

Vocabularies with relations to DCAT or same model

• DCAT-AP and other extensions

• W3C HCLS dataset descriptions; DataID, an extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information; GeoDCAT-AP, a geospatial extension of DCAT

• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)

• Schema.org

Other vocabularies

• DataCite and re3data

• CERIF for datasets

• VIVO Datastar

• INSPIRE

Examples of application of DCAT

• CKAN data catalog tool (more in your workshop)

• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances

• CIARD RING federated data catalog, managed by GFAR

CIARD RING

Datasets in the RING dataset hub

• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)

• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)

• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)

• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary

A “RING DCAT profile” will be published.

Sample dataset record

RDF of the record:

<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database: representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database: representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database: representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
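A consuming application could read such a record roughly as follows. This is a minimal sketch using plain XML parsing from the Python standard library, not a full RDF parser; the inline record is a shortened, made-up example with `example.org` URIs, wrapped in an `rdf:RDF` root with namespace declarations so that it is well-formed on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Shortened, hypothetical record in the same shape as the sample above.
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/node/1">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>Example soil database</dct:title>
    <dcat:distribution rdf:resource="http://example.org/distribution/1"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree addresses namespaced attributes as "{namespace}localname".
RESOURCE = "{%s}resource" % NS["rdf"]

root = ET.fromstring(RECORD)
description = root.find("rdf:Description", NS)
types = [t.get(RESOURCE) for t in description.findall("rdf:type", NS)]
title = description.findtext("dct:title", namespaces=NS)
distributions = [d.get(RESOURCE) for d in description.findall("dcat:distribution", NS)]

print(types)
print(title)
print(distributions)
```

Because the record is typed with several vocabularies at once, a client that understands only DCAT or only VoID can still recognize it as a dataset.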

References

1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986. 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ

2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).

3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with 18th International World Wide Web Conference (WWW 09).

4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf

5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page

6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki

Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat/

• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final

• DataCube: http://purl.org/linked-data/cube

• VOID: http://rdfs.org/ns/void-guide

• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html

• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/

• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/

• CKAN: http://ckan.org

• Dataverse: http://dataverse.org

• Datahub: http://datahub.io

• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture

• Re3data: http://www.re3data.org

• OpenAIRE: https://www.openaire.eu

• CIARD RING: http://ring.ciard.info

Dataset description and DCAT

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories

Thank you

Valeria Pesce

valeria.pesce@fao.org

Page 11: Dataset description: DCAT and other vocabularies

Distribution metadata for applications

Applications have to find metadata about the actual ldquoserializationsrdquo or ldquodistributionsrdquo of the dataset to understand

1 Where to retrieve it URL (data dump servicehellip)

2 the necessary technical specifications to retrieve and parse a

distribution of the dataset

- format (file format data format)

- protocol API parametershellip

And if different for different distributions again3 the conditions for re-use (rights licenses)

4 the semantics of the dimensions (units of measure time granularity syntax reference taxonomies) if different semantics for different distributions

WHY protocols and API params

Example

Some datasets are available behind a service eg RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets

Use case I have an application that can fetch from several SOAP web services (protocol) at once automatically if it knows the parameters required by the SOAP service and the required syntax and type of the parameters

General issue with all metadata

Standardization of the values eg for ldquothematic coveragerdquo or ldquodimensionsrdquo of datasets ldquoformatrdquo or ldquoprotocol usedrdquo of distributions etc

- The value should be standardized possibly a URI

- The value should be part of an authority list code list

Andhellip There is no authority ldquovalue vocabularyrdquo or code list for many of these values

No out-of-the-box solution

bull Do existing data catalog tools normally cover these metadataNO

bull Do existing metadata vocabularies (RDF and not) cover all these metadata Or do they adopt the same modelNO

BUT by using even basic metadata to describe datasets in data catalogs ldquopublishers increase discoverability and enable applications to consume metadata from multiple catalogs This further enables decentralized publishing of catalogs and facilitates federated dataset search across sitesrdquo(from W3C page on ldquoBest Practices for Publishing Linked Datardquo)

Dataset description vocabularies

Letrsquos see if the main vocabularies to describe datasets provide themetadata we think are needed

for full interoperability

Semantic interoperability

In this presentation we cover only RDF vocabularies with special focus on semantic interoperability

Dataset metadata have been managed in several ways before semantic technologies see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets ndashincluding dataset metadata at the top and data arrays below

Dataset description vocabularies

bull DCAT vocabularybull RDF vocabulary for describing any dataset

bull Datasets can be standalone or part of a ldquocatalogrdquo

bull Metadata about dataset (collection) and related distributions

bull DataCube vocabularybull RDF vocabulary for describing statistical datasets

bull Useful for attaching metadata about the ldquodata structurerdquo of a dataset

bull VOID vocabularybull RDF vocabulary for expressing metadata about RDF datasets

bull Useful especially for metadata related to RDF data services

Definition of ldquodatasetrdquo in DCAT

Definition given by the W3C Government Linked Data Working Group

A dataset is ldquoa collection of data published or curated by a single source and available for access or download in one or more formatsrdquo

The ldquoinstancesrdquo of the dataset ldquoavailable for access or download in one or more formatsrdquo are called ldquodistributionsrdquo A dataset can have many distributions

Examples of distributions include a downloadable CSV file an API or an RSS feed

c

DCAT model

1) identifier(s)2) who is responsible for it3) when and where the data were

collected4) relations to organizations persons

publications software projects fundinghellip NO

5) the conditions for re-use (rights licenses)

6) provenance version NO7) coverage of the dataset8) dimensions and semantics NO9) slices subsets NO10) URL11) Format12) Protocols parameters NO

DCAT and DCAT-AP

The DCAT Application profile for data portals in

Europe (DCAT-AP) is an extension of DCAT

It combines DCAT with the W3C Asset Description Metadata

Schema (ADMS) vocabulary plus classes and properties from

Dublin Core SKOS and Vcard in an Application profile

The elaboration of the DCAT-AP was a joint initiative of DG

CONNECT the EU Publications Office and the ISA Programme

A diagram of the full DCAT-AP specification is on the next slide

Full DCAT AP

versions

rights and provenance

standards

rights

format

relation

1) who is responsible for it MORE2) relations to organizations projects publications funding Partly

3) the conditions for re-use (rights licenses) MORE4) provenance version YES5) protocols parameters NO6) dimensions and semantics NO

More than DCAT

Limitations of DCAT

It doesnrsquot cover

bull semantic relations to organisations persons software projects fundinghellip

bull dimensions and variables and syntax semantics of dimensions and variables

bull protocols and parameters for datasets available through APIs

bull time and space slices subsets

Combining DCAT with other vocabularies

ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo

(from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions attributes and measures

bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)

bull The measure components represent the phenomenon being observed (eg life expectancy)

bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset also non-statistical

1) dimensions and semantics YES2) slices subsets YES

More than DCAT-AP

VOID model

c

dctlicensewvnorms wvwaiver

3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES

More than DCAT-AP

Complementing DCAT

bull For dimensions semantics of dimensions slicingbull DataCube

bull DDI

bull For API aspectsbull VOID (linked data)

bull Web services descriptions (Hydra (WSDL WADL))

bull For relations to organizations projects publications fundinghellipbull CERIF for datasets

bull VIVO Datastar

Many vocabulariesVocabularies with relations to DCAT or same model

bull DCAT-AP and other extensions

bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT

bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)

bull Schemaorg

Other vocabulariesbull DataCite and re3data

bull CERIF for datasets

bull VIVO Datastar

bull INSPIRE

Examples of application of DCAT

bull CKAN data catalog tool(more in your workshop)

bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances

bull CIARD RING federated data catalogmanaged by GFAR

CIARD RING

Datasets in the RING dataset hub

bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)

bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)

bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)

bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary

a ldquoRING DCAT profilerdquo will be published

Sample dataset record

RDF of the recordltrdfDescription rdfabout=httpringciardnetnode19517gt

ltrdftype rdfresource=httpwwww3orgnsdcatDatasetgt

ltrdftype rdfresource=httprdfsorgnsvoidDatasetgt

ltrdftype rdfresource=httpschemaorgDatasetgt

ltrdftype rdfresource=httpwwww3orgnsadmsAssetgt

ltdcttitlegtNational Soil Database representative Soil Systems geographyltns1titlegt

ltschemanamegtNational Soil Database representative Soil Systems geographyltns2namegt

ltdctdescriptiongtNational Soil Database representative Soil Systems geography (1500000) ltns1descriptiongt

ltdcatlandingPage rdfresource=httpsoilmapsentecraititabancadati1htmlgt

ltschemaurl rdfresource=httpsoilmapsentecraititabancadati1htmlgt

ltdctpublisher rdfresource=httpringciardnetnode19510gt

ltschemapublisher rdfresource=httpringciardnetnode19510gt

ltdctissued rdfdatatype=xsdgYeargt1990ltns1issuedgt

ltdctspatial rdfresource=httpringciardnettaxonomy_term326gt

ltdctspatialgtNationalltns1spatialgt

ltdcattheme rdfresource=httpringciardnettaxonomy_term2052gt

ltschemaabout rdfresource=httpringciardnettaxonomy_term2052gt

ltdctconformsTo rdfresource=httpringciardnetnode19239gt

ltdctidentifiergthttpringciardnetnode19517rdfltns1identifiergt

ltschemacontentLocation rdfresource=httpringciardnettaxonomy_term326gt

ltdcttype rdfresource=httpringciardnettaxonomy_term81gt

ltdcatcatalog rdfresource=httpringciardnetnode19436gt

ltdcatdistribution rdfresource=httpringciardnetfield_collection_item5055gt

ltschemadistribution rdfresource=httpringciardnetfield_collection_item5055gt

ltvoiddataDump rdfresource=httpringciardnetfield_collection_item5055gt

ltdctrights rdfresource=httpringciardnettaxonomy_term2053gt

ltrdfDescriptiongt

References1 Issues and Recommendations Associated with Distributed Computation and Data

Management Systems for the Space SciencesNational Research Council (US) Space Science BoardNational Academies 1986 - 111 paginehttpsbooksgooglecoukbooksid=h4krAAAAYAAJ

2 Sacchi S Wickett K M amp Renear A H (2010) Dataset definitions Champaign IL Center for Informatics Research in Science and Scholarship (Rep No CIRSSDATACONS--20101VER01+DCDC)

3 Alexander K Cyganiak R Hausenblas M amp Zhao J (2009) Describing Linked Datasets-On the Design and Usage of voiD theVocabulary of Interlinked Datasets In Linked Data on the Web Workshop (LDOW 09) in conjunction with 18th International World Wide Web Conference (WWW 09)

4 Renear A H Sacchi S Wickett K M (2010) Definitions of Dataset in the Scientific and Technical Literature httpmailasistorgasist2010proceedingsproceedingsASIST_AM10submissions240_Final_Submissionpdf

5 W3C Government Linked Data Working Grouphttpwwww3org2011gldwikiMain_Page

6 UK Gov Linked Data Working Group LD registryhttpsgithubcomderukl-registry-pocwiki

Vocabularies and catalogsbull DCAT httpwwww3orgTRvocab-dcat

bull DCAT AP httpsjoinupeceuropaeuassetdcat_application_profileasset_releasedcat-application-

profile-data-portals-europe-final

bull DataCube httppurlorglinked-datacube

bull VOID httprdfsorgnsvoid-guide

bull DDI-RDF Discovery Vocabulary httprdf-vocabularyddiallianceorgdiscoveryhtml

bull VIVO Datastar httpsourceforgenetprojectsvivofilesDatastar20ontology

bull CERIF for datasets httpscerif4datasetswordpresscomc4d-deliverables

bull CKAN httpckanorg

bull Dataverse httpdataverseorg

bull Datahub httpdatahubio

bull DataCite httpsearchdataciteorguiq=subject3Aagriculture

bull Re3data httpwwwre3dataorg

bull OpenAIRE httpswwwopenaireeu

bull CIARD RING httpringciardinfo

Dataset description and DCAT

DRTCISI-ICSUCODATAInternational Workshop on

Open Data Repositories

Thank you

Valeria Pesce

valeriapescefaoorg

Page 12: Dataset description: DCAT and other vocabularies

WHY protocols and API params

Example

Some datasets are available behind a service eg RDF datasets are often retrieved as subsets of an RDF store through SPARQL queries or national research institutes provide access to datasets behind a web service accepting parameters to filter the datasets

Use case I have an application that can fetch from several SOAP web services (protocol) at once automatically if it knows the parameters required by the SOAP service and the required syntax and type of the parameters

General issue with all metadata

Standardization of the values eg for ldquothematic coveragerdquo or ldquodimensionsrdquo of datasets ldquoformatrdquo or ldquoprotocol usedrdquo of distributions etc

- The value should be standardized possibly a URI

- The value should be part of an authority list code list

Andhellip There is no authority ldquovalue vocabularyrdquo or code list for many of these values

No out-of-the-box solution

bull Do existing data catalog tools normally cover these metadataNO

bull Do existing metadata vocabularies (RDF and not) cover all these metadata Or do they adopt the same modelNO

BUT by using even basic metadata to describe datasets in data catalogs ldquopublishers increase discoverability and enable applications to consume metadata from multiple catalogs This further enables decentralized publishing of catalogs and facilitates federated dataset search across sitesrdquo(from W3C page on ldquoBest Practices for Publishing Linked Datardquo)

Dataset description vocabularies

Letrsquos see if the main vocabularies to describe datasets provide themetadata we think are needed

for full interoperability

Semantic interoperability

In this presentation we cover only RDF vocabularies with special focus on semantic interoperability

Dataset metadata have been managed in several ways before semantic technologies see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets ndashincluding dataset metadata at the top and data arrays below

Dataset description vocabularies

• DCAT vocabulary
  • RDF vocabulary for describing any dataset
  • Datasets can be standalone or part of a “catalog”
  • Metadata about the dataset (collection) and related distributions

• DataCube vocabulary
  • RDF vocabulary for describing statistical datasets
  • Useful for attaching metadata about the “data structure” of a dataset

• VOID vocabulary
  • RDF vocabulary for expressing metadata about RDF datasets
  • Useful especially for metadata related to RDF data services

Definition of “dataset” in DCAT

Definition given by the W3C Government Linked Data Working Group:

A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats”.

The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions.

Examples of distributions include a downloadable CSV file, an API or an RSS feed.
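The dataset/distribution split described above can be sketched in a few lines. This is an illustrative model, not an official DCAT implementation: the Python class names, the `example.org` URIs and the `to_turtle` helper are assumptions, while `dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`, `dcat:accessURL`, `dcat:mediaType` and `dct:title` are terms from the DCAT and Dublin Core vocabularies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    uri: str
    access_url: str
    media_type: str  # e.g. "text/csv"


@dataclass
class Dataset:
    uri: str
    title: str  # assumed to contain no double quotes (no escaping is done)
    distributions: List[Distribution] = field(default_factory=list)


def to_turtle(dataset):
    """Serialize a dataset and its distributions as Turtle text."""
    out = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
    ]
    predicates = [f'dct:title "{dataset.title}"']
    predicates += [f"dcat:distribution <{d.uri}>" for d in dataset.distributions]
    out.append(f"<{dataset.uri}> a dcat:Dataset ;")
    out.append("    " + " ;\n    ".join(predicates) + " .")
    for d in dataset.distributions:
        out.append("")
        out.append(f"<{d.uri}> a dcat:Distribution ;")
        out.append(f"    dcat:accessURL <{d.access_url}> ;")
        out.append(f'    dcat:mediaType "{d.media_type}" .')
    return "\n".join(out)


# One dataset, two distributions: a CSV download and an RSS feed.
soil = Dataset(
    uri="http://example.org/dataset/soil",
    title="Soil observations",
    distributions=[
        Distribution(
            uri="http://example.org/dataset/soil/csv",
            access_url="http://example.org/files/soil.csv",
            media_type="text/csv",
        ),
        Distribution(
            uri="http://example.org/dataset/soil/rss",
            access_url="http://example.org/feeds/soil.rss",
            media_type="application/rss+xml",
        ),
    ],
)
print(to_turtle(soil))
```

The dataset carries the descriptive metadata once, and each distribution only adds what differs between access routes (URL and format).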


DCAT model

1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) Format
12) Protocols, parameters: NO

DCAT and DCAT-AP

The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.

It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.

The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.

A diagram of the full DCAT-AP specification is on the next slide.

Full DCAT-AP

[Diagram of the full DCAT-AP specification; highlighted areas: versions, rights and provenance, standards, rights, format, relation]

More than DCAT:

1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO

Limitations of DCAT

It doesn’t cover:

• semantic relations to organisations, persons, software, projects, funding…

• dimensions and variables, and the syntax and semantics of dimensions and variables

• protocols and parameters for datasets available through APIs

• time and space slices, subsets

Combining DCAT with other vocabularies

“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”

(from the DCAT specification)
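As a hedged illustration of the quoted recommendation, the sketch below describes one resource both as a `dcat:Dataset` and as a `void:Dataset`, and adds two VoID properties (`void:sparqlEndpoint`, `void:triples`) that only make sense for RDF data. The dataset URI, endpoint URL, triple count and helper names are invented for this example; output is plain N-Triples text built with the standard library only.

```python
DCAT = "http://www.w3.org/ns/dcat#"
VOID = "http://rdfs.org/ns/void#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a literal as an N-Triples term, escaping quotes and backslashes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    if datatype:
        return f'"{text}"^^<{datatype}>'
    return f'"{text}"'


def describe(dataset_uri, title, sparql_endpoint, triple_count):
    """Return triples typing one resource with both DCAT and VoID."""
    s = uri(dataset_uri)
    return [
        (s, uri(RDF_TYPE), uri(DCAT + "Dataset")),   # generic description
        (s, uri(RDF_TYPE), uri(VOID + "Dataset")),   # RDF-specific description
        (s, uri(DCT + "title"), literal(title)),
        (s, uri(VOID + "sparqlEndpoint"), uri(sparql_endpoint)),
        (s, uri(VOID + "triples"), literal(triple_count, XSD_INTEGER)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = describe(
    "http://example.org/dataset/soil",   # hypothetical dataset
    "Soil observations",
    "http://example.org/sparql",         # hypothetical endpoint
    1200,
)
print(to_ntriples(triples))
```

A consumer that only understands DCAT can ignore the `void:` statements, while an RDF-aware client can use them to query the data directly.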

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures.

• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)

• The measure components represent the phenomenon being observed (e.g. life expectancy)

• The attribute components enable specification of the units of measure, any scaling factors, and metadata such as the status of the observation (e.g. estimated, provisional)
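The three component roles listed above can be sketched as a small data structure. This is a simplification for illustration, not the RDF Data Cube model itself: the class name, the component names (`refPeriod`, `refArea`, `sex`, `lifeExpectancy`, `unitMeasure`, `obsStatus`) and the rule that all attributes are optional are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StructureDefinition:
    dimensions: Tuple[str, ...]       # identify an observation
    measures: Tuple[str, ...]         # the phenomenon being observed
    attributes: Tuple[str, ...] = ()  # units, status, scaling (optional here)

    def validate(self, observation):
        """Return a list of problems found in one observation (a dict)."""
        problems = []
        for name in self.dimensions + self.measures:
            if name not in observation:
                problems.append(f"missing component: {name}")
        known = set(self.dimensions + self.measures + self.attributes)
        for name in observation:
            if name not in known:
                problems.append(f"unknown component: {name}")
        return problems

    def key(self, observation):
        """The dimension values together identify an observation."""
        return tuple(observation[d] for d in self.dimensions)


dsd = StructureDefinition(
    dimensions=("refPeriod", "refArea", "sex"),
    measures=("lifeExpectancy",),
    attributes=("unitMeasure", "obsStatus"),
)

obs = {
    "refPeriod": "2005",
    "refArea": "Wales",
    "sex": "female",
    "lifeExpectancy": 80.7,
    "unitMeasure": "years",
}
print(dsd.validate(obs))
print(dsd.key(obs))
```

Because the dimension values form the key, two observations with the same period, area and sex would describe the same cell of the cube.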

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.

More than DCAT-AP:

1) dimensions and semantics: YES
2) slices, subsets: YES

VOID model

[Diagram of the VOID model; licensing properties shown: dct:license, wv:norms, wv:waiver]

More than DCAT-AP:

3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES

Complementing DCAT

• For dimensions, semantics of dimensions, slicing
  • DataCube
  • DDI

• For API aspects
  • VOID (linked data)
  • Web services descriptions (Hydra (WSDL, WADL))

• For relations to organizations, projects, publications, funding…
  • CERIF for datasets
  • VIVO Datastar

Many vocabularies

Vocabularies with relations to DCAT or same model

• DCAT-AP and other extensions
  • W3C HCLS dataset descriptions
  • DataID: extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information
  • GeoDCAT-AP: geospatial extension of DCAT

• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)

• Schema.org

Other vocabularies

• DataCite and re3data

• CERIF for datasets

• VIVO Datastar

• INSPIRE

Examples of application of DCAT

• CKAN data catalog tool (more in your workshop)

• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances

• CIARD RING federated data catalog, managed by GFAR

CIARD RING

Datasets in the RING dataset hub

• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)

• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)

• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)

• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary

A “RING DCAT profile” will be published.

Sample dataset record

RDF of the record:

<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1500000) </dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
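A record of this shape can be read with nothing more than an XML parser, as the sketch below suggests. It is an assumption-laden illustration: the embedded record is a shortened stand-in with invented `example.org` URIs, the `rdf:RDF` wrapper and namespace declarations are added here because the slide shows only a fragment, and the function name `summarize` is hypothetical. XML-level parsing like this only works for this particular RDF/XML layout; a dedicated RDF parser would be the more robust choice.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Shortened, hypothetical record in the same style as the sample above.
RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://example.org/node/1">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:title>Example soil database</dct:title>
    <dcat:distribution rdf:resource="http://example.org/distribution/1"/>
    <dcat:distribution rdf:resource="http://example.org/distribution/2"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree exposes namespaced attributes in {namespace}local form.
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def summarize(rdf_xml):
    """Return URI, title and distribution URIs for each described resource."""
    root = ET.fromstring(rdf_xml)
    summaries = []
    for node in root.findall("rdf:Description", NS):
        summaries.append({
            "uri": node.get(RDF_ABOUT),
            "title": node.findtext("dct:title", default="", namespaces=NS),
            "distributions": [
                d.get(RDF_RESOURCE)
                for d in node.findall("dcat:distribution", NS)
            ],
        })
    return summaries


for summary in summarize(RECORD):
    print(summary)
```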


Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat

• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final

• DataCube: http://purl.org/linked-data/cube

• VOID: http://rdfs.org/ns/void-guide

• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html

• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology

• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables

• CKAN: http://ckan.org

• Dataverse: http://dataverse.org

• Datahub: http://datahub.io

• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture

• Re3data: http://www.re3data.org

• OpenAIRE: https://www.openaire.eu

• CIARD RING: http://ring.ciard.info

Dataset description and DCAT

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories

Thank you

Valeria Pesce

valeria.pesce@fao.org



Page 15: Dataset description: DCAT and other vocabularies

Dataset description vocabularies

Letrsquos see if the main vocabularies to describe datasets provide themetadata we think are needed

for full interoperability

Semantic interoperability

In this presentation we cover only RDF vocabularies with special focus on semantic interoperability

Dataset metadata have been managed in several ways before semantic technologies see NetCDF or HDF5 structures and various hierarchical array-based structures used especially in observations datasets ndashincluding dataset metadata at the top and data arrays below

Dataset description vocabularies

bull DCAT vocabularybull RDF vocabulary for describing any dataset

bull Datasets can be standalone or part of a ldquocatalogrdquo

bull Metadata about dataset (collection) and related distributions

bull DataCube vocabularybull RDF vocabulary for describing statistical datasets

bull Useful for attaching metadata about the ldquodata structurerdquo of a dataset

bull VOID vocabularybull RDF vocabulary for expressing metadata about RDF datasets

bull Useful especially for metadata related to RDF data services

Definition of ldquodatasetrdquo in DCAT

Definition given by the W3C Government Linked Data Working Group

A dataset is ldquoa collection of data published or curated by a single source and available for access or download in one or more formatsrdquo

The ldquoinstancesrdquo of the dataset ldquoavailable for access or download in one or more formatsrdquo are called ldquodistributionsrdquo A dataset can have many distributions

Examples of distributions include a downloadable CSV file an API or an RSS feed

c

DCAT model

1) identifier(s)2) who is responsible for it3) when and where the data were

collected4) relations to organizations persons

publications software projects fundinghellip NO

5) the conditions for re-use (rights licenses)

6) provenance version NO7) coverage of the dataset8) dimensions and semantics NO9) slices subsets NO10) URL11) Format12) Protocols parameters NO

DCAT and DCAT-AP

The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.

It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.

The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.

A diagram of the full DCAT-AP specification is on the next slide.

Full DCAT AP

[Diagram of the full DCAT-AP specification. Labels: versions, rights and provenance, standards, rights, format, relation]

More than DCAT:

1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO

Limitations of DCAT

It doesn’t cover:

• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets

Combining DCAT with other vocabularies

“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”

(from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures:

• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors, and metadata such as the status of the observation (e.g. estimated, provisional)
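The three component roles can be illustrated with a small Python sketch. It does not use the DataCube vocabulary itself: `DataStructure`, `check_observation`, the component names and the sample values are all made up for illustration, and attributes are treated as optional, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructure:
    """Component names of a cube, grouped by role."""
    dimensions: tuple   # identify what is observed (time, region, ...)
    measures: tuple     # the phenomenon being observed
    attributes: tuple   # units, scaling factors, observation status


def check_observation(structure: DataStructure, observation: dict) -> list:
    """Return a list of problems; an empty list means the observation fits."""
    problems = []
    # Every dimension and measure must have a value.
    for name in structure.dimensions + structure.measures:
        if name not in observation:
            problems.append(f"missing component: {name}")
    # Nothing outside the declared structure is allowed.
    known = set(structure.dimensions + structure.measures + structure.attributes)
    for name in observation:
        if name not in known:
            problems.append(f"unknown component: {name}")
    return problems


# Hypothetical structure for a life-expectancy cube.
life_expectancy = DataStructure(
    dimensions=("period", "region", "gender"),
    measures=("lifeExpectancy",),
    attributes=("unit", "status"),
)

# Illustrative values only, not real statistics.
complete = {"period": "2005", "region": "North", "gender": "F",
            "lifeExpectancy": 80.0, "unit": "years", "status": "estimated"}
incomplete = {"period": "2005", "lifeExpectancy": 80.0, "colour": "blue"}

print(check_observation(life_expectancy, complete))
print(check_observation(life_expectancy, incomplete))
```

Declaring the structure separately from the observations is what lets a consumer interpret the values without inspecting the data first.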

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.

More than DCAT-AP:

1) dimensions and semantics: YES
2) slices, subsets: YES

VOID model

[Diagram of the VOID model. Licensing properties shown: dct:license, wv:norms, wv:waiver]

More than DCAT-AP:

3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
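One way to read the “URL” and “slices, subsets” entries is that VOID has separate properties for a SPARQL endpoint (`void:sparqlEndpoint`), RDF dumps (`void:dataDump`) and subsets (`void:subset`). The sketch below assembles such a description as plain triples. The function names and the `example.org` addresses are assumptions made for illustration, and only IRI-valued statements are handled.

```python
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_description(dataset, sparql_endpoint=None, data_dumps=(), subsets=()):
    """Return (subject, predicate, object) IRI triples describing an RDF dataset."""
    triples = [(dataset, RDF_TYPE, VOID + "Dataset")]
    if sparql_endpoint:
        triples.append((dataset, VOID + "sparqlEndpoint", sparql_endpoint))
    for dump in data_dumps:
        triples.append((dataset, VOID + "dataDump", dump))
    for subset in subsets:
        # A subset is itself a void:Dataset linked from its parent.
        triples.append((dataset, VOID + "subset", subset))
        triples.append((subset, RDF_TYPE, VOID + "Dataset"))
    return triples


def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical RDF dataset with an endpoint, one dump and one yearly subset.
triples = void_description(
    "http://example.org/dataset/soil",
    sparql_endpoint="http://example.org/sparql",
    data_dumps=["http://example.org/dumps/soil.nt"],
    subsets=["http://example.org/dataset/soil/2015"],
)
print(to_ntriples(triples))
```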

Complementing DCAT

• For dimensions, semantics of dimensions, slicing
  • DataCube
  • DDI
• For API aspects
  • VOID (linked data)
  • Web services descriptions (Hydra (WSDL, WADL))
• For relations to organizations, projects, publications, funding…
  • CERIF for datasets
  • VIVO Datastar

Many vocabularies

Vocabularies with relations to DCAT or same model

• DCAT-AP and other extensions
  • W3C HCLS dataset descriptions
  • DataID: extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information
  • GeoDCAT-AP: geospatial extension of DCAT
• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)
• Schema.org

Other vocabularies

• DataCite and re3data
• CERIF for datasets
• VIVO Datastar
• INSPIRE

Examples of application of DCAT

• CKAN data catalog tool (more in your workshop)
• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances
• CIARD RING federated data catalog, managed by GFAR

CIARD RING

Datasets in the RING dataset hub

• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by uniform type of content and uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published

Sample dataset record

RDF of the record:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:schema="http://schema.org/"
         xmlns:void="http://rdfs.org/ns/void#">
  <!-- rdf:RDF wrapper and namespace declarations added so that the fragment is well-formed -->
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <rdf:type rdf:resource="http://schema.org/Dataset"/>
    <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <schema:name>National Soil Database representative Soil Systems geography</schema:name>
    <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
    <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
    <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
    <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
    <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
    <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
    <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
    <dct:spatial>National</dct:spatial>
    <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
    <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
    <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
    <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
    <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
    <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
    <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
    <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
  </rdf:Description>
</rdf:RDF>
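A record like the one above can be inspected with Python's standard XML parser. This is a rough sketch, not an RDF parser: it assumes the record is serialised as a single rdf:Description element inside an rdf:RDF wrapper, it works on an abbreviated copy of the record, and the `summarise` function is a name invented here. A real harvester would more likely use an RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Abbreviated copy of the sample record (only a few properties kept).
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""


def summarise(rdf_xml: str) -> dict:
    """Pull the subject IRI, types, title and distributions out of one record."""
    root = ET.fromstring(rdf_xml)
    node = root.find(f"{{{RDF}}}Description")
    resource = f"{{{RDF}}}resource"
    return {
        "about": node.get(f"{{{RDF}}}about"),
        "types": [t.get(resource) for t in node.findall(f"{{{RDF}}}type")],
        "title": node.findtext(f"{{{DCT}}}title"),
        "distributions": [d.get(resource)
                          for d in node.findall(f"{{{DCAT}}}distribution")],
    }


summary = summarise(RECORD)
for key, value in summary.items():
    print(key, "=", value)
```

Because the same resource is typed as both dcat:Dataset and void:Dataset, a consumer that understands only one of the vocabularies can still recognise it as a dataset.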

References

1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ

2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).

3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with 18th International World Wide Web Conference (WWW 09).

4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf

5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page

6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki

Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info

Dataset description and DCAT

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories

Thank you

Valeria Pesce

valeria.pesce@fao.org



Page 18: Dataset description: DCAT and other vocabularies

Definition of ldquodatasetrdquo in DCAT

Definition given by the W3C Government Linked Data Working Group

A dataset is ldquoa collection of data published or curated by a single source and available for access or download in one or more formatsrdquo

The ldquoinstancesrdquo of the dataset ldquoavailable for access or download in one or more formatsrdquo are called ldquodistributionsrdquo A dataset can have many distributions

Examples of distributions include a downloadable CSV file an API or an RSS feed

c

DCAT model

1) identifier(s)2) who is responsible for it3) when and where the data were

collected4) relations to organizations persons

publications software projects fundinghellip NO

5) the conditions for re-use (rights licenses)

6) provenance version NO7) coverage of the dataset8) dimensions and semantics NO9) slices subsets NO10) URL11) Format12) Protocols parameters NO

DCAT and DCAT-AP

The DCAT Application profile for data portals in

Europe (DCAT-AP) is an extension of DCAT

It combines DCAT with the W3C Asset Description Metadata

Schema (ADMS) vocabulary plus classes and properties from

Dublin Core SKOS and Vcard in an Application profile

The elaboration of the DCAT-AP was a joint initiative of DG

CONNECT the EU Publications Office and the ISA Programme

A diagram of the full DCAT-AP specification is on the next slide

Full DCAT AP

versions

rights and provenance

standards

rights

format

relation

1) who is responsible for it MORE2) relations to organizations projects publications funding Partly

3) the conditions for re-use (rights licenses) MORE4) provenance version YES5) protocols parameters NO6) dimensions and semantics NO

More than DCAT

Limitations of DCAT

It doesnrsquot cover

bull semantic relations to organisations persons software projects fundinghellip

bull dimensions and variables and syntax semantics of dimensions and variables

bull protocols and parameters for datasets available through APIs

bull time and space slices subsets

Combining DCAT with other vocabularies

ldquoOther complementary vocabularies may be used together with DCAT to provide more detailed format-specific information For example properties from the VoID vocabulary can be used if that dataset is in RDF formatrdquo

(from the DCAT specification)

DataCube structure definition

A cube is organized according to a set of dimensions attributes and measures

bull The dimension components serve to identify the observed dimensions (eg time geographic region gender elevationhellip)

bull The measure components represent the phenomenon being observed (eg life expectancy)

bull The attribute components enable specification of the units of measure any scaling factors and metadata such as the status of the observation (eg estimated provisional)

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset also non-statistical

1) dimensions and semantics YES2) slices subsets YES

More than DCAT-AP

VOID model

c

dctlicensewvnorms wvwaiver

3) URL MORE4) the conditions for re-use (rights licenses) MORE5) Protocols parameters Partly6) dimensions and semantics of dimensions Partly7) slices subsets YES

More than DCAT-AP

Complementing DCAT

bull For dimensions semantics of dimensions slicingbull DataCube

bull DDI

bull For API aspectsbull VOID (linked data)

bull Web services descriptions (Hydra (WSDL WADL))

bull For relations to organizations projects publications fundinghellipbull CERIF for datasets

bull VIVO Datastar

Many vocabulariesVocabularies with relations to DCAT or same model

bull DCAT-AP and other extensions

bull W3C HCLS dataset descriptions DataID extension with capabilities to describe dataset hierarchies fine-grained technical details of datasets dataset permissions dataset distributions and machine-readable licensing information GeoDCAT-AP geospatial extension of DCAT

bull DDI-RDF Discovery Vocabulary (mapped to Data Cube DCAT and XKOS DDI XML exportable from Dataverse)

bull Schemaorg

Other vocabulariesbull DataCite and re3data

bull CERIF for datasets

bull VIVO Datastar

bull INSPIRE

Examples of application of DCAT

bull CKAN data catalog tool(more in your workshop)

bull Data catalogsbull datagov datagovuk datagovaubull Africa Open Data Indonesia Data Portalbull EU Data Portalbull More here httpsckanorginstances

bull CIARD RING federated data catalogmanaged by GFAR

CIARD RING

Datasets in the RING dataset hub

bull Datasets can be registered as standalone sources or as part of a ldquocollectionrdquo (DCAT model)

bull A dataset is identified by uniform type of content uniform data structure (dimensions metadata set encoding reference value lists)

bull One dataset can be made available accessible as different ldquodistributionsrdquo (format protocol URL)

bull The RING uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary

a ldquoRING DCAT profilerdquo will be published

Sample dataset record

RDF of the record:

<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517/rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
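A record of this kind can be read with nothing more than an XML parser. The sketch below uses Python's standard library to pull the subject URI, the `rdf:type` values and the `dct:title` out of a shortened stand-in record. The stand-in uses a placeholder `example.org` subject and an `rdf:RDF` wrapper with namespace declarations, both of which are assumptions added for the example; a full RDF library would be the more robust choice for real records.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"

# A shortened, well-formed stand-in for a dataset record (hypothetical subject URI).
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/dataset/1">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>Example soil database</dct:title>
  </rdf:Description>
</rdf:RDF>"""


def summarize(rdf_xml: str) -> dict:
    """Collect the subject URI, its rdf:type values and its dct:title."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF_NS}}}Description")
    types = [t.get(f"{{{RDF_NS}}}resource") for t in desc.findall(f"{{{RDF_NS}}}type")]
    return {
        "about": desc.get(f"{{{RDF_NS}}}about"),
        "types": types,
        "title": desc.findtext(f"{{{DCT_NS}}}title"),
    }


summary = summarize(RECORD)
print(summary["types"])
```

Listing the types makes the multi-vocabulary typing visible: the same resource is declared a dataset in more than one vocabulary at once.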

References

1. Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Research Council (US), Space Science Board. National Academies, 1986. 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets: On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with 18th International World Wide Web Conference (WWW 09).
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki

Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VoID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info

Dataset description and DCAT

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories

Thank you

Valeria Pesce

valeria.pesce@fao.org


DCAT model

1) identifier(s)
2) who is responsible for it
3) when and where the data were collected
4) relations to organizations, persons, publications, software, projects, funding…: NO
5) the conditions for re-use (rights, licenses)
6) provenance, version: NO
7) coverage of the dataset
8) dimensions and semantics: NO
9) slices, subsets: NO
10) URL
11) format
12) protocols, parameters: NO

DCAT and DCAT-AP

The DCAT Application Profile for data portals in Europe (DCAT-AP) is an extension of DCAT.

It combines DCAT with the W3C Asset Description Metadata Schema (ADMS) vocabulary, plus classes and properties from Dublin Core, SKOS and vCard, in an application profile.

The elaboration of the DCAT-AP was a joint initiative of DG CONNECT, the EU Publications Office and the ISA Programme.

A diagram of the full DCAT-AP specification is on the next slide.

Full DCAT-AP

[Diagram: full DCAT-AP model, with callouts for versions, rights and provenance, standards, rights, format, relation]

More than DCAT:

1) who is responsible for it: MORE
2) relations to organizations, projects, publications, funding: Partly
3) the conditions for re-use (rights, licenses): MORE
4) provenance, version: YES
5) protocols, parameters: NO
6) dimensions and semantics: NO

Limitations of DCAT

It doesn’t cover:

• semantic relations to organisations, persons, software, projects, funding…
• dimensions and variables, and the syntax / semantics of dimensions and variables
• protocols and parameters for datasets available through APIs
• time and space slices, subsets

Combining DCAT with other vocabularies

“Other complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”

(from the DCAT specification)
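As a rough illustration of that advice, the sketch below describes one resource with DCAT terms and adds VoID terms only when the dataset is available as RDF. The helper names `describe` and `to_ntriples`, the tuple representation of triples and the `example.org` URIs are assumptions made for this example; `dcat:Dataset`, `void:Dataset`, `void:sparqlEndpoint` and `dct:title` are terms from the respective vocabularies.

```python
DCAT = "http://www.w3.org/ns/dcat#"
VOID = "http://rdfs.org/ns/void#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe(uri, title, sparql_endpoint=None):
    """Return (subject, predicate, object) triples; objects are ('uri', v) or ('literal', v)."""
    triples = [
        (uri, RDF_TYPE, ("uri", DCAT + "Dataset")),
        (uri, DCT + "title", ("literal", title)),
    ]
    if sparql_endpoint is not None:
        # VoID terms are added only when the dataset is available as RDF.
        triples.append((uri, RDF_TYPE, ("uri", VOID + "Dataset")))
        triples.append((uri, VOID + "sparqlEndpoint", ("uri", sparql_endpoint)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines (plain literals, minimal escaping)."""
    lines = []
    for s, p, (kind, value) in triples:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical dataset and endpoint URIs.
triples = describe("http://example.org/dataset/1", "Example dataset",
                   sparql_endpoint="http://example.org/sparql")
print(to_ntriples(triples))
```

Because both vocabularies describe the same subject URI, a consumer that understands only DCAT can ignore the VoID statements without losing the core description.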

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures.

• The dimension components serve to identify the observed dimensions (e.g. time, geographic region, gender, elevation…)
• The measure components represent the phenomenon being observed (e.g. life expectancy)
• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
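The three kinds of component can be made concrete with a small Python sketch. It is not the RDF Data Cube vocabulary itself, only an illustration of the same idea: the `Structure` class, the helper functions and all observation values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Structure:
    """Names of the components that every observation in the cube must carry."""
    dimensions: Tuple[str, ...]
    measures: Tuple[str, ...]
    attributes: Tuple[str, ...]

    def components(self) -> Tuple[str, ...]:
        return self.dimensions + self.measures + self.attributes


def is_valid(observation: dict, structure: Structure) -> bool:
    """An observation is valid if it has exactly the declared components."""
    return set(observation) == set(structure.components())


def slice_by(observations, structure: Structure, **fixed):
    """Fix the value of one or more dimensions and return the matching observations."""
    unknown = set(fixed) - set(structure.dimensions)
    if unknown:
        raise ValueError(f"not dimensions: {sorted(unknown)}")
    return [o for o in observations if all(o[k] == v for k, v in fixed.items())]


structure = Structure(
    dimensions=("time", "region", "gender"),
    measures=("life_expectancy",),
    attributes=("unit", "status"),
)

# Made-up observations, for illustration only.
observations = [
    {"time": 2010, "region": "North", "gender": "F", "life_expectancy": 81.0, "unit": "years", "status": "estimated"},
    {"time": 2010, "region": "North", "gender": "M", "life_expectancy": 77.0, "unit": "years", "status": "estimated"},
    {"time": 2010, "region": "South", "gender": "F", "life_expectancy": 79.5, "unit": "years", "status": "provisional"},
]

north = slice_by(observations, structure, region="North")
print(len(north))
```

Slicing is allowed only on dimensions because they are what identifies an observation; measures and attributes describe the value rather than locate it.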

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, including non-statistical ones.

More than DCAT-AP:

1) dimensions and semantics: YES
2) slices, subsets: YES

VOID model

[Diagram: VoID model, with callouts for dct:license, wv:norms, wv:waiver]

More than DCAT-AP:

3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES
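The coverage ratings given on the DCAT, DCAT-AP, DataCube and VoID slides can be collected into one lookup table, which makes it easier to see which vocabulary to add for a given need. The sketch below is one possible encoding: the ratings are transcribed from the slides, while the dictionary layout, the shortened need labels and the `vocabularies_for` helper are assumptions made for the example.

```python
# Ratings transcribed from the slides; "MORE" means "more than the baseline".
COVERAGE = {
    "DCAT": {
        "relations": "NO",
        "provenance/version": "NO",
        "dimensions and semantics": "NO",
        "slices/subsets": "NO",
        "protocols/parameters": "NO",
    },
    "DCAT-AP": {
        "responsibility": "MORE",
        "relations": "Partly",
        "conditions for re-use": "MORE",
        "provenance/version": "YES",
        "protocols/parameters": "NO",
        "dimensions and semantics": "NO",
    },
    "DataCube": {
        "dimensions and semantics": "YES",
        "slices/subsets": "YES",
    },
    "VoID": {
        "URL": "MORE",
        "conditions for re-use": "MORE",
        "protocols/parameters": "Partly",
        "dimensions and semantics": "Partly",
        "slices/subsets": "YES",
    },
}

# Lower rank sorts first: full coverage before partial coverage.
RANK = {"YES": 0, "MORE": 0, "Partly": 1}


def vocabularies_for(need: str):
    """Return the vocabularies that cover a need, fully covering ones first."""
    hits = [
        (RANK[ratings[need]], name)
        for name, ratings in COVERAGE.items()
        if ratings.get(need) in RANK
    ]
    return [name for _, name in sorted(hits)]


print(vocabularies_for("dimensions and semantics"))
```

A vocabulary rated NO for a need is left out of the result, so an empty list signals a gap that none of the four vocabularies fills.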





Page 23: Dataset description: DCAT and other vocabularies

Combining DCAT with other vocabularies

“Other, complementary vocabularies may be used together with DCAT to provide more detailed, format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.”

(from the DCAT specification)
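
As a rough illustration of the point quoted above, the sketch below describes one resource as both a `dcat:Dataset` and a `void:Dataset` and adds two VoID properties next to a Dublin Core title. It uses only the Python standard library and writes N-Triples by hand. The dataset URI, title, triple count and endpoint are hypothetical values chosen for the example, and the helper names (`iri`, `literal`, `describe_dataset`, `to_ntriples`) are not part of any vocabulary or library.

```python
# Namespace IRIs of the vocabularies being combined.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
VOID = "http://rdfs.org/ns/void#"
DCT = "http://purl.org/dc/terms/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Render a plain or datatyped literal with basic escaping."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


def describe_dataset(uri, title, triple_count, sparql_endpoint):
    """Return triples typing the resource with both DCAT and VoID classes."""
    subject = iri(uri)
    return [
        (subject, iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (subject, iri(RDF_TYPE), iri(VOID + "Dataset")),
        (subject, iri(DCT + "title"), literal(title)),
        # VoID-specific details that only make sense for RDF datasets.
        (subject, iri(VOID + "triples"), literal(triple_count, XSD_INTEGER)),
        (subject, iri(VOID + "sparqlEndpoint"), iri(sparql_endpoint)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical dataset used only for illustration.
record = describe_dataset(
    uri="http://example.org/dataset/soil",
    title="Example soil dataset",
    triple_count=120000,
    sparql_endpoint="http://example.org/sparql",
)
print(to_ntriples(record))
```

The generic catalogue information stays in DCAT and Dublin Core terms, while the RDF-specific details are carried by VoID properties on the same subject.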

Page 24: Dataset description: DCAT and other vocabularies

DataCube structure definition

A cube is organized according to a set of dimensions, attributes and measures.

• The dimension components serve to identify the observations (e.g. time, geographic region, gender, elevation…)

• The measure components represent the phenomenon being observed (e.g. life expectancy)

• The attribute components enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)
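
The three kinds of component can be sketched with plain Python data classes. This is a simplified model for illustration only, not the Data Cube vocabulary itself: the class names `Component` and `StructureDefinition`, the component names and the observation values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # one of "dimension", "measure", "attribute"


@dataclass(frozen=True)
class StructureDefinition:
    components: tuple

    def names(self, kind):
        """Names of all components of the given kind, in declaration order."""
        return [c.name for c in self.components if c.kind == kind]

    def missing(self, observation):
        """Dimensions and measures that the observation fails to provide."""
        required = self.names("dimension") + self.names("measure")
        return [name for name in required if name not in observation]

    def key(self, observation):
        """The dimension values that together identify the observation."""
        return tuple(observation[name] for name in self.names("dimension"))


# Hypothetical structure: three dimensions, one measure, two attributes.
structure = StructureDefinition(components=(
    Component("refPeriod", "dimension"),
    Component("refArea", "dimension"),
    Component("sex", "dimension"),
    Component("lifeExpectancy", "measure"),
    Component("unitMeasure", "attribute"),
    Component("obsStatus", "attribute"),
))

# Made-up observation, used only to exercise the structure.
observation = {
    "refPeriod": "2005",
    "refArea": "Region A",
    "sex": "female",
    "lifeExpectancy": 80.7,
    "unitMeasure": "years",
    "obsStatus": "estimated",
}

print(structure.missing(observation))
print(structure.key(observation))
```

In this sketch the dimensions act as the key of an observation, the measure holds the observed value, and the attributes only qualify how that value should be read.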

Page 25: Dataset description: DCAT and other vocabularies

DataCube model for dataset structure

This part of the model could be re-used for describing the dimensions of any dataset, also non-statistical

1) dimensions and semantics: YES
2) slices, subsets: YES

More than DCAT-AP
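
The idea of a slice can be sketched independently of any RDF tooling: fix the value of one or more dimensions and keep the observations that match. The function name `slice_observations`, the dimension names and the figures below are hypothetical and serve only to show the mechanism.

```python
def slice_observations(observations, **fixed_dimensions):
    """Return the observations whose dimension values match every fixed value."""
    return [
        obs for obs in observations
        if all(obs.get(dim) == value for dim, value in fixed_dimensions.items())
    ]


# Illustrative observations (values are made up for the example).
observations = [
    {"year": 2010, "region": "North", "crop": "maize", "yield_t_ha": 4.1},
    {"year": 2010, "region": "South", "crop": "maize", "yield_t_ha": 3.6},
    {"year": 2011, "region": "North", "crop": "maize", "yield_t_ha": 4.4},
    {"year": 2011, "region": "North", "crop": "rice", "yield_t_ha": 5.2},
]

# Fixing region and crop leaves a time series over the remaining dimension.
north_maize = slice_observations(observations, region="North", crop="maize")
for obs in north_maize:
    print(obs["year"], obs["yield_t_ha"])
```

Nothing in this mechanism depends on the data being statistical, which is the sense in which the same part of the model could describe subsets of other kinds of dataset.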

Page 26: Dataset description: DCAT and other vocabularies

VOID model

[Figure: VoID model diagram. Licensing properties shown: dct:license, wv:norms, wv:waiver]

3) URL: MORE
4) the conditions for re-use (rights, licenses): MORE
5) Protocols, parameters: Partly
6) dimensions and semantics of dimensions: Partly
7) slices, subsets: YES

More than DCAT-AP
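
A minimal sketch of how a VoID description can carry several access URLs, a licence and a subset for one dataset is given below. It builds a small Turtle document with the standard library only. The properties used (`void:sparqlEndpoint`, `void:dataDump`, `void:subset`, `dcterms:license`) are defined by VoID and Dublin Core; the dataset, endpoint, dump and subset URIs are hypothetical, the licence is an arbitrary choice for the example, and `to_turtle` is a helper written for this illustration.

```python
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
}


def to_turtle(subject, rdf_type, properties):
    """Render one subject with IRI-valued properties as a Turtle document."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{subject}> a {rdf_type} ;")
    body = [f"    {prop} <{target}>" for prop, target in properties]
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


# Hypothetical dataset description used only for illustration.
turtle = to_turtle(
    subject="http://example.org/dataset/soil",
    rdf_type="void:Dataset",
    properties=[
        # Access URLs: a query endpoint and a downloadable dump.
        ("void:sparqlEndpoint", "http://example.org/sparql"),
        ("void:dataDump", "http://example.org/dumps/soil.nt.gz"),
        # Conditions for re-use.
        ("dcterms:license", "http://creativecommons.org/licenses/by/4.0/"),
        # A named subset of the dataset.
        ("void:subset", "http://example.org/dataset/soil/topsoil"),
    ],
)
print(turtle)
```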

Page 27: Dataset description: DCAT and other vocabularies

Complementing DCAT

• For dimensions, semantics of dimensions, slicing:
  • DataCube
  • DDI

• For API aspects:
  • VOID (linked data)
  • Web services descriptions (Hydra (WSDL, WADL))

• For relations to organizations, projects, publications, funding…:
  • CERIF for datasets
  • VIVO Datastar

Page 28: Dataset description: DCAT and other vocabularies

Many vocabularies

Vocabularies with relations to DCAT or same model

• DCAT-AP and other extensions

• W3C HCLS dataset descriptions

• DataID: extension with capabilities to describe dataset hierarchies, fine-grained technical details of datasets, dataset permissions, dataset distributions and machine-readable licensing information

• GeoDCAT-AP: geospatial extension of DCAT

• DDI-RDF Discovery Vocabulary (mapped to Data Cube, DCAT and XKOS; DDI XML exportable from Dataverse)

• Schema.org

Other vocabularies

• DataCite and re3data

• CERIF for datasets

• VIVO Datastar

• INSPIRE

Examples of application of DCAT

• CKAN data catalog tool (more in your workshop)
• Data catalogs
  • data.gov, data.gov.uk, data.gov.au
  • Africa Open Data, Indonesia Data Portal
  • EU Data Portal
  • More here: https://ckan.org/instances
• CIARD RING: federated data catalog managed by GFAR

CIARD RING

Datasets in the RING dataset hub

• Datasets can be registered as standalone sources or as part of a “collection” (DCAT model)
• A dataset is identified by a uniform type of content and a uniform data structure (dimensions, metadata set, encoding, reference value lists)
• One dataset can be made available / accessible as different “distributions” (format, protocol, URL)
• The RING uses a combination of the DCAT model, the VOID vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published
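The dataset / distribution / collection relationship described above can be sketched in a few lines of Python. This is only an illustration of the model, not the RING's implementation: the class names, field names and example.org URLs are hypothetical and loosely mirror the DCAT terms dcat:Dataset and dcat:Distribution.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Distribution:
    """One concrete way of getting at a dataset (cf. dcat:Distribution)."""
    access_url: str   # where the data can be reached
    data_format: str  # e.g. "CSV", "RDF/XML"
    protocol: str     # e.g. "HTTP download", "SPARQL"


@dataclass
class Dataset:
    """A dataset with uniform content and structure (cf. dcat:Dataset)."""
    title: str
    content_type: str                 # uniform type of content
    collection: Optional[str] = None  # None means a standalone source
    distributions: List[Distribution] = field(default_factory=list)

    def formats(self) -> List[str]:
        """Formats in which the same dataset is offered, without repeats."""
        seen: List[str] = []
        for dist in self.distributions:
            if dist.data_format not in seen:
                seen.append(dist.data_format)
        return seen


# Hypothetical example: one dataset, two distributions, part of a collection.
soil = Dataset(
    title="Example soil dataset",
    content_type="soil maps",
    collection="Example national collection",
    distributions=[
        Distribution("http://example.org/soil.csv", "CSV", "HTTP download"),
        Distribution("http://example.org/sparql", "RDF/XML", "SPARQL"),
    ],
)

print(soil.formats())
```

The point of the sketch is that format, protocol and URL belong to the distribution, while title and content type belong to the dataset, so the same dataset can be offered in several ways without being described twice.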

Sample dataset record

RDF of the record:

<rdf:Description rdf:about="http://ring.ciard.net/node/19517">
  <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
  <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
  <rdf:type rdf:resource="http://schema.org/Dataset"/>
  <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
  <dct:title>National Soil Database representative Soil Systems geography</dct:title>
  <schema:name>National Soil Database representative Soil Systems geography</schema:name>
  <dct:description>National Soil Database representative Soil Systems geography (1:500000)</dct:description>
  <dcat:landingPage rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <schema:url rdf:resource="http://soilmaps.entecra.it/ita/bancadati1.html"/>
  <dct:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <schema:publisher rdf:resource="http://ring.ciard.net/node/19510"/>
  <dct:issued rdf:datatype="xsd:gYear">1990</dct:issued>
  <dct:spatial rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:spatial>National</dct:spatial>
  <dcat:theme rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <schema:about rdf:resource="http://ring.ciard.net/taxonomy_term/2052"/>
  <dct:conformsTo rdf:resource="http://ring.ciard.net/node/19239"/>
  <dct:identifier>http://ring.ciard.net/node/19517.rdf</dct:identifier>
  <schema:contentLocation rdf:resource="http://ring.ciard.net/taxonomy_term/326"/>
  <dct:type rdf:resource="http://ring.ciard.net/taxonomy_term/81"/>
  <dcat:catalog rdf:resource="http://ring.ciard.net/node/19436"/>
  <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <schema:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <void:dataDump rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  <dct:rights rdf:resource="http://ring.ciard.net/taxonomy_term/2053"/>
</rdf:Description>
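As a rough illustration of how a consumer might read such a record, the sketch below parses a trimmed excerpt of it with Python's standard library and pulls out the subject, title, types and distribution. Several things here are assumptions rather than part of the slide: the rdf:RDF wrapper and namespace declarations (the slide shows only the rdf:Description element), the punctuation of the URLs, and the choice of a plain XML parser, which is enough for this flat record; for general RDF a dedicated RDF library would be the more robust choice.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the prefixes used in the record. These are the usual
# URIs for these vocabularies; the slide does not show the declarations.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

# Trimmed excerpt of the record, wrapped in rdf:RDF so that it parses as XML.
RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <rdf:Description rdf:about="http://ring.ciard.net/node/19517">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <rdf:type rdf:resource="http://rdfs.org/ns/void#Dataset"/>
    <dct:title>National Soil Database representative Soil Systems geography</dct:title>
    <dcat:distribution rdf:resource="http://ring.ciard.net/field_collection_item/5055"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree exposes namespaced attributes as "{namespace-uri}localname".
RDF_RESOURCE = "{%s}resource" % NS["rdf"]
RDF_ABOUT = "{%s}about" % NS["rdf"]

root = ET.fromstring(RECORD)
record = root.find("rdf:Description", NS)

subject = record.get(RDF_ABOUT)
title = record.findtext("dct:title", namespaces=NS)
types = [t.get(RDF_RESOURCE) for t in record.findall("rdf:type", NS)]
distributions = [
    d.get(RDF_RESOURCE) for d in record.findall("dcat:distribution", NS)
]

print(subject)
print(title)
print(types)
print(distributions)
```

Because the record types the same resource in several vocabularies at once (DCAT, VOID, Schema.org, ADMS), a consumer that understands only one of them can still pick out the statements it recognises and ignore the rest.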

References

1. National Research Council (US), Space Science Board (1986). Issues and Recommendations Associated with Distributed Computation and Data Management Systems for the Space Sciences. National Academies, 111 pages. https://books.google.co.uk/books?id=h4krAAAAYAAJ
2. Sacchi, S., Wickett, K. M., & Renear, A. H. (2010). Dataset definitions. Champaign, IL: Center for Informatics Research in Science and Scholarship (Rep. No. CIRSSDATACONS--20101VER01+DCDC).
3. Alexander, K., Cyganiak, R., Hausenblas, M., & Zhao, J. (2009). Describing Linked Datasets - On the Design and Usage of voiD, the Vocabulary of Interlinked Datasets. In Linked Data on the Web Workshop (LDOW 09), in conjunction with 18th International World Wide Web Conference (WWW 09).
4. Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of Dataset in the Scientific and Technical Literature. http://mail.asist.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/240_Final_Submission.pdf
5. W3C Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Main_Page
6. UK Gov Linked Data Working Group, LD registry. https://github.com/der/ukl-registry-poc/wiki

Vocabularies and catalogs

• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables
• CKAN: http://ckan.org
• Dataverse: http://dataverse.org
• Datahub: http://datahub.io
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu
• CIARD RING: http://ring.ciard.info

Dataset description and DCAT

DRTC/ISI-ICSU/CODATA International Workshop on Open Data Repositories

Thank you

Valeria Pesce
valeria.pesce@fao.org
